boodebr.org
MYOML: First Steps Towards Your Own Markup Language
I vaguely remember writing my first webpage, and thinking that HTML was the greatest invention ever. That enthusiasm wore off by about the middle of the second page. I've never liked writing raw HTML because it forces you to spend a large amount of time on presentation, which takes time away from writing the content itself.

With the advent of Cascading Style Sheets (CSS), the presentation can be more easily separated from the content, but one more step is needed to boil it down to a true content-only format.

This series of articles will show you how to create your own "mini" markup language, using standards and tools like XML, XSLT, CSS and (optionally) a little PHP. If those four acronyms mean nothing to you, don't worry - everything you need to know will be covered here.

In this first article, I'm just going to give an overview of some of the terms and technologies. The real work will start next time.

By the way, I don't claim to be an expert in any of these areas; in fact, I'm writing these articles as I learn. I welcome any comments or suggestions if I'm overlooking an easier or better way to do something.
1. The Ultimate Markup Language
As I started getting serious about doing some writing here, the process of churning out handcoded HTML (even with CSS) quickly became a chore. I kept thinking to myself about what the "ultimate" markup language for writing articles would look like. It became apparent that the style of articles I was writing required only a few basic types of blocks:

  1. Normal text.
  2. Blocks of code, program output, or other examples in a fixed font.
  3. "Warning" sections to highlight important topics.

The more I thought along these lines, the more I realized that I wanted to be able to write articles like this:
<article title="A Nice Article about Things">

    <text>
        Here is a normal section of text.
    </text>

    <code>
    # This is a code section
    import os
    print(os.name)
    </code>

    <text>
    And here is some more normal text.
    </text>
</article>
So I started exploring how to do that ...
2. All the XML you need to know.
Much of what follows will require some knowledge of XML. In case you don't know XML, here is all you need to know to understand these articles:
  • Tags (like <text>) are containers that can hold text or other tags.
  • You open (start) a container like this: <article>.
  • You close (end) a container like this: </article>.
  • A tag can have attributes, like this: <article title="A Title">
That was easy, huh?
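To see those four rules in action, here is a minimal sketch using Python's standard-library xml.etree.ElementTree module. The SAMPLE string and the is_well_formed helper are made up for illustration; they are not part of the format used on this site.

```python
import xml.etree.ElementTree as ET

# A tiny document in the "mini" format (illustrative sample only).
SAMPLE = """<article title="A Title">
    <text>Here is a normal section of text.</text>
    <code>print("hello")</code>
</article>"""

root = ET.fromstring(SAMPLE)

# The outermost container and its attribute.
print(root.tag)              # article
print(root.attrib["title"])  # A Title

# Containers nested inside <article>, in document order.
for child in root:
    print(child.tag, "->", child.text.strip())


def is_well_formed(source: str) -> bool:
    """Return True if every opened container is properly closed."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True
```

Calling is_well_formed on a document that opens <article> but never closes it returns False, which is the parser's way of enforcing the open/close rule above.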
3. Converting XML -> XML
Inventing your own "mini" XML markup language is a great way to waste an afternoon, but to make it useful, it needs to be readable by everyone else. To do that, you have to be able to convert from "your" XML format into a common format, say HTML. Here is a minimal example for demonstration purposes:
XML Input
<article title="A Nice Article About Things">
    <text>
        Here is some text.
    </text>
</article>

Here is one possible way to turn that into HTML:
HTML Output
<html>
    <body>
        <h2>A Nice Article About Things</h2>
        <hr>
        Here is some text.
    </body>
</html>
You may wonder "what's the big deal?", but remember this is just a minimal example. The important things to note at this point are:
  1. The XML file contains only content and information on how the content itself is structured (ordering of sections, etc.).
  2. There are no definitions of how that content is to be displayed to the reader.
  3. Most importantly, there are no limitations on how the content can be presented. It can be correctly rendered on any number of devices, and in any number of ways, without having to modify the content itself.
The last point is of particular importance: If you were to write dozens of articles in hardcoded HTML, it would be difficult to change them all to a new color scheme, or site presentation. If, on the other hand, you write your articles in a content-only XML format, you could change the look of all your articles by changing one or two files, instead of editing each and every content file.
4. Introducing XSLT
How would you perform the above conversion, taking an article in your "mini" XML language and turning it into HTML? The essential question is: How do you transform one set of XML tags and content to another set of XML tags and content?

One way to do that would be to write your own XML parser, and do the appropriate conversion of the tags and content. There are various pros and cons to this method, but the big con I want to focus on here is that this method requires that you know how to write code.
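To give a feel for what the hand-written approach involves, here is a rough sketch in Python. It leans on the standard library's XML parser rather than writing one from scratch, and the TAG_MAP table and article_to_html function are hypothetical names chosen for this example. Unlike the HTML output shown earlier, it wraps each block in an HTML tag (<p> or <pre>), which is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from "mini" tags to HTML tags (an assumption,
# not the mapping used by the articles on this site).
TAG_MAP = {"text": "p", "code": "pre"}


def article_to_html(xml_source: str) -> str:
    """Convert an <article> document to a simple HTML page."""
    root = ET.fromstring(xml_source)
    if root.tag != "article":
        raise ValueError("expected <article> as the root element")

    title = escape(root.get("title", ""))
    parts = ["<html>", "<body>", f"<h2>{title}</h2>", "<hr>"]

    # Convert each child block, rejecting tags the table does not know.
    for child in root:
        html_tag = TAG_MAP.get(child.tag)
        if html_tag is None:
            raise ValueError(f"unknown tag: <{child.tag}>")
        body = escape((child.text or "").strip())
        parts.append(f"<{html_tag}>{body}</{html_tag}>")

    parts += ["</body>", "</html>"]
    return "\n".join(parts)
```

Even this small version has to handle unknown tags, escaping, and the page skeleton by hand, and every new block type means another code change.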

XSLT provides a way to perform XML->XML conversion without having to write your own XML parser, making it more accessible for non-programmers. For programmers, XSLT saves you from having to write yet another special-purpose XML parser (think of XSLT as somewhat like the lex and yacc of the XML world: why write your own if you don't have to?). Ironically, the XSLT reference at the W3C is (in my opinion) the most technically complicated of the major references in this area (the others being HTML, CSS and XML), which I think tends to scare away a lot of people who could otherwise make good use of it. Of course, the W3C documents are specifications, not manuals, so they aren't always the easiest source to learn from.
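As a small preview, here is a sketch of what an XSLT stylesheet for the minimal article example might look like, held in a Python string so it can be inspected and applied. The stylesheet is a guess at one reasonable approach, not the format this site actually uses. The names STYLESHEET, template_matches and apply_stylesheet are invented for this example, and apply_stylesheet assumes the third-party lxml package is installed (the Python standard library has no XSLT processor).

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Illustrative stylesheet: one template for <article>, one for <text>.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/article">
    <html>
      <body>
        <h2><xsl:value-of select="@title"/></h2>
        <hr/>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="text">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>
</xsl:stylesheet>"""


def template_matches(stylesheet_xml: str = STYLESHEET) -> list:
    """List each template's match pattern (standard library only)."""
    root = ET.fromstring(stylesheet_xml)
    return [t.get("match") for t in root.findall(f"{{{XSL_NS}}}template")]


def apply_stylesheet(article_xml: str, stylesheet_xml: str = STYLESHEET) -> str:
    """Run the transformation. Assumes the third-party lxml package."""
    from lxml import etree  # imported here so the rest works without lxml

    transform = etree.XSLT(etree.fromstring(stylesheet_xml))
    return str(transform(etree.fromstring(article_xml)))
```

Note that the stylesheet is itself an XML document: each template says "when you see this tag, emit that output", and no parsing code is written by hand.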

One of my motivations in writing these articles is to help others get started by showing easy ways to do basic things, and slowly adding complexity over time.
5. How the pieces fit together.
The diagram below shows how all of the pieces fit together, from the author writing their content to a reader viewing the final result in their browser:
Verbosely:
  1. The author writes their content as an XML file, according to whatever format they've chosen (i.e. the "mini" markup language).
  2. The XML is converted to HTML via the XSLT file. This can happen in a variety of ways, as noted below.
  3. The user's browser takes the HTML ("content") and CSS file ("style") and combines them to create the visual presentation of the content.
  4. The final styled content is presented to the user.

The XML-to-HTML conversion process (via XSLT) is interesting, because there are several ways this can be accomplished:
  1. The author can run a program like xsltproc to perform the conversion on their own machine.
  2. A script (e.g. PHP) on the webserver can perform the conversion.
  3. The user's browser can take the XML directly and do the conversion itself (on the user's machine). (Note: Not all browsers support this!)

There are pros and cons to each method, but for now, just be aware that you have several options.

Remember that XSLT allows you to convert any XML format to any other XML format. I'm focusing on HTML for these articles, but that's just one example of how you can use XSLT.
Conclusion
You've now seen an overview of the tools that will be used in coming installments to create your own "mini" markup language.

All of the articles at boodebr.org are written using a "mini" XML markup format. I'll be recreating that format step-by-step in the coming articles. Of course, you are welcome to reuse my format, but by the end of this series, you should be able to create your own that does what you want it to do.