boodebr.org
MYOML: Validation and Conclusion
The custom markup language was designed with a particular structure in mind: a single <article> tag containing one or more <section> containers, each of which holds one <text> container. The CSS stylesheet assumes this layout. However, nothing actually enforces it, and nothing stops you from writing an article like this:
A badly structured article
<article>
    <warn>
        <text>
            <section>
                <text>
                    <section>
                    ...
If you ran this through the XSLT processor, you would not get an error, and after applying the CSS stylesheet, the user's browser would display something. However, there are two problems with allowing this to happen:
  1. Both the XSLT and CSS stylesheets were designed with a certain nesting order in mind, and allowing a different structure risks having a document that doesn't look quite right when displayed.
  2. Not having a consistent structure makes it hard or even impossible to write an XML query tool for indexing articles, etc., since there is no fixed structure to parse.
For these reasons, it is desirable to have a tool automatically check the XML document for correct structure while processing. This eliminates the chance of having a badly formatted document slip through unnoticed. Although you could attempt to perform validation within XSLT itself (by being more strict in matching nodes), XSLT isn't really meant for that.

The correct way to do it is to create a Document Type Definition (DTD) that an XML-checking tool can use to validate the document structure. There are several good tutorials on writing a DTD so instead of repeating that work, I'll just present the article DTD:
article.dtd
<?xml version="1.0" encoding="UTF-8"?>

<!--
    DTD for checking validity of articles.
    
    For background see:
        http://boodebr.org/series/myoml
-->

<!-- allow these simple tags to nest arbitrarily -->
<!ENTITY % COMMON_SUBTAGS "p|i|b|tt|c|a">

<!-- An article has one or more sections -->
<!ELEMENT article (section+)>

<!-- Required metadata attrs -->
<!ATTLIST article title CDATA #REQUIRED>
<!ATTLIST article series CDATA #REQUIRED>
<!ATTLIST article series-url CDATA #REQUIRED>
<!ATTLIST article footer-text CDATA #REQUIRED>
<!ATTLIST article series-url-desc CDATA #REQUIRED>

<!-- section has one inner text tag -->
<!ELEMENT section (text)>

<!-- optional section title -->
<!ATTLIST section title CDATA "">

<!-- <text> can have any other tags inside of it -->
<!ELEMENT text (#PCDATA|code|note|warn|ul|ol|li|tr|th|td|img|table|%COMMON_SUBTAGS;)*>

<!-- code is a block of text with a title -->
<!ELEMENT code (#PCDATA)>
<!ATTLIST code title CDATA "">

<!-- <c> holds text only, no attributes -->
<!ELEMENT c (#PCDATA)>

<!-- <note> can have most other tags plus code blocks -->
<!ELEMENT note (#PCDATA|code|%COMMON_SUBTAGS;)*>
<!ATTLIST note title CDATA "">

<!-- <warn> can have most other tags plus code blocks -->
<!ELEMENT warn (#PCDATA|code|%COMMON_SUBTAGS;)*>
<!ATTLIST warn title CDATA "">

<!-- HTML-like tags, can nest arbitrarily with similar tags -->
<!ELEMENT p (#PCDATA|%COMMON_SUBTAGS;)*>
<!ELEMENT i (#PCDATA|%COMMON_SUBTAGS;)*>
<!ELEMENT b (#PCDATA|%COMMON_SUBTAGS;)*>
<!ELEMENT tt (#PCDATA|%COMMON_SUBTAGS;)*>

<!ELEMENT a (#PCDATA|%COMMON_SUBTAGS;)*>
<!ATTLIST a href CDATA #REQUIRED>

<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED>
<!ATTLIST img title CDATA "">

<!-- lists -->
<!ELEMENT ul (li*)>
<!ELEMENT ol (li*)>
<!ELEMENT li (#PCDATA|%COMMON_SUBTAGS;)*>

<!-- tables -->
<!ELEMENT table (tr)*>
<!ATTLIST table title CDATA "">

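<!-- a row holds header (th) and/or data (td) cells;
     th is declared here because <text> refers to it above -->
<!ELEMENT tr (th|td)*>
<!ELEMENT th (#PCDATA|%COMMON_SUBTAGS;)*>
<!ELEMENT td (#PCDATA|%COMMON_SUBTAGS;)*>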
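For reference, here is a minimal sketch of a document that should satisfy the declarations above. All attribute values are placeholders, and the relative DTD path is an assumption made for illustration.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE article SYSTEM "article.dtd">
<!-- all five metadata attributes are #REQUIRED; the values are placeholders -->
<article title="Sample" series="MYOML"
         series-url="http://boodebr.org/series/myoml"
         series-url-desc="Series index"
         footer-text="Placeholder footer">
    <!-- one or more sections, each holding exactly one <text> -->
    <section title="Introduction">
        <text>
            <p>Plain text with <b>bold</b> and <i>italic</i> markup.</p>
            <note title="A note">Notes may contain <c>inline code</c>.</note>
            <ul>
                <li>first item</li>
                <li>second item</li>
            </ul>
        </text>
    </section>
</article>
```

Moving the <note> out of <text> and directly under <article>, as in the badly structured example earlier, would violate the (section+) content model and should be reported by a validating parser.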
In order to perform automatic validity checking, I'm going to modify the XSLT-serving PHP script a little bit. I'm first going to add a function that does the validity checking:
XML validity checking
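/* Validate the given XML file against the DTD named in its DOCTYPE.
   Returns 0 if file is valid, < 0 if not. */
function xml_validate($xml_file) {

    /* escapeshellarg() keeps unusual filenames from being interpreted
       by the shell; --noout stops xmllint from echoing the document */
    $cmd = "xmllint --valid --noout " . escapeshellarg($xml_file) . " 2>&1";
    $pipe = popen($cmd, "r");
    if ($pipe === false) {
        echo "*** Could not run xmllint ***<hr>";
        return -1;
    }
    $buf = "";
    while (!feof($pipe)) {
        /* append chunks as-is; adding "\n" here would split lines
           at arbitrary 1024-byte boundaries */
        $buf .= fread($pipe, 1024);
    }
    $rval = pclose($pipe);
    if ($rval != 0) {
        echo "*** XML validity error(s) ***<hr>";
        echo "<pre>";
        /* xmllint quotes the offending markup, so escape it for HTML */
        echo htmlspecialchars($buf);
        echo "</pre>";
        return -1;
    }
    return 0;
}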
I could not find a built-in PHP function to perform validity checking, so instead I'm calling the xmllint utility (part of the libxml2 library). Note that this assumes you are running on a POSIX-based system that can redirect stderr to stdout ("2>&1"). There may be a better cross-platform way to do this in PHP, but I couldn't find it. libxml2 should be installed on most modern POSIX hosts.
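On hosts running PHP 5 or later, the DOM extension provides DOMDocument::validate(), which checks a document against the DTD named in its DOCTYPE without shelling out. The following is only a sketch under that assumption: the function name xml_validate_dom is hypothetical, and it mirrors the return convention of xml_validate() rather than reproducing xmllint's messages.

```php
<?php
/* Hypothetical alternative to xml_validate(): uses the DOM extension
   (PHP 5+) instead of calling xmllint.
   Returns 0 if the file is valid, < 0 if not. */
function xml_validate_dom($xml_file) {
    /* collect libxml errors ourselves instead of emitting PHP warnings */
    $old = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    /* LIBXML_DTDLOAD asks the parser to load the external DTD subset */
    $ok = $doc->load($xml_file, LIBXML_DTDLOAD) && $doc->validate();

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($old);

    if (!$ok) {
        echo "*** XML validity error(s) ***<hr><pre>";
        foreach ($errors as $e) {
            /* escape the message, since it may quote markup */
            echo htmlspecialchars("line {$e->line}: " . trim($e->message)) . "\n";
        }
        echo "</pre>";
        return -1;
    }
    return 0;
}
```

Because this version never touches the shell, it does not depend on stderr redirection and should behave the same on non-POSIX hosts, provided the DOM extension is enabled.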

The validity check is performed after the CDATA sections are escaped (so that the source file is well-formed XML) and before the XSLT processing is performed:
Calling the validity checker
/* do CDATA escaping on <code> tags to make a valid XML file */
add_cdata_to_code($xml_file, $xml_p_file);

/* validate XML */
if (xml_validate($xml_p_file) < 0) {
    return;
}

/* perform XSLT processing */
$xslt = xslt_create();

...
If an error occurs during XML validity checking, the error messages are sent to the user's browser. xmllint produces much clearer error messages than PHP's XSLT processor, so an additional benefit of XML validation is that it makes it easier to debug your documents.

I made one more change to the PHP code: I'm now passing the name of the XSLT template instead of hardcoding it in the PHP file. This makes the script more generic, since it can now work with any XSLT file.
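How the template name reaches the script is not shown here. If it comes from anything a visitor can influence, it is worth checking before use. The helper below is a hypothetical sketch: the function name resolve_template, the single template directory, and the ".xsl" suffix are all assumptions, not details of the original script.

```php
<?php
/* Hypothetical helper: map a requested template name to a file path.
   Assumes all templates live in one directory and end in ".xsl".
   Returns the path on success, false otherwise. */
function resolve_template($name, $template_dir) {
    /* allow only simple names so "../" cannot escape the directory */
    if (!preg_match('/^[A-Za-z0-9_-]+$/', $name)) {
        return false;
    }
    $path = rtrim($template_dir, '/') . '/' . $name . '.xsl';
    /* refuse names that do not correspond to an existing file */
    return is_file($path) ? $path : false;
}
```

A caller would pass the returned path to the XSLT step only when the result is not false, and report an error to the browser otherwise.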

I also decided to drop the stylesheet declaration from the XML files. The raw articles are not well-formed XML (the <code> blocks are only CDATA-escaped when the article is served), so removing the declaration keeps browsers from trying to process them directly. After adding the DTD declaration, the standard XML article header becomes:
Simplified article header
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE article SYSTEM "/usr/www/users/boodebr/styles/article.dtd">
In conclusion ...
At this point, I've finished everything I wanted to do in this series. I may add a few notes now and then if I come up with a better way to do something, but for now I'm done.

Here is the complete set of files for the series. The final result: Sample Article