A badly structured article
<article>
<warn>
<text>
<section>
<text>
<section>
...
- Both the XSLT and CSS stylesheets were designed with a particular nesting order in mind, so a document with a different structure may not render correctly.
- Without a consistent structure it is hard, or even impossible, to write tools that query the XML (to index articles, for example), because there is no fixed structure to parse.
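To make the second point concrete, here is a minimal sketch of an indexing tool, written in Python with only the standard library. It works only because it can assume every article is a list of sections, each holding one text element. The sample document, its attribute values, and the function name `index_article` are illustrative assumptions, not part of the original site.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample article; all attribute values are made up.
SAMPLE = """\
<article title="Validating articles" series="Example series"
         series-url="/series/example" series-url-desc="Series index"
         footer-text="Example footer">
  <section title="Introduction">
    <text><p>Why validate?</p></text>
  </section>
  <section title="The DTD">
    <text><p>One text element per section.</p></text>
  </section>
</article>
"""

def index_article(xml_source):
    """Return the article title plus (section title, word count) pairs.

    Relies on the fixed nesting article > section > text.
    """
    root = ET.fromstring(xml_source)
    sections = []
    for section in root.findall("section"):
        text = section.find("text")
        # itertext() flattens the text of nested tags such as <p> and <b>
        body = "".join(text.itertext()).strip() if text is not None else ""
        sections.append((section.get("title", ""), len(body.split())))
    return {"title": root.get("title"), "sections": sections}

print(index_article(SAMPLE))
```

If a document put a text element directly under the article, as in the badly structured example, this indexer would silently skip it, which is exactly the kind of failure a fixed structure prevents.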
article.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!--
DTD for checking validity of articles.
For background see:
http://boodebr.org/series/myoml
-->
<!-- allow these simple tags to nest arbitrarily -->
<!ENTITY % COMMON_SUBTAGS "p|i|b|tt|c|a">
<!-- An article has one or more sections -->
<!ELEMENT article (section+)>
<!-- Required metadata attrs -->
<!ATTLIST article title CDATA #REQUIRED>
<!ATTLIST article series CDATA #REQUIRED>
<!ATTLIST article series-url CDATA #REQUIRED>
<!ATTLIST article footer-text CDATA #REQUIRED>
<!ATTLIST article series-url-desc CDATA #REQUIRED>
<!-- section has one inner text tag -->
<!ELEMENT section (text)>
<!-- optional section title -->
<!ATTLIST section title CDATA "">
<!-- <text> can have any other tags inside of it -->
<!ELEMENT text (#PCDATA|code|note|warn|ul|ol|li|tr|th|td|img|table|%COMMON_SUBTAGS;)*>
<!-- code is a block of text with a title -->
<!ELEMENT code (#PCDATA)>
<!ATTLIST code title CDATA "">
<!-- <c> holds text only, no attributes -->
<!ELEMENT c (#PCDATA)>
<!-- <note> can have most other tags plus code blocks -->
<!ELEMENT note (#PCDATA|code|%COMMON_SUBTAGS;)*>
<!ATTLIST note title CDATA "">
<!-- <warn> can have most other tags plus code blocks -->
<!ELEMENT warn (#PCDATA|code|%COMMON_SUBTAGS;)*>
<!ATTLIST warn title CDATA "">
<!-- HTML-like tags, can nest arbitrarily with similar tags -->
<!ELEMENT p (#PCDATA|%COMMON_SUBTAGS;)*>
<!ELEMENT i (#PCDATA|%COMMON_SUBTAGS;)*>
<!ELEMENT b (#PCDATA|%COMMON_SUBTAGS;)*>
<!ELEMENT tt (#PCDATA|%COMMON_SUBTAGS;)*>
<!ELEMENT a (#PCDATA|%COMMON_SUBTAGS;)*>
<!ATTLIST a href CDATA #REQUIRED>
<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED>
<!ATTLIST img title CDATA "">
<!-- lists -->
<!ELEMENT ul (li*)>
<!ELEMENT ol (li*)>
<!ELEMENT li (#PCDATA|%COMMON_SUBTAGS;)*>
<!-- tables -->
<!ELEMENT table (tr)*>
<!ATTLIST table title CDATA "">
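<!ELEMENT tr (th|td)*>
<!-- th is allowed inside <text> above, so it needs a declaration too -->
<!ELEMENT th (#PCDATA|%COMMON_SUBTAGS;)*>
<!ELEMENT td (#PCDATA|%COMMON_SUBTAGS;)*>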
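As a quick way to see what the DTD requires, the following Python sketch builds what should be about the smallest document that satisfies it: all five required attributes on the article, one section, and one text element inside it. The function name `minimal_article`, the placeholder attribute values, and the relative DTD path are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The five attributes the DTD marks as #REQUIRED on <article>.
REQUIRED_ATTRS = ("title", "series", "series-url",
                  "footer-text", "series-url-desc")

def minimal_article(dtd_path="article.dtd"):
    """Return a small article, as a string, shaped to fit the DTD."""
    # Placeholder values; a real article would supply meaningful ones.
    attrs = {name: "example " + name for name in REQUIRED_ATTRS}
    article = ET.Element("article", attrs)
    # article (section+): at least one section is required
    section = ET.SubElement(article, "section", {"title": "Only section"})
    # section (text): exactly one text child
    text = ET.SubElement(section, "text")
    # <p> and <b> are among the freely nesting common subtags
    p = ET.SubElement(text, "p")
    p.text = "Hello, "
    b = ET.SubElement(p, "b")
    b.text = "world"
    b.tail = "."
    body = ET.tostring(article, encoding="unicode")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<!DOCTYPE article SYSTEM "%s">\n%s\n' % (dtd_path, body))

print(minimal_article())
```

Saving the output next to article.dtd and running it through the validity checker is a simple smoke test for any later change to the DTD.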
XML validity checking
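/* Validate the given XML file against the DTD named in its DOCTYPE.
   Returns 0 if the file is valid, < 0 if not. */
function xml_validate($xml_file) {
    /* escapeshellarg() guards against spaces and shell metacharacters
       in the path; --noout stops xmllint from echoing the document. */
    $cmd = "xmllint --valid --noout " . escapeshellarg($xml_file) . " 2>&1";
    $pipe = popen($cmd, "r");
    if ($pipe === false) {
        echo "*** could not run xmllint ***<hr>";
        return -1;
    }
    $buf = "";
    while (!feof($pipe)) {
        $buf .= fread($pipe, 1024);
    }
    /* pclose() returns xmllint's exit status; non-zero means failure */
    $rval = pclose($pipe);
    if ($rval != 0) {
        echo "*** XML validity error(s) ***<hr>";
        echo "<pre>";
        /* xmllint quotes the offending markup, so escape it for HTML */
        echo htmlspecialchars($buf);
        echo "</pre>";
        return -1;
    }
    return 0;
}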
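For comparison, the same check can be run without going through a shell at all, which sidesteps quoting problems in the file name. The sketch below is a Python rendering of the idea, not the site's actual code. It assumes `xmllint` is on the PATH, and the helper name `xmllint_command` is made up for this example.

```python
import subprocess

def xmllint_command(xml_file):
    """Build the argument list; no shell is involved, so no quoting is needed."""
    # --valid checks against the DTD in the DOCTYPE, --noout suppresses
    # the echo of the parsed document.
    return ["xmllint", "--valid", "--noout", xml_file]

def xml_validate(xml_file):
    """Return (ok, report): ok is True when xmllint exits with status 0."""
    try:
        proc = subprocess.run(
            xmllint_command(xml_file),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            text=True,
        )
    except FileNotFoundError:
        # Assumption: treat a missing xmllint binary as a failed validation.
        return False, "xmllint not found on PATH"
    return proc.returncode == 0, proc.stdout

ok, report = xml_validate("article.xml")
print("valid" if ok else "invalid:\n" + report)
```

Passing the arguments as a list means a file name containing spaces or quotes reaches xmllint unchanged.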
Calling the validity checker
/* do CDATA escaping on <code> tags to make a valid XML file */
add_cdata_to_code($xml_file, $xml_p_file);
/* validate XML */
if (xml_validate($xml_p_file) < 0) {
    return;
}
/* perform XSLT processing */
$xslt = xslt_create();
...
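The body of add_cdata_to_code is not shown here. As a rough idea of what such a preprocessing step might do, the Python sketch below wraps the contents of every code element in a CDATA section, so that characters such as < and & inside listings no longer break the XML. It works on strings rather than on an input and an output file, and the name `add_cdata_to_code_text` and the regular expression are assumptions, not the original implementation.

```python
import re

# Matches <code ...>body</code>; assumes no ">" inside attribute values
# and no nested or self-closing <code> elements.
CODE_RE = re.compile(r"(<code\b[^>]*>)(.*?)(</code>)", re.DOTALL)

def _wrap(match):
    open_tag, body, close_tag = match.groups()
    # Leave bodies that are already wrapped untouched.
    if body.lstrip().startswith("<![CDATA["):
        return match.group(0)
    # "]]>" cannot appear inside CDATA, so split it across two sections.
    safe = body.replace("]]>", "]]]]><![CDATA[>")
    return "%s<![CDATA[%s]]>%s" % (open_tag, safe, close_tag)

def add_cdata_to_code_text(xml_text):
    """Return xml_text with every <code> body wrapped in CDATA."""
    return CODE_RE.sub(_wrap, xml_text)

raw = '<text><code title="t">if (a < b && c) { x(); }</code></text>'
print(add_cdata_to_code_text(raw))
```

A regular expression is enough here precisely because the input is not yet well-formed XML, so a real XML parser could not be used for this step.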
Simplified article header
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE article SYSTEM "/usr/www/users/boodebr/styles/article.dtd">