Taylor University
Computing and System Sciences
Bill Toll - Spring 2003
COS 382 - Language Structures
PHP XML Tutorial

The purpose of this tutorial is to provide basic information and examples for processing XML documents using PHP.

PHP includes an XML parser that will verify that the document is well formed, but will not validate a document.

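An XML document is well formed (syntactically correct) if it has exactly one root element, every start tag has a matching end tag, elements are properly nested, and attribute values are quoted.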

An XML document is valid if the structure and order of the tags meet a definition appropriate for the application. Such definitions can be expressed using either a DTD (document type definition) or a Schema. In these examples, we will use the PHP XML parser and will assume that the document is valid.
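
The difference between well formed and valid can be seen with a short check. The following is a minimal sketch, assuming PHP 7 or later; the helper name isWellFormed is hypothetical and is not one of the course files. It reports only whether the parser accepts the input, so an entry whose tags are not in the order the schedule format expects still passes.

```php
<?php
// Hypothetical helper: returns true when $xml is well formed.
// It checks syntax only; it does not validate against a DTD or Schema.
function isWellFormed(string $xml): bool {
    $parser = xml_parser_create();
    // The third argument (true) marks this string as the final chunk of input.
    return xml_parse($parser, $xml, true) === 1;
}

// Properly nested: accepted.
var_dump(isWellFormed("<entry><month>Feb</month></entry>"));   // bool(true)
// Overlapping tags: rejected.
var_dump(isWellFormed("<entry><month>Feb</entry></month>"));   // bool(false)
// Wrong order for a schedule entry, but still well formed: accepted.
var_dump(isWellFormed("<entry><date>3</date><month>Feb</month></entry>"));   // bool(true)
```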

The PHP XML parser is based on the expat parser. This parser follows the SAX (Simple API for XML) model. It does not build a tree data structure as a DOM parser does, but creates an abstract tree through the sequence of function calls. The functions that are called by the parser respond to start tags, end tags, or data between a start tag and an end tag. Each of these three kinds of event generates a call to a "registered" function in a call-back, event-driven processing paradigm.
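
The call-back sequence described above can be made visible by recording each event as it arrives. This is a minimal sketch, assuming PHP 7 or later; the $events array and the use of anonymous functions as handlers are choices made for this illustration only.

```php
<?php
$events = [];   // hypothetical log of parser events, in order of arrival

$parser = xml_parser_create();
// Keep element names as written instead of folding them to upper case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

// Register one handler for start tags and one for end tags.
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$events) { $events[] = "start $name"; },
    function ($p, $name) use (&$events) { $events[] = "end $name"; }
);
// Register the handler for text between tags; whitespace-only text is ignored.
xml_set_character_data_handler(
    $parser,
    function ($p, $text) use (&$events) {
        if (trim($text) !== '') { $events[] = "data " . trim($text); }
    }
);

xml_parse($parser, "<entry><date>3</date><day>M</day></entry>", true);
print implode("\n", $events) . "\n";
```

The printed list starts with "start entry" and ends with "end entry", with the date and day events nested between them. That ordering is the abstract tree the parser exposes.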

The following example and data files are available:
schedule.php - PHP code to generate course schedule
schedule.xml - XML data file
showxml.php - from PHP manual - prints structure of XML file - uses schedule.xml
xml2html.php - from PHP manual - simple tag converter
html.xml - simple file for xml2html
execute xml2html.php

XML Parser Syntax

Syntax and structure for PHP code to parse XML is available on the PHP manual pages. Example code with comments is presented below.

Course Schedule Example

The course schedule used for this course is an XML document with the following format:

  <schedule>
    <head>
      COS 382 - Language Structures 
    </head>
    <entry>                         // may be repeated
      <month>
        Feb
      </month>                  
      <date>
        3
      </date>
      <day>
        M
      </day>
      <topic>                       // optional
        Overview and History
        <note>                      // optional - bold text
          Important
        </note>
      </topic> 
      <reference>                   // optional
        Chapter 1
      </reference>                       
      <assign>                      // optional
        First Assignment
        <due>                       // optional - special format
          Assignment 0
        </due>
      </assign>
    </entry>
  </schedule>

The PHP code used to process a file with the above format and build a table containing the course schedule is:


<?php

// define handler functions

function startElement($parser,$name,$attrs) {
    global $open_tags,$current_tag,$last_tag;
    $last_tag=$current_tag;
    $current_tag=$name;
    if(isset($open_tags[$name])) {   // known tag; isset avoids a warning for unknown names
      switch($name) {
        case 'head' :
	  echo "<html><head><title>Course Schedule</title>\n";
	  echo "<link rel=\"stylesheet\" href=\"../style.css\" \n";
	  echo "type=\"text/css\"></head><body>\n";
	  break;
	case 'entry' :
	  echo "<tr>"; 
	  break;
	case 'month' :
	  echo "<td width=\"5%\">";
	  break;
	case 'date' :
	  if($last_tag!='month')
	    echo "<td width=\"5%\"> </td>";
	  echo "<td class=\"r5\">";
	  break;
	case 'day' :
	  echo "<td class=\"c5\">";
	  break;
	case 'topic' :
	  echo "<td class=\"l40\">";
	  break;
	case 'reference' :
	  if($last_tag=='day')
	    echo "<td class=\"l40\"> </td>";
	  echo "<td width=\"25%\">";
	  break;
	case 'assign' :
	  if($last_tag=='day')
	    echo "<td class=\"l40\"> </td><td width=\"25%\"> </td>";
	  if($last_tag=='topic')
	    echo "<td width=\"25%\"> </td>";
	  echo "<td width=\"20%\">";
	  break;
	case 'note' :   // only allowed inside <topic></topic> with no topic text
	  echo "<b>";
	  break;
	case 'due' :     // only allowed inside <assign></assign>
	  echo "<b>";
	  break;
	case 'due2' :    // due with assign on same date
	  echo "<br><b>";
	  break;
	default:
	  break;
      }
    }
    else {
      echo "ERROR - invalid tag $name<br>\n";
    }
    
}

function endElement($parser,$name) {
    global $close_tags,$current_tag,$last_tag;
    if(isset($close_tags[$name])) {   // known tag; isset avoids a warning for unknown names
      switch($name) {
        case 'schedule' :
	  echo "</table></body></html>\n";
	  break;
	case 'head' :
	  echo "<br>Tentative Schedule<br>Spring 2003</p>";
	  echo "<p class=\"clabel\">**** indicates modification</p>\n";
	  echo "<table border=\"1\" summary=\"Schedule Table\">\n";
	  break;
        case 'entry' :
	  switch($current_tag) {
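	    // pad the row with empty cells; each case deliberately falls through
	    case 'day' :         // no topic, reference, or assign: three empty cells
	      echo "<td> </td>";
	    case 'topic' :       // no reference or assign: two empty cells
	      echo "<td> </td>";
	    case 'reference' :   // no assign: one empty cell
	      echo "<td> </td>";
	      break;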
	    default:
	      break;
	  }
	  echo "</tr>\n";
	  break;
	case 'month' :
	  echo " </td>\n";
	  break;
	case 'date' :
	  echo "</td>\n";
	  break;
	case 'day' :
	  echo "</td>";
	  break;
	case 'topic' :
	  echo " </td>";
	  break;
	case 'reference' :
	  echo " </td>";
	  break;
	case 'assign' :
	  echo " </td>";
	  break;
  	case 'note' :
	  echo "</b>";
  	  break;
  	case 'due' :     // the enclosing </assign> closes the cell
	  echo " DUE</b>";
  	  break;
	case 'due2' :
	  echo " DUE</b></td>";
	  break;
	default:
	  break;
      }
    }
    else {
      echo "ERROR - invalid tag $name<br>\n";
    }
}

function characterData($parser,$chardata) {
    global $current_tag,$headflag;
    if($current_tag=='head' && $headflag==0) {
      echo "<p class=\"bclabel\">";
      $headflag=1;
    }
    // skip chunks that contain only whitespace
    if(trim($chardata)!=='') {
      echo $chardata;
    }
}



$open_tags=array (
    "schedule" => "<schedule>",
    "entry" => "<entry>",
    "head" => "<head>",
    "month" => "<month>",
    "date" => "<date>",
    "day" => "<day>",
    "topic" => "<topic>",
    "reference" => "<reference>",
    "assign" => "<assign>",
    "due" => "<due>",
    "due2" => "<due2>",
    "note" => "<note>"
    );

$close_tags=array (
    "schedule" => "</schedule>",
    "entry" => "</entry>",
    "head" => "</head>",
    "month" => "</month>",
    "date" => "</date>",
    "day" => "</day>",
    "topic" => "</topic>",
    "reference" => "</reference>",
    "assign" => "</assign>",
    "due" => "</due>",
    "due2" => "</due2>",
    "note" => "</note>"
    );

$current_tag="";

$xml_parser=xml_parser_create();

xml_set_element_handler($xml_parser,'startElement','endElement');
xml_set_character_data_handler($xml_parser,'characterData');
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING,false);
xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE,true);


$fp=fopen("schedule.xml","r");
if(!$fp) {
  die("ERROR - could not open schedule.xml");
}

$headflag=0;

while($data=fread($fp,80)) {

  if(!(xml_parse($xml_parser,$data,feof($fp)))) {
    echo "ERROR - parser failure: ";
    echo xml_error_string(xml_get_error_code($xml_parser));
    echo " at line number ".xml_get_current_line_number($xml_parser)."<br>\n";
    exit();
  }
}

xml_parser_free($xml_parser);


exit();
?>
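
The parser may deliver the text of one element in several calls to the character data handler, for example when a read boundary falls in the middle of the text. One way to handle this is to collect the pieces and use them when the end tag arrives. The following is a minimal sketch, assuming PHP 7 or later and assuming each element holds either text or child elements but not both; the names $buffer and $texts are hypothetical.

```php
<?php
$buffer = '';   // text collected so far for the element being read
$texts  = [];   // hypothetical result: [element name, trimmed text] pairs

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    // A start tag begins a new element, so discard any pending text.
    function ($p, $name, $attrs) use (&$buffer) { $buffer = ''; },
    // An end tag completes the element, so record the collected text.
    function ($p, $name) use (&$buffer, &$texts) {
        $text = trim($buffer);
        if ($text !== '') { $texts[] = [$name, $text]; }
        $buffer = '';
    }
);
// Append each piece of text; the handler may run more than once per element.
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$buffer) { $buffer .= $data; }
);

// Feed the document in small pieces, much as a loop over fread() does.
$xml = "<entry><topic>Overview and History</topic>"
     . "<reference>Chapter 1</reference></entry>";
$chunks = str_split($xml, 10);
foreach ($chunks as $i => $chunk) {
    xml_parse($parser, $chunk, $i === count($chunks) - 1);
}

foreach ($texts as [$name, $text]) {
    print "$name: $text\n";
}
```

Because the text is assembled before it is used, the result does not depend on where the input happens to be split.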

Partial output of schedule.php:

Syntax Errors

If the input file has a syntax error, such as:

<schedule>
<head>
COS 382 - Language Structures
</head>
<entry>
<month>
Feb
<date>
</month>
4
</date>

The output of the parser will indicate an error:

ERROR - parser failure: mismatched tag at line number 9

Note that this parser does not generate an error for the following XML since the error is not a syntax error in XML, but is an error in the language defined for a schedule.

<schedule>
<head>
COS 382 - Language Structures
</head>
<entry>
<month>
Feb
<date>
4
</date>
</month>
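
Errors of this second kind can be caught in the handlers by tracking which elements are open. The following is a minimal sketch, assuming PHP 7 or later. The $allowed table is an assumption read off the schedule format shown earlier (it omits due2), and the function name checkStructure is hypothetical. The sample input is a completed, single-line version of the fragment above.

```php
<?php
// Assumed content model: each element name maps to the children it may contain.
// Elements not listed here (head, month, date, ...) may contain text only.
$allowed = [
    'schedule' => ['head', 'entry'],
    'entry'    => ['month', 'date', 'day', 'topic', 'reference', 'assign'],
    'topic'    => ['note'],
    'assign'   => ['due'],
];

// Hypothetical helper: returns a list of structure errors (empty when none).
function checkStructure(string $xml, array $allowed): array {
    $stack  = [];   // names of the elements currently open, outermost first
    $errors = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack, &$errors, $allowed) {
            $parent = end($stack);   // false when no element is open yet
            if ($parent === false) {
                if ($name !== 'schedule') {
                    $errors[] = "root must be <schedule>, found <$name>";
                }
            } elseif (!in_array($name, $allowed[$parent] ?? [], true)) {
                $errors[] = "<$name> not allowed inside <$parent>";
            }
            $stack[] = $name;
        },
        function ($p, $name) use (&$stack) { array_pop($stack); }
    );
    // Syntax errors are still reported by the parser itself.
    if (xml_parse($parser, $xml, true) !== 1) {
        $errors[] = xml_error_string(xml_get_error_code($parser));
    }
    return $errors;
}

$bad = "<schedule><head>COS 382</head><entry><month>Feb<date>4</date></month></entry></schedule>";
print_r(checkStructure($bad, $allowed));
```

For this input the returned list holds one message, "<date> not allowed inside <month>". A DTD or Schema with a validating parser would be the more complete way to express the same rules.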

Additional Examples - from the PHP HTML documentation

Print XML Structure

The following PHP program will print the tag structure of an XML file by printing start tags indented according to nesting structure.


<html>
<body>
<pre>

<?php

$file = "schedule.xml";
$depth = 0;   // current nesting level

function startElement($parser, $name, $attrs) {
    global $depth;
    // indent two spaces per nesting level
    print str_repeat("  ", $depth);
    print "$name\n";
    $depth++;
}

function endElement($parser, $name) {
    global $depth;
    $depth--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
       
?>

</pre>
</body>
</html>

Partial output of the above:


SCHEDULE
  HEAD
  ENTRY
    MONTH
    DATE
    DAY
    TOPIC
    REFERENCE
  ENTRY
    DATE
    DAY
  ENTRY
    DATE
    DAY
    TOPIC
    ASSIGN
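
The tag names appear in upper case because the parser folds element names to upper case unless the XML_OPTION_CASE_FOLDING option is turned off. The following is a minimal sketch of the difference, assuming PHP 7 or later; the helper name elementNames is hypothetical.

```php
<?php
// Hypothetical helper: returns element names as the start handler receives them.
function elementNames(string $xml, bool $fold): array {
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $fold);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$names) { $names[] = $name; },
        function ($p, $name) { /* end tags are not needed here */ }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = "<schedule><head>COS 382</head></schedule>";
print implode(" ", elementNames($xml, true)) . "\n";    // SCHEDULE HEAD
print implode(" ", elementNames($xml, false)) . "\n";   // schedule head
```

A handler that looks names up in an array, as the schedule and xml2html programs do, has to use keys in the same case that the parser delivers.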

XML to HTML

The following PHP program translates a simple set of tags.


<html>

<?php

$file = "html.xml";
$map_array = array(
    "BOLD"     => "b",
    "EMPHASIS" => "i",
    "CENT" => "center",
    "SECTION" => "p",
    "DOCUMENT" => "body",
    "LITERAL"  => "tt"
);

function startElement($parser, $name, $attrs) {
    global $map_array;
    //print "START TAG: $name <br>";
    if (isset($map_array[$name])) {   // tags without a mapping are skipped
        print "<{$map_array[$name]}>";
    }
}

function endElement($parser, $name) {
    global $map_array;
    //print "END TAG: $name <br>";
    if (isset($map_array[$name])) {   // tags without a mapping are skipped
        print "</{$map_array[$name]}>";
    }
}

function characterData($parser, $data) {
    print $data;
}

$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the tag in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>    
</html>
The program converts the following input:
<DOCUMENT>
<CENT>Sample Page</CENT>
This is a sample XML document using <BOLD>Strange</BOLD> tags.
<SECTION>
This is a new section
</SECTION>
</DOCUMENT>
to the following HTML:
<html>
<body>
<center>Sample Page</center>
This is a sample XML document using <b>Strange</b> tags.
<p>
This is a new section
</p>
</body> </html>