This tutorial provides basic information and examples for processing XML documents with PHP.
PHP includes an XML parser that verifies that a document is well formed but does not validate it.
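An XML document is well formed (syntactically correct) if it has exactly one root element, every start tag has a matching end tag, elements are properly nested, and attribute values are quoted.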
An XML document is valid if the structure and order of the tags meet a definition appropriate for the application. Such definitions can be expressed using either a DTD (document type definition) or a Schema. In these examples, we will use the PHP XML parser and will assume that the document is valid.
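The well-formedness check described above can be sketched in a few lines. The helper name `wellFormedError` is hypothetical and not part of the tutorial's files; it only wraps the standard `xml_parse` call and reports the first error, and it performs no validation against a DTD or Schema.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns null when
// $xml is well formed, otherwise a short description of the first error.
function wellFormedError(string $xml): ?string
{
    $parser = xml_parser_create();
    // The third argument (true) tells the parser this is the final chunk.
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    return sprintf(
        "%s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// Properly nested tags: no error is reported.
var_dump(wellFormedError("<entry><month>Feb</month></entry>"));
// Crossed end tags: the parser reports an error message.
var_dump(wellFormedError("<entry><month>Feb</entry></month>"));
```

A document that passes this check may still be invalid for a particular application, which is why the examples here simply assume validity.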
The PHP XML parser is based on the expat parser. This parser follows the SAX (Simple API for XML) model. It does not build a tree data structure as a DOM parser does; instead, the tree is implied by the sequence of function calls. The parser calls functions in response to start tags, end tags, and character data between a start tag and an end tag. Each of these three kinds of event triggers a call to a "registered" handler function, following a call-back, event-driven processing paradigm.
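To make the call-back sequence concrete, the following minimal sketch records each event in the order the parser reports it. It uses closures rather than named handler functions, and the `$events` array and the sample input string are assumptions chosen for illustration.

```php
<?php
// Record every parser event in order so the call-back sequence is visible.
$events = [];

$parser = xml_parser_create();
// Keep tag names in their original case instead of folding to upper case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // start-tag handler
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start $name";
    },
    // end-tag handler
    function ($parser, $name) use (&$events) {
        $events[] = "end $name";
    }
);
xml_set_character_data_handler(
    $parser,
    // character-data handler; whitespace-only chunks are ignored
    function ($parser, $text) use (&$events) {
        if (trim($text) !== '') {
            $events[] = "data $text";
        }
    }
);

xml_parse($parser, "<entry><month>Feb</month><date>3</date></entry>", true);

echo implode("\n", $events), "\n";
```

The nesting of the document is never stored anywhere; it can only be recovered from the order of the start and end events.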
The following example and data files are available:
schedule.php - PHP code to generate course schedule
schedule.xml - XML data file
showxml.php - from PHP manual - prints structure of XML file - uses schedule.xml
xml2html.php - from PHP manual - simple tag converter
html.xml - simple file for xml2html
execute xml2html.php
XML Parser Syntax
Syntax and structure for PHP code to parse XML is available on the PHP manual pages. Example code with comments is presented below.
Course Schedule Example
The course schedule used for this course is an XML document with the following format:
<schedule>
  <head>
    COS 382 - Language Structures
  </head>
  <entry> // may be repeated
    <month>
      Feb
    </month>
    <date>
      3
    </date>
    <day>
      M
    </day>
    <topic> // optional
      Overview and History
      <note> // optional - bold text
        Important
      </note>
    </topic>
    <reference> // optional
      Chapter 1
    </reference>
    <assign> // optional
      First Assignment
      <due> // optional - special format
        Assignment 0
      </due>
    </assign>
  </entry>
</schedule>
The PHP code used to process a file with the above format and build a table containing the course schedule is:
<?php
// define handler functions

// start-tag handler: emits the opening HTML for each schedule element
function startElement($parser,$name,$attrs='') {
    global $open_tags,$current_tag,$last_tag;
    $last_tag=$current_tag;
    $current_tag=$name;
    // isset() avoids an undefined-index warning for unknown tags
    if(isset($open_tags[$name])) {
        switch($name) {
            case 'head' :
                echo "<html><head><title>Course Schedule</title>\n";
                echo "<link rel=\"stylesheet\" href=\"../style.css\" \n";
                echo "type=\"text/css\"></head><body>\n";
                break;
            case 'entry' :
                echo "<tr>";
                break;
            case 'month' :
                echo "<td width=\"5%\">";
                break;
            case 'date' :
                // pad the month column when the entry has no <month>
                if($last_tag!='month')
                    echo "<td width=\"5%\"> </td>";
                echo "<td class=\"r5\">";
                break;
            case 'day' :
                echo "<td class=\"c5\">";
                break;
            case 'topic' :
                echo "<td class=\"l40\">";
                break;
            case 'reference' :
                // pad the topic column when <topic> was omitted
                if($last_tag=='day')
                    echo "<td class=\"l40\"> </td>";
                echo "<td width=\"25%\">";
                break;
            case 'assign' :
                // pad the topic and/or reference columns when they were omitted
                if($last_tag=='day')
                    echo "<td class=\"l40\"> </td><td width=\"25%\"> </td>";
                if($last_tag=='topic')
                    echo "<td width=\"25%\"> </td>";
                echo "<td width=\"20%\">";
                break;
            case 'note' : // only allowed inside <topic></topic> with no topic text
                echo "<b>";
                break;
            case 'due' : // only allowed inside <assign></assign>
                echo "<b>";
                break;
            case 'due2' : // due with assign on same date
                echo "<br><b>";
                break;
            default:
                break;
        }
    }
    else {
        echo "ERROR - invalid tag $name<br>\n";
    }
}
// end-tag handler: emits the closing HTML for each schedule element
function endElement($parser,$name) {
    global $close_tags,$current_tag,$last_tag;
    // isset() avoids an undefined-index warning for unknown tags
    if(isset($close_tags[$name])) {
        switch($name) {
            case 'schedule' :
                echo "</table></body></html>\n";
                break;
            case 'head' :
                echo "<br>Tentative Schedule<br>Spring 2003</p>";
                echo "<p class=\"clabel\">**** indicates modification</p>\n";
                echo "<table border=\"1\" summary=\"Schedule Table\">\n";
                break;
            case 'entry' :
                // pad the row with empty cells for the omitted trailing columns;
                // the cases fall through on purpose (day: 3 cells, topic: 2, reference: 1)
                switch($current_tag) {
                    case 'day' :
                        echo "<td> </td>";
                    case 'topic' :
                        echo "<td> </td>";
                    case 'reference' :
                        echo "<td> </td>";
                        break;
                    default:
                        break;
                }
                echo "</tr>\n";
                break;
            case 'month' :
                echo " </td>\n";
                break;
            case 'date' :
                echo "</td>\n";
                break;
            case 'day' :
                echo "</td>";
                break;
            case 'topic' :
                echo " </td>";
                break;
            case 'reference' :
                echo " </td>";
                break;
            case 'assign' :
                echo " </td>";
                break;
            case 'note' :
                echo "</b>";
                break;
            case 'due' :
                echo " DUE</b></td>";
                break;
            case 'due2' :
                echo " DUE</b></td>";
                break;
            default:
                break;
        }
    }
    else {
        echo "ERROR - invalid tag $name<br>\n";
    }
}
// character-data handler: echoes the text found between tags
function characterData($parser,$chardata) {
    global $current_tag,$headflag;
    if($current_tag=='head' && $headflag==0) {
        echo "<p class=\"bclabel\">";
        $headflag=1;
    }
    // skip chunks that consist only of whitespace
    if(trim($chardata)!=='') {
        echo $chardata;
    }
}
// tags accepted by startElement
$open_tags=array (
    "schedule" => "<schedule>",
    "entry" => "<entry>",
    "head" => "<head>",
    "month" => "<month>",
    "date" => "<date>",
    "day" => "<day>",
    "topic" => "<topic>",
    "reference" => "<reference>",
    "assign" => "<assign>",
    "due" => "<due>",
    "due2" => "<due2>",
    "note" => "<note>"
);
// tags accepted by endElement
$close_tags=array (
    "schedule" => "</schedule>",
    "entry" => "</entry>",
    "head" => "</head>",
    "month" => "</month>",
    "date" => "</date>",
    "day" => "</day>",
    "topic" => "</topic>",
    "reference" => "</reference>",
    "assign" => "</assign>",
    "due" => "</due>",
    "due2" => "</due2>",
    "note" => "</note>"
);
$current_tag="";
$headflag=0;
// create the parser and register the handlers
$xml_parser=xml_parser_create();
xml_set_element_handler($xml_parser,'startElement','endElement');
xml_set_character_data_handler($xml_parser,'characterData');
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING,false);
xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE,true);
$fp=fopen("schedule.xml","r");
if(!$fp) {
    echo "ERROR - could not open schedule.xml<br>\n";
    exit();
}
// feed the file to the parser in 80-byte chunks
while($data=fread($fp,80)) {
    if(!(xml_parse($xml_parser,$data,feof($fp)))) {
        echo "ERROR - parser failure: ";
        echo xml_error_string(xml_get_error_code($xml_parser));
        echo " at line number ".xml_get_current_line_number($xml_parser)."<br>\n";
        exit();
    }
}
fclose($fp);
xml_parser_free($xml_parser);
exit();
?>
Partial output of the above:
Syntax Errors
If the input file has a syntax error, such as:
<schedule> <head> COS 382 - Language Structures </head> <entry> <month> Feb <date> </month> 4 </date>
The output of the parser will indicate an error:
Note that the parser does not report an error for the following XML. The document is well formed, so this is not an XML syntax error; nesting <date> inside <month> is an error only in the language defined for a schedule.
<schedule> <head> COS 382 - Language Structures </head> <entry> <month> Feb <date> 4 </date> </month>
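Errors of this second kind have to be caught by the application itself. One possible approach, sketched below, keeps a stack of open tags in the start and end handlers and compares each new tag's parent with a table of allowed parents. The `$allowedParents` table and the function name `findNestingErrors` are assumptions for illustration: the table is inferred from the format listing earlier in this tutorial, omits `due2`, and does not appear in schedule.php.

```php
<?php
// Assumed parent of each tag, read off the schedule format listing
// (hypothetical table; null means the tag must be the document root).
$allowedParents = [
    'schedule'  => null,
    'head'      => 'schedule',
    'entry'     => 'schedule',
    'month'     => 'entry',
    'date'      => 'entry',
    'day'       => 'entry',
    'topic'     => 'entry',
    'reference' => 'entry',
    'assign'    => 'entry',
    'note'      => 'topic',
    'due'       => 'assign',
];

// Returns a list of nesting problems found in $xml (empty when none are found).
function findNestingErrors(string $xml, array $allowedParents): array
{
    $stack = [];   // names of the currently open elements
    $errors = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack, &$errors, $allowedParents) {
            $parent = $stack ? end($stack) : null;
            if (!array_key_exists($name, $allowedParents)) {
                $errors[] = "unknown tag <$name>";
            } elseif ($allowedParents[$name] !== $parent) {
                $where = $parent === null ? "at the document root" : "inside <$parent>";
                $errors[] = "<$name> not allowed $where";
            }
            $stack[] = $name;
        },
        function ($p, $name) use (&$stack) {
            array_pop($stack);
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        $errors[] = "not well formed: " . xml_error_string(xml_get_error_code($parser));
    }
    return $errors;
}

$bad = "<schedule><head>COS 382</head><entry><month>Feb<date>4</date></month></entry></schedule>";
print_r(findNestingErrors($bad, $allowedParents));
```

This only checks parent and child relationships; the order of tags and which tags are optional would need further rules, or a DTD or Schema with a validating parser.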
Additional Examples - from the PHP HTML documentation
Print XML Structure
The following PHP program will print the tag structure of an XML file by printing start tags indented according to nesting structure.
<html>
<body>
<pre>
<?php
$file = "schedule.xml";
$depth = 0; // current nesting level

function startElement($parser, $name, $attrs) {
    global $depth;
    // indent one space per nesting level
    print str_repeat(" ", $depth);
    print "$name\n";
    $depth++;
}
function endElement($parser, $name) {
    global $depth;
    $depth--;
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
?>
</pre>
</body>
</html>
Partial output of above:
SCHEDULE
 HEAD
 ENTRY
  MONTH
  DATE
  DAY
  TOPIC
  REFERENCE
 ENTRY
  DATE
  DAY
 ENTRY
  DATE
  DAY
  TOPIC
  ASSIGN
XML to HTML
The following PHP program translates a simple set of tags.
<html>
<?php
$file = "html.xml";
// maps each XML tag (upper case after case folding) to an HTML tag
$map_array = array(
    "BOLD" => "b",
    "EMPHASIS" => "i",
    "CENT" => "center",
    "SECTION" => "p",
    "DOCUMENT" => "body",
    "LITERAL" => "tt"
);
function startElement($parser, $name, $attrs) {
    global $map_array;
    //print "START TAG: $name <br>";
    // tags that are not in the map are silently dropped
    if (isset($map_array[$name])) {
        $htmltag = $map_array[$name];
        print "<$htmltag>";
    }
}
function endElement($parser, $name) {
    global $map_array;
    //print "END TAG: $name <br>";
    if (isset($map_array[$name])) {
        $htmltag = $map_array[$name];
        print "</$htmltag>";
    }
}
function characterData($parser, $data) {
    print $data;
}
$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the tag in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
?>
</html>
which converts
<DOCUMENT> <CENT>Sample Page</CENT> This is a sample XML document using <BOLD>Strange</BOLD> tags. <SECTION> This is a new section </SECTION> </DOCUMENT>
to
<html> <body> <center>Sample Page</center> This is a sample XML document using <b>Strange</b> tags. <p> This is a new section </p> </body> </html>