Course Name: Technology in Business, Technical, and Professional Communication

Introduction to XML

In addition to our regular readings and class discussion during the first part of the semester, we will also be reading some short, introductory articles about XML technology as listed on the course calendar.

As you will soon discover, XML is an expansive suite of related technologies that are most often referred to by a series of acronyms. While it is not necessary (and perhaps impossible) to have a working knowledge of all these technologies, it is fairly important to have a conceptual understanding of the core XML concepts.

XML is simply a system of tags that are used to semantically define data for a particular use. An example of such tags within an XML file would look something like this:

<address type="home">
   <name>
      <first>Lee</first>
      <last>Honeycutt</last>
   </name>
   <street>3938 Christytown Road</street>
   <city>Story City</city>
   <state>Iowa</state>
   <zip length="9">50248-1234</zip>
</address>
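To see why semantic tags matter, here is a minimal Python sketch that reads the address above with the standard library's xml.etree.ElementTree module. Python is used only for illustration; our course project does not require it. The function name read_address and the dictionary keys are hypothetical. The point is that a program can find each piece of data by its tag name rather than by its position on a page.

```python
import xml.etree.ElementTree as ET

# The address record from the example above.
ADDRESS_XML = """
<address type="home">
   <name>
      <first>Lee</first>
      <last>Honeycutt</last>
   </name>
   <street>3938 Christytown Road</street>
   <city>Story City</city>
   <state>Iowa</state>
   <zip length="9">50248-1234</zip>
</address>
"""

def read_address(xml_text):
    """Return the address fields as a dict, looked up by tag name (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return {
        "type": root.get("type"),                 # attribute on <address>
        "first": root.findtext("name/first"),     # child of <name>
        "last": root.findtext("name/last"),
        "street": root.findtext("street"),
        "city": root.findtext("city"),
        "state": root.findtext("state"),
        "zip": root.findtext("zip"),
        "zip_length": root.find("zip").get("length"),  # attribute on <zip>
    }

if __name__ == "__main__":
    record = read_address(ADDRESS_XML)
    print(record["first"], record["last"])
    print(record["city"], record["state"], record["zip"])
```

Because the tags describe what the data is, the same lookup keeps working even if the elements are indented differently or the file is reformatted.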

Unlike HTML, which has a prescribed set of tags that authors must use in order for the browser to render the text, XML lets authors make up whatever tags they need to define the data and content within their document. Of course, these tags must be defined structurally before various applications can use them, which is where schemas come in.

Schemas

For computer programs to check and make sense of tagged data, an XML file is typically paired with a separate descriptive model, or schema, of all the tags allowed in the document. Generally, this schema takes one of three forms:

  1. Document Type Definition (*.dtd) - though the most widely supported schema language, DTDs have some technical limitations, such as lack of specificity (a DTD cannot, for example, require that an element contain a date or a number) and not being written in XML.
  2. XML Schema (*.xsd) - XML Schemas are much more specific than DTDs and are written in XML, but they have some slight technical limitations of their own.
  3. RELAX NG (*.rng) - though used less than DTDs and XML Schemas, RELAX NG has a great deal of promise because of its versatility and simplicity. It is being adopted by many as the new standard.
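Python's standard library cannot validate against a DTD, XML Schema, or RELAX NG file, so the sketch below substitutes a hypothetical, hand-written content model (a plain dictionary) to show the core job all three schema languages share: declaring which elements may appear and in what order. The names ADDRESS_MODEL and validate are illustrative assumptions, and unlike a real schema this toy ignores attributes and data types.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model for the address example. Like a schema, it
# declares which child elements each element must contain, in order.
# A DTD would express the "name" rule as: <!ELEMENT name (first, last)>
# Elements mapped to an empty tuple may contain only text.
ADDRESS_MODEL = {
    "address": ("name", "street", "city", "state", "zip"),
    "name": ("first", "last"),
    "first": (), "last": (), "street": (),
    "city": (), "state": (), "zip": (),
}

def validate(element, model):
    """Return a list of error messages; an empty list means the tree fits the model."""
    if element.tag not in model:
        return [f"undeclared element <{element.tag}>"]
    errors = []
    expected = list(model[element.tag])
    actual = [child.tag for child in element]
    if actual != expected:
        errors.append(f"<{element.tag}> contains {actual}, expected {expected}")
    for child in element:  # check every descendant the same way
        errors.extend(validate(child, model))
    return errors

if __name__ == "__main__":
    good = ET.fromstring("<name><first>Lee</first><last>Honeycutt</last></name>")
    bad = ET.fromstring("<name><last>Honeycutt</last><nickname>Lee</nickname></name>")
    print(validate(good, ADDRESS_MODEL))
    for message in validate(bad, ADDRESS_MODEL):
        print(message)
```

The second document is still well-formed XML, which is why a parser accepts it; only the comparison against the model reveals that <nickname> was never declared and that <first> is missing.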

For the purposes of our class project, we will be dealing only with DTDs, and even then, with one that has already been constructed for us to express tags for the DocBook authoring system. The newest version of DocBook, however, is being written with four different schema options, including XML Schema and RELAX NG.

With a DTD, the schema is linked to the primary XML data file via a one-line document type declaration (DOCTYPE) at the very beginning of the file. For example, the newest language for web pages is XHTML, which is actually a form of XML with a specific DTD maintained by the World Wide Web Consortium. The opening lines of an XHTML 1.0 Transitional page look like this, telling the web browser which DTD governs the code:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

In fact, if you open the DTD's URL (http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd) in your browser, you can download the entire XHTML 1.0 Transitional DTD and open it with a text editor to see what it looks like. You can also view the DTDs of other XML implementations at LiveDTD.
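The DOCTYPE line carries two identifiers: a public identifier ("-//W3C//DTD XHTML 1.0 Transitional//EN") and a system identifier (the DTD's URL). As a small sketch, Python's xml.dom.minidom module can read both from a page. The sample page body and the helper name doctype_info are assumptions for illustration. Note that the parser only reads the declaration: it neither downloads the DTD nor validates the page against it.

```python
from xml.dom import minidom

# A minimal XHTML 1.0 Transitional page that opens with the two lines shown
# above. It is given as bytes so the parser honors the encoding declaration.
XHTML_PAGE = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Introduction to XML</title></head>
  <body><p>XML links to its DTD through the DOCTYPE.</p></body>
</html>"""

def doctype_info(xml_bytes):
    """Return (root element name, public identifier, system identifier) from the DOCTYPE."""
    document = minidom.parseString(xml_bytes)  # reads the DOCTYPE; does not fetch the DTD
    doctype = document.doctype
    return doctype.name, doctype.publicId, doctype.systemId

if __name__ == "__main__":
    name, public_id, system_id = doctype_info(XHTML_PAGE)
    print("root element:", name)
    print("public identifier:", public_id)
    print("DTD location:", system_id)
```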

In terms of business and technical publishing, the promise of XML is that it allows the separation of content and style and supports a modular form of writing in which content can be repurposed for a variety of individual contexts. XML files contain the data and content, while styling is supported in a separate file written in another form of XML known as XSL (eXtensible Stylesheet Language). This method of publishing—often referred to as single-sourcing—promises an authoring future where information is written once to an XML file and then reused for a variety of different purposes using specifically tailored XSL files.
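Single-sourcing can be sketched in a few lines of Python. The two functions below are hypothetical stand-ins for two tailored XSL files (Python's standard library has no XSL processor): each takes the same parsed XML content and gives it a different form, while the source itself never changes. The function names and output layouts are assumptions chosen only to illustrate the idea.

```python
import xml.etree.ElementTree as ET

# One XML source: content only, no styling.
SOURCE = """<address type="home">
  <name><first>Lee</first><last>Honeycutt</last></name>
  <street>3938 Christytown Road</street>
  <city>Story City</city>
  <state>Iowa</state>
  <zip length="9">50248-1234</zip>
</address>"""

def as_mailing_label(root):
    """Hypothetical 'stylesheet' #1: a three-line postal label."""
    return "\n".join([
        f"{root.findtext('name/first')} {root.findtext('name/last')}",
        root.findtext("street"),
        f"{root.findtext('city')}, {root.findtext('state')} {root.findtext('zip')}",
    ])

def as_directory_entry(root):
    """Hypothetical 'stylesheet' #2: a one-line, last-name-first listing."""
    last = root.findtext("name/last").upper()
    return f"{last}, {root.findtext('name/first')} ({root.findtext('city')})"

if __name__ == "__main__":
    root = ET.fromstring(SOURCE)  # the content is written and parsed once...
    for render in (as_mailing_label, as_directory_entry):  # ...and reused per output
        print(render(root))
        print()
```

Adding a third output means writing a third function (or, in real XML publishing, a third XSL file); the XML content does not need to be touched.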

XML Transformations

If XML were nothing but a bag of tags, it wouldn't be of much use in business and technical communication. However, repurposing is supported by a series of XML technologies known as "transformations," which take the styleless information of a basic XML file and give it form, layout, and styling using files written in XSL. These transformations are generally divided into two kinds:

  • XSL Transformations (XSLT) - a type of transformation generally used to convert content from XML into another form of XML. It is most widely used to convert XML to XHTML, and many current web browsers support such transformations on the fly.
  • XSL Formatting Objects (XSL-FO) - a page-based formatting language. An XSLT stylesheet first converts the XML file into formatting objects, which an FO processor then renders as PDF, PostScript, WordML, OpenDocument, and other print formats.
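What XSLT does is build a new XML tree (for example, XHTML) from the source tree. Python's standard library has no XSLT processor, so the sketch below imitates one template by hand with ElementTree. In practice you would write an .xsl stylesheet and run it in a browser or a processor such as xsltproc. The function address_to_xhtml and the chosen XHTML layout are hypothetical; XSL-FO output is not attempted here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # write XHTML elements without a prefix

def xhtml(tag):
    """Qualify a tag name with the XHTML namespace, in ElementTree's {uri}tag form."""
    return f"{{{XHTML_NS}}}{tag}"

def address_to_xhtml(address):
    """Build a new XHTML tree from an <address> element, roughly as an XSLT
    template matching "address" would. The source tree is left unchanged."""
    html = ET.Element(xhtml("html"))
    body = ET.SubElement(html, xhtml("body"))
    heading = ET.SubElement(body, xhtml("h1"))
    heading.text = f"{address.findtext('name/first')} {address.findtext('name/last')}"
    para = ET.SubElement(body, xhtml("p"), {"class": address.get("type")})
    para.text = f"{address.findtext('city')}, {address.findtext('state')}"
    return html

if __name__ == "__main__":
    source = ET.fromstring(
        '<address type="home"><name><first>Lee</first><last>Honeycutt</last></name>'
        "<city>Story City</city><state>Iowa</state></address>"
    )
    print(ET.tostring(address_to_xhtml(source), encoding="unicode"))
```

The output is itself XML (XHTML), which is why XSLT is described as converting one form of XML into another.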

In our course project, we will be experimenting with both types of transformations, though the quality of our transformations may be fairly low as we will be using publicly available transformation files instead of higher quality commercial ones.

Other Uses of XML

The uses of XML are much broader than support of single-source publishing. Few areas of computing today are not touched by XML in some way. Such uses include:

  • RSS web feeds - the XML files behind the "live bookmarks" supported by various browsers today, which make information freely available in a standard format.
  • Web searching - XML is at the heart of many search applications used today on the web, including such future projects as the Semantic Web.
  • Metadata support - portable use of descriptive metadata for digital objects such as iTunes songs.
  • E-business support - makes electronic data interchange more accessible for general information interchange, business-to-business transactions, and business-to-consumer transactions.
  • Database development - many of today's enterprise databases use XML in some way, such as for import and export.
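As a small taste of how programs consume one of these formats, this Python sketch pulls item titles and links from an RSS feed using ElementTree. The feed is made up for illustration (its URLs use the reserved example.com domain). The sketch assumes the common RSS 2.0 layout of <rss>, <channel>, and <item> elements; real feeds may also carry namespaced extensions that it ignores.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical RSS 2.0 feed.
FEED = """<rss version="2.0">
  <channel>
    <title>Course Announcements</title>
    <link>http://www.example.com/course</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>XML readings posted</title>
      <link>http://www.example.com/course/xml-readings</link>
    </item>
    <item>
      <title>DocBook project begins</title>
      <link>http://www.example.com/course/docbook</link>
    </item>
  </channel>
</rss>"""

def feed_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

if __name__ == "__main__":
    for title, link in feed_items(FEED):
        print(f"{title}: {link}")
```

Because every RSS feed uses the same agreed-upon tags, a browser or feed reader can run the same few lines of logic against feeds from any publisher.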

These are only highlights of how XML is being used in today's information economy. Its uses 10 years from now may be quite different from today, as the standards community steers its evolution to meet immediate and future needs. The future of computing lies in the hope of open standards and not in the proprietary technologies of Microsoft and other behemoths. Or you can take the attitude of Susan Glinert Stevens, author of XML Can Go to H***.

 
Lee Honeycutt (honeyl@iastate.edu) - 1/5/07