XML

From BC$ MobileTV Wiki
eXtensible Markup Language

eXtensible Markup Language (commonly abbreviated as XML) is one of the most widely used data exchange formats. Derived from the SGML standard as a simplified subset, it adds tighter syntax control: every document must be well-formed, and a document can additionally be validated against a schema or DTD that defines a strict set of structural rules for it. These features make it easier to later extract and reuse the data itself, the data's associated metadata (if any), or additional semantics carried by the structure or hierarchy of the original application data.


XML was initially envisioned as a complement to the widely adopted and successful HTML, but rather than being concerned with presentation, XML would focus on providing a markup language for data, content and information itself, for easier sharing and reuse between applications of all types. The development of XML led to the concept of remotely accessing and sharing not just documents or specific pieces of data between applications, but actually communicating remotely between disparate applications' code (specific instances, functions or methods), or more formally, providing direct access to an application's API via structured markup. This led to the development of a formal standard for remote application communication and API interchange known as XML-RPC.


For some, XML-RPC was not enough, as it could encapsulate only a limited range of application functionality; for example, it had trouble with certain authentication scenarios, real-time transactional operations, and push, message-driven or event-based systems (among others). Web Services were introduced as a contract-based way of invoking APIs through well-defined XML operation endpoints. They went a few steps further than XML-RPC, chiefly by defining WSDL as a binding language that describes the available operations (functions/methods). A more formal declaration was needed because XML is so permissive about how a document can be structured: HTML has a limited number of possible tags, whereas in XML the number of possible tags is virtually unlimited, constrained only by the developer's imagination and the requirement that the document be syntactically correct and well-formed.


SOAP, the first of the major Web Service technologies, originated at Microsoft and was later standardized through the W3C, with many of the associated WS-* standards developed at OASIS. The SOAP specification and the WS-* standards were the end result of several years of work by volunteer representatives from leading IT firms of the time (starting mainly with Microsoft, but eventually with contributions from IBM, Oracle, Sun Microsystems, Nokia, Google, Yahoo, AOL, Canon, SAP, BEA Systems, Software AG/webMethods, etc.). The existence of both Web Services and more informal XML APIs enabled a new boom in application development, centered on mashups of data between disparate applications, companies, domains, services and websites. This led to the creation of AJAX, the formalization of the informal XML APIs into REST (as a new type of Web Service in its own right), and several related web application development technologies (e.g. Flex), which provided interesting new ways of interacting with, and creating insightful views on, data that had already been available separately on the web but could not easily be mashed up, combined or presented in visually appealing or unique ways. The advent and rapid adoption of these new web technologies is often referred to as the Web 2.0 revolution or second boom period, a term whose meaning is rarely agreed upon but which generally refers to a reinvigoration of the web development, e-commerce and IT industries after the dot-com bust.


One famous example of a pre-XML mashup, which used scraping to painfully pull data out of malformed HTML, is HousingMaps[1]. In the post-XML, Web 2.0 era it has become much easier to integrate the many services and applications that provide some form of Web Service or API for access to their data and functionality. Some examples of rich mashups include: TripperMap (AJAX, Flickr + TripIt + GoogleEarth), GeoBestOfYouTube (AJAX, YouTube + GMapify/GoogleMaps), PlayMyMusicVideos (AJAX, Last.FM + YouTube), PopURLs (AJAX, Flickr + YouTube + Digg + Reddit + Delicious + RSS News Aggregator), Musicovery (Flex, using MusicGenome + Last.FM + Amazon).



Specifications


Sample XML Document Structure


XML Structure

Header

When present, the XML declaration (often called the header) must be the very first thing in the document. It normally looks like this:

 <?xml version="1.0" encoding="UTF-8"?>
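
To illustrate why the encoding named in the declaration matters, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. The one-element document and the ISO-8859-1 encoding are assumptions chosen for the demonstration; they do not come from this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; the name contains a non-ASCII
# character (e with diaeresis) so that the encoding actually matters.
TEXT = "<name>Zo\u00eb</name>"

# Bytes stored as ISO-8859-1, with a declaration that says so.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + TEXT).encode("iso-8859-1")
parsed_name = ET.fromstring(declared).text

# The same bytes without a declaration: the parser assumes UTF-8
# and rejects the byte sequence as not well-formed.
undeclared = TEXT.encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(parsed_name, undeclared_ok)
```

In this sketch the declared document parses correctly, while the undeclared one fails because its bytes are not valid UTF-8.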


Namespaces - xmlns

XML namespaces are used to provide uniquely named elements and attributes in an XML instance. They are defined by a W3C recommendation called Namespaces in XML. A namespace is declared using the reserved XML attribute xmlns, the value of which must be a Uniform Resource Identifier (URI) reference. [2] The declaration is usually placed as an attribute on the root element (the first element) of the document, where it applies to that element and its descendants. For example, this attribute declares the XHTML namespace from the W3C:

xmlns="http://www.w3.org/1999/xhtml"
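
As a rough illustration of what a namespace declaration means for code that reads the document, the following Python sketch parses a small XHTML-like document with the standard-library xml.etree.ElementTree module. The sample document and the prefix "x" are assumptions made up for the example; only the namespace URI comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the namespace URI is taken from the article.
XHTML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Namespace demo</title></head>
  <body><p>Hello</p></body>
</html>"""

XHTML_NS = "http://www.w3.org/1999/xhtml"

root = ET.fromstring(XHTML_DOC)

# ElementTree stores the namespace as part of the tag name: {uri}localname
print(root.tag)

# An unqualified search finds nothing, because <title> lives in the XHTML namespace.
unqualified = root.find("head/title")

# Mapping a prefix of our own choosing ("x") to the URI lets the search match.
ns = {"x": XHTML_NS}
title = root.find("x:head/x:title", ns)
print(title.text)
```

The prefix used in the search does not have to match anything in the document; only the URI identifies the namespace.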


Elements

What is an XML Element?

An XML element is everything from (including) the element's start tag to (including) the element's end tag.

An element can contain other elements, simple text or a mixture of both. Elements can also have attributes.


 <bookstore>
   <book category="CHILDREN">
     <title>Harry Potter</title>
     <author>J K. Rowling</author>
     <year>2005</year>
     <price>29.99</price>
   </book>
   <book category="WEB">
     <title>Learning XML</title>
     <author>Erik T. Ray</author>
     <year>2003</year>
     <price>39.95</price>
   </book>
 </bookstore>

In the example above, <bookstore> and <book> have element contents, because they contain other elements. <author> has text content because it contains text.

[3]
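
The distinction between element content and text content can be checked programmatically. The sketch below loads the bookstore example into Python's standard-library xml.etree.ElementTree; the helper name has_element_content is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="CHILDREN">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)


def has_element_content(element):
    """Return True when the element contains at least one child element."""
    return len(element) > 0


# <bookstore> and <book> contain other elements; <author> contains only text.
first_book = root.find("book")
first_author = first_book.find("author")
print(has_element_content(root), has_element_content(first_book),
      has_element_content(first_author))

# Collect the text content of every <title> element.
titles = [book.findtext("title") for book in root.findall("book")]
print(titles)
```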


Attributes

What are XML ATTRIBUTES?

From HTML you will surely remember this (even if you've only ever glanced at a "View Source" HTML display of a webpage before): <img src="computer.gif">. The "src" attribute provides additional information about the <img> element.

In HTML (and in XML) attributes provide additional information about elements.

Attributes often provide information that is not a part of the data. For example, in <file type="gif">computer.gif</file> the file type is irrelevant to the data, but important to the software that wants to manipulate the element.


XML Attributes Must be Quoted

Attribute values must always be enclosed in quotes, but either single or double quotes can be used. For a person's sex, the person tag can be written like this:

<person sex="female">

or like this:

<person sex='female'>
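
A quick way to see the quoting rule in action is to hand the variants to a parser. This Python sketch uses the standard-library xml.etree.ElementTree; the person elements are written as self-closing tags so that each snippet is a complete document, and is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Double and single quotes are interchangeable as attribute delimiters.
double_quoted = ET.fromstring('<person sex="female"/>')
single_quoted = ET.fromstring("<person sex='female'/>")
same_value = double_quoted.get("sex") == single_quoted.get("sex")


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An unquoted attribute value (legal in old HTML) is rejected by an XML parser.
unquoted_ok = is_well_formed("<person sex=female/>")
print(same_value, unquoted_ok)
```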


XML Elements vs. Attributes

Take a look at these examples:

<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>
<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

In the first example sex is an attribute. In the last, sex is an element. Both examples provide the same information.
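
From the consuming code's point of view, the only difference between the two forms is how the value is read. A minimal Python sketch with the standard-library xml.etree.ElementTree (variable names are illustrative):

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_STYLE = """<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""

ELEMENT_STYLE = """<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""

attr_person = ET.fromstring(ATTRIBUTE_STYLE)
elem_person = ET.fromstring(ELEMENT_STYLE)

# An attribute is read from the element itself ...
sex_from_attribute = attr_person.get("sex")

# ... while a child element is looked up and its text content is read.
sex_from_element = elem_person.findtext("sex")

print(sex_from_attribute == sex_from_element)
```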

There are no rules about when to use attributes and when to use elements. Attributes are handy in HTML. In XML my advice is to avoid them. Use elements instead.


W3C Recommended Approach

The following 3 XML documents contain exactly the same information:

A date attribute is used in the first example:

<note date="10/01/2008">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

A date element is used in the second example:

<note>
  <date>10/01/2008</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

An expanded date element is used in the third (this is my favorite):

<note>
  <date>
    <day>10</day>
    <month>01</month>
    <year>2008</year>
  </date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
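
One practical advantage of the expanded form is that the reader does not have to guess whether "10/01/2008" means 10 January or 1 October. The Python sketch below, using the standard-library xml.etree.ElementTree and datetime modules, builds a date directly from the three child elements; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import date

NOTE_XML = """<note>
  <date>
    <day>10</day>
    <month>01</month>
    <year>2008</year>
  </date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

note = ET.fromstring(NOTE_XML)
date_element = note.find("date")

# Each component is addressed by name, so no day/month ordering convention
# has to be assumed when reading the value.
note_date = date(
    year=int(date_element.findtext("year")),
    month=int(date_element.findtext("month")),
    day=int(date_element.findtext("day")),
)
print(note_date.isoformat())
```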


Avoid XML Attributes?

Some of the problems with using attributes are:

   * attributes cannot contain multiple values (elements can)
   * attributes cannot contain tree structures (elements can)
   * attributes are not easily expandable (for future changes)

Attributes are difficult to read and maintain. Use elements for data. Use attributes for information that is not relevant to the data.

Don't end up like this:

<note day="10" month="01" year="2008" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!">
</note>


XML Attributes for Metadata

Sometimes ID references are assigned to elements. These IDs can be used to identify XML elements in much the same way as the ID attribute in HTML. This example demonstrates this:

<messages>
  <note id="501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note id="502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not</body>
  </note>
</messages>

The ID above is just an identifier, to identify the different notes. It is not a part of the note itself.

What I'm trying to say here is that metadata (data about data) should be stored as attributes, and that data itself should be stored as elements.

[4]
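
An identifier stored as an attribute makes it straightforward to select one element out of many. The following Python sketch looks up a note by its id using the limited XPath support in the standard-library xml.etree.ElementTree; find_note is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

MESSAGES_XML = """<messages>
  <note id="501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note id="502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not</body>
  </note>
</messages>"""

messages = ET.fromstring(MESSAGES_XML)


def find_note(root, note_id):
    """Return the <note> child whose id attribute equals note_id, or None."""
    return root.find(f"note[@id='{note_id}']")


reply = find_note(messages, "502")
print(reply.findtext("heading"))

# The id is metadata on the element; the note's data lives in its children.
missing = find_note(messages, "999")
print(missing)
```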




Tools

XML Parsers

To read and update - create and manipulate - an XML document, you will need an XML parser. Parsers vary by language and implementation, but are typically an interface to a library of reusable code which can be customized to grab the structure and contents of XML data or XML documents.

There are two basic types of XML parsers:

  • Tree-based parser: Transforms an XML document into a tree structure. It analyzes the whole document and provides access to the tree elements, e.g. the Document Object Model (DOM).
  • Event-based parser: Views an XML document as a series of events. When a specific event occurs, it calls a function to handle it, e.g. the Simple API for XML (SAX).
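
To make the contrast between the two parser types concrete, here is a small Python sketch that extracts the same book titles twice: once with a tree-based parser (xml.dom.minidom) and once with an event-based parser (xml.sax), both from the standard library. The sample document and the TitleHandler class are assumptions written for this illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample input, modeled on the bookstore example.
SAMPLE = (
    "<bookstore>"
    "<book><title>Harry Potter</title></book>"
    "<book><title>Learning XML</title></book>"
    "</bookstore>"
)

# Tree-based: the whole document is parsed into a DOM tree, then queried.
dom = minidom.parseString(SAMPLE)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]


# Event-based: the parser calls handler methods as it walks through the input.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles == sax_titles)
```

The tree-based version is shorter but holds the entire document in memory. The event-based version keeps only the state the handler chooses to store, which tends to matter for large documents.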

Java


SAX

[10] [11] [12] [13] [14]

DOM

[16] [17] [18] [19] [20] [21]

jDom


XPath


JAXB

[22] [23] [24] [25] [26]

JAXP

[27]

JiBX

[28]


Testing

JavaScript

jQuery

ActionScript

PHP

Expat

Expat is a non-validating XML parser and ignores any DTDs. It is also the oldest approach to XML parsing in PHP.

SAX

Create an XML parser, then define handlers for the different XML events (such as the start and end of an element).

DOM

The DOM extension allows you to operate on XML documents through the DOM API with PHP.

XMLReader

The XMLReader extension is an XML Pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way.
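
The cursor-style pull model is not specific to PHP. As a rough analogue only (this is not the PHP XMLReader API), the sketch below uses Python's standard-library xml.dom.pulldom module, where the calling code asks for the next event instead of registering callbacks. The sample document and the helper name text_of are assumptions for the illustration.

```python
from xml.dom import pulldom

# Hypothetical sample input, modeled on the bookstore example.
SAMPLE = (
    "<bookstore>"
    "<book><title>Harry Potter</title></book>"
    "<book><title>Learning XML</title></book>"
    "</bookstore>"
)


def text_of(node):
    """Concatenate the direct text children of a DOM node."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


titles = []
stream = pulldom.parseString(SAMPLE)

# The loop is the cursor: it moves forward through the document node by node.
for event, node in stream:
    if event == pulldom.START_ELEMENT and node.tagName == "title":
        # Pull in the rest of this element so its text is available.
        stream.expandNode(node)
        titles.append(text_of(node))

print(titles)
```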

SimpleXML

SimpleXML offers the easiest, most developer-friendly way of parsing and producing XML, though not necessarily the most efficient one (it performs fine on occasional small XML documents or messages). The SimpleXML extension converts XML into an object that can be processed with normal property selectors and array iterators.

Python

Perl

[30]

C#

[31] [32] [33] [34] [35] [36] [37]

Objective-C

C++

C

XML Editors

XML Databases

XML Security


Resources

Data Sources


Tutorials


External Links

References

  1. HousingMaps - a mashup of Craigslist real estate (apartments/houses for sale) + Google Maps (visually plotting locations on an interactive map): http://www.housingmaps.com/
  2. wikipedia:Xmlns
  3. XML Elements: http://www.w3schools.com/xml/xml_elements.asp
  4. W3schools on XML Attributes: http://www.w3schools.com/xml/xml_attributes.asp
  5. XML Viewer/Editor: http://xmlia.com/XMLBrowser.aspx (was the go-to online XML viewer, but now seems to be DOWN)
  6. View HTML Selection Source for Opera: http://blog.webkitchen.cz/view-selection-source
  7. Using the Castor source code generator: http://castor.codehaus.org/sourcegen.html
  8. Tutorial - First Steps with XMLBeans: http://xmlbeans.apache.org/documentation/tutorial_getstarted.html
  9. SAX
  10. JSP SAX Parser: http://www.java2s.com/Tutorial/Java/0360__JSP/JSPSAXParser.htm
  11. JAVA Technology and XML - Part 1 -- An Introduction to APIs for XML Processing: http://java.sun.com/developer/technicalArticles/xml/JavaTechandXML/
  12. Java SAX Parsing Example: http://tutorials.jenkov.com/java-xml/sax-example.html
  13. Java and XML Basics, Part 3: http://www.devarticles.com/c/a/XML/Java-and-XML-Basics-3/
  14. Tip -- Validation and the SAX ErrorHandler interface: http://www.ibm.com/developerworks/library/x-tipeh.html
  15. DOM
  16. Java DOM parser tutorial: http://tutorials.jenkov.com/java-xml/dom.html
  17. Parse an XML string - Using DOM and a StringReader: http://www.java2s.com/Code/Java/XML/ParseanXMLstringUsingDOMandaStringReader.htm
  18. XML and Java Tutorial, Part 1: http://developerlife.com/tutorials/?p=25
  19. Listing the Contents of Parse Tree Nodes - Using the DOM Parser to Extract XML Document Data: http://www.java2s.com/Tutorial/Java/0440__XML/ListingtheContentsofParseTreeNodesUsingtheDOMParsertoExtractXMLDocumentData.htm
  20. Converting an XML Fragment into a DOM Fragment: http://www.java2s.com/Code/Java/XML/ConvertinganXMLFragmentintoaDOMFragment.htm
  21. Simple XML Parsing (in Java) with SAX and DOM: http://onjava.com/pub/a/onjava/2002/06/26/xml.html
  22. wikipedia:JAXB
  23. JAXB hello world XML reader/writer example: http://www.mkyong.com/java/jaxb-hello-world-example/
  24. Simple and efficient XML parsing using JAXB 2.0: http://www.javarants.com/2006/04/30/simple-and-efficient-xml-parsing-using-jaxb-2-0/
  25. JAXB & Namespaces: http://blog.bdoughan.com/2010/08/jaxb-namespaces.html; How to parse an object to and from XML using JAXB: http://www.javaprogrammingforums.com/file-input-output-tutorials/4062-how-parse-object-xml-using-jaxb.html
  26. Binding Map to XML -- Dynamic Tag Names with JAXB: http://dzone.com/articles/map-to-xml-dynamic-tag-names-with-jaxb
  27. wikipedia:JAXP
  28. wikipedia:JiBX
  29. Webscraping with Python and BeautifulSoup: http://blog.dispatched.ch/webscraping-with-python-and-beautifulsoup/
  30. Perl & Simple::XML: http://www.perlmonks.org/?node_id=643008
  31. Learning C# XML: http://www.xml.com/pub/a/2002/03/06/csharpxml.html
  32. Basic XML parsing in C# (downloadable example): http://www.go4expert.com/forums/showthread.php?t=1484
  33. Simple XML Parser in C# (downloadable example): http://www.c-sharpcorner.com/UploadFile/shehperu/SimpleXMLParser11292005004801AM/SimpleXMLParser.aspx
  34. Easy XML Parsing in C# (downloadable example): http://www.codeproject.com/KB/cs/nicexmlparsing.aspx
  35. How to read XML from a file by using Visual C# : http://support.microsoft.com/kb/307548
  36. Parsing HTML in Microsoft C#: http://www.developer.com/net/csharp/article.php/2230091/Parsing-HTML-in-Microsoft-C.htm
  37. .NET XML Best Practices: http://support.softartisans.com/kbview.aspx?id=673
  38. Moncton, NB example: http://dd.weatheroffice.ec.gc.ca/citypage_weather/xml/NB/s0000654_e.xml
  39. Weather Underground: http://www.wunderground.com
  40. Geolocation Weather Mashup: http://cookbooks.adobe.com/post_Geolocation_Weather_Mash_Up-19143.html
  41. SportsMLT -- SportsML Transformation Library: http://www.sportsstandards.org/sm/sportsmlt
  42. XMLSTATS -- Using SportsML: http://erikberg.com/xmlstats/
  43. IPTC - SportsML (G2) examples: http://www.iptc.org/site/News_Exchange_Formats/SportsML-G2/Examples/
  44. MLB's own proprietary XML components: http://gd2.mlb.com/components/
  45. LIVE STATS (from Stats,inc. and ChalkGaming): http://stats.justbet.com/default.aspx

See Also

XSL | XSLT | DTD | XSD | RDF