Since its beginnings in 1998, the eXtensible Markup Language, XML, has grown into a standard markup language family for portable data formats. The major document formats, such as the Open Document Format (ODF) known from OpenOffice, or Microsoft's Office Open XML (OOXML) format, are based on XML, just like many application-level networking protocols such as XML-RPC, SOAP or Jabber/XMPP. Many interfaces of business applications use standardized, proprietary or ad-hoc XML formats, and their configuration files are often written in XML, too. And clearly, XML has left its fingerprint on the web through RSS/Atom feeds, Ajax interfaces and configurable browser GUIs (XHTML/XUL).
XML support in programming languages has steadily improved over the last decade. Today, developers can grab very efficient tools from their toolbox that substantially simplify XML handling. Not surprisingly, the Python programming language has some very powerful tools available that often even beat their main contenders from the Java world in terms of performance, and easily in terms of usability.
The objective of this course is to get an understanding of important XML technologies, and to learn how to use the available tools by example.
The course targets intermediate to experienced Python programmers who want to generate and/or process XML (and, to some extent, HTML) content efficiently. A basic understanding of XML is helpful but not required.
Initially, the course will build up a common understanding of XML (specifically the XML Infoset) and some of its applications. The main theme then deals with efficient processing of XML (and a bit of HTML) in Python.
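As a small illustration of what the Infoset view means in practice, the sketch below parses two documents that differ in quoting, attribute order and tag style, and checks that they yield the same tree. The sample documents and the helper name same_element are invented for this example, and the comparison is a simplification that only looks at tag names, attributes, text and children.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same information:
# different quotes, attribute order, whitespace inside the tag,
# and empty-element syntax versus explicit start/end tags.
doc_a = ET.fromstring('<point x="1" y="2"/>')
doc_b = ET.fromstring("<point  y='2'\n x='1'></point>")


def same_element(a, b):
    """Simplified structural comparison of two parsed elements."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and len(a) == len(b)
        and all(same_element(x, y) for x, y in zip(a, b))
    )


equivalent = same_element(doc_a, doc_b)
```

The byte sequences differ, but the parsed information is the same, which is the level at which XML tools operate.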
The presented tool set includes the ElementTree library, which has shipped with Python since version 2.5, and the freely available lxml library, which combines a compatible Python API with a large set of additional XML features.
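To give a first impression of the ElementTree API, here is a minimal sketch that uses only the standard library module xml.etree.ElementTree. The sample feed and the function names entry_titles and add_entry are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
XML_SOURCE = """<feed>
  <entry id="1"><title>First post</title></entry>
  <entry id="2"><title>Second post</title></entry>
</feed>"""


def entry_titles(xml_text):
    """Return (id, title) pairs for every <entry> element."""
    root = ET.fromstring(xml_text)
    return [
        (entry.get("id"), entry.findtext("title"))
        for entry in root.iter("entry")
    ]


def add_entry(xml_text, entry_id, title):
    """Append a new <entry> element and return the serialized document."""
    root = ET.fromstring(xml_text)
    entry = ET.SubElement(root, "entry", id=entry_id)
    ET.SubElement(entry, "title").text = title
    return ET.tostring(root, encoding="unicode")


titles = entry_titles(XML_SOURCE)
updated = add_entry(XML_SOURCE, "3", "Third post")
```

Since lxml aims for API compatibility, code of this kind should largely carry over to lxml.etree, although details such as serialization options can differ.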
INTRODUCTION TO XML
XML and the XML Infoset
Dealing with XML formats
FAST XML PROCESSING
Parsing and serializing XML files
Extracting information from XML documents (tree navigation, XPath, CSS selectors)
Processing and transforming XML documents in main memory
Generating XML documents
Stream processing of large XML files that do not fit into main memory
Creating proprietary XML formats
Validating XML formats with schema languages (e.g. RELAX NG, Schematron)
Binding XML documents to Python objects (lxml.objectify)
Creating application specific XML APIs with lxml
Introduction to stylesheet transformations (XSLT processing)
Note that the advanced topics are subject to time constraints. A choice will be made based on the interest of the participants.
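As a taste of the stream processing topic listed above, the following sketch uses iterparse from the standard library to process elements as they are parsed instead of building the complete tree first. The element name item, the attribute amount and the function name sum_amounts are assumptions made for this example only.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(source):
    """Sum the 'amount' attribute of all <item> elements in a stream."""
    total = 0
    # "end" events fire once an element has been parsed completely.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            total += int(elem.get("amount", "0"))
            # Drop the element's attributes, text and children.
            # The emptied element itself stays attached to its parent.
            elem.clear()
    return total


# Hypothetical input: 100 items with the amounts 1 to 100.
items = b"".join(b'<item amount="%d"/>' % n for n in range(1, 101))
data = io.BytesIO(b"<items>" + items + b"</items>")
total = sum_amounts(data)
```

The same function accepts a file name or an open binary file in place of the in-memory buffer.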