Sunday, March 28, 2010

XML & XSL

What is XML?
a metalanguage that lets you define and structure your own document markup
a method for putting structured data into a text file; these files are
- easy to read
- unambiguous
- extensible
- platform-independent

What is XML?
a family of technologies:
- XML 1.0
- XLink
- XPointer & XFragments
- CSS, XSL, XSLT
- DOM
- XML Namespaces
- XML Schemas

XML Facts
officially recommended by W3C since 1998
a simplified form of SGML (Standard Generalized Markup Language)
developed by a W3C working group chaired by Jon Bosak of Sun Microsystems

XML Facts
important because it removes two constraints which were holding back Web development:
dependence on a single, inflexible document type (HTML);
the complexity of full SGML, whose syntax allows many powerful but hard-to-program options

Quick Comparison
HTML
- uses tags and attributes
- content and formatting can be placed together, e.g. <p><font color="red">text</font></p>
- tags and attributes are pre-determined and rigid
XML
- uses tags and attributes
- content and format are separate; formatting is contained in a stylesheet
- allows user to specify what each tag and attribute means

Importance of being able to define tags and attributes
document types can be explicitly tailored to an audience
the linking abilities are more powerful:
- bidirectional and multi-way links
- links to a span of text, not just a single point

The pieces
there are 3 components for XML content:
- the XML document
- DTD (Document Type Definition)
- XSL (Extensible Stylesheet Language)
The DTD and XSL do not need to be present in all cases

A well-formed XML document
elements have an open and close tag, unless it is an empty element
attribute values are quoted
if a tag is an empty element, it has a closing / before the end of the tag
open and close tags are nested correctly
there are no isolated markup characters in the text (i.e. < or &, or the sequence ]]>); write them as &lt; and &amp; instead
if there is no DTD, all attributes are of type CDATA by default

A valid XML document
is well-formed, has an associated DTD, and complies with the constraints in the DTD
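
The difference between well-formed and valid can be checked with the JDK's built-in parser. The following is a minimal sketch, not a definitive implementation: the class name `XmlChecks`, the `note`/`to`/`from` elements, and the tiny internal DTD are all made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class XmlChecks {
    // Hypothetical sample DTD: a note must contain exactly one "to" element.
    static final String DTD =
        "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    // Returns true if the text parses; with validate=true it must also satisfy its DTD.
    static boolean parses(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are only reported by default; rethrow them so they fail the parse.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or (when validating) not valid
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = DTD + "<note><to>Ann</to></note>";
        String invalid = DTD + "<note><from>Bob</from></note>";
        System.out.println("bad nesting, well-formed? " + parses("<note><to>Ann</note></to>", false));
        System.out.println("undeclared element, well-formed? " + parses(invalid, false));
        System.out.println("undeclared element, valid? " + parses(invalid, true));
        System.out.println("matching document, valid? " + parses(valid, true));
    }
}
```

A document can pass the first check and still fail the second, which is why validation is a separate, optional step.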

XML basics
<?xml ... ?> the XML declaration
- not required, but typically used
- attributes include:
version
encoding – the character encoding used in the document
standalone – "yes" if the document does not depend on an external DTD, "no" otherwise
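
As a rough illustration of these three attributes, the sketch below parses a document with the JDK's DOM parser and reads the declaration back. The class name `XmlDeclarationDemo` and the `greeting` element are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlDeclarationDemo {
    // Parses the text and reports what its XML declaration said.
    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return "version=" + doc.getXmlVersion()
                + " encoding=" + doc.getXmlEncoding()
                + " standalone=" + doc.getXmlStandalone();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<greeting>Hello</greeting>";
        System.out.println(describe(xml));
    }
}
```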



XML basics
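<!DOCTYPE ...> to specify a DTD for the document
2 forms: a reference to an external DTD, or an internal subset written in the document itself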



XML basics
<!-- ... --> comments
- contents are ignored by the processor
- cannot come before the XML declaration
- cannot appear inside an element tag
- may not include double hyphens

XML basics
<tag>text</tag> an element
- can contain text, other elements or a combination
- element name:
  - must start with a letter or underscore and can have any number of letters, numbers, hyphens, periods, or underscores
  - case-sensitive
  - may not start with xml (in any letter case; such names are reserved)

XML basics
Elements (continued)
can be a parent, grandparent, grandchild, ancestor, or descendant
each element name can be divided into 2 parts – namespace prefix:local name

XML basics
Namespaces:
- not mandatory, but useful in giving uniqueness to an element
- help avoid element collision
- declared using the xmlns:name="value" attribute; a URI is recommended for value
- can be declared on any element; the scope is that element and everything nested inside it

XML basics
Namespaces (continued):
- more than one may be declared per element
- xmlns with no name after it declares the default namespace, which applies to every unprefixed element inside the declaring element
- setting the default namespace to an empty string (xmlns="") ensures no default namespace is in use within an element
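
The three cases above (prefixed, default, and switched-off default namespace) can be seen with a namespace-aware DOM parse. This is a sketch only: the class name `NamespaceDemo`, the element names, and the `urn:example:...` namespace URIs are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Default namespace on <order>, a prefixed namespace, and xmlns='' on <note>.
    static final String XML =
        "<order xmlns='urn:example:shop' xmlns:inv='urn:example:inventory'>"
        + "<item/><inv:item/><note xmlns=''/></order>";

    // Returns the namespace URI of the first element with the given qualified name.
    static String namespaceOf(String qualifiedName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagName(qualifiedName).item(0).getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("item     -> " + namespaceOf("item"));     // inherits the default namespace
        System.out.println("inv:item -> " + namespaceOf("inv:item")); // uses the inv prefix
        System.out.println("note     -> " + namespaceOf("note"));     // default namespace switched off
    }
}
```

The two `item` elements share a local name but no longer collide, because each belongs to a different namespace.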

XML basics
key="value" an attribute
- describes additional information about an element
<tag key="value">text</tag>
- value must always be quoted
- key names have same restrictions as element names
- reserved attributes are
  - xml:lang
  - xml:space

XML basics
<tag/> OR <tag></tag> empty element
- has no text
- used to add nontextual content or to provide additional information to parser
<?target data?> processing instruction
- for instructions specific to an outside application

XML basics
<![CDATA[ ... ]]> CDATA section
- to define special sections of character data which the processor does not interpret as markup
- anything inside is treated as plain text
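
A short sketch of the effect, assuming a hypothetical `code` element: characters that would otherwise have to be escaped come back from the parser as ordinary text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataDemo {
    // Returns the text content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The < and & characters are legal here only because they sit inside CDATA.
        System.out.println(textOf("<code><![CDATA[if (a < b && b > c) return;]]></code>"));
    }
}
```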



XSLT
eXtensible Stylesheet Language Transformations

Agenda
XSLT overview
Understanding XPath notation
Processing elements in XSLT templates

XSLT Overview
W3C technology for transforming an XML document into some other text-based form: XML, HTML, WML, etc.
XSLT Versions
– XSLT 1.0 (Nov 1999)
– XSLT 2.0 (Jan 2007)
Official web site:
http://www.w3c.org/Style/XSL


eXtensible Stylesheet Language (XSL)
XSL is a language for expressing stylesheets.
– XSLT
• Transforming XML document
– XPath
• Expression language used by XSLT to locate elements and attributes in an XML doc.
– XSL-FO (Formatting Objects)
• Specifies formatting properties for rendering the doc to some other format.

XSLT Advantages & Disadvantages
Advantages:
Easy to display formatted XML data in a browser.
Easier to modify when the XML data format changes than DOM and SAX parsing code.
Can be used with database queries that return XML.
Disadvantages:
Memory intensive, with a performance cost.
Difficult to implement complex business rules.
Requires learning a new language.


XSLT Processors
Apache Xalan
http://xml.apache.org/xalan

SAXON
http://saxon.sourceforge.net

Microsoft’s XML Parser 4.0 (MSXML)
http://www.microsoft.com/xml

XSLT and Java
JDK 1.4 contains all necessary classes
– See javax.xml.transform package.
Earlier versions require downloading an XSLT processor and a SAX parser.

XML Transformation Example

Running Example
An XML file that references an XSL stylesheet can be opened directly in IE.

Run Xalan from the command line:
java org.apache.xalan.xslt.Process -in hello.xml -xsl hello.xsl
Run our Transform.java from the command line:
java Transform hello.xml hello.xsl

Note: Transform is a Java class, hello.xml is the sample XML document, and hello.xsl is the sample XSL stylesheet.
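
The source of Transform.java is not included here. The following is only a guess at what such a class might look like, built on the standard javax.xml.transform API; the method name `transform` and the usage message are assumptions.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class Transform {
    // Applies the stylesheet to the document and returns the result as text.
    static String transform(Source xml, Source xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer(xsl)
                .transform(xml, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java Transform <xml-file> <xsl-file>");
            return;
        }
        System.out.print(transform(new StreamSource(new File(args[0])),
                                   new StreamSource(new File(args[1]))));
    }
}
```

Keeping the work in a method that takes `Source` objects means the same code can be fed files, strings, or streams.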


XPath
XPath is an expression language used to:
– Find nodes and attributes (location paths) in the XML file
– Test boolean conditions
– Manipulate strings
– Perform numerical calculations
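
Each of the four uses can be tried outside a stylesheet with the JDK's javax.xml.xpath package (XPath 1.0). This is a sketch under assumptions: the `addressbook`/`entry` structure is borrowed from the count example later in this post, while the `name` element and the class name `XPathBasics` are made up.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathBasics {
    static final String XML =
        "<addressbook>"
        + "<entry><name>Kim Smith</name></entry>"
        + "<entry><name>Jack Black</name></entry>"
        + "</addressbook>";

    // Evaluates an XPath expression against the sample document and returns its string value.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/addressbook/entry[1]/name"));                           // find a node
        System.out.println(eval("count(/addressbook/entry) > 1"));                        // boolean test
        System.out.println(eval("substring-before(/addressbook/entry[2]/name, ' ')"));    // string handling
        System.out.println(eval("count(/addressbook/entry) * 10"));                       // arithmetic
    }
}
```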




Location Paths
Example: /addressbook/entry

Can be relative or absolute
Each step in the path is separated by / or //
Evaluated in reference to the current node
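
To make the relative/absolute distinction concrete, the sketch below first picks a context node and then evaluates a path from it. The class name `RelativePaths` and the `name` element are assumptions for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RelativePaths {
    static final String XML =
        "<addressbook>"
        + "<entry><name>Kim Smith</name></entry>"
        + "<entry><name>Jack Black</name></entry>"
        + "</addressbook>";

    // Selects a context node with contextPath, then evaluates expression from that node.
    static String evalFrom(String contextPath, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            return xpath.evaluate(expression, context);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String second = "/addressbook/entry[2]";
        // Relative: starts at the current node, so it sees only the second entry's name.
        System.out.println(evalFrom(second, "name"));
        // Absolute: starts at the root regardless of the current node.
        System.out.println(evalFrom(second, "/addressbook/entry/name"));
    }
}
```

When a path selects several nodes and a string is requested, XPath 1.0 returns the string value of the first node in document order, which is why the absolute path yields the first entry's name.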

Location Paths (cont.)
Match root node: /

Match all children: *





Location Paths (cont.)
Match an element
– Use // to indicate that zero or more elements may occur between the slashes
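
A small sketch of the difference between / and //, using a hypothetical `group` element to nest one entry a level deeper; the class name `DescendantPaths` is also made up.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class DescendantPaths {
    // One entry sits directly under addressbook, the other inside a group.
    static final String XML =
        "<addressbook>"
        + "<group><entry><name>Kim Smith</name></entry></group>"
        + "<entry><name>Jack Black</name></entry>"
        + "</addressbook>";

    // Counts the nodes selected by the given location path.
    static int count(String path) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath().evaluate(
                "count(" + path + ")",
                new InputSource(new StringReader(XML)),
                XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/addressbook/entry"));  // direct children only
        System.out.println(count("/addressbook//entry")); // any depth below addressbook
        System.out.println(count("//name"));              // any depth below the root
    }
}
```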









Location Paths (cont.)
Match a specific element
– Use […] as a predicate filter to select a particular element
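
Predicates can filter by position or by content. The sketch below assumes hypothetical `name` and `phone` children for each entry; the class name `PredicatePaths` is also an assumption.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PredicatePaths {
    static final String XML =
        "<addressbook>"
        + "<entry><name>Kim Smith</name><phone>555-0100</phone></entry>"
        + "<entry><name>Jack Black</name><phone>555-0199</phone></entry>"
        + "</addressbook>";

    // Evaluates an XPath expression against the sample document and returns its string value.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/addressbook/entry[2]/name"));                  // by position (1-based)
        System.out.println(eval("/addressbook/entry[last()]/phone"));            // last entry
        System.out.println(eval("/addressbook/entry[name='Kim Smith']/phone"));  // by child value
    }
}
```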













Location Paths (cont.)
Match a specific attribute
– Use @attribute to select a particular attribute
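
Attributes can be selected directly or used inside a predicate. In this sketch the `id` and `type` attributes, the `name` element, and the class name `AttributePaths` are all invented for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class AttributePaths {
    static final String XML =
        "<addressbook>"
        + "<entry id='e1' type='friend'><name>Kim Smith</name></entry>"
        + "<entry id='e2' type='work'><name>Jack Black</name></entry>"
        + "</addressbook>";

    // Evaluates an XPath expression against the sample document and returns its string value.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/addressbook/entry[1]/@type"));            // read an attribute
        System.out.println(eval("/addressbook/entry[@type='work']/name"));  // filter by attribute value
        System.out.println(eval("count(//entry[@id])"));                    // entries that have an id
    }
}
```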










XSLT Stylesheet Elements
Matching and selection templates
– xsl:template
– xsl:apply-templates
– xsl:value-of
Branching elements
– xsl:for-each
– xsl:if
– xsl:choose

XSLT template Element
xsl:template match="XPath"
– Defines a template rule for producing output
– Is applied only to nodes that match the pattern
– Invoked by using xsl:apply-templates
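
The interplay of xsl:template and xsl:apply-templates can be sketched with the JDK's built-in XSLT 1.0 processor. The `person` and `name` elements and the class name `TemplateDemo` are assumptions; only the "Your name is" text comes from the slide.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateDemo {
    static final String XML = "<person><name>John Doe</name></person>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // The root template hands control to whichever template matches person/name.
        + "<xsl:template match='/'><xsl:apply-templates select='person/name'/></xsl:template>"
        // This rule fires only for name elements.
        + "<xsl:template match='name'>Your name is <xsl:value-of select='.'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the sample document and returns the text output.
    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```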

XSLT value-of Element
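xsl:value-of select="expression"
– Evaluates the expression as a string and outputs the result
– If the expression selects several nodes, only the first is used (XSLT 1.0)
– "." selects the text value of the current node

Example: for an element containing John Doe, the template body
Your name is <xsl:value-of select="."/>
outputs: Your name is John Doe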
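
The "first match only" behaviour is easy to overlook, so here is a sketch that selects two nodes with one xsl:value-of. It relies on the JDK's built-in processor, which implements XSLT 1.0; the `addressbook` sample data and the class name `ValueOfDemo` are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ValueOfDemo {
    static final String XML =
        "<addressbook>"
        + "<entry><name>Kim Smith</name></entry>"
        + "<entry><name>Jack Black</name></entry>"
        + "</addressbook>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // entry/name selects two nodes, but value-of outputs only the first one.
        + "<xsl:template match='/addressbook'><xsl:value-of select='entry/name'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the sample document and returns the text output.
    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```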

XSLT for-each Element
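xsl:for-each select="expression"
– Processes each node selected by the XPath expression
– Applied to every match; each selected node becomes the current node in turn
– "." selects the text value of the current node

Example: with two selected elements containing Kim Smith and Jack Black, the body is processed twice, once per name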
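
A sketch of xsl:for-each over the two names, again using the JDK's built-in processor. The `addressbook`/`entry`/`name` layout, the `;` separator, and the class name `ForEachDemo` are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ForEachDemo {
    static final String XML =
        "<addressbook>"
        + "<entry><name>Kim Smith</name></entry>"
        + "<entry><name>Jack Black</name></entry>"
        + "</addressbook>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/addressbook'>"
        // The body runs once per selected name; "." is the name being processed.
        + "<xsl:for-each select='entry/name'>"
        + "<xsl:value-of select='.'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the sample document and returns the text output.
    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```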







XSLT if Element
xsl:if test="expression"
– Evaluates the expression and, if it is true, processes the content of the xsl:if element
– No if-else, use choose instead
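
One common use of xsl:if is to emit a separator between items but not after the last one. The sketch below shows that pattern; the sample data and the class name `IfDemo` are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IfDemo {
    static final String XML =
        "<addressbook>"
        + "<entry><name>Kim Smith</name></entry>"
        + "<entry><name>Jack Black</name></entry>"
        + "</addressbook>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/addressbook'>"
        + "<xsl:for-each select='entry'>"
        + "<xsl:value-of select='name'/>"
        // Emit the separator for every entry except the last one.
        + "<xsl:if test='position() != last()'>, </xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the sample document and returns the text output.
    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```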








XSLT choose Element
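xsl:choose
– Selects one of any number of alternatives (xsl:when), with an optional xsl:otherwise as the default
– Use instead of the if-else or switch statements found in other programming languages

Example: a branch can output a fallback message such as: Missing value!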
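
A sketch of xsl:choose acting as if-else. The original example is not preserved, so placing "Missing value!" in the xsl:otherwise branch is a guess; the `phone` element and the class name `ChooseDemo` are likewise assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChooseDemo {
    // The second entry deliberately has no phone element.
    static final String XML =
        "<addressbook>"
        + "<entry><name>Kim Smith</name><phone>555-0100</phone></entry>"
        + "<entry><name>Jack Black</name></entry>"
        + "</addressbook>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/addressbook'>"
        + "<xsl:for-each select='entry'>"
        + "<xsl:value-of select='name'/>: "
        + "<xsl:choose>"
        // test='phone' is true when the entry has at least one phone child.
        + "<xsl:when test='phone'><xsl:value-of select='phone'/></xsl:when>"
        + "<xsl:otherwise>Missing value!</xsl:otherwise>"
        + "</xsl:choose>;"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the sample document and returns the text output.
    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```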





XSLT Functions
Large set of utility functions
Examples:
– count : returns the number of nodes in a node set

<xsl:value-of select="count(/addressbook/entry)"/>

– starts-with : returns true if the first string starts with the second

e.g. the test starts-with('hello', 'h') is true and can guard the output: Starts with 'h'
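
Both functions can be tried together in one stylesheet. This is a sketch only: the sample names, the test for 'J' (starts-with is case-sensitive), and the class name `FunctionsDemo` are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FunctionsDemo {
    static final String XML =
        "<addressbook>"
        + "<entry><name>Kim Smith</name></entry>"
        + "<entry><name>Jack Black</name></entry>"
        + "</addressbook>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        // count() returns the size of the selected node set.
        + "<xsl:value-of select='count(/addressbook/entry)'/> entries"
        + "<xsl:for-each select='/addressbook/entry/name'>"
        // starts-with() compares the current name against the prefix 'J'.
        + "<xsl:if test=\"starts-with(., 'J')\">; starts with J: <xsl:value-of select='.'/></xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the sample document and returns the text output.
    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```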