What is XML?
a metalanguage that allows you to create and format your own document markup
a method for putting structured data into a text file; these files are
- easy to read
- unambiguous
- extensible
- platform-independent
What is XML?
a family of technologies:
- XML 1.0
- XLink
- XPointer & XFragments
- CSS, XSL, XSLT
- DOM
- XML Namespaces
- XML Schemas
XML Facts
officially recommended by W3C since 1998
a simplified form of SGML (Standard Generalized Markup Language)
developed by a W3C working group chaired by Jon Bosak of Sun Microsystems
XML Facts
important because it removes two constraints that were holding back Web development:
dependence on a single, inflexible document type (HTML);
the complexity of full SGML, whose syntax allows many powerful but hard-to-program options
Quick Comparison
HTML
- uses tags and attributes
- content and formatting can be placed together
- tags and attributes are pre-determined and rigid
XML
- uses tags and attributes
- content and format are separate; formatting is contained in a stylesheet
- allows user to specify what each tag and attribute means
Importance of being able to define tags and attributes
document types can be explicitly tailored to an audience
the linking abilities are more powerful
bidirectional and multi-way links
link to a span of text, not just a single point
The pieces
there are 3 components for XML content:
- the XML document
- DTD (Document Type Definition)
- XSL (Extensible Stylesheet Language)
The DTD and XSL do not need to be present in all cases
A well-formed XML document
each element has an opening and a closing tag, unless it is an empty element
attribute values are quoted
if a tag is an empty element, it has a closing / before the end of the tag
open and close tags are nested correctly
there are no isolated markup characters in the text (i.e. <, & or ]]>; a lone > is permitted but is usually escaped as well)
if there is no DTD, all attributes are of type CDATA by default
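Any XML parser can check the well-formedness rules above mechanically. A minimal sketch, assuming the JAXP parser bundled with the JDK; the class name WellFormedCheck and the sample strings are illustrative and not part of the original slides:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: asks the JDK's XML parser whether a string is well-formed.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser hit a well-formedness error
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        // Quoted attribute, correct nesting, empty element closed with "/".
        System.out.println(isWellFormed("<note lang=\"en\"><to>Ann</to><br/></note>"));
        // Open and close tags nested incorrectly.
        System.out.println(isWellFormed("<note><to>Ann</note></to>"));
        // Attribute value not quoted.
        System.out.println(isWellFormed("<note lang=en/>"));
        // Isolated markup character in the text.
        System.out.println(isWellFormed("<note>a & b</note>"));
    }
}
```

Only the first sample follows all of the rules; the other three each break one of them.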
A valid XML document
has an associated DTD and complies with the constraints in the DTD
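Validity can be checked by switching on DTD validation in the parser. The following is a sketch under stated assumptions: it uses the JDK's JAXP parser, and the class name DtdValidation and the small "note" DTD are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidation {
    // Illustrative internal DTD: a note holds one "to" followed by one "body".
    static final String DTD =
        "<!DOCTYPE note [" +
        "<!ELEMENT note (to, body)>" +
        "<!ELEMENT to (#PCDATA)>" +
        "<!ELEMENT body (#PCDATA)>" +
        "]>";

    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validity violations arrive as recoverable errors
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return false; // not even well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // Complies with the DTD.
        System.out.println(isValid(DTD + "<note><to>Ann</to><body>Hi</body></note>"));
        // Well-formed, but the required "body" element is missing.
        System.out.println(isValid(DTD + "<note><to>Ann</to></note>"));
        // Well-formed, but there is no DTD to validate against.
        System.out.println(isValid("<note/>"));
    }
}
```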
XML basics
the XML declaration
- not required, but typically used
- attributes include:
version
encoding – the character encoding used in the document
standalone – whether the document is self-contained ("yes") or relies on an external DTD ("no")
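The declaration's attributes can be read back after parsing. A small sketch, assuming the DOM Level 3 accessors available in the JDK; the class name DeclarationDemo and the greeting document are illustrative:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DeclarationDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
            "<greeting>Hello</greeting>");
        System.out.println("version:    " + doc.getXmlVersion());
        System.out.println("encoding:   " + doc.getXmlEncoding());
        System.out.println("standalone: " + doc.getXmlStandalone());
    }
}
```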
XML basics
the document type declaration (DOCTYPE)
- used to specify a DTD for the document
- 2 forms:
XML basics
comments
- contents are ignored by the processor
- cannot come before the XML declaration
- cannot appear inside an element tag
- may not include double hyphens
XML basics
elements
- can contain text, other elements or a combination
- element name:
- must start with a letter or underscore and can have any number of letters, numbers, hyphens, periods, or underscores
- case-sensitive
- may not start with xml (in any letter case)
XML basics
Elements (continued)
can be a parent, grandparent, grandchild, ancestor, or descendant
each element name can be divided into 2 parts – namespace prefix:local name
XML basics
Namespaces:
- not mandatory, but useful in giving uniqueness to an element
- help avoid element collision
- declared using the xmlns:name="value" attribute; a URI is recommended for the value
- can be an attribute of any element; the scope is inside the element’s tags
XML basics
Namespaces (continued):
- may define more than 1 per element
- if no name is given after xmlns, the declaration sets the default namespace, which applies to the defining element and to every unprefixed element inside it
- can set default namespace to an empty string to ensure no default namespace is in use within an element
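The namespace rules above can be observed with a namespace-aware parser. A minimal sketch, assuming the JDK's JAXP parser; the class name NamespaceDemo and the urn:example:* namespace names are invented for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String XML =
        "<order xmlns=\"urn:example:shop\" xmlns:pay=\"urn:example:payment\">" +
        "<item/>" +                           // unprefixed: picks up the default namespace
        "<pay:card/>" +                       // prefixed: uses the namespace bound to "pay"
        "<note xmlns=\"\"><plain/></note>" +  // empty string: no default namespace in here
        "</order>";

    // Returns the namespace name of the first element with the given qualified name.
    static String namespaceOf(String qualifiedName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagName(qualifiedName).item(0).getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"order", "item", "pay:card", "note", "plain"}) {
            System.out.println(name + " -> " + namespaceOf(name));
        }
    }
}
```

Two namespaces are declared on one element, and the empty default namespace on note also covers its child plain.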
XML basics
an attribute: key="value"
- describes additional information about an element
- value must always be quoted
- key names have same restrictions as element names
- reserved attributes are
- xml:lang
- xml:space
XML basics
an empty element
- has no text
- used to add nontextual content or to provide additional information to the parser
a processing instruction
- carries information specific to an outside application
XML basics
a CDATA section
- defines a special section of character data that the processor does not interpret as markup
- anything inside is treated as plain text
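A CDATA section and the equivalent escaped text produce the same character data. A short sketch, assuming the JDK's DOM parser; the class name CdataDemo and the sample content are illustrative:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class CdataDemo {
    // Parses the document and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA, < and & are plain text.
        String viaCdata = textOf("<code><![CDATA[if (a < b && b > c) run();]]></code>");
        // Outside CDATA, the same characters have to be escaped.
        String viaEscapes = textOf("<code>if (a &lt; b &amp;&amp; b &gt; c) run();</code>");
        System.out.println(viaCdata);
        System.out.println(viaCdata.equals(viaEscapes));
    }
}
```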
XSLT
eXtensible Stylesheet Language Transformations
Agenda
XSLT overview
Understanding XPath notation
Processing elements in XSLT templates
XSLT Overview
W3C technology for transforming an XML document into some other text-based form: XML, HTML, WML, etc.
XSLT Versions
– XSLT 1.0 (Nov 1999)
– XSLT 2.0 (working draft Nov 2002; W3C Recommendation Jan 2007)
Official web site:
http://www.w3c.org/Style/XSL
eXtensible Stylesheet Language (XSL)
XSL is a language for expressing stylesheets.
– XSLT
• Transforming XML document
– XPath
• Expression language used by XSLT to locate elements and attributes in an XML document.
– XSL-FO (Formatting Objects)
• Specifies formatting properties for rendering the doc to some other format.
XSLT Advantages & Disadvantages
Advantages:
Easy to display formatted XML data in a browser.
Easier to modify when XML data format changes than to modify DOM and SAX parsing code.
Can be used with database queries that return XML.
Disadvantages:
Memory-intensive, with a performance cost.
Difficult to implement complex business rules.
Requires learning a new language.
XSLT Processors
Apache Xalan
http://xml.apache.org/xalan
SAXON
http://saxon.sourceforge.net
Microsoft’s XML Parser 4.0 (MSXML)
http://www.microsoft.com/xml
XSLT and Java
JDK 1.4 contains all necessary classes
– See javax.xml.transform package.
Earlier versions require downloading an XSLT processor and a SAX parser.
XML Transformation Example
Running Example
We can open an XML file that references an XSL stylesheet in IE.
Run Xalan from the command line:
java org.apache.xalan.xslt.Process -in hello.xml -xsl hello.xsl
Run our Transform.java from the command line.
java Transform hello.xml hello.xsl
Note: Transform is a Java class, hello.xml is a sample XML file, and hello.xsl is a sample XSL stylesheet.
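The source of Transform.java is not reproduced here. A class that behaves like the command above might look like the following sketch, which assumes only the javax.xml.transform classes shipped with the JDK; the apply helper is a hypothetical addition for in-memory use.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class Transform {
    // Compiles the stylesheet and applies it to the XML source.
    static void transform(Source xml, Source xsl, Result out) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
        transformer.transform(xml, out);
    }

    // Hypothetical convenience wrapper for strings held in memory.
    static String apply(String xml, String xsl) {
        StringWriter out = new StringWriter();
        try {
            transform(new StreamSource(new StringReader(xml)),
                      new StreamSource(new StringReader(xsl)),
                      new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 2) {
            System.err.println("Usage: java Transform <xml-file> <xsl-file>");
            return;
        }
        // Writes the transformation result to standard output.
        transform(new StreamSource(new File(args[0])),
                  new StreamSource(new File(args[1])),
                  new StreamResult(System.out));
    }
}
```

Compile with javac Transform.java, then run it with the two file names as arguments.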
XPath
XPath is an expression language used to:
– Find nodes and attributes (location paths) in the XML file
– Test boolean conditions
– Manipulate strings
– Perform numerical calculations
Location Paths
Example:
…
Can be relative or absolute
Each step in path separated by / or //
Evaluated in reference to the current node
Location Paths (cont.)
Match root node
…
Match all children
…
Location Paths (cont.)
Match an element
– Use // to indicate zero or more elements may occur between slashes
Location Paths (cont.)
Match a specific element
– Use […] as a predicate filter to select a particular element
Location Paths (cont.)
Match a specific attribute
– Use @attribute to select a particular attribute
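The location path forms listed above can be tried outside a stylesheet. A minimal sketch, assuming the javax.xml.xpath API in the JDK; the class name LocationPathDemo and the library document are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocationPathDemo {
    static final String XML =
        "<library>" +
        "<book lang=\"en\"><title>XML Basics</title></book>" +
        "<book lang=\"fr\"><title>XSLT Guide</title></book>" +
        "</library>";

    static Document document() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String value of the first node the path selects.
    static String select(String path) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(path, document());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes the path selects.
    static int matches(String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(path, document(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("/"));                      // root node
        System.out.println(matches("/library/*"));             // all children of library
        System.out.println(matches("//title"));                // title at any depth
        System.out.println(select("/library/book/title"));     // absolute path
        System.out.println(select("//book[2]/title"));         // predicate: second book
        System.out.println(select("//book[@lang='fr']/title")); // predicate on an attribute
        System.out.println(select("//book[1]/@lang"));         // select an attribute
    }
}
```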
XSLT Stylesheet Elements
Matching and selection templates
– xsl:template
– xsl:apply-templates
– xsl:value-of
Branching elements
– xsl:for-each
– xsl:if
– xsl:choose
XSLT template Element
xsl:template match="XPath"
– Defines a template rule for producing output
– Is applied only to nodes that match the pattern
– Invoked by using xsl:apply-templates
Your name is
XSLT value-of Element
xsl:value-of select="expression"
– Evaluates the expression as a string and outputs the result
– In XSLT 1.0, applied only to the first match
– “.” selects the text value of the current node
Your name is
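The fragment "Your name is" suggests a template that prints a name. The following sketch shows how xsl:template, xsl:apply-templates and xsl:value-of could fit together. It is an assumption about the lost example, not a reconstruction of it; the person/name document and the class name TemplateDemo are illustrative, and the stylesheet is run with the JDK's built-in XSLT 1.0 processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateDemo {
    static final String XML = "<person><name>Ann</name></person>";

    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"text\"/>" +
        // Rule for the root node: hand the name elements to the matching rule.
        "<xsl:template match=\"/\">" +
        "<xsl:apply-templates select=\"person/name\"/>" +
        "</xsl:template>" +
        // Rule for name elements: literal text plus the text of the current node.
        "<xsl:template match=\"name\">Your name is <xsl:value-of select=\".\"/></xsl:template>" +
        "</xsl:stylesheet>";

    static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Your name is Ann"
    }
}
```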
XSLT for-each Element
xsl:for-each select="expression"
– Processes each node selected by the XPath expression
– Applied to every match, in document order by default
– “.” selects the text value of the current node in each iteration
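To contrast xsl:for-each with xsl:value-of, the sketch below applies both to the same node set. It assumes the JDK's built-in XSLT 1.0 processor; the people document and the class name ForEachDemo are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ForEachDemo {
    static final String XML = "<people><name>Ann</name><name>Bob</name></people>";

    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"text\"/>" +
        "<xsl:template match=\"/\">" +
        // value-of on a node set: XSLT 1.0 outputs only the first node.
        "first=<xsl:value-of select=\"people/name\"/>" +
        // for-each: the body runs once per selected node.
        " all=<xsl:for-each select=\"people/name\">[<xsl:value-of select=\".\"/>]</xsl:for-each>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "first=Ann all=[Ann][Bob]"
    }
}
```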
XSLT if Element
xsl:if test="expression"
– Evaluates the expression and, if it is true, processes the content of the xsl:if element
– No if-else, use choose instead
XSLT choose Element
xsl:choose
– Selects one of any number of alternatives (xsl:when), with an optional fallback (xsl:otherwise)
– Use instead of if-else, or switch statement used in other programming languages
Missing value!
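The "Missing value!" text hints at a fallback for empty elements. As a sketch of how xsl:if and xsl:choose could be combined (an assumption about the lost example, with an illustrative items document and class name ChooseDemo, run on the JDK's built-in XSLT 1.0 processor):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChooseDemo {
    static final String XML = "<items><item>pen</item><item/><item>ink</item></items>";

    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"text\"/>" +
        "<xsl:template match=\"/\"><xsl:apply-templates select=\"items/item\"/></xsl:template>" +
        "<xsl:template match=\"item\">" +
        // choose: the first xsl:when whose test is true wins, otherwise the fallback.
        "<xsl:choose>" +
        "<xsl:when test=\"string-length(.) = 0\">Missing value!</xsl:when>" +
        "<xsl:otherwise><xsl:value-of select=\".\"/></xsl:otherwise>" +
        "</xsl:choose>" +
        // if: a single condition with no else branch; adds a separator except after the last item.
        "<xsl:if test=\"position() != last()\">, </xsl:if>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "pen, Missing value!, ink"
    }
}
```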
XSLT Functions
Large set of utility functions
Examples:
– count : returns the number of nodes in a node set
– starts-with : returns true if a string starts with a given prefix string
Starts with ‘h’
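Both functions can be used in any select or test expression. A minimal sketch, assuming the JDK's built-in XSLT 1.0 processor; the words document and the class name FunctionsDemo are illustrative:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FunctionsDemo {
    static final String XML =
        "<words><word>hello</word><word>world</word><word>hi</word></words>";

    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"text\"/>" +
        "<xsl:template match=\"/\">" +
        // count: number of nodes in the selected node set.
        "<xsl:value-of select=\"count(words/word)\"/>" +
        "<xsl:text> words: </xsl:text>" +
        // starts-with: keep only the words that begin with the prefix 'h'.
        "<xsl:for-each select=\"words/word\">" +
        "<xsl:if test=\"starts-with(., 'h')\">[<xsl:value-of select=\".\"/>]</xsl:if>" +
        "</xsl:for-each>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    static String run() {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "3 words: [hello][hi]"
    }
}
```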