DTD - XML Building Blocks
The main building blocks of both XML and HTML documents are elements.
XML Document Building Blocks
All XML documents (and HTML documents) are made up of the following simple building blocks:
- Elements
- Attributes
- Entities
- PCDATA
- CDATA
Elements
Elements are the main building blocks of XML and HTML documents.
Examples of HTML elements are "body" and "table". Examples of XML elements are "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br", and "img".
Examples:
<body>some text</body>
<message>some text</message>
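The three kinds of element content described above (text, other elements, or nothing) can be seen by parsing a small document. The sketch below uses Python's standard `xml.etree.ElementTree`. The sample document and the `classify` helper are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

def classify(elem: ET.Element) -> str:
    """Hypothetical helper: report whether an element holds child elements, text, or nothing."""
    if len(elem) > 0:
        return "elements"
    if elem.text and elem.text.strip():
        return "text"
    return "empty"

# Hypothetical sample: "note" contains elements, "message" contains text, "hr" is empty.
doc = """<note>
  <message>some text</message>
  <hr/>
</note>"""

root = ET.fromstring(doc)
print(root.tag, classify(root))        # note elements
for child in root:
    print(child.tag, classify(child))  # message text, then hr empty
```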
Attributes
Attributes provide additional information about elements.
Attributes are always placed in the start tag of an element, and they always come in name/value pairs. The following "img" element carries additional information about its source file:
<img src="computer.gif" />
The element name is "img", the attribute name is "src", and the attribute value is "computer.gif". Because the element is empty, it is closed with "/>" at the end of its start tag.
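As a quick check of the name/value structure, the sketch below parses the "img" element with Python's standard `xml.etree.ElementTree` and reads its attribute back. It also confirms one XML rule: an unquoted attribute value is rejected. The variable `unquoted_rejected` is a hypothetical name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<img src="computer.gif" />')

print(elem.tag)         # img
print(elem.attrib)      # {'src': 'computer.gif'}
print(elem.get("src"))  # computer.gif
# The element is empty: no children and no text.
print(len(elem), elem.text)  # 0 None

# XML requires attribute values to be quoted; an unquoted value is a parse error.
unquoted_rejected = False
try:
    ET.fromstring("<img src=computer.gif />")
except ET.ParseError:
    unquoted_rejected = True
print("unquoted value rejected:", unquoted_rejected)
```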
Entities
Entities are variables used to define shortcuts to common text or special characters. An entity reference, such as &amp;, refers to an entity.
Most people are familiar with the HTML entity reference "&nbsp;". This "non-breaking space" entity is used in HTML to insert an extra space in a document.
When the document is parsed by an XML parser, entities are expanded.
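To watch the expansion happen, the sketch below declares an entity in a document's internal DTD subset and parses it with Python's standard `xml.etree.ElementTree`, whose underlying expat parser expands internal entities. The entity name `writer` and its text are hypothetical.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares a hypothetical entity named "writer".
doc = """<!DOCTYPE note [
  <!ENTITY writer "Donald Duck.">
]>
<note><from>&writer;</from></note>"""

root = ET.fromstring(doc)
# The parser has replaced &writer; with the entity's text.
print(root.find("from").text)  # Donald Duck.
```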
The following entity references are predefined in XML:
| Entity Reference | Character |
|---|---|
| &lt; | < |
| &gt; | > |
| &amp; | & |
| &quot; | " |
| &apos; | ' |
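The sketch below covers both directions for the predefined entities, using only the Python standard library. First, a parser expands each reference to its character. Second, `xml.sax.saxutils.escape` produces the references. By default it escapes only &, < and >. The extra mapping for quotes is an assumption about what you want escaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing expands every predefined entity reference to its character.
root = ET.fromstring("<p>&lt; &gt; &amp; &quot; &apos;</p>")
print(root.text)  # < > & " '

# Escaping goes the other way; quotes are added via the optional mapping.
raw = 'Tom & "Jerry" <cartoon>'
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)  # Tom &amp; &quot;Jerry&quot; &lt;cartoon&gt;
```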
PCDATA
PCDATA stands for Parsed Character Data.
Think of character data as the text between the start tag and end tag of an XML element.
PCDATA is text that will be parsed by the parser. This text will be checked for entities and markup.
Tags within the text will be treated as markup, and entities will be expanded.
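However, parsed character data should not contain literal &, <, or > characters; replace them with the &amp;, &lt;, and &gt; entity references respectively. Strictly, XML only requires & and < to be escaped, but escaping > as well is common practice.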
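The parsing rules above can be observed with Python's standard `xml.etree.ElementTree`. In the sketch, a tag inside the text becomes a child element and an entity reference is expanded. A literal < or & makes the document not well-formed, while a lone > is tolerated. The `is_well_formed` helper is a hypothetical name used only for this sketch.

```python
import xml.etree.ElementTree as ET

# Markup inside parsed character data becomes a child element,
# and the entity reference &amp; is expanded to "&".
root = ET.fromstring("<message>Hello <b>world</b> &amp; more</message>")
print(repr(root.text))             # 'Hello '
print(root[0].tag, root[0].text)   # b world
print(repr(root[0].tail))          # ' & more'

def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<m>1 < 2</m>"))        # False: literal "<"
print(is_well_formed("<m>Tom & Jerry</m>"))  # False: literal "&"
print(is_well_formed("<m>1 &lt; 2</m>"))     # True
print(is_well_formed("<m>2 > 1</m>"))        # True: a lone ">" is allowed
```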
CDATA
CDATA stands for Character Data.
CDATA is text that will not be parsed by the parser. Tags within this text will not be treated as markup, and entities will not be expanded.
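In a document, unparsed character data most commonly appears as a CDATA section, delimited by `<![CDATA[` and `]]>`. The sketch below uses Python's standard `xml.etree.ElementTree` to show that the content passes through untouched: < and & are not treated as markup, and `&lt;` is not expanded. The script text is a made-up sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical script text wrapped in a CDATA section.
doc = "<script><![CDATA[if (a < b && c > d) { x = '&lt;'; }]]></script>"
root = ET.fromstring(doc)

print(root.text)  # if (a < b && c > d) { x = '&lt;'; }
print(len(root))  # 0: nothing inside was treated as markup
```

ElementTree does not remember that the text came from a CDATA section. If you serialize `root` again, the special characters are escaped as ordinary parsed character data.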