<li class="tocline"><a href="#dtds">Appendix A. DTDs</a></li>
<li class="tocline"><a href="#prohibitions">Appendix B. Element
Prohibitions</a></li>
<li class="tocline"><a href="#guidelines">Appendix C. HTML Compatibility Guidelines</a></li>
<li class="tocline"><a href="#acks">Appendix D. Acknowledgements</a></li>
<li class="tocline"><a href="#refs">Appendix E. References</a></li>
</ul>
</div>
<!--OddPage-->
<h1><a name="xhtml" id="xhtml">1. What is XHTML?</a></h1>
<p>XHTML is a family of current and future document types and modules that
reproduce, subset, and extend HTML 4.0 <a href="#ref-html4">[HTML]</a>. XHTML family document types are <abbr title="Extensible Markup Language">XML</abbr>-based,
and ultimately are designed to work in conjunction with XML-based user agents.
This family and its evolution are
discussed in more detail in the section on <a href="#future">Future
Directions</a>. </p>
<p>XHTML 1.0 (this specification) is the first document type in the XHTML
family. It is a reformulation of the three HTML 4.0 document types as
applications of XML 1.0 <a href="#ref-xml">[XML]</a>. It is intended
to be used as a language for content that is both XML-conforming and, if some
simple <a href="#guidelines">guidelines</a> are followed,
operates in HTML 4.0-conforming user agents. Developers who migrate
their content to XHTML 1.0 will realize the following benefits:</p>
<ul>
<li>XHTML documents are XML-conforming. As such, they are readily viewed,
edited, and validated with standard XML tools.</li>
<li>XHTML documents can be written
to operate as well as or better than they did before in existing
HTML 4.0-conforming user agents as well as in new, XHTML 1.0-conforming user
agents.</li>
<li>XHTML documents can utilize applications (e.g. scripts and applets) that rely
upon either the HTML Document Object Model or the XML Document Object Model <a
href="#ref-dom">[DOM]</a>.</li>
<li>As the XHTML family evolves, documents conforming to XHTML 1.0 will be more
likely to interoperate within and among various XHTML environments.</li>
</ul>
<p>The XHTML family is the next step in the evolution of the Internet. By
migrating to XHTML today, content developers can enter the XML world with all
of its attendant benefits, while still remaining confident in their
content's backward and future compatibility.</p>
<h2><a name="html4" id="html4">1.1 What is HTML 4.0?</a></h2>
<p>HTML 4.0 <a href="#ref-html4">[HTML]</a> is an <abbr title="Standard
<p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
</body>
</html></pre>
</div>
<p>Note that this example includes an XML declaration. An XML
declaration like the one above is
not required in all XML documents, but XHTML document authors are strongly encouraged to use one in all their documents. Such a declaration is required
when the character encoding of the document is other than the default UTF-8 or
UTF-16.</p>
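<p>As an illustration, a document stored in ISO-8859-1 (an encoding chosen
here only as an example) would need to begin with a declaration that names
that encoding:</p>
<div>
<pre>
&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
</pre>
</div>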
<h3><a name="well-formed" id="well-formed">3.1.2 Using XHTML with
other namespaces</a></h3>
<p>The XHTML namespace may be used with other XML namespaces
as per <a href="#ref-xmlns">[XMLNAMES]</a>, although such
documents are not strictly conforming XHTML 1.0 documents as
defined above. Future work by W3C will address ways to specify
conformance for documents involving multiple namespaces.</p>
<p>The following example shows the way in which XHTML 1.0 could
be used in conjunction with the MathML Recommendation:</p>
<!-- make HTML the default namespace for a hypertext commentary -->
<p xmlns='http://www.w3.org/1999/xhtml'>
This is also available <a href="http://www.w3.org/">online</a>.
</p>
</notes>
</book>
</pre>
</div>
<h2><a name="uaconf" id="uaconf">3.2 User Agent
Conformance</a></h2>
<p>A conforming user agent must meet all of the following
criteria:</p>
<ol>
<li>In order to be consistent with the XML 1.0 Recommendation <a
href="#ref-xml">[XML]</a>, the user agent must parse and evaluate
an XHTML document for well-formedness. If the user agent claims
to be a validating user agent, it must also validate documents
against their referenced DTDs according to <a href="#ref-xml">
[XML]</a>.</li>
<li>When the user agent claims to support <a href="#facilities">
facilities</a> defined within this specification or required by
this specification through normative reference, it must do so in
ways consistent with the facilities' definition.</li>
<li>When a user agent processes an XHTML document as generic XML,
it shall only recognize attributes of type
<code>ID</code> (e.g. the <code>id</code> attribute on most XHTML elements)
as fragment identifiers.</li>
<li>If a user agent encounters an element it does not recognize,
it must render the element's content.</li>
<li>If a user agent encounters an attribute it does not
recognize, it must ignore the entire attribute specification
(i.e., the attribute and its value).</li>
<li>If a user agent encounters an attribute value it does not
recognize, it must use the default attribute value.</li>
<li>If a user agent encounters an entity reference (other than one
of the predefined entities) for which it has
processed no declaration (which could happen if the declaration
is in the external subset, which the user agent has not read), the entity
reference should be rendered as the characters (starting
with the ampersand and ending with the semicolon) that
make up the entity reference.</li>
<li>When rendering content, user agents that encounter
characters or character entity references that are recognized but not
renderable should display the document in such a way that it is obvious to
the user that normal rendering has not taken place.</li>
<li>
The following characters are defined in <a href="#ref-xml">[XML]</a> as whitespace characters:
<ul>
<li>Space (&amp;#x0020;)</li>
<li>Tab (&amp;#x0009;)</li>
<li>Carriage return (&amp;#x000D;)</li>
<li>Line feed (&amp;#x000A;)</li>
</ul>
<p>
The XML processor normalizes different systems' line-end codes into a
single line-feed character, which is passed up to the application. In addition,
the XHTML user agent must treat the following characters as whitespace:
</p>
<ul>
<li>Form feed (&amp;#x000C;)</li>
<li>Zero-width space (&amp;#x200B;)</li>
</ul>
<p>
In elements where the 'xml:space' attribute is set to 'preserve', the user
agent must leave all whitespace characters intact (with the exception of
leading and trailing whitespace characters, which should be removed).
Otherwise, whitespace
is handled according to the following rules:
</p>
<ul>
<li>
All whitespace surrounding block elements should be removed.
</li>
<li>
Comments are removed entirely and do not affect whitespace handling. One
whitespace character on either side of a comment is treated as two
whitespace characters.
</li>
<li>
Leading and trailing whitespace inside a block element must be removed.
</li>
<li>Line feed characters within a block element must be converted into a
space (except when the 'xml:space' attribute is set to 'preserve').
</li>
<li>
A sequence of whitespace characters must be reduced to a single space
character (except when the 'xml:space' attribute is set to 'preserve').
</li>
<li>
With regard to rendition,
the user agent should render the content in a
manner appropriate to the language in which the content is written.
In languages whose primary script is Latinate, the ASCII space
character is typically used to encode both grammatical word boundaries and
typographic whitespace. In languages whose script is related to Nagari
(e.g., Sanskrit and Thai), grammatical boundaries may be encoded using
the zero-width space character, but will not typically be represented by
typographic whitespace in rendered output. Languages using Arabiform scripts
may encode typographic whitespace using a space character, but may also use
the zero-width space character to delimit 'internal' grammatical boundaries: what
looks like a single word in Arabic to an English eye frequently encodes several words,
e.g. 'kitAbuhum' = 'kitAbu-hum' = 'book them' = 'their book'. Languages
in the Chinese script tradition typically neither encode such delimiters nor
use typographic whitespace in this way.
</li>
</ul>
<p>Whitespace in attribute values is processed according to <a
href="#ref-xml">[XML]</a>.</p>
</li>
</ol>
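<p>The following fragment is an illustrative sketch of the criteria above
that cover unrecognized elements, unrecognized attributes, and whitespace.
The element name <code>x-note</code> and the attribute name
<code>x-priority</code> are hypothetical and are not defined by this
specification:</p>
<div>
<pre>
&lt;p&gt;
  Release   &lt;x-note x-priority="high"&gt;candidate&lt;/x-note&gt;
  build.
&lt;/p&gt;
</pre>
</div>
<p>A user agent that follows these criteria would be expected to ignore the
<code>x-priority</code> attribute, render the content of the
<code>x-note</code> element, remove the leading and trailing whitespace
inside the paragraph, and reduce each remaining run of whitespace (including
the line feed) to a single space, so that the paragraph text is presented
as:</p>
<div>
<pre>
Release candidate build.
</pre>
</div>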
<!--OddPage-->
<h1><a name="diffs" id="diffs">4. Differences with HTML
4.0</a></h1>
<p>Because XHTML is an XML application, certain
practices that were perfectly legal in SGML-based HTML 4.0 <a
href="#ref-html4">[HTML]</a> must be changed.</p>
<h2><a name="h-4.1" id="h-4.1">4.1 Documents must be
well-formed</a></h2>
<p><a href="#wellformed">Well-formedness</a> is a new concept
introduced by <a href="#ref-xml">[XML]</a>. Essentially this
means that all elements must either have closing tags or be
written in a special form (as described below), and that all the
elements must nest.</p>
<p>Although overlapping is illegal in SGML, it was widely
legal values for attributes of type <code>ID</code> is much smaller than
for those of type <code>CDATA</code>, the type of the <code>name</code>
attribute has been changed to <code>NMTOKEN</code>. This attribute is
constrained such that it can only have the same values as type
<code>ID</code>, or as the <code>Name</code> production in XML 1.0 Section
2.3, production 5. Unfortunately, this constraint cannot be expressed in the
XHTML 1.0 DTDs. Because of this change, care must be taken when
converting existing HTML documents. The values of these attributes
must be valid and unique within the document, and any references to these
fragment identifiers (both
internal and external) must be updated should the values be changed during
conversion.</p>
<p>Finally, note that XHTML 1.0 has deprecated the
<code>name</code> attribute of the <code>a</code>, <code>applet</code>, <code>frame</code>, <code>iframe</code>, <code>img</code>, and <code>map</code>
elements, and it will be
removed from XHTML in subsequent versions.</p>
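<p>A minimal sketch of such a conversion follows. The identifier values are
hypothetical. The original value <code>2nd-part</code> begins with a digit,
which the <code>Name</code> production does not allow, so it is replaced and
the reference to it is updated to match. Carrying both <code>id</code> and
<code>name</code> on the converted anchor, as the heading anchors in this
document do, is an assumption made here for the benefit of older user
agents:</p>
<div>
<pre>
&lt;!-- HTML 4.0: legal as CDATA, but not a legal ID value --&gt;
&lt;a name="2nd-part"&gt;Part 2&lt;/a&gt;
&lt;a href="#2nd-part"&gt;Go to part 2&lt;/a&gt;

&lt;!-- XHTML 1.0: legal ID value, with the reference updated --&gt;
&lt;a id="part-2" name="part-2"&gt;Part 2&lt;/a&gt;
&lt;a href="#part-2"&gt;Go to part 2&lt;/a&gt;
</pre>
</div>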
<h2>C.9 Character Encoding</h2>
<p>To specify a character encoding in the document, use both the
encoding attribute specification on the xml declaration (e.g.
<code class="greenmono">&lt;?xml version="1.0"
encoding="EUC-JP"?&gt;</code>) and a meta http-equiv statement