In this, the second article in my “From the Top” series, I will introduce you to the web (head) waiter that knows how to correctly serve your web page to a user agent.
MIME and Content Negotiation
Content negotiation at its simplest is a conversation between your web server and a user agent (browser, search engine bot, etc.) to determine the preferred format or version of a resource to serve. To achieve this, a user agent sends a Hypertext Transfer Protocol (HTTP) “Accept” header to the web server with a list of preferred Multipurpose Internet Mail Extensions (MIME) types, each with an optional ranking or weighting (the Quality Value) of how well it understands that MIME type. The ranking runs from 0 to 1, to at most three decimal places. If no Quality Value (q) is defined for a MIME type then q=1.0 is assumed.
MIME, as the name suggests, was first used as an extension to email but is also used by HTTP. It is simply a way to define what type of media (resource) is being sent, be it an image, Flash, text, etc. Each resource has a MIME type consisting of two parts separated by a forward slash, “/”. The first part is called the top-level media type and the second is the subtype. For example, the MIME type for a Graphics Interchange Format (GIF) image is image/gif.
A Mozilla accept header may look like this:
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, image/jpeg, image/gif;q=0.2, */*;q=0.1
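To make the Quality Value rules concrete, here is a minimal sketch of how a server-side script might read such a header. It is written in Python purely for illustration; the function name parse_accept and the decision to treat an unparsable q as 0 are my own assumptions, not part of any standard library.

```python
def parse_accept(header):
    """Return (mime_type, q) pairs sorted from most to least preferred."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        mime_type = parts[0].lower()
        if not mime_type:
            continue
        q = 1.0  # no q parameter means q=1.0 is assumed
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: an unparsable q counts as "not acceptable"
        entries.append((mime_type, q))
    # sorted() is stable, so types with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


accept = ("text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, "
          "text/plain;q=0.8, image/png, image/jpeg, image/gif;q=0.2, */*;q=0.1")
for mime_type, q in parse_accept(accept):
    print(mime_type, q)
```

With the Mozilla header above, application/xhtml+xml carries the implied q=1.0 and therefore ranks ahead of text/html at q=0.9.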
For Hypertext Markup Language (HTML) the MIME type is text/html. User agents treat this in a very forgiving way but this tag soup rendering mode is slower to display as a result.
If you are using Extensible Hypertext Markup Language (XHTML) markup the correct MIME type is application/xhtml+xml. This is because XHTML is HTML reformulated as an Extensible Markup Language (XML) application and as such must be well-formed with no overlapping elements, properly closed tags, attribute values enclosed in quotes and care taken with case-sensitivity (all elements and attributes written in lowercase). Due to these quality checks, user agents are able to handle the markup more efficiently and render quicker than tag soup.
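The well-formedness rules can be tried out with any XML parser. The following is a small sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample fragments are made up for illustration. Note that it checks well-formedness only, not validity against the XHTML DTD, so an uppercase element name would still pass.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


good = '<p class="intro">Hello <em>world</em><br /></p>'
overlapping = '<p>Hello <em>world</p></em>'   # elements overlap
unquoted = '<p class=intro>Hello</p>'         # attribute value not quoted
unclosed = '<p>Hello<br></p>'                 # br is never closed

for sample in (good, overlapping, unquoted, unclosed):
    print(is_well_formed(sample), sample)
```

Only the first fragment parses; a browser handling application/xhtml+xml is similarly unforgiving, whereas the same fragments served as text/html would be patched up as tag soup.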
User agent support for application/xhtml+xml
Unfortunately, Internet Explorer 6 Service Pack 2 (IE 6 SP2) and below does not understand this particular MIME type and will attempt to download the page as an XML file. The MIME type is also buggy in several other user agents although as time goes by compatibility will naturally improve. Until such time then, there are two methods to get around this showstopper. The first is to use XHTML 1.0 in what is termed backwards compatibility mode and the other is through content negotiation.
XHTML 1.0 Backwards Compatibility Mode
By following the World Wide Web Consortium (W3C) guidelines, XHTML 1.0 can be served with the MIME type of text/html. This mode relies on techniques such as including a space before the trailing /> when closing an empty element, avoiding white space and line breaks in attribute values, and encoding ampersands in content, including in Uniform Resource Identifiers (URIs) referenced in hyperlinks. A lot of web developers coding to web standards (including the author and this website at time of writing) work in this mode, and the correctness of this method is a matter of hot debate. Although we may not be utilising XML within a particular website at launch, XHTML 1.0 for me at least offers forward compatibility with future XML applications I may be asked to implement, which HTML cannot offer. I can and do serve XHTML 1.0 as text/html without specific content negotiation and it works (obviously), but I want to do this right and so does this article series.
Gotchas of Serving application/xhtml+xml
Before we embark on serving our XHTML with the correct MIME type there are several issues that you must consider — remember you might be coding for a Content Management System (CMS) / multi-author environment. The text editor needs to be capable and configured properly.
- Code must be well-formed. Remember XHTML is a reformulation of HTML as an application of XML, and as such it must be well-formed.
- The XML Declaration is required for character sets other than UTF-8 and UTF-16 and forms part of the XML Prolog on line 1 of your code in the format
<?xml version="1.0" encoding="yourChosenCharset" ?>
- Stylesheets may be referenced with an XML stylesheet Processing Instruction (PI) as part of the XML Prolog (along with the DOCTYPE). The Processing Instruction
<?xml-stylesheet href="myStyle.css" type="text/css" ?>
is written much the same way as the HTML 4 <link rel="stylesheet"> element. If you are serving alternative stylesheets then
<link href="myStyle.css" title="Medium" rel="alternate stylesheet" type="text/css">
becomes
<?xml-stylesheet alternate="yes" href="myStyle.css" title="Medium" type="text/css"?>
as a style sheet PI. The W3C have written a very clear normative recommendation on associating style sheets with XML documents including several more examples.
- Only five named character entities are “safe”: &lt;, &gt;, &amp;, &quot; and &apos;. It should be noted however that &apos; is undefined in HTML 4 and unsupported in Internet Explorer. You will need to ensure that all other character references are numeric in nature. Lachy’s log explains character references in greater detail.
- Anything within style or script tags is treated as XML, so you must wrap content that uses < or & in a Character Data (CDATA) section.
- No elements are inferred, for example, tbody.
- Scripting with document.write doesn’t work; you must use the Document Object Model (DOM) core methods. If you use Google’s AdSense on your website then you may need to apply a Google-approved workaround (if they haven’t fixed it already).
- Cascading Style Sheets (CSS) are applied slightly differently. For example, to apply a background colour to the body element would require the html element to be styled also as the body element doesn’t cover the whole viewport when using XHTML.
- HTML comments in scripts or styles, for example
<script type="text/javascript"><!-- { // do something } //--></script>
will result in a fatal error (and the page won’t display as a result) in an XHTML document served as application/xhtml+xml. This is due to the fact that in XML the last pair of hyphens causes a well-formedness error. The correct way to write script or style blocks for XHTML when served as application/xhtml+xml is in the format
<script type="text/javascript"><![CDATA[ { // do something } //]]></script>
Of course, the easiest way to avoid all this is to put your scripts and styles into external files in the first place. Lachlan Hunt has a great in-depth article that goes into the whys and wherefores of HTML comments in scripts.
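As an illustration of the character reference gotcha above, existing content can be swept for named entities and rewritten with numeric references. This is a rough sketch in Python, assuming the HTML 4 entity table in the standard html.entities module is the right lookup; the function name to_numeric_references is hypothetical, and the regular expression makes no attempt to skip CDATA sections or script content.

```python
import re
from html.entities import name2codepoint

# The five named entities that are predefined in XML and can stay as they are.
XML_SAFE = {"lt", "gt", "amp", "quot", "apos"}


def to_numeric_references(markup):
    """Replace HTML named entities (other than the XML five) with numeric ones."""
    def replace(match):
        name = match.group(1)
        if name in XML_SAFE or name not in name2codepoint:
            return match.group(0)  # leave safe or unknown names untouched
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


print(to_numeric_references("&copy; caf&eacute;&nbsp;&amp; &lt;b&gt;"))
# prints: &#169; caf&#233;&#160;&amp; &lt;b&gt;
```

Running something like this over legacy content before switching MIME types would catch entities such as &nbsp; that otherwise stop an XML parser dead.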
Doing the “Right Thing” ™
Rather than letting the server decide whether to serve a page as index.xhtml or index.html (note these would be separate files) based on the preferences sent in the accept header, content negotiation should be configured on the web server if you have access, or through scripting in your template. If you have an Apache server, I’ll send you off to read the manual now, as I want to concentrate on providing an overview of scripting a solution in this article.
Irrespective of using asp.NET, PHP, etc the following thought process is required:
- Lower the Quality of Source (qs) parameter for application/xhtml+xml on the server to account for possibly incomplete accept headers.
- Specifically test for the W3C validator as it doesn’t send a complete accept header.
- Parse the http_accept header to find out the user agent’s preference.
- Send the preferred MIME type.
- Send a Vary header to inform proxy servers that content negotiation is taking place.
- Send the correct DOCTYPE.
- Send the appropriate opening html tag — this will be discussed further in next week’s article.
- If application/xhtml+xml is preferred, send the XML Declaration — do not include it with text/html as this will put IE for Windows into Quirks Mode.
- If application/xhtml+xml is preferred, send the XML Stylesheet declaration(s).
- If text/html is preferred, the closing of tags with “ />” needs to be changed to “>”.
- For text/html it is best to define the character encoding in the HTTP header rather than hard code <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> into your pages or templates. Again, the W3C have a very straightforward document explaining server configuration techniques.
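The decision at the heart of the list above can be sketched in a language-neutral way. The following Python is only an illustration under my own assumptions: the function names q_value and negotiate are hypothetical, XHTML is served only when application/xhtml+xml is listed explicitly (a stand-in for lowering its qs value so that a bare */* falls back to text/html), and the validator is detected by a simple substring test on the user agent.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def q_value(accept, mime_type):
    """Return the q value given for mime_type, 1.0 if listed without q, None if absent."""
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != mime_type:
            continue
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value)
                except ValueError:
                    return 0.0  # assumption: unparsable q means "not acceptable"
        return 1.0
    return None


def negotiate(accept, user_agent, charset="utf-8"):
    """Pick the MIME type and build the response headers for it."""
    xhtml_q = q_value(accept, XHTML)
    html_q = q_value(accept, HTML) or 0.0
    if "w3c_validator" in user_agent.lower():
        mime = XHTML  # the validator does not send a complete Accept header
    elif xhtml_q and xhtml_q >= html_q:
        mime = XHTML
    else:
        mime = HTML
    headers = {
        "Content-Type": "%s; charset=%s" % (mime, charset),
        "Vary": "Accept",  # tells proxies that negotiation took place
    }
    return mime, headers


print(negotiate("application/xhtml+xml, text/html;q=0.9", "Mozilla/5.0"))
print(negotiate("*/*", "Mozilla/4.0 (compatible; MSIE 6.0)"))
```

The DOCTYPE, opening html tag, XML Declaration and stylesheet steps then hang off the returned MIME type, which is what the prolog handling in a full script takes care of.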
Code Example — PHP
The following code snippet is a small modification to Neil Crosby’s original work. Test and refine on a development server please; I offer no warranties on it working straight off the bat. Write a simple include at the top of your web page (or template) to reference this external file.
<?php
$charset = "utf-8";
$mime = "text/html";

# treat missing request headers as empty strings to avoid notices
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
$user_agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";

# output filter: turn XHTML-style " />" endings into HTML-style ">"
function fix_code($buffer) {
    return (str_replace(" />", ">", $buffer));
}

if(stristr($accept, "application/xhtml+xml")) {
    # if there's a Q value for "application/xhtml+xml" then also
    # retrieve the Q value for "text/html"
    if(preg_match('/application\/xhtml\+xml\s*;\s*q=([01](?:\.\d{1,3})?)/i',
                  $accept, $matches)) {
        $xhtml_q = (float) $matches[1];
        if(preg_match('/text\/html\s*;\s*q=([01](?:\.\d{1,3})?)/i',
                      $accept, $matches)) {
            $html_q = (float) $matches[1];
            # if the Q value for XHTML is greater than or equal to that
            # for HTML then use the "application/xhtml+xml" mimetype
            if($xhtml_q >= $html_q) {
                $mime = "application/xhtml+xml";
            }
        }
    # if there was no Q value, then just use the
    # "application/xhtml+xml" mimetype
    } else {
        $mime = "application/xhtml+xml";
    }
}

# special check for the W3C_Validator
if (stristr($user_agent, "W3C_Validator")) {
    $mime = "application/xhtml+xml";
}

# set the prolog_type according to the mime type which was determined
if($mime == "application/xhtml+xml") {
    $prolog_type = "<?xml version=\"1.0\" encoding=\"$charset\" ?>
<?xml-stylesheet type=\"text/css\" href=\"/styles/initial.css\" media=\"all\"?>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">
<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en-GB\" lang=\"en-GB\" dir=\"ltr\">\n";
} else {
    ob_start("fix_code");
    $prolog_type = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">
<html lang=\"en-GB\" dir=\"ltr\">\n";
}

# finally, output the mime type and prolog type
header("Content-Type: $mime;charset=$charset");
header("Vary: Accept");
print $prolog_type;
?>
Code Example — asp.NET
Via Roger Johansson’s article Content Negotiation, Justin Perkins provides an asp.NET example (same disclaimer as before):
string http_accept = Request.ServerVariables["HTTP_ACCEPT"];
string http_user_agent = Request.ServerVariables["HTTP_USER_AGENT"];
// IndexOf returns 0 for a match at the very start of the header, so compare against -1
if (((http_accept != null) && (http_accept.ToLower().IndexOf("application/xhtml+xml") > -1)) ||
    ((http_user_agent != null) && (http_user_agent.ToLower().IndexOf("w3c_validator") > -1))) {
    Response.ContentType = "application/xhtml+xml";
    Response.Write("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n");
}
else {
    Response.ContentType = "text/html";
}
Response.Charset = "iso-8859-1";
Response.AddHeader("Vary", "Accept");
Conclusion
Developing now with XHTML 1.0 allows for forward compatibility but many developers only deploy websites with XHTML 1.0 in backwards compatibility mode (using the text/html MIME type). In order to get the most benefit from XHTML 1.0, developers need to properly consider the issues surrounding the application/xhtml+xml MIME type and implement content negotiation accordingly. It is both worthwhile and achievable and once the solution has been written it is available for re-use within your quality development framework.
Next in “From the Top”
Next week will (thankfully) be a shorter article (perhaps!) explaining why the HTML element is actually required.