From the Top: Defining Content Language
January 23rd, 2006
The opening html tag, when expanded with a few attributes, is a real boon for increasing the accessibility of your web pages. In this, the third article in my “From the Top” series, I’ll introduce each of those attributes and explain the benefits of their use.
- The lang Attribute
- Language Tags and Localisation
- Reading Order
- Why it’s Important to Define the Language
- The xmlns Attribute
- Conclusion
- References
The lang Attribute
Declaring the language of a web page occurs at two levels. Firstly, the primary language of a document, which search engines may use to return only web pages in a specific language, is declared either in the Hypertext Transfer Protocol (HTTP) Content-Language header or in a content-language meta tag. Whether you have access to server settings, or are able to perform content negotiation, will affect your decision on which method to use.
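As a minimal sketch of the meta tag method, assuming a page whose primary language is English (the title text is only a placeholder):

```html
<head>
  <!-- Declares English as the primary language of the document.
       The equivalent HTTP response header would be: Content-Language: en -->
  <meta http-equiv="Content-Language" content="en">
  <title>Example page</title>
</head>
```

In XHTML the meta element would need to be closed, i.e. written with a trailing " />".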
The second, more specific level defines the text-processing language in which a particular range of text is written. For Hypertext Markup Language (HTML) this is achieved using the lang attribute, and for Extensible Markup Language (XML), in our case Extensible Hypertext Markup Language (XHTML), using xml:lang. Where both are present (i.e. in backwards compatibility mode), the value of xml:lang takes precedence.
For both flavours of markup the text-processing, or natural, language is inherited along the document hierarchy, so the html tag is the ideal place to apply your main language to the entire document. The default text-processing language can be changed further along the hierarchy by applying the lang attribute to a more specific element.
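A small sketch of that inheritance, assuming an English page that contains a French word and a German quotation (the sample text is invented for illustration):

```html
<html lang="en">
  <head>
    <title>Language inheritance example</title>
  </head>
  <body>
    <!-- This paragraph inherits lang="en" from the html element;
         only the span is marked as French -->
    <p>The French for "thank you" is <span lang="fr">merci</span>.</p>

    <!-- The whole quotation, including its child paragraph, is German -->
    <blockquote lang="de">
      <p>Guten Morgen!</p>
    </blockquote>
  </body>
</html>
```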
Language Tags and Localisation
The value of the lang (or xml:lang) attribute is referred to as a language tag. It comprises a primary subtag, optionally followed by further subtags, each separated by a hyphen. Language tags use two or three letter language codes such as en for English, de for German and fr for French. Where both a two and a three letter code exist for the same language, the two letter code should be used. By including a further subtag, the natural language of the document can be localised for dialect or region, so en-GB would identify British English text and fr-CA would indicate content written in Canadian French. Subtags are case-insensitive. There are special-case primary subtags of i- and x-, but these will be outside your normal usage (unless you have a killer site for Klingons, that is), so I will leave those for you to look at another day.
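To illustrate, here are a few language tags in use. The paragraph text is invented sample content; only the tag values come from the discussion above:

```html
<!-- Primary subtag only: English, no region specified -->
<p lang="en">Hello and welcome.</p>

<!-- Primary subtag plus a region subtag: British English -->
<p lang="en-GB">Our favourite colour is grey.</p>

<!-- Canadian French -->
<p lang="fr-CA">Bonne fin de semaine!</p>

<!-- Subtags are case-insensitive, so this is equivalent to en-GB -->
<p lang="EN-gb">Our favourite colour is grey.</p>
```

Lowercase language codes with uppercase region codes are the common convention, even though case carries no meaning.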
Reading Order
It is easy to forget that whereas “western” languages are read from left to right, there are also major languages such as Arabic and Hebrew that are read in the opposite direction. The reading order is not necessarily inherited from the chosen language tag, so we will add the dir attribute to the html tag and assign it the value “ltr”. For languages read from right to left the attribute value would be “rtl”. These are the only two options; imagine the fun to be had with “ttb” (top to bottom) for authentic Japanese writing (and yes, “rtl” also).
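As a sketch, the two values of dir might appear on alternative opening tags and on a single element like this (the choice of Arabic and Hebrew here is just an assumption for the example):

```html
<!-- Opening tag for a page read from left to right -->
<html lang="en" dir="ltr">

<!-- Opening tag for a page read from right to left -->
<html lang="ar" dir="rtl">

<!-- Direction can also be set on one element within a left-to-right page -->
<p lang="he" dir="rtl">שלום</p>
```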
Why it’s Important to Define the Language
Declaring the text-processing, or “natural” language of a page is beneficial for many purposes:
- To assist screen readers and braille translators.
- To meet World Wide Web Consortium (W3C) Web Accessibility Initiative (WAI) guidelines - specifically checkpoints 4.1 and 4.3.
- To meet legislative requirements, for example the Disability Discrimination Act (DDA) in the UK.
- To provide authoring tools with the ability to check spelling and grammar.
- To identify the correct language of a section of text for translation tools.
- To style information in a specified language using the Cascading Style Sheets (CSS) :lang pseudo-class.
- To filter search engine results based on the user’s language preference.
- To assist other people or devices that parse the text of the document with XSL or other scripting.
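The styling benefit in the list above can be sketched as follows, assuming we simply want French text set in italics (the rule and the sample sentence are illustrative, not a recommendation):

```html
<style type="text/css">
  /* Matches any element whose language is French,
     including regional variants such as fr-CA */
  :lang(fr) { font-style: italic; }
</style>

<p lang="en">The French for "thank you" is <span lang="fr">merci</span>.</p>
```

Because :lang works from the inherited language, the rule also applies to elements nested inside one marked as French, without each needing its own lang attribute.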
The xmlns Attribute
If your markup is XHTML, another attribute you must include is the xmlns declaration for the XHTML namespace. Remember that XHTML is a reformulation of HTML as an application of XML. An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, that are used in XML documents as element types and attribute names. You need to declare the namespace so that a user agent knows which elements belong to which language. The namespace is declared using the attribute xmlns followed by the URI, which for our purposes is http://www.w3.org/1999/xhtml.
Conclusion
To maximise the universal accessibility of our pages we should always include language information in them. We can identify the natural language of the content using the lang attribute and/or, for XHTML, the xml:lang attribute, and we must always include the XML namespace if using XHTML. Additionally, we can specify the primary language of the document using HTTP headers or the content-language meta tag. Examples of the opening html tag include:
For XHTML 1.0 in backwards compatibility mode:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
For XHTML as application/xhtml+xml:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr">
For HTML 4:
<html lang="en" dir="ltr">
References
- W3C: Specifying the Language of Content
- W3C: FAQ: Why Use the language attribute?
- W3C: HTML and XHTML Frequently Asked Questions
- W3C: Language Tags in HTML and XML
- W3C: Normative Definition of XHTML 1.0
- W3C: Namespaces in XML
Next in “From the Top”
Next week I’ll cover the title tag and provide a few tips from around the Internet for writing effective page titles.
The Complete “From the Top” Series
- Document Type Definitions.
- MIME and Content Negotiation.
- Defining Content Language — this article.
- The Head Element.
- The Title Element.
- The Meta Element.
- The Link Element.
It’s important to understand that there is a distinct difference between the meaning of the Content-Language HTTP header and the lang and xml:lang attributes. The HTTP header indicates the language of the intended audience, whereas the attributes indicate the actual language of the content. In most cases, the language of the document and the language of the intended audience will be the same, but not always. For example, a tutorial comprising predominantly French text, written for English-speaking readers, may contain lang="fr" but be served with Content-Language: en.
January 23rd, 2006 at 10:03 am
One more thing: it doesn’t make any sense to discuss xmlns in an article about natural languages. Can you explain why that was included?
January 23rd, 2006 at 10:05 am
Thanks Lachlan. The article is about writing a complete html tag. If you are writing a document in XHTML then you need to include the XML namespace too. I added it for completeness.
I am also trying to protect novice or busy developers from the really technical details by pitching a concise summary of “what”, “why” and “how”. The references can then lead the keen into the technical background.
January 23rd, 2006 at 10:18 am
A couple of comments, in addition to Lachlan’s:
The dir="ltr" on the html tag is redundant. By default an (X)HTML document is “ltr” unless otherwise specified.
An interesting case for language tags is Unicode documents containing hanzi/kanji/hanja (CJK ideographs). Simplified Chinese, Traditional Chinese, Japanese and Korean can share a codepoint, but have different glyph variations for the character represented by that codepoint. If there is no appropriate font declaration in the style sheet, and no language tagging, then web browsers will use a default language for rendering. This may result in an incorrect font being used, e.g. a Japanese font used to render Simplified Chinese text. Appropriate language tagging of CJK text is important in order for culturally appropriate glyphs to be used.
January 23rd, 2006 at 10:53 pm
The use of the letter direction isn’t always needed as it can be implied by DTD, charset, language, etc. But I don’t feel its use is a boo-boo per se. Being redundant isn’t bad until its redundancy interferes with something. In this case adding it to this article is good “general” advice, especially since this article isn’t confined to XHTML alone (or it is and I missed that part).
January 25th, 2006 at 6:44 pm
Andrew / Mike,
Yes, I can’t remember the source of my opinion on including dir="ltr". It goes back at least a year, I think, and it must have been based on something more than redundancy of function, but if it’s not in my del.icio.us I’m doomed to forget. In the research around this article I didn’t even discover the default behaviour info, as obviously that would have changed my mind immediately. Thanks for posting about it; I’m always ready to modify an opinion I may hold about something.
Mike: No, you didn’t miss anything - this article caters for both markup languages.
January 25th, 2006 at 7:38 pm