An idea for multilingual webpages: the Content-Languages response header

Much content on the web is provided in multiple languages. Think about corporate websites, Wikipedia and various sites running on popular content management systems. The issue I would like to address is the display and selection of the languages the content is available in.

If you take a naive approach, on every page you visit you have to wonder:

  1. In what language is the content presented?
  2. Do I want to view the content in that language? (assume not)
  3. How do I notice the content is available in other languages?
  4. How do I view the content in another language?
  5. What quality is the content in other languages?

The user needs to read the content in a language he or she is proficient in, and a problem arises when the page is not presented in one. As the Language Icon initiative points out, points 3 and 4 especially need addressing: there are as many ways to do this as there are websites, and flags probably aren’t the right way to do it.

What I think is wrong with the “one icon” approach is that it still needs to be implemented on each multilingual website, and until this is a widespread practice, the language icon will not be recognized as such.

Now what if browsers were to implement a language selection user interface, so the browser and the webserver can negotiate about the content? Then one generic interface can solve the problem of language display (showing whether the content is known to be available in different languages, and if so which) and selection (do I have to click a Dutch flag, or do I need to read through a list of languages for “Dutch”, or perhaps “Nederlands” to read the page in my native language?).

The Google Chrome extension I threw together to demonstrate my idea displays two things, both based on HTTP request and response headers: the language of the presented content together with the other languages it is available in, and below that a list of user-configurable preferred languages. If the user clicks any of the languages in the extension’s menu, the current tab is re-requested with a modified Accept-Language header that specifies only the language the user requested.
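
To make that re-request concrete outside the browser, here is a minimal Python sketch. It is not the extension's code; the function name is made up for illustration, the host and path are the ones from my test setup, and the http scheme is an assumption.

```python
import urllib.request


def build_language_request(url, language):
    """Build a GET request that asks for exactly one language (hypothetical helper)."""
    # A single tag without a q-value means: this language only, full preference.
    return urllib.request.Request(url, headers={"Accept-Language": language})


request = build_language_request(
    "http://apache.nas.local/test/content-languages", "nl-NL"
)
# urllib.request.urlopen(request) would actually send it; the response's
# Content-Language header could then be read from the returned object's .headers.
```

Nothing is sent until the request is opened, so the sketch can be inspected without a server.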

Accept-Language

While using the Accept-Language request header seems like a logical idea, it is rarely implemented by web applications. The Apache web server can serve localized pages based on this header, but most applications running on top of it don’t. So to successfully build public support for this proposal, it would have to be implemented by, for example, some major CMS releases.

Content-Languages

Besides showing what language the current content is in, a web application could also emit information about the other languages the content is available in, using the Content-Languages header. It has the same syntax as the Accept-Language header, so a typical request-response pair looks like this:

GET /test/content-languages HTTP/1.1
Host: apache.nas.local
Accept-Language: en-US,en;q=0.8,nl;q=0.6

HTTP/1.1 200 OK
Content-Language: en-US
Content-Languages: nl-NL,en-US,de-DE;q=0.8,fr-FR;q=0.5

Upon receiving this response, the client knows the server has at least the four languages mentioned available and can display those to the user. When the server receives a request with a single Accept-Language value, it may display or redirect to that language when available. When it receives a request for multiple languages, the server should calculate the best match; the algorithm for this is still to be determined, but it should at least take the QValue of each requested and each available language into account.
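
Since the matching algorithm is not settled, the following Python sketch is only one possible approach, not part of the proposal: it multiplies the client's QValue by the server's QValue and picks the highest product, letting a short tag such as "en" match a longer one such as "en-US". The function names are made up for illustration.

```python
def parse_language_header(value):
    """Parse 'en-US,en;q=0.8' into a list of (tag, q) pairs."""
    result = []
    for item in value.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag:
            continue
        q = 1.0  # a tag without an explicit q-value counts as 1
        for param in params.split(";"):
            name, _, raw = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not wanted"
        result.append((tag, q))
    return result


def best_match(accept_language, content_languages):
    """Return the offered tag with the highest combined QValue, or None."""
    wanted = parse_language_header(accept_language)
    offered = parse_language_header(content_languages)
    best_tag, best_score = None, 0.0
    for offered_tag, offered_q in offered:
        low = offered_tag.lower()
        for wanted_tag, wanted_q in wanted:
            want = wanted_tag.lower()
            # Assumption: 'en' matches 'en-US', and '*' matches anything.
            if want == "*" or low == want or low.startswith(want + "-"):
                score = wanted_q * offered_q
                if score > best_score:
                    best_tag, best_score = offered_tag, score
    return best_tag


choice = best_match(
    "en-US,en;q=0.8,nl;q=0.6",
    "nl-NL,en-US,de-DE;q=0.8,fr-FR;q=0.5",
)
```

With the headers from the example above, choice is "en-US"; a client that only sends "nl" would get "nl-NL", and a client asking for a language that is not offered gets None, after which the server can fall back to its default.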

The extension works like this:

When no Content-Language or Content-Languages headers are found, the extension’s icon remains grayed out. You can try to request the page with a different Accept-Language header, but on sites that don’t send a Content-Language header you can expect that to not work.

Wikipedia sends a Content-Language header, but it does not respond to the Accept-Language header. The subdomain (nl.) decides the language of the content. In order for this proposal to work, each Wikipedia site has to know every other language the article is available in, and a URL to redirect to for each language. In time this could perhaps be implemented by writing a custom crawler for the extension, because this information is present on each page (the list of languages in the bottom left menu).

Chrome by default sends some Accept-Language values according to your system locale. You can alter these settings under chrome://settings/languages. The PHP script has detected three languages, calculated a best fit and returned it in the response to that same request. So the same resource, identified by the same URL, can be displayed in different languages without altering the path. The server may of course use redirects instead, in order to keep existing URL structures intact, such as an /en(-US)/ subpath or a de. subdomain.

When this information is known to the web application, it can return it in the aforementioned Content-Languages header. The extension reacts to this by displaying in the top menu which languages were presented and which QValue each was assigned: [0..1] becomes [0..100%].
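
The mapping from QValue to percentage is simple enough to sketch in a few lines of Python. This is an illustration, not the extension's actual code, and the function name is made up.

```python
def language_menu(content_languages):
    """Turn 'nl-NL,de-DE;q=0.8' into labels such as 'de-DE (80%)'."""
    labels = []
    for item in content_languages.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag:
            continue
        q = 1.0  # a language without an explicit q-value counts as 1
        name, _, raw = params.strip().partition("=")
        if name.strip().lower() == "q":
            q = float(raw)
        labels.append(f"{tag} ({round(q * 100)}%)")
    return labels


menu = language_menu("nl-NL,en-US,de-DE;q=0.8,fr-FR;q=0.5")
```

For the Content-Languages value from the earlier example this gives the labels nl-NL (100%), en-US (100%), de-DE (80%) and fr-FR (50%).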

As the user selects a different language, the server calculates the preferred language again and returns the content with the appropriate headers.

The strength of this proposal is also its weakness: it has to be implemented by all browser vendors, or through add-ons, plugins, extensions or the like. Websites have to be adjusted as well, to send the appropriate content headers.

Also, when content is delivered from various locations that have no synchronized knowledge of each other’s content (see Wikipedia, where the links to other-language versions of the same lemma have to be edited into the article itself, manually or by bots), content negotiation might become something you don’t want the web server to be responsible for. I think this can also be solved with redirects: the server simply keeps a key-value store that maps languages to URLs, and that store can be updated externally.
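
A rough Python sketch of such a key-value store and the redirect it drives could look like this. The store contents, the example.org URLs, the resource key and the function name are all hypothetical, and an in-memory dictionary stands in for whatever external store would really be used.

```python
# Hypothetical store: resource key -> {language tag: URL}.
# In practice this would live outside the web server and be updated externally.
TRANSLATIONS = {
    "article-42": {
        "en": "https://en.example.org/wiki/Example",
        "nl": "https://nl.example.org/wiki/Voorbeeld",
    },
}


def redirect_for(resource, language, store=TRANSLATIONS):
    """Return (status, headers) for a redirect to the requested language, or None."""
    urls = store.get(resource, {})
    tag = language.lower()
    # Try the exact tag first, then fall back to the primary subtag ('nl-nl' -> 'nl').
    url = urls.get(tag) or urls.get(tag.split("-")[0])
    if url is None:
        return None
    headers = {
        "Location": url,
        # The proposed header, listing every language the store knows about.
        "Content-Languages": ",".join(sorted(urls)),
    }
    return 302, headers


result = redirect_for("article-42", "nl-NL")
```

The server never has to know the content itself; it only looks up where the requested language lives and sends the client there.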

So while I don’t pretend this to be the solution to all language problems, it sure can help (begin to) tackle a few of them, and I’d love to hear your comments.

Download

You can view the Chrome extension and a PHP script that demonstrates the behaviour on GitHub at CodeCasterNL/content-languages. You can load the extension in Chrome via Settings, Extensions, “Load unpacked extension…”.
