Common Tag: as useless as Meta Keywords?

For as long as I can remember, the semantic web has been like strong AI, nuclear fusion, or the Moller Flying Car – always a year or two away, but never actually here.  Despite the best efforts of some of the brightest people on the planet, universally understandable information is always just out of reach.  Machines and the algorithms that they run are simply not advanced enough to understand the nuances of language.

Getting the point across…

The problem is always context.  In a binary world, there is no simple way to tell which meaning of a word like “bass” is intended.  In isolation, “bass” could mean the guitar, the fish, the British Alpine Ski School, the musical register, or the brewer.  Within a web page there are clues to meaning.  If the page includes the word “beer” then the chances are that “bass” refers to the brewer; however, there is still a chance that the subject is an alcoholic fish.

While there is no doubt that search engines are getting smarter at “understanding” the meanings of web pages based on their content, this is not an exact science.  We often see results cropping up in Google that defy logic.  We are still some way from machines that can read at anything above the most basic level.

Things are getting better.  Google recently announced that they would start incorporating the richer information delivered by microformats such as hCard (which is based on the vCard standard) into their results, so that an address or phone number could be extracted from a web page and displayed in the results.  This is helpful when trying to find a business or when browsing the web on a mobile device; however, the markup is inelegant and takes time to implement, which means it is expensive and people don’t use it.

A better solution on the horizon?

Yahoo and some of their partners have released an alternative solution for general content. Common Tagging is designed to be used at a page level rather than highlighting specific data, and essentially requires the addition of a few lines of code into a web page that link to information about the subject in Freebase.  Going back to our “bass” example, if I wanted to indicate that my page was about the fish, I would need to use the following code to surround the content on the page:

<body xmlns:ctag="http://commontag.org/ns#" rel="ctag:tagged">
<span typeof="ctag:Tag" rel="ctag:means"
resource="http://rdf.freebase.com/ns/en.fish"
property="ctag:label" content="Bass">
CONTENT ABOUT THE BASS FISH GOES HERE
</span>
</body>

Although the technology has been led by Yahoo, with their small and dwindling market share, it shows promise.  Ease of implementation is essential if a new technology is going to have any real impact, and adding a couple of lines of code is much easier than the kind of complex markup required for a vCard-style microformat. It can even be integrated into publishing platforms like WordPress, which takes almost all the effort out of the process.

In theory, Common Tagging should make it easier for search engines to categorize content, and provide users with more relevant results based on an authoritative source.
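To give a rough idea of what a search engine or other harvester would actually read from markup like the “bass” example, here is a minimal Python sketch. It is not a real RDFa processor: it assumes the prefix is literally written as ctag, it only looks at elements marked typeof="ctag:Tag", and the names CommonTagHarvester, harvest_common_tags and SAMPLE_PAGE are made up for illustration.

```python
from html.parser import HTMLParser


class CommonTagHarvester(HTMLParser):
    """Collects (label, resource) pairs from elements typed as ctag:Tag.

    Simplification: a real RDFa parser would resolve the xmlns prefix and
    build proper triples; this sketch just matches the literal attribute.
    """

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("typeof") == "ctag:Tag":
            label = attributes.get("content")      # human-readable label
            resource = attributes.get("resource")  # URI of the concept
            self.tags.append((label, resource))


def harvest_common_tags(markup):
    """Return every (label, resource) pair found in the markup."""
    harvester = CommonTagHarvester()
    harvester.feed(markup)
    harvester.close()
    return harvester.tags


# Hypothetical page, modelled on the "bass" example in this post.
SAMPLE_PAGE = """
<body xmlns:ctag="http://commontag.org/ns#" rel="ctag:tagged">
<span typeof="ctag:Tag" rel="ctag:means"
resource="http://rdf.freebase.com/ns/en.fish"
property="ctag:label" content="Bass"></span>
CONTENT ABOUT THE BASS FISH GOES HERE
</body>
"""

if __name__ == "__main__":
    for label, resource in harvest_common_tags(SAMPLE_PAGE):
        print(label, "->", resource)
```

The point of the sketch is that the harvester gets a link to a specific concept rather than a bare keyword, so it no longer has to guess which “bass” the page is about.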

Haven’t we seen this before?

Search marketers have been doing something similar for a long time using Meta Keywords to provide information about a page.  The problem is that SEOs have been abusing it for years to the point where it is barely considered by the major search engines.

In fact, name pretty much any development from the search engines over the last decade, and the SEO community has found a way to exploit it fairly quickly in order to make websites rank.

So, will it work?

The Common Tag, like all the other RDFa tags and microformat protocols, is doomed to the same fate as the Meta Tag.  It relies on a webmaster’s good behaviour for implementation, so it’s open to abuse. Soon enough, it will be discovered that by adding content about Vanessa Hudgens to a page tagged “High School Musical” it is possible to rank adult material quickly in Yahoo.  That’s when it all goes wrong.

The Common Tag, like all XML formatting of data, is no more than a step on the way to genuine semantic indexing of content.  That will require computers with processing power an order of magnitude above what is currently available.  I for one forecast that the technology for this is still a year or two away :-).

2 Comments


At 16:29

Jamie Taylor

James - I appreciate your coverage of Common Tag and think your explanation of the benefits of strong semantics is exactly right.

As part of the group that helped put Common Tag together I thought I would add two points of clarification. It was important to us that Common Tag could not only be used at the page/document level, but also at the level of a document fragment. Because Common Tag uses RDFa you can easily tag any part of the DOM that provides an “id” attribute (see the section “Defining Tags for sections of content” in the Quick Start Guide: http://commontag.org/QuickStartGuide.)

I also agree that tag spam has been a huge problem in the past. Because Common Tag connects content to strongly identified semantic concepts, the structured data provided by the reference (as available in Freebase, for example) allows tag harvesters (search engines, RDFa applications, etc.) to easily determine whether the concepts described by the Common Tags are appropriate in the context of the document, something that was not possible with simple keyword tagging.

-Jamie

Hi Jamie
Thanks for commenting on the post, and I appreciate the clarification that you made regarding tagging at a fragment level of the document.
I do believe that strong semantics are the way to go for demonstrating the underlying concepts of a web page to search engines, and that helping them better understand the content is important for users.
I think that the Common Tag itself, and the underlying structure of relating the content back to the Freebase authority, is fairly robust; however, my nagging doubts come from the fact that this relationship could in some way legitimise unsuitable content and allow it to be presented inappropriately.
