No Invisible Metadata

Yesterday I tweeted (link expanded):

Google officially ignoring keywords meta tags is a nice validation of @microformats “no invisible metadata” principle: http://tantek.com/log/2005/06.html#d03t2359

The link is to a 2005 post on Tantek Çelik’s blog where he expands on the microformats principle that “visible data is much better for humans than invisible metadata.” Google’s announcement the other day that they do not use the keywords meta tag in web rankings didn’t surprise anyone who knows anything about search engine optimization. We’ve known for years that Google ignores the keywords meta tag (Google’s Webmasters/Site owners Help has a page about various meta tags that doesn’t say anything about the keywords meta tag) but, until now, it’s never been official. Still, I think it’s a nice validation of the principles of microformats and will hopefully give people pause when considering hidden metadata schemes in the future.

On a silo website or within a trusted network, hidden metadata can be useful. In fact, in Google’s announcement they mention that the Google Search Appliance has the ability to match on the keywords meta tag. At web scale, however, hidden metadata is critically flawed. How can you trust that the hidden metadata is in parity with the visible data? The hidden metadata may be intentionally inaccurate (e.g., keyword stuffing) or may simply have fallen out of sync with the visible data. Within a silo website or a closed network you can trust the metadata to be true to the visible data it describes, and you can enact policies to keep your metadata up to date. However, there are no trust models yet that would make this work at web scale.
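
To make the parity problem concrete, here is a minimal sketch in Python of the kind of check a site owner could run inside a silo: it lists the keywords meta tag entries that never appear in the page’s visible text. The names KeywordParityParser and unsupported_keywords are hypothetical, and plain substring matching is an assumption made for brevity. This is not how Google or any other search engine evaluates pages.

```python
from html.parser import HTMLParser


class KeywordParityParser(HTMLParser):
    """Collects keywords meta tag entries and the visible text of a page."""

    # Elements whose text content is never rendered in the page body.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.visible_text = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Keywords are conventionally comma-separated.
            self.keywords += [
                k.strip().lower() for k in content.split(",") if k.strip()
            ]
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.visible_text.append(data.lower())


def unsupported_keywords(html):
    """Return the meta keywords that do not occur in the visible text.

    Assumption: a simple case-insensitive substring match is good enough
    for illustration; a real check would need proper tokenization.
    """
    parser = KeywordParityParser()
    parser.feed(html)
    parser.close()
    # Normalize whitespace so multi-word keywords can match across breaks.
    text = " ".join(" ".join(parser.visible_text).split())
    return [k for k in parser.keywords if k not in text]


page = (
    '<html><head><meta name="keywords" content="microformats, cheap flights">'
    "<title>Notes</title></head>"
    "<body><p>Microformats put humans first.</p></body></html>"
)
print(unsupported_keywords(page))
```

A check like this only works where you control both the pages and the policy for fixing them. On the open web nobody can force an author to act on the result, which is the trust gap described above.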

Microformats are designed for “humans first, machines second” (another principle). This makes a lot of sense since all machines eventually serve humans, even if indirectly through many layers. If the machine doesn’t ultimately serve a human need then there is not much point in the machine’s existence (unless we are talking about sentient artificial intelligence). For direct human consumption, hidden metadata is completely useless. For machine consumption, hidden metadata can be useful. However, hidden metadata must at some point be transformed into visible data, even if in a completely different context than its associated visible data.

This was the case with search engines that did use the keywords meta tag in rankings: the original context for the visible data was the indexed document, and the new context in which the hidden metadata was transformed into visible data was the ranking of search engine results. As history tells us, this scheme didn’t work so well. Instead, Google used visible data, namely hyperlinks, to determine rankings. From Tantek’s blog entry (emphasis added):

Lesson learned: hyperlinks, being visible by default, proved more reliable and persistently accurate for many reasons. Authors readily saw mistakes themselves and corrected them (because presentation matters). Readers informed authors of errors the authors missed, which were again corrected. This feedback led to an implied social pressure to be more accurate with hyperlinks thus encouraging authors to more often get it right the first time. When authors/sites abused visible hyperlinks, it was obvious to readers, who then took their precious attention somewhere else. Visible data like hyperlinks with the positive feedback loop of user/market forces encouraged accuracy and accountability. This was a stark contrast from the invisible metadata of meta keywords, which, lacking such a positive feedback loop, through the combination of gaming incentives and natural entropy, deteriorated into useless noise.