Some ideas to improve tag use in social software
Flat hierarchy versus categories in social software
If you use a social bookmarking web application like del.icio.us or blogmarks.net, you may have used tags: keywords attached to a piece of content as metadata labels. It's an easy, user-friendly, quick way to index content for your own use and/or for sharing. As far as I know, it's a fairly new feature in indexing software, and there's a bit of buzz about it at the moment (see links at the end).
Compared with the "old" way of indexing content, hierarchical categories, tagging is often seen as cheap metadata for the masses, categories being the land of taxonomy experts. I do think that a flat hierarchy is friendlier to the human brain, because it imitates the way the brain uses words as labels for things. That's probably why tags have caught on so strongly. (The more closely a user interface follows its users' thought patterns, the fewer obstacles it puts between the user and the actions to perform, and the better it is.) But this is also the main drawback of tags: thought structured by human language can make hyperspace jumps between concepts, the same word can have several totally different meanings, and, last but not least, each human has their own experience of the world and therefore their own tagging system. Not to mention the language divide.
Where categories are managed by specialists in order to achieve the best classification, tags are users' rough approximation of a classification for practical use. Here is a small proposal to improve the efficiency of the flat hierarchy by patching it with some bits of categorization, while keeping the tags themselves simple and natural. It avoids categorized tags like technology>computer>programming or web.design.css, which users tend to adopt to overcome the limitations of tags.
Tags are metadata, but are also data
- A map is not the territory. (Words are not the things they represent.)
- A map covers not all the territory. (Words cannot cover all they represent.)
- A map is self-reflexive. (In language we can speak about language.)
-- Alfred Korzybski, the role of language in perceptual processes
Tags are as natural to users as language itself. But language has its own structure, and words are related to each other in a web (or webs?). Why not add semantic relations between tags to the flat hierarchy? It would just be another layer of metadata: metadata about metadata. There is no need to change current tag systems:
- Users tag their content with whatever tags they think appropriate, just as they do now.
- The tags shared within a community become a thesaurus, a cloud to be structured.
- Community experts pick pairs of tags and freely define the appropriate semantic relation between them, like synonym, is-a-kind-of, translation, word-variation, etc.
- Specialized features and search engines can be built on these semantic enhancements to tags, improving content sharing.
- Communities could share their tag maps to benefit from each other's classification abilities, provided an accurate, flexible standard is defined for structuring and exchanging tags.
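To make the search idea above concrete, here is a minimal Python sketch of tag-based query expansion. Everything in it is hypothetical: the sample tags, the `RELATIONS` list, and the choice to treat synonym, translation and word-variation as equivalences for search are assumptions made for illustration, not part of any existing service.

```python
from collections import defaultdict, deque

# Hypothetical relations defined by community experts: (tag, kind, tag).
RELATIONS = [
    ("blog", "synonym", "weblog"),
    ("blog", "word-variation", "blogs"),
    ("weblog", "translation", "carnet-web"),
    ("css", "is-a-kind-of", "stylesheet"),
]

# Assumption: these relation kinds mean "same thing" when searching.
EQUIVALENT = {"synonym", "translation", "word-variation"}


def expand_tag(tag, relations=RELATIONS, kinds=EQUIVALENT):
    """Return the tag plus every tag reachable through equivalence relations."""
    neighbours = defaultdict(set)
    for a, kind, b in relations:
        if kind in kinds:
            # Equivalences are followed in both directions.
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen = {tag}
    queue = deque([tag])
    while queue:
        current = queue.popleft()
        for other in neighbours[current] - seen:
            seen.add(other)
            queue.append(other)
    return seen


def search(items, tag):
    """Return the sorted titles of items carrying the tag or an equivalent tag."""
    wanted = expand_tag(tag)
    return sorted(title for title, tags in items.items() if wanted & set(tags))


# Hypothetical user-tagged content: the users' own tags are left untouched.
BOOKMARKS = {
    "Post A": ["weblog", "css"],
    "Post B": ["blogs"],
    "Post C": ["stylesheet"],
}
```

With this sample data, `search(BOOKMARKS, "blog")` also finds the items tagged `weblog` and `blogs`, although nobody used the tag `blog` itself. The is-a-kind-of relation is deliberately left out of the expansion, since a broader or narrower term is not the same thing as an equivalent one.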
As a draft, let's imagine the structure of the tag web as a many-to-many relation table linking tags like this:
- tag A (alphanumeric)
- tag B (alphanumeric)
- commutative (boolean): is the relation order-independent?
- weight (numeric): if relevant, the strength of the relation on some scale
- insert other relevant metadata here...
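The draft table above could be sketched as follows, using Python's built-in sqlite3 module with an in-memory database. The table and column names are assumptions, and the `kind` column (synonym, is-a-kind-of, ...) is one possible piece of "other relevant metadata" that I am adding for the example; it is not in the draft itself.

```python
import sqlite3


def create_tag_map():
    """Create an in-memory database holding the draft relation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE tag_relation (
            tag_a       TEXT NOT NULL,
            tag_b       TEXT NOT NULL,
            kind        TEXT NOT NULL,     -- assumption: the relation's name
            commutative INTEGER NOT NULL,  -- 1 if order-independent, else 0
            weight      REAL,              -- optional strength of the relation
            PRIMARY KEY (tag_a, tag_b, kind)
        )
        """
    )
    return conn


def add_relation(conn, tag_a, tag_b, kind, commutative, weight=None):
    """Store one relation between two tags."""
    conn.execute(
        "INSERT INTO tag_relation VALUES (?, ?, ?, ?, ?)",
        (tag_a, tag_b, kind, int(commutative), weight),
    )


def related(conn, tag):
    """Return (other_tag, kind, weight) rows reachable from the given tag.

    A relation is always followed from tag A to tag B; it is followed
    from B back to A only when it is flagged as commutative.
    """
    rows = conn.execute(
        """
        SELECT tag_b, kind, weight FROM tag_relation WHERE tag_a = ?
        UNION
        SELECT tag_a, kind, weight FROM tag_relation
        WHERE tag_b = ? AND commutative = 1
        """,
        (tag, tag),
    ).fetchall()
    return sorted(rows, key=lambda row: (row[0], row[1]))


conn = create_tag_map()
add_relation(conn, "weblog", "blog", "synonym", commutative=True, weight=1.0)
add_relation(conn, "css", "stylesheet", "is-a-kind-of", commutative=False, weight=0.8)
```

Here `related(conn, "blog")` returns the synonym `weblog` because that relation is commutative, while `related(conn, "stylesheet")` returns nothing: "css is a kind of stylesheet" does not read backwards.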
And for exchange... let's say some kind of XML-RPC web service standard, to retrieve metadata on demand for a single tag or to import complete tag maps from a community, possibly along with content feeds. I think there are some interesting potential web applications to be built on community-driven, human-indexed content.
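As a rough illustration of the exchange side, the sketch below encodes a small tag map as an XML-RPC request and decodes it again, using only the standard `xmlrpc.client` module. No such standard exists as far as this post is concerned: the method name `tagmap.import` and the field names are invented for the example.

```python
import xmlrpc.client

# Hypothetical tag map, one dictionary per relation (field names are assumptions).
SAMPLE_MAP = [
    {"tag_a": "weblog", "tag_b": "blog", "commutative": True, "weight": 1.0},
    {"tag_a": "css", "tag_b": "stylesheet", "commutative": False, "weight": 0.8},
]


def encode_tag_map(relations):
    """Serialize a tag map as an XML-RPC call to the made-up `tagmap.import` method."""
    return xmlrpc.client.dumps((relations,), methodname="tagmap.import")


def decode_tag_map(payload):
    """Parse an XML-RPC request and return (method name, list of relations)."""
    params, method = xmlrpc.client.loads(payload)
    return method, params[0]


payload = encode_tag_map(SAMPLE_MAP)
method, relations = decode_tag_map(payload)
```

A real service would send `payload` over HTTP to another community's server; the point here is only that the relation table maps naturally onto XML-RPC arrays and structs, so a complete tag map survives the round trip unchanged.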
Some links
In no particular order:
- Problèmes des folksonomies
- Remove Forebrain and Serve: Tag Clouds II
- How do we overcome our tagging interface challenges?
- http://tagsonomy.com/
- http://www.technorati.com/
- http://www.flickr.com/
- a tag-based message board
Postscript, 2007-07-29: I've copied this entry to my wiki so that people can reference and discuss it.
I've already read this article. There are actually two ideas in it: refining tags with a prefix (what he calls semtags), which is close to categorization; and relying on human indexing of content through tags to improve search engines.
On that subject, there is already a search engine for tagged content, which was created by Stéphane Lee, as it happens. ;o)
We wrote a wiki that implements all these ideas. A demo will be online soon, and a paper will be presented at the "semantic wiki" workshop at the upcoming ESWC 06 conference. Here is the abstract that presents our wiki:
Abstract. Wikis are social web sites enabling a potentially large number of participants to modify any page or create a new page using their web browser. As they grow, wikis suffer from a number of problems (anarchical structure, large number of pages, aging navigation paths, etc.). In SweetWiki we investigate the design of a wiki built around a semantic web server i.e. the use of semantic web technologies to support and ease the life cycle of the wiki. The very model of wikis was declaratively described using semantic web frameworks: an OWL schema captures concepts such as WikiWord, wiki page, forward and back link, author, date of modification, version, etc. This ontology is then exploited by an existing semantic search engine (Corese) embedded in our server: using RDF/S and OWL descriptions the engine solves SPARQL queries ranging from classic WikiWord resolution to the dynamic production of indexes or "see also" recommendations. In addition, SweetWiki integrates a standard WYSIWYG editor (Kupu) that we extended to directly support semantic annotation following the "social tagging" approach made popular by web sites such as flickr.com or del.icio.us and by the technorati.com search engine. When editing a page, the user can freely enter some keywords in an AJAX-powered textfield. An auto-completion mechanism proposes existing keywords issuing SPARQL queries to identify existing concepts with compatible labels and shows the number of other pages sharing these concepts. With this approach, tagging is both easy (keyword-like) and motivating (real-time display of the number of pages linking to). Thus concepts are collected and used as in folksonomies. In order to maintain and re-engineer the folksonomy, SweetWiki reuses web-based editors available in the underlying semantic web server to edit semantic web ontologies and annotations.
Another distinctive feature of SweetWiki is its persistence mechanism: unlike other wikis, its pages are stored directly in XHTML thus ready to be served to browsers. Semantic annotations are located in the wiki pages themselves using the RDF/A syntax under specification at W3C. This embedded RDF is extracted using a standard GRDDL XSLT stylesheet thus providing semantic annotations directly to the semantic search engine. Therefore, if someone sends a wiki page to someone else the annotations follow it, and if an application crawls the wiki site it can extract the metadata and reuse them. SweetWiki also supports a powerful set of macros using JSP tags offered by the underlying semantic web server. These can be inserted directly at editing time for instance to include the result of a SPARQL query in a page and display it as a table with sortable columns. To summarize, the overall scenario is that of regular users just editing wiki pages and tagging them with keywords like they do using other tools. IT managers, editors or administrators check the folksonomy being built, look at the keywords and concepts proposed by the users and may (re)organize them, by for example adding new relationships (e.g. subClassOf, seeAlso). The annotations that users entered are not changed, but faceted navigation and search based on semantic queries are improved by these new links. The latest version of SweetWiki is currently being tested on a corpus of pages documenting the Java API.
The prototype is operational; let me know if you want to beta test it.
PS: I belong to the CNRS Mainline research team, and I'm currently a visiting scientist in the ACACIA team at INRIA.