Categorisation - a few thoughts
Jan. 6th, 2005 07:53 pm
Various sites, including the bookmarking site del.icio.us, the email site Gmail, the photo site Flickr and our very own LiveJournal Photo Hosting, allow users to categorise the items they store. This is generally done with 'tags', which let us label photos as "Family", emails as "To-Do" and bookmarks as "Porn" so that we can find what we need when we need it.
Rather than impose arbitrary 'top-down' categories on users, these sites let us define our own tags and use the labels we would naturally use, which makes the filing system we create much simpler to use.
However, allowing this causes problems. Tags can conflict between users (for instance, one user could use "Mac" for items related to Apple Macintoshes while another uses "Macintosh" and a third uses "Apple") and within a single user's own tags (when their nomenclature changes or they mistype). This can make sharing information difficult and can even make it hard to find everything you've stored yourself.
There are a few obvious solutions to this:
1) Reuse: Help the user to re-use their old tags by offering them a list of previously used tags - this will prevent typos and unintentional changes.
2) Synonyms: Help users to lump tags together by stating that "Mac" and "Macintosh" mean the same thing, as far as they are concerned. When they search for "Mac", the search is automatically broadened to include the tags they have declared equivalent to it.
3) Build categories from the most commonly used tags. This returns to the top-down imposition of structure, but builds it from the tags that people actually use. If a tag is used by more than x% of the population then categorise it and assign it a detailed description. For instance, if more than 1% of people are using "Mac" as a tag, then "Apple Macintosh Computer" could be assigned as a detailed description. Users could then choose to use the 'official' tag. Synonyms would also exist, so that "Macintosh" and "Apple" would both link to this single 'anchor'.
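The first two suggestions can be sketched in a few lines of Python. This is only an illustration under assumed details: the `UserTags` class and its methods are hypothetical, prefix matching plus the standard library's `difflib.get_close_matches` stands in for "offering a list of previously used tags" and catching typos, and synonyms are stored as links to a representative tag so that a search for one tag is broadened to every tag the user has declared equivalent.

```python
import difflib
from collections import defaultdict


class UserTags:
    """Hypothetical per-user tag store illustrating reuse and synonyms."""

    def __init__(self):
        self.items = defaultdict(set)  # tag -> ids of items carrying that tag
        self.synonyms = {}             # tag -> tag it was declared equivalent to

    def add(self, item, tag):
        self.items[tag].add(item)

    # 1) Reuse: offer previously used tags while the user types.
    def suggest(self, typed, cutoff=0.75):
        """Previously used tags starting with `typed`, then close matches (likely typos)."""
        known = sorted(self.items)
        prefix = [t for t in known if t.lower().startswith(typed.lower())]
        close = difflib.get_close_matches(typed, known, n=3, cutoff=cutoff)
        return prefix + [t for t in close if t not in prefix]

    # 2) Synonyms: let the user lump tags together.
    def _root(self, tag):
        # Follow synonym links until reaching the group's representative tag.
        while tag in self.synonyms:
            tag = self.synonyms[tag]
        return tag

    def declare_synonym(self, tag, other):
        """Record that, for this user, `tag` and `other` mean the same thing."""
        a, b = self._root(tag), self._root(other)
        if a != b:  # linking two distinct roots can never create a cycle
            self.synonyms[b] = a

    def search(self, tag):
        """Items under `tag` or under any tag declared equivalent to it."""
        root = self._root(tag)
        found = set()
        for t, tagged in self.items.items():
            if self._root(t) == root:
                found |= tagged
        return found


tags = UserTags()
tags.add("photo1.jpg", "Mac")
tags.add("review.html", "Macintosh")
tags.add("pie.html", "Apple pie")
print(tags.suggest("Ma"))          # ['Mac', 'Macintosh']
print(tags.suggest("Macintosch"))  # ['Macintosh'] (typo caught)
tags.declare_synonym("Mac", "Macintosh")
print(sorted(tags.search("Mac")))  # ['photo1.jpg', 'review.html']
```

Synonyms here are personal to one user, matching the "as far as they are concerned" wording; sharing them across users is what the third suggestion addresses.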
The use of more-defined descriptions would allow multiple meanings for the same tag to exist, so that someone using "Apple" as a tag could be offered the choice of attaching that tag to the definition "Apple Macintosh Computer", "Apple Fruit" or "Apple Music Corporation". The user could obviously also attach it to any other definition or leave it definitionless.
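Suggestion 3 and the idea of one tag carrying several meanings might look something like the sketch below. Again the names (`Vocabulary`, `Anchor`, `candidates`, `meanings`) are hypothetical, popularity is assumed to be measured as the share of distinct users who have used a tag, and the 1% threshold is simply the post's example value for x%.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Anchor:
    """Hypothetical 'official' definition that several tags can link to."""
    description: str
    tags: set = field(default_factory=set)  # synonyms pointing at this anchor


class Vocabulary:
    """Hypothetical site-wide vocabulary built from the tags people actually use."""

    def __init__(self, threshold=0.01):
        self.threshold = threshold            # the post's "x%", as a fraction
        self.users_by_tag = defaultdict(set)  # tag -> distinct users who used it
        self.all_users = set()
        self.anchors = []

    def record(self, user, tag):
        self.all_users.add(user)
        self.users_by_tag[tag].add(user)

    def candidates(self):
        """Tags used by more than `threshold` of all users: worth an anchor."""
        total = len(self.all_users)
        if total == 0:
            return []
        return sorted(tag for tag, users in self.users_by_tag.items()
                      if len(users) / total > self.threshold)

    def add_anchor(self, description, *tags):
        anchor = Anchor(description, set(tags))
        self.anchors.append(anchor)
        return anchor

    def meanings(self, tag):
        """Definitions a user could attach `tag` to; empty means definitionless."""
        return [a.description for a in self.anchors if tag in a.tags]


vocab = Vocabulary(threshold=0.01)
for i in range(200):
    vocab.record(f"user{i}", "holiday")  # a very common tag
for i in range(3):
    vocab.record(f"user{i}", "Mac")      # 3 of 200 users = 1.5%
vocab.record("user3", "Macintosh")       # 1 of 200 users = 0.5%
print(vocab.candidates())                # ['Mac', 'holiday']

vocab.add_anchor("Apple Macintosh Computer", "Mac", "Macintosh", "Apple")
vocab.add_anchor("Apple Fruit", "Apple")
vocab.add_anchor("Apple Music Corporation", "Apple")
print(vocab.meanings("Apple"))      # ['Apple Macintosh Computer', 'Apple Fruit', 'Apple Music Corporation']
print(vocab.meanings("Macintosh"))  # ['Apple Macintosh Computer']
```

The code only finds popular tags and offers the existing definitions; writing the detailed description is left to people, and the user stays free to pick one of the offered meanings, attach another, or leave the tag definitionless.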
I am, of course, assuming that most people would find common definitions useful, as they would let them find items that others had tagged in the same way, whilst leaving them free to use any tag they like for their own purposes.
no subject
Date: 2005-01-06 08:06 pm (UTC)
no subject
Date: 2005-01-06 08:19 pm (UTC)
XML
Date: 2005-01-06 11:50 pm (UTC)
EDI has standard tags which are set by an authority (e.g. EDIFACT standards are set by UN teams, and ANSI X12 is set by, unsurprisingly, ANSI, the American National Standards Institute).
But there is no equivalent authority for XML (unless you count things like AS2, RosettaNet, Commerce One, etc.), so every trading partner comes up with its own tags for addresses, for describing items, for putting together packs/kits/sets, for identifying currencies and quantities, etc.
But people are like ventriloquists' dummies: they don't want to go back in the box :-) They don't like being fenced in, yet when they search on Google they expect web pages to carry the tags they would use. That is why some of us are better at internet searches than others: we are better at guessing what other people may have put on the pages we want to find.
Re: XML
Date: 2005-01-07 06:33 am (UTC)
Are there open source EDI parsers?
That may have something to do with it too.
But we're not talking about EDI taxonomy; we're talking about tags being used by people who aren't domain specialists.
Re: XML
Date: 2005-01-07 09:02 am (UTC)
An XML schema _could_ be a replacement for EDI. But you'd need an industry consortium to agree on it, so that it was consistent.
I don't know why people confuse XML with the different schemas you can create using it.
no subject
Date: 2005-01-07 06:35 am (UTC)
http://www.flickr.com/photos/tags/
no subject
Date: 2005-01-07 08:55 am (UTC)
Oh, by the way
Date: 2005-01-07 08:57 am (UTC)
Re: Oh, by the way
Date: 2005-01-07 09:00 am (UTC)
People who think there should be a perfect labelling system don't understand ontology.