[personal profile] andrewducker
When I encountered the online tagging of data (first on del.icio.us, then on LJ) it really brought it home to me that most ontologies do not fit into hierarchies or simple groups. When it comes to most of the things we deal with in day-to-day life, what we actually have are interlocking sets of observations and categorisations that are best described using a series of independent labels.

However, I think the _next_ stage is going to be the really tricky (and interesting) bit - some of the labels can themselves be part of hierarchies, or synonyms for other labels. I have no wish, for instance, to tag something as being a "short film" and a "film", when short films _are_ films. But current tagging technology doesn't allow me to set up that link. Nor does it allow me to say that things tagged with "humor" by Americans are equivalent to things tagged as "humour" by Brits. What we currently have is a way to produce a mish-mash of data, with some interesting trends in it. The ability to construct things from it will make it a lot more powerful.
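
To make the two missing links concrete, here is a minimal sketch in Python of what user-defined synonyms and "is a kind of" links between tags might look like. Everything in it (the `SYNONYMS` and `IMPLIES` tables, the function names) is made up for illustration; it is not how del.icio.us or LJ store tags.

```python
from collections import defaultdict

# Hypothetical user-defined relations, invented for this example.
SYNONYMS = {"humor": "humour"}      # variant spelling -> canonical label
IMPLIES = {"short film": {"film"}}  # narrower tag -> broader tags


def canonical(tag):
    """Lower-case a tag and map it to its canonical spelling."""
    tag = tag.strip().lower()
    return SYNONYMS.get(tag, tag)


def expand(tags):
    """Return the canonical tags plus every broader tag they imply."""
    result = set()
    pending = [canonical(t) for t in tags]
    while pending:
        tag = pending.pop()
        if tag in result:
            continue  # already handled; also guards against cycles
        result.add(tag)
        pending.extend(canonical(b) for b in IMPLIES.get(tag, ()))
    return result


def build_index(items):
    """Map every expanded tag to the set of item names carrying it."""
    index = defaultdict(set)
    for name, tags in items.items():
        for tag in expand(tags):
            index[tag].add(name)
    return index


def search_tag(index, tag):
    """Look up a tag, accepting any known spelling."""
    return index.get(canonical(tag), set())
```

With items `{"clip": ["short film", "humor"], "feature": ["film"], "sketch": ["Humour"]}`, a search for "film" finds both `clip` and `feature`, and "humor" and "humour" return the same two items, without anything being tagged twice by hand.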

Date: 2006-06-30 08:14 am (UTC)
From: [identity profile] sbisson.livejournal.com
That's the problem with so-called "folksonomies". Of course, things get worse when you consider the mutability of our experience of the world...

Date: 2006-06-30 08:23 am (UTC)
From: [identity profile] sbisson.livejournal.com
I tend to think that they don't. I find myself looking at my tags on Flickr or on Del.icio.us or LJ and thinking: "What the hell did I mean by *that*?".

It's not the obvious tags, like place or some commonly shared hierarchy, it's the personal ones that fail...

Date: 2006-06-30 08:39 am (UTC)
From: [identity profile] figg.livejournal.com
Re: Humour/humor
Word stemming is a mostly solved problem for English; there is even some free code in the indexing server Xapian[1]. We could even use Soundex[2] to group similar items.
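
As a rough illustration of the Soundex idea, here is a small self-contained Python sketch of the classic American Soundex code, plus a hypothetical helper `group_by_sound` that buckets tags by it. This is not Xapian's code, just the textbook algorithm written out.

```python
from collections import defaultdict

# Letter -> digit table for classic American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "hw":
            # Vowels reset the previous code; 'h' and 'w' do not.
            prev = code
    return (encoded + "000")[:4]


def group_by_sound(tags):
    """Bucket tags by Soundex code (hypothetical helper for this example)."""
    groups = defaultdict(list)
    for tag in tags:
        groups[soundex(tag)].append(tag)
    return dict(groups)
```

Both "humor" and "humour" come out as `H560`, so they land in the same bucket. The catch is that Soundex is coarse: "Robert" and "Rupert" also share a code (`R163`), so it would need a human or a second check before merging tags.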

Ideally, we imagine the tag space as a metric space, and define a metric to see how related two items are, based on a variety of categories. (Searching in a metric space is still an ongoing problem, but it's a nice mathematical model of the space all the same.)
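
One simple candidate for such a metric, offered as an assumption rather than the only choice, is the Jaccard distance between two items' tag sets: 0 when the sets are identical, 1 when they share nothing. It satisfies the triangle inequality, so it is a proper metric. The sketch below uses it with a brute-force nearest-neighbour scan; the function names are invented for the example.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; two empty tag sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def nearest(query, items, k=3):
    """Names of the k items closest to `query`, by brute-force scan.

    `items` maps an item name to its collection of tags.
    """
    ranked = sorted(
        items,
        key=lambda name: (jaccard_distance(items[query], items[name]), name),
    )
    return [name for name in ranked if name != query][:k]
```

The linear scan sidesteps the hard part (fast search in a metric space), which is fine for a personal collection but not for something the size of Flickr.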

Re: (short) film.
So, we need adjectives in tags then.

Tags are a bad solution to the problem of data organization. They are simple to implement though, and easy to use. I would rather have full text indexing, and automatic extraction of metadata than tags.

[1] http://www.xapian.org/
[2] http://en.wikipedia.org/wiki/Soundex

Date: 2006-06-30 11:31 am (UTC)
From: [identity profile] figg.livejournal.com
Firstly: Manual tagging doesn't solve any of the problems you mentioned either.

Full text indexing isn't the same as treating every word as a tag, as word order is important.

Automatic tagging should be user defined and extensible, perhaps in the same way you can train a spam filter.

With regard to the dog/tara example: tara is a statistically improbable phrase (unlike 'and', 'they', 'blog', etc.), and dog isn't that common. So with automatic relations, you can see that tara and dog come up together frequently, and thus a search for one may incur a search for the other.

When I said full text indexing, I meant to say indexing of all the data, including metadata (like date, time, GPS, etc.). To me, this includes correlations too: how often words appear together.
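
A minimal sketch of the correlation idea, under my own assumptions: count how many documents each word and each pair of words appears in, then score pairs with pointwise mutual information (PMI), which rewards rare words that keep turning up together and gives little credit to words like 'and' that appear everywhere. The function names and the `min_pair` cut-off are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count document frequency for each word and each unordered word pair."""
    word_df = Counter()
    pair_df = Counter()
    for text in docs:
        words = sorted(set(text.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    return word_df, pair_df


def related(word, docs, min_pair=2):
    """Words that co-occur with `word`, strongest association first."""
    word_df, pair_df = cooccurrence(docs)
    n = len(docs)
    scores = {}
    for (a, b), together in pair_df.items():
        if together < min_pair or word not in (a, b):
            continue
        other = b if a == word else a
        # PMI = log( P(a, b) / (P(a) * P(b)) ), estimated from document counts.
        scores[other] = math.log(together * n / (word_df[a] * word_df[b]))
    return sorted(scores, key=lambda w: (-scores[w], w))
```

On a toy set of four posts where "tara" and "dog" always appear together, "dog" ranks first for "tara" and "and" ranks last, even though "and" co-occurs with "tara" just as often.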

Anyway:
This goes back to the metric space idea. You can define objects and their relation (i.e. the distance between them). You could probably do this sort of grouping for tags easily too.

Ideally, I would like a situation where I have to do as little organisation as possible, and use the pre-existing relationships between files (from the same device, created at the same time, same type, emailed to the same person, etc.).

Contrived example: Photos 1-10 have a date embedded in them. They are all close to each other, i.e. the distance between their dates is small. A party in the calendar also has a date attached to it.

You search for the party, it finds the photos.
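
The contrived example is easy to sketch in Python. Here the "distance" is just the time between the calendar entry and each photo's embedded timestamp, and the 12-hour window is an arbitrary assumption of mine.

```python
from datetime import datetime, timedelta


def near_event(event_time, photos, window=timedelta(hours=12)):
    """Names of photos taken within `window` of the event, closest first.

    `photos` maps a photo name to the datetime embedded in it.
    """
    hits = [
        (abs(taken - event_time), name)
        for name, taken in photos.items()
        if abs(taken - event_time) <= window
    ]
    return [name for _, name in sorted(hits)]


party = datetime(2006, 6, 24, 20, 0)
photos = {
    "img_001": datetime(2006, 6, 24, 21, 15),
    "img_002": datetime(2006, 6, 24, 23, 40),
    "img_099": datetime(2006, 7, 2, 9, 0),
}
print(near_event(party, photos))  # ['img_001', 'img_002']
```

No tags were involved: the link between the party and the photos comes entirely from metadata that already existed.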

I know I will have to add relations at some point, but I'd rather the computer did most of the work for me.

Date: 2006-06-30 09:33 am (UTC)
From: [identity profile] flickgc.livejournal.com
Hmm... I actually thought that del.icio.us could cope with humour/humor, but I must have been wrong!

Date: 2006-06-30 10:19 am (UTC)
From: [identity profile] ninox.livejournal.com
Now you have hit on some of the core issues of cataloguing. There is no perfect ontology. The American and British thing is a big problem in my field of work, but we do have some software that can deal with it (only some). I have a search filter on the topic of children that is around 36 lines of synonyms (adapted for spelling incongruities). Someone should teach the Americans how to spell. This is the point where I should start arguing semantics, punctuation, see also's, etc. I am not a cataloguer. I have the opinion that life is too short, especially after meeting several people frustrated at their own subject, who set out to improve the systems and years later have gained very little ground.

Language is a dynamic entity. It evolves, it changes and people use it with different perceptions (American buzzard = British vulture). LJ is something that is dependent on natural language rather than controlled vocabularies, so attributing any decent form of tagging structure to it is a herculean task.

Date: 2006-06-30 11:53 am (UTC)
From: [identity profile] surliminal.livejournal.com
Isn't this basically the same problem I asked a while back about the semantic web - about there being no one ontology to rule 'em all?

Or, life is P2P, not top-down.

Date: 2006-06-30 12:50 pm (UTC)
From: [identity profile] octopoid-horror.livejournal.com
"Nor does it allow me to say that things tagged with "humor" by Americans are equivalent to things tagged as "humour" by brits"

*makes the obvious comment that they are not, necessarily, equivalent*

I've seen quite a few Americans online using the phrase "British humor"... you'd think they could just type "humour" and use the different spelling to give the word a more specific meaning.

Date: 2006-07-01 03:32 am (UTC)
From: [identity profile] garunya.livejournal.com
That doesn't quite work, because there is also "Canadian humour", "Australian humour", and I'm sure a few more...

Date: 2006-06-30 12:54 pm (UTC)
From: [identity profile] octopoid-horror.livejournal.com
I've never browsed through LJ Tags much. I don't use them in the generally accepted way, so it's not something I'm curious about.

Flickr tags have useful "clusters", so if you search for something then, as well as coming up with anything tagged with that, it can suggest clusters within that search. Searching for "school", for example, brings up one cluster that's pictures of kids, one that's pictures of buildings, one that's pictures of buses, and one that's pictures to do with universities/colleges.

Date: 2006-07-01 04:54 am (UTC)
From: [identity profile] stillcarl.livejournal.com
That's partly the key, I think. One person's tags will only be as organised as that person is, but large groups of people should throw up common usages.

Flickr's clusters for 'dog' do throw up 'animal'...

http://www.flickr.com/photos/tags/dog/clusters/

and ditto 'dog' for the 'animal' clusters...

http://www.flickr.com/photos/tags/animal/clusters/

I don't know if Flickr does it, but given the above clusters it'd be possible for software to search an individual's photos for 'animal' and find their dog photos without them having an 'animal' tag against any of them.
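
Here is one way that could work, sketched in Python under assumptions of my own; it says nothing about how Flickr actually builds its clusters. The idea: look at everyone's tags, and if most photos tagged 'dog' are also tagged 'animal', treat 'dog' as something a search for 'animal' should match. The names `implied_by_crowd` and `search_photos`, and the `min_share` and `min_count` thresholds, are invented for the example.

```python
from collections import Counter, defaultdict


def implied_by_crowd(crowd_photos, min_share=0.6, min_count=2):
    """Map a broad tag to the narrower tags that usually accompany it.

    `crowd_photos` is an iterable of tag lists, one per photo. Tag `t` is
    filed under `other` when at least `min_share` of the photos tagged `t`
    also carry `other`; tags used fewer than `min_count` times are ignored.
    """
    tag_count = Counter()
    pair_count = defaultdict(Counter)
    for tags in crowd_photos:
        tags = set(tags)
        tag_count.update(tags)
        for t in tags:
            for other in tags - {t}:
                pair_count[t][other] += 1
    implied = defaultdict(set)
    for t, others in pair_count.items():
        if tag_count[t] < min_count:
            continue
        for other, together in others.items():
            if together / tag_count[t] >= min_share:
                implied[other].add(t)
    return implied


def search_photos(my_photos, query, implied):
    """Find one person's photos matching `query` or any tag filed under it."""
    wanted = {query} | implied.get(query, set())
    return sorted(name for name, tags in my_photos.items() if wanted & set(tags))
```

Given a crowd where two out of three 'dog' photos also carry 'animal', a search for 'animal' over a personal collection returns a photo tagged only 'dog', while a search for 'dog' does not pull in everything tagged 'animal'.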
