Metadata

Mar. 17th, 2004 12:11 pm
[personal profile] andrewducker
You may have noticed that you're drowning in data at the moment. Your hard drive is full of it, your inbox is full of it, you almost certainly get swamped by a huge wave of it from the internet every time you open your browser, newsreader or RSS-reader. Sure, you're interested in some of it, but how do you find the bits you want?

The answer is metadata.

Metadata is the data that's associated with your data. It varies from the very simple (the filename of your picture) to the extremely complicated (the relationships held in customer relationship management systems) via things like ID tags for MP3 files.

If you use any kind of halfway decent MP3 player then you can search for songs by artist, album and track. If you use Windows Media Player you can also rate songs and then search by rating. There are programs that allow you to search by mood as well, or by beat frequency. If you think of your music collection as a big mishmash of data, the metadata allows you to slice through it any way you like, turning it this way and that until you find the songs you want.
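To make the "slicing" idea concrete, here is a minimal sketch in Python. The Track fields, the 1 to 5 rating scale, the find_tracks helper and the sample library are all made up for illustration; no real player is claimed to store its metadata this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    # Hypothetical metadata fields, loosely modelled on ID tags plus a rating.
    title: str
    artist: str
    album: str
    rating: int  # assumed scale: 1 (worst) to 5 (best)


def find_tracks(tracks, **criteria):
    """Return the tracks whose metadata matches every keyword given."""
    return [
        track
        for track in tracks
        if all(getattr(track, name) == value for name, value in criteria.items())
    ]


# Invented sample data, purely for the example.
LIBRARY = [
    Track("Song One", "Artist A", "Album X", 5),
    Track("Song Two", "Artist A", "Album X", 3),
    Track("Song Three", "Artist B", "Album Y", 5),
]

five_star_by_a = find_tracks(LIBRARY, artist="Artist A", rating=5)
```

The same list can be cut by artist, by album, by rating, or by any combination, which is the whole point: the files never move, only the question changes.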

The same cannot be said for your photo collection. If you want to find the photo of Uncle Bob at Jane and Guy's wedding last year, unless your photo collection is carefully sorted into folders so that you can go to pictures\2003\Weddings\JaneAndGuy\UncleBob3.jpg, you're never going to be able to find it without trawling through dozens of files called MyPic0001.jpg.

And if you want to look at all photos taken at weddings, or all photos taken of Uncle Bob, you're shit out of luck. Unless, of course, you have metadata! Which is where Microsoft's Next Gen file system comes in. It will allow you to tag any file with huge swathes of information - where it was taken, when it was taken, who took it, who's in it, what they're doing, why they're doing it, etc. This will allow you to instantly find every photo of you taken between 2001 and 2003 where you were drunk and happy. Now isn't that a great use of technology?

Which is where the conversation started with [livejournal.com profile] whumpdotcom last night. We basically agreed that metadata isn't going to work. Because when you've taken 30 pictures, the last thing you want to do is sit down for half an hour and apply a dozen keywords to each one. Let's put it another way - you just aren't going to do it. If you're lucky you'll call each one something halfway meaningful like UncleBob.jpg and just assume you'll be able to find it later. And you're pretty smart, aren't you? Imagine what the _average_ person will do. He'll never find his files, because they'll still be called MyPic0001.jpg.

The answer, of course, is a Wizard that pops up whenever you upload a new photo and asks you a dozen questions about your photo. "Hi! It looks like you're trying to save a photo. Would you like to categorise it?" "First question - is it porn?", because nobody likes a disorganised porn collection. But how many times will people be willing to go through that process before they click the "never speak to me again, you evil paperclip from hell!" button?

Of course, it's just a photo collection. You'll either be organised, or you won't, and it doesn't really matter that much, does it? But what happens when the metadata is being applied to business documentation? Are we going to be able to search for "All specifications for the Bumstead Project that impact on regulatory requirements"? Or are we still going to be pressing F3 and searching files for the foreseeable future?

On a larger scale - what about the Semantic Web? This takes the idea of metadata and applies it to the whole of the Intarweb. Which would be great, wouldn't it? You'd be able to ask meaningful questions and get back reasonable results that contained the answers all neatly categorised and sorted.

If only someone would go through the internet and categorise every web page with details of what it does, what it applies to, who wrote it, when it was written, why it was written and what it might be useful for. And then keep that information up to date.

Any volunteers?

Anyone?

Date: 2004-03-17 12:30 pm (UTC)
From: [identity profile] tisme.livejournal.com
Jane and Guy got married? Lesbians mating with romantics-in-denial. Where's the reality tv crew?

Date: 2004-03-17 12:31 pm (UTC)
From: [identity profile] drainboy.livejournal.com
That might be tricky, but only until we have some decent methods of autogenerating metadata, like object recognition to track down those photos of Uncle Bob, and a bit of intelligent partitioning to realise that if photo X was taken on June 18th 1994, and so was photo Y, and photo X is of Bob's wedding, then photo Y probably is as well (until you tell it differently, and then it recategorises all other objects similarly categorised via the same presumed metadata). Of course, if you can figure out context as well (a picture with some subset of the K people so far known to be at the wedding), then it will get it right even more often.
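The date-based guessing described above could be sketched roughly like this in Python. The Photo class, the confirmed flag and propagate_by_date are hypothetical names invented for the example, and "same calendar day" is a deliberately crude stand-in for the intelligent partitioning; object recognition is not attempted here at all.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Photo:
    name: str
    taken: date
    labels: set = field(default_factory=set)
    confirmed: bool = False  # True when a human supplied the labels


def propagate_by_date(photos):
    """Give unconfirmed photos the labels of confirmed photos from the same day.

    Guessed labels are rebuilt from scratch on every call, so correcting one
    confirmed photo and calling this again recategorises its neighbours.
    """
    labels_by_day = {}
    for photo in photos:
        if photo.confirmed:
            labels_by_day.setdefault(photo.taken, set()).update(photo.labels)

    for photo in photos:
        if not photo.confirmed:
            photo.labels = set(labels_by_day.get(photo.taken, set()))
    return photos


x = Photo("X.jpg", date(1994, 6, 18), {"Bob's wedding"}, confirmed=True)
y = Photo("Y.jpg", date(1994, 6, 18))
propagate_by_date([x, y])  # y is now presumed to be of Bob's wedding too
```

Keeping the human-confirmed labels separate from the guessed ones is what makes the "until you tell it differently" part cheap: the guesses are disposable and can always be regenerated.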

To be honest, we're not _that_ worried about precise categories, especially since, if you can track down one photo that's kind of close, you can sort on the metadata of that photo, i.e. which camera/time/GPS position it had. So you find one picture with three people you remember from the wedding, and then you can track the other wedding photos down by the other metadata on the one photo you have.
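A rough Python sketch of that "start from one photo" search, using only the timestamp. The photos_near function, the six-hour window and the sample filenames and times are assumptions made up for the example; camera model or GPS position could be compared in the same way.

```python
from datetime import datetime, timedelta


def photos_near(seed_name, taken_at, window=timedelta(hours=6)):
    """Return names of photos taken within `window` of the seed photo.

    `taken_at` maps a filename to the datetime the photo was taken.
    """
    seed_time = taken_at[seed_name]
    return sorted(
        name
        for name, when in taken_at.items()
        if name != seed_name and abs(when - seed_time) <= window
    )


# Invented sample data: three shots from one afternoon, one from weeks later.
TAKEN_AT = {
    "MyPic0001.jpg": datetime(2003, 6, 14, 13, 5),
    "MyPic0002.jpg": datetime(2003, 6, 14, 15, 40),
    "MyPic0003.jpg": datetime(2003, 6, 14, 17, 55),
    "MyPic0004.jpg": datetime(2003, 7, 2, 9, 30),
}

probably_same_event = photos_near("MyPic0002.jpg", TAKEN_AT)
```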

You just need a foot in the door, which is what it appears MS is about to give us. Of course, that means you can track down all the pictures of Uncle Bob on the internet, including the regrettable ones from when he tried to break into Hollywood. They promised to destroy the footage, he swears they did...

Date: 2004-03-17 12:55 pm (UTC)
From: [identity profile] chillies.livejournal.com
> unless your photo collection is carefully sorted into folders so that you can go to pictures\2003\Weddings\JaneAndGuy\UncleBob3.jpg, you're never going to be able to find it without trawling through dozens of files called MyPic0001.jpg.

That's exactly the problem. Those directory names are a subset of the metadata one would want for a photo. And we all know how jumbled directories get when one's not paying attention ...
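To illustrate the point that the folder names are just a small, rigid slice of the metadata, here is a hedged Python sketch that reads them back out of the example path. The root\year\category\event\file layout is an assumption taken from that one path, and metadata_from_path is a made-up helper, not part of any real photo tool.

```python
from pathlib import PureWindowsPath


def metadata_from_path(path):
    """Recover what little metadata a root/year/category/event/file path holds."""
    parts = PureWindowsPath(path).parts
    if len(parts) != 5:
        # Anything not in the assumed layout tells us nothing but its filename.
        return {"filename": parts[-1]}
    _root, year, category, event, filename = parts
    return {
        "year": year,
        "category": category,
        "event": event,
        "filename": filename,
    }


info = metadata_from_path(r"pictures\2003\Weddings\JaneAndGuy\UncleBob3.jpg")
```

Four fields is all the path can carry, and a file dropped in the wrong folder silently gets the wrong values, which is the jumbled-directory problem in miniature.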

Date: 2004-03-17 12:56 pm (UTC)
From: [identity profile] nickys.livejournal.com
> If only someone would go through the internet and categorise every web page with details of what it does, what it applies to, who wrote it, when it was written, why it was written and what it might be useful for. And then keep that information up to date.
> Any volunteers?
> Anyone?

Yes, there's me for a start.

Well, actually I wouldn't volunteer, but I am prepared to do it since that's what they pay me for...

Date: 2004-03-17 01:03 pm (UTC)
From: [identity profile] bohemiancoast.livejournal.com
We're just about to buy iView Media Pro. It doesn't solve any of this, but it helps a whole lot. It dates based on the EXIF data (which you can amend to some extent), so as long as your digicam doesn't say 01-01-98, your photos are sorted by date, which helps a lot. You can assign various other labels -- and although it's still a pain, it is tons easier to select 70 photos and drag them all at once onto the label 'Tony and Jean's wedding' than it is to edit keyword data in (eg) iPhoto. It also manages sensibly sized catalogues (up to 128k items) with a choice of thumbnail sizes. Some labels (date, event) are exclusive, others (people) can have multiples assigned to them. As well as the formal metadata, it allows you to separately organise photos into sets, eg 'good photos' or '3d anaglyph slideshow'. It's not just for photos either, though all we have in our catalogue is photos and digicam movies. When I've finished the photos I'm proposing to set up a second catalogue for my photoshops -- a much trickier cataloguing proposition, but I'm pretty sure it's do-able.
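For what sorting on the camera's date looks like, here is a minimal Python sketch. It assumes the EXIF date string has already been read out of each file by some other tool, in the usual EXIF "YYYY:MM:DD HH:MM:SS" form; parse_exif_date and sort_by_date are hypothetical helpers and say nothing about how iView Media Pro actually works.

```python
from datetime import datetime

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # e.g. "2003:06:14 15:40:00"


def parse_exif_date(raw):
    """Return a datetime, or None if the value is missing or malformed."""
    try:
        return datetime.strptime(raw, EXIF_FORMAT)
    except (TypeError, ValueError):
        return None


def sort_by_date(photos):
    """Sort (filename, raw_date) pairs by date, undated photos last."""

    def key(item):
        parsed = parse_exif_date(item[1])
        return (parsed is None, parsed or datetime.min)

    return sorted(photos, key=key)


# Invented sample data, purely for the example.
PHOTOS = [
    ("b.jpg", "2003:06:02 10:00:00"),
    ("c.jpg", None),
    ("a.jpg", "2003:06:01 09:00:00"),
]

in_order = [name for name, _ in sort_by_date(PHOTOS)]
```

A camera stuck on its factory-default date would still parse cleanly here, so it would sort in the wrong place rather than at the end; catching that needs a human or a sanity check on the range of dates.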

iView Media Pro is expensive for a cataloguing program. It has a cheaper, less featured version, which is also available for Windows, unlike the bells-and-whistles edition.

As for document storage and metadata, it's not quite a solved problem, but there are companies who will do much of what you want for an enormous fee. I believe the intelligence agencies are very big customers.

Date: 2004-03-17 02:16 pm (UTC)
From: [identity profile] nirikina.livejournal.com
I'd do it if I got paid to do it. What, surf the web and get paid for it? Hell yeah!

Date: 2004-03-17 03:41 pm (UTC)
From: [identity profile] octopoid-horror.livejournal.com
Is there not a problem with this, in the same way that the "meta" HTML tag is overused, or the way that porn sites put in lists of keywords for search engines to pick up?

If this was to work, then it would need to avoid sites that were trying to catch its attention.

For instance, if you type "trendylabelthatispopularandcoolandexpensive" into eBay... you get a lot of entries that have "NOT trendylabel.. etc" almost as if they're trying to make you find their item..

and similarly, if you type in "insert name of popular pornstar" when you just happen to want to read their personal journal because you care about their life, you might get a lot of other sites that have nothing to do with that pornstar... they just think "hey, this guy likes porn, if we lure him here, I bet he'll stay"

and so forth.

so if I typed "pictures of Nick getting drunk", the evil site of AlcoholManufacturer-X might lure me in by having enough key phrases that any search about being drunk would bring up their site, since if you were often drunk, you might need a drink...

Date: 2004-03-18 03:02 am (UTC)
From: [identity profile] hitchhiker.livejournal.com
I've been thinking about this too - the best I've come up with so far is to hook in voice recognition, so that the user can speak a word or two of metadata into the system, preferably when he creates/saves the file. Sort of analogous to tossing your secretary a sheet of paper and telling him/her where to file it. For photos, the system could bring the pictures up one by one in a slideshow, and you could tell it what each one was - could easily do that at 10-30 snaps a minute, which is not too shabby.

Date: 2004-03-20 02:33 pm (UTC)
From: [identity profile] allorin.livejournal.com
Just wanted to let you know I ignored this.... ;+)
