You may have noticed that you're drowning in data at the moment. Your hard drive is full of it, your inbox is full of it, you almost certainly get swamped by a huge wave of it from the internet every time you open your browser, newsreader or RSS-reader. Sure, you're interested in some of it, but how do you find the bits you want?
The answer is metadata.
Metadata is the data that's associated with your data. It varies from the very simple (the filename of your picture) to the extremely complicated (the relationships held in customer relationship management systems) via things like ID3 tags for MP3 files.
If you use any kind of halfway decent MP3 player then you can search for songs by artist, album and track. If you use Windows Media Player you can also rate songs and then search by rating. There are programs that allow you to search by mood as well, or by beat frequency. If you think of your music collection as a big mishmash of data, the metadata allows you to slice through it any way you like, turning it this way and that until you find the songs you want.
The same cannot be said for your photo collection. If you want to find the photo of Uncle Bob at Jane and Guy's wedding last year, then unless your photo collection is carefully sorted into folders so that you can go to pictures\2003\Weddings\JaneAndGuy\UncleBob3.jpg, you're never going to find it without trawling through dozens of files called MyPic0001.jpg.
And if you want to look at all photos taken at weddings, or all photos taken of Uncle Bob, you're shit out of luck. Unless, of course, you have metadata! Which is where Microsoft's Next Gen file system comes in. It will allow you to tag any file with huge swathes of information - where it was taken, when it was taken, who took it, who's in it, what they're doing, why they're doing it, etc. This will allow you to instantly find every photo of you taken between 2001 and 2003 where you were drunk and happy. Now isn't that a great use of technology?
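To make that concrete, here's a rough sketch of what such a query might look like. It is not how Microsoft's file system works; the `Photo` record, the `find_photos` helper and the sample tags are all made up for illustration, and the only assumption is that somebody has already typed the tags in.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Photo:
    # Hypothetical record: a filename plus hand-entered metadata.
    filename: str
    taken: date
    people: set = field(default_factory=set)
    tags: set = field(default_factory=set)


def find_photos(photos, person, start_year, end_year, required_tags):
    """Return photos of `person` taken in [start_year, end_year] that carry every required tag."""
    return [
        p for p in photos
        if person in p.people
        and start_year <= p.taken.year <= end_year
        and required_tags <= p.tags  # subset test: all required tags present
    ]


photos = [
    Photo("MyPic0001.jpg", date(2002, 6, 1), {"me", "Uncle Bob"}, {"wedding", "drunk", "happy"}),
    Photo("MyPic0002.jpg", date(2003, 8, 9), {"Uncle Bob"}, {"wedding"}),
    Photo("MyPic0003.jpg", date(2004, 1, 1), {"me"}, {"drunk", "happy"}),
    Photo("MyPic0004.jpg", date(2001, 12, 31), {"me"}, set()),  # in range, but never tagged
]

matches = find_photos(photos, "me", 2001, 2003, {"drunk", "happy"})
print([p.filename for p in matches])  # ['MyPic0001.jpg']
```

Note that MyPic0004.jpg falls inside the date range and has me in it, but it never shows up, because nobody tagged it. The query is the easy part.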
Which is where the conversation started with whumpdotcom last night. We basically agreed that metadata isn't going to work. Because when you've taken 30 pictures, the last thing you want to do is sit down for half an hour and apply a dozen keywords to each one. Let's put it another way - you just aren't going to do it. If you're lucky you'll call each one something halfway meaningful like UncleBob.jpg and just assume you'll be able to find it later. And you're pretty smart, aren't you? Imagine what the _average_ person will do. He'll never find his files, because they'll still be called MyPic0001.jpg.
The answer, of course, is a Wizard that pops up whenever you upload a new photo and asks you a dozen questions about your photo. "Hi! It looks like you're trying to save a photo. Would you like to categorise it?" "First question - is it porn?", because nobody likes a disorganised porn collection. But how many times will people be willing to go through that process before they click the "never speak to me again, you evil paperclip from hell!" button?
Of course, it's just a photo collection. You'll either be organised, or you won't, and it doesn't really matter that much, does it? But what happens when the metadata is being applied to business documentation? Are we going to be able to search for "All specifications for the Bumstead Project that impact on regulatory requirements"? Or are we still going to be pressing F3 and searching files for the foreseeable future?
On a larger scale - what about the Semantic Web? This takes the idea of metadata and applies it to the whole of the Intarweb. Which would be great, wouldn't it? You'd be able to ask meaningful questions and get back reasonable results that contained the answers all neatly categorised and sorted.
If only someone would go through the internet and categorise every web page with details of what it does, what it applies to, who wrote it, when it was written, why it was written and what it might be useful for. And then keep that information up to date.
Any volunteers?
Anyone?
no subject
Date: 2004-03-17 12:30 pm (UTC)
no subject
Date: 2004-03-17 12:31 pm (UTC)
To be honest we're not _that_ worried about precise categories, especially since, once you can track down one photo that's kind of close, you can sort on the metadata of that photo, i.e. which camera/time/GPS position it had. So you find one picture with three people you remember from the wedding, then you can track the other wedding photos down by different metadata on the one photo you have.
You just need a foot in the door, which is what it appears MS is about to give us. Of course that means you can track down all the pictures of uncle bob on the internet, including the regretful ones when he tried to break into Hollywood. They promised to destroy the footage, he swears they did...
no subject
Date: 2004-03-17 12:55 pm (UTC)
That's exactly the problem. Those directory names are a subset of the metadata one would want for a photo. And we all know how jumbled directories get when one's not paying attention ...
no subject
Date: 2004-03-17 12:56 pm (UTC)
Any volunteers?
Anyone?
Yes, there's me for a start.
Well, actually I wouldn't volunteer, but I am prepared to do it since that's what they pay me for...
no subject
Date: 2004-03-17 01:03 pm (UTC)
iView Media Pro is expensive for a cataloguing program. It has a cheaper, less featured version, which is also available for Windows, unlike the bells-and-whistles edition.
As for document storage and metadata, it's not quite a solved problem, but there are companies who will do much of what you want for an enormous fee. I believe the intelligence agencies are very big customers.
no subject
Date: 2004-03-17 02:16 pm (UTC)
no subject
Date: 2004-03-17 03:41 pm (UTC)
If this was to work, then it would need to avoid sites that were trying to catch its attention.
For instance, if you type "trendylabelthatispopularandcoolandexpensive" into ebay... you get a lot of entries that have "NOT trendylabel.. etc" almost as if they're trying to make you find their item..
and similarly, if you type in "insert name of popular pornstar" when you just happen to want to read their personal journal because you care about their life, you might get a lot of other sites that have nothing to do with that pornstar... they just think "hey, this guy likes porn, if we lure him here, I bet he'll stay"
and so forth.
so if I typed "pictures of Nick getting drunk", the evil site of AlcoholManufacturer-X might lure me in by having enough key phrases that any search about being drunk would bring up their site, since if you were often drunk, you might need a drink...
no subject
Date: 2004-03-17 09:22 pm (UTC)
yet another reason for it not to work properly...
no subject
Date: 2004-03-18 03:02 am (UTC)
no subject
Date: 2004-03-20 02:33 pm (UTC)