Catalogue and Classify Those Tweets


The recent announcement that the Library of Congress is acquiring the back archive of tweets got me thinking again about meaning in Twitter hashtags. I jokingly suggested that the L.O.C. might like to classify them all. Thinking about it properly, classifying them might actually be useful. I don’t mean every tweet, just the hashtags mentioned in the tweets.

If you are a library holding references to resources on your catalogue, a catalogue record for, and linking to, a hashtag URL for an event (e.g. #jisc10) or a subject (e.g. #rda) might be useful. Twitter provides useful and concise information, links to resources, discussions, etc., so why not make use of that? I know you have to read some waffle on Twitter too, but you often have to read through more waffle in a book.

You could index the hashtag in the same way as your normal stock, i.e. with subject headings and classification codes. I know tweets are lost into the ether after a few days, but for more permanent links you could use a URL pointing to a Twitter archiving service like Twapperkeeper, if tweets about the hashtag are being stored there. These services are more likely to hold onto tweets for much longer.

I wonder if this is the intention of the Library of Congress?

Meaning in Twitter Hashtags


(NB: Originally posted on Library2.0.ning 11/12/09)

I find hashtags on Twitter a useful way of pulling all the information about a particular discussion together. However, sometimes I see a hashtag attached to an interesting tweet, but I haven’t got a clue what the background of the hashtag is. If I’ve not been involved in the thread/discussion/conversation from the beginning, it can be difficult to back-track to where it was first used. Sometimes keywords are used as hashtags, which can give you a clue (but not always), but in some cases abbreviations are used. Phil Bradley raised the keyword/abbreviation issue recently in his blog. My issue is not necessarily with conference hashtags, as mentioned by Phil, though. The shorter the hashtag, the more difficult it can be to work out what that hashtag is about.

Various hashtag search/retrieval sites have popped up as sidekicks to Twitter. They can retrieve tweets by hashtag/keyword, but you can’t always get the gist of a hashtag thread from them. Some of the sites concentrate on how the information is presented, which can be fun/interesting in itself, but doesn’t always make for a useful search tool. For example, Trendsmap has an interesting way of presenting information – it displays hashtags/keywords on a map, based on the most common words used in tweets. However, what’s the point when you’ve got vague/nondescript words appearing on it, like ‘appear’, ‘whilst’ and ‘received’ (see below)? Nice to see ‘@serafinowicz’ in there though! He must be doing one of his Q&A sessions.

So, I’ve been trying to see if there’s a better service out there to achieve what I want.

I started playing with ‘Yahoo Pipes’ a couple of months ago. It’s a simple way of getting data from web sites. I realised I could get info out of Twitter via it and then tinker with that data. I created a pipe to search for a hashtag and used the ‘Term Extractor’ module to pull out useful words from the tweets that mentioned the hashtag I’d searched for. The ‘Term Extractor’ did what it said on the tin – it pulled out the words it thought were important. But I found that the terms it extracted weren’t that helpful, and I don’t know why it thought some words were more important than others. I also had an intermittent problem with accessing Twitter via the pipe – access to Twitter via this method is limited to a number of searches per hour. So, it worked in theory, but wasn’t ideal.

I also realised last week that @psychemedia had set up a Yahoo Pipe along the same lines, so I needn’t have struggled to work on it. However, saying that, I do get satisfaction working out these ideas for myself, so I hadn’t really wasted my time… all part of the learning curve.

Another alternative is to look on ‘Twubs’ for the hashtag, to see if someone has registered it. When a person registers the hashtag they can also give it a description and indicate other hashtags related to the one they’ve registered. However, this relies on people knowing the service exists and then registering the hashtag. With so many Twitter sidekick applications how can you expect everyone to know every application that is out there?

This leads me onto ‘TweetCloud’. I didn’t know it existed until about a week ago. See, I told you. I can’t keep up with all the Twitter apps! You can type in a hashtag and it gives you a tag cloud showing the most popular words used in tweets related to this hashtag. It’s more useful than the other options above, but it’s still not ideal. For example, I searched for ‘middlemash’ and as well as some useful words, TweetCloud gave me things like ‘anyone’, ‘nice’, ‘many’, ‘trying’, ‘really’. This isn’t helpful, but I think it’s the best I’m going to get at the moment.
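The noise problem described above (‘anyone’, ‘nice’, ‘many’ and so on) is usually tackled with a stop list. Here is a minimal sketch in Python of how a tag cloud count could skip such words. It is not how TweetCloud itself works, which isn't known here; the function name `cloud_terms`, the short stop list and the sample tweets are all made up for illustration, and a real stop list would need to be far longer.

```python
import re
from collections import Counter

# A small, hand-picked stop list (an assumption; a real one would be much longer).
STOP_WORDS = {
    "a", "an", "and", "anyone", "at", "for", "in", "is", "it", "many",
    "nice", "of", "on", "really", "the", "to", "trying", "was",
}


def cloud_terms(tweets, hashtag, top=10):
    """Count words across tweets, skipping stop words and the searched hashtag."""
    counts = Counter()
    for tweet in tweets:
        # Lower-case the tweet and keep words, with any leading # or @ attached.
        for word in re.findall(r"[#@]?\w+", tweet.lower()):
            if word in STOP_WORDS or word.lstrip("#") == hashtag.lower():
                continue
            counts[word] += 1
    return counts.most_common(top)


# Made-up example tweets, for illustration only.
sample = [
    "Really nice talk on library mashups at #middlemash",
    "Anyone trying the catalogue mashups demo? #middlemash",
    "Many library mashups on show #middlemash",
]
print(cloud_terms(sample, "middlemash", top=2))
```

A stop list only removes words somebody has thought to list, so it tidies up a cloud without really telling you which of the remaining words matter.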

In the long run it would be great if someone could create a way of feeding tweets that mention hashtags into a decent term extractor site/API – one that understands what is and isn’t a useful word or phrase and can categorise terms into different types, e.g. person, place. There are a few services out there already that can be used for this sort of process – AlchemyAPI, Zemanta, Term Extractor by LCL and Terminology Extraction by Translated.net Labs. Both AlchemyAPI and Zemanta provide APIs. I’ve not tinkered with them long enough to be certain, but they do appear to be better than the other options mentioned earlier (i.e. Yahoo Pipes ‘Term Extractor’, TweetCloud, Twubs, Trendsmap). I copied and pasted a good portion of results from twapperkeeper/middlemash into AlchemyAPI and you can see the results below.
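To make the ‘different types of term’ idea concrete, here is a very crude Python sketch. It does not call AlchemyAPI or Zemanta and is not their output. It only sorts out the term types that tweet syntax gives away for free (links, @accounts and hashtags), whereas spotting people and places in ordinary text needs a proper extractor. The function name `categorise` and the sample tweets are hypothetical.

```python
import re

LINK = re.compile(r"https?://\S+")
ACCOUNT = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")


def categorise(tweets):
    """Group the terms found in tweets by type, with duplicates removed."""
    found = {"link": set(), "account": set(), "hashtag": set()}
    for tweet in tweets:
        found["link"].update(LINK.findall(tweet))
        # Remove links first so a '#' or '@' inside a URL isn't miscounted.
        text = LINK.sub(" ", tweet)
        found["account"].update(ACCOUNT.findall(text))
        found["hashtag"].update(HASHTAG.findall(text))
    return {kind: sorted(terms) for kind, terms in found.items()}


# Made-up example tweets, for illustration only.
sample = [
    "@psychemedia has a pipe for this http://example.com/pipe#run #middlemash",
    "Notes from #middlemash with @psychemedia",
]
for kind, terms in categorise(sample).items():
    print(kind, terms)
```

An @account is only a stand-in for ‘person’ here, since plenty of accounts belong to organisations or events.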

I’d prefer to be able to do this in a more user-friendly way, with a decent interface that presents results in a way that relates to Twitter. I imagine it won’t be long before someone with more brains than me comes along and achieves what I’m after. I can wait, but in the meantime it’s on my ‘Projects I’d like to have a go at, but might never complete’ list.