MeaningCloud – Extracting Meaning from Content


I received an email the other day to inform me that Textalytics had changed its name to MeaningCloud. This was really handy, as it reminded me that it existed – I’d signed up for it during the summer, but hadn’t had much of a chance to use it, as I was working on the Library A to Z.

Anyway, it’s something I want to explore more. It’s an online service that analyses the content of text, documents or web pages you supply it with and highlights key subjects, people, places, things and other entities and concepts for you. As a librarian (specifically with my classification head on) I’ve been interested in the idea of automated classification for some time and have tried various experiments, including using Yahoo Pipes and my recent WordPress snapshot cards, to extract meaning from text.

I’ve tested a few other online services like MeaningCloud, but this was the one that seemed the most straightforward and easy to use. The documentation is clear enough for me to understand all I need to and, as I have only really got my head around working with XML output, having this as one of the two output options is important to me. It also helps that it’s free up to a certain amount of use.

The way it works is that you submit a URL containing all the key parameters to the online service:

  • The text, document file, or URL of the web page you want to analyse.
  • The type of results you want returned to you (eg sentiment – positive/negative/neutral; text classification – very broad categories such as “libraries and museums”; topic extraction – more detailed subjects and concepts).
  • The output format (eg JSON or XML).

You can specify more than this, and you can also define your own topic dictionaries for the analysis to use.
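To make the idea of "a URL containing all the key parameters" a bit more concrete, here is a minimal sketch in Python that only builds such a URL and does not send it. The endpoint address and the parameter names (`key`, `txt`, `of`, `lang`, `tt`) are my assumptions about what a topic extraction request might look like, so check them against the current MeaningCloud documentation before relying on them. `YOUR_KEY` is a placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# ASSUMPTION: endpoint and parameter names are illustrative only and
# should be checked against the MeaningCloud documentation.
ENDPOINT = "https://api.meaningcloud.com/topics-2.0"


def build_request_url(api_key, text, output_format="xml", lang="en", topic_types="a"):
    """Return a request URL carrying the text and the options as a query string."""
    params = {
        "key": api_key,        # your licence key (placeholder below)
        "txt": text,           # the text you want analysed
        "of": output_format,   # output format: "xml" or "json"
        "lang": lang,          # language of the text
        "tt": topic_types,     # which kinds of topic to return (assumed: "a" = all)
    }
    # urlencode escapes spaces and punctuation so the text is safe in a URL
    return ENDPOINT + "?" + urlencode(params)


url = build_request_url("YOUR_KEY", "Libraries and museums in Leeds")

# Decode the query string again to check that nothing was lost
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["txt"][0])
```

The same parameters could just as easily be put together in Processing or any other language; the point is simply that the text and the options all travel together in one request.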

It then returns the information you requested to the service you sent the request from. So, in my case, it would most likely be sent via a program written in Processing. You can then do whatever you want with that response. So, in theory I can develop my WordPress snapshot cards idea to include the subjects, concepts, people, places etc that it returns.

Even though I recognise that analysis tools don’t always pick up on the finer points of text and lack the human understanding that is sometimes needed to make complete sense of it, I like what they can do, and I hope I can do something useful with MeaningCloud.

If you want to try it out, take a look at the demos and enter your own text into the box. The image below shows some of the results it gave me when I entered the text from this blog post. It threw up a couple of odd things: “begging” (request) and “boss” (head). But, as I say, if you are using it properly you can take the time to set up a dictionary to overcome these sorts of issues.

Screenshot of MeaningCloud analysis of blog post


Generating Blog Keyword Tags 2


I had another go at automating the tagging process for my blog using Yahoo Pipes, as I wanted to improve on my original idea, which was a bit scrappy.

So, I’ve reworked the pipe to pull out all of the keywords from all of the posts (using Category RSS feeds). The original pipe listed the blog post title and keywords associated with that blog post. The new pipe lists the most frequently used keywords in all of the blog posts. When the keyword is clicked on (in the RSS feed) it runs a search on that keyword and returns any blog post mentioning the keyword.

In the pipe I’ve manually filtered out certain irrelevant words eg ‘blog’, ‘amp’ and ‘doc’. As time goes on I’ll have to manually add more words.
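The pipe itself is built from Yahoo Pipes modules, but the counting and filtering step can be sketched in a few lines of Python. This is not the pipe's actual logic, just an illustration under some assumptions: each post is represented as a list of keywords that has already been extracted, the sample data is made up, and the function name `top_keywords` is hypothetical.

```python
from collections import Counter

# Words filtered out by hand, as in the pipe; more get added over time
STOP_WORDS = {"blog", "amp", "doc"}


def top_keywords(posts_keywords, limit=20):
    """Count keywords across all posts and return the most frequent ones."""
    counts = Counter()
    for keywords in posts_keywords:
        for keyword in keywords:
            # Treat "Yahoo Pipes" and "yahoo pipes" as the same keyword
            normalised = keyword.strip().lower()
            if normalised and normalised not in STOP_WORDS:
                counts[normalised] += 1
    # most_common sorts by frequency, highest first
    return counts.most_common(limit)


# Made-up sample data: one list of extracted keywords per blog post
posts = [
    ["Yahoo Pipes", "RSS", "blog"],
    ["yahoo pipes", "Tagxedo", "amp"],
    ["RSS", "Yahoo pipes", "doc"],
]

top = top_keywords(posts)
for keyword, count in top:
    print(keyword, count)
```

The `limit` argument defaults to twenty here simply to mirror the number of items WordPress.com will display.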

The only problem at the moment is that, even though the pipe returns an unlimited number of keywords, WordPress.com is limited to showing the first twenty items. I decided to compromise and call the feed ‘Top automatic Tags’. Unsurprisingly the most common phrase is ‘Yahoo pipes’.

You can see it on the right hand side of this blog (if Yahoo pipes is working, of course 😉 )

PS. I’ve not abandoned the original Tagxedo idea, but I need a bit more time to tinker with it.

Generating Blog Keyword Tags and Tagxedo Clouds with Yahoo Pipes


As I’ve been adding blog posts here I’ve noticed that the keyword tags are getting into a mess. So I’ve been thinking about what I could do to sort them out, either by getting the computer to do the tagging for me, or provide another way of presenting relevant keywords about specific blog posts to anyone who visits my blog.

As a first attempt (1), I decided to use Yahoo pipes and simply feed in the RSS feed of my blog, pull out keywords from each blog post and then create another RSS feed to be used anywhere. Visitors can view keywords for the last 10 blog posts, as I couldn’t get Yahoo pipes to go beyond 10. They can also click on a link to the blog post. The words/phrases pulled out aren’t perfect (as with any automated word extraction), but I think you get a good feel for the blog posts from them.

(NB: Click on ‘List’ to see the keywords and use the link back to the blog.)

At the same time, I was thinking about whether I could use something like a tag cloud generator to do what I wanted a bit more creatively. Having a look at Tagxedo, I realised that if you use the URL function on the home page to create a cloud you could actually build the URL yourself. So, I created a second pipe (2) that presents the same keywords, but also provides a clickable link that feeds through to Tagxedo and creates individual tag clouds for each entry.

(NB: Click on ‘List’ to see the keywords and use the link to generate the cloud.)

Ultimately, I did want to combine the two pipes, but I couldn’t get Yahoo pipes to create valid links to 2 places in the same RSS item.

I would have also liked the Tagxedo cloud to display in the RSS feed, but at the moment the link just creates a cloud from the RSS.

Hopefully there is a way to achieve both of these things, but as a first attempt I think they both work quite well, even if the RSS feeds/presentation do need a bit of tidying up.

The results of the embedded pipes can be found on my test site here. Links to the source code of the pipe can be found in (1) and (2) above. I’ve also added the RSS to the blog on the right hand side to see how people get on with it. The feed is labelled ‘Term Tags’.

Meaning in Twitter Hashtags


(NB: Originally posted on Library2.0.ning 11/12/09)

I find hashtags on Twitter a useful way of pulling all the information about a particular discussion together. However, sometimes I see a hashtag that is attached to an interesting tweet, but I haven’t got a clue what the background of the hashtag is. If I’ve not been involved in the thread/discussion/conversation from the beginning it can be difficult to back-track to where it was first used. Sometimes keywords are used as hashtags, which can give you a clue (but not always), but in some cases abbreviations are used. Phil Bradley raised the keyword/abbreviation issue recently in his blog. My issue is not necessarily with the conference hashtags Phil mentioned, though: the shorter any hashtag is, the more difficult it can be to work out what it is about.

Various hashtag search/retrieval sites have popped up as a sidekick to Twitter. They can retrieve tweets by hashtag/keyword, but you can’t always get the gist of a hashtag thread from them. Some of the sites concentrate on how the information is presented, which can be fun/interesting in itself, but isn’t always a useful search tool. For example, Trendsmap has an interesting way of presenting information – it displays hashtags/keywords on a map, based on the most common words used in tweets. However, what’s the point when you’ve got vague/nondescript words appearing on it, like ‘appear’, ‘whilst’ and ‘received’ (see below)? Nice to see ‘@serafinowicz’ in there though! He must be doing one of his Q&A sessions.

So, I’ve been trying to see if there’s a better service out there to achieve what I want.

I started playing with ‘Yahoo Pipes’ a couple of months ago. It’s a simple way of getting data from web sites. I realised I could get info out of Twitter via it and then tinker with that data. I created a pipe to search for a hashtag and I used the ‘Term Extractor’ module to pull out useful words in the tweets that mentioned the hashtag I’d searched for. The ‘Term Extractor’ did what it said on the tin – it pulled out words it thought were important. But I found that the terms it extracted weren’t that helpful, and I don’t know why it thought some words were more important than others. I also had an intermittent problem with accessing Twitter via the pipe – access to Twitter via this method is limited to a number of searches per hour. So, it worked in theory, but wasn’t ideal.

I also realised last week that @psychemedia had set up a Yahoo Pipe along the same lines, so I needn’t have struggled to work on it. However, saying that, I do get satisfaction working out these ideas for myself, so I hadn’t really wasted my time… all part of the learning curve.

Another alternative is to look on ‘Twubs’ for the hashtag, to see if someone has registered it. When a person registers the hashtag they can also give it a description and indicate other hashtags related to the one they’ve registered. However, this relies on people knowing the service exists and then registering the hashtag. With so many Twitter sidekick applications how can you expect everyone to know every application that is out there?

This leads me onto ‘TweetCloud’. I didn’t know it existed until about a week ago. See, I told you. I can’t keep up with all the Twitter apps! You can type in a hashtag and it gives you a tag cloud showing the most popular words used in tweets related to this hashtag. It’s more useful than the other options above, but it’s still not ideal. For example, I searched for ‘middlemash’ and as well as some useful words, TweetCloud gave me things like ‘anyone’, ‘nice’, ‘many’, ‘trying’, ‘really’. This isn’t helpful, but I think it’s the best I’m going to get at the moment.
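I don't know how TweetCloud works internally, but a rough sketch of the general idea shows why words like these creep in, and how a hand-made exclusion list would deal with them. Everything here is an assumption for illustration: the tweets are made up, the `COMMON_WORDS` list is just a starting point, and `cloud_terms` is a hypothetical name. It strips links, @mentions and the searched hashtag itself before counting what is left.

```python
import re
from collections import Counter

# Hand-picked filler words to ignore; a real list would be much longer
COMMON_WORDS = {
    "anyone", "nice", "many", "trying", "really",
    "the", "a", "to", "at", "is", "for", "and",
}


def cloud_terms(tweets, hashtag, limit=10):
    """Return the most frequent meaningful words in tweets about a hashtag."""
    tag = hashtag.lower().lstrip("#")
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        text = re.sub(r"https?://\S+", " ", text)  # drop links
        text = re.sub(r"@\w+", " ", text)          # drop @mentions
        # Keep alphabetic words of two or more characters
        for word in re.findall(r"[a-z][a-z']+", text):
            if word != tag and word not in COMMON_WORDS:
                counts[word] += 1
    return counts.most_common(limit)


# Made-up tweets for illustration only
tweets = [
    "Really nice mashup talk at #middlemash http://example.com/a",
    "Anyone trying the library mashup demo? #middlemash @someone",
    "Many library data ideas at #middlemash",
]

terms = cloud_terms(tweets, "#middlemash")
for word, count in terms:
    print(word, count)
```

An exclusion list like this only removes noise, though. It still can't tell you whether a word is a person, a place or a subject, which is where a proper term extractor would come in.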

In the long run it would be great if someone could create a way of feeding tweets mentioning hashtags into a decent term extractor site/API – one that understands what is and isn’t a useful word or phrase and can categorise them into different types of term, eg person, place. There are a few services available out there already that can be used for this sort of process – AlchemyAPI, Zemanta, Term Extractor by LCL and Terminology Extraction by Translated .net Labs. Both AlchemyAPI and Zemanta provide APIs. I’ve not tinkered with them long enough to be certain, but they do appear to be better than the other options mentioned earlier (ie Yahoo Pipes ‘Term Extractor’, TweetCloud, Twubs, Trendsmap). I copied and pasted a good portion of results from twapperkeeper/middlemash into AlchemyAPI and you can see the results below.

I’d prefer to be able to do this in a more user-friendly way, with a decent interface that presents results in a way that relates to Twitter. I imagine it won’t be long before someone with more brains than me comes along and achieves what I’m after. I can wait, but in the meantime it’s on my ‘Projects I’d like to have a go at, but might never complete’ list.