Generating Blog Keyword Tags 2

I had another go at automating the tagging process for my blog using Yahoo Pipes, as I wanted to improve on my original idea, which was a bit scrappy.

So, I’ve reworked the pipe to pull out all of the keywords from all of the posts (using Category RSS feeds). The original pipe listed the blog post title and keywords associated with that blog post. The new pipe lists the most frequently used keywords in all of the blog posts. When the keyword is clicked on (in the RSS feed) it runs a search on that keyword and returns any blog post mentioning the keyword.

In the pipe I’ve manually filtered out certain irrelevant words, e.g. ‘blog’, ‘amp’ and ‘doc’. As time goes on I’ll have to manually add more words.
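For anyone who would rather see the idea as code than as a pipe, here is a minimal sketch in Python of the same counting and filtering. It is not the pipe itself: the function name, the sample posts and the default limit of 20 are assumptions made up for illustration, and only the three stop words come from the post.

```python
from collections import Counter

# Stop list: the post names only 'blog', 'amp' and 'doc'; add more as needed.
STOP_WORDS = {"blog", "amp", "doc"}


def top_keywords(keywords_per_post, limit=20):
    """Count keywords across all posts, skip stop words, return the most common."""
    counts = Counter()
    for keywords in keywords_per_post:
        for keyword in keywords:
            # Normalise case and whitespace so 'Yahoo Pipes' and 'yahoo pipes' match.
            normalised = keyword.strip().lower()
            if normalised and normalised not in STOP_WORDS:
                counts[normalised] += 1
    return counts.most_common(limit)


# Made-up sample data: one list of keywords per blog post.
posts = [
    ["Yahoo Pipes", "blog", "RSS"],
    ["yahoo pipes", "Tagxedo", "amp"],
    ["RSS", "Yahoo Pipes", "doc"],
]
print(top_keywords(posts, limit=2))
```

Keeping the stop words in a set means adding a new one is a one-word change, which mirrors adding another filter rule in the pipe.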

The only problem at the moment is that, even though the pipe returns an unlimited number of keywords, the feed display is limited to showing the first twenty items. I decided to compromise and call the feed ‘Top automatic Tags’. Unsurprisingly the most common phrase is ‘Yahoo pipes’.

You can see it on the right hand side of this blog (if Yahoo Pipes is working, of course 😉 )

PS. I’ve not abandoned the original Tagxedo idea, but I need a bit more time to tinker with it.

Middlemash Video Thing

Some time ago I started thinking about putting some kind of tribute (I suppose you’d call it that!) to the Mashed Libraries Middlemash event held at the end of last year. I wanted to do something that used the tweet stream around the event, but tweaked the information in a way that referred to the original event, but not necessarily in an obvious way. So I put an audio visual thing together. It’s a bit “Hmm! I stroke my chin and nod sagely while watching it!”, but I wanted to do something different. 😉

What did I do?

  • Identified the top 11 tweeters/presenters at the event. (the 11th was so close to the 10th that it seemed unfair to leave them out!)
  • Fed their individual tweet stream into Tagxedo, using their Twitter photo as the Tagxedo image outline.
  • Added the image that had been generated in Tagxedo to Audiopaint to create some digital noise.
  • Visually scanned each image for the most dominant words and typed them into ‘Let Them Sing It For You’. This software creates vocals from the words you’ve entered by using samples from existing songs.
  • Created a short video clip for each person using the images and audio that had been generated from their tweets.
  • Mixed all the videos together.

It is a bit odd, but it was fun to do – it gives the whole event a mad cyber-info twist, with images of the people built from the words they tweeted! I think most of them turned out well, but my personal favourite is probably Paul Stainthorp’s (with Middlemash plastered across his eyes like the robot in “The Day The Earth Stood Still”). It’s also fun trying to identify which songs are used in the vocals.

Twitter follower/friend map

I seem to be getting into the swing of things with Yahoo Pipes at the moment and I seem to be creating lots of maps. Every time I use it, something else clicks in my head and puts a smile on my face. Yesterday, Aaron Tay asked me if I knew how to create a Twitter followers or friends map. I didn’t, but I thought it would be a good way to see if I could get to grips with some of Twitter’s APIs and also if they’d play more nicely with Yahoo pipes than previously. It was also nice to be asked by someone else to do something like this – my own projects seem to be a bit self-centred, so being able to do something useful for someone else made a nice change.

The Twitter API lets you pull out details of a user’s friends/followers. It returns them as Twitter ID numbers only, but by creating a URL with an ID added to it, you can pull out the full details for that account. You can use a programming language to do this too, but if it goes into Yahoo Pipes I’d rather do it there. Once you’ve got this, you can narrow the info down to the various bits you need. In my case I wanted biography details, location, photo and a link to the Twitter profile.

In summary, I had to:

(1) Create user input boxes for ‘username’ and to identify if the map was for ‘followers’ or ‘friends’. This means anyone can enter their own user details, not just me.

(2) I then had to build a url to point to the Twitter API and include the detail in (1).

(3) This url then fetched the details of the user’s followers or friends, i.e. their ID numbers only.

(4) I then built another url using the IDs, to fetch full details of every follower or friend of the user.

(5) Each user’s profile contains a location field and if you put this into the ‘location builder’ module it extracts a very detailed geographic location. Pretty impressive, considering some users only give the vaguest of details. It’s not perfect though, as, for example, @therealwikiman is mapped to the USA, even though his location info is detailed. As he’s really based in England, I imagine the commute in the morning is a bit of a nightmare. 😉

(6) From various fields in each profile I then built a description that contained Twitter image, biography and location in text.

(7) I also added a link to each of their Twitter home pages.

(8) Finally I mapped all of the data to standard RSS/map data fields (title, link, description, y:location). When Yahoo pipes works with data it changes field names to reflect what it’s done to the data, so you need to change them to a format that is recognised.

(9) I connected it to the pipe output.
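The same steps can be sketched outside Yahoo Pipes. The Python below is a rough illustration only: the base URL and endpoint paths are placeholders (the post does not give the real Twitter URLs), the profile field names are hypothetical, and the location is passed through as plain text where the pipe would use the output of the ‘location builder’ module.

```python
from urllib.parse import urlencode

# Placeholder base URL and paths: stand-ins, not real Twitter endpoints.
API_BASE = "https://api.example.invalid"


def ids_url(username, relation):
    """Steps 1-3: URL that lists the ID numbers of a user's followers or friends."""
    if relation not in ("followers", "friends"):
        raise ValueError("relation must be 'followers' or 'friends'")
    query = urlencode({"screen_name": username})
    return f"{API_BASE}/{relation}/ids?{query}"


def profile_url(user_id):
    """Step 4: URL that fetches the full profile for one ID number."""
    query = urlencode({"user_id": user_id})
    return f"{API_BASE}/users/show?{query}"


def to_item(profile):
    """Steps 6-8: map profile fields onto title, link, description, y:location."""
    name = profile["screen_name"]  # hypothetical field names throughout
    description = (
        f'<img src="{profile["image"]}"> {profile["bio"]} ({profile["location"]})'
    )
    return {
        "title": name,
        "link": f"https://twitter.com/{name}",
        "description": description,
        # In the pipe this holds the geocoded result; here it is just the text.
        "y:location": profile["location"],
    }


# Made-up profile used only to show the output shape.
example = to_item({
    "screen_name": "someone",
    "image": "https://example.invalid/photo.png",
    "bio": "Librarian and tinkerer",
    "location": "Surrey, England",
})
print(example["title"], example["link"], example["y:location"])
```

Renaming the fields at the end is the part that matters for the map: the output only becomes a map when each item carries a field the viewer recognises as a location.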

Twitter follower and friends map

When it ran, because it saw the field ‘item.y:location’ in there, it automatically displayed the information as a map, which you can see here. You can also add your own user info into the search box and create your own map. (NB: Sometimes Yahoo pipes & Twitter don’t play nicely together. If you have a problem with this pipe and have a Yahoo account, try copying the pipe and adding your own information into the search boxes.)

One thing I would like to get to grips with in Yahoo pipes is to be able to embed the output of a pipe into a web page and also allow users to add their own input on the same page, but I’ve not cracked that yet. So, if anyone else can help me with that side of things it would be appreciated. Thanks.

A Travellers Map in Yahoo Pipes

Before putting together the Surrey Fiction Book Map for work I was considering the possibility of creating a map of the world that would link from markers to Surrey Libraries’ catalogue. I didn’t fancy creating it manually and I was sure it could be done via Yahoo Pipes. However, at the time I hadn’t used Yahoo Pipes in this way before, so I didn’t follow through with the idea.

Now I’ve had a bit more time to think about it, I’ve managed to put something together using a spreadsheet version of our Subject index and Yahoo Pipes.

Firstly, the spreadsheet contains all of the information I need – text description of the location, plus the sub topic (eg Travel; history; etc). It also contains the Dewey number and our Reader Interest Categories (RIC). In Surrey the RIC is used to shelve our stock by subject area – helping to bring together related stock that would otherwise be separated.

Section from Subject Index spreadsheet

I created a Yahoo pipe that pulled in the spreadsheet information.

It then filtered the subject headings based on the ‘NewRIC’ column, removing any subject headings that weren’t location-based. In the above example you can see some subject headings in the original source file that it excluded eg Aramaic Language; Arboretums; Archery.

The pipe combined the Heading/Subheading fields (so they appeared in the title) and the RIC and Dewey number (so they appeared in the description). It’s a librarian thing I do to scare off the public 😉

I also fed the title field into the ‘Location builder’ module and it did a pretty good job of identifying the map locations mentioned. It did have some problems, as you can see from the fact that “War of the Roses” has been mapped to just off the Australian coast! This was due to the fact that some of the text wasn’t precise. I’m correcting these issues gradually, as there are over 800 items to check.

War of the Roses, just off the coast of Australia!

Finally I created a link from each marker pin back to the library catalogue. As the subject index contained Dewey numbers I could add this information to each link via the String builder module. The link basically acts as a catalogue search.
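As a rough picture of what the pipe does with each spreadsheet row, here is a minimal Python sketch of the filter, the combined title and description, and the catalogue link. Apart from ‘NewRIC’, everything named here is an assumption: the other column names, the RIC code that marks a location-based heading, the catalogue address, its ‘dewey’ parameter and the sample rows are all invented for illustration.

```python
from urllib.parse import urlencode

# Hypothetical values: the post gives neither the catalogue address
# nor the 'NewRIC' codes that mark a location-based heading.
CATALOGUE_SEARCH = "https://catalogue.example.invalid/search"
LOCATION_RICS = {"TRAVEL"}


def build_markers(rows):
    """Filter rows by NewRIC, then build a title, description and link for each."""
    markers = []
    for row in rows:
        if row["NewRIC"] not in LOCATION_RICS:
            continue  # drop headings that are not location-based
        # Heading and Subheading go in the title (also fed to the geocoder).
        parts = [row["Heading"], row["Subheading"]]
        title = ": ".join(part for part in parts if part)
        markers.append({
            "title": title,
            # RIC and Dewey number go in the description.
            "description": f'RIC: {row["NewRIC"]}; Dewey: {row["Dewey"]}',
            # The link acts as a catalogue search on the Dewey number.
            "link": f'{CATALOGUE_SEARCH}?{urlencode({"dewey": row["Dewey"]})}',
        })
    return markers


# Made-up rows in the shape of the subject index spreadsheet.
rows = [
    {"Heading": "Argentina", "Subheading": "Travel", "NewRIC": "TRAVEL", "Dewey": "918.2"},
    {"Heading": "Archery", "Subheading": "", "NewRIC": "SPORT", "Dewey": "799.32"},
]
for marker in build_markers(rows):
    print(marker["title"], marker["link"])
```

Filtering before building anything else keeps the non-location headings out of the geocoding step, which is where the odd map placements come from.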

If you’re interested you can take a look at it here.

As a next stage I need to tidy up the subject index, so it maps more accurately and removes subject headings that I can’t map correctly.

It would also be useful to be able to present the map so it is less tightly packed and maybe add a location search too. Maybe with some location images, as well.

Also, if you do want to know what each part of the pipe does in detail, feel free to ask.

Is Data Scraping Naughty?

Whilst tinkering, I’ve been doing a bit of data scraping, i.e. automatically pulling out bits of information from web pages and re-using them. I’ve been a little concerned about this, because I’m not sure if it has any impact on the system that provides the information I’m scraping from. NB: I’m not going in and pulling out tons of information. I only do it against specific queries to pull out a handful of web pages, which I then manipulate, and I do it infrequently.

I’m not so much concerned about the ethics of using scraped data – I’m not branding it as my own, or making money from it. In fact, 99.99% of the time I’m the only person who sees it/uses it. I’m just presenting it in a way that is more useful to me. I am really just concerned about the impact my data-scraping has on the server that is hosting the web page I’m scraping.

Data Scraping Guilt Complex Flowchart

I can see that it might have an impact on my host server if I’m pulling out lots of information. In fact, I’ve got into a bit of trouble doing this using RunBasic, as I stupidly hadn’t thought about the strain it was putting on my host server when I kept testing something online. (I’ve reverted back to running scraping via RunBasic on my own PC now!) As well as RunBasic, I’ve been using Yahoo pipes to data scrape.

Looking at how the information comes into the systems, it seems that it just calls up the web page I want and caches it off-site, and the manipulation goes on off-site too, so it can’t have an impact on the host server that the data originally came from. It seems it’s the same as if I called up a web page normally (via the address bar or a search), looked at the source code, copied it to notepad and tweaked it there. Is this right or wrong? I’m happy to be re-educated in a way that doesn’t sound patronising or rude. 😉
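This doesn’t settle the question, but the “fetch once, cache, then tinker with the copy” idea is easy to show in code. The Python below is a minimal sketch, not what Yahoo Pipes or RunBasic actually do internally: the class name, the one-second default gap and the fake fetch function are all assumptions. The point it illustrates is that only an uncached page costs the remote server a request; everything done to the cached copy happens locally.

```python
import time


class PoliteFetcher:
    """Fetch each URL at most once and pause between real requests."""

    def __init__(self, fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch              # function that actually downloads a URL
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_request = None

    def get(self, url):
        # A cached page costs the remote server nothing.
        if url in self._cache:
            return self._cache[url]
        # Otherwise wait until enough time has passed since the last request.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        body = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = body
        return body


# Stand-in for a real download so the sketch runs without a network.
requests_made = []

def fake_fetch(url):
    requests_made.append(url)
    return f"<html>page for {url}</html>"

fetcher = PoliteFetcher(fake_fetch, min_interval=0)
fetcher.get("https://example.invalid/a")
fetcher.get("https://example.invalid/a")  # served from the cache
print(len(requests_made))
```

Repeated test runs against the same page then hit the cache rather than the site, which is the situation described above where repeated online testing caused trouble.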

I’ve read around this a bit and some people suggest it does have more of an impact and others say it doesn’t.

So, if anyone can say for definite and explain it in, words of, ummm!… 4 syllables or less I’d appreciate it. Thanks.