Tinkering in March 2010

I’m quite pleased with what I’ve achieved in March when it comes to tinkering outside of my day job. Most of the things are only small and uncomplicated, but I’ve either learnt from what I’ve done (no matter how little use or interest to others it is) or I’ve achieved something practical.

(1) Set up an events RSS feed for the Library service Twitter account. We had a general Council events RSS feed, so I had a look at how it was structured and realised I could pull out the library events as a separate feed via Yahoo pipes.

(2) Put together a ‘We Love Public Libraries’ page. It uses a Flickr slideshow and a Collecta widget that looks for mentions of the phrase. I’m going to expand it to include other relevant phrases too, as it’s not picking up as many positive vibes about public libraries as I’d like 😉

(3) Using Run Basic I put together the basics of “Where’s My Chuffin’ Train?”. Put in your train details and it gives you a few lame excuses as to why it’s late. I need to do it properly, work on the presentation and add signs/symbols. Pointless I know, but I can’t help it.

(4) Using Run Basic I put together a basic URL converter to feed URLs of book searches from the Library catalogue to Owen Stephens’ “Read To Learn” project. “Read To Learn” suggests courses you might be interested in studying if you were reading a particular non-fiction book or range of books. We were both interested to see if it could be of use from a public library point of view, along the lines of… public library users might be interested in studying if they could find courses that related to a book they were interested in. My bit of programming needs tidying up. The code is basically there: it converts the URL and passes it to “Read To Learn”. It just looks dull and I don’t present the returned courses properly.

(5) As part of the Celebrating Surrey event in June, the library team I’m in thought it would be good to do something to support it from a Web 2.0 point of view. So we have started putting together a “Surrey Fiction Bookmap” using Google My Maps. It shows locations that are mentioned in works of fiction. It’s early days and we’re adding to it as we go along. It will probably also include locations associated with fiction authors. Each of the books mentioned links back to our catalogue and has a snippet about the book in the popup.

(6) Working on a term extractor using Run Basic.
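For anyone curious about item (1), here is a rough Python sketch of the general idea of pulling one kind of event out of a bigger RSS feed. It is not the Yahoo pipe itself. It assumes the library events can be spotted by a keyword in the item title or description, and the sample feed and the function name `filter_rss_items` are made up for illustration; the real feed may need filtering on a different field.

```python
import xml.etree.ElementTree as ET


def filter_rss_items(rss_xml: str, keyword: str) -> str:
    """Return the RSS document keeping only items that mention the keyword.

    Assumption: matching is a simple case-insensitive search of each item's
    title and description. The real feed might need a different field.
    """
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    needle = keyword.lower()
    # Copy the list first, because items are removed while looping.
    for item in list(channel.findall("item")):
        text = " ".join(
            (item.findtext(tag) or "") for tag in ("title", "description")
        ).lower()
        if needle not in text:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


# Made-up sample feed, standing in for the general Council events feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Council events</title>
<item><title>Storytime at the library</title><description>For under fives</description></item>
<item><title>Planning committee</title><description>Public meeting</description></item>
</channel></rss>"""

library_feed = filter_rss_items(SAMPLE_FEED, "library")
print(library_feed)
```

The output is still a valid RSS document, so it could be handed to anything that reads a feed, such as a service that posts new items to Twitter.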

I had a bit of a frustrating time at the beginning of the month, not being able to decide what to do/how to focus on things, but with a bit of advice from @chibbie and @ostephens I’ve learnt that it’s best to go for small goals and release things into the wild even if they aren’t perfect. Thanks for the advice.

Literary Twist Project and Run Basic

After tinkering with my Literary Twist Yahoo pipe (put in a book synopsis and it turns it into a synopsis for a horror novel) I’ve decided that it doesn’t work. Well, it sort of works when it finds any relevant words. The problem is, it relies on the common words appearing in the text that is entered into the synopsis text boxes, and after testing it for a bit I’ve decided that this method isn’t good enough. Even though I methodically chose the words that would occur frequently enough, it seems that synopsis writers don’t like to write using common words 😉

I’m going to try a different method now – Yes, I know that this project has no practical use in the world, apart from amusing myself, but it’s a challenge to see if I can get it working in the way I wanted.

 

(c) Tomas Rotger (Flickr)

I’ve now worked out that what it needs to do is basically identify the most common words in any text that is entered into the text box (rather than common words in general) and twist or replace them in a way that makes sense, but also gives the horror aspect to the new words.

I realise I can’t do this with Yahoo pipes – it’s just too complicated to do it that way for my brain. I find Yahoo pipes is fine for simple jobs, but it sometimes just stalls and sputters into lifelessness if I make anything too complicated.

So, I’m currently using Run Basic to try and achieve this. As the name suggests it’s a language based around Basic – no sniggering! Basic is embedded in my brain and I will champion this favourite language of 80s school boys until you mock me so much that I curl up into a ball and cry. The good thing is that it’s server based, so you can create dynamic web pages from it. I’ve used PHP to create dynamic web pages in the past, but if I don’t use it for a while I forget the syntax/methods, etc. PHP also tended to go wonky on me when I upgraded browsers as my programming was less than standard. By contrast, as I spent years programming in Basic, Run Basic was easy to pick up. Run Basic also allows you to parse XML, manipulate files, and use HTTPGET, HTTPPUT functions, as well as other useful things.

So the first thing I’m doing on this new plan is to put together a term extractor and word count. It’s not quite there, but I’m more confident of cracking it with Run Basic than with anything else. I won’t let it beat me, no matter how useless the result is!
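To show what a term extractor and word count boils down to, here is a minimal sketch in Python rather than Run Basic. It is only an illustration of the idea, not the Run Basic program. The stop-word list, the function name `extract_terms` and the sample synopsis are all assumptions made up for the example.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stop-word list so that "the", "and" and
# friends don't swamp the results. A real list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "his", "her", "that"}


def extract_terms(text, top_n=5):
    """Return the top_n most frequent non-stop-words as (word, count) pairs."""
    # Lower-case the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up synopsis text for illustration.
synopsis = (
    "The detective follows the killer. The killer watches the detective, "
    "and the detective waits."
)
terms = extract_terms(synopsis, top_n=2)
print(terms)
```

The words that come out on top are the ones worth twisting, because they are the ones that actually appear in the text that was entered.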

Share it with RDA

One of the key points behind RDA is being able to re-use bibliographic data held in library management systems, outside of the system. If you release this data into the wild, it’s likely that someone else will come up with an interesting and innovative way of using it, way beyond its original purpose on the library system.

In the past, library communities have managed to share data between their systems fairly successfully – as long as you catalogued your stock according to the rules. We achieved this sharing process through the use of MARC formats.

Unfortunately, I think the use of MARC formats, specifically MARC21 (the dominant MARC format in the English language speaking world), will be the thing that undoes the RDA plan to share data outside of the catalogue.

MARC21 records are stuffed full of punctuation that will need to be stripped out before you can share the data. There’s no doubt that this can be done – if you added the punctuation in the first place, based on rules, you should be able to strip it all out again. However, it would be a lot more helpful when going down the “Let’s open this data up to the world” route if we didn’t have to do this. Why should users of the data have to frustrate themselves with this process?
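To give a feel for the clean-up involved, here is a small Python sketch that strips trailing separator punctuation from the subfields of a title field. It is a simplification under stated assumptions: the field is represented as a plain dictionary of subfield code to value (not the format of any real MARC library), the sample record is made up, and only trailing `/ : ; = ,` are removed. A trailing full stop is deliberately left alone, because it might belong to an abbreviation or an initial.

```python
import re

# Trailing separators that commonly end a subfield: " /", " :", " ;", " =", ","
TRAILING_PUNCTUATION = re.compile(r"\s*[/:;=,]\s*$")


def strip_trailing_punctuation(value: str) -> str:
    """Remove one trailing separator (and surrounding spaces) from a subfield."""
    return TRAILING_PUNCTUATION.sub("", value).strip()


# Made-up title field: subfield code -> value as it might be keyed in.
title_field = {
    "a": "Great expectations :",
    "b": "a novel /",
    "c": "Charles Dickens.",
}

clean = {code: strip_trailing_punctuation(value) for code, value in title_field.items()}
print(clean)
```

Even this toy version has to make a judgment call about the full stop, which is exactly the sort of work that users of the data shouldn’t be left to do.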

So, now RDA is with us, isn’t it time to look at MARC21 and do something about this barrier to sharing data?

RDA for the People

I’ve known RDA was coming along for us cataloguer librarian types. I knew what it was all about, but until recently I’ve been trying to work out why it’s so important and what difference it will make. How will it change what I’m doing in my day job? I know you’ve got the practical side of things – how MARC cataloguing on our library system ties in with the new RDA rules – but apart from this, what’s the big deal?

I think the big deal and most important thing for me, is that it puts into words and agrees rules on what a lot of cataloguers have been trying to do for some time now – providing information in catalogue records with a focus on helping users of the catalogue, rather than as an academic and formal exercise in cataloguing.

Whenever I’ve been cataloguing over the past 15 years, it’s always been a case of “this is cataloguing – it needs to fit the standards set out in MARC and AACR2, but it also needs to give the users what they want.”

I know I’m not unique in this situation – other cataloguers recognise that MARC needs to be tweaked, based on the information you know the users will make use of and how individual library systems work.

In recent years there’s been a change in the way we look at cataloguing – defining the purpose from a different angle, acknowledging that the information world is dominated by internet search and presentation and shifting accordingly. We needed to give the users a way of searching/interacting with the catalogue in a way they’re familiar with.

This is why, for me, the most important parts of the RDA changes are:

(1) Recognising users’ needs, including the type of information they want to see, e.g. fiction genres

(2) Focussing on keyword search styles

(3) Presenting information in a human readable form – no longer inverting subject headings and moving away from abbreviations

(4) Display issues

(5) Reducing the need for editing of data.

Now that RDA says it’s okay to focus on the users’ needs, I can sleep soundly in my bed and not worry about whether I’m offending another cataloguer by using the incorrect form of an abbreviated inverted main title entry, with trailing responsibility codes, or not!

PS. I just made up the ‘abbreviated inverted main title entry, with trailing responsibility codes’ statement for illustration and you don’t have to worry that it doesn’t make sense. It would only be us cataloguers who’d be able to tell I was talking rubbish anyway 😉

Interaction, RFID and the British Music Experience

I visited the British Music Experience at the O2 arena in Greenwich today and was really impressed. Not just by the fantastic amount of great music that has come out of Britain since the 50s, but also by the way they presented it at the exhibition.

Each room covered a particular period or style of music and each room used a combo of presentation/interaction styles. Most of the images were projected in some way or displayed on a screen.

(1) Juke boxes allowed you to choose different genres of music and gave you some background information about the music you’d chosen.

(2) Fretboard/keyboard style input allowed you to find out more information about memorabilia held in glass cabinets.

(3) Trackballs could be used to control news time lines – images related to particular news items were displayed on a wall and you could find out more details by skimming over them.

(4) Projected images that responded to touch and ran through more detailed documentaries.

Part of my bookmarked 1975 timeline

My personal favourite was the large map of Britain, which was projected onto a stage area in the main room. Here you could use one of 3 separate track balls to move a circular cursor over the map. On a smaller projection a few feet wide, nearer your trackball, you’d be shown information about musicians associated with that location on the map.

As well as these clever ways of presenting the information, your entrance ticket was also an RFID-enabled smart ticket. At many of the information points you could scan your ticket over a sensor and bookmark the information you were looking at/listening to. Then, when you take your ticket home, you can type the ticket number into the British Music Experience website and you’re shown the information you bookmarked in the exhibition. It’s a permanent record of the bits of the exhibition that you found the most interesting.

I can’t help thinking that it would have been good if the website provided you with further details about the areas you were interested in based on your ticket number, rather than just showing you the information you saw at the exhibition – maybe pointing you to other websites related to this music genre/band. Following on from this, I wonder if libraries could do a similar thing: recognise, when a user logs in to the library catalogue, that they had recently read a particular book on a particular subject, and work out via some clever algorithms that they might be interested in further information on a related web site.

Another minor criticism of the exhibition was the inability to search for specific musicians/bands. Browsing is great, but if you have a particular interest in a specific musician you might want to know if they are mentioned in the exhibition at all, and if they are, in which room.

It was well worth the visit and the way it was organised meant that you could personalise the exhibition according to your own musical interests, by either ignoring, skimming, exploring in detail and/or bookmarking the resources that were there. I’d definitely recommend you visit, if you are in any way interested in popular music produced in Britain in the past 60 years.

Literary Twist Update

I mentioned in my Tinkering Day post that I’d made some progress on the Literary Twist project. I thought it might be interesting for others to see what I’d done/how I’d done it.

Well, I’ve sort of done what I wanted on tweaking the words, but at the same time it’s obvious that what I wanted to do wasn’t enough to make it as entertaining as I wanted. 😦

I basically got a list of commonly used words – I looked for a few sites that covered this to get an aggregated group of words… just to make sure I was replacing the best set of words. Then, using Google docs I pulled data from tables in websites into a spreadsheet, rather than retyping the info (Tony Hirst wrote a blog post about doing this). Sometimes, because the words weren’t in a table, I had to copy/paste the data into the spreadsheet. The data was a bit scrappy, as it came through to the spreadsheet in a variety of formats. Google spreadsheets doesn’t have a regex function and I didn’t want to do hundreds of manual find/replaces, so I fed it into a Yahoo pipe to clean it up, using regex.

I output the clean file as csv and imported it into Excel, so I could get a count on the number of times specific words appeared. This helped me decide which words I’d do the find/replace on later on. I also needed to look at a few dictionary sites to make sure I only replaced words that belong to a single word class (i.e. only an adjective, only a noun or only a verb) – a word that can be more than one messes up the syntax/form of the sentence.
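The clean-up and counting steps could be done in one go in a few lines of Python. This is only a sketch of the same idea, not what the Yahoo pipe and Excel actually did. The three scrappy word lists are made up to stand in for the lists gathered from different sites, and `clean_word` is a hypothetical helper name.

```python
import re
from collections import Counter

# Made-up word lists in a variety of scrappy formats, standing in for the
# data pulled from different websites.
raw_lists = [
    ["1. The", "2. Of", "3. Dark "],
    ["the", "DARK", "night,"],
    ["  the;", "night", "of"],
]


def clean_word(entry):
    """Lower-case the entry and keep only letters and apostrophes.

    This drops rank numbers, whitespace and stray punctuation in one pass.
    """
    return re.sub(r"[^a-z']", "", entry.lower())


# Count how many times each cleaned word turns up across all the lists.
counts = Counter(clean_word(entry) for word_list in raw_lists for entry in word_list)
print(counts.most_common())
```

Words that show up in several of the source lists float to the top, which makes them the safest candidates for the find/replace step.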

Then I created a new Yahoo pipe, which had 2 text input boxes for title & synopsis. I added find/replace modules & manually entered words that needed to be replaced, along with text that replaced it.
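The find/replace stage can be sketched in Python like this. It is an illustration under assumptions, not a copy of the pipe: the swap words in `HORROR_SWAPS` are invented for the example, and the sketch only replaces whole words so that “friend” doesn’t get mangled inside “friendly”.

```python
import re

# Invented replacement pairs for illustration: common word -> horror word.
HORROR_SWAPS = {"house": "crypt", "friend": "ghoul", "walk": "stalk"}


def twist(text, swaps=HORROR_SWAPS):
    """Replace whole words found in swaps, keeping a leading capital letter."""
    # \b word boundaries stop "friend" matching inside "friendly".
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, swaps)) + r")\b", re.IGNORECASE
    )

    def replace(match):
        word = match.group(0)
        new_word = swaps[word.lower()]
        return new_word.capitalize() if word[0].isupper() else new_word

    return pattern.sub(replace, text)


twisted = twist("Her friend left the house. Friendly neighbours walk by.")
print(twisted)
```

Matching whole words only is a design choice: it misses plurals and other word forms, so a fuller version would need extra entries for those.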

Werewolf by Schnaars (Flickr)

Still, at this stage, some words didn’t work. Some of the replacement words didn’t work either. This is partly because I hadn’t thought too much about the type of text that would work with the replacement. For example, synopses seem to talk more about ‘he’, ‘she’, ‘it’, rather than ‘I’, ‘me’ and this affects the way that you need to deal with the whole word replacement style.

I also worked out that, even though it’s a good idea to replace common words (because you’ve got a better chance of hitting words that can be replaced out of the 171,476 words in current use in the English language, according to the Oxford English Dictionary), most synopses actually try to avoid the cliche/common words.

It still needs tweaking and its presentation still needs prettifying (or horrifying ;-)), but here’s the pipe for people to have a look at. All you need to do is enter a book title and synopsis into the boxes. I’d be interested in the output from anything you paste into the pipe, as I’d like to see how the pipe works on a wide variety of synopses. Maybe anyone who uses it could cut/paste the output of the pipe as a comment to this post. I know it needs work.

Get Involved in the Revolution

I enjoyed watching the BBC’s ‘Virtual Revolution’. It filled in gaps in my knowledge about how things have developed since the early days of computer networks. It was also interesting to see things from an information society perspective as well as a techy one.

The series was developed with the help of the common man/woman. The BBC announced it back in Summer 2009, asking for people to contribute to its development. I thought this was a great idea, re-tweeting their calls for input into the series and it was fun/exciting watching it develop over the months.

However, when I went back to look at the blog post feedback on the BBC site and remembered the tweets I picked up around the series, I was really surprised at how little input was added by people outside the BBC. I know the programme makers also went beyond blog posts and Twitter feedback, including forums and discussion groups, but it still seemed like a minuscule response from the people. It got me quite frustrated, as I’d expected at least a decent % of the internet world to get involved in this discussion. People had the chance to shape the programme and they didn’t take the opportunity.

I don’t know why. Maybe they hadn’t picked up on the fact they could contribute. Maybe they just didn’t want to, or couldn’t be bothered. Maybe they didn’t feel it was their place to get involved. Maybe they thought their opinions would be laughed at.

It’s a shame really, as I think a lot of people missed the chance to get involved, get their useful opinions/viewpoints heard and also, in a way, allow the social networking they get involved in over the internet to go beyond the computer and out to a broader audience on TV.