I mentioned in my Tinkering Day post that I’d made some progress on the Literary Twist project. I thought it might be interesting for others to see what I’d done/how I’d done it.

Well, I’ve sort of done what I wanted on tweaking the words, but at the same time it’s obvious that what I wanted to do wasn’t enough to make it as entertaining as I wanted. :-(

I basically got a list of commonly used words – I looked at a few sites that covered this to get an aggregated set, just to make sure I was replacing the best set of words. Then, using Google Docs, I pulled data from tables on websites into a spreadsheet, rather than retyping the info (Tony Hirst wrote a blog post about doing this). Sometimes, because the words weren’t in a table, I had to copy/paste the data into the spreadsheet. The data was a bit scrappy, as it came through to the spreadsheet in a variety of formats. Google spreadsheets doesn’t have a regex function and I didn’t want to do hundreds of manual find/replaces, so I fed it into a Yahoo Pipe to clean it up, using regex.
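For anyone who’d rather do the clean-up in code than in a pipe, here’s a rough Python sketch of the same idea. The sample entries and the `clean_word` name are made up for illustration, and the patterns are my guess at the kind of mess you get from scraped word lists (rank numbers, stray punctuation, odd spacing), not the actual regexes from the pipe.

```python
import re

def clean_word(raw):
    """Reduce one scraped entry to a bare lowercase word, or None if nothing is left."""
    # Drop a leading rank number such as "12." or "3)".
    text = re.sub(r"^\s*\d+[.)]?\s*", "", raw)
    # Strip everything that is not a letter, apostrophe or hyphen.
    text = re.sub(r"[^A-Za-z'-]+", "", text)
    text = text.lower()
    return text or None

# Hypothetical sample of scrappy scraped entries.
raw_entries = ["1. The", " 2) of ", "AND,", "  ", "3. don't"]
cleaned = [w for w in map(clean_word, raw_entries) if w]
print(cleaned)
```

Entries that end up empty are dropped, so blank rows from a copy/paste don’t make it into the cleaned list.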

I output the clean file as CSV and imported it into Excel, so I could get a count of the number of times specific words appeared. This helped me decide which words I’d do the find/replace on later on. I also needed to look at a few dictionary sites to make sure I replaced words that could only be used as one class of word (i.e. adjective, noun or verb) – a word that can be more than one messes up the syntax/form of the sentence.
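The counting and filtering step could be sketched like this in Python. Everything here is an assumption for illustration: the `rank_candidates` name, the `min_count` threshold, and the tiny `word_classes` lookup, which stands in for what I checked by hand on dictionary sites.

```python
from collections import Counter

def rank_candidates(words, word_classes, min_count=2):
    """Return (word, count) pairs that appear often and belong to exactly one word class."""
    counts = Counter(words)
    picked = []
    for word, n in counts.most_common():
        # Words missing from the lookup have no known class and are skipped.
        classes = word_classes.get(word, set())
        if n >= min_count and len(classes) == 1:
            picked.append((word, n))
    return picked

# Hypothetical aggregated list: one entry per appearance across the source sites.
words = ["house", "run", "house", "quickly", "run", "run", "quickly"]
# Hypothetical lookup of word classes, filled in by hand.
word_classes = {
    "house": {"noun", "verb"},
    "run": {"noun", "verb"},
    "quickly": {"adverb"},
}
candidates = rank_candidates(words, word_classes)
print(candidates)
```

In this made-up sample only “quickly” survives, because “house” and “run” can each be a noun or a verb.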

Then I created a new Yahoo Pipe, which had two text input boxes, one for the title and one for the synopsis. I added find/replace modules and manually entered the words that needed to be replaced, along with the text that would replace them.
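A minimal Python sketch of the same find/replace idea is below. The `twist` function and the replacement pairs are invented for illustration and aren’t the ones in the pipe. It assumes whole-word matching and lowercase keys, and it copies the capital letter from the original word onto the replacement.

```python
import re

def twist(text, replacements):
    """Swap whole words in text using a dict of lowercase word -> replacement."""
    if not replacements:
        return text
    # \b keeps "friend" from matching inside "friendly".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in replacements) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        found = match.group(0)
        new = replacements[found.lower()]
        # Keep a leading capital if the original word had one.
        return new.capitalize() if found[0].isupper() else new

    return pattern.sub(swap, text)

# Hypothetical replacement pairs.
replacements = {"house": "crypt", "friend": "werewolf"}
title = twist("The House on the Hill", replacements)
synopsis = twist("Her friend visits the house. Friendly neighbours watch.", replacements)
print(title)
print(synopsis)
```

Doing all the swaps in a single pass means a replacement word can’t accidentally be replaced again by a later rule, which is something to watch for when chaining separate find/replace steps.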

Werewolf by Schnaars (Flickr)

Still, at this stage, some words didn’t work. Some of the replacement words didn’t work either. This is partly because I hadn’t thought too much about the type of text that would work with the replacements. For example, synopses seem to talk more about ‘he’, ‘she’ and ‘it’ than ‘I’ and ‘me’, and this affects the way you need to deal with the whole word replacement style.

I also worked out that replacing common words is a good idea in principle, because you’ve got a better chance of hitting words that can be replaced out of the 171,476 words in common use in the English language (according to the Oxford English Dictionary). In practice, though, most synopses actually try to avoid cliched/common words.

It still needs tweaking and its presentation still needs prettifying (or horrifying ;-)), but here’s the pipe for people to have a look at. All you need to do is enter a book title and synopsis into the boxes. I’d be interested in the output from anything you paste into the pipe, as I’d like to see how it works on a wide variety of synopses. Maybe anyone who uses it could cut/paste the output of the pipe as a comment on this post. I know it needs work.

