Dapper.net: How To Make Feeds From Web Pages That Really Don’t Want You To

If I ever want to put together a mashup or just tinker with data on the web, my first port of call is Yahoo Pipes. However, even though I really like Pipes, it frustrates me a fair amount of the time too. Sometimes it behaves erratically and I get a sulk on with it. So I decided to have a scout around for other ways of achieving what I want.

My first great find is Dapper. I imagine this is old hat to some people, as it’s been around for a few years. It’s actually owned by Yahoo too. As the site itself says…

Dapper is a tool that enables users to create update feeds for their favorite sites and website owners to optimize and distribute their content in new ways.

It doesn’t do the same thing as Yahoo Pipes, but it is extremely handy for pulling data out of web pages where a feed doesn’t exist. It provides the output in the following formats (where relevant to the data on the page): XML, RSS, HTML, Google Gadget, Google Map, Image Loop, iCalendar, Atom, CSV, JSON, XSL, YAML. I’m not going to pretend that I know what all of these formats are, but they seem like a fairly handy group to be able to use.

I thought I’d see if I could create an RSS feed for our library catalogue. I’ve always wanted an RSS feed for it (so we can feed stock information through to different places easily) and I’ve also wanted a way to produce alerts for new titles (so users can be informed about any new stock they may be interested in), but our library catalogue offers neither. Now, using Dapper, I can do both easily.

Dapp Factory screen capture

To achieve this Dapper asks you to:

  1. Provide URLs of web pages your data appears in. You just need to provide sample pages here. I gave it URLs of catalogue search results pages.
  2. Highlight samples of the data on these pages that you want in your feed. I highlighted fields containing Title, Author, Format (eg Hardback, DVD, etc), Book cover, Number of copies and then told Dapper what to call these fields.
  3. Group together data fields – this effectively puts related data together in a single record. If you don’t do this you end up with a list of unrelated data items in your RSS feed, rather than a list of ready formed records.
  4. Identify any portion of the URL that can be changed by the user to create a brand new search using that resource. For example, in my URL I changed “_TitleResults.aspx?page=1&searchTerm=cake&searchType=99&searchTerm2=&media=&br” to “_TitleResults.aspx?page=1&searchTerm={Query}&searchType=99&searchTerm2=&media=&br”, so I could easily create a new feed for a search on any other keyword without having to go through the whole process again.
  5. Choose the output format of the feed eg RSS, ATOM, HTML, iCalendar, etc (as mentioned earlier). You can also say which fields you want to appear in the output feed.

In response to this Dapper gives you a unique URL for your feed.
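
The placeholder substitution in step 4 can be sketched in a few lines of Python. This is only an illustration of the idea, not how Dapper does it internally: the function name `build_search_url` is made up for this example, and the URL fragment is copied from the post as-is (it is a partial URL, so it would need the real catalogue address in front of it).

```python
from urllib.parse import quote_plus

# Template copied from step 4. The URL is partial in the post and is left as-is.
SEARCH_TEMPLATE = (
    "_TitleResults.aspx?page=1&searchTerm={Query}"
    "&searchType=99&searchTerm2=&media=&br"
)


def build_search_url(keyword: str, template: str = SEARCH_TEMPLATE) -> str:
    """Swap the {Query} placeholder for a URL-encoded keyword (hypothetical helper)."""
    # quote_plus turns spaces into "+" and escapes characters such as "&",
    # so the keyword cannot break the rest of the query string.
    return template.replace("{Query}", quote_plus(keyword))


if __name__ == "__main__":
    for term in ["cake", "bread making"]:
        print(build_search_url(term))
```

Encoding the keyword matters as soon as a search term contains a space or an ampersand; a plain string swap would otherwise produce a broken query string.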

From this stage you can also:

  1. Change the query text, as mentioned in (4), and get a unique URL for the new feed.
  2. Set up a service using the feed you created. Here you can make it public and allow others to create their own searches by changing the query text. This is the service I created. I also created a Google Gadget and added it to my iGoogle page.
  3. Set up an email alert for your feed. So, if a new item is added to the feed (eg a new book comes in stock matching your search query) it will send you an email notification.
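
The alert idea in (3) boils down to comparing the feed against what you have already seen. Here is a rough sketch of that comparison, assuming a plain RSS 2.0 feed where each item has a title and a link. The sample feed, the example.org links and the `new_items` function are all invented for illustration; Dapper's own alert mechanism may well work differently.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed standing in for a catalogue search feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Catalogue search: cake</title>
    <item><title>Cake Decorating</title><link>http://example.org/item/1</link></item>
    <item><title>Baking Basics</title><link>http://example.org/item/2</link></item>
  </channel>
</rss>"""


def new_items(rss_text, seen_links):
    """Return (title, link) pairs for feed items whose link is not in seen_links."""
    root = ET.fromstring(rss_text)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # The link is used as the item's identity; items without one are skipped.
        if link and link not in seen_links:
            fresh.append((title, link))
    return fresh


if __name__ == "__main__":
    already_seen = {"http://example.org/item/1"}
    for title, link in new_items(SAMPLE_RSS, already_seen):
        print(f"New in stock: {title} ({link})")
```

In practice you would fetch the feed from the URL Dapper gives you, store the links you have seen between runs, and send the email (or whatever notification you like) for anything returned.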

I’ve only been tinkering with it for a few hours, but it looks like it’s going to come in handy for pulling out and re-using data in web pages that has in the past been difficult for me to get at. 🙂

4 thoughts on “Dapper.net: How To Make Feeds From Web Pages That Really Don’t Want You To”

  1. Aaron Tay

    Hi Gary, yes, I’ve used Dapper this way since 2009ish. Like Yahoo Pipes, it’s good for a hack but not if you want something really reliable.

  2. Pingback: Look Dapper Can
