It has been a long time since I blogged on this site. This is partly due to my involvement in a couple of really interesting research projects (the Digital Data Analysis project of the http://www.communitiesandculture.org network with Dr Helen Kennedy and Dr Giles Moss, and the Leeds Media Ecology project which will result in a forthcoming book edited by Prof. Stephen Coleman). It is partly due to my taking over as the programme leader of the BA Hons New Media degree at the Institute of Communications Studies at the University of Leeds (most of this summer has been spent developing a new level three module entitled “Mobile Media”, looking at the impacts and influences of and on mobile communications and introducing students to the basics of mobile web and app development). So I’ve been pretty busy.
I have continued to develop my PhD work, though, and I am about to start harvesting a lot of conversational data from across the web, wherever people are talking about UK political issues. When I say a lot, I mean thousands of contributions. Hopefully tens of thousands, maybe lots more. To enable this, I managed to find three weeks this summer to devote to developing a tool that will let me grab all this data pretty quickly and get it stuffed nicely into my database. The Conversation Scraper is the solution I came up with: a Mozilla Firefox plug-in that operates in the same way as many screen scraping applications out there. Users select parts of a web page and mark them up as a category of relevant content, building up a profile for a particular website, then click a button and watch the data be selected and harvested automatically before their eyes (at least in theory, as long as the user has marked up the page carefully enough). The difference with this screen scraper is that it is customised for conversation, allowing users to mark up fields like usernames, dates, message content, reply-to names and ratings or likes.
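To make the idea of a per-site profile a bit more concrete, here is a minimal sketch in Python. It is not the plug-in's code (that runs inside Firefox); the profile layout, the class names in the sample page and the `harvest` function are all assumptions made up for illustration, and the sample page is invented and deliberately well-formed so that the standard library parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile for one website: where each post lives, and where
# each conversation field sits inside a post. All paths are invented.
PROFILE = {
    "post": ".//div[@class='post']",
    "fields": {
        "username": ".//span[@class='author']",
        "date": ".//span[@class='date']",
        "message": ".//p[@class='body']",
        "reply_to": ".//span[@class='reply-to']",
        "rating": ".//span[@class='likes']",
    },
}

# Invented sample page (well-formed markup, so ElementTree can parse it).
PAGE = """<html><body>
<div class='post'>
  <span class='author'>alice</span><span class='date'>2013-09-01</span>
  <p class='body'>Tuition fees are too high.</p><span class='likes'>4</span>
</div>
<div class='post'>
  <span class='author'>bob</span><span class='date'>2013-09-02</span>
  <span class='reply-to'>alice</span>
  <p class='body'>Compared to what?</p><span class='likes'>1</span>
</div>
</body></html>"""


def harvest(page_source, profile):
    """Return one dict per post, with None for any field the post lacks."""
    root = ET.fromstring(page_source)
    rows = []
    for post in root.findall(profile["post"]):
        row = {}
        for field, path in profile["fields"].items():
            node = post.find(path)
            has_text = node is not None and node.text
            row[field] = node.text.strip() if has_text else None
        rows.append(row)
    return rows


rows = harvest(PAGE, PROFILE)
```

Real pages are rarely this tidy, which is exactly why the harvest only works "in theory" unless the mark-up has been done carefully: a field that is missing or mislabelled simply comes back empty.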
I will eventually release this tool under the GPL. At present it puts data into my own database (which is no good to anyone else and is not robust enough to handle crowdsourcing), but I hope to modify it to produce a JSON or CSV export instead of a database insert, so that anyone can use it. Get in touch if you want to have a look. Maybe we could do a trade: you tell me some good ways to measure things like domination in a conversation and I’ll give you a look at the plug-in!
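For anyone wondering what the export might look like, here is a rough sketch of the JSON and CSV idea in Python. The export format has not been designed yet, so the field names, the sample rows and the `to_json` and `to_csv` helpers are assumptions for illustration only.

```python
import csv
import io
import json

# Assumed column order, taken from the fields the scraper lets users mark up.
FIELDS = ["username", "date", "message", "reply_to", "rating"]

# Invented sample rows standing in for harvested contributions.
SAMPLE = [
    {"username": "alice", "date": "2013-09-01",
     "message": "Tuition fees are too high.", "reply_to": None, "rating": "4"},
    {"username": "bob", "date": "2013-09-02",
     "message": "Compared to what?", "reply_to": "alice", "rating": "1"},
]


def to_json(rows):
    """Serialise the rows as a JSON array, keeping non-ASCII text readable."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows):
    """Serialise the rows as CSV text; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(SAMPLE)
csv_text = to_csv(SAMPLE)
```

JSON keeps the difference between a missing reply-to and an empty one, while CSV flattens both to an empty cell, so which is better probably depends on what the data is loaded into afterwards.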