WWW2009 EPrints

Content Analysis in Web 2.0

Abstract

Web mining deals with understanding and discovering information in the World Wide Web. It focuses on analyzing three different sources of information: web structure, user activity and contents. In Web 2.0, data related to web structure and user activity can be handled in much the same way as in the traditional Web; for contents, however, conventional analysis and mining procedures are no longer suitable. This is mainly because, in Web 2.0, contents are generated by users, who use language very freely and constantly incorporate new communication elements that are generally context dependent. This kind of language can also be found in chats, SMS, e-mails and other channels of informal textual communication. This workshop focuses on the problem of making Web 2.0 both searchable and analyzable in terms of its contents. This is an extremely important endeavor for current web mining technologies for two reasons: first, user generated content (UGC) is growing faster than ever in cyberspace and, second, automatic analysis of UGC will help improve the experience of ordinary users with Internet resources and opportunities while, simultaneously, detecting and tracking criminal and terrorist activity. In this first edition of the workshop we attempt to draw the attention of interested research groups and companies to the new challenges and opportunities related to Web 2.0 content analysis. More specifically, we will focus on specific tasks within the scope of text content mining, with the intention of extending the coverage to multimedia data in future editions of the workshop. Accordingly, for the first edition of the workshop, we will collect and provide a corpus to be used as an experimental collection for research in three specific shared tasks: text normalization, opinion mining and misbehavior detection.
In the text normalization shared task we want to address the problem of the chat-speak style of communication. Recently, some research has been carried out in this area for SMS communications and from the perspective of machine translation approaches. In this shared task we attempt to generalize the problem to Web 2.0 contents and to explore any additional alternatives the participants can come up with. In the opinion mining shared task we want to address problems such as determining text subjectivity and polarity, and sentiment analysis. Although these problems have already been approached from different perspectives, most of the research has been carried out on domain-specific data and on applications where users are asked to rate services or products. Our intention is to focus attention on the more general domain in which Web 2.0 users express their sentiments and opinions in their daily interaction within a virtual community. Finally, in the misbehavior detection shared task, we want to address the problem of detecting inappropriate activity in which some users of a virtual community harass or offend other members of the community. We consider that this shared task can provide a good starting point for a future shared task with the more ambitious goal of classifying users and detecting identity impersonation for on-line criminal activity.

Fun web stuff for this record

RKBExplorer (from linked data workshop)
URI: http://eprints.rkbexplorer.com/id/www2009/eprints-255
Browse the data for this paper at RKBExplorer
REST Interface
http://www2009.eprints.org/cgi/rest/eprint/255/
ORE Resource Map
ORE was described in the Linked Data Workshop. View Resource Map

About this site

This website has been set up for WWW2009 by Christopher Gutteridge of the University of Southampton, using our EPrints software.

Add your Slides, Posters, Supporting data, whatnots...

If you are presenting a paper or poster and have slides or supporting material you would like to have made permanently public at this website, please email cjg@ecs.soton.ac.uk. Include the file(s), a note to say if they are presentations, supporting material or whatnot, and the URL of the paper/poster from this site, e.g. http://www2009.eprints.org/128/

Add workshops

It's impractical to add all the workshops at WWW2009 by hand, but if you can provide me with the metadata in a machine-readable form, I'll have a go at importing it. If you are good at slinging XML, my ideal import format is visible at http://www2009.eprints.org/import_example.xml

Preservation

We (Southampton EPrints Project) intend to preserve the files and HTML pages of this site for many years; however, we will turn it into flat files for long-term preservation. This means that at some point in the months after the conference the search, metadata-export, JSON interface, OAI etc. will be disabled as we "fossilize" the site. Please plan accordingly. Feel free to ask nicely for us to keep the dynamic site online longer if there's a really good (or cool) use for it...

Fun Stuff

OAI
WWW2009 EPrints supports OAI 2.0 with a base URL of http://www2009.eprints.org/cgi/oai2
JSON
The JSON URL is http://www2009.eprints.org/cgi/json?callback=function&eprintid=number
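
As a rough sketch of how these two URLs might be put together from a script: the Python below only builds request URLs and unwraps a response string, and makes no network calls, since the dynamic interfaces are due to be switched off at some point. The helper names, the `handle` callback name and the sample payload in the usage note are made up for illustration. `Identify`, `ListRecords` and `metadataPrefix=oai_dc` are standard OAI-PMH 2.0; the JSONP-style wrapping of the JSON response is an assumption based on the `callback` parameter, not something confirmed here.

```python
import json
from urllib.parse import urlencode

OAI_BASE = "http://www2009.eprints.org/cgi/oai2"
JSON_BASE = "http://www2009.eprints.org/cgi/json"


def oai_url(verb, **params):
    """Build an OAI-PMH 2.0 request URL for the given verb."""
    return OAI_BASE + "?" + urlencode({"verb": verb, **params})


def json_url(eprintid, callback="handle"):
    """Build the JSON URL; `callback` is a hypothetical function name."""
    return JSON_BASE + "?" + urlencode({"callback": callback, "eprintid": eprintid})


def unwrap_jsonp(text, callback="handle"):
    """Strip an assumed `callback(...)` wrapper and parse the JSON inside it."""
    text = text.strip().rstrip(";")
    prefix = callback + "("
    if not (text.startswith(prefix) and text.endswith(")")):
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(text[len(prefix):-1])
```

For example, `oai_url("ListRecords", metadataPrefix="oai_dc")` yields the base URL followed by `?verb=ListRecords&metadataPrefix=oai_dc`, and `unwrap_jsonp('handle({"eprintid": 255});')` parses a made-up payload into a Python dict.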

To prevent Google from killing the server by hammering these tools, the /cgi/ URLs are denied to robots in robots.txt. Ask Chris if you want an exception made.
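
If you are writing a polite crawler, a check along the following lines could tell you in advance which URLs are off limits. This is only a sketch: the robots.txt content below is a guess at what a rule denying /cgi/ might look like (the site's real file is not quoted here), and `is_allowed` is a hypothetical helper built on Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, for illustration only; not copied from the site.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi/
"""


def is_allowed(url, user_agent="*"):
    """Return True if the assumed rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ASSUMED_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under these assumed rules the OAI and JSON endpoints are refused to crawlers, while ordinary record pages such as http://www2009.eprints.org/255/ remain fetchable.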

Feel free to contact me (Christopher Gutteridge) with any other queries or suggestions. ...Or if you do something cool with the data which we should link to!

Handy Tools

These are not directly related to the EPrints set up, but may be of use to delegates.

Social tool links
I've put links in the page header to the WWW2009 stuff on Flickr and Facebook, and to a page which will let you watch the #www2009 tag on Twitter. It's not really the right place, but they have not yet made it onto the main conference homepage. Send me any suggestions for new links.
SplashURL.net
When demoing live websites, use this tool to shorten the current URL and make it appear real big; your audience can then easily type in the short URL and get to the same page as you. Available as a JavaScript bookmark.