Items from Semantic/Data Web track
Number of items: 9.
and Cattuto, Ciro
and Menczer, Filippo
and Benz, Dominik
and Hotho, Andreas
and Stumme, Gerd Evaluating Similarity Measures for Emergent Semantics of Social Tagging.
Social bookmarking systems and their emergent information structures, known as folksonomies, are increasingly important data sources for Semantic Web applications. A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as navigation support, semantic search, and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures derived from established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity among tags and resources, considering different ways to aggregate annotations across users. After comparing how tag similarity measures predict user-created tag relations, we provide an external grounding by user-validated semantic proxies based on WordNet and the Open Directory. We also investigate the issue of scalability. We ﬁnd that mutual information with distributional micro-aggregation across users yields the highest accuracy, but is not scalable; per-user projection with collaborative aggregation provides the best scalable approach via incremental computations. The results are consistent across resource and tag similarity.
and Grinev, Maxim
and Lizorkin, Dmitry Extracting Key Terms From Noisy and Multi-theme Documents.
We present a novel method for key term extraction from text documents. In our method, document is modeled as a graph of semantic relationships between terms of that document. We exploit the following remarkable feature of the graph: the terms related to the main topics of the document tend to bunch up into densely interconnected subgraphs or commu- nities, while non-important terms fall into weakly intercon- nected communities, or even become isolated vertices. We apply graph community detection techniques to partition the graph into thematically cohesive groups of terms. We introduce a criterion function to select groups that contain key terms discarding groups with unimportant terms. To weight terms and determine semantic relatedness between them we exploit information extracted from Wikipedia. Using such an approach gives us the following two ad- vantages. First, it allows effectively processing multi-theme documents. Second, it is good at filtering out noise infor- mation in the document, such as, for example, navigational bars or headers in web pages. Evaluations of the method show that it outperforms exist- ing methods producing key terms with higher precision and recall. Additional experiments on web pages prove that our method appears to be substantially more effective on noisy and multi-theme documents than existing methods.
and Haghani, Parisa
and Jost, Michael
and Aberer, Karl
and De Meer, Hermann idMesh: Graph-Based Disambiguation of Linked Data.
We tackle the problem of disambiguating entities on the Web. We propose a user-driven scheme where graphs of entities – represented by globally identiﬁable declarative artifacts – self-organize in a dynamic and probabilistic manner. Our solution has the following two desirable properties: i) it lets end-users freely deﬁne associations between arbitrary entities and ii) it probabilistically infers entity relationships based on uncertain links using constraintsatisfaction mechanisms. We outline the interface between our scheme and the current data Web, and show how higher-layer applications can take advantage of our approach to enhance search and update of information relating to online entities. We describe a decentralized infrastructure supporting efﬁcient and scalable entity disambiguation and demonstrate the practicability of our approach in a deployment over several hundreds of machines.
and d'Aquin, Mathieu
and Mena, Eduardo Large Scale Integration of Senses for the Semantic Web.
Nowadays, the increasing amount of semantic data available on the Web leads to a new stage in the potential of Semantic Web applications. However, it also introduces new issues due to the heterogeneity of the available semantic resources. One of the most remarkable is redundancy, that is, the excess of different semantic descriptions, coming from different sources, to describe the same intended meaning. In this paper, we propose a technique to perform a large scale integration of senses (expressed as ontology terms), in order to cluster the most similar ones, when indexing large amounts of online semantic information. It can dramatically reduce the redundancy problem on the current Semantic Web. In order to make this objective feasible, we have studied the adaptability and scalability of our previous work on sense integration, to be translated to the much larger scenario of the Semantic Web. Our evaluation shows a good behaviour of these techniques when used in large scale experiments, then making feasible the proposed approach.
and Matsuo, Yutaka
and Ishizuka, Mitsuru Measuring the Similarity between Implicit Semantic Relations from the Web.
Measuring the similarity between semantic relations that hold among entities is an important and necessary step in various Web related tasks such as relation extraction, information retrieval and analogy detection. For example, consider the case in which a person knows a pair of entities (e.g. Google, YouTube), between which a partic- ular relation holds (e.g. acquisition). The person is interested in retrieving other such pairs with similar relations (e.g. Microsoft, Powerset). Existing keyword-based search engines cannot be ap- plied directly in this case because, in keyword-based search, the goal is to retrieve documents that are relevant to the words used in a query – not necessarily to the relations implied by a pair of words. We propose a relational similarity measure, using a Web search en- gine, to compute the similarity between semantic relations implied by two pairs of words. Our method has three components: repre- senting the various semantic relations that exist between a pair of words using automatically extracted lexical patterns, clustering the extracted lexical patterns to identify the different patterns that ex- press a particular semantic relation, and measuring the similarity between semantic relations using a metric learning approach. We evaluate the proposed method in two tasks: classifying semantic relations between named entities, and solving word-analogy ques- tions. The proposed method outperforms all baselines in a relation classification task with a statistically significant average precision score of 0.74. Moreover, it reduces the time taken by Latent Relational Analysis to process 374 word-analogy questions from 9 days to less than 6 hours, with an SAT score of 51%.
and Fodor, Paul
and Wan, Hui
and Kifer, Michael OpenRuleBench: An Analysis of the Performance of Rule Engines.
The Semantic Web initiative has led to an upsurge of the interest in rules as a general and powerful way of processing, combining, and analyzing semantic information. Since several of the technologies underlying rule-based systems are already quite mature, it is important to understand how such systems might perform on the Web scale. OpenRuleBench is a suite of benchmarks for analyzing the performance and scalability of different rule engines. Currently the study spans ﬁve different technologies and eleven systems, but OpenRuleBench is an open community resource, and contributions from the community are welcome. In this paper, we describe the tested systems and technologies, the methodology used in testing, and analyze the results.
and Polleres, Axel
and Hauswirth, Manfred
and Tummarello, Giovanni
and Morbidoni, Christian Rapid Prototyping of Semantic Mash-Ups through Semantic Web Pipes.
The use of RDF data published on the Web for applica- tions is still a cumbersome and resource-intensive task due to the limited software support and the lack of standard pro- gramming paradigms to deal with everyday problems such as combination of RDF data from different sources, object iden- tifier consolidation, ontology alignment and mediation, or plain querying and filtering tasks. In this paper we present Semantic Web Pipes that support fast implementation of Se- mantic data mash-ups while preserving desirable properties such as abstraction, encapsulation, component-orientation, code re-usability and maintainability which are common and well supported in other application areas.
Suchanek, Fabian M.
and Sozio, Mauro
and Weikum, Gerhard SOFIE: A Self-Organizing Framework for Information Extraction.
This paper presents SOFIE, a system for automated on- tology extension. SOFIE can parse natural language docu- ments, extract ontological facts from them and link the facts into an ontology. SOFIE uses logical reasoning on the exist- ing knowledge and on the new knowledge in order to disam- biguate words to their most probable meaning, to reason on the meaning of text patterns and to take into account world knowledge axioms. This allows SOFIE to check the plau- sibility of hypotheses and to avoid inconsistencies with the ontology. The framework of SOFIE unites the paradigms of pattern matching, word sense disambiguation and ontolog- ical reasoning in one unified model. Our experiments show that SOFIE delivers high-quality output, even from unstruc- tured Internet documents.
and Dietzold, Sebastian
and Lehmann, Jens
and Hellmann, Sebastian
and Aumueller, David Triplify Light-Weight Linked Data Publication from Relational Databases.
In this paper we present Triplify – a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relational database queries. Triplify transforms the resulting relations into RDF statements and publishes the data on the Web in various RDF serializations, in particular as Linked Data. The rationale for developing Triplify is that the largest part of information on the Web is already stored in structured form, often as data contained in relational databases, but usually published by Web applications only as HTML mixing structure, layout and content. In order to reveal the pure structured information behind the current Web, we have implemented Triplify as a light-weight software component, which can be easily integrated into and deployed by the numerous, widely installed Web applications. Our approach includes a method for publishing update logs to enable incremental crawling of linked data sources. Triplify is complemented by a library of conﬁgurations for common relational schemata and a REST-enabled data source registry. Triplify conﬁgurations containing mappings are provided for many popular Web applications, including osCommerce, WordPress, Drupal, Gallery, and phpBB. We will show that despite its light-weight architecture Triplify is usable to publish very large datasets, such as 160GB of geo data from the OpenStreetMap project.
About this site
This website has been set up for WWW2009 by Christopher Gutteridge of the University of Southampton, using our EPrints software.
Add your Slides, Posters, Supporting data, whatnots...
If you are presenting a paper or poster and have slides or supporting material you would like to have permentently made public at this website, please email
firstname.lastname@example.org - Include the file(s), a note to say if they are presentations, supporting material or whatnot, and the URL of the paper/poster from this site. eg. http://www2009.eprints.org/128/
It's impractical to add all the workshops at WWW2009 by hand, but if you can provide me with the metadata in a machine readable way, I'll have a go at importing it. If you are good at slinging XML, my ideal import format is visible at http://www2009.eprints.org/import_example.xml
We (Southampton EPrints Project) intend to preserve the files and HTML pages of this site for many years, however we will turn it into flat files for long term preservation. This means that at some point in the months after the conference the search, metadata-export, JSON interface, OAI etc. will be disabled as we "fossilize" the site. Please plan accordingly. Feel free to ask nicely for us to keep the dynamic site online longer if there's a rally good (or cool) use for it...
- WWW2009 EPrints supports OAI 2.0 with a base URL of http://www2009.eprints.org/cgi/oai2
- The JSON URL is http://www2009.eprints.org/cgi/json?callback=function&eprintid=number
To prevent google killing the server by hammering these tools, the /cgi/ URL's are denied to robots.txt - ask Chris if you want an exception made.
Feel free to contact me (Christopher Gutteridge) with any other queries or suggestions. ...Or if you do something cool with the data which we should link to!
These are not directly related to the EPrints set up, but may be of use to delegates.
- Social tool links
- I've put links in the page header to the WWW2009 stuff on flickr, facebook and to a page which will let you watch the #www2009 tag on Twitter. Not really the right place, but not yet made it onto the main conference homepage. Send me any suggestions for new links.