Items where author is affiliated with Georgia Institute of Technology
Number of items: 3.
and Zhou, Ke
and Xue, Gui-Rong
and Zha, Hongyuan
and Yu, Yong Enhancing Diversity, Coverage and Balance for Summarization through Structure Learning.
Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom simultaneously consider summary diversity, coverage, and balance issues which to a large extent determine the quality of summaries. In this paper, we consider extract-based summarization emphasizing the following three requirements: 1) diversity in summarization, which seeks to reduce redundancy among sentences in the summary; 2) sufficient coverage, which focuses on avoiding the loss of the document’s main information when generating the summary; and 3) balance, which demands that different aspects of the document need to have about the same relative importance in the summary. We formulate the extract-based summarization problem as learning a mapping from a set of sentences of a given document to a subset of the sentences that satisﬁes the above three requirements. The mapping is learned by incorporating several constraints in a structure learning framework, and we explore the graph structure of the output variables and employ structural SVM for solving the resulted optimization problem. Experiments on the DUC2001 data sets demonstrate signiﬁcant performance improvements in terms of F1 and ROUGE metrics.
and Liu, Yandong
and Zhou, Ding
and Agichtein, Eugene
and Zha, Hongyuan Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement.
Community Question Answering (CQA) has emerged as a popular forum for users to pose questions for other users to answer. Over the last few years, CQA portals such as Naver and Yahoo! Answers have exploded in popularity, and now provide a viable alternative to general purpose Web search. At the same time, the answers to past questions submitted in CQA sites comprise a valuable knowledge repository which could be a gold mine for information retrieval and automatic question answering. Unfortunately, the quality of the submitted questions and answers varies widely - increasingly so that a large fraction of the content is not usable for answering queries. Previous approaches for retrieving relevant and high quality content have been proposed, but they require large amounts of manually labeled data – which limits the applicability of the supervised approaches to new sites and domains. In this paper we address this problem by developing a semi-supervised coupled mutual reinforcement framework for simultaneously calculating content quality and user reputation, that requires relatively few labeled examples to initialize the training process. Results of a large scale evaluation demonstrate that our methods are more effective than previous approaches for ﬁnding high-quality answers, questions, and users. More importantly, our quality estimation signiﬁcantly improves the accuracy of search over CQA archives over the state-of-the-art methods.
and Xue, Gui-Rong
and Yu, Yong
and Zha, Hongyuan Web-Scale Classification with Naive Bayes.
Traditional Naive Bayes Classiﬁer performs miserably on web-scale taxonomies. In this paper, we investigate the reasons behind such bad performance. We discover that the low performance are not completely caused by the intrinsic limitations of Naive Bayes, but mainly comes from two largely ignored problems: contradiction pair problem and discriminative evidence cancelation problem. We propose modiﬁcations that can alleviate the two problems while preserving the advantages of Naive Bayes. The experimental results show our modiﬁed Naive Bayes can signiﬁcantly improve the performance on real web-scale taxonomies.
About this site
This website has been set up for WWW2009 by Christopher Gutteridge of the University of Southampton, using our EPrints software.
We (Southampton EPrints Project) intend to preserve the files and HTML pages of this site for many years, however we will turn it into flat files for long term preservation. This means that at some point in the months after the conference the search, metadata-export, JSON interface, OAI etc. will be disabled as we "fossilize" the site. Please plan accordingly. Feel free to ask nicely for us to keep the dynamic site online longer if there's a rally good (or cool) use for it... [this has now happened, this site is now static]