Items where author is affiliated with Yahoo! Laboratories
Number of items: 6.
and Zhang, Ya A Dynamic Bayesian Network Click Model for Web Search Ranking.
As with any application of machine learning, web search ranking requires labeled data. The labels usually come in the form of relevance assessments made by editors. Click logs can also provide an important source of implicit feedback and can be used as a cheap proxy for editorial labels. The main difficulty, however, comes from the so-called position bias: URLs appearing in lower positions are less likely to be clicked even if they are relevant. In this paper, we propose a Dynamic Bayesian Network which aims to provide an unbiased estimate of relevance from the click logs. Experiments show that the proposed click model outperforms other existing click models in predicting both click-through rate and relevance.
and Dmitriev, Pavel
and Lee Giles, C. Graph Based Crawler Seed Selection.
This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper seeds can increase the number of pages a crawler will discover, and can result in a collection with more “good” and fewer “bad” pages. Based on the analysis of the graph structure of the web, we propose several seed selection algorithms. The effectiveness of these algorithms is demonstrated by our experimental results on real web data.
and Park, Seung-Taek Personalized Recommendation on Dynamic Content Using Predictive Bilinear Models.
In Web-based services of dynamic content (such as news articles), recommender systems face the difficulty of identifying new high-quality items in a timely manner and providing recommendations for new users. We propose a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively. We maintain profiles of content of interest, in which temporal characteristics of the content, e.g. popularity and freshness, are updated in real time. We also maintain profiles of users, including demographic information and a summary of user activities within Yahoo! properties. Based on all features in user and content profiles, we develop predictive bilinear regression models to provide accurate personalized recommendations of new items for both existing and new users. This approach results in an offline model with light computational overhead compared with other recommender systems that require online re-training. The proposed framework is general and flexible for other personalized tasks. The superior performance of our approach is verified on a large-scale data set collected from the Today-Module on Yahoo! Front Page, with comparison against six competitive approaches.
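As a rough illustration of the bilinear idea in the abstract above, the sketch below scores a user-item pair as the user feature vector times a weight matrix times the item feature vector. The feature values, the weight matrix, and the name `bilinear_score` are made up for this example; the paper's actual features, training procedure, and model details are not given on this page.

```python
def bilinear_score(user, weights, item):
    """Return user^T * W * item for plain Python lists.

    `weights[a][b]` couples user feature a with item feature b.
    All values here are illustrative, not learned from data.
    """
    return sum(
        user[a] * weights[a][b] * item[b]
        for a in range(len(user))
        for b in range(len(item))
    )


# Hypothetical features: a user described by two demographic indicators,
# an item described by popularity, freshness, and a bias term.
user = [1.0, 0.0]
item = [0.5, 2.0, 1.0]
weights = [
    [0.2, 0.0, 0.1],
    [0.0, 0.3, 0.0],
]

print(bilinear_score(user, weights, item))
```

Because the score depends only on features, a user or item never seen before can still be scored as long as its features are known, which is one way to read the cold-start claim in the abstract.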
and Chen, Bee-Chung
and Elango, Pradheep Spatio-Temporal Models for Estimating Click-through Rate.
We propose novel spatio-temporal models to estimate click-through rates (CTR) in the context of content recommendation. We track article CTR at a fixed location over time through a dynamic Gamma-Poisson model and combine information from correlated locations through dynamic linear regressions, significantly improving on the per-location model. Our models adjust for user fatigue through an exponential tilt to the first-view CTR (probability of click on first article exposure) that is based only on user-specific repeat-exposure features. We illustrate our approach on data obtained from a module (Today Module) published regularly on Yahoo! Front Page and demonstrate significant improvement over commonly used baseline methods. Large-scale simulation experiments to study the performance of our models under different scenarios provide encouraging results. Throughout, all modeling assumptions are validated via rigorous exploratory data analysis.
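To make the Gamma-Poisson idea in the abstract above concrete, here is a minimal sketch of tracking one article's CTR at one location. It assumes clicks in an interval are Poisson with rate CTR times views, a Gamma prior on the CTR, and a discount factor that down-weights older data. The discounting scheme, the prior values, and the class name are assumptions for illustration and may differ from the paper's formulation.

```python
class DiscountedGammaPoissonCTR:
    """Track a CTR estimate with a Gamma(alpha, beta) posterior.

    alpha acts as (pseudo-)clicks and beta as (pseudo-)views.
    Before each update both are multiplied by `discount`, so older
    observations gradually lose influence (an assumed scheme).
    """

    def __init__(self, alpha=1.0, beta=100.0, discount=0.95):
        self.alpha = alpha
        self.beta = beta
        self.discount = discount

    def update(self, clicks, views):
        """Fold in the clicks and views observed in one time interval."""
        self.alpha = self.discount * self.alpha + clicks
        self.beta = self.discount * self.beta + views

    def mean(self):
        """Posterior mean of the CTR."""
        return self.alpha / self.beta


tracker = DiscountedGammaPoissonCTR()
for clicks, views in [(12, 1000), (30, 1000), (45, 1000)]:
    tracker.update(clicks, views)
    print(round(tracker.mean(), 4))
```

With `discount=1.0` this reduces to the standard static conjugate update; values below 1 let the estimate follow a CTR that drifts over time.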
and Pantel, Patrick
and Popescu, Ana-Maria
and Gabrilovich, Evgeniy Towards Intent-Driven Bidterm Suggestion.
In online advertising, pervasive in commercial search engines, advertisers typically bid on few terms, and the scarcity of data makes ad matching difficult. Suggesting additional bidterms can significantly improve ad clickability and conversion rates. In this paper, we present a large-scale bidterm suggestion system that models an advertiser’s intent and finds new bidterms consistent with that intent. Preliminary experiments show that our system significantly increases the coverage of a state-of-the-art production system used at Yahoo while maintaining comparable precision.
and Shiowattana, Dungjit
and Dmitriev, Pavel
and Chan, Su The Web of Nations.
In this paper, we report on a large-scale study of structural differences among the national webs. The study is based on a web-scale crawl conducted in the summer of 2008. More specifically, we study two graphs derived from this crawl: the nation graph, with nodes corresponding to nations and edges to links among nations, and the host graph, with nodes corresponding to hosts and edges to hyperlinks among pages on the hosts. Contrary to some of the previous work, our results show that webs of different nations are often very different from each other, both in terms of their internal structure and in terms of their connectivity with other nations.
About this site
This website has been set up for WWW2009 by Christopher Gutteridge of the University of Southampton, using our EPrints software.
Add your Slides, Posters, Supporting data, whatnots...
If you are presenting a paper or poster and have slides or supporting material you would like to have made permanently public on this website, please email
firstname.lastname@example.org - Include the file(s), a note to say if they are presentations, supporting material or whatnot, and the URL of the paper/poster from this site. eg. http://www2009.eprints.org/128/
It's impractical to add all the workshops at WWW2009 by hand, but if you can provide me with the metadata in a machine readable way, I'll have a go at importing it. If you are good at slinging XML, my ideal import format is visible at http://www2009.eprints.org/import_example.xml
We (Southampton EPrints Project) intend to preserve the files and HTML pages of this site for many years; however, we will turn it into flat files for long-term preservation. This means that at some point in the months after the conference the search, metadata-export, JSON interface, OAI etc. will be disabled as we "fossilize" the site. Please plan accordingly. Feel free to ask nicely for us to keep the dynamic site online longer if there's a really good (or cool) use for it...
- WWW2009 EPrints supports OAI 2.0 with a base URL of http://www2009.eprints.org/cgi/oai2
- The JSON URL is http://www2009.eprints.org/cgi/json?callback=function&eprintid=number
To prevent Google killing the server by hammering these tools, the /cgi/ URLs are disallowed in robots.txt - ask Chris if you want an exception made.
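For anyone scripting against these two interfaces, the sketch below only builds request URLs and makes no network calls. The base URLs and the `callback` and `eprintid` parameter names come from the list above; `ListRecords` and `metadataPrefix=oai_dc` are standard OAI-PMH 2.0 request parameters. The helper names, the callback name `handle`, and the choice of eprint 128 (taken from the example paper URL on this page) are only illustrative.

```python
from urllib.parse import urlencode

OAI_BASE = "http://www2009.eprints.org/cgi/oai2"
JSON_BASE = "http://www2009.eprints.org/cgi/json"


def oai_url(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    return OAI_BASE + "?" + urlencode({"verb": verb, **params})


def json_url(eprintid, callback):
    """Build a JSON interface URL for one eprint.

    `callback` is the name of the function the response is wrapped in.
    """
    query = urlencode({"callback": callback, "eprintid": eprintid})
    return JSON_BASE + "?" + query


print(oai_url("ListRecords", metadataPrefix="oai_dc"))
print(json_url(128, "handle"))
```

Since the dynamic interfaces are due to be switched off after the conference, a script built on URLs like these should expect them to stop responding at some point.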
Feel free to contact me (Christopher Gutteridge) with any other queries or suggestions. ...Or if you do something cool with the data which we should link to!
These are not directly related to the EPrints set up, but may be of use to delegates.
- Social tool links
- I've put links in the page header to the WWW2009 stuff on flickr, facebook and to a page which will let you watch the #www2009 tag on Twitter. Not really the right place, but not yet made it onto the main conference homepage. Send me any suggestions for new links.