[ChimeText] (reminder, tomorrow!) Yahoo! talks
Min-Yen Kan
knmnyn at gmail.com
Thu Jul 24 09:34:14 SGT 2008
Dear all:
Three reminders about tomorrow's seminars (the other two follow in
separate emails). See you at today's MSRA talks.
Min
***VENUE CHANGED TO SR1 (COM1 02-06)
Yahoo! Research Labs talks / Recent Research in NLP / IR at YRL
Talk Overviews (times are approximate):
9:30-10:00 - Ricardo Baeza-Yates / Towards a Distributed Search Engine
10:00-10:30 - Evgeniy Gabrilovich / Overview of Computational Advertising
10:30-11:00 - Rosie Jones / Geography in Web Search
11:00-11:30 - Donald Metzler / Predicting when (not) to Advertise
11:30-12:00 - Vanessa Murdock / Diversifying Image Search with User Generated Content
1. Ricardo Baeza-Yates
Title: Towards a Distributed Search Engine
Abstract: Distributed search engines are often more complex to
implement than centralized engines. Distributing a search engine
across multiple sites, however, has several advantages. In
particular, it enables the use of fewer computing resources and
the exploitation of data and user locality. In this presentation we
show the feasibility of distributed Web search engines by proposing a
model for assessing the total cost of a distributed Web search engine
that includes the computational costs as well as the communication
cost among all distributed sites. Using examples, we show that a
distributed Web search engine can be more cost-effective than a
centralized one if there is a large percentage of local queries,
which is usually the case. We then present a query-processing
algorithm that maximizes the number of queries answered locally,
without sacrificing the quality of the results, by using caching and
partial replication. We simulate our algorithm on real document
collections and real query workloads to measure the actual parameters
needed for our cost model, and we show that a distributed search
engine can be competitive with a centralized architecture with
respect to cost. This is joint work with Aris Gionis, Flavio
Junqueira, Vassilis Plachouras and Luca Telloli.
Bio: Ricardo Baeza-Yates is VP of Yahoo! Research for Europe and
Latin America, leading the labs at Barcelona, Spain and Santiago,
Chile. Until 2005 he was the director of the Center for Web Research
at the Department of Computer Science of the Engineering School of the
University of Chile, and an ICREA Professor at the Dept. of Technology of
Univ. Pompeu Fabra in Barcelona, Spain. He is co-author of the book
Modern Information Retrieval, published in 1999 by Addison-Wesley, as
well as co-author of the 2nd edition of the Handbook of Algorithms and
Data Structures, Addison-Wesley, 1991; and co-editor of Information
Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among
more than 150 other publications. He has received the Organization of
American States award for young researchers in exact sciences (1993)
and with two Brazilian colleagues obtained the COMPAQ prize for the
best CS Brazilian research article (1997). In 2003 he was the first
computer scientist to be elected to the Chilean Academy of Sciences.
In 2007 he was awarded the Graham Medal for innovation in
computing, given by the University of Waterloo to distinguished
alumni.
2. Evgeniy Gabrilovich
Title: Overview of Computational Advertising
Abstract: Web advertising is the primary driving force behind
many Web activities, including Internet search as well as publishing
of online content by third-party providers. A new discipline,
Computational Advertising, has recently emerged to study the
process of advertising on the Internet from a variety of angles. A
successful advertising campaign should be relevant to the user's
immediate information need as well as, more generally, to the user's
background; it should be economically worthwhile to the advertiser and
the intermediaries (e.g., the search engine), and it should not be
detrimental to user experience. To a first approximation, the process of obtaining
relevant ads can be reduced to conventional information retrieval,
where one constructs a query that describes the user's context, and
then executes this query against a large inverted index of ads. We
show how to augment the standard IR approach using query expansion and
text classification techniques. We demonstrate how to employ a
relevance feedback assumption and use Web search results retrieved by
the query. We will also survey the numerous challenges and open
research problems posed by computational advertising, such as text
summarization, natural language generation, named entity extraction,
handling geographic names, and others.
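As a minimal sketch of the "first approximation" described in this abstract (build a query from the user's context, then run it against an inverted index of ads), the snippet below assumes a hypothetical toy ad corpus and plain term-overlap scoring. It is an illustration of conventional IR retrieval only, not the speaker's system, and it omits the query expansion and text classification the talk covers.

```python
from collections import defaultdict

# Hypothetical toy ad corpus; the ad IDs and texts are illustrative only.
ADS = {
    "ad1": "cheap flights to barcelona",
    "ad2": "barcelona hotel deals",
    "ad3": "running shoes sale",
}


def build_inverted_index(ads):
    """Map each lowercase term to the set of ad IDs whose text contains it."""
    index = defaultdict(set)
    for ad_id, text in ads.items():
        for term in text.lower().split():
            index[term].add(ad_id)
    return index


def retrieve_ads(index, context, k=2):
    """Turn the user's context (page text or search query) into a
    bag-of-words query and rank ads by how many query terms they share."""
    query_terms = set(context.lower().split())
    scores = defaultdict(int)
    for term in query_terms:
        # .get() avoids inserting empty entries into the defaultdict.
        for ad_id in index.get(term, ()):
            scores[ad_id] += 1
    # Highest overlap first; ties broken by ad ID so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [ad_id for ad_id, _ in ranked[:k]]


index = build_inverted_index(ADS)
print(retrieve_ads(index, "cheap Barcelona flights"))  # prints ['ad1', 'ad2']
```

Term overlap is the simplest possible scoring function; a real ad-retrieval system would more likely use weighted scores (e.g. tf-idf) and the expansion techniques the abstract mentions.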
Bio: Evgeniy Gabrilovich is a Senior Research Scientist and
Manager of the NLP & IR Group at Yahoo! Research. His research
interests include information retrieval, machine learning, and
computational linguistics. Recently, he co-organized a workshop on the
synergy between Wikipedia and research in AI at AAAI 2008, and
co-presented a tutorial on computational advertising at ACL 2008 and EC
2008. He served on the program committees of ACL-08:HLT, AAAI 2008,
WWW 2008, CIKM 2008, JCDL 2008, AAAI 2007, EMNLP-CoNLL 2007, and
COLING-ACL 2006. Evgeniy earned his MSc and PhD degrees in Computer
Science from the Technion - Israel Institute of Technology. In his
Ph.D. thesis, Evgeniy developed a methodology for using large-scale
repositories of world knowledge (e.g., all the knowledge available in
Wikipedia) in order to enhance text representation beyond the bag of
words. URL: http://research.yahoo.com/Evgeniy_Gabrilovich
3. Rosie Jones
Title: Geography in Web Search
Abstract: Web search results are typically based on the user's
search query, without taking other contextual information into
account. However, we can see from user search behavior that for some
search topics the user may prefer results which are geographically
close to home. We will show topics which have a geographical
dependence, as well as others which appear to be geographically
independent. Based on these findings, we propose a more flexible
approach to web search in which we prefer a ranking with results
close to the user's location when this will best satisfy the
user's information need.
Bio: Rosie Jones is a Senior Research Scientist at Yahoo!. Her
research interests include web search, geographic information
retrieval and natural language processing. She received her PhD from
the School of Computer Science at Carnegie Mellon University. In 2005
she co-organized the SIGIR workshop on lexical cohesion and
information retrieval, and in 2003 she co-organized the ICML workshop
on The Continuum from Labeled to Unlabeled Data in Machine Learning
and Data Mining. She served as a Senior PC member for SIGIR in 2007
and 2008. URL: http://research.yahoo.com/Rosie_Jones
4. Donald Metzler
Title: Predicting when (not) to Advertise
Abstract: In this talk we discuss the problem of whether or not
to show online advertisements. We propose two methods for addressing
this problem: a simple thresholding approach, and a machine learning
approach that collectively analyzes the set of candidate ads
augmented with external knowledge. Our experimental evaluation, based
on over 28,000 editorial judgments, shows that we are able to predict,
with high accuracy, when to show ads for both content match and
sponsored search advertising tasks.
Bio: Donald Metzler is a Research Scientist at Yahoo! Research
in Santa Clara, CA. He obtained his Ph.D. degree in Computer Science
from the University of Massachusetts Amherst in 2007. His research
interests include information retrieval, machine learning, and their
intersection. He is the co-author of Search Engines: Information
Retrieval in Practice, which will be published in the early part of
2009. URL: http://research.yahoo.com/Don_Metzler
5. Vanessa Murdock
Title: Diversifying Image Search with User Generated Content
Abstract: Large-scale image retrieval on the Web relies on the
availability of short snippets of text associated with the image. This
user-generated content is a primary source of information about the
content and context of an image. While traditional information
retrieval models focus on finding the most relevant document without
consideration for diversity, image search requires results that are
both diverse and relevant. This is problematic for images because they
are represented very sparsely by text, and as with all user-generated
content the text for a given image can be extremely noisy.
The contribution of this work is twofold. First, we show that it is
possible to minimize the trade-off between precision and diversity:
relevance models offer a unified framework that affords the greatest
diversity without harming precision. Second, we show that
estimating the query model from the distribution of tags favors the
dominant sense of a query. Relevance models operating only on tags
offer the highest level of diversity with no significant decrease in
precision.
Bio: Vanessa Murdock currently holds a Post Doc position at
Yahoo! Research Barcelona. Her current work focuses on retrieval of
short texts, such as for advertisements, and user-generated content
for images and video. She completed her PhD in 2006 at the University
of Massachusetts, working with W. Bruce Croft. Her thesis, focusing on
sentence retrieval for applications such as Question Answering,
novelty detection, and information provenance, was recently published
as a book, "Exploring Sentence Retrieval". URL:
http://research.yahoo.com/Vanessa_Murdock
Upcoming Talks:
24 Jul: MSRA NLP Research Labs talks: 2 talks on
1) Ming Zhou / Generating Chinese Couplets using a Statistical MT Approach
2) Chin-Yew Lin / Web Scale Question Answering -- SQuAD
25 Jul: Yahoo! Research Labs talks: 5 talks on
1) Ricardo Baeza-Yates / Distributed Information Retrieval
2) Evgeniy Gabrilovich / Overview of Computational Advertising
3) Rosie Jones / Geography in Web Search
4) Donald Metzler / Predicting when (not) to Advertise
5) Vanessa Murdock / Diversifying Image Search with User Generated Content
25 Jul: Dell Zhang / Learning to Classify Networked Entities
25 Jul: William Chang (CTO of Baidu) / The WWW in China and Three
Generations of Intelligent Search
28 Jul: Qiu Long / Context for Semantic Similarity