1. INTRODUCTION
Traditional search engines, based on exact keyword matching, return
too many documents in response to a user query, and most of the
returned documents are irrelevant. Very often the user simply
does not know how to formulate a query that expresses
his intention. Furthermore, as observed by the authors of
Scatter/Gather [1], users may not only search for documents but also
browse through the collection to discover the general information
content of the corpus. In addition, users generally reject complex
interfaces for formulating advanced queries, and demand a fast
response time. To overcome these problems, major search engines
go to considerable lengths to provide the user with an intuitive
interface to:
- help formulate a query representing his intention
- browse long lists of documents
- discover related topics
Several methods based on Document Clustering [1, 2, 8], Faceted
Categories [6] or, more recently, Tag Clouds [3, 5, 7], introduced
by the Blog community, are used to satisfy these needs. Google
Labs Suggestion [4], Yahoo! Search Assistant [11] and Clusty
remix clustering [10] are examples of this kind of interface.
In this paper, we describe TopicRank, a Word Clustering-based
approach that automatically and dynamically generates an
interactive Tag Cloud related to the user query, in which the layout
of the presented keywords relies on a semantic closeness metric. In
contrast to [6], we found that Tag Clouds used in this way are both
an efficient navigational tool and a good tool for understanding
abstract information.
2. TOPICRANK APPROACH
As in SHOC [2], the TopicRank approach meets the Semantic and
Online Clustering requirements, but it does not require the
clustering algorithm to produce a Hierarchical output, and the
clustering is done on words rather than documents.
TopicRank focuses primarily on producing semantically related
clusters of words and does not try to name the clusters, thus
mitigating the automatic labeling problem that Document Clustering
approaches face.
The proposed algorithm proceeds in three steps:
1) Candidate Extraction
2) Word Clustering
3) Ranking
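
To make the three steps concrete, the following Python fragment is a
minimal sketch of such a pipeline, not the TopicRank implementation
itself. This section does not specify the extraction criterion, the
semantic closeness metric or the ranking function, so everything below
is an assumption made for illustration: candidates are selected by
document frequency, closeness is approximated by the Jaccard overlap
of the document sets in which two words occur, clusters are formed by
single-link merging above a threshold, and ranking uses summed
document frequency. All function names, the stopword list and the
parameters top_k and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(doc):
    """Lowercase, split on whitespace, drop stopwords; return distinct words."""
    return {w for w in doc.lower().split() if w not in STOPWORDS}


def extract_candidates(doc_words, top_k):
    """Step 1 (assumed criterion): keep the top_k words by document frequency."""
    df = Counter()
    for words in doc_words:
        df.update(words)  # each word counted once per document
    candidates = sorted(df, key=lambda w: (-df[w], w))[:top_k]
    return candidates, df


def cluster_words(doc_words, candidates, threshold):
    """Step 2 (assumed metric): merge words whose Jaccard co-occurrence
    is at least `threshold`, using single-link merging via union-find."""
    postings = {
        w: {i for i, words in enumerate(doc_words) if w in words}
        for w in candidates
    }
    parent = {w: w for w in candidates}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(candidates, 2):
        shared = len(postings[a] & postings[b])
        total = len(postings[a] | postings[b])
        if shared / total >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in candidates:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())


def rank_clusters(clusters, df):
    """Step 3 (assumed score): order words and clusters by document frequency."""
    ranked = [sorted(c, key=lambda w: (-df[w], w)) for c in clusters]
    return sorted(ranked, key=lambda c: (-sum(df[w] for w in c), c[0]))


def topic_clusters(docs, top_k=20, threshold=0.6):
    """Run the three steps on the documents returned for a query."""
    doc_words = [tokenize(d) for d in docs]
    candidates, df = extract_candidates(doc_words, top_k)
    return rank_clusters(cluster_words(doc_words, candidates, threshold), df)
```

For example, given the four toy documents "car engine speed",
"car engine fuel", "cat jungle prey" and "cat jungle habitat",
calling topic_clusters(docs, top_k=4) groups "car" with "engine" and
"cat" with "jungle". The ranked clusters could then drive the layout
of a Tag Cloud, with words from the same cluster placed close together.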