
Wednesday, February 13, 2013

Lucene similarity score

More formally, using Lucene's nomenclature, the Similarity class outlines the score that is computed for a document d given a query q:
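$$\mathrm{Score}(q, d) = \mathrm{coord}(q, d) \cdot \mathrm{norm}(q) \cdot \sum_{t \,\text{in}\, q} \Big( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{boost}(t.\mathrm{field} \text{ in } d) \cdot \mathrm{norm}(t, d) \Big)$$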

Factor : Description
Score(q, d) : Relevance of query q to a document d.
tf(t in d) : Term frequency of term t in the document, that is, how many times the term occurs in the document.
idf(t) : Inverse document frequency of term t across all documents, a measure of how "unique" the term is. Very common terms have a low idf; very rare terms have a high idf.
boost(t.field in d) : Boost for the field, the product of the field and document boost factors as set during indexing. You may use this to statically boost certain fields and certain documents over others.
norm(t, d) : Normalization factor for term t in the document. This is the normalization value of a field, given the number of terms within the field. The value is computed during indexing and stored in the index norms. Shorter fields (fewer tokens) get a bigger boost from this factor.
coord(q, d) : Coordination factor, based on the number of query terms found in document d. It gives an AND-like boost to documents that contain more of the search terms than other documents.
norm(q) : Normalization factor for the query, given the sum of the squared weights of each of the query terms.
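To make the formula more concrete, here is a minimal plain-Java sketch of how the factors might be combined for a toy query. It does not call Lucene at all: ScoreSketch, TermStats, and every method name are hypothetical. The factor formulas (square-root tf, logarithmic idf, 1/sqrt length norm) are assumptions modeled on the classic defaults and may differ between Lucene versions. Lucene also stores norms in a lossy compressed form, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the scoring formula. None of these names are Lucene classes.
class ScoreSketch {

    // Statistics for one query term against one document (hypothetical holder type).
    static final class TermStats {
        final int freq;         // occurrences of the term in the document's field
        final int docFreq;      // number of documents that contain the term
        final int fieldLength;  // number of tokens in the field
        final double boost;     // index-time boost for the field

        TermStats(int freq, int docFreq, int fieldLength, double boost) {
            this.freq = freq;
            this.docFreq = docFreq;
            this.fieldLength = fieldLength;
            this.boost = boost;
        }
    }

    // Assumed factor formulas, modeled on the classic defaults.
    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    static double lengthNorm(int fieldLength) { return 1.0 / Math.sqrt(fieldLength); }

    static double coord(int overlap, int maxOverlap) { return (double) overlap / maxOverlap; }

    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Score(q, d) = coord * norm(q) * sum over query terms of tf * idf^2 * boost * norm(t, d).
    static double score(List<TermStats> queryTerms, int numDocs) {
        double sumOfSquaredWeights = 0.0;
        double sum = 0.0;
        int overlap = 0;
        for (TermStats t : queryTerms) {
            double idf = idf(t.docFreq, numDocs);
            sumOfSquaredWeights += idf * idf;   // query boost assumed to be 1
            if (t.freq > 0) {                   // only matching terms contribute
                overlap++;
                sum += tf(t.freq) * idf * idf * t.boost * lengthNorm(t.fieldLength);
            }
        }
        if (overlap == 0) {
            return 0.0;
        }
        return coord(overlap, queryTerms.size()) * queryNorm(sumOfSquaredWeights) * sum;
    }

    public static void main(String[] args) {
        int numDocs = 100;
        double both = score(Arrays.asList(
                new TermStats(2, 10, 8, 1.0), new TermStats(1, 3, 8, 1.0)), numDocs);
        double one = score(Arrays.asList(
                new TermStats(2, 10, 8, 1.0), new TermStats(0, 3, 8, 1.0)), numDocs);
        System.out.println("matches both terms: " + both);
        System.out.println("matches one term:   " + one);
    }
}
```

In this sketch a document that matches only one of the two query terms is penalized twice: it loses that term's contribution to the sum, and the coord factor halves what is left.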
Most of these scoring formula factors are controlled by, and implemented in, a subclass of the abstract Similarity class. DefaultSimilarity is the implementation used unless otherwise specified. More computations are performed under the covers of DefaultSimilarity.
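The subclassing idea can be illustrated without Lucene on the classpath. In the sketch below, SimilaritySketch, DefaultSimilaritySketch, and BinaryTfSimilaritySketch are hypothetical stand-ins that only mimic the shape of the real hierarchy. The square-root tf and logarithmic idf are assumed defaults, and the override shows one possible customization: ignoring repeated occurrences of a term.

```java
// Stand-in types that only mimic the shape of a Similarity class hierarchy.
abstract class SimilaritySketch {
    abstract double tf(int freq);
    abstract double idf(int docFreq, int numDocs);

    // Per-term contribution tf * idf^2, using whichever factors the subclass supplies.
    double termWeight(int freq, int docFreq, int numDocs) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf;
    }
}

// Assumed default behavior: square-root tf and logarithmic idf.
class DefaultSimilaritySketch extends SimilaritySketch {
    @Override
    double tf(int freq) {
        return Math.sqrt(freq);
    }

    @Override
    double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }
}

// Custom behavior: a term counts once, however often it repeats in the document.
class BinaryTfSimilaritySketch extends DefaultSimilaritySketch {
    @Override
    double tf(int freq) {
        return freq > 0 ? 1.0 : 0.0;
    }
}

class SimilarityDemo {
    public static void main(String[] args) {
        SimilaritySketch standard = new DefaultSimilaritySketch();
        SimilaritySketch binary = new BinaryTfSimilaritySketch();
        // Same term statistics, different tf behavior.
        System.out.println("default tf weight: " + standard.termWeight(9, 5, 100));
        System.out.println("binary tf weight:  " + binary.termWeight(9, 5, 100));
    }
}
```

With Lucene itself, a custom Similarity would normally have to be installed on both the indexing and the searching side, because norms are computed at index time.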
If you want to see how all these factors play out, Lucene provides a helpful feature called Explanation. IndexSearcher has an explain method, which requires a Query and a document ID and returns an Explanation object. 
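// Assumes searcher (IndexSearcher), query (Query) and topDocs (TopDocs) already exist.
for (ScoreDoc match : topDocs.scoreDocs) {
    // Ask Lucene to explain how the score for this hit was computed.
    Explanation explanation = searcher.explain(query, match.doc);
    System.out.println("----------");
    // Load the stored document so its title can be printed next to the explanation.
    Document doc = searcher.doc(match.doc);
    System.out.println(doc.get("title"));
    System.out.println(explanation.toString());
}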


