Thursday, October 10, 2013

Searching on unanalyzed fields

From Lucene in Action, Second Edition

To make sure the text is analyzed, specify Field.Index.ANALYZED or
Field.Index.ANALYZED_NO_NORMS when creating the field. To index the entire field’s
value as a single token, like Field 3 in figure 4.1, pass Field.Index.NOT_ANALYZED or
Field.Index.NOT_ANALYZED_NO_NORMS as the fourth argument.

Note: new Field(String, String, Field.Store.YES, Field.Index.ANALYZED)
creates a tokenized and stored field. Rest assured the original String
value is stored. But the output of the designated Analyzer dictates what’s
indexed and available for searching.

Searching on unanalyzed fields

There are often cases when you’d like to index a field’s value without analysis. For
example, part numbers, URLs, and Social Security numbers should all be indexed and
searched as a single token. During indexing this is easily done by specifying
Field.Index.NOT_ANALYZED or Field.Index.NOT_ANALYZED_NO_NORMS when you create the
field. You also want users to be able to search on these part numbers. This is simple
if your application directly creates a TermQuery.
But a dilemma can arise if you use QueryParser and attempt to query on an unanalyzed
field, because the fact that the field wasn’t analyzed is known only during
indexing. There’s nothing special about such a field’s terms once indexed; they’re just
terms. Let’s see the issue exposed with a straightforward test case that indexes a
document with an unanalyzed field and then attempts to find that document again.
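
The mismatch can be reproduced without Lucene at all. What follows is a minimal sketch in plain Java, not Lucene code: the class UnanalyzedFieldDemo, its methods, the sample part number Q36, and the letter-splitting analyze helper (a rough stand-in for what an analyzer such as SimpleAnalyzer does) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these classes or methods are Lucene's.
class UnanalyzedFieldDemo {
    // field name -> term -> ids of the documents that contain the term
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();

    // Hypothetical analyzer: split on non-letters, then lowercase each piece.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Analogous to indexing with NOT_ANALYZED: the whole value is one term.
    void indexUnanalyzed(int docId, String field, String value) {
        index.computeIfAbsent(field, f -> new HashMap<>())
             .computeIfAbsent(value, t -> new HashSet<>())
             .add(docId);
    }

    // Analogous to a TermQuery: exact term lookup, no analysis of the input.
    Set<Integer> termQuery(String field, String term) {
        Map<String, Set<Integer>> terms = index.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return new HashSet<>();
        }
        return new HashSet<>(terms.get(term));
    }

    // Analogous to a parsed query clause: the text is analyzed first,
    // and every resulting token must match.
    Set<Integer> parsedQuery(String field, String text) {
        Set<Integer> hits = null;
        for (String token : analyze(text)) {
            Set<Integer> docs = termQuery(field, token);
            if (hits == null) {
                hits = docs;
            } else {
                hits.retainAll(docs);
            }
        }
        return hits == null ? new HashSet<>() : hits;
    }

    public static void main(String[] args) {
        UnanalyzedFieldDemo idx = new UnanalyzedFieldDemo();
        idx.indexUnanalyzed(1, "partnum", "Q36");
        // The exact term is in the index, so this finds document 1.
        System.out.println(idx.termQuery("partnum", "Q36"));
        // Analysis turns "Q36" into the term "q", which was never indexed.
        System.out.println(idx.parsedQuery("partnum", "Q36"));
    }
}
```

In this sketch the index holds the single term Q36, while the analyzed query looks for q, so the two never meet. Nothing in the index records that the field was left unanalyzed, which is the point the text makes.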

The TermQuery worked fine, but QueryParser found no results. This issue of
QueryParser encountering an unanalyzed field emphasizes a key point: indexing and
analysis are intimately tied to searching. The testBasicQueryParser test shows that
searching for terms created using Index.NOT_ANALYZED_NO_NORMS can be problematic when
the query expression is analyzed: QueryParser analyzed the partnum field, but it
shouldn’t have. There are a few possible solutions:

- If part numbers or other textual constructs are common lexical occurrences in
  the text you’re analyzing, consider creating a custom domain-specific analyzer
  that recognizes and preserves them.
- Subclass QueryParser and override one or both of the getFieldQuery methods
  to provide field-specific handling.
- Use PerFieldAnalyzerWrapper for field-specific analysis.
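
The idea behind the last option, field-specific analysis, can be sketched in plain Java. This is a toy model and not Lucene's PerFieldAnalyzerWrapper: the class PerFieldAnalysisDemo, its addAnalyzer and analyze methods, the simple and keyword helpers, and the sample values are all hypothetical names chosen for illustration. The keyword helper passes the whole input through as one token, which is the behavior an unanalyzed field needs at query time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model only: mimics the idea of per-field analysis, not Lucene's API.
class PerFieldAnalysisDemo {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldAnalysisDemo(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Hypothetical general-purpose analyzer: split on non-letters, lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Hypothetical keyword analyzer: the whole input becomes a single token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Register an analyzer that overrides the default for one field.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Pick the field's own analyzer if one was registered, else the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        PerFieldAnalysisDemo analyzer = new PerFieldAnalysisDemo(PerFieldAnalysisDemo::simple);
        analyzer.addAnalyzer("partnum", PerFieldAnalysisDemo::keyword);
        // The part number survives intact, so it can match the indexed term.
        System.out.println(analyzer.analyze("partnum", "Q36"));
        // Other fields still go through the default analyzer.
        System.out.println(analyzer.analyze("description", "Space Modulator"));
    }
}
```

The design choice here is to decide by field name at analysis time, so the query side treats partnum the same way the indexing side did, while every other field keeps the default analysis.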
