Lucene implements a variety of advanced search forms

  • 2020-06-23 00:15:58
  • OfStack

Boolean operators

Most search engines provide Boolean operators that let users combine queries. Typical Boolean operators are AND, OR, and NOT. Lucene supports five Boolean operators: AND, OR, NOT, plus (+), and minus (-). Each operator is described below.

OR: To search for documents that contain term A or term B, use the OR operator. Keep in mind that if you simply separate two keywords with a space, the query parser automatically places an OR operator between them. For example, "Java OR Lucene" and "Java Lucene" both search for documents containing Java or Lucene.

AND: To search for documents that contain all of several keywords, use the AND operator. For example, "Java AND Lucene" returns all documents containing both Java and Lucene.

NOT: The NOT operator excludes documents that contain the keyword immediately following NOT. For example, to search for all documents that contain Java but not Lucene, use the query "Java NOT Lucene". You cannot use this operator with only one search term, however; for example, the query "NOT Java" returns no results.

Plus (+): This operator works like AND, but applies only to the search term that immediately follows it, which it marks as required. For example, to search for documents that must contain Java and may or may not contain Lucene, use the query "+Java Lucene".

Minus (-): This operator works the same way as NOT. The query "Java -Lucene" returns all documents containing Java but not Lucene.
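
The operator rules above can be read as three kinds of clause: required, optional, and prohibited. The following is a minimal plain-Java sketch of that reading, not Lucene's own implementation. The class BooleanClauseSketch and its matches method are hypothetical names introduced for illustration, and a document is modeled simply as a set of lowercase terms.

```java
import java.util.List;
import java.util.Set;

// Illustrative model only: a document is a set of terms, a query is three clause lists.
final class BooleanClauseSketch {

    // Returns true when the document satisfies the required, optional, and prohibited clauses.
    static boolean matches(Set<String> docTerms, List<String> required,
                           List<String> optional, List<String> prohibited) {
        // Any prohibited term (NOT or -) rejects the document outright.
        for (String term : prohibited) {
            if (docTerms.contains(term)) return false;
        }
        // Every required term (AND or +) must be present.
        for (String term : required) {
            if (!docTerms.contains(term)) return false;
        }
        // With at least one required term, optional terms no longer decide the match.
        if (!required.isEmpty()) return true;
        // Otherwise at least one optional term (OR) must be present.
        for (String term : optional) {
            if (docTerms.contains(term)) return true;
        }
        // A query made only of prohibited terms matches nothing.
        return false;
    }

    public static void main(String[] args) {
        List<String> none = List.of();
        Set<String> javaOnly = Set.of("java");
        Set<String> luceneOnly = Set.of("lucene");
        // "Java NOT Lucene" against a document that contains only java
        System.out.println(matches(javaOnly, none, List.of("java"), List.of("lucene")));
        // "NOT Java" against a document that does not contain java
        System.out.println(matches(luceneOnly, none, none, List.of("java")));
        // "+Java Lucene" against a document that contains only java
        System.out.println(matches(javaOnly, List.of("java"), List.of("lucene"), none));
    }
}
```

In this model a query that has only prohibited clauses never matches, which mirrors the statement that "NOT Java" on its own returns no results.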

Let's look at how to implement Boolean queries using the API provided by Lucene. Listing 1 shows how to run queries that use the Boolean operators.

Listing 1: Using the Boolean operators


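import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
//Test boolean operators (uses the early Lucene API: static QueryParser.parse, Hits)
public void testOperator(String indexDirectory) throws Exception {
    // Open the existing index; false means do not create a new one
    Directory dir = FSDirectory.getDirectory(indexDirectory, false);
    IndexSearcher indexSearcher = new IndexSearcher(dir);
    String[] searchWords = {"Java AND Lucene", "Java NOT Lucene", "Java OR Lucene",
            "+Java +Lucene", "+Java -Lucene"};
    Analyzer language = new StandardAnalyzer();
    Query query;
    for (int i = 0; i < searchWords.length; i++) {
        // "title" is the default field for terms without a field prefix
        query = QueryParser.parse(searchWords[i], "title", language);
        Hits results = indexSearcher.search(query);
        System.out.println(results.length() + " search results for query " + searchWords[i]);
    }
}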

Field search

Lucene supports field searches: you can specify which field a query term is matched against. For example, if the indexed documents contain two fields, title and content, the query "title:Lucene AND content:Java" returns all documents that contain Lucene in the title field and Java in the content field. Listing 2 shows how to implement a field search using the Lucene API.

Listing 2: Implementing a field search


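import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
//Test field search
public void testFieldSearch(String indexDirectory) throws Exception {
    Directory dir = FSDirectory.getDirectory(indexDirectory, false);
    IndexSearcher indexSearcher = new IndexSearcher(dir);
    String searchWords = "title:Lucene AND content:Java";
    Analyzer language = new StandardAnalyzer();
    Query query = QueryParser.parse(searchWords, "title", language);
    Hits results = indexSearcher.search(query);
    System.out.println(results.length() + " search results for query " + searchWords);
}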
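
In Listing 2, the second argument to QueryParser.parse ("title") acts as the default field: a term with no field prefix is searched there. The following plain-Java sketch illustrates only that rule. FieldClauseSketch and resolve are hypothetical names, and the sketch ignores the quoting, escaping, and whitespace handling that a real query parser performs.

```java
// Illustrative only: splits one query clause into a field name and a term.
final class FieldClauseSketch {

    // Returns {field, term}; a clause without a "field:" prefix uses the default field.
    static String[] resolve(String clause, String defaultField) {
        int colon = clause.indexOf(':');
        if (colon <= 0) {
            // No prefix (or an empty one): fall back to the default field.
            return new String[] {defaultField, clause};
        }
        return new String[] {clause.substring(0, colon), clause.substring(colon + 1)};
    }

    public static void main(String[] args) {
        String[] explicit = resolve("content:Java", "title");
        String[] implicit = resolve("Lucene", "title");
        System.out.println(explicit[0] + " / " + explicit[1]);
        System.out.println(implicit[0] + " / " + implicit[1]);
    }
}
```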

Wildcard search

Lucene supports two wildcards: the question mark (?) and the asterisk (*). Use the question mark for a single-character wildcard query and the asterisk for a multi-character wildcard query. For example, to search for tiny or tony, use the query "t?ny"; to search for Teach, Teacher, and Teaching, use the query "Teach*". Listing 3 shows how to run a wildcard query.

Listing 3: Making a wildcard query


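import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
//Test wildcard search
public void testWildcardSearch(String indexDirectory) throws Exception {
    Directory dir = FSDirectory.getDirectory(indexDirectory, false);
    IndexSearcher indexSearcher = new IndexSearcher(dir);
    String[] searchWords = {"tex*", "tex?", "?ex*"};
    Query query;
    for (int i = 0; i < searchWords.length; i++) {
        // Build the query directly from a term; no query parser is involved
        query = new WildcardQuery(new Term("title", searchWords[i]));
        Hits results = indexSearcher.search(query);
        System.out.println(results.length() + " search results for query " + searchWords[i]);
    }
}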
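
To make the two wildcards concrete, here is a small plain-Java sketch that tests a single term against a pattern by translating it to a regular expression. It only illustrates the matching rule (? stands for exactly one character, * for zero or more) and is not how Lucene evaluates a WildcardQuery against an index. WildcardSketch and matches are hypothetical names.

```java
import java.util.regex.Pattern;

// Illustrative only: checks one term against a pattern containing ? and * wildcards.
final class WildcardSketch {

    static boolean matches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                regex.append('.');      // exactly one character
            } else if (c == '*') {
                regex.append(".*");     // zero or more characters
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    public static void main(String[] args) {
        String[] patterns = {"tex*", "tex?", "?ex*"};
        String[] terms = {"tex", "text", "texas", "next"};
        for (String p : patterns) {
            for (String t : terms) {
                System.out.println(p + " vs " + t + ": " + matches(p, t));
            }
        }
    }
}
```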

Fuzzy query

The fuzzy query provided by Lucene is based on the edit distance algorithm. You can append the character ~ to a search term to make it a fuzzy query. For example, the query "think~" returns all documents containing keywords similar to think. Listing 4 shows the code for a fuzzy query using the Lucene API.

Listing 4: Implementing a fuzzy query


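import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
//Test fuzzy search
public void testFuzzySearch(String indexDirectory) throws Exception {
    Directory dir = FSDirectory.getDirectory(indexDirectory, false);
    IndexSearcher indexSearcher = new IndexSearcher(dir);
    String[] searchWords = {"text", "funny"};
    Query query;
    for (int i = 0; i < searchWords.length; i++) {
        // Matches terms in the title field that are similar to the given word
        query = new FuzzyQuery(new Term("title", searchWords[i]));
        Hits results = indexSearcher.search(query);
        System.out.println(results.length() + " search results for query " + searchWords[i]);
    }
}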
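
The edit distance mentioned above counts the single-character insertions, deletions, and substitutions needed to turn one word into another. The following is a minimal sketch of the standard Levenshtein computation in plain Java. It is not taken from Lucene's source, it does not show how FuzzyQuery turns a distance into a match threshold, and EditDistanceSketch is a hypothetical name.

```java
// Illustrative only: classic dynamic-programming Levenshtein distance.
final class EditDistanceSketch {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b is the prefix length.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("think", "thing"));
        System.out.println(distance("text", "test"));
    }
}
```

Words at a small distance from the search term, such as thing for think, are the kind of candidates a fuzzy query is meant to find.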

Range search

A range search matches documents whose value in a field falls within a given range. For example, the query "age:[18 TO 35]" returns all documents whose age field has a value between 18 and 35, inclusive. Listing 5 shows a range search using the Lucene API.

Listing 5: Testing a range search


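import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
//Test range search
public void testRangeSearch(String indexDirectory) throws Exception {
    Directory dir = FSDirectory.getDirectory(indexDirectory, false);
    IndexSearcher indexSearcher = new IndexSearcher(dir);
    // Lower and upper bounds on the birthDay field, written as yyyyMMdd
    Term begin = new Term("birthDay", "20000101");
    Term end = new Term("birthDay", "20060606");
    // true makes both bounds inclusive
    Query query = new RangeQuery(begin, end, true);
    Hits results = indexSearcher.search(query);
    System.out.println(results.length() + " search results returned");
}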
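
Listing 5 stores dates as fixed-width yyyyMMdd strings. Assuming, as with the term-based RangeQuery used there, that bounds are compared as strings rather than as numbers, the plain-Java sketch below shows why that format works and why unpadded numbers can misbehave. RangeSketch and inRange are hypothetical names, and the sketch is not Lucene code.

```java
// Illustrative only: range check by lexicographic string comparison.
final class RangeSketch {

    static boolean inRange(String value, String lower, String upper, boolean inclusive) {
        int vsLower = value.compareTo(lower);
        int vsUpper = value.compareTo(upper);
        return inclusive
                ? (vsLower >= 0 && vsUpper <= 0)
                : (vsLower > 0 && vsUpper < 0);
    }

    public static void main(String[] args) {
        // Fixed-width dates sort the same way as strings and as dates.
        System.out.println(inRange("20030515", "20000101", "20060606", true));
        // Unpadded numbers do not: as a string, "2" sorts between "18" and "35".
        System.out.println(inRange("2", "18", "35", true));
    }
}
```

Under that assumption, numeric values such as ages would need to be zero-padded to a fixed width before indexing so that string order and numeric order agree.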
