Optimizing search strategies

As a concept-based search engine, Sybase Search performs best when you enter queries with search words in context in short phrases rather than as isolated words. If more than one language is in use, repeating the concepts using different words generally improves results. Searching is often an iterative activity: expand and refine queries based on the results returned.

Optimizing the search engine

A concept-based search engine provides greater flexibility than traditional approaches to free-text searching, such as the Boolean combination of keywords.

For example, a user receives an e-mail message that says:

Following the incident close to Watford railway station in July, we need to assess the damage being done by tree branches tangling in overhead power lines or falling onto the tracks.

The user wants to locate documents matching the e-mail message. Using a traditional search method, he or she might enter something similar to:

branches AND lines AND tracks

In this query, the user is using the Boolean operator “AND” to filter the information. This type of query is very precise and is helpful when users know exactly how to formulate their query.

In practice, users are more often unsure how to formulate their query precisely, which introduces ambiguity and leads to less relevant search results. Different vocabulary used to describe similar concepts can also cause important documents to be missed and too many irrelevant documents to be returned.

If the user is searching a large database of documents, a query like the one in the previous example may retrieve a large number of items, many of which are not relevant because the query searches for only a small number of isolated words. Words like “branches” and “lines” are ambiguous and are common in a database of documentation concerning the railway system.
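The effect can be illustrated with a minimal Python sketch of Boolean AND matching. It does not show how Sybase Search or any particular engine is implemented; the function name `boolean_and_match` and the three sample documents are invented for this illustration.

```python
import re

def boolean_and_match(document: str, terms: list[str]) -> bool:
    """Return True if every term occurs as a whole word in the document."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return all(term.lower() in words for term in terms)

# Hypothetical documents, invented for this illustration.
documents = {
    "storm_report": "Tree branches fell onto the tracks and tangled in overhead power lines.",
    "timetable": "New branches of the network add lines and tracks to the timetable.",
    "catering": "Station catering contracts are due for renewal.",
}

query_terms = ["branches", "lines", "tracks"]
matches = [name for name, text in documents.items()
           if boolean_and_match(text, query_terms)]
print(matches)
```

Both the storm report and the timetable notice satisfy the AND query, although only the first is about tree damage: the isolated words match regardless of the sense in which they are used.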

Querying a number of concepts

Sybase Search is better suited to a query that contains a number of concepts and uses unambiguous language, thus increasing the likelihood that the user retrieves results that are relevant to the query.

Using the previous e-mail example, isolate the key concepts, which are:

Irrelevant concepts might include:

Including irrelevant concepts distorts the search and may introduce unwanted documents. A query that is more effective than the AND query above is:

damage being done by tree branches, tangling of overhead
power lines, falling tree branches, obstruction and
damage to tracks

Note: You do not need to delimit concepts; commas are used here only for clarity.

This query contains all of the key concepts in the original query and expresses them using words in context. It is likely to return significantly better results.

Adding variations

Some relevant documents may still be missed because of differing vocabulary. Expanding the original concepts to include variations that you expect to occur in the documents may produce a query similar to:

damage being done by tree branches, tangling of overhead
power lines, falling tree branches, obstruction and
damage to tracks, forestry, wind damage, storm damage,
damage to rails, lines being pulled down by trees blown
over

At first, this may seem more confusing and less precise than the previous examples, but it contains additional ways of defining the original concepts. You may find that no documents achieve a 100% relevance score with this query because no document includes all of these combinations. However, the most relevant documents appear at the top of the results list.
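As a rough sketch of how such a query might be assembled programmatically, the following Python joins a list of concept phrases with a list of variations. The helper `build_query` is hypothetical and not part of Sybase Search; the commas it inserts are for readability only.

```python
def build_query(concepts: list[str], variations: list[str]) -> str:
    """Join concept phrases and their variations into one query string.

    Duplicates are dropped case-insensitively and the original
    order of the phrases is preserved.
    """
    seen = set()
    phrases = []
    for phrase in concepts + variations:
        normalized = " ".join(phrase.split())  # collapse stray whitespace
        key = normalized.lower()
        if key and key not in seen:
            seen.add(key)
            phrases.append(normalized)
    return ", ".join(phrases)

concepts = [
    "damage being done by tree branches",
    "tangling of overhead power lines",
    "falling tree branches",
    "obstruction and damage to tracks",
]
variations = [
    "forestry",
    "wind damage",
    "storm damage",
    "damage to rails",
    "lines being pulled down by trees blown over",
]

query = build_query(concepts, variations)
print(query)
```

Keeping concepts and variations in separate lists makes it easy to try the search with and without the additional vocabulary.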

Often, you can improve search results by feeding back information from documents discovered by the system. For example, if a search produces a document that is relevant but the terminology used in the extracted summary is different from the search text, try expanding the original query by appending words or phrases from the document search results. The search becomes more accurate as you provide additional information.
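This feedback step can be sketched as a small Python helper that appends phrases taken from a relevant result to the existing query. The function `expand_with_feedback` and the sample feedback phrases are assumptions made for illustration only.

```python
def expand_with_feedback(query: str, feedback_phrases: list[str]) -> str:
    """Append feedback phrases to the query, skipping any phrase
    that the query already contains (case-insensitive)."""
    expanded = query.strip()
    for phrase in feedback_phrases:
        phrase = phrase.strip()
        if phrase and phrase.lower() not in expanded.lower():
            expanded = f"{expanded}, {phrase}" if expanded else phrase
    return expanded

original_query = "damage being done by tree branches, tangling of overhead power lines"
# Hypothetical phrases a user might copy from a relevant document summary.
feedback = ["vegetation on the line", "overhead power lines", "fallen timber"]

refined_query = expand_with_feedback(original_query, feedback)
print(refined_query)
```

The phrase already present in the query is skipped, so repeated rounds of feedback extend the query without duplicating text.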

Improving relevance

Sybase Search automatically determines which documents are more relevant than others. This decision is based on the information extracted from all the documents that Sybase Search has indexed. Part of the relevance calculation assigns an internal weighting to each term in the search query. Depending on the search results, you may want to manually adjust the query term weighting to bias the search results in favor of a particular query term.

For example, suppose Sybase Search has indexed many documents about trains, railway accidents, and incidents. A typical query to find documents about tree branches causing damage to either trains or track is:

damage being done by tree branches

For this query, Sybase Search can return, as relevant, documents about damage to branches in the rail tracks that was not caused by trees. This can occur if Sybase Search has indexed documents that are exclusively about “damage to branches in railway tracks,” while documents about “tree branches causing damage” include sections about other topics. The second set of documents includes relevant matching sections; however, overall, these documents are not as relevant and are therefore assigned lower relevance scores.

Based on the search results, you can decide to place more emphasis on “tree” damage as opposed to other types of damage. You can use custom term weighting to bias your search results toward documents that refer to trees:

damage being done by ctw{tree,5} branches

Depending on the results of the term-weighted search, you can further adjust the custom term weighting to place the appropriate emphasis on the term “tree”, or on any other term to which you want to assign greater weight. See “Setting Text Manager parameters” for details on using custom term weighting in your queries.
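If queries are built by a script, the weighting can be applied with a small helper such as the Python sketch below. Only the `ctw{term,weight}` form shown in the example above is taken from this guide; the function `weight_term` is hypothetical, and the permitted weight values are not checked here.

```python
import re

def weight_term(query: str, term: str, weight: int) -> str:
    """Wrap the first unweighted whole-word occurrence of term
    in the ctw{term,weight} form."""
    # The lookbehind skips occurrences that are already inside ctw{...}.
    pattern = re.compile(r"(?<!ctw\{)\b" + re.escape(term) + r"\b")
    return pattern.sub(lambda m: f"ctw{{{m.group(0)},{weight}}}", query, count=1)

weighted = weight_term("damage being done by tree branches", "tree", 5)
print(weighted)
```

Keeping the weight as a parameter makes it straightforward to rerun the search with a different emphasis and compare the results.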

Alternative results for a query

Sybase Search includes query expansion functionality that provides alternative matches for your search; that is, relevant documents that your original query might not have returned. You can adjust the strength of the query expansion so that the search results either stay close to the original results or differ more from them.