Coveo Knowledge Base – Best Practices – 040330-2

CES040330-2: Ranking Optimization

The information in this technical note applies to:

Coveo Enterprise Search 3.5+
Coveo Enterprise Search 4

Best Practices for Ranking Optimization

Ranking is the art of sorting results by their relevance to the submitted query. Although Coveo Enterprise Search is optimized for high ranking accuracy, the organization's content manager can still improve results, because each organization has its own rules and practices for producing documents and sharing knowledge on the corporate network. To better fit these particularities, ranking optimization techniques can be initiated and then maintained over time.

Optimization approaches

There are two approaches to ranking optimization with Coveo Enterprise Search:

·          Refining the ranking parameters used by the Coveo Enterprise Search ranking module according to the company’s specific needs or requirements.

·          Optimizing the document production process and the way documents are organized and stored on the network.

 

Refining Coveo Enterprise Search’s ranking parameters

Coveo Enterprise Search is installed with pre-tuned ranking weights that give satisfying results in most cases. Nevertheless, fine-tuning these factors can increase accuracy, especially in specific situations. Suppose, for instance, that Coveo Enterprise Search indexes all press releases and news about the competitors of Super Car Audio, a fictitious company. For such an index, the document creation and modification dates are likely to be decisive ranking criteria. Another organization might have requirements that make freshness a less important factor.

Each ranking factor's multiplier only matters relative to the multipliers of the other factors. The resulting ranking is the same whether all multipliers are set to 4 or all are set to 2. However, a factor can be given even more weight by setting its multiplier to 7 and all others to, for instance, 2: a ratio of 7 to 2 (3.5) favors that factor much more than a ratio of 7 to 4 (1.75).

A multiplier of 0 completely dismisses a factor from the ranking process.
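The multiplier arithmetic above can be illustrated with a small sketch. It assumes a simple linear model in which a document's score is the sum of its raw factor scores, each multiplied by that factor's multiplier. The factor names (`date`, `proximity`), the document names and the raw scores are all hypothetical; the actual Coveo Enterprise Search formula is not described in this note.

```python
"""Sketch of relative ranking multipliers under an ASSUMED linear model:
score = sum(multiplier * raw_factor_score). Not Coveo's actual formula."""


def score(factor_scores: dict[str, float], multipliers: dict[str, int]) -> float:
    # Each raw factor score is scaled by its multiplier; a multiplier of 0
    # removes that factor from the sum entirely.
    return sum(multipliers[name] * value for name, value in factor_scores.items())


def rank(docs: dict[str, dict[str, float]], multipliers: dict[str, int]) -> list[str]:
    # Sort document names by descending score.
    return sorted(docs, key=lambda name: score(docs[name], multipliers), reverse=True)


# Hypothetical raw factor scores for two indexed documents.
docs = {
    "competitor_news.html": {"date": 0.5, "proximity": 0.2},
    "old_report.doc": {"date": 0.2, "proximity": 0.8},
}

settings = {
    "all set to 4": {"date": 4, "proximity": 4},
    "all set to 2": {"date": 2, "proximity": 2},
    "date 7, proximity 4": {"date": 7, "proximity": 4},
    "date 7, proximity 2": {"date": 7, "proximity": 2},
    "proximity 0": {"date": 7, "proximity": 0},
}

for label, multipliers in settings.items():
    print(f"{label:20} -> {rank(docs, multipliers)}")
```

With these numbers, setting everything to 4 or everything to 2 yields the same order (`old_report.doc` first). A 7-to-4 ratio in favor of `date` is not enough to change that order, a 7-to-2 ratio puts `competitor_news.html` first, and a `proximity` multiplier of 0 leaves `date` as the only contributing factor.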

Note: Setting all factors to 4 does not mean that they all contribute equally to the ranking. For instance, the “Term is bolded” factor is far less significant than the “Term proximity” factor. All factors were balanced using hidden weights, so that every factor shares the same multiplier scale, from 0 to 7, and starts with the same default multiplier value. As a consequence, even if “Term is bolded” is set to a higher value than “Term proximity”, the latter is likely to still weigh more in the resulting ranking, since its hidden weight is much higher.
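The interaction between hidden weights and multipliers described in the note can be sketched as follows, assuming that the weight actually applied to a factor is its hidden weight times its multiplier. The hidden weight values below are invented for illustration; the real values used by Coveo Enterprise Search are not published in this note.

```python
"""Sketch of how HIDDEN weights make equal multipliers contribute unequally.
The hidden weight values are hypothetical, not Coveo's real ones."""

HIDDEN_WEIGHTS = {"Term is bolded": 1.0, "Term proximity": 10.0}  # hypothetical


def effective_weights(multipliers: dict[str, int]) -> dict[str, float]:
    # Every factor shares the same 0-7 multiplier scale.
    for name, m in multipliers.items():
        if not 0 <= m <= 7:
            raise ValueError(f"multiplier for {name!r} must be between 0 and 7")
    # ASSUMED rule: applied weight = hidden weight * multiplier.
    return {name: HIDDEN_WEIGHTS[name] * m for name, m in multipliers.items()}


# Equal multipliers: "Term proximity" still gets ten times the weight here.
print(effective_weights({"Term is bolded": 4, "Term proximity": 4}))
# Favoring bolding (7) over proximity (2) still leaves proximity ahead.
print(effective_weights({"Term is bolded": 7, "Term proximity": 2}))
```

Under these hypothetical values, "Term is bolded" at 7 yields an effective weight of 7.0, while "Term proximity" at only 2 yields 20.0, which is the situation the note describes.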

Guidelines for refining each ranking parameter are available in the Coveo Enterprise Search online help.

Document Production Best Practices to Improve Ranking

An organization can adopt several good practices to improve ranking. This section provides tips for document authors and for those responsible for archiving documents on the corporate network.

·          File and folder naming conventions. Some of the ranking parameters are related to the document path, which is the concatenation of several folder names and one file name. To ensure ranking accuracy, therefore, choose suitable and meaningful names for folders and files. A simple way to clarify names is to separate the words within file and folder names, for example with spaces. For instance, the query “Super” would not match “c:\Superaudio\docs”, but it would match “c:\Super_audio\docs” or “c:\Super audio\docs”. In addition, whenever possible, include both an acronym and its expansion, as in “c:\companies\Super car audio – SCA\docs”. As a result, both the query “Super car audio” and the query “SCA” match this path.

·          Create a small organizational vocabulary and use metadata production tools. People within an organization often use equivalent terms. For instance, the acronym “SCA” might be used instead of “Super car audio”. Nonetheless, a query for one form does not match documents that only contain the other: the query “SCA” does not match a document that only mentions “Super car audio”, and vice versa. To remedy this problem, add the equivalent terms as metadata, which is indexed along with the document. For instance, an HTML file that contains “SCA” in its body text can have its META KEYWORDS set to “Super car audio” (Microsoft Office™ documents also have a “Keywords” field that can be set in the document properties). In both cases, the field contents are indexed and can be searched with Coveo Enterprise Search. See the figure below for an example of well-defined Microsoft Word™ document properties.

·          Ensure that document titles (.doc, .pdf) are properly set. Users naturally search for documents by their expected title, so the ranking module handles the title separately from the rest of the text. However, document authors must set this title properly: Microsoft Word™ titles, for instance, are set through the document properties, and when no title is set, the software tries to guess it from the first sentence of the document. When titles are set correctly, a query containing a document's title is almost guaranteed to return that document in the first position. See the figure below.

 

Figure 2. Document properties

·          Define “Top Results”. Some queries are so ambiguous that they are bound to fail whatever ranking tuning is applied. The Coveo Enterprise Search “Top Results” mechanism overcomes this problem: in the Top Results section of the Administration Tool, enter a query (or several distinct queries) along with the designated result(s), and the corresponding URI(s) are returned at the top of the result list for those queries. The rest of the list is ranked normally.
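The file and folder naming advice above depends on how a path is broken into searchable terms. The sketch below assumes a simple tokenizer that lowercases the text and splits it on any non-alphanumeric character, which reproduces the examples given in that bullet. The function names are hypothetical, and the actual Coveo Enterprise Search tokenizer may behave differently.

```python
import re


def tokenize(text: str) -> set[str]:
    # ASSUMED tokenizer: lowercase the text and split it on any run of
    # non-alphanumeric characters (backslashes, spaces, underscores, dashes).
    return {token for token in re.split(r"[\W_]+", text.lower()) if token}


def path_matches(query: str, path: str) -> bool:
    # The path matches when every query term is one of its whole tokens.
    return tokenize(query) <= tokenize(path)


print(path_matches("Super", r"c:\Superaudio\docs"))   # "superaudio" is one token
print(path_matches("Super", r"c:\Super_audio\docs"))
print(path_matches("Super", r"c:\Super audio\docs"))

acronym_path = r"c:\companies\Super car audio – SCA\docs"
print(path_matches("Super car audio", acronym_path), path_matches("SCA", acronym_path))
```

Because “Superaudio” is a single token, the query “Super” cannot match it, whereas the underscore and space versions expose “super” as a separate token.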
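The organizational vocabulary bullet above can be sketched as follows: when the META KEYWORDS content is indexed together with the visible text, a page that only says “SCA” becomes matchable by the query “Super car audio”. The sketch uses Python's standard `html.parser` module; the parser class, helper functions and sample page are hypothetical, and real indexing in Coveo Enterprise Search is more involved.

```python
import re
from html.parser import HTMLParser


class KeywordsAndTextParser(HTMLParser):
    """Collects the page text and any <meta name="keywords"> values."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.keywords: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "keywords":
                self.keywords.append(attributes.get("content") or "")

    def handle_data(self, data):
        self.text.append(data)


def terms(text: str) -> set[str]:
    # ASSUMED tokenizer: lowercase words split on non-alphanumeric characters.
    return {t for t in re.split(r"[\W_]+", text.lower()) if t}


def indexed_terms(html: str, include_keywords: bool) -> set[str]:
    parser = KeywordsAndTextParser()
    parser.feed(html)
    parser.close()
    fields = parser.text + (parser.keywords if include_keywords else [])
    return terms(" ".join(fields))


def matches(query: str, index: set[str]) -> bool:
    # Every query term must be among the indexed terms.
    return terms(query) <= index


page = """<html><head><meta name="KEYWORDS" content="Super car audio"></head>
<body><p>SCA price list</p></body></html>"""

print(matches("Super car audio", indexed_terms(page, include_keywords=False)))
print(matches("Super car audio", indexed_terms(page, include_keywords=True)))
```

Without the keywords field, the page only yields the terms “sca”, “price” and “list”; indexing the field adds “super”, “car” and “audio”.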
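The Top Results behavior described above can be approximated with a short sketch: the designated URIs for a query are placed first, and the rest of the normally ranked list follows without duplicates. The configuration format, the case- and whitespace-insensitive query matching, and the choice to return designated URIs even when the normal ranking did not retrieve them are all assumptions made for illustration; the URIs are made up.

```python
def normalize(query: str) -> str:
    # ASSUMED normalization: case-insensitive, extra whitespace ignored.
    return " ".join(query.lower().split())


def apply_top_results(query: str, ranked_uris: list[str],
                      top_results: dict[str, list[str]]) -> list[str]:
    # Designated URIs for this query come first, in their configured order.
    pinned = top_results.get(normalize(query), [])
    # The remaining results keep their normal ranking order, without duplicates.
    rest = [uri for uri in ranked_uris if uri not in pinned]
    return pinned + rest


# Hypothetical Top Results configuration: normalized query -> designated URIs.
top_results = {"order form": ["http://intranet/forms/international-order-form.html"]}

ranked = [
    "file://share/archive/order-history.xls",
    "http://intranet/forms/international-order-form.html",
    "file://share/sales/notes.doc",
]
print(apply_top_results("Order  Form", ranked, top_results))
```

Queries without a Top Results entry fall through unchanged, which matches the rule that the rest of the list is ranked normally.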

Other hints

·          Use multiple sources instead of a single large one. Each source can be rated (for instance, by the organization’s knowledge manager) according to the estimated overall relevance of the documents it contains. Rating a source above others favors the documents found in that source. See the source rating guidelines in the Coveo Enterprise Search online help.

·          Keep titles short and accurate. Title/query matching is an important part of the ranking process, and the title score calculation takes into account the proportion of the title that matches the query. For example, for the query “Order Form”, a document titled “Super Car Audio – SCA Car Audio – International Order Form Access Page” gets fewer title points than a document titled “International Order Form”, since a larger part of the latter title matches the query.
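The source rating hint above can be illustrated with a sketch in which a document's score is multiplied by a boost that depends on the rating of its source. The rating levels, the boost factors and the multiplicative rule are hypothetical; the actual rating scale and its effect on scores are not described in this note.

```python
# Hypothetical rating levels and boost factors, for illustration only.
RATING_BOOST = {"low": 0.5, "normal": 1.0, "high": 2.0}


def rated_score(base_score: float, source_rating: str) -> float:
    # ASSUMED rule: documents from a higher-rated source get a scaled-up score.
    return base_score * RATING_BOOST[source_rating]


# (document, base relevance score, rating of the source it comes from)
results = [
    ("archive/old_pricing.doc", 0.70, "low"),
    ("intranet/pricing/current.html", 0.50, "high"),
    ("share/misc/notes.txt", 0.60, "normal"),
]

ranked = sorted(results, key=lambda r: rated_score(r[1], r[2]), reverse=True)
print([name for name, _, _ in ranked])
```

Without ratings, `archive/old_pricing.doc` would come first; with its source rated low and the pricing source rated high, `intranet/pricing/current.html` moves to the top.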
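The title proportion described in the last hint can be worked through with a sketch. It assumes that the relevant quantity is the fraction of title words that are query terms, with punctuation such as dashes ignored; the actual Coveo Enterprise Search title score formula is not given in this note, and the function names are hypothetical.

```python
import re


def tokens(text: str) -> list[str]:
    # ASSUMED tokenizer: lowercase words; dashes and other punctuation are dropped.
    return [t for t in re.split(r"[\W_]+", text.lower()) if t]


def title_match_ratio(query: str, title: str) -> float:
    # Fraction of the title's words that are query terms.
    query_terms = set(tokens(query))
    title_words = tokens(title)
    if not title_words:
        return 0.0
    return sum(word in query_terms for word in title_words) / len(title_words)


long_title = "Super Car Audio – SCA Car Audio – International Order Form Access Page"
short_title = "International Order Form"

print(round(title_match_ratio("Order Form", long_title), 2))   # 2 of 11 words
print(round(title_match_ratio("Order Form", short_title), 2))  # 2 of 3 words
```

Under this assumption, the long title scores about 0.18 and the short one about 0.67, so the shorter, more accurate title earns more title points for the same query.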

 

 

Last Reviewed

2006/03/30

Keywords

Ranking, Optimization, Hints