Welcome to the next episode of Whiteboard Wednesdays, our learning series where Coveo experts teach you how to build great search experiences. Check out all of our videos here.
Understanding and knowing how to inspect how a query is executed is fundamental when you want to examine your search configuration. This episode will explain how to do that.
In our previous episode, we discussed how and when you need to map metadata into fields as the basis of building a proper index. This episode will help you troubleshoot if your mappings aren’t applied correctly.
Follow along with Wim Nijmeijer, technical evangelist in the Coveo R&D Department.
For the full transcript, read on below.
We’ll start with how to troubleshoot queries. If you can’t find your query results, you can’t check your mappings.
Getting Content into the Index
There are two methods for getting content into your index:
- A connector that pulls your content. Examples of pull connectors: Salesforce, SharePoint, Web, Generic REST.
- A Push or Stream API that pushes your content. A typical example of pushed content: ecommerce catalogs.
You first must configure your connector to get your content into your index.
Configure the (Pull) Connectors’ Metadata
Find your connector and configure its metadata. We delved deeper into this topic in our previous post, To Map or Not To Map.
Configure the (Push) Connectors’ Metadata
Configuring metadata for Push/Stream API connectors is much easier than configuring a pull connector. The fields are already contained in your JSON file; you only need to map them properly.
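As a rough illustration of that mapping check, the sketch below lists which keys of a pushed JSON item have no mapping yet. The item, its keys, the field name `myprice`, and the helper `unmapped_keys` are all hypothetical; nothing here calls a Coveo API, it only models the idea.

```python
import json

# Hypothetical pushed item: the keys are illustrative, not a required schema.
ITEM_JSON = """
{
  "documentId": "https://example.com/products/42",
  "title": "Blue running shoe",
  "price": 79.95,
  "brand": "Acme"
}
"""

# Hypothetical mappings: JSON key -> field name in the index.
MAPPINGS = {"title": "title", "price": "myprice"}


def unmapped_keys(item, mappings, ignore=("documentId",)):
    """Return the JSON keys that have no mapping, sorted for stable output."""
    return sorted(key for key in item if key not in mappings and key not in ignore)


item = json.loads(ITEM_JSON)
print(unmapped_keys(item, MAPPINGS))  # prints ['brand']
```

Any key the helper reports is a candidate for a new field and mapping before you start troubleshooting anything else.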
Protip: Always start “large”: apply no inclusion/exclusion filters at first. Once your source is indexing properly, you can add more connector options.
In order to properly troubleshoot indexing problems, you first need to understand what the Indexing Process looks like.
The process starts with (1) using a token or username/password combination to authenticate the connection between the Coveo Platform and the targeted repository.
Common Mistake: You are using the wrong token or username/password combination. Check the connector documentation for the recommended way to set up authentication.
When the authentication succeeds, (2) a list of documents is retrieved. Then (3) each document is downloaded.
Common Mistake: You are using a username or token that lacks enough privileges. To index content, we sometimes need special permissions. Refer to the connector documentation regarding which permissions are needed.
Each of these steps can fail; especially when setting up a brand new repository, you need to troubleshoot this process carefully.
Once a document has been downloaded, it goes into the Document Processing Manager process.
That will first (1) execute the defined Pre-Extension scripts, (2) followed by the actual conversion of the native format (for example, Word or Excel) to text. Now it’s time to (3) map your metadata fields to Coveo Fields. The last step (4) is executing the defined Post-Extension scripts.
Finally, the content is (5) sent to the index.
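To make the order of these stages concrete, here is a toy model of the five steps. It is only a sketch of the sequence described above, not how the Document Processing Manager is implemented; the function names, the document shape, and the metadata keys (`dc_title`, `lang`) are hypothetical.

```python
def process_document(doc, pre_extensions, convert, mappings, post_extensions):
    """Toy model of the stage order; returns the processed doc and a trace."""
    trace = []

    for extension in pre_extensions:      # (1) pre-extension scripts
        doc = extension(doc)
    trace.append("pre-extensions")

    doc = convert(doc)                    # (2) native format -> text
    trace.append("conversion")

    metadata = doc.get("metadata", {})    # (3) metadata -> fields
    doc["fields"] = {
        field: metadata[name] for name, field in mappings.items() if name in metadata
    }
    trace.append("mapping")

    for extension in post_extensions:     # (4) post-extension scripts
        doc = extension(doc)
    trace.append("post-extensions")

    trace.append("index")                 # (5) send to the index
    return doc, trace


def add_language(doc):
    """Hypothetical pre-extension: adds a metadata value before mapping."""
    doc["metadata"]["lang"] = "en"
    return doc


def to_text(doc):
    """Hypothetical converter: decodes the raw bytes to text."""
    doc["text"] = doc["raw"].decode("utf-8")
    return doc


def uppercase_title(doc):
    """Hypothetical post-extension: works on the already mapped fields."""
    doc["fields"]["title"] = doc["fields"]["title"].upper()
    return doc


doc = {"raw": b"Hello", "metadata": {"dc_title": "Quarterly report"}}
result, trace = process_document(
    doc, [add_language], to_text, {"dc_title": "title", "lang": "language"}, [uppercase_title]
)
print(trace)
print(result["fields"])
```

In this model, metadata added by a pre-extension is picked up by the mapping step, while a post-extension only sees the fields that were actually mapped. That is why the stage in which a value appears matters when a field stays empty.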
Now that we know exactly how the process works, let’s start troubleshooting some common issues that can interrupt indexing.
1. Start the indexing process.
Start the indexing of your source.
|A) Do you see proper activity?||Select your source. Click on ‘Activity’.|
|B) Do you see the status ‘Failed’ appear?||Then expand the row, so that you can see the details.|
If your source contains configuration issues, it normally will only index partially or not at all. Before you can continue, you must first fix the issues described by the error.
After that proceed to the next step:
|C) Do you see progress?||Log Browser, filter on your source name.|
|D) Do you see status ‘Indexing’ appear?|
When content does not reach the ‘Indexing’ stage, it will not be available in the index at all.
Common mistake: People assume that their content is part of the content being indexed. Make sure you look for the right content. For example, the URL https://www.coveo.com/help does not exist. Start by looking at all the content of your source; when that works, move on to finding the exact document you are troubleshooting.
|E) Check for the exact URL in the Log Browser. Is it there?||Log Browser, filter on your URL.|
Be careful: sometimes the connector rewrites the URL, so it could be a bit different. The best approach is to hover over the URL and click it (that copies it to the clipboard).
Paste the copied URL into the ‘Search an exact item URL’ box.
|F) Do you see status ‘Indexing’ appear?|
|You have found your document in the Log Browser.||Move to step 2. Inspect your content without any filters.|
|You have NOT found your document in the Log Browser||Your content is not indexed. Check for errors in the Activity of your source.|
Fix the errors, Rebuild and try again.
2. Inspect your content without any filters.
To validate if the content was properly indexed, we’ll inspect it in the Content Browser. We first start examining the content without any filters applied.
|A) Open the Content Browser and filter on your source.||Open the Content Browser. Filter on your source.|
|B) Make sure no filters are set by the pipeline||Select pipeline: ‘Empty (for internal testing)’|
|C) Content is still not displayed||Enable ‘View all content’|
|D) Content is still not displayed||Wait 5 minutes. It might be that the content is not yet submitted to the index.|
Selecting a pipeline applies all the settings of that pipeline to your Content Browser. When troubleshooting, you don’t want this. Therefore, set your Content Browser to ‘Empty (for internal testing)’ so that you are sure no filter is applied.
Indexed content inherits security permissions from its source. As an administrator, you can enable the ‘View all content’ flag to remove that restriction, which is useful when, for example, security is not properly applied.
During indexing, Coveo commits documents to the index in batches. You therefore sometimes need to be patient (about 5 minutes) before new content is available in the index.
|You have found your document in the Content Browser.||Move to step 3. Inspect your content with the normal filters.|
|You have NOT found your document in the Content Browser||Your content is not indexed. Check for errors in the Activity of your source.|
Fix the errors, Rebuild and try again.
3. Inspect your content with the normal filters.
Now that we are certain that your content is available in the Content Browser, it is time to apply the filters that your search interface normally applies when searching.
|A) Apply the pipeline you are targeting||In the dropdown, select your pipeline.|
|B) Is your content still displayed?||If yes, your filters are properly set up. If no, review your ‘Filter’ or ‘Query Parameter Overrides’|
|C) Apply permissions back to the content||Disable ‘View all content’|
|D) Content is not displayed||You do not have access to the documents. If you should have access, proceed to the next step.|
Otherwise, you are all good.
|E) Examine the security permissions on the documents||Enable ‘View all content’|
Select one of the documents.
Double click on it, to open the Properties window.
Then select the ‘Permissions’ tab:
|F) Is your name not listed under the permissions, but it should be?||Update your security cache|
Permissions can be quite complicated. Find more assistance in our Online Help.
|You have found your document in the Content Browser.||Move to step 4. Debug your query.|
|You have NOT found your document in the Content Browser.||Your filters are not properly set up. Review your ‘Filter’ or ‘Query Parameter Overrides’.|
4. Debug your query.
Your content is available in the index, works in the Content Browser, but still isn’t visible in your normal search interface?
|A) Enable the debug=1 flag||Add &debug=1 to the URL|
In the ‘Query Parameter Overrides’ of your targeted pipeline: add debug (boolean) and set it to true.
Using the debug flag will enable the debug output on your search API request.
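If you prefer to build the URL in a script rather than by hand, the following minimal sketch appends the flag to an existing URL with Python's standard library. The helper name `with_debug` and the example.com URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_debug(url):
    """Return the URL with debug=1 in its query string, replacing any existing debug value."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "debug"]
    query.append(("debug", "1"))
    return urlunparse(parts._replace(query=urlencode(query)))


print(with_debug("https://example.com/search?q=shoes"))
# prints https://example.com/search?q=shoes&debug=1
```

The helper drops any existing `debug` value first, so the flag is never duplicated in the query string.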
Open Chrome Developer Tools, and select the Network Tab.
You should see the v2 network requests: https://…/rest/search/v2
The request will provide you with a response:
The response reveals the following important information (if the fields listed below are not present in your response, the debug=1 flag is not properly set):
|advancedExpression||Contains the filters (normally the facet selections)||1|
|basicExpression||The expression the end user entered||2|
|constantExpression||The standard filter expression which normally never changes||3|
|disjunctionExpression||The OR expression||4|
|pipeline||The pipeline which executed your Search Request||5|
|userIdentities||The list of users which are stored inside the token. Security trimming is based on these contents.||6|
|warnings||If query parameters were overridden, you will see a warning of that in this list|
The pipeline ( 5 ) can be used to quickly dig into your query pipeline settings. If you test the results in the Administration Console, use the Content Browser and select this specific ( 5 ) pipeline.
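If you copy the response body out of the Network tab, a few lines of Python can pull out just these fields. This is a sketch that assumes the fields appear as top-level keys of the response JSON; the sample response below is invented for illustration, and the shape of `userIdentities` in particular is a guess.

```python
EXPRESSION_KEYS = (
    "advancedExpression",
    "basicExpression",
    "constantExpression",
    "disjunctionExpression",
)
OTHER_KEYS = ("pipeline", "userIdentities", "warnings")


def debug_summary(response):
    """Keep only the debug-related keys that are present in a parsed response."""
    return {key: response[key] for key in EXPRESSION_KEYS + OTHER_KEYS if key in response}


def debug_looks_enabled(response):
    """Heuristic from the text: no expression fields means debug=1 is probably not set."""
    return any(key in response for key in EXPRESSION_KEYS)


# Hypothetical response body, for illustration only.
sample = {
    "totalCount": 0,
    "pipeline": "default",
    "basicExpression": "invoice",
    "advancedExpression": '@category=="Sales Orders"',
    "constantExpression": '@source=="My Source"',
    "userIdentities": [{"name": "jane@example.com"}],
    "warnings": [],
}

print(debug_looks_enabled(sample))
for key, value in debug_summary(sample).items():
    print(f"{key}: {value}")
```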
|A) Open the Content Browser.||Open the Content Browser.|
|B) Make sure no filters are set by the pipeline||Select pipeline: ‘Empty (for internal testing)’|
|C) Remove any security restrictions||Enable ‘View All Content’|
|D) Check if the advancedExpression returns results||Enter 1 in the search box.|
|E) Do you see content?||Your 1 is probably valid. Proceed to G.|
|F) Do you NOT see content?||Your 1 expression is not valid.|
Fix and Back to E.
Common mistake: Your query is wrongly formulated.
For example, people want to filter on the Sales Orders category and enter the query @category==Sales Orders, which isn’t correct. It essentially translates to @category==Sales AND Orders. It should have been @category=="Sales Orders".
|G) Apply the constantExpression||Clear the search box.|
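When filter expressions are generated in code, one way to avoid this mistake is to always wrap the value in double quotes. The helper below is a hypothetical sketch, not part of any Coveo library, and it deliberately refuses values that themselves contain double quotes rather than guessing at an escaping rule.

```python
def field_equals(field, value):
    """Build an exact-match field expression with the value wrapped in double quotes."""
    if '"' in value:
        raise ValueError("this sketch does not handle values containing double quotes")
    return f'@{field}=="{value}"'


print(field_equals("category", "Sales Orders"))  # prints @category=="Sales Orders"
```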
Enter 3 in the search box.
|H) Do you see content?||Your 3 is probably valid. Proceed to J.|
|I) Do you NOT see content?||Your 3 expression is not valid.|
Fix and Back to H.
|J) Apply the basic expression||Clear the search box.|
Enter all three expressions ( 1 2 3 ), separated by spaces.
|K) Do you see content?||Your query is valid|
|L) Do you NOT see content?||Your 2 expression together with 1 3 does not give back any results.|
Check your content; does it really contain what you are looking for?
Although it is not used often, the disjunctionExpression ( 4 ) is the OR part of the full query.
If ( 4 ) is present, you can test your query by entering this expression in the Content Browser:
( 1 2 3 ) OR ( 4 )
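The combination above can be assembled from the debug output with a small helper. This is a sketch that follows the form shown in the text; the function name and the sample expressions are hypothetical.

```python
def combine_expressions(advanced="", basic="", constant="", disjunction=""):
    """Join expressions 1, 2 and 3 with spaces, then OR the result with expression 4 if present."""
    main = " ".join(part for part in (advanced, basic, constant) if part)
    if not disjunction:
        return main
    if not main:
        return disjunction
    return f"( {main} ) OR ( {disjunction} )"


print(combine_expressions('@category=="Sales Orders"', "invoice", '@source=="My Source"'))
print(combine_expressions('@category=="Sales Orders"', "invoice", '@source=="My Source"', "@pinned==true"))
```

Empty expressions are skipped, so the same helper also works for the intermediate checks where only one or two expressions are entered in the search box.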
The userIdentities from ( 6 ) can be used to check whether the right permissions are set, both on the search interface and when examining the content in the Content Browser.
The first step in troubleshooting mapping is examining your index. Does it contain your indexed content? Our next section walks through how to verify that.
|A) List the name of your original metadata of your repository (as in example table below)||This could be: urlToImage.|
Special case: SharePoint. You need to know the exact name of your field. Follow this guideline.
No clue what your field is called? Use the script: Download. It reports all the metadata values available in the repository in a field called ‘allmetadatavalues’. You must first create this field in the index, and you should use it only for debugging purposes.
|B) Check if the field is available in Coveo||Is each field created in the Fields of your organization?|
If not, add them.
|C) Check if the field is configured in your Connector Configuration of your Source||For most pull connectors like SharePoint or Salesforce, there is not a lot you can do. You must rely on the metadata supplied by the API.|
For Sitemap and Web connectors, you can configure metadata with a Web Scraper configuration file.
The Generic REST API uses a two-step approach to map your metadata:
Your original JSON metadata (which is supplied by your REST API). Map that field to a ‘temporary field’.
You can see that in the above example: the urlToImage field is available in the JSON of the REST API. This is mapped to mymediaurl.
The mymediaurl metadata is mapped in the ‘mappings’ of the source to a Coveo Field.
|D) Check if the field is mapped in your Source||Check if each field in each source is mapped.|
|E) Make sure the field is visible||Is the field ‘visible’? If the field is set to ‘Not displayable in search results’, it will never show up in the Content Browser or be reported in the Search API response.|
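The two-step approach of the Generic REST API in step C can be modeled in a few lines. This is a toy simulation, not the connector's configuration format: `urlToImage` and `mymediaurl` come from the example in the text, while the Coveo field name `myimage` and the function name are hypothetical.

```python
def apply_two_step_mapping(raw_item, rest_metadata, source_mappings):
    """Step 1: JSON key -> temporary metadata. Step 2: temporary metadata -> Coveo field."""
    temporary = {
        temp_name: raw_item[json_key]
        for temp_name, json_key in rest_metadata.items()
        if json_key in raw_item
    }
    return {
        field: temporary[temp_name]
        for field, temp_name in source_mappings.items()
        if temp_name in temporary
    }


raw_item = {"urlToImage": "https://example.com/a.png"}
rest_metadata = {"mymediaurl": "urlToImage"}  # step 1, in the connector configuration

# Both steps configured: the field receives a value.
print(apply_two_step_mapping(raw_item, rest_metadata, {"myimage": "mymediaurl"}))

# Step 2 missing: the temporary metadata exists, but the field stays empty.
print(apply_two_step_mapping(raw_item, rest_metadata, {}))
```

The second call shows the situation where only the temporary field was configured, which leaves the Coveo field without a value.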
Example table for troubleshooting your mapping.
|Coveo Source||Original Metadata Name||Coveo Field|
→ maps to id (in REST)
|Mistake||How to identify||How to solve|
|No field defined||Field is not visible in ‘Fields’ section||Create a new Field. Map the Field in your source.|
|No mapping created in source||Field is visible in Fields. Field is not visible in Content Browser content.||Map the Field in your source|
|Field remains empty because mapping was done to a temporary field in Generic REST||Field is visible in Fields. Field is not visible in Content Browser content.||Map the temporary Field to your Field in your source|
|No content in Field||Field is not visible in Content Browser content||Add content to your Field. Empty Fields will not be indexed!|
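The table above can also be read as a decision sequence. The sketch below encodes it as a function, assuming you have already answered each question by hand in the Administration Console; the function and parameter names are hypothetical.

```python
def diagnose_mapping(field_exists, mapped_in_source, source_has_value, temporary_field_mapped=True):
    """Return the fix from the table for the first check that fails."""
    if not field_exists:
        return "Create a new Field, then map the Field in your source."
    if not mapped_in_source:
        return "Map the Field in your source."
    if not temporary_field_mapped:
        return "Map the temporary Field to your Field in your source."
    if not source_has_value:
        return "Add content to your Field. Empty Fields will not be indexed."
    return "No mapping mistake found."


print(diagnose_mapping(field_exists=True, mapped_in_source=False, source_has_value=True))
# prints Map the Field in your source.
```

The checks run in the same order as the rows of the table, so the first failing check decides which fix is returned.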
And that’s it! Hope to see you back in the next episode.