Coveo Knowledge Base – How to Article – CES4-060401-4
CES4-060401-4: Indexing Metadata with User Custom Fields
The information in this technical note applies to:
Coveo Enterprise Search 4
This document explains how to index metadata from various types of documents using Custom Fields.
Metadata is data that describes other data and makes that data easier to find and use. For example, the title, subject, author and size of a file are metadata. In addition to this built-in metadata, custom or user-defined metadata can be added to several types of documents. Coveo Enterprise Search can extract metadata from the following types of documents:
· Microsoft Office documents
· Adobe PDF files
· SharePoint list items
· Web pages
· Exchange items
· XML documents
However, Coveo Enterprise Search does not automatically extract user-defined metadata from your documents, so this metadata is not available for queries by default. You must first create a custom field for each piece of metadata that you want to index.
The following steps describe how to index metadata from your documents:
1. Make sure that Coveo Enterprise Search does not already extract the data through system or built-in custom fields. See Understanding System Fields and Built-in Custom Fields.
2. Identify the user-defined metadata in your documents. Where to find it depends on the document type: Web documents, Microsoft Office documents, SharePoint items and Adobe Acrobat documents are each described later in this document.
3. Go to the Index > Sources and Collections > Custom Fields page to create a custom field with this information.
4. Rebuild the source.
Here is an overview of the information required to create a custom field:
· Name: specify a name for the custom field. It will then be used in field queries like "@fieldname=fieldvalue"
· Type: metadata is always extracted from documents as text. The text is then converted to the type specified by this parameter and inserted into the index.
· Metadata Name: name of the metafield in the document itself.
Once the source is rebuilt, you can perform advanced queries (e.g., @docid=7) with the new custom field.
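The relationship between the extracted text, the field type and a field query can be sketched in Python. This is only an illustration and not how Coveo Enterprise Search is implemented: the type names "string", "integer" and "float" and the functions convert_value and field_query are hypothetical. Only the @fieldname=fieldvalue query form comes from this document.

```python
# Hypothetical type names; the actual types offered on the Custom Fields page may differ.
CONVERTERS = {
    "string": str,
    "integer": int,
    "float": float,
}

def convert_value(text, field_type):
    """Convert metadata extracted as text to the custom field's type."""
    try:
        converter = CONVERTERS[field_type]
    except KeyError:
        raise ValueError("unknown field type: " + field_type)
    return converter(text.strip())

def field_query(field_name, field_value):
    """Build a field query of the form @fieldname=fieldvalue."""
    return "@{}={}".format(field_name, field_value)

# The DocId metadata arrives as the text "7" and is stored as an integer.
doc_id = convert_value("7", "integer")
print(field_query("docid", doc_id))
```

The sketch shows why the type matters: the same text "7" can be stored as a string or as a number, and the conversion happens after extraction.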
Metadata in a Web document is found in the HTML META tags. For example, company XYZ adds two specific pieces of information to its documents: the department name and the internal document identifier.
<html>
<head>
<meta name="Department" content="Sales">
<meta name="DocId" content="7">
</head>
<body>
</body>
</html>
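To check which metadata names a Web page exposes before creating custom fields, the META tags can be listed with a short script. The following is a minimal sketch using Python's standard html.parser module. The names MetaTagCollector, extract_meta and SAMPLE_PAGE are illustrative and not part of Coveo Enterprise Search.

```python
from html.parser import HTMLParser

class MetaTagCollector(HTMLParser):
    """Collect the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.metadata[name] = content

def extract_meta(html_text):
    """Return a dict mapping each META name to its content."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata

SAMPLE_PAGE = """<html>
<head>
<meta name="Department" content="Sales">
<meta name="DocId" content="7">
</head>
<body>
</body>
</html>"""

print(extract_meta(SAMPLE_PAGE))  # {'Department': 'Sales', 'DocId': '7'}
```

The keys of the resulting dictionary (Department and DocId here) are the values to enter as Metadata Name when creating the custom fields.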
You can view, add or modify metadata in Microsoft Office documents. Click Properties on the shortcut menu and select the Custom tab.

Adobe Acrobat Reader cannot display metadata, but many other PDF editors can. For example, with Adobe Acrobat:

In SharePoint lists, columns contain metadata.

In SharePoint Document Libraries, metadata about a document is stored both in its SharePoint list item and in its internal structure (Word custom properties, PDF properties, etc.). Both are indexed, but if a metadata name collision occurs, priority is given to the SharePoint item metadata.
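The priority rule can be pictured as a dictionary merge in which the list item's values overwrite the document's own values. This is a conceptual sketch only. The function merge_metadata and the sample values are hypothetical, and it assumes a collision means an exact match on the metadata name.

```python
def merge_metadata(document_metadata, list_item_metadata):
    """Merge both metadata sets; list item values win on a name collision."""
    merged = dict(document_metadata)
    merged.update(list_item_metadata)
    return merged

# Hypothetical values: both sets define Department, only the document defines DocId.
word_properties = {"Department": "Sales", "DocId": "7"}
list_item_columns = {"Department": "Marketing"}

print(merge_metadata(word_properties, list_item_columns))
# {'Department': 'Marketing', 'DocId': '7'}
```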
The display name of a column may differ from its internal name, which is the name a custom field requires. To find the internal name, sort the list by the column (click the column's display name) and locate SortField=Name in the browser's address bar. The name may be percent-encoded, in which case you must decode it. For example, a column whose display name is My Column has the internal name My_x0020_Column. In the address bar, it appears as My%5fx0020%5fColumn. The value to use as the custom field's metadata name is My_x0020_Column.
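The percent-decoding step can be done with Python's standard urllib.parse module rather than by hand. The sketch below assumes a made-up list URL. Only the SortField parameter and the My%5fx0020%5fColumn value come from the example above.

```python
from urllib.parse import parse_qs, unquote, urlsplit

def internal_name_from_url(url):
    """Return the decoded SortField value from a sorted list view URL."""
    query = parse_qs(urlsplit(url).query)  # parse_qs percent-decodes the values
    return query["SortField"][0]

# Decoding the encoded name on its own:
print(unquote("My%5fx0020%5fColumn"))  # My_x0020_Column

# Decoding it from a full address (the server and list names are hypothetical):
EXAMPLE_URL = "http://server/Lists/MyList/AllItems.aspx?SortField=My%5fx0020%5fColumn&SortDir=Asc"
print(internal_name_from_url(EXAMPLE_URL))  # My_x0020_Column
```

Note that only the percent signs are decoded. The _x0020_ sequence is part of the internal name and must be kept as is.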
Here is the list of rules that must be followed when creating custom fields:
1. All custom fields that have the same name, even if they are in different sources, must be of the same type.
2. If you want to change a custom field type, you must:
a. Stop indexing all sources with the custom field.
b. Delete the custom field from all these sources.
c. Rebuild all these sources.
d. Compact the index.
e. Recreate the custom field for all these sources with the new data type.
f. Rebuild all these sources.
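Before creating or changing custom fields, rule 1 can be checked against a list of the fields defined in each source. The following is a minimal sketch that assumes you have written down that list yourself. The function find_type_conflicts, the source names and the type names are hypothetical, and field names are compared exactly as written.

```python
from collections import defaultdict

def find_type_conflicts(fields_by_source):
    """Return the field names that are declared with more than one type.

    fields_by_source maps a source name to a dict of {field name: type}.
    """
    types_by_field = defaultdict(set)
    for fields in fields_by_source.values():
        for field_name, field_type in fields.items():
            types_by_field[field_name].add(field_type)
    return {
        name: sorted(types)
        for name, types in types_by_field.items()
        if len(types) > 1
    }

# Hypothetical configuration: docid is an integer in one source, a string in another.
CONFIGURATION = {
    "Intranet": {"docid": "integer", "department": "string"},
    "File Share": {"docid": "string"},
}

print(find_type_conflicts(CONFIGURATION))  # {'docid': ['integer', 'string']}
```

An empty result means every custom field name is used with a single type across all the listed sources.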
Last Reviewed: 2006/04/01
Keywords: Metadata, Custom Fields