It is common practice to provide search functionality on a web site or intranet. Content Studio supports this using AS-components and the underlying fulltext catalogue found in Microsoft SQL Server.

The standard search mechanism consists of a form for entering the search parameter and sending it to Microsoft SQL Server, and a page that handles the resulting data from SQL Server. The AS-component “Search form” creates a search form and specifies the page that should display the search result. The AS-component “Search result for categories” can be used to display the search result. The appearance of the search result table can be controlled in great detail. By default, the result is sorted by relevance.

All search components are also accessible via the API. This method allows even more detailed control over the implementation and presentation.

The standard search syntax means that all included words are used to find a match. There is an advanced search syntax available which allows wildcard characters (asterisk) and the logical operators OR (|), AND (&), NEAR, and NOT (-). If the standard search syntax is used, the search phrase uses the NEAR functionality in SQL to give priority to search patterns where the words are located near each other.

Full-text search indexing

Content Studio relies on the fulltext search engine of Microsoft SQL Server. This fulltext index is only used by fulltext search components and should not be confused with the XML-index technology of Content Studio. Microsoft SQL Server analyzes the data of the published pages, including components and metadata, and creates an internal fulltext catalogue within the database. The operating system service Microsoft Search filters the pages and creates the set of items that can be indexed. Binary objects cannot be indexed unless a suitable interpreter module is available.

All documents within Content Studio are stored in database fields of the type image and ntext. Automatic background updates are not supported for data stored in such fields, and it is therefore necessary to initiate updates manually on a regular basis. An alternative is to use Microsoft SQL Agent to schedule the same updates. When Content Studio is installed, the server is set up to update the fulltext search index every night. An administrator can at any time rebuild the fulltext index using the settings dialog of the web site (root level). The fulltext reindexing can be either a complete or an incremental update.
The Express editions of SQL Server lack the Microsoft SQL Agent, and their Management Studio cannot work with fulltext either, which makes fulltext indexing more complicated to manage. In the Content Studio administrative interface you can monitor and repopulate the fulltext index, but this can be a tedious task. For these editions a scheduled job that calls sqlcmd commands is a better alternative in the long run. For more information on this subject see the article How to start a fulltext population from the command prompt in the Content Studio knowledgebase.
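
As a rough illustration of such a scheduled job, the following Python sketch builds a sqlcmd command line that starts a fulltext population. The server, database and table names are hypothetical placeholders, not names taken from Content Studio, and the knowledgebase article may describe a different procedure. The sketch only assumes the standard sqlcmd switches and the T-SQL statement ALTER FULLTEXT INDEX.

```python
import subprocess


def build_population_command(server, database, table, full=False):
    """Return a sqlcmd argument list that starts a fulltext population."""
    kind = "FULL" if full else "INCREMENTAL"
    # Bracket-quote the (schema-less) table name; a literal ] is doubled.
    quoted = "[" + table.replace("]", "]]") + "]"
    query = f"ALTER FULLTEXT INDEX ON {quoted} START {kind} POPULATION;"
    # -S server, -d database, -E Windows authentication, -Q run query and exit
    return ["sqlcmd", "-S", server, "-d", database, "-E", "-Q", query]


def run_population(server, database, table, full=False):
    """Run the command; raises CalledProcessError if sqlcmd reports failure."""
    subprocess.run(build_population_command(server, database, table, full),
                   check=True)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_population_command(r".\SQLEXPRESS", "ContentStudio", "Documents"))
```

A script like this could be registered with the Windows Task Scheduler to run nightly. Calling it with full=True requests a complete rather than an incremental population.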

Important
Starting with version 5.2 Content Studio supports automatic background fulltext indexing. For this reason SQL Server fulltext indexing jobs are no longer needed!

Documents and categories are by default not searchable, but they are easily made searchable using their properties. Only documents that are included in searchable categories and for which the user has browse access can be included in the search results. Whenever a document is changed within Content Studio it becomes invalid for full-text searching. It will not be available until the full-text index is rebuilt.

By default all custom fields in all EPT documents are indexed. This behavior can be changed through the site setting named system.IgnoreFullTextFields. The setting can contain a comma separated list of EPT fields that will not be fulltext indexed. An asterisk can be used as a wildcard at the end or at the start of each field name. The default value for the setting is CS_*, WF_* which means that all fields except those that start with CS_ or WF_ are indexed. The setting is new in CS 5.1.
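
The matching rule described above can be sketched in Python as follows. This is only an illustration of the pattern semantics, not Content Studio's own code. The function names are hypothetical, and case-insensitive matching is an assumption that the text above does not confirm.

```python
def parse_ignore_setting(value):
    """Split the comma separated setting into trimmed, non-empty patterns."""
    return [part.strip() for part in value.split(",") if part.strip()]


def is_ignored(field, patterns):
    """Return True if the field name matches any of the ignore patterns.

    An asterisk is honoured only as the first or the last character.
    Matching is case-insensitive here, which is an assumption.
    """
    name = field.lower()
    for pattern in patterns:
        p = pattern.lower()
        if p.endswith("*") and name.startswith(p[:-1]):
            return True   # prefix pattern such as CS_*
        if p.startswith("*") and name.endswith(p[1:]):
            return True   # suffix pattern such as *_Internal
        if name == p:
            return True   # exact field name
    return False


if __name__ == "__main__":
    patterns = parse_ignore_setting("CS_*, WF_*")
    fields = ["CS_Created", "WF_State", "Title", "Body"]
    print([f for f in fields if not is_ignored(f, patterns)])  # ['Title', 'Body']
```

With the default value, only Title and Body remain in the list of fields to index.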

Implementation example

This is a short and rather generic example to show how to implement web site searching.

Set up search indexing

Assume there is a category named "News" that we want to include in our web site search. In the Settings section of the properties of the category, there is a checkbox named "Searchable". If the checkbox is checked, the category is searchable. The setting is inherited by all underlying levels.

The indexing is done automatically on a regular basis. This means that changes in the search indexing properties do not always take effect immediately. The indexing can be started manually via the properties of the folder root node. Normally, an incremental indexing is enough to bring the index up to date, but it may be wise to do a full update if many large changes have been made.

Add the search input form

The search input form can be placed within any document, but it is mostly used in Presentation templates. Insert the AS component "Search form" and adjust the component setting to your needs. The following parameters are the most important to get a search result.

Show the search result

In the document that was selected to display the search result, insert the AS component "Search result for categories". The search result is always presented as a table. The layout can be controlled via CSS style sheets. The following parameters should be reviewed.

Searching binary files

The search is performed only on text-based data (such as XML documents). To enable successful searching in binary files (e.g. PDF or Word documents), the SQL Server search engine must be equipped with additional filter modules. For instance, Adobe supplies a free plug-in named "IFilter" that handles PDF files.

Since all data in Content Studio is stored in XML format, it is always possible to search for uploaded binary files if they have attached meta-data. For this reason, too, it is good practice to supply a description and other textual facts about an image when uploading it.

Search modes

Starting with version 5.1 Content Studio supports two search modes: the CONTAINS or advanced mode, which was the only mode supported in earlier versions, and the new FREETEXT or standard mode. These modes can be used both in code and in the Search result for categories component. The following table compares the two modes.

Comparison between the two fulltext search modes

Mode: CONTAINS (advanced)
Pros: Fully supports boolean operations such as AND, NEAR, AND NOT and OR. Makes it possible to use wildcards and to do very precise searches.
Cons: Difficult to use for the normal web site visitor; in practice very few visitors use advanced searching. No support for forms of words or synonyms. If the search term contains any of the "noise" words outside of a specific expression, an error occurs.

Mode: FREETEXT (simple)
Pros: Simple and intuitive to use for the visitor. Supports searching for forms of words and expressions as well as synonyms and forms of single words. Ignored words (noise words) will not generate any errors.
Cons: No advanced support such as boolean operators or wildcards.

Simple search argument syntax (FREETEXT mode)

The simple search mode is very intuitive and suits the vast majority of users. You simply enter a list of words you are interested in and Content Studio returns the matching documents. You can also provide an expression such as "the brown fox", and Content Studio looks for documents where this phrase exists. In addition, the FREETEXT mode supports synonyms and forms of words, so that a search for the word Flute might also match articles where words such as Pipe (synonym) and Flutes (form of) occur. For the forms-of and synonym functionality to work, the language to use must also be specified. This is a new option available programmatically or via the Search result for categories component.
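
At the SQL Server level, both the CONTAINS and the FREETEXT predicates accept an optional LANGUAGE argument, which is presumably where the language option ends up. The Python sketch below only builds the text of such a query. The table name Documents and the column names DocumentID and Content are hypothetical, and the queries that Content Studio actually issues are not shown in this document.

```python
def build_search_query(mode, lcid=None):
    """Return parameterized T-SQL for one of the two fulltext search modes."""
    if mode not in ("CONTAINS", "FREETEXT"):
        raise ValueError(f"unknown search mode: {mode!r}")
    # LANGUAGE takes a locale identifier (LCID), for example 1033 for US English.
    language = f", LANGUAGE {int(lcid)}" if lcid is not None else ""
    # "?" is a placeholder for the search phrase, to be bound by the database driver.
    return f"SELECT DocumentID FROM Documents WHERE {mode}(Content, ?{language})"


if __name__ == "__main__":
    print(build_search_query("FREETEXT", lcid=1033))
```

Passing the search phrase as a bound parameter rather than concatenating it into the SQL text avoids injection problems with visitor input.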

Advanced search argument syntax (CONTAINS mode)

The advanced search syntax is straightforward for the normal user and offers advanced possibilities for the advanced user. Content Studio has a built-in parser that translates the user's input to the awkward syntax used by SQL Server's full-text search commands. If only simple words are entered, consider using the new FREETEXT option instead since it also supports forms of words and terms.

In its simplest form the user enters only a single word, such as Data, or a number of words. Content Studio then searches for documents where these words exist (using the NEAR operator). The result is normally returned by relevance, so that documents where the words occur more often and near the beginning of the document come first.

    data database SQL Server

It is also possible to search for expressions, and for combinations of expressions and specific words. An expression must be enclosed in double quotes. In the following example, Content Studio searches for documents where a combination of the words SQL and Server and the expression large databases exists, preferably near each other.

    SQL Server "large databases"

A number of boolean operators, to be used together with expressions, are available. An expression is a sequence of words surrounded by quotes. Below the table are a number of examples.

Content Studio search operators

Operator   Alternative syntax   Meaning
NEAR       ~ or (blank)         Searches for documents where both of the surrounding expressions exist, as near each other as possible. This is the default operator and is used between each word or expression if no other operator is specified.
OR         |                    Searches for documents where either (or both) of the surrounding expressions exist.
AND        & or +               Searches for documents where both of the surrounding expressions exist.
AND NOT    - (hyphen)           Searches for documents where the first expression exists but not the second expression.

Search for documents that contain either of the words Data or Database.

    Data OR Database

Search for documents where the word Database exists near the expression SQL Server 2005 but the expression My SQL does not occur.

    Database "SQL Server 2005" - "My SQL"

Wildcard matching

An unspecified number (possibly none) of unspecified characters can be matched with the wildcard character, denoted by an asterisk (*). The wildcard can only occur at the end of a word.

The following example searches for any occurrence of words that begin with Data, such as Database and Datafilter, in combination with the expression SQL Server 2005.

    Data* "SQL Server 2005"
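
To give an idea of what the built-in parser has to do, the following Python sketch translates the operators and wildcards described above into a search condition for SQL Server's CONTAINS predicate. It is a simplified stand-in, not Content Studio's parser: the function name translate is hypothetical, a hyphen is only recognized when it stands alone between blanks, and the generated condition has not been verified against a SQL Server instance.

```python
import re

# A quoted expression, or any run of non-blank characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

# Operator spellings from the operator table, mapped to SQL Server keywords.
OPERATORS = {
    "near": "NEAR", "~": "NEAR",
    "or": "OR", "|": "OR",
    "and": "AND", "&": "AND", "+": "AND",
    "-": "AND NOT",
}
SQL_OPERATORS = set(OPERATORS.values())


def translate(user_input):
    """Translate a search phrase into a CONTAINS search condition."""
    out = []  # alternating sequence: term, operator, term, ...
    for phrase, bare in TOKEN.findall(user_input):
        word = bare.lower()
        if word == "not" and out and out[-1] == "AND":
            out[-1] = "AND NOT"  # AND NOT typed as two words
            continue
        if word in OPERATORS:
            # An operator is only meaningful directly after a term.
            if out and out[-1] not in SQL_OPERATORS:
                out.append(OPERATORS[word])
            continue
        text = (phrase or bare).replace('"', "").strip()
        if not text:
            continue
        if out and out[-1] not in SQL_OPERATORS:
            out.append("NEAR")  # default operator between two terms
        # Every term is quoted; a trailing * inside the quotes is a prefix term.
        out.append(f'"{text}"')
    while out and out[-1] in SQL_OPERATORS:
        out.pop()  # drop a dangling operator at the end
    return " ".join(out)


if __name__ == "__main__":
    print(translate('Database "SQL Server 2005" - "My SQL"'))
    # "Database" NEAR "SQL Server 2005" AND NOT "My SQL"
    print(translate('Data* "SQL Server 2005"'))
    # "Data*" NEAR "SQL Server 2005"
```

Quoting every term keeps multi-word expressions together and lets a trailing asterisk be interpreted as a prefix match. A production parser would additionally need to handle noise words and possibly group sub-conditions with parentheses.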