VistaDB 5
What is a full text search index?

Understanding the intent and purpose of full text search indexes will improve how you use and build them into your applications. Full text search is a very simple concept, but can have a very large impact on database performance.

There are a number of techniques used in Computer Science for building and maintaining indexes on full text search terms. The following is not intended to show how our FTS is implemented, but rather to give an example of the concepts.

The following is a simple example of how a full text index might be created for the following database rows.

DocumentID Text
1 Pease porridge hot, pease porridge cold,
2 Pease porridge in the pot
3 nine days old
4 some like it hot, some like it cold,
5 some like it in the pot
6 nine days old

A full text search would first split each word and build an index that might look like this:

IndexID Word DocumentIDs
1 cold 1,4
2 days 3,6
3 hot 1,4
4 in 2,5
5 it 4,5
6 like 4,5
7 nine 3,6
8 old 3,6
9 pease 1,2
10 porridge 1,2
11 pot 2,5
12 some 4,5
13 the 2,5

Performing a lookup for CONTAINS(word, 'the AND some') would return documentID 5 since it is the only documentid that contains both words.
'the OR some' would return 2,4,5 since they are contained in all three documentids.

As you can see building an index over large quantities of text can generate very large indexes.

Compare LIKE to full text search
But to compare the lookup paths between a full text search and a LIKE predicate in the above scenario consider the following:

CONTAINS(word, 'the AND some') - generates 2 index scans in the FTS index
LIKE IN ('the', 'some') - generates 9 row loads, then performs 18 string index operations.


For an excellant book on the topic of Indexes and Inverted File Indexing I highly recommend the book "Managing Gigabytes" - Second Edition by Witten, Moffat, Bell.
See Also