VistaDB 5
FTS Concepts and Terminology

File Filters

Some database engines can store and parse extended file formats (Microsoft Word files, for example). VistaDB currently supports parsing and indexing of plain text only.

Full Text Index

This is the index that stores each individual word and its location within a given column. You can create only one full text index per table, but it may contain more than one column. VistaDB does not require a non-null unique column to create the index. We use our internal RowID value to determine where in the database the word is located.
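
As an illustration of the idea only, the following Python sketch builds a small in-memory word index over several columns. It is not VistaDB's implementation: the `build_index` function, the sample rows, and the use of dictionary keys as stand-ins for an internal row identifier are all assumptions made for this example.

```python
from collections import defaultdict

def build_index(rows, columns):
    """Map each lowercased word to a list of (row_id, column, position) postings."""
    index = defaultdict(list)
    for row_id, row in rows.items():
        for column in columns:
            text = row.get(column) or ""   # treat NULL (None) as empty text
            for position, word in enumerate(text.lower().split()):
                index[word].append((row_id, column, position))
    return index

# Hypothetical table: the keys stand in for an internal row identifier.
rows = {
    1: {"title": "Full text search", "body": "Search plain text quickly"},
    2: {"title": "Index basics", "body": None},
}
index = build_index(rows, ["title", "body"])
print(index["search"])  # [(1, 'title', 2), (1, 'body', 0)]
```

One index covers both columns here, and a row with a null column is simply skipped for that column, which mirrors the point that no non-null unique column is needed.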

Noise or Stop Words

These are frequently occurring words that generally do not help a search. Most search engines today ignore words such as a, an, the, and it because they occur so often. See the list of VistaDB Stopwords.
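
A minimal Python sketch of stop word filtering follows. The `STOP_WORDS` set contains only the four words mentioned above and is not the actual VistaDB stop word list; the function name is likewise hypothetical.

```python
# Illustrative stop word set only; this is NOT the VistaDB list.
STOP_WORDS = {"a", "an", "the", "it"}

def remove_stop_words(words):
    """Return the words that are not stop words, compared case-insensitively."""
    return [w for w in words if w.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "index", "stores", "a", "word"]))
# ['index', 'stores', 'word']
```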

Population or Crawl

Building a full text index is sometimes called populating the index, or crawling the table. Until a few years ago this activity was generally handled only by large enterprise-class systems; it is now becoming much more common in databases. SQL Server and many others require a manual rebuild of the index, or a service running on the machine to rebuild the index periodically. VistaDB full text indexes are built dynamically during insert.
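
To show what building the index during insert means in practice, here is a toy Python table that updates its word index inside `insert`, so a search sees new rows without a separate crawl. The `IndexedTable` class and its methods are hypothetical and say nothing about VistaDB internals.

```python
from collections import defaultdict

class IndexedTable:
    """Toy table that updates its word index on every insert (no separate crawl)."""

    def __init__(self):
        self.rows = {}
        self.index = defaultdict(set)   # word -> set of row ids
        self._next_row_id = 1

    def insert(self, text):
        row_id = self._next_row_id
        self._next_row_id += 1
        self.rows[row_id] = text
        # Index the new row immediately, as part of the insert itself.
        for word in text.lower().split():
            self.index[word].add(row_id)
        return row_id

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))

table = IndexedTable()
table.insert("Plain text search")
table.insert("Search without a rebuild")
print(table.search("search"))  # [1, 2]
```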

Stemmer

Stemming is the act of reducing a word to its root or singular form for a given language. For example, speaking might be stemmed to speak. This generally improves the accuracy of free text search queries, but it can cause problems if you want only exact matches.
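
The Python sketch below uses naive suffix stripping to make the idea concrete. It is a deliberately crude stand-in, not a real stemming algorithm and not the stemmer VistaDB uses; the suffix list and minimum stem length are assumptions chosen for the example.

```python
def naive_stem(word):
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(naive_stem("speaking"))  # speak
print(naive_stem("words"))     # word
print(naive_stem("news"))      # new
```

The last call shows the exact match problem: a search for news would also match rows containing new once both are reduced to the same stem.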

Token

A token is a single word that the word breaker identifies in a string. The act of splitting a sentence into individual words is called tokenizing. Reducing a single word from plural to singular, or to its root word, is called stemming.

Word Breaker

A word breaker tokenizes text into parts based upon rules for a given language. VistaDB is capable of supporting other languages, but is currently focused on English.
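
A simple English-oriented word breaker can be sketched in Python with a regular expression. The pattern and the `break_words` name are assumptions for illustration; the rules VistaDB actually applies may differ.

```python
import re

# Runs of letters or digits, optionally followed by an apostrophe part (e.g. "it's").
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")

def break_words(text):
    """Split text into lowercased tokens, dropping punctuation and whitespace."""
    return [m.group(0).lower() for m in WORD_PATTERN.finditer(text)]

print(break_words("Hello, world! It's plain-text."))
# ['hello', 'world', "it's", 'plain', 'text']
```

A breaker for another language would need different rules, for example for scripts that do not separate words with spaces.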

Full Text Catalog

This is a Microsoft-specific term. We do not use a catalog. The full text index is kept within the VistaDB database file for ease of deployment.
