Each database engine has its own support for full-text searches. See the relevant section below for each database engine.
Advantage Database Server
The search conditions used for full text searches are character strings composed of words, phrases, logical operators, and parentheses for grouping precedence. Aside from white space characters, which are considered to be delimiters in search strings, special characters include the double quote (0x22) used for delimiting phrases, the asterisk (0x2A) used for prefix, postfix, and substring matching, and parentheses (0x28 and 0x29). The reserved words include the logical operators OR, AND, NOT, NEAR.
The precedence of the logical operators from lowest to highest is OR, AND, NOT, NEAR. You can use parentheses to change the precedence grouping. For example, the search condition ‘meeting and request or changelog’ finds records that have both the words "meeting" and "request" or the word "changelog". Changing the precedence with parentheses to ‘meeting and (request or changelog)’ causes the search condition to find records that have the word "meeting" and either of the words "request" or "changelong".
The logical OR operator produces a "true" result when either operand evaluates to true. The logical AND operator produces a true result only when both of its operands evaluate to true. The logical NOT operator produces a true result when its single operand has a false result. The NEAR operator (proximity operator) is similar to the AND operator in that both of its operands must be found. In addition, it requires that its operands be within a certain physical distance of each other for it to return a true result. The default distance for the NEAR operator is 8 words. To use a proximity value other than the default, specify the distance as a parameter to the NEAR operator. For example ‘medical near(15) doctor’ will evaluate to true for records where the word "medical" is within 15 words of "doctor".
Search words that do not have logical operators specified are assumed to have an implied AND operator between them. For example, the following two search conditions are equivalent:
‘mechanical engineer’
‘mechanical AND engineer’
And the following two are equivalent:
‘computer programmer or software developer’
‘(computer AND programmer) OR (software AND developer)’
Individual search words that are not enclosed in double quotes can be searched for as exact matches, prefix matches, postfix matches, or substring matches. This behavior is controlled through the use of the asterisk (*) character.
- Exact match - Simply specify the word in the search condition. The word will be matched if it is found exactly as given in the text. For example, the search word "special" will match only "special" in the text (or upper case versions if it is not a case sensitive index).
- Prefix match - Place an asterisk at the end of the word to match all words that begin with the given characters. For example, the search word "special*" will match "special","specialty", "specialization", etc.
- Postfix match - Place an asterisk at the beginning of the word to match all words that end with the given characters. For example, the search word "*ation" will match words"station", "specialization", "citation", etc.
- Substring match - Place an asterisk at both the beginning and end of the word to match all words that contain the given characters. For example, the search word "*lock*" will match the words "locker", "antilock", "blocking", etc.
Important Note: Exact matches and prefix matches provide the best performance. Both postfix and substring matches require that the entire FTS index be scanned in order to satisfy the search. This is still much more efficient than searching the actual data, but it is less efficient than prefix and exact matches, which can be resolved with O(Log N) searches.
Double quotes are used as phrase delimiters. In addition, special characters can be enclosed inside double quotes to ensure that they are unchanged by the search condition parser. For example, to search for a parenthesis, it is necessary to enclose it in double quotes in order to keep it from being treated as a precedence operator in the search condition itself. To search for a physical double quote, use two of them in a row. Single quotes are not treated as a special case by the FTS parser, but they are the text delimiter in SQL statements. This means that if you use single quotes inside FTS search conditions in an SQL statement, you must include two of them in a row.
Spaces are considered to be delimiters in search conditions. If you have a special situation in which spaces are not delimiters in the text and are part of search words, then it may be necessary to enclose the search words in double quotes in order to preserve the spaces. Multi-word phrases in search conditions match identical phrases in the text being searched. For example, the search condition ‘ "alpine skiing" ’ will match only records that have the exact words "alpine" and "skiing" in that order in the text with no other non-noise words between them.
If you use the NEAR operator with sub-expressions, you may need to use the optional form of the operator with the parentheses after it to avoid ambiguity in the expression. For example, the condition "a near (b and c)" is not valid because the left parenthesis is interpreted as the opening of the NEAR proximity parameter. The condition would need to be written as "a near() (b and c)". The empty parentheses after the NEAR operator remove the ambiguity; because they are empty, the default proximity value is used. You can also specify a proximity value: "a near(25) (b and c)".
The following is a more formal grammar definition for the search conditions. The square brackets indicate optional items.
search-condition ::= <and-term> | <and-term> OR <search-condition>
and-term ::= <not-term> | <not-term> [AND] <and-term>
not-term ::= <near-term> | NOT <near-term>
near-term ::= <value-term> | <value-term> NEAR [<near-distance>] <near-term>
near-distance ::= ( [integer] )
value-term ::= ( <condition> ) | <simple-term>
simple-term ::= text | text* | *text | *text* | "phrase"
MySQL
By default, article text is stored as BLOB data when using MySQL with Newsman Pro. In order to enable full-text seaches, we must first convert the BLOB data field to TEXT. This is done automatically when you enable full-text search indexes from within the Global Body Search screen. We don't expect that this change will cause problems, but it could if the collation you use is incompatible with certain characters that can be found in the body of a message. Therefore, if you find that you have problems while downloading text articles after enabling full-text search indexes, it is recommended that you turn them back and revert the data field to a BLOB data type.
The boolean full-text search capability supports the following operators:
- + A leading plus sign indicates that this word must be present in each row that is returned.
- - A leading minus sign indicates that this word must not be present in any of the rows that are returned. Note: The - operator acts only to exclude rows that are otherwise matched by other search terms. Thus, a boolean-mode search that contains only terms preceded by - returns an empty result. It does not return “ all rows except those containing any of the excluded terms. ”
- (no operator) By default (when neither + nor - is specified) the word is optional, but the rows that contain it are rated higher.
- > < These two operators are used to change a word's contribution to the relevance value that is assigned to a row. The > operator increases the contribution and the < operator decreases it. See the example following this list.
- ( ) Parentheses group words into subexpressions. Parenthesized groups can be nested.
- ~ A leading tilde acts as a negation operator, causing the word's contribution to the row's relevance to be negative. This is useful for marking “ noise ” words. A row containing such a word is rated lower than others, but is not excluded altogether, as it would be with the - operator.
- * The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.
- " A phrase that is enclosed within double quote (‘ " ') characters matches only rows that contain the phrase literally, as it was typed . The full-text engine splits the phrase into words, performs a search in the FULLTEXT index for the words. Prior to MySQL 5.0.3, the engine then performed a substring search for the phrase in the records that were found, so the match must include non-word characters in the phrase. As of MySQL 5.0.3, non-word characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase" in MySQL 5.0.3, but not before.
If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty.
The following examples demonstrate some search strings that use boolean full-text operators:
- 'apple banana' - Find rows that contain at least one of the two words.
- '+apple +juice' - Find rows that contain both words.
- '+apple macintosh' - Find rows that contain the word “ apple ”, but rank rows higher if they also contain “ macintosh ”.
- '+apple -macintosh' - Find rows that contain the word “ apple ” but not “ macintosh ”.
- '+apple ~macintosh' - Find rows that contain the word “ apple ”, but if the row also contains the word “ macintosh ”, rate it lower than if row does not. This is “ softer ” than a search for '+apple -macintosh' , for which the presence of “ macintosh ” causes the row not to be returned at all.
- '+apple +(>turnover <strudel)' - Find rows that contain the words “ apple ” and “ turnover ”, or “ apple ” and “ strudel ” (in any order), but rank “ apple turnover ” higher than “ apple strudel ”.
- 'apple*' - Find rows that contain words such as “ apple ”, “ apples ”, “ applesauce ”, or “ applet ”.
- '"some words"' - Find rows that contain the exact phrase “ some words ” (for example, rows that contain “ some words of wisdom ” but not “ some noise words ”). Note that the ‘ " ' characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotes that enclose the search string itself.
Microsoft SQL Server
Newsman Pro supports creating and rebuilding full-text indexes for "SQL Server 2005" and "SQL Express 2005 with Advanced Services". Creation and maintenance of full-text indexes for all previous versions of SQL Server must be performed using Enterprise Manager. Please note that "SQL Express 2005" (without Advanced Services) does not support full-text index searches at all.
Newsman Pro can perform the following types of full-text searches:
- A word or phrase.
- The prefix of a word or phrase.
- A word near another word (proximity search).
The following operators can be used in your searches:
AND | &
Indicates that the two search words or phrases must be met for a match. The ampersand symbol (&) may be used instead of the AND keyword to represent the AND operator.
AND NOT | &!
Indicates that the second search word or phrase must not be present for a match. The ampersand followed by the exclamation mark symbol (&!) may be used instead of the AND NOT keyword to represent the AND NOT operator.
OR | |
Indicates that either of the two search words or phrases must be met for a match. The bar symbol (|) may be used instead of the OR keyword to represent the OR operator.
When <search_condition> contains parenthesized groups, these parenthesized groups are evaluated first. After evaluating parenthesized groups, these rules apply when using these logical operators with contains search conditions:
- NOT is applied before AND.
- NOT can only occur after AND, as in AND NOT. The OR NOT operator is not allowed. NOT cannot be specified before the first term. For example, CONTAINS (mycolumn, 'NOT "phrase_to_search_for" ' ) is not valid.
- AND is applied before OR.
- Boolean operators of the same type (AND, OR) are associative and can therefore be applied in any order.
A word or phrase search typically searches for one or more words, separated by a space. Matches are found if the text contains the search word. If more than one word is used, then matches are found if the text contains the search phrase. Punctuation is ignored, so searching for "computer failure" will match on "Where is my computer? Failure to find it would be expensive."
A prefix search allows you to search for one or more partial words using "*" as a wild-card. Therefore, a query specifying a prefix term of "local wine*" matches any messages with the text of "local winery", "locally wined and dined", and so on.
A NEAR (or, proximity search) allows you to search for a word or phrase that is near another word or phrase. NEAR operates similarly to the AND operator: both require that more than one word or phrase exist in the text being searched. As the words in <proximity_term> appear closer together, the better the match.
NEAR | ~
Indicates that the word or phrase on the left side of the NEAR or ~ operator should be approximately close to the word or phrase on the right side of the NEAR or ~ operator. Multiple proximity terms can be chained, for example:
a NEAR b NEAR c
This means that word or phrase a should be near word or phrase b , which should be near word or phrase c . When several proximity terms are chained, all proximity terms must be near each other. So, in the example a ~ b ~ c , word or phrase a should also be near word or phrase c .