What is the text search estimate? Why does it eliminate results?
RecFind's metadata full-text search is based on a
complex signature file technique: we set bits in an array to
identify words, numbers, dates, etc., and then 'point' to the location
of each record that contains the data.
The basic principle of this methodology is that you ALWAYS get
everything you ask for BUT you may also get more than you ask for
(because the result is based on a complex mathematical algorithm and in
some instances, two or more words may result in the same bit pattern).
When the search module returns more than we asked for, the extra
results are called 'false drops', and we eliminate these from the
results with 'post processing'.
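
The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not RecFind's actual algorithm: the 64-bit signature width, the toy hash (sum of byte values, chosen so that anagrams such as "listen" and "silent" share a bit) and the names word_bit, signature, build_index and candidates are all hypothetical.

```python
SIGNATURE_BITS = 64  # assumed width; a real signature file would be far larger


def word_bit(word):
    # Toy hash (an assumption): sum of byte values, so anagrams collide.
    return 1 << (sum(word.lower().encode("utf-8")) % SIGNATURE_BITS)


def signature(text):
    # OR together one bit per word to form the signature of a text.
    sig = 0
    for word in text.split():
        sig |= word_bit(word)
    return sig


def build_index(records):
    # Map each record id to the signature of its text.
    return {rec_id: signature(text) for rec_id, text in records.items()}


def candidates(index, query):
    # A record is a candidate when every bit of the query is set in it.
    wanted = signature(query)
    return sorted(rec_id for rec_id, sig in index.items()
                  if sig & wanted == wanted)


records = {
    1: "listen to the tape",
    2: "silent film archive",
    3: "budget report",
}
index = build_index(records)
print(candidates(index, "silent"))  # [1, 2]
```

Record 2 is the genuine match. Record 1 is a false drop: "listen" sets the same bit as "silent" in this toy hash, so the signature alone cannot tell the two apart. Record 3 is correctly excluded without its text being read.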
For example, when performing a text search, the first result will be
"An estimate of 1,344 records match the search criteria". This is an
estimate that may contain false drops. At this point, the user
decides whether "This is too many; I need to click Cancel and refine my
search criteria to reduce the number of hits" or "This is too few; I
need to click Cancel and refine my search criteria to produce more
hits".
The major advantage of using this technique is that it is a flat-time
search. The search time is based on the number of 'words' in the search
criteria, not the size of your database. We do not need to read the
database; all results are provided by our signature file.
To provide a true count of matching records, we would need to read
each 'hit' and see if it is in fact a 'false drop'. Reading the database
takes time, so we defer it. It is only after the user says, "show me
the hits" that we actually read the database (the time-consuming bit)
and 'double check' each record to see if it does in fact meet the search
criteria. This is when we eliminate the false drops.
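
A minimal sketch of this deferred 'double check' step follows. It assumes the candidate record ids have already come back from the signature file; the function name verify_hits, the read_record callback and the sample data are hypothetical and stand in for whatever RecFind does internally.

```python
def verify_hits(candidate_ids, read_record, query):
    # Read each candidate (the slow part) and keep only real matches.
    wanted = {word.lower() for word in query.split()}
    confirmed, false_drops = [], []
    for rec_id in candidate_ids:
        words = {word.lower() for word in read_record(rec_id).split()}
        if wanted <= words:
            confirmed.append(rec_id)
        else:
            false_drops.append(rec_id)
    return confirmed, false_drops


# Stand-in for the database; in practice each read is a disk access.
database = {1: "listen to the tape", 2: "silent film archive"}
candidate_ids = [1, 2]  # assumed output of the signature file lookup

# Phase 1: the estimate needs no database reads at all.
print(f"An estimate of {len(candidate_ids)} records match the search criteria")

# Phase 2: only when the user asks to see the hits do we read the records.
confirmed, false_drops = verify_hits(candidate_ids, database.__getitem__, "silent")
print(confirmed, false_drops)  # [2] [1]
```

The estimate costs nothing beyond the signature lookup, while the verified count costs one record read per candidate, which is why the true count is not shown up front.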
There is another reason for a false drop. Although the indexing
algorithm writes the bit patterns for new words, dates, phrases, etc., it
does not remove the bit patterns for deleted words, dates, phrases, etc.
Only the "clear index" process, where we recreate the entire signature
file, will remove the stale bit patterns left by deleted data and
deleted records.
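
The effect of write-only indexing can be illustrated with a small sketch. Everything here is an assumption made for illustration: the SignatureIndex class, its method names, the 64-bit width and the toy hash are hypothetical and are not taken from RecFind.

```python
SIGNATURE_BITS = 64  # assumed width, for illustration only


def word_bit(word):
    # Toy hash (an assumption): sum of byte values modulo the width.
    return 1 << (sum(word.lower().encode("utf-8")) % SIGNATURE_BITS)


def signature(text):
    sig = 0
    for word in text.split():
        sig |= word_bit(word)
    return sig


class SignatureIndex:
    def __init__(self):
        self.sigs = {}

    def index_record(self, rec_id, text):
        # Incremental indexing only ORs new bits in; it never clears any.
        self.sigs[rec_id] = self.sigs.get(rec_id, 0) | signature(text)

    def rebuild(self, records):
        # Analogue of "clear index": recreate every signature from scratch.
        self.sigs = {rec_id: signature(text) for rec_id, text in records.items()}

    def matches(self, rec_id, query):
        wanted = signature(query)
        return self.sigs.get(rec_id, 0) & wanted == wanted


idx = SignatureIndex()
idx.index_record(1, "contract invoice")
idx.index_record(1, "contract renewal")  # record edited; "invoice" removed

stale = idx.matches(1, "invoice")   # True: the old bit is still set
idx.rebuild({1: "contract renewal"})
fresh = idx.matches(1, "invoice")   # False: the rebuild dropped the stale bit
print(stale, fresh)  # True False
```

Until the rebuild runs, a search for "invoice" still reports record 1 as a candidate, and only the post-processing read of the record reveals that the word is gone.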
The final reason for eliminating records is that the user doesn't
have the appropriate security to view the record, or the additional
criteria (e.g. date ranges, department codes, etc.) remove the record
from the search result.
In summary, the reasons for eliminating a result are:
- the complex mathematical algorithm returned additional results,
- the signature file is out of date and the record no longer exists in the database, or
- the record doesn't meet the action officer security or the selection criteria.