Red Hat Bugzilla – Bug 109200
Need better handling of stopwords and wildcard expansion for Intermedia search
Last modified: 2007-04-18 12:59:10 EDT
Description of problem:
Need better handling of stopwords and wildcard expansion for Intermedia search.
This is only an issue if wildcard expansion of query words is used.
For the London 5.2 branch, we added % to the end of query words. Looks
like the current implementation doesn't do the wildcard part, but
since the Rickshaw implementation will eventually do this, you'll
probably need to do something similar.
From the London 5.2 checkin comment:
Change 37698 by scott@sseago-london-camden on 2003/11/05 11:03:27
Strange things happen when we try to do wildcard searches on
stopwords. A search including the term "at" will simply ignore the
term. But with "at%", the expansion will occur, and "at"
will be removed. So if the user searches for "foo at", if we use
wildcards (and thus "foo% AND at%"), then "foo attention" will match,
but "foo at" will fail. We need to add the full default stoplist to
the list of words to escape (and not apply wildcards). Strictly
speaking, we don't need to escape stopwords which aren't keywords,
but there's no harm in doing so, and we don't need to keep two
different lists. I've included the full default English stoplist in
In reality, this is not an ideal approach. It only works for an
English database, as the default stoplist is different depending on
the language settings. In addition, stopwords can be added or removed
from the stoplist. Ideally we'd be able to query Oracle for the
currently active stoplist, although I don't know if this is possible.
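The escaping approach described in the checkin comment could look roughly like the sketch below. This is not the London 5.2 code: the function name expand_query, the abbreviated STOPWORDS set, and the use of Oracle Text's {} escape syntax for the escaped words are all assumptions made for illustration. Words in the stoplist are escaped and get no wildcard; every other word gets a trailing %.

```python
# Hypothetical, abbreviated stand-in for the default English stoplist;
# the real default stoplist is longer and depends on the language settings.
STOPWORDS = {"a", "an", "and", "at", "in", "it", "of", "the", "to"}


def expand_query(user_query, stopwords=STOPWORDS):
    """Build an AND query with wildcards on every word except stopwords.

    Stopwords are wrapped in {} (assumed escape syntax) and get no trailing %,
    so they are never wildcard-expanded.
    """
    terms = []
    for word in user_query.split():
        if word.lower() in stopwords:
            # Escape the stopword and do not apply a wildcard.
            terms.append("{" + word + "}")
        else:
            # Ordinary word: apply the trailing wildcard.
            terms.append(word + "%")
    return " AND ".join(terms)
```

With this sketch, expand_query("foo at") yields "foo% AND {at}" rather than "foo% AND at%". A hard-coded set like this has the same limitation the comment points out: it is only right for an unmodified English stoplist.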
Version-Release number of selected component (if applicable):
5.2, will be applicable to Rickshaw (and possibly Troika) if wildcards
How reproducible:
Always, if wildcards are active
Steps to Reproduce:
1. Implement wildcard query expansion (if not on London 5.2)
2. Create an item with a title of "IT Policies". Make sure there are
no other words in this item beginning with "it..."
3. Create another item "Fear itself"
4. After index update, a search for "IT Policies" will not find
anything. A search for "Policies" will find the "IT Policies" document.
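The failure in these steps can be illustrated with a toy in-memory model. It is not Oracle and makes several assumptions: stopwords are never indexed, a bare stopword in a query is ignored, and a wildcard term only matches tokens that were actually indexed. The names index_tokens and search and the two-word STOPWORDS set are hypothetical.

```python
# Hypothetical two-word stoplist, enough for this example.
STOPWORDS = {"at", "it"}


def index_tokens(title):
    """Tokens the toy index stores for a title; stopwords are dropped."""
    return {w.lower() for w in title.split() if w.lower() not in STOPWORDS}


def search(docs, query_words, wildcard=True):
    """Return titles matching all query words (AND semantics)."""
    hits = []
    for title in docs:
        tokens = index_tokens(title)
        matched_all = True
        for word in query_words:
            w = word.lower()
            if wildcard:
                # "w%" expands only to indexed tokens starting with w.
                matched = any(t.startswith(w) for t in tokens)
            elif w in STOPWORDS:
                # A bare stopword is simply ignored by the query.
                matched = True
            else:
                matched = w in tokens
            if not matched:
                matched_all = False
                break
        if matched_all:
            hits.append(title)
    return hits


docs = ["IT Policies", "Fear itself"]
with_wildcards = search(docs, ["IT", "Policies"])
policies_only = search(docs, ["Policies"])
without_wildcards = search(docs, ["IT", "Policies"], wildcard=False)
```

In this model, with_wildcards is empty because "it%" can only expand to "itself", which is in the other document. Both policies_only and without_wildcards contain only "IT Policies".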