Bug 133942

Summary: cscope inverted index buggy if source files include read errors
Product: [Fedora] Fedora
Component: cscope
Version: 3
Hardware: All
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: medium
Reporter: Frank Ch. Eigler <fche>
Assignee: Neil Horman <nhorman>
CC: fche
Doc Type: Bug Fix
Last Closed: 2004-10-06 21:00:17 UTC

Description Frank Ch. Eigler 2004-09-28 15:58:26 UTC
When cscope is run with "-R -q", it (re)builds an inverted index in
addition to the normal index.  This inverted index appears to go bad
with respect to a mapping to file names, if the source directory
contains unreadable source files.  One source of such files is Emacs
symlink "locks" that point to nonexistent places, although other
errors can also occur.

What happens then is that an interactive search using the inverted
index reports hits under the wrong file names.  It is as if the
unreadable files did get an ID reserved in one cscope table, but not
in others.

One can see this effect if one makes a toy directory with a few .c
files, and a symlink like "ln -s /NONEXISTENT file.c".  With "cscope
-R" alone, the file.c nonexistence will be noted during index rebuild,
but will not result in corrupted data.  With "cscope -R -q", the hits
can point to the wrong file.

Comment 2 Frank Ch. Eigler 2004-09-28 17:32:02 UTC
I doubt the behavior is in any way dependent on the OS version.  I'm
pretty sure it's some smallish error handling bug in build.c someplace.

Both invocations note the inaccessibility of the symlink during parse.
The difference I saw was that, during interactive use of a database
also built with "-q", the filename listed for a search hit was incorrect.

Comment 3 Neil Horman 2004-09-28 17:35:12 UTC
Ah, you're right, I've reproduced it now.  I'll get it fixed ASAP.

Thanks

Comment 4 Neil Horman 2004-09-28 20:11:28 UTC
I've checked a patch into CVS for this bug, and it's ready for the next
QA build.

Comment 6 Frank Ch. Eigler 2004-09-28 20:17:19 UTC
BTW the new patch is not quite enough.  Consider the case of an
ordinary read error, like if a source file was "chmod 000".  I believe
the symlink example is just a special case of a more general problem.

Comment 8 Frank Ch. Eigler 2004-09-28 20:30:05 UTC
The same problem does recur with the "chmod 000" real file.

Re what the check should be...  I don't know exactly.  I would follow
the code to see how it handles parsing errors in general - what
control flow ends up in printing that error message to the screen. 
There I'd modify the code in order to make the index exclude the
problematic file.
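
A minimal sketch of the "exclude the problematic file" idea, assuming the
file list is a plain array of path strings.  The names is_readable() and
drop_unreadable() are hypothetical and are not taken from the cscope
source; the real file list handling may look quite different.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if path can be opened for reading,
 * 0 otherwise (dangling symlink, "chmod 000" file, and so on). */
int is_readable(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}

/* Hypothetical helper: compacts the path list in place so that only
 * readable files remain, and returns the new count.  Every table built
 * afterwards then sees the same set of files. */
size_t drop_unreadable(const char **paths, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_readable(paths[i]))
            paths[kept++] = paths[i];
        else
            fprintf(stderr, "skipping unreadable file: %s\n", paths[i]);
    }
    return kept;
}
```

A check like this only narrows the window: a file could still become
unreadable between the check and the actual parse, so the parse-time
error path would need to stay consistent as well.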

Comment 9 Neil Horman 2004-10-04 14:52:15 UTC
I think I found the root cause of this problem.  It would appear that
searches in cscope rely on both the srcfiles array, which is a list of
all the files found in a source tree, and the cscope database, which
indexes all the symbols in the files listed in srcfiles.  The database
needs a minimal entry for every file, even one that contains no
symbols.  Unreadable files (as described in this bug) don't get that
minimal entry (which is added in the crossref() function), and as such
the database index into the srcfiles array becomes skewed.  I'm
proposing a fix for this in
the public forum right now, and as soon as I get feedback/acceptance
on it, I'll check in the fix here.
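
The skew can be illustrated with a toy model.  This is only a sketch:
struct src_file, build_db() and name_for_entry() are hypothetical names
made up for illustration, not cscope's actual data structures or
functions.  It assumes that a lookup maps the k-th database entry
straight onto slot k of the file list.

```c
#include <stddef.h>

/* Hypothetical model of one slot in the source file list. */
struct src_file {
    const char *name;
    int readable;   /* 0 models a dangling symlink or a chmod 000 file */
};

/*
 * Builds a toy "database" with one entry per file.  db_to_src[k] records
 * which file list slot the k-th entry really came from.
 * With emit_stub_for_unreadable == 0, unreadable files get no entry
 * (the behaviour described in this bug); with 1, they get a minimal
 * entry, so entry k always lines up with slot k.
 * Returns the number of entries written.
 */
size_t build_db(const struct src_file *files, size_t nfiles,
                int emit_stub_for_unreadable, size_t *db_to_src)
{
    size_t n = 0;
    for (size_t i = 0; i < nfiles; i++) {
        if (!files[i].readable && !emit_stub_for_unreadable)
            continue;           /* no entry written: later entries shift down */
        db_to_src[n++] = i;
    }
    return n;
}

/* A lookup that assumes database entry k belongs to file list slot k. */
const char *name_for_entry(const struct src_file *files, size_t k)
{
    return files[k].name;
}
```

With the files a.c, file.c (unreadable) and b.c, and no stub entries,
the second database entry really comes from b.c, but the lookup names
file.c.  With stub entries the two stay aligned.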

Comment 10 Neil Horman 2004-10-06 20:59:15 UTC
The public list is fairly quiet at the moment on this.  I like the fix
though, and it's fairly straightforward, so I've checked it in.  If
there is any future disagreement on this fix upstream, I'll make the
appropriate correction at that time, although I don't think there will
be any argument.