133942 – cscope inverted index buggy if source files include read errors

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	cscope
Sub Component:
Version:	3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Neil Horman

Reported:	2004-09-28 15:58 UTC by Frank Ch. Eigler
Modified:	2007-11-30 22:10 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-10-06 21:00:17 UTC
Type:	---
Embargoed:
Dependent Products:

Description Frank Ch. Eigler 2004-09-28 15:58:26 UTC

When cscope is run with "-R -q", it (re)builds an inverted index in
addition to the normal index.  This inverted index appears to go bad
with respect to a mapping to file names, if the source directory
contains unreadable source files.  One source of such files is Emacs
symlink "locks" that point to nonexistent places, although other
read errors can also occur.

What happens then is that an interactive search using the inverted
index reports the wrong file names for its hits.  It is as if the
unreadable files did get an ID reserved in one cscope table, but not
in others.

One can see this effect if one makes a toy directory with a few .c
files, and a symlink like "ln -s /NONEXISTENT file.c".  With "cscope
-R" alone, the file.c nonexistence will be noted during index rebuild,
but will not result in corrupted data.  With "cscope -R -q", the hits
can point to the wrong file.
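
The toy directory described above can be set up with a short script.
This is only a sketch: the names a.c and b.c and the helper functions
make_toy_tree and unreadable_sources are made up for illustration, and
cscope itself is not invoked here.

```python
import os
import tempfile


def make_toy_tree(root):
    """Create a few readable .c files plus a dangling symlink named file.c."""
    for name, body in [("a.c", "int alpha(void) { return 1; }\n"),
                       ("b.c", "int beta(void) { return 2; }\n")]:
        with open(os.path.join(root, name), "w") as f:
            f.write(body)
    # Same effect as: ln -s /NONEXISTENT file.c
    os.symlink("/NONEXISTENT", os.path.join(root, "file.c"))


def unreadable_sources(root):
    """Return the .c files directly under root that cannot be opened."""
    bad = []
    for name in sorted(os.listdir(root)):
        if not name.endswith(".c"):
            continue
        try:
            with open(os.path.join(root, name)):
                pass
        except OSError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    root = tempfile.mkdtemp(prefix="cscope-toy-")
    make_toy_tree(root)
    print("toy tree:", root)
    print("unreadable:", unreadable_sources(root))
```

After running it, one would cd into the printed directory and compare
the search results from "cscope -R" with those from "cscope -R -q".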

Comment 2 Frank Ch. Eigler 2004-09-28 17:32:02 UTC

I doubt the behavior is in any way dependent on the OS version.  I'm
pretty sure it's some smallish error handling bug in build.c someplace.

Both invocations note the inaccessibility of the symlink during parse.
The difference I saw was that, during interactive use of a database
also built with "-q", the filename listed for a search hit was incorrect.

Comment 3 Neil Horman 2004-09-28 17:35:12 UTC

Ah, you're right, I've reproduced it now.  I'll get it fixed ASAP.

Thanks

Comment 4 Neil Horman 2004-09-28 20:11:28 UTC

I've checked a patch into CVS for this bug, and it's ready for the next
QA build.

Comment 6 Frank Ch. Eigler 2004-09-28 20:17:19 UTC

BTW the new patch is not quite enough.  Consider the case of an
ordinary read error, such as a source file that was "chmod 000".  I
believe the symlink example is just a special case of a more general
problem.

Comment 8 Frank Ch. Eigler 2004-09-28 20:30:05 UTC

The same problem does recur with the "chmod 000" real file.

Re what the check should be...  I don't know exactly.  I would follow
the code to see how it handles parsing errors in general - what
control flow ends up in printing that error message to the screen. 
There I'd modify the code in order to make the index exclude the
problematic file.

Comment 9 Neil Horman 2004-10-04 14:52:15 UTC

I think I found the root cause of this problem.  It would appear that
searches in cscope rely on both the srcfiles array, which is a list of
all the files found in a source tree, and the cscope database, which
indexes all the symbols in the files listed in srcfiles.  The database
requires a minimal entry for every file, even one that contains no
symbols.  Unreadable files (as described in this bug) don't get that
minimal entry (which is added in the crossref() function), and as such
the database index into the srcfiles array becomes skewed.  I'm
proposing a fix for this in the public forum right now, and as soon as
I get feedback/acceptance on it, I'll check in the fix here.
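
The skew described above can be illustrated with a toy model.  This is
not cscope's actual data layout or code: build_index, lookup, and the
file and symbol names are hypothetical, and the "database" is reduced to
one list of symbols per file, matched to srcfiles purely by position.

```python
def build_index(srcfiles, readable, add_placeholder):
    """Return per-file database entries, one list of symbols per entry.

    srcfiles: file names in scan order.
    readable: dict mapping each readable file name to its symbols.
    add_placeholder: if True, emit an empty entry for unreadable files.
    """
    entries = []
    for name in srcfiles:
        if name in readable:
            entries.append(readable[name])
        elif add_placeholder:
            entries.append([])  # minimal entry keeps positions aligned
        # Otherwise the unreadable file is skipped and later entries shift.
    return entries


def lookup(symbol, srcfiles, entries):
    """Map a symbol to a file name via the entry's position in the database."""
    for position, symbols in enumerate(entries):
        if symbol in symbols:
            return srcfiles[position]
    return None


srcfiles = ["a.c", "file.c", "z.c"]  # file.c stands in for the dangling symlink
readable = {"a.c": ["alpha"], "z.c": ["zeta"]}

skewed = build_index(srcfiles, readable, add_placeholder=False)
aligned = build_index(srcfiles, readable, add_placeholder=True)

print(lookup("zeta", srcfiles, skewed))   # file.c (the wrong file)
print(lookup("zeta", srcfiles, aligned))  # z.c
```

In this model, files listed before the unreadable one still resolve
correctly; only hits in files after it are attributed to the wrong name.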

Comment 10 Neil Horman 2004-10-06 20:59:15 UTC

The public list is fairly quiet at the moment on this.  I like the fix
though, and it's fairly straightforward, so I've checked it in.  If
there is any future disagreement on this fix upstream, I'll make the
appropriate correction at that time, although I don't think there will
be any argument.
