Bug 837626 - inverted index: subtle corruption in cscope.in.po
Summary: inverted index: subtle corruption in cscope.in.po
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: cscope
Version: 16
Hardware: x86_64
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Neil Horman
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 839068
TreeView+ depends on / blocked
 
Reported: 2012-07-04 12:48 UTC by Paul Bolle
Modified: 2012-07-20 01:53 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 839068 (view as bug list)
Environment:
Last Closed: 2012-07-20 01:53:35 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
simple parse of inverted index file (cscope.in.out) (3.19 KB, text/plain)
2012-07-04 12:50 UTC, Paul Bolle
no flags Details
fix code that is fairly obviously wrond on x86_64 (1.34 KB, patch)
2012-07-04 12:55 UTC, Paul Bolle
no flags Details | Diff
fix code that is apparently also incorrect (505 bytes, patch)
2012-07-04 13:07 UTC, Paul Bolle
no flags Details | Diff

Description Paul Bolle 2012-07-04 12:48:30 UTC
Description of problem:
On x86_64 cscope creates a subtly corrupted inverted indices.

Version-Release number of selected component (if applicable):
cscope-15.8-1.fc16

How reproducible:
Always

Steps to Reproduce:
1. [in kernel git repo] git ls-files "*.[ch]" | cscope -bqkv -i-
2.
3.
  
Actual results:
Subtle corruption in the inverted index.

Expected results:
Correct inverted index.

Additional info:
0) While researching bug #837296 I wrote a simple index parser (which will attach shortly).

1) After _lots_ of staring at the cscope code and the inverted indices it generates I noticed it subtly corrupts the inverted its indices (on x86_64).

2) Some background:
- the index consist of blocks of 16384 bytes.
- each blocks starts with a 3 longs sized header:
  * number of terms in this block
  * number of next block
  * number of previous block
- this is followed by a list (number of terms long) of entries, of size 2 long. These entries contain (amongst other things) the offset to an entry in the next list.
- this next list (also number of terms long) contains the strings of these terms and their offset in the postings file (ie, cscope.po.out)

The first list grows up, the second list grows down. Due to errors in the handling of x86_64's long size these two lists end up overwriting each other. Net effect: some terms are not properly indexed.

3) I'll also attach two patches that (together) seem to fix this issue.

Comment 1 Paul Bolle 2012-07-04 12:50:26 UTC
Created attachment 596212 [details]
simple parse of inverted index file (cscope.in.out)

Usage: perl index.pl cscope.in.out

Comment 2 Paul Bolle 2012-07-04 12:55:48 UTC
Created attachment 596217 [details]
fix code that is fairly obviously wrond on x86_64

0) This patch fixes code (and some comments) that is fairly obviously wrong. "amtused" (amount used?) starts at 16 (3 * 4 plus 4 for security). I suggest changing this to simply 3 * sizeof(long).

1) So this also drops the security margin: that shouldn't be necessary.

2) Comments welcome.

3) I'd be happy to turn this into a proper patch after review. (Does cscope have a git repo?)

Comment 3 Paul Bolle 2012-07-04 13:07:28 UTC
Created attachment 596219 [details]
fix code that is apparently also incorrect

0) The previous patch in itself doesn't fix all corruption. Some further staring at the code suggested this patch, which simply changes another suspicious magic constant at a strategic spot. (The spot were the code somehow manages to remove the last term from a block, ie drop the term that would be too large to also fit in that block.)

1) This code seems to work. I'm far from sure why. To become sure requires in depth analysis of invmake() and invnewterm(). But I happen to value my sanity. So chances are I will not be a to do that analysis...

2) Review appreciated.

Comment 4 Neil Horman 2012-07-04 17:02:45 UTC
thanks, I'll take a look at this tomorrow!

Comment 5 Neil Horman 2012-07-10 19:57:26 UTC
Paul, I've loked at your code and all your changes look reasonable to me.  When cscope was written back in Bell labs in the 80's I imagine no one ever thought that longs would be more than 4 bytes.  

I've been meaning to convert the cscope tree from cvs to git forever now, but for now you can browse the upstream source at sourceforge (http://cscope.sourceforge.net).  i'll integrate these patches as they are upstream and pull them back to fedora shortly.  Thanks!

Comment 6 Fedora Update System 2012-07-10 20:27:04 UTC
cscope-15.8-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/cscope-15.8-2.fc16

Comment 7 Paul Bolle 2012-07-11 07:29:30 UTC
(In reply to comment #5)
> I've been meaning to convert the cscope tree from cvs to git forever now,
> but for now you can browse the upstream source at sourceforge
> (http://cscope.sourceforge.net).

It's probably quite a job to convert from cvs to git. I can't help with that job (having never done it myself). That would make it rather self-serving to just say that a git repository for cscope would be greatly appreciated. So I'll refrain from saying that.

> i'll integrate these patches as they are
> upstream and pull them back to fedora shortly.  Thanks!

Would you prefer that future patches, if any, be submitted upstream? Fedora cscope is now patch clean, so Fedora specific patches should be rare.

Comment 8 Fedora Update System 2012-07-11 23:55:24 UTC
Package cscope-15.8-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing cscope-15.8-2.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-10537/cscope-15.8-2.fc16
then log in and leave karma (feedback).

Comment 9 Fedora Update System 2012-07-20 01:53:35 UTC
cscope-15.8-2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.