Bug 839068

Summary: inverted index: subtle corruption in cscope.in.po
Product: [Fedora] Fedora Reporter: Neil Horman <nhorman>
Component: cscopeAssignee: Neil Horman <nhorman>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 17CC: nhorman, pebolle
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 837626 Environment:
Last Closed: 2012-07-20 01:52:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 837626    
Bug Blocks:    

Description Neil Horman 2012-07-10 20:12:31 UTC
+++ This bug was initially created as a clone of Bug #837626 +++

Description of problem:
On x86_64 cscope creates a subtly corrupted inverted indices.

Version-Release number of selected component (if applicable):
cscope-15.8-1.fc16

How reproducible:
Always

Steps to Reproduce:
1. [in kernel git repo] git ls-files "*.[ch]" | cscope -bqkv -i-
2.
3.
  
Actual results:
Subtle corruption in the inverted index.

Expected results:
Correct inverted index.

Additional info:
0) While researching bug #837296 I wrote a simple index parser (which will attach shortly).

1) After _lots_ of staring at the cscope code and the inverted indices it generates I noticed it subtly corrupts the inverted its indices (on x86_64).

2) Some background:
- the index consist of blocks of 16384 bytes.
- each blocks starts with a 3 longs sized header:
  * number of terms in this block
  * number of next block
  * number of previous block
- this is followed by a list (number of terms long) of entries, of size 2 long. These entries contain (amongst other things) the offset to an entry in the next list.
- this next list (also number of terms long) contains the strings of these terms and their offset in the postings file (ie, cscope.po.out)

The first list grows up, the second list grows down. Due to errors in the handling of x86_64's long size these two lists end up overwriting each other. Net effect: some terms are not properly indexed.

3) I'll also attach two patches that (together) seem to fix this issue.

--- Additional comment from pebolle on 2012-07-04 08:50:26 EDT ---

Created attachment 596212 [details]
simple parse of inverted index file (cscope.in.out)

Usage: perl index.pl cscope.in.out

--- Additional comment from pebolle on 2012-07-04 08:55:48 EDT ---

Created attachment 596217 [details]
fix code that is fairly obviously wrond on x86_64

0) This patch fixes code (and some comments) that is fairly obviously wrong. "amtused" (amount used?) starts at 16 (3 * 4 plus 4 for security). I suggest changing this to simply 3 * sizeof(long).

1) So this also drops the security margin: that shouldn't be necessary.

2) Comments welcome.

3) I'd be happy to turn this into a proper patch after review. (Does cscope have a git repo?)

--- Additional comment from pebolle on 2012-07-04 09:07:28 EDT ---

Created attachment 596219 [details]
fix code that is apparently also incorrect

0) The previous patch in itself doesn't fix all corruption. Some further staring at the code suggested this patch, which simply changes another suspicious magic constant at a strategic spot. (The spot were the code somehow manages to remove the last term from a block, ie drop the term that would be too large to also fit in that block.)

1) This code seems to work. I'm far from sure why. To become sure requires in depth analysis of invmake() and invnewterm(). But I happen to value my sanity. So chances are I will not be a to do that analysis...

2) Review appreciated.

--- Additional comment from nhorman on 2012-07-04 13:02:45 EDT ---

thanks, I'll take a look at this tomorrow!

--- Additional comment from nhorman on 2012-07-10 15:57:26 EDT ---

Paul, I've loked at your code and all your changes look reasonable to me.  When cscope was written back in Bell labs in the 80's I imagine no one ever thought that longs would be more than 4 bytes.  

I've been meaning to convert the cscope tree from cvs to git forever now, but for now you can browse the upstream source at sourceforge (http://cscope.sourceforge.net).  i'll integrate these patches as they are upstream and pull them back to fedora shortly.  Thanks!

Comment 1 Fedora Update System 2012-07-10 20:25:00 UTC
cscope-15.8-2.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/cscope-15.8-2.fc17

Comment 2 Fedora Update System 2012-07-12 00:00:42 UTC
Package cscope-15.8-2.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing cscope-15.8-2.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-10562/cscope-15.8-2.fc17
then log in and leave karma (feedback).

Comment 3 Fedora Update System 2012-07-20 01:52:15 UTC
cscope-15.8-2.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.