Bug 111516

Summary: intermittent up2date segfaults on RHEL3 for hammer
Product: Red Hat Enterprise Linux 3
Reporter: Mike McLean <mikem>
Component: rpm
Assignee: Paul Nasrat <nobody+pnasrat>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike McLean <mikem>
Severity: medium
Priority: medium
Version: 3.0
CC: barryn, msw
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-04-19 18:45:49 UTC
Attachments:
* output of up2date
* backtrace of stuck up2date process
* strace of segfaulting up2date
* strace of `rpm -ihvv` seg fault on x86_64

Description Mike McLean 2003-12-04 20:05:37 UTC
* 3ES/3WS
* up2date-4.0.1-1.x86_64
* rpm-4.2.1-4.2.x86_64
* rpm-python-4.2.1-4.2.x86_64

This might be rpm's fault (or maybe even the kernel's fault), but here
goes...

up2date is getting stuck in a futex wait.

.live.[root@colossus root]# strace -p 2839
Process 2839 attached - interrupt to quit
futex(0x2a99719240, FUTEX_WAIT, 0, NULL <unfinished ...>

Here is what I did:
1) standard kickstart install of 3ES for hammer (everything)
2) register with RHN via activation key (rhnreg_ks)
3) run up2date -u -f -i --nox --nosrc

Will attach details...

Comment 2 Mike McLean 2003-12-04 20:08:14 UTC
Created attachment 96349 [details]
output of up2date

Comment 3 Mike McLean 2003-12-04 20:12:50 UTC
Created attachment 96350 [details]
backtrace of stuck up2date process

Comment 4 Mike McLean 2003-12-04 20:20:39 UTC
After that, I killed up2date, rebooted, and tried to finish up2dating.
Got the following:

.live.[root@colossus root]# up2date -u -f --nox --nosrc
 
Fetching package list for channel: rhel-x86_64-ws-3...
########################################
 
Fetching Obsoletes list for channel: rhel-x86_64-ws-3...
 
Name                                    Version        Rel
----------------------------------------------------------
kernel                                  2.4.21         4.0.1.EL      x86_64
kernel-smp                              2.4.21         4.0.1.EL      x86_64
kernel-source                           2.4.21         4.0.1.EL      x86_64
nptl-devel                              2.3.2          95.6          x86_64
nscd                                    2.3.2          95.6          x86_64
 
 
Testing package set / solving RPM inter-dependencies...
########################################
RPM package conflict error.  The message was:
Test install failed because of package conflicts:
package kernel-2.4.21-4.0.1.EL is already installed
package kernel-smp-2.4.21-4.0.1.EL is already installed
 


Comment 5 Mike McLean 2003-12-11 20:37:16 UTC
The second problem (the kernel conflict) is an up2date bug that is
addressed elsewhere.  The primary problem (getting stuck in futex)
seems to be a problem with rpm.

Comment 6 Mike McLean 2003-12-11 22:20:24 UTC
I've been trying to find a shorter way to reproduce this bug. Here
goes...

Start on a fully updated (up2date'd) 3AS x86_64 box.

for x in $(seq 100); do
    echo ITERATION $x
    rpm -e kernel-2.4.21-4.0.1.EL kernel-smp-2.4.21-4.0.1.EL
    up2date -u -f --nox --nosrc
done &>/tmp/bug &

After several iterations, one or the other (rpm or up2date) segfaults.
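To avoid scanning /tmp/bug by hand, the loop above can be made to stop on the first crash. This is a minimal sketch, assuming a POSIX shell, which reports a child killed by signal N as exit status 128+N (SIGSEGV is signal 11 on Linux, so a segfault shows up as 139). The names run_until_segv and iteration are hypothetical helpers, not part of rpm or up2date.

```shell
#!/bin/sh
# run_until_segv N CMD [ARG...]: run CMD up to N times and stop at the
# first run that dies from SIGSEGV. Hypothetical helper. A POSIX shell
# reports "killed by signal N" as exit status 128+N; SIGSEGV is 11 on
# Linux, so a segfault shows up as 139.
run_until_segv() {
    n=$1; shift
    i=1
    while [ "$i" -le "$n" ]; do
        echo "ITERATION $i" >&2
        rc=0
        "$@" || rc=$?          # "||" keeps set -e from aborting the loop
        if [ "$rc" -eq 139 ]; then
            echo "segfault on iteration $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no segfault in $n iterations"
    return 1
}

# Example (hypothetical wrapper around the loop body from Comment 6):
#   iteration() {
#       rpm -e kernel-2.4.21-4.0.1.EL kernel-smp-2.4.21-4.0.1.EL
#       [ $? -eq 139 ] && return 139
#       up2date -u -f --nox --nosrc
#   }
#   run_until_segv 100 iteration >/tmp/bug 2>&1
```

Progress lines go to stderr, so only the final verdict lands on stdout when the output is not redirected.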

Comment 7 Jeff Johnson 2003-12-12 20:51:35 UTC
Please add -vv to up2date. I can often guess what the problem
is if I can see the equivalent of the CLI -vv output.

(hmmm, after perusing the stack trace)

Hmmm, you also need to ensure that no other root process
is accessing the database. Check the RHN applet and the
rpm -q cron script first. If you find another process,
then the hang on futex is "expected" behavior: this is known
as concurrent access, a current rpm "production" feature, not a bug.

That's what I see in the stack trace, concurrent access locking.
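One way to check for the concurrent access described above is to look for other processes holding files open under /var/lib/rpm. fuser or lsof can do this; the sketch below instead scans the /proc/<pid>/fd symlinks directly, so it needs only a Linux /proc and a POSIX shell. rpmdb_holders is a hypothetical helper. It must run as root to see other users' processes, and it only inspects open file descriptors, not memory mappings.

```shell
#!/bin/sh
# rpmdb_holders [DIR]: print the PIDs of processes that have a file under
# DIR (default /var/lib/rpm) open, one per line. Hypothetical helper.
# Linux-only: it reads the /proc/<pid>/fd symlinks. Run it as root, or it
# will only see your own processes. Memory mappings are not checked.
rpmdb_holders() {
    # Resolve symlinks so the prefix matches what readlink reports.
    dir=$(cd "${1:-/var/lib/rpm}" && pwd -P) || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable or vanished fds are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*)
                pid=${fd#/proc/}      # e.g. "2839/fd/4"
                echo "${pid%%/*}"     # keep just "2839"
                ;;
        esac
    done | sort -un
}

# Example: for p in $(rpmdb_holders); do ps -p "$p" -o pid,args; done
```

If this lists an rhn-applet or cron-launched rpm process while up2date sits in futex(), the hang is the locking behavior described above rather than a crash.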

Comment 8 Mike McLean 2003-12-12 22:40:54 UTC
Created attachment 96506 [details]
strace of segfaulting up2date

Comment 9 Mike McLean 2003-12-12 23:00:09 UTC
Running the test with -vv now.

Comment 10 Jeff Johnson 2003-12-13 22:19:09 UTC
Hmmm, this looks like the dangling pointer problem.

Try the following to verify:
    rm /var/lib/rpm/Pubkeys
    rpm --rebuilddb -vv
If that fixes it (yes, reproducing is quite hard), then
this is the dangling pointer problem in rpm.

Comment 11 Jeff Johnson 2003-12-18 03:39:57 UTC
The dangling pointer problem is bug #107835, fixed in rpm-4.2.2-0.6
and later. No idea how or when it will be built for RHEL; probably soon.


Comment 12 Jeff Johnson 2003-12-26 17:26:15 UTC
Needinfo until someone tells me whether rpm is to be built for
RHEL.

Comment 13 Barry K. Nathan 2004-01-06 03:03:41 UTC
Maybe this isn't the right place to ask, but is there any chance of an
update coming for Red Hat Linux 9 or for Fedora Core 1?

Comment 14 Brian Brock 2004-02-17 14:52:20 UTC
I have an x86_64 system that reliably shows similar behavior; there is
no need to run rpm several times to trigger the error.

# rpm -ih -v -v kernel-2.4.21-9.0.1.EL.x86_64.rpm
D: ============== kernel-2.4.21-9.0.1.EL.x86_64.rpm
D: Expected size:      7152109 = lead(96)+sigs(180)+pad(4)+data(7151829)
D:   Actual size:      7152109
D: kernel-2.4.21-9.0.1.EL.x86_64.rpm: MD5 digest: OK (cc8ba3c9e807aad192e22be7fa904d93)
D:      added binary package [0]
D: found 0 source and 1 binary packages
D: opening  db environment /var/lib/rpm/Packages joinenv
D: opening  db index       /var/lib/rpm/Packages rdonly mode=0x0
D: locked   db index       /var/lib/rpm/Packages
D: ========== +++ kernel-2.4.21-9.0.1.EL x86_64-linux 0x0
D: opening  db index       /var/lib/rpm/Depends create mode=0x0
D:  Requires: rpmlib(VersionedDependencies) <= 3.0.3-1      YES (rpmlib provides)
D: opening  db index       /var/lib/rpm/Providename rdonly mode=0x0
D: opening  db index       /var/lib/rpm/Pubkeys rdonly mode=0x0
D:  read h#    1035 Header sanity check: OK
D: ========== DSA pubkey id 219180cddb42a60e
D:  read h#      71 Header V3 DSA signature: OK, key ID db42a60e
D:  Requires: fileutils                                     YES (db provides)
Segmentation fault
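The first -vv lines show that the package file itself is intact: Expected size matches Actual size and the MD5 digest is OK, so the crash comes later, during dependency checking. The expected size is just the sum of the parts printed; the sketch below recomputes it, assuming the signature header is zero-padded to an 8-byte boundary, which is consistent with pad(4) for sigs(180). rpm_expected_size is a hypothetical helper.

```shell
#!/bin/sh
# rpm_expected_size LEAD SIGS DATA: recompute the "Expected size" that
# rpm -vv prints. Hypothetical helper; assumes the signature header is
# zero-padded up to a multiple of 8 bytes before the main header/data.
rpm_expected_size() {
    lead=$1 sigs=$2 data=$3
    pad=$(( (8 - sigs % 8) % 8 ))   # 0..7 bytes of alignment padding
    echo "lead($lead)+sigs($sigs)+pad($pad)+data($data) = $((lead + sigs + pad + data))"
}

# Values from the Comment 14 log:
#   rpm_expected_size 96 180 7151829
#   -> lead(96)+sigs(180)+pad(4)+data(7151829) = 7152109
```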

Comment 15 Brian Brock 2004-02-17 14:55:15 UTC
Created attachment 97749 [details]
strace of `rpm -ihvv` seg fault on x86_64

This appears slightly different from the earlier strace output. If this
seems like a different bug, I'll be glad to open another report.

Comment 16 Jeff Johnson 2004-09-04 02:25:45 UTC
The second strace from bbrock indicates a segfault while accessing the
added Provides: and files tables, which differs from the other strace.

Was this rpm-4.2.x or rpm-4.3.x? There is a one-line fix
in rpm-4.3.x that may be pertinent.

So the hypothesis is that the bug is in rpm-4.2.x but not in rpm-4.3.2
(as in FC3).

Comment 17 Jeremy Katz 2005-04-19 18:45:49 UTC
Closing due to inactivity.  If this issue still occurs with current releases,
please reopen and set the release in which you've encountered the problem.