Bug 111516 - intermittent up2date segfaults on RHEL3 for hammer
Summary: intermittent up2date segfaults on RHEL3 for hammer
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: rpm
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Paul Nasrat
QA Contact: Mike McLean
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-12-04 20:05 UTC by Mike McLean
Modified: 2007-11-30 22:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-19 18:45:49 UTC
Target Upstream Version:
Embargoed:


Attachments
output of up2date (3.36 KB, text/plain)
2003-12-04 20:08 UTC, Mike McLean
backtrace of stuck up2date process (2.27 KB, text/plain)
2003-12-04 20:12 UTC, Mike McLean
strace of segfaulting up2date (18.83 KB, text/plain)
2003-12-12 22:40 UTC, Mike McLean
strace of `rpm -ihvv` seg fault on x86_64 (152.73 KB, text/plain)
2004-02-17 14:55 UTC, Brian Brock

Description Mike McLean 2003-12-04 20:05:37 UTC
* 3ES/3WS
* up2date-4.0.1-1.x86_64
* rpm-4.2.1-4.2.x86_64
* rpm-python-4.2.1-4.2.x86_64

This might be rpm's fault (or maybe even the kernel's fault), but here
goes...

up2date is getting stuck in futex.

.live.[root@colossus root]# strace -p 2839
Process 2839 attached - interrupt to quit
futex(0x2a99719240, FUTEX_WAIT, 0, NULL <unfinished ...>

Here is what I did:
1) standard kickstart install of 3ES for hammer (everything)
2) register with rhn via key (rhnreg_ks)
3) run up2date -u -f -i --nox --nosrc

Will attach details...

Comment 2 Mike McLean 2003-12-04 20:08:14 UTC
Created attachment 96349
output of up2date

Comment 3 Mike McLean 2003-12-04 20:12:50 UTC
Created attachment 96350
backtrace of stuck up2date process

Comment 4 Mike McLean 2003-12-04 20:20:39 UTC
After that, I killed up2date, rebooted, and tried to finish up2dating.
Got the following:

.live.[root@colossus root]# up2date -u -f --nox --nosrc
 
Fetching package list for channel: rhel-x86_64-ws-3...
########################################
 
Fetching Obsoletes list for channel: rhel-x86_64-ws-3...
 
Name                                    Version        Rel
----------------------------------------------------------
kernel                                  2.4.21         4.0.1.EL          x86_64
kernel-smp                              2.4.21         4.0.1.EL          x86_64
kernel-source                           2.4.21         4.0.1.EL          x86_64
nptl-devel                              2.3.2          95.6              x86_64
nscd                                    2.3.2          95.6              x86_64
 
 
Testing package set / solving RPM inter-dependencies...
########################################
RPM package conflict error.  The message was:
Test install failed because of package conflicts:
package kernel-2.4.21-4.0.1.EL is already installed
package kernel-smp-2.4.21-4.0.1.EL is already installed
 


Comment 5 Mike McLean 2003-12-11 20:37:16 UTC
The second problem (the kernel conflict) is an up2date bug that is
addressed elsewhere.  The primary problem (getting stuck in futex)
seems to be a problem with rpm.

Comment 6 Mike McLean 2003-12-11 22:20:24 UTC
I've been trying to get a shorter path to reproduce this bug.   Here
goes...

Start on a fully updated 3AS x86_64 box:

for x in $(seq 100); do
    echo ITERATION $x
    rpm -e kernel-2.4.21-4.0.1.EL kernel-smp-2.4.21-4.0.1.EL
    up2date -u -f --nox --nosrc
done &>/tmp/bug &

After several iterations, one or the other will segfault.
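
A minimal sketch of how the loop above could stop at the first crash instead of always running all 100 iterations. It relies on the shell convention that a process killed by SIGSEGV (signal 11) is reported with exit status 128 + 11 = 139. The helper names `is_segfault` and `run_until_segfault` are hypothetical and not part of rpm or up2date.

```shell
# Hypothetical helper: succeed if the given exit status means
# "killed by SIGSEGV" (128 + signal number 11 = 139).
is_segfault() {
    [ "$1" -eq 139 ]
}

# Hypothetical helper: run a command up to $1 times and stop at the
# first segfault. Prints the iteration number and returns 0 when a
# segfault is seen; returns 1 if no run segfaulted.
run_until_segfault() {
    rus_max=$1
    shift
    for rus_i in $(seq "$rus_max"); do
        "$@"
        rus_status=$?
        if is_segfault "$rus_status"; then
            echo "segfault on iteration $rus_i"
            return 0
        fi
    done
    return 1
}

# Intended use (as root on the affected box; not executed here):
#   run_until_segfault 100 up2date -u -f --nox --nosrc
```

Since either command in the original loop may be the one that crashes, the `rpm -e` step would need its exit status checked the same way.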

Comment 7 Jeff Johnson 2003-12-12 20:51:35 UTC
Add -vv to up2date please. I can often guess what the problem
is if I can see the equiv of CLI -vv output.

(hmmm, after perusing the stack trace)

Hmmm, you also need to ensure that no other root process
is accessing the database. Check the RHN applet and the
rpm -q cron script first. If you find another process,
then the hang on futex is "expected" behavior. This is known
as concurrent access, a current rpm "production" feature, not a bug.

That's what I see in the stack trace, concurrent access locking.
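
One rough way to check for other processes touching the database, sketched under the assumption of a Linux `/proc` filesystem and sufficient privileges to read other processes' file descriptors. The helper name `list_db_users` is hypothetical. It only sees open file descriptors, so a process that has merely memory-mapped a file under the directory would not be listed.

```shell
# Hypothetical helper: print the PIDs of processes that hold an open
# file descriptor on anything under the given directory (Linux only).
list_db_users() {
    db_dir=$1
    for fd_link in /proc/[0-9]*/fd/*; do
        # Unreadable or vanished entries are skipped.
        target=$(readlink "$fd_link" 2>/dev/null) || continue
        case $target in
            "$db_dir"/*)
                pid=${fd_link#/proc/}   # strip the leading "/proc/"
                echo "${pid%%/*}"       # keep only the PID component
                ;;
        esac
    done | sort -un
}

# Intended use (as root; not executed here):
#   list_db_users /var/lib/rpm
```

If this prints any PID other than the stuck up2date itself, the futex wait would be consistent with the concurrent-access explanation above.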

Comment 8 Mike McLean 2003-12-12 22:40:54 UTC
Created attachment 96506
strace of segfaulting up2date

Comment 9 Mike McLean 2003-12-12 23:00:09 UTC
Running the test with -vv now.

Comment 10 Jeff Johnson 2003-12-13 22:19:09 UTC
Hmmm, this looks like the dangling pointer problem.

Try doing the following to verify:
    rm /var/lib/rpm/Pubkeys
    rpm --rebuilddb -vv
If that fixes it (yes, reproducing is quite hard), then
this is the dangling pointer problem in rpm.
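
Because the verification step deletes a database file, it may be worth copying the database directory first. The following is a cautious sketch, not something suggested in the comment itself. The helper name `backup_rpmdb` and the backup location are assumptions.

```shell
# Hypothetical helper: copy the rpm database directory ($1) into a
# timestamped directory under $2 and print the backup path.
backup_rpmdb() {
    bk_src=$1
    bk_dest="$2/rpmdb-backup-$(date +%Y%m%d%H%M%S)"
    cp -a "$bk_src" "$bk_dest" && echo "$bk_dest"
}

# Intended use (as root; not executed here):
#   backup_rpmdb /var/lib/rpm /root &&
#       rm /var/lib/rpm/Pubkeys &&
#       rpm --rebuilddb -vv
```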

Comment 11 Jeff Johnson 2003-12-18 03:39:57 UTC
The dangling pointer is #107835, fixed in rpm-4.2.2-0.6 and later.
No idea how/when the RHEL build will happen, probably soon.


Comment 12 Jeff Johnson 2003-12-26 17:26:15 UTC
Needinfo until someone tells me whether rpm is to be built for
RHEL.

Comment 13 Barry K. Nathan 2004-01-06 03:03:41 UTC
Maybe this isn't the right place to ask, but is there any chance of an
update coming for Red Hat Linux 9 or for Fedora Core 1?

Comment 14 Brian Brock 2004-02-17 14:52:20 UTC
I have an x86_64 system that reliably shows similar behavior, no need
to run rpm several times to see each error.

# rpm -ih -v -v kernel-2.4.21-9.0.1.EL.x86_64.rpm
D: ============== kernel-2.4.21-9.0.1.EL.x86_64.rpm
D: Expected size:      7152109 = lead(96)+sigs(180)+pad(4)+data(7151829)
D:   Actual size:      7152109
D: kernel-2.4.21-9.0.1.EL.x86_64.rpm: MD5 digest: OK (cc8ba3c9e807aad192e22be7fa904d93)
D:      added binary package [0]
D: found 0 source and 1 binary packages
D: opening  db environment /var/lib/rpm/Packages joinenv
D: opening  db index       /var/lib/rpm/Packages rdonly mode=0x0
D: locked   db index       /var/lib/rpm/Packages
D: ========== +++ kernel-2.4.21-9.0.1.EL x86_64-linux 0x0
D: opening  db index       /var/lib/rpm/Depends create mode=0x0
D:  Requires: rpmlib(VersionedDependencies) <= 3.0.3-1      YES (rpmlib provides)
D: opening  db index       /var/lib/rpm/Providename rdonly mode=0x0
D: opening  db index       /var/lib/rpm/Pubkeys rdonly mode=0x0
D:  read h#    1035 Header sanity check: OK
D: ========== DSA pubkey id 219180cddb42a60e
D:  read h#      71 Header V3 DSA signature: OK, key ID db42a60e
D:  Requires: fileutils                                     YES (db provides)
Segmentation fault

Comment 15 Brian Brock 2004-02-17 14:55:15 UTC
Created attachment 97749
strace of `rpm -ihvv` seg fault on x86_64

Appears slightly different than earlier strace output. If this seems like a
different bug, I'll be glad to open another report.

Comment 16 Jeff Johnson 2004-09-04 02:25:45 UTC
The 2nd bbrock strace indicates a segfault while accessing the added
Provides: and files table, which is different from the other strace.

Was this rpm-4.2.x or rpm-4.3.x? There is a 1-line fix
in rpm-4.3.x that may be pertinent.

So the hypothesis is that the bug is in rpm-4.2.x, but not in rpm-4.3.2
(as in fc3).

Comment 17 Jeremy Katz 2005-04-19 18:45:49 UTC
Closing due to inactivity.  If this issue still occurs with current releases,
please reopen and set the release in which you've encountered the problem.

