Bug 111516 - intermittent up2date segfaults on RHEL3 for hammer
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: rpm
Version: 3.0
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Paul Nasrat
QA Contact: Mike McLean
Reported: 2003-12-04 15:05 EST by Mike McLean
Modified: 2007-11-30 17:06 EST
Doc Type: Bug Fix
Last Closed: 2005-04-19 14:45:49 EDT

Attachments
output of up2date (3.36 KB, text/plain)
2003-12-04 15:08 EST, Mike McLean
backtrace of stuck up2date process (2.27 KB, text/plain)
2003-12-04 15:12 EST, Mike McLean
strace of segfaulting up2date (18.83 KB, text/plain)
2003-12-12 17:40 EST, Mike McLean
strace of `rpm -ihvv` seg fault on x86_64 (152.73 KB, text/plain)
2004-02-17 09:55 EST, Brian Brock

Description Mike McLean 2003-12-04 15:05:37 EST
* 3ES/3WS
* up2date-4.0.1-1.x86_64
* rpm-4.2.1-4.2.x86_64
* rpm-python-4.2.1-4.2.x86_64

This might be rpm's fault (or maybe even the kernel's fault), but here
goes...

up2date is getting stuck in futex.

.live.[root@colossus root]# strace -p 2839
Process 2839 attached - interrupt to quit
futex(0x2a99719240, FUTEX_WAIT, 0, NULL <unfinished ...>

Here is what I did:
1) standard kickstart install of 3ES for hammer (everything)
2) register with RHN via key (rhnreg_ks)
3) run up2date -u -f -i --nox --nosrc

Will attach details...
Comment 2 Mike McLean 2003-12-04 15:08:14 EST
Created attachment 96349
output of up2date
Comment 3 Mike McLean 2003-12-04 15:12:50 EST
Created attachment 96350
backtrace of stuck up2date process
Comment 4 Mike McLean 2003-12-04 15:20:39 EST
After that, I killed up2date, rebooted, and tried to finish up2dating.
Got the following:

.live.[root@colossus root]# up2date -u -f --nox --nosrc
 
Fetching package list for channel: rhel-x86_64-ws-3...
########################################
 
Fetching Obsoletes list for channel: rhel-x86_64-ws-3...
 
Name                                    Version        Rel
----------------------------------------------------------
kernel                                  2.4.21         4.0.1.EL      x86_64
kernel-smp                              2.4.21         4.0.1.EL      x86_64
kernel-source                           2.4.21         4.0.1.EL      x86_64
nptl-devel                              2.3.2          95.6          x86_64
nscd                                    2.3.2          95.6          x86_64
 
 
Testing package set / solving RPM inter-dependencies...
########################################
RPM package conflict error.  The message was:
Test install failed because of package conflicts:
package kernel-2.4.21-4.0.1.EL is already installed
package kernel-smp-2.4.21-4.0.1.EL is already installed
 
Comment 5 Mike McLean 2003-12-11 15:37:16 EST
The second problem (the kernel conflict) is an up2date bug that is
addressed elsewhere.  The primary problem (getting stuck in futex)
seems to be a problem with rpm.
Comment 6 Mike McLean 2003-12-11 17:20:24 EST
I've been trying to get a shorter path to reproduce this bug.   Here
goes...

Start on a fully updated 3AS x86_64 box:

for x in $(seq 100); do
    echo ITERATION $x
    rpm -e kernel-2.4.21-4.0.1.EL kernel-smp-2.4.21-4.0.1.EL
    up2date -u -f --nox --nosrc
done &>/tmp/bug &

After several iterations, one or the other (rpm or up2date) will segfault.
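
A variant of the loop above that stops at the first crash can make the failing iteration easier to spot. This is only a minimal sketch: it assumes a shell such as bash or dash, where a child killed by SIGSEGV (signal 11 on Linux) reports exit status 139 (128 + 11). The helper names `is_segfault_status` and `run_checked` are hypothetical and are not part of rpm or up2date.

```shell
# Exit status that bash/dash report for a child killed by SIGSEGV (128 + 11).
SEGV_STATUS=139

# Succeeds when the given exit status indicates a segfault.
is_segfault_status() {
    [ "$1" -eq "$SEGV_STATUS" ]
}

# Run a command; print a message and return 1 if it died with SIGSEGV.
# Any other exit status (including ordinary failures) returns 0.
# Usage: run_checked ITERATION command [args...]
run_checked() {
    iteration=$1
    shift
    status=0
    "$@" || status=$?
    if is_segfault_status "$status"; then
        echo "ITERATION $iteration: segfault in: $*"
        return 1
    fi
    return 0
}

# Example loop (commented out because it removes and reinstalls kernels):
# for x in $(seq 100); do
#     run_checked "$x" rpm -e kernel-2.4.21-4.0.1.EL kernel-smp-2.4.21-4.0.1.EL || break
#     run_checked "$x" up2date -u -f --nox --nosrc || break
# done
```

Because each command is checked separately, the message shows whether rpm or up2date was the one that crashed.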
Comment 7 Jeff Johnson 2003-12-12 15:51:35 EST
Add -vv to up2date please. I can often guess what the problem
is if I can see the equiv of CLI -vv output.

(hmmm, after perusing the stack trace)

Hmmm, you also need to ensure that no other root process
is accessing the database. Check the RHN applet and the
rpm -q cron script first. If you find another process,
then the hang on futex is "expected" behavior. This is known
as concurrent access, a current rpm "production" feature, not a bug.

That's what I see in the stack trace, concurrent access locking.
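
One way to check for other processes holding the rpm database open is sketched below. It is an illustration under stated assumptions, not something from this report: it relies on a Linux `/proc` filesystem, it only sees processes whose descriptors the caller is allowed to read (so run it as root), and the function name `pids_with_open_files` is hypothetical. Tools such as `fuser` or `lsof`, where installed, answer the same question.

```shell
# Print the PIDs that hold an open file descriptor on any file under a directory.
# Linux only: walks the /proc/<pid>/fd symlinks.
# Usage: pids_with_open_files /var/lib/rpm
pids_with_open_files() {
    # Resolve symlinks so the prefix matches what /proc reports.
    dir=$(cd "$1" && pwd -P) || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Descriptors can vanish or be unreadable; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*)
                pid=${fd#/proc/}
                echo "${pid%%/*}"
                ;;
        esac
    done | sort -un
}
```

If the function prints any PID other than the rpm or up2date process under test, the futex wait may simply be the concurrent-access locking described above.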
Comment 8 Mike McLean 2003-12-12 17:40:54 EST
Created attachment 96506
strace of segfaulting up2date
Comment 9 Mike McLean 2003-12-12 18:00:09 EST
Running the test with -vv now.
Comment 10 Jeff Johnson 2003-12-13 17:19:09 EST
Hmmm, this looks like the dangling pointer problem.

Try doing the following to verify
    rm /var/lib/rpm/Pubkeys
    rpm --rebuilddb -vv
If that fixes it (yes, reproducing is quite hard), then
this is the dangling pointer problem in rpm.
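
Since the check above deletes a database file, copying the database directory first seems prudent. The following is a minimal sketch, assuming nothing else is using the database while it is copied. The function name `backup_rpmdb`, the default backup location `/var/tmp`, and the timestamped directory name are illustrative choices and do not come from this report.

```shell
# Copy an rpm database directory to a timestamped backup and print the backup path.
# Usage: backup_rpmdb [source_dir] [backup_parent_dir]
backup_rpmdb() {
    src=${1:-/var/lib/rpm}
    dest=${2:-/var/tmp}/rpmdb-backup-$(date +%Y%m%d%H%M%S)
    # -a preserves ownership, permissions, and timestamps.
    cp -a "$src" "$dest" && echo "$dest"
}

# Suggested order (commented out because it modifies the live database):
# backup_rpmdb /var/lib/rpm /var/tmp
# rm /var/lib/rpm/Pubkeys
# rpm --rebuilddb -vv
```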
Comment 11 Jeff Johnson 2003-12-17 22:39:57 EST
The dangling pointer is #107835, fixed in rpm-4.2.2-0.6 and later.
No idea how/when the RHEL build will happen, probably soon.
Comment 12 Jeff Johnson 2003-12-26 12:26:15 EST
Needinfo until someone tells me whether rpm is to be built for
RHEL.
Comment 13 Barry K. Nathan 2004-01-05 22:03:41 EST
Maybe this isn't the right place to ask, but is there any chance of an
update coming for Red Hat Linux 9 or for Fedora Core 1?
Comment 14 Brian Brock 2004-02-17 09:52:20 EST
I have an x86_64 system that reliably shows similar behavior, no need
to run rpm several times to see each error.

# rpm -ih -v -v kernel-2.4.21-9.0.1.EL.x86_64.rpm
D: ============== kernel-2.4.21-9.0.1.EL.x86_64.rpm
D: Expected size:      7152109 = lead(96)+sigs(180)+pad(4)+data(7151829)
D:   Actual size:      7152109
D: kernel-2.4.21-9.0.1.EL.x86_64.rpm: MD5 digest: OK (cc8ba3c9e807aad192e22be7fa904d93)
D:      added binary package [0]
D: found 0 source and 1 binary packages
D: opening  db environment /var/lib/rpm/Packages joinenv
D: opening  db index       /var/lib/rpm/Packages rdonly mode=0x0
D: locked   db index       /var/lib/rpm/Packages
D: ========== +++ kernel-2.4.21-9.0.1.EL x86_64-linux 0x0
D: opening  db index       /var/lib/rpm/Depends create mode=0x0
D:  Requires: rpmlib(VersionedDependencies) <= 3.0.3-1      YES (rpmlib provides)
D: opening  db index       /var/lib/rpm/Providename rdonly mode=0x0
D: opening  db index       /var/lib/rpm/Pubkeys rdonly mode=0x0
D:  read h#    1035 Header sanity check: OK
D: ========== DSA pubkey id 219180cddb42a60e
D:  read h#      71 Header V3 DSA signature: OK, key ID db42a60e
D:  Requires: fileutils                                     YES (db provides)
Segmentation fault
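
The "Expected size" line in the -vv output above can be checked by hand, which shows the package file itself passed the size test before the crash. The sketch below assumes that `pad` is the padding that aligns the signature header to an 8-byte boundary; the variable names are illustrative.

```shell
# Values taken from the "Expected size" debug line above.
lead=96        # fixed-size rpm lead
sigs=180       # signature header
data=7151829   # header plus payload

# Padding needed to round the signature header up to a multiple of 8 bytes.
pad=$(( (8 - sigs % 8) % 8 ))

expected=$(( lead + sigs + pad + data ))
echo "pad=$pad expected=$expected"
```

The computed total matches the "Actual size" of 7152109 reported in the output, so the segfault happens later, during dependency checking against the database.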
Comment 15 Brian Brock 2004-02-17 09:55:15 EST
Created attachment 97749
strace of `rpm -ihvv` seg fault on x86_64

Appears slightly different than earlier strace output. If this seems like a
different bug, I'll be glad to open another report.
Comment 16 Jeff Johnson 2004-09-03 22:25:45 EDT
The 2nd bbrock strace indicates a segfault while accessing the added
Provides: and files table, which is different from the other strace.

Was this rpm-4.2.x or rpm-4.3.x? There is a 1-line fix
in rpm-4.3.x that may be pertinent.

So the hypothesis is that the bug is in rpm-4.2.x, but not in rpm-4.3.2
(as in fc3).
Comment 17 Jeremy Katz 2005-04-19 14:45:49 EDT
Closing due to inactivity.  If this issue still occurs with current releases,
please reopen and set the release in which you've encountered the problem.
