Bug 192170 - rpm segfaults on ppc when installing a large number of rpms
Summary: rpm segfaults on ppc when installing a large number of rpms
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: rpm
Version: 4.0
Hardware: ppc64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Paul Nasrat
QA Contact: Mike McLean
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-05-18 04:05 UTC by Mike Bonnet
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-06-30 13:42:06 UTC
Target Upstream Version:
Embargoed:


Attachments
gdb backtrace from the core file generated after rpm segfaulted (1.79 KB, text/plain)
2006-05-18 04:05 UTC, Mike Bonnet
Non rpm reproducing test case (2.51 KB, text/plain)
2006-05-24 20:37 UTC, Paul Nasrat

Description Mike Bonnet 2006-05-18 04:05:25 UTC
Description of problem:
When installing a large number of ppc rpms into a chroot on a ppc64 machine, rpm
gets part of the way through the transaction, and then segfaults.  It always
seems to segfault on the same rpm in this test case (rpm-build), though it has
segfaulted on other rpms under other conditions.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 4 Update 3
rpm-4.3.3-17_nonptl

How reproducible:
Always

Steps to Reproduce:
1. Log in to the test machine
2. Become root
3. Run the /root/rpm-test/rpmtest.sh script, passing the name of a directory in
which to install the rpms.  The directory does not need to exist.  e.g.

# /root/rpm-test/rpmtest.sh /var/lib/mock/test-rpm-1
  
Actual results:
rpm segfaults when installing the 122nd rpm (rpm-build)

Expected results:
All of the rpms get successfully installed

Additional info:
backtrace from the core file attached

Comment 1 Mike Bonnet 2006-05-18 04:05:25 UTC
Created attachment 129394
gdb backtrace from the core file generated after rpm segfaulted

Comment 2 Jeff Johnson 2006-05-18 23:08:27 UTC
Is "rpm" in /etc/passwd and /etc/group?

What is in /etc/nsswitch.conf for user/group lookups?
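
One way to collect the answers to both questions for a given root is sketched below. The function name `nss_report` and its arguments are hypothetical and not part of rpm; the sketch reads the flat files directly and does not go through NSS, so it says nothing about NIS entries.

```shell
#!/bin/sh
# Hypothetical helper (not part of rpm): report whether NAME (default "rpm")
# is listed in ROOT/etc/passwd and ROOT/etc/group, then show the passwd and
# group lines of ROOT/etc/nsswitch.conf. Only flat files are read.
nss_report() {
    root=${1%/}      # strip a trailing slash, so "/" becomes ""
    name=${2:-rpm}
    for db in passwd group; do
        if [ -r "$root/etc/$db" ] && grep -q "^$name:" "$root/etc/$db"; then
            echo "/etc/$db: $name defined"
        else
            echo "/etc/$db: $name missing"
        fi
    done
    if [ -r "$root/etc/nsswitch.conf" ]; then
        grep -E '^(passwd|group):' "$root/etc/nsswitch.conf" ||
            echo "/etc/nsswitch.conf: no passwd/group lines"
    else
        echo "/etc/nsswitch.conf: missing"
    fi
}
```

Running `nss_report /` describes the host, while `nss_report /var/lib/mock/test-rpm-1` describes the install root from the reproduction steps.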

Comment 3 Mike Bonnet 2006-05-18 23:19:16 UTC
It's installing into an empty root, so, there's no /etc/passwd, /etc/group, or
/etc/nsswitch.conf, at least until one of the rpms creates them.  On the host
machine, /etc/nsswitch.conf contains:

passwd:     files nis
shadow:     files nis
group:      files nis

Here's what gets done to the install root before running the rpm --root command
(this approximates what mock does before calling yum --installroot...I was
originally seeing these segfaults under mock/yum):

mkdir -p $1

mkdir -p $1/var/lib/rpm
mkdir -p $1/var/lock/rpm
mkdir -p $1/var/log
mkdir -p $1/dev
mkdir -p $1/dev/pts
mkdir -p $1/etc/rpm
mkdir -p $1/tmp
mkdir -p $1/var/tmp
mkdir -p $1/proc

mount -t proc proc $1/proc
mount -t devpts devpts $1/dev/pts

mknod $1/dev/null -m 666 c 1 3
mknod $1/dev/urandom -m 644 c 1 9
mknod $1/dev/random -m 644 c 1 9
mknod $1/dev/full -m 666 c 1 7
mknod $1/dev/ptmx -m 666 c 5 2
mknod $1/dev/tty -m 666 c 5 0
mknod $1/dev/zero -m 666 c 1 5

rpm --root $1 -ivh \

Comment 4 Paul Nasrat 2006-05-22 21:53:20 UTC
#1  0x0fa12cd8 in _nss_files_getpwent_r (result=Variable "result" is not available.
) at nss_files/files-XXX.c:281
        status = -134272824
#2  0x00000001 in ?? ()
No symbol table info available.
#3  0x0facc868 in __getpwnam_r (name=0x1036141c "rpm", resbuf=0xfb82fbc,
    buffer=0x105d1d58 "root", buflen=1024, result=0xffffb3f8)
    at ../nss/getXXbyYY_r.c:207
        startp = (service_user *) 0x10950358
        start_fct = 0xfa12c60 <_nss_files_getpwent_r+324>
        nip = (service_user *) 0x10950358
        fct = {l = 0xfa12c60 <_nss_files_getpwent_r+324>, ptr = 0xfa12c60}
        no_more = Variable "no_more" is not available.


Comment 5 Paul Nasrat 2006-05-24 20:34:48 UTC
Reassign.

Comment 6 Paul Nasrat 2006-05-24 20:37:53 UTC
Created attachment 129968
Non rpm reproducing test case

Test was done on a ppc RHEL 4 U3 machine with a rawhide root.

Host: glibc-2.3.4-2.19

Comment 7 Jakub Jelinek 2006-05-25 09:37:16 UTC
This is an rpm bug: the interface between libc.so and its NSS modules isn't
stable, and using NSS modules from a different glibc version is a bug.
Even if that happens to work in some cases, it does quite weird things, e.g.
/etc/nsswitch.conf will usually be taken from outside of the chroot, while
/etc/passwd etc. are taken from inside of the chroot.
rpm either needs to do the resolving outside of the chroot, or do the resolving
fully in the chroot (by running a helper program there or, if a lookup in the
/etc/{passwd,group} files is sufficient, by parsing the files by hand; the
former can be done e.g. by collecting all needed lookups, then spawning
/usr/bin/getent in the chroot and parsing its output).
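
The "parsing the files by hand" option could look roughly like the sketch below. It is an illustration only, not rpm's implementation: the function name `chroot_lookup_id` and its arguments are hypothetical, and it deliberately ignores NIS or any other non-file source.

```shell
#!/bin/sh
# Hypothetical helper (not rpm code): resolve NAME to a numeric id using only
# the flat files under ROOT, so no NSS module from either glibc is loaded.
#   chroot_lookup_id ROOT passwd NAME   prints the uid (field 3 of passwd)
#   chroot_lookup_id ROOT group  NAME   prints the gid (field 3 of group)
# Returns non-zero if the file is unreadable or NAME is not listed.
chroot_lookup_id() {
    file="${1%/}/etc/$2"
    [ -r "$file" ] || return 1
    awk -F: -v name="$3" '
        $1 == name { print $3; found = 1; exit }
        END { exit !found }
    ' "$file"
}
```

For example, `chroot_lookup_id /var/lib/mock/test-rpm-1 passwd rpm` would fail until some package has installed /etc/passwd in that root. The getent variant would instead run something like `chroot ROOT /usr/bin/getent passwd NAME`, which assumes root privileges and a getent binary already present inside the root.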

Comment 8 Jeff Johnson 2006-05-29 13:03:42 UTC
Having the setup package install /etc/passwd, with rpm.rpm as a system-defined user.group, is the way
this problem has been avoided for years and years.

Was setup one of the packages in your transaction?

Is rpm.rpm still defined in /etc/passwd and /etc/group?

If the answer to either of those questions is "NO", then a different, and far more complicated,
implementation will have to be attempted.


Comment 9 Jeff Johnson 2006-05-29 15:27:00 UTC
And the issue with "Non rpm reproducing test case" is the missing dependencies that force /etc/passwd
to be installed first. No implementation in rpm can parse data that isn't there, nor is there any reason to 
assume that the outer chroot has any relevance to what user/group mappings exist within the chroot.

So NOTABUG in rpm imho until the dependencies in glibc (or glibc-common) are restored to have
setup/filesystem/basesystem (or whatever package you choose to contain /etc/passwd and /etc/group)
installed as prerequisites.

Comment 10 Paul Nasrat 2006-06-30 13:42:06 UTC
Closing as worked around in this case.

