Bug 191468

Summary: nscd silently dies after execution of usermod
Product: Red Hat Enterprise Linux 4 Reporter: Peter Bieringer <pb>
Component: glibcAssignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 4.0CC: rkhadgar
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-08-14 15:40:49 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Peter Bieringer 2006-05-12 09:42:49 UTC
Description of problem:
Every time I update amavisd (taken from DAG repository) using "yum update", nscd
dies silently

Version-Release number of selected component (if applicable):
nscd-2.3.4-2.19

How reproducible:
Every update of amavisd, but not at the same day, looks like nscd has to run
some days.
Reproducable on 2 RHEL4 systems

Steps to Reproduce:
1. install amavisd, configure postfix, let mailsystem run
2. wait until new amavisd RPM update
3. update amavisd
  
Actual results:
nscd silently dies

Expected results:
nscd will not die

Additional info:
Issue is already known by me since 3 updates or so, unfortunatly, I forgot to
start strace today on both systems :-( Hopefully I will remember next time.

Comment 1 Peter Bieringer 2006-05-29 15:07:07 UTC
I have some news: today, I've updated to a new amaivsd-new package and on both
most identical RHEL4 systems, nscd crashes. Fortunately I straced one of them,
here is the output:

epoll_ctl(13, EPOLL_CTL_ADD, 14, {EPOLLRDNORM, {u32=14, u64=14}}) = 0
epoll_wait(13, {{EPOLLRDNORM, {u32=14, u64=14}}}, 100, 29976) = 1
time(NULL)                              = 1148913476
epoll_ctl(13, EPOLL_CTL_DEL, 14, {0, {u32=0, u64=0}}) = 0
futex(0xdd72e4, FUTEX_WAKE, 1)          = 1
futex(0xdd72e0, FUTEX_WAKE, 1)          = 1
epoll_wait(13, {{EPOLLRDNORM, {u32=12, u64=12}}}, 100, 29988) = 1
time(NULL)                              = 1148913476
accept(12, 0, NULL)                     = 14
epoll_ctl(13, EPOLL_CTL_ADD, 14, {EPOLLRDNORM, {u32=14, u64=14}}) = 0
epoll_wait(13, {{EPOLLRDNORM, {u32=14, u64=14}}}, 100, 29976) = 1
time(NULL)                              = 1148913476
epoll_ctl(13, EPOLL_CTL_DEL, 14, {0, {u32=0, u64=0}}) = 0
futex(0xdd72e4, FUTEX_WAKE, 1)          = 1
futex(0xdd72e0, FUTEX_WAKE, 1)          = 1
epoll_wait(13, bffc13cc, 100, 29988)    = -1 EINTR (Interrupted system call)
--- SIGHUP (Hangup) @ 0 (0) ---
stat64("/etc/passwd", {st_mode=S_IFREG|0644, st_size=2230, ...}) = 0
msync(0xb72e1000, 2472, MS_ASYNC)       = 0
stat64("/etc/group", {st_mode=S_IFREG|0644, st_size=765, ...}) = 0
msync(0xb72ac000, 3704, MS_ASYNC)       = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---


rpm scripts:

# rpm -q --scripts amavisd-new
preinstall scriptlet (using /bin/sh):
if ! /usr/bin/id amavis &>/dev/null; then
        /usr/sbin/useradd -r -d "/var/amavis" -s /bin/sh -c "Amavis email scan
user" -M amavis || \
                logger -t amavisd-new/rpm "Unexpected error adding user \"amavis\"."
fi

if ! /usr/bin/id -n -G amavis | grep -q "\<clamav\>"; then
        /usr/sbin/usermod -G $(id -Gn clamav | tr ' ' ','),amavis clamav || \
                logger -t amavisd-new/rpm "Failed to add user \"amavis\" to
group \"clamav\"."
fi
postinstall scriptlet (using /bin/sh):
/sbin/chkconfig --add amavisd

for file in /etc/postfix/aliases /etc/mail/aliases /etc/aliases; do
        if [ -r "$file" ]; then
                if ! grep -q "^virusalert:" "$file"; then
                        echo -e "virusalert:\troot" >> "$file"
                        /usr/bin/newaliases &>/dev/null || \
                                logger -t amavisd-new/rpm "Cannot exec
newaliases. Please run it manually."
                fi
        fi
done
preuninstall scriptlet (using /bin/sh):
if [ $1 -eq 0 ] ; then
    /sbin/service amavisd stop &>/dev/null || :
    /sbin/chkconfig --del amavisd
fi
postuninstall scriptlet (using /bin/sh):
if [ $1 -eq 0 ]; then
        /usr/sbin/userdel amavis || logger -t amavisd-new/rpm "User \"amavis\"
could not be deleted."
        /usr/sbin/groupdel amavis || logger -t amavisd-new/rpm "Group \"amavis\"
could not be deleted."
else
        /sbin/service amavisd condrestart &>/dev/null || :
fi


Digging further into scripts, I found the reason:

/usr/sbin/usermod -G $(id -Gn clamav | tr ' ' ','),amavis clamav

will cause the nscd crash.

I don't understand the reason of the code (looks strange to me), but anyway,
nscd should not crash.

# id -Gn clamav | tr ' ' ','
clamav,amavis

# strace -f nscd
...
[pid 14610] getsockopt(14, SOL_SOCKET, SO_PEERSEC,
"user_u:system_r:unconfined_t\0", [29]) = 0
[pid 14610] open("/proc/self/attr/current", O_RDONLY) = 13
[pid 14610] read(13, "user_u:system_r:unconfined_t\0", 4095) = 29
[pid 14610] close(13)                   = 0
[pid 14610] send(14, "\2\0\0\0\1\0\0\0\2\0\0\0e\0\0\0f\0\0\0", 20, MSG_NOSIGNAL)
= 20
[pid 14606] <... epoll_wait resumed> {{EPOLLRDNORM, {u32=11, u64=11}}}, 100,
29988) = 1
[pid 14610] close(14 <unfinished ...>
[pid 14606] time( <unfinished ...>
[pid 14610] <... close resumed> )       = 0
[pid 14606] <... time resumed> NULL)    = 1148914960
[pid 14610] futex(0x4c62e4, FUTEX_WAIT, 22, NULL <unfinished ...>
[pid 14606] accept(11, 0, NULL)         = 13
[pid 14606] epoll_ctl(12, EPOLL_CTL_ADD, 13, {EPOLLRDNORM, {u32=13, u64=13}}) = 0
[pid 14606] epoll_wait(12, {{EPOLLRDNORM, {u32=13, u64=13}}}, 100, 29976) = 1
[pid 14606] time(NULL)                  = 1148914960
[pid 14606] epoll_ctl(12, EPOLL_CTL_DEL, 13, {0, {u32=0, u64=0}}) = 0
[pid 14606] futex(0x4c62e4, FUTEX_WAKE, 1 <unfinished ...>
[pid 14611] <... futex resumed> )       = 0
[pid 14606] <... futex resumed> )       = 1
[pid 14611] futex(0x4c62e0, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid 14606] futex(0x4c62e0, FUTEX_WAKE, 1 <unfinished ...>
[pid 14611] <... futex resumed> )       = -1 EAGAIN (Resource temporarily
unavailable)
[pid 14606] <... futex resumed> )       = 0
[pid 14611] futex(0x4c62e0, FUTEX_WAKE, 1 <unfinished ...>
[pid 14606] epoll_wait(12,  <unfinished ...>
[pid 14611] <... futex resumed> )       = 0
[pid 14611] futex(0x4c6264, FUTEX_WAKE, 1) = 0
[pid 14611] fcntl64(13, F_GETFL)        = 0x2 (flags O_RDWR)
[pid 14611] fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 14611] read(13, "\2\0\0\0\3\0\0\0\4\0\0\0", 12) = 12
[pid 14611] read(13, "102\0", 4)        = 4
[pid 14611] getsockopt(13, SOL_SOCKET, SO_PEERSEC,
"user_u:system_r:unconfined_t\0", [29]) = 0
[pid 14611] open("/proc/self/attr/current", O_RDONLY) = 14
[pid 14611] read(14, "user_u:system_r:unconfined_t\0", 4095) = 29
[pid 14611] close(14)                   = 0
[pid 14611] open("/etc/group", O_RDONLY) = 14
[pid 14611] fcntl64(14, F_GETFD)        = 0
[pid 14611] fcntl64(14, F_SETFD, FD_CLOEXEC) = 0
[pid 14611] fstat64(14, {st_mode=S_IFREG|0644, st_size=729, ...}) = 0
[pid 14611] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb6cab000
[pid 14611] read(14, "root:x:0:root\nbin:x:1:root,bin,d"..., 4096) = 729
[pid 14611] close(14)                   = 0
[pid 14611] munmap(0xb6cab000, 4096)    = 0
[pid 14611] time(NULL)                  = 1148914960
[pid 14611] send(13,
"\2\0\0\0\1\0\0\0\7\0\0\0\2\0\0\0f\0\0\0\1\0\0\0\7\0\0\0"..., 68, MSG_NOSIGNAL) = 68
[pid 14611] msync(0xb72b2000, 1260, MS_ASYNC) = 0
[pid 14611] msync(0xb72b2000, 836, MS_ASYNC) = 0
[pid 14611] msync(0xb72b2000, 584, MS_ASYNC) = 0
[pid 14611] close(13)                   = 0
[pid 14611] futex(0x4c62e4, FUTEX_WAIT, 24, NULL <unfinished ...>
[pid 14606] <... epoll_wait resumed> {{EPOLLRDNORM, {u32=11, u64=11}}}, 100,
29988) = 1
[pid 14606] time(NULL)                  = 1148914960
[pid 14606] accept(11, 0, NULL)         = 13
[pid 14606] epoll_ctl(12, EPOLL_CTL_ADD, 13, {EPOLLRDNORM, {u32=13, u64=13}}) = 0
[pid 14606] epoll_wait(12, {{EPOLLRDNORM, {u32=13, u64=13}}}, 100, 29976) = 1
[pid 14606] time(NULL)                  = 1148914960
[pid 14606] epoll_ctl(12, EPOLL_CTL_DEL, 13, {0, {u32=0, u64=0}}) = 0
[pid 14606] futex(0x4c62e4, FUTEX_WAKE, 1 <unfinished ...>
[pid 14612] <... futex resumed> )       = 0
[pid 14606] <... futex resumed> )       = 1
[pid 14612] futex(0x4c62e0, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid 14606] futex(0x4c62e0, FUTEX_WAKE, 1 <unfinished ...>
[pid 14612] <... futex resumed> )       = -1 EAGAIN (Resource temporarily
unavailable)
[pid 14606] <... futex resumed> )       = 0
[pid 14612] futex(0x4c62e0, FUTEX_WAKE, 1 <unfinished ...>
[pid 14606] epoll_wait(12,  <unfinished ...>
[pid 14612] <... futex resumed> )       = 0
[pid 14612] futex(0x4c6264, FUTEX_WAKE, 1) = 0
[pid 14612] fcntl64(13, F_GETFL)        = 0x2 (flags O_RDWR)
[pid 14612] fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 14612] read(13, "\2\0\0\0\v\0\0\0\7\0\0\0", 12) = 12
[pid 14612] read(13, "passwd\0", 7)     = 7
[pid 14612] getsockopt(13, SOL_SOCKET, SO_PEERSEC,
"user_u:system_r:unconfined_t\0", [29]) = 0
[pid 14612] open("/proc/self/attr/current", O_RDONLY) = 14
[pid 14612] read(14, "user_u:system_r:unconfined_t\0", 4095) = 29
[pid 14612] close(14)                   = 0
[pid 14612] sendmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"passwd\0", 7}],
msg_controllen=16, {cmsg_len=16, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS,
{8}}, msg_flags=0}, MSG_NOSIGNAL) = 7
[pid 14612] close(13)                   = 0
[pid 14612] futex(0x4c62e4, FUTEX_WAIT, 26, NULL <unfinished ...>
[pid 14606] <... epoll_wait resumed> {{EPOLLRDNORM, {u32=11, u64=11}}}, 100,
29988) = 1
[pid 14606] time(NULL)                  = 1148914960
[pid 14606] accept(11, 0, NULL)         = 13
[pid 14606] epoll_ctl(12, EPOLL_CTL_ADD, 13, {EPOLLRDNORM, {u32=13, u64=13}}) = 0
[pid 14606] epoll_wait(12, {{EPOLLRDNORM, {u32=13, u64=13}}}, 100, 29976) = 1
[pid 14606] time(NULL)                  = 1148914960
[pid 14606] epoll_ctl(12, EPOLL_CTL_DEL, 13, {0, {u32=0, u64=0}}) = 0
[pid 14606] futex(0x4c62e4, FUTEX_WAKE, 1 <unfinished ...>
[pid 14609] <... futex resumed> )       = 0
[pid 14606] <... futex resumed> )       = 1
[pid 14609] futex(0x4c62e0, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid 14606] futex(0x4c62e0, FUTEX_WAKE, 1 <unfinished ...>
[pid 14609] <... futex resumed> )       = -1 EAGAIN (Resource temporarily
unavailable)
[pid 14606] <... futex resumed> )       = 0
[pid 14609] futex(0x4c62e0, FUTEX_WAKE, 1 <unfinished ...>
[pid 14606] epoll_wait(12,  <unfinished ...>
[pid 14609] <... futex resumed> )       = 0
[pid 14609] futex(0x4c6264, FUTEX_WAKE, 1) = 0
[pid 14609] fcntl64(13, F_GETFL)        = 0x2 (flags O_RDWR)
[pid 14609] fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 14609] read(13, "\2\0\0\0\f\0\0\0\6\0\0\0", 12) = 12
[pid 14609] read(13, "group\0", 6)      = 6
[pid 14609] getsockopt(13, SOL_SOCKET, SO_PEERSEC,
"user_u:system_r:unconfined_t\0", [29]) = 0
[pid 14609] open("/proc/self/attr/current", O_RDONLY) = 14
[pid 14609] read(14, "user_u:system_r:unconfined_t\0", 4095) = 29
[pid 14609] close(14)                   = 0
[pid 14609] sendmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"group\0", 6}],
msg_controllen=16, {cmsg_len=16, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS,
{10}}, msg_flags=0}, MSG_NOSIGNAL) = 6
[pid 14609] close(13)                   = 0
[pid 14609] clock_gettime(CLOCK_MONOTONIC, {2331770, 477292944}) = 0
[pid 14609] futex(0x4c62e4, FUTEX_WAIT, 28, {11, 410645000} <unfinished ...>
[pid 14606] <... epoll_wait resumed> bfecb90c, 100, 29988) = -1 EINTR
(Interrupted system call)
[pid 14606] --- SIGHUP (Hangup) @ 0 (0) ---
[pid 14606] stat64("/etc/passwd", {st_mode=S_IFREG|0644, st_size=2126, ...}) = 0
[pid 14606] msync(0xb72e7000, 1152, MS_ASYNC) = 0
[pid 14606] stat64("/etc/group", {st_mode=S_IFREG|0644, st_size=729, ...}) = 0
[pid 14606] msync(0xb72b2000, 1328, MS_ASYNC) = 0
[pid 14606] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 14606 detached
[pid 14608] <... futex resumed> )       = -1 EINTR (Interrupted system call)
[pid 14613] <... futex resumed> )       = -1 EINTR (Interrupted system call)
[pid 14610] <... futex resumed> )       = -1 EINTR (Interrupted system call)
[pid 14611] <... futex resumed> )       = -1 EINTR (Interrupted system call)
[pid 14612] <... futex resumed> )       = -1 EINTR (Interrupted system call)
[pid 14609] <... futex resumed> )       = -1 EINTR (Interrupted system call)
[pid 14608] +++ killed by SIGSEGV +++
Process 14608 detached
[pid 14613] +++ killed by SIGSEGV +++
Process 14613 detached
[pid 14610] +++ killed by SIGSEGV +++
Process 14610 detached
[pid 14611] +++ killed by SIGSEGV +++
Process 14611 detached
[pid 14612] +++ killed by SIGSEGV +++
Process 14612 detached
[pid 14609] +++ killed by SIGSEGV +++
Process 14609 detached
[pid 14607] <... recvfrom resumed> 0xb7d1c020, 1024, 0, 0xb7d1c420, 0xb7d1c01c)
= ? ERESTARTSYS (To be restarted)
[pid 14607] +++ killed by SIGSEGV +++
Process 14607 detached
Process 14606 detached


Unfortunately, "nscd -d" doesn't crash.

Following causes the crash:
# id clamav
uid=100(clamav) gid=101(clamav) groups=101(clamav),102(amavis)

# /usr/sbin/usermod -G clamav,amavis,amavis clamav
 -> crash

# /usr/sbin/usermod -G clamav,amavis clamav
 -> crash

# /usr/sbin/usermod -G clamav,amavis,wheel clamav
 -> crash (but changes were made in /etc/group and /etc/gshadow)

I can't produce a core file.

And the strace output between the non-crashing "nscd -d" and the crashing "nscd"
is too different, didn't find the piece of strace output in "-d" short before
the crash.

Any hints how to further debug this issue?

Comment 2 Peter Bieringer 2006-06-29 09:20:11 UTC
Same happen after upgrade to amavisd-new-2.4.2 today.

Is there no one able to help me?

Comment 7 Bastien Nocera 2006-08-04 11:50:43 UTC
This looks like this FC5 bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=191464

Which will be fixed in RHEL4.4:
"
* Mon Jul 03 2006 Jakub Jelinek <jakub> 2.3.4-2.24

<snip>
- fix nscd -i {passwd,group,hosts}, protect db prunning with a lock
  (#191464)
"