Bug 238682

Summary: nss_ldap lookup hang in _nss_ldap_readconfigfromdns
Product: Red Hat Enterprise Linux 4 Reporter: Georg Moritz <georg.moritz>
Component: nss_ldap Assignee: Nalin Dahyabhai <nalin>
Status: CLOSED WONTFIX QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: 4.4 CC: jorton, jplans
Target Milestone: ---   
Target Release: ---   
Hardware: i586   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-06-20 16:09:44 UTC
Attachments:
  Description                                                  Flags
  typescript of 'ps fauxw' and killing child processes         none
  coredump of a httpd child (UID 0) produced with kill -ILL    none

Description Georg Moritz 2007-05-02 09:18:51 UTC
Description of problem:

Forked workers of httpd sometimes fail to drop privileges and setuid() to the
UID of the apache user. Such child processes - fortunately - are not responsive.

If enough processes with UID 0 accumulate (as per StartServers etc.), the
webserver stops responding, which is effectively an internal DoS.

After killing the child processes with UID 0, httpd starts forking processes
again, of which some change UID, and some don't (see attached file).

Version-Release number of selected component (if applicable):

httpd-2.0.52-28.ent

How reproducible:

??? random occurrence

Additional info:

citylap0002 [root] 11:11 AM /root # uname -a
Linux citylap0002.city.net.ffm 2.6.9-34.0.2.ELsmp #1 SMP Fri Jun 30 10:33:58 EDT 2006 i686 i686 i386 GNU/Linux
citylap0002 [root] 11:11 AM /root # cat /etc/redhat-release 
Red Hat Enterprise Linux ES release 4 (Nahant Update 4)

Comment 1 Georg Moritz 2007-05-02 09:18:51 UTC
Created attachment 153925 [details]
typescript of 'ps fauxw' and killing child processes

Comment 2 Joe Orton 2007-05-02 09:43:05 UTC
Thanks for the report.  Can you attach a sysreport for this server?

Can you determine what the hung children are doing:

1) "echo CoreDumpDirectory /tmp > /etc/httpd/conf.d/coredump.conf" (or similar)
2) "kill -SEGV <pid>" a hung root process
3) bzip and attach resultant coredump from /tmp/core.<pid> 


Comment 3 Georg Moritz 2007-05-02 10:16:34 UTC
> Can you determine what the hung children are doing:

I did configure the CoreDumpDirectory, then:

citylap0002 [root] 11:59 PM /root # service httpd graceful 
citylap0002 [root] 12:00 PM /root # ps fauxw | grep httpd
root     20703  0.0  0.0  5632  560 pts/4    S+   12:01   0:00      \_ grep httpd
root      5181  0.0  0.1 12132 4972 ?        Ss   Apr27   0:00 /usr/sbin/httpd
apache   20347  0.0  0.1 12264 5124 ?        S    11:59   0:00  \_ /usr/sbin/httpd
apache   20348  0.0  0.1 12264 5124 ?        S    11:59   0:00  \_ /usr/sbin/httpd
apache   20349  0.0  0.1 12264 5120 ?        S    11:59   0:00  \_ /usr/sbin/httpd
apache   20350  0.0  0.1 12264 5120 ?        S    11:59   0:00  \_ /usr/sbin/httpd
apache   20351  0.0  0.1 12264 5120 ?        S    11:59   0:00  \_ /usr/sbin/httpd
root     20352  0.0  0.1 12132 4988 ?        S    11:59   0:00  \_ /usr/sbin/httpd
apache   20353  0.0  0.1 12132 5004 ?        S    11:59   0:00  \_ /usr/sbin/httpd
root     20356  0.0  0.1 12132 4988 ?        S    11:59   0:00  \_ /usr/sbin/httpd
citylap0002 [root] 12:02 PM /root # strace -p 20352
Process 20352 attached - interrupt to quit
select(1024, [6], [], NULL, NULL <unfinished ...>
Process 20352 detached
citylap0002 [root] 12:02 PM /root # strace -p 20353
Process 20353 attached - interrupt to quit
semop(11304968, 0x8f8740, 1 <unfinished ...>
Process 20353 detached
citylap0002 [root] 12:03 PM /root # strace -p 20351
Process 20351 attached - interrupt to quit
semop(11304968, 0x8f8740, 1 <unfinished ...>
Process 20351 detached
citylap0002 [root] 12:03 PM /root # strace -p 20356
Process 20356 attached - interrupt to quit
read(6,  <unfinished ...>
Process 20356 detached
citylap0002 [root] 12:09 PM /root # ls -l /proc/2035{1,2,6}/fd/6
lrwx------  1 apache apache 64 May  2 12:09 /proc/20351/fd/6 -> socket:[24194848]
lrwx------  1 root   root   64 May  2 12:01 /proc/20352/fd/6 -> socket:[24194848]
lrwx------  1 root   root   64 May  2 12:08 /proc/20356/fd/6 -> socket:[24194848]

If I get apache to dump core, I'll attach that.
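To pick out the workers that are still running as root without reading the ps listing by eye, something like the following could be used. This is only a sketch: the function name root_workers is made up for illustration, and it assumes the main httpd PID is passed in by the caller (on this box that would be 5181).

```shell
# Print the PIDs of httpd workers that are still running as root.
# Reads `ps -eo pid,ppid,user,comm` style output on stdin.
# $1 is the PID of the main httpd process (assumption: supplied by the caller).
root_workers() {
    parent_pid="$1"
    awk -v parent="$parent_pid" \
        '$2 == parent && $3 == "root" && $4 == "httpd" { print $1 }'
}

# Hypothetical usage:
#   ps -eo pid,ppid,user,comm | root_workers 5181
```

The main httpd process itself is skipped because its parent is not the PID given.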


Comment 4 Joe Orton 2007-05-02 10:22:43 UTC
Can you also attach the sysreport so the httpd configuration is apparent.

Simply attaching gdb to the process and getting a backtrace will also help.
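For reference, a backtrace can also be collected non-interactively. The helper below is a sketch, not an official procedure: backtrace_pid is a hypothetical name, and it assumes gdb and mktemp are installed and that the binary path and PID are supplied by the caller.

```shell
# Attach gdb to a running process, print a backtrace, and detach again.
# $1 = path to the executable, $2 = PID of the process to inspect.
backtrace_pid() {
    exe="$1"; pid="$2"
    cmds=$(mktemp) || return 1
    # gdb commands to run in batch mode
    printf 'bt\ndetach\nquit\n' > "$cmds"
    gdb -batch -x "$cmds" "$exe" "$pid"
    status=$?
    rm -f "$cmds"
    return $status
}

# Hypothetical usage:
#   backtrace_pid /usr/sbin/httpd <pid> > /tmp/httpd-bt.txt
```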

It is likely this is some third-party module; httpd will call poll() rather than
select() in almost all cases.  It looks like you have a FastCGI running; can you
reproduce the issue without the FastCGI module loaded?

Comment 5 Georg Moritz 2007-05-02 10:37:35 UTC
I don't seem to get httpd to coredump - not with SEGV,ABRT,BUS etc. Got a hint?

A backtrace from a running child with UID 0:

gdb /usr/sbin/httpd 21653
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /usr/sbin/httpd, process 21653
(no debugging symbols found)
Loaded symbols for /usr/sbin/httpd
Reading symbols from /lib/libpcre.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpcre.so.0
Reading symbols from /usr/lib/libpcreposix.so.0...(no debugging symbols found)...done.
[...]
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
0x00c577a2 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
(gdb) bt
#0  0x00c577a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00b0a473 in __read_nocancel () from /lib/tls/libpthread.so.0
#2  0x0103475d in _nss_ldap_readconfigfromdns () from /lib/libnss_ldap.so.2
#3  0x010350c0 in _nss_ldap_readconfigfromdns () from /lib/libnss_ldap.so.2
#4  0x01034251 in _nss_ldap_readconfigfromdns () from /lib/libnss_ldap.so.2
#5  0x01031f66 in _nss_ldap_readconfigfromdns () from /lib/libnss_ldap.so.2
#6  0x01010462 in _nss_ldap_readconfigfromdns () from /lib/libnss_ldap.so.2
#7  0x0101192f in _nss_ldap_readconfigfromdns () from /lib/libnss_ldap.so.2
#8  0x010064b7 in _nss_ldap_init () from /lib/libnss_ldap.so.2
#9  0x0100716c in _nss_ldap_getent_ex () from /lib/libnss_ldap.so.2
#10 0x01009690 in _nss_ldap_initgroups_dyn () from /lib/libnss_ldap.so.2
#11 0x0043ac1c in internal_getgrouplist () from /lib/tls/libc.so.6
#12 0x0043aece in initgroups () from /lib/tls/libc.so.6
#13 0x003a4523 in unixd_setup_child () from /usr/sbin/httpd
#14 0x0038551f in ap_graceful_stop_signalled () from /usr/sbin/httpd
#15 0x00385b5c in ap_graceful_stop_signalled () from /usr/sbin/httpd
#16 0x003867f0 in ap_mpm_run () from /usr/sbin/httpd
#17 0x0038d36a in main () from /usr/sbin/httpd
(gdb) 

sysreport follows. How do I post it (does it contain sensitive information)?

Comment 6 Georg Moritz 2007-05-02 10:39:23 UTC
The FastCGI server is a business critical application. Need a downtime window
to check without it :-(

Comment 7 Georg Moritz 2007-05-02 10:56:52 UTC
Created attachment 153928 [details]
coredump of a httpd child (UID 0) produced with kill -ILL

The backtrace of this one is a bit different:

# gdb /usr/sbin/httpd /tmp/core.21653
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Core was generated by `/usr/sbin/httpd'.
Program terminated with signal 4, Illegal instruction.
(no debugging symbols found)
Loaded symbols for /usr/sbin/httpd
[...]
Loaded symbols for /lib/libnsl.so.1
#0  0x00c577a2 in _dl_sysinfo_int80 ()
   from /lib/ld-linux.so.2
(gdb) bt
#0  0x00c577a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0047119d in poll () from /lib/tls/libc.so.6
#2  0x00e17f24 in apr_poll () from /usr/lib/libapr-0.so.0
#3  0x00385734 in ap_graceful_stop_signalled () from /usr/sbin/httpd
#4  0x00385b5c in ap_graceful_stop_signalled () from /usr/sbin/httpd
#5  0x003867f0 in ap_mpm_run () from /usr/sbin/httpd
#6  0x0038d36a in main () from /usr/sbin/httpd
(gdb) q

Comment 8 Joe Orton 2007-05-02 11:05:19 UTC
If you check the "private" box when attaching the sysreport, it can only be
viewed within Red Hat.

The first backtrace looks relevant - this is a hang somewhere doing the group
lookup via nss_ldap:

- from the function names, it appears to be reading LDAP configuration from DNS;
could this be a generic problem with the NSS configuration on the box; does
running "id apache" hang similarly?
- does your /etc/nsswitch.conf have "files" before "ldap" in the "group" line? 
- is the "apache" user in any groups which require LDAP lookup?
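The nsswitch.conf ordering question can be checked mechanically. A rough sketch follows; files_before_ldap is a hypothetical helper name, and it assumes the usual "database: source source ..." layout with whitespace after the colon.

```shell
# Succeed if, on the given database line of an nsswitch.conf-style file,
# "files" is listed before "ldap", or "ldap" is not listed at all.
# $1 = path to the file, $2 = database name (e.g. group).
files_before_ldap() {
    conf="$1"; db="$2"
    awk -v db="$db:" '
        $1 == db {
            for (i = 2; i <= NF; i++) {
                if ($i == "files") { exit 0 }   # files comes first
                if ($i == "ldap")  { exit 1 }   # ldap is consulted first
            }
            exit 0                              # ldap not listed
        }
    ' "$conf"
}

# Hypothetical usage:
#   files_before_ldap /etc/nsswitch.conf group && echo "files is consulted first"
```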


Comment 9 Georg Moritz 2007-05-02 13:15:50 UTC
I'll upload the sysreport, then.

As for your questions: no, yes, and no.
Neither UID apache nor GID apache is dependent on LDAP:

citylap0002 [root] 02:45 PM /root # id apache
uid=48(apache) gid=48(apache) groups=48(apache)
citylap0002 [root] 02:46 PM /root # groups apache
apache : apache
citylap0002 [root] 02:46 PM /root # perl -ne 'print unless /^\s*(#|$)/' /etc/nsswitch.conf
passwd:     files ldap 
shadow:     files ldap
group:      files ldap
hosts:      files dns
bootparams: nisplus [NOTFOUND=return] files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files
netgroup:   files
publickey:  nisplus
automount:  files
aliases:    files nisplus

Hm. In /etc/openldap/ldap.conf, a Windows Active Directory server is named in the URI.
Apache loads mod_ldap - does that module perform any LDAP lookup on startup?
But that shouldn't be an issue - LDAP lookups with GSSAPI and Kerberos tickets 
work like a charm; apache has a valid keytab with keys for both the services
HTTP and LDAP.

Comment 10 Georg Moritz 2007-05-02 14:09:37 UTC
Darn.

I just straced -f -ff httpd, and it reads /etc/ldap.conf -
*not* /etc/openldap/ldap.conf.

I found the children writing "dc=example,dc=com" to fd 6, then reading from it...
Adding a correct LDAP base to /etc/ldap.conf seems to solve the problem.

But then, the problem never occurs when starting httpd, only on a 'graceful' or
during normal child replacement (after MaxRequestsPerChild).

A bogus /etc/ldap.conf doesn't affect initial startup, but after a 'graceful',
an invalid LDAP base causes the child processes not to change UID.
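For anyone hitting the same thing, a quick check for the placeholder base seen in the strace output might look like this. It is only a sketch: ldap_base and has_placeholder_base are hypothetical names, and it assumes a "base <dn>" line whose DN contains no spaces.

```shell
# Print the search base from an nss_ldap-style ldap.conf
# (first uncommented "base" line), or nothing if none is set.
ldap_base() {
    awk '$1 == "base" { print $2; exit }' "$1"
}

# Succeed if the configured base is the dc=example,dc=com placeholder.
has_placeholder_base() {
    [ "$(ldap_base "$1")" = "dc=example,dc=com" ]
}

# Hypothetical usage:
#   has_placeholder_base /etc/ldap.conf && echo "base is still the placeholder"
```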


Comment 11 Joe Orton 2007-05-02 14:22:40 UTC
I would think it should time out rather than hang indefinitely, in any case. 
Re-assigning to nss_ldap maintainer to see if further analysis is needed.

Comment 12 Jiri Pallich 2012-06-20 16:09:44 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. 
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.