Description of problem: I noticed this in /var/log/messages: Jun 10 08:00:17 localhost nscd: 6638 Access Vector Cache (AVC) started Jun 10 08:11:57 localhost kernel:<6>nscd[6643]: segfault at e42ed04 ip 0022275f sp af14cdf8 error 4 in libc-2.8.90.so[1ac000+16d000] I restarted nscd and connected gdb to it ("gdb program PROCNUM"). Got this segfault/backtrace: (gdb) run The program being debugged has been started already. Start it from the beginning? (y or n) n Program not restarted. (gdb) cont Continuing. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0xaf14db90 (LWP 6643)] obstack_free (h=<value optimized out>, obj=<value optimized out>) at obstack.c:366 366 plp = lp->prev; (gdb) bt #0 obstack_free (h=<value optimized out>, obj=<value optimized out>) at obstack.c:366 #1 0xb7f80434 in gc (db=0xb7f90200) at mem.c:542 #2 0xb7f7f198 in prune_cache (table=0xb7f90200, now=1213110354, fd=-1) at cache.c:521 #3 0xb7f73f49 in nscd_run_prune (p=0x2) at connections.c:1451 #4 0x0014351f in start_thread (arg=<value optimized out>) at pthread_create.c:297 #5 0x0028fbde in clone () from /lib/libc.so.6 (gdb) Version-Release number of selected component (if applicable): nscd-2.8.90-5.i386 How reproducible: Seems to happen after a while each time I start it. Steps to Reproduce: 1. Start nscd ("service nscd start") 2. connect gdb to it 3. wait while doing normal browsing, etc. Actual results: Expected results: Additional info:
This can only happen if memory is corrupted. I cannot see anything wrong in the use of the obstack. So it has to come from somewhere else. What NSS modules do you use? If you have any which don't come with glibc (like LDAP, samba stuff, mdns, ...) remove them and try again. Also, run nscd in debug mode (-d) under valgrind.
[root@localhost ~]# rpm -qa \*nss\* | grep -v openss nss-mdns-0.10-4.fc9.i386 nss-tools-3.12.0.3-2.fc10.i386 nss-debuginfo-3.12.0.3-2.fc10.i386 nss-mdns-debuginfo-0.10-4.fc9.i386 nss-devel-3.12.0.3-2.fc10.i386 nss_compat_ossl-0.9.2-4.fc9.i386 nss_db-2.2-40.fc9.i386 nss-3.12.0.3-2.fc10.i386 nss_ldap-259-3.fc9.i386 [root@localhost ~]# I removed nss-mdns*, and nss_db. ipa-client requires nss_ldap, and slrn requires nss_compat_ossl (I could also remove these, if you recommend). So, my system now looks like: [root@localhost ~]# rpm -qa \*nss\* | grep -v openss nss-tools-3.12.0.3-2.fc10.i386 nss-debuginfo-3.12.0.3-2.fc10.i386 nss-devel-3.12.0.3-2.fc10.i386 nss_compat_ossl-0.9.2-4.fc9.i386 nss-3.12.0.3-2.fc10.i386 nss_ldap-259-3.fc9.i386 [root@localhost ~]# I'll retest with this setup. Let me know if I should remove nss_ldap and nss_compat_ossl as well.
Trying valgrind, I get: [root@localhost ~]# valgrind nscd -d valgrind: mmap(0x0, 135168) failed in UME with error 13 (Permission denied). [root@localhost ~]# when running in SELinux enforcing mode. I'm running in permissive mode......
OK, running as described in #3, I get the follow from valgrind: 12876: provide access to FD 6, for passwd 12876: remove GETAI entry "www.google.com" 12876: remove GETAI entry "news.google.com" 12876: remove GETAI entry "cm.my.yahoo.com" 12876: remove GETAI entry "www.mail.com" ==12876== Thread 5: ==12876== Conditional jump or move depends on uninitialised value(s) ==12876== at 0xACC730: obstack_free (obstack.c:364) ==12876== by 0x12433: gc (mem.c:542) ==12876== by 0x11197: prune_cache (cache.c:521) ==12876== by 0x5F48: nscd_run_prune (connections.c:1451) ==12876== by 0xC0351E: start_thread (pthread_create.c:297) ==12876== by 0xB39BDD: clone (in /lib/libc-2.8.90.so) ==12876== ==12876== Conditional jump or move depends on uninitialised value(s) ==12876== at 0xACC752: obstack_free (obstack.c:364) ==12876== by 0x12433: gc (mem.c:542) ==12876== by 0x11197: prune_cache (cache.c:521) ==12876== by 0x5F48: nscd_run_prune (connections.c:1451) ==12876== by 0xC0351E: start_thread (pthread_create.c:297) ==12876== by 0xB39BDD: clone (in /lib/libc-2.8.90.so) ==12876== ==12876== Use of uninitialised value of size 4 ==12876== at 0xACC75F: obstack_free (obstack.c:366) ==12876== by 0x12433: gc (mem.c:542) ==12876== by 0x11197: prune_cache (cache.c:521) ==12876== by 0x5F48: nscd_run_prune (connections.c:1451) ==12876== by 0xC0351E: start_thread (pthread_create.c:297) ==12876== by 0xB39BDD: clone (in /lib/libc-2.8.90.so) ==12876== ==12876== Invalid read of size 4 ==12876== at 0xACC75F: obstack_free (obstack.c:366) ==12876== by 0x12433: gc (mem.c:542) ==12876== by 0x11197: prune_cache (cache.c:521) ==12876== by 0x5F48: nscd_run_prune (connections.c:1451) ==12876== by 0xC0351E: start_thread (pthread_create.c:297) ==12876== by 0xB39BDD: clone (in /lib/libc-2.8.90.so) ==12876== Address 0x3e7ff004 is not stack'd, malloc'd or (recently) free'd ==12876== ==12876== Process terminating with default action of signal 11 (SIGSEGV) ==12876== Access not within mapped region at address 0x3E7FF004 ==12876== at 0xACC75F: obstack_free (obstack.c:366) ==12876== by 0x12433: gc (mem.c:542) ==12876== by 0x11197: prune_cache (cache.c:521) ==12876== by 0x5F48: nscd_run_prune (connections.c:1451) ==12876== by 0xC0351E: start_thread (pthread_create.c:297) ==12876== by 0xB39BDD: clone (in /lib/libc-2.8.90.so) ==12876== ==12876== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 26 from 1) ==12876== malloc/free: in use at exit: 33,794 bytes in 487 blocks. ==12876== malloc/free: 2,450 allocs, 1,963 frees, 550,531 bytes allocated. ==12876== For counts of detected errors, rerun with: -v ==12876== searching for pointers to 487 not-freed blocks. ==12876== checked 19,848,208 bytes. ==12876== ==12876== LEAK SUMMARY: ==12876== definitely lost: 0 bytes in 0 blocks. ==12876== possibly lost: 1,368 bytes in 9 blocks. ==12876== still reachable: 32,426 bytes in 478 blocks. ==12876== suppressed: 0 bytes in 0 blocks. ==12876== Rerun with --leak-check=full to see details of leaked memory. Killed [root@localhost ~]# I attach the complete output of "valgrind nscd -d" below. Is this useful?
Created attachment 308849 [details] Complete output of "valgrind nscd -d" I did the following: 1.) start "valgrind nscd -d" 2.) restart firefox 3.) got above output with firefox bringing up my home pages 4.) firefox reported "couldn't find pages" messages for each of my home pages.
(In reply to comment #2) > So, my system now looks like: > > [root@localhost ~]# rpm -qa \*nss\* | grep -v openss > nss-tools-3.12.0.3-2.fc10.i386 > nss-debuginfo-3.12.0.3-2.fc10.i386 > nss-devel-3.12.0.3-2.fc10.i386 > nss_compat_ossl-0.9.2-4.fc9.i386 > nss-3.12.0.3-2.fc10.i386 > nss_ldap-259-3.fc9.i386 You don't have to remove the NSS packages, just remove the appropriate entries in /etc/nsswitch.conf. In the passwd, group, hosts, and services lines there should only be glibc-provided entries.
I believe /etc/nsswitch.conf is "stock" on my system: # # /etc/nsswitch.conf # # An example Name Service Switch config file. This file should be # sorted with the most-used services at the beginning. # # The entry '[NOTFOUND=return]' means that the search for an # entry should stop if the search in the previous entry turned # up nothing. Note that if the search failed due to some other reason # (like no NIS server responding) then the search continues with the # next entry. # # Legal entries are: # # nisplus or nis+ Use NIS+ (NIS version 3) # nis or yp Use NIS (NIS version 2), also called YP # dns Use DNS (Domain Name Service) # files Use the local files # db Use the local database (.db) files # compat Use NIS on compat mode # hesiod Use Hesiod for user lookups # [NOTFOUND=return] Stop searching if not found so far # # To use db, put the "db" in front of "files" for entries you want to be # looked up first in the databases # # Example: #passwd: db files nisplus nis #shadow: db files nisplus nis #group: db files nisplus nis passwd: files shadow: files group: files #hosts: db files nisplus nis dns hosts: files dns # Example - obey only what nisplus tells us... #services: nisplus [NOTFOUND=return] files #networks: nisplus [NOTFOUND=return] files #protocols: nisplus [NOTFOUND=return] files #rpc: nisplus [NOTFOUND=return] files #ethers: nisplus [NOTFOUND=return] files #netmasks: nisplus [NOTFOUND=return] files bootparams: nisplus [NOTFOUND=return] files ethers: files netmasks: files networks: files protocols: files rpc: files services: files netgroup: files publickey: nisplus automount: files aliases: files
Created attachment 308930 [details] Another "valgrind nscd -d" run (with segfault) Another segfault running "nscd -d"
Created attachment 308931 [details] nscd.log from crashed/exited run of nscd Not sure this is interesting, but I uncommented the 'logfile' option in /etc/nscd.conf, set nscd to start in runlevel 5 and rebooted. The last lines in the log file are: 2566: handle_request: request received (Version = 2) from PID 2801 2566: INITGROUPS (gdm) 2566: remove GETAI entry "" I do see these type of messages in /var/log/messages: Jun 11 07:11:25 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./2799 (xdm_t). For complete SELinux messages. run sealert -l 8e30d5b3-d350-47ae-9797-abc57816e649 Jun 11 07:11:25 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./2801 (xdm_dbusd_t). For complete SELinux messages. run sealert -l 204af7c5-e663-4b2e-927a-e623abe3b543 Jun 11 07:11:25 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./2801 (xdm_dbusd_t). For complete SELinux messages. run sealert -l 204af7c5-e663-4b2e-927a-e623abe3b543 #============= nscd_t ============== allow nscd_t avahi_t:dir search; allow nscd_t cupsd_t:dir search; allow nscd_t hald_t:dir search; allow nscd_t ntpd_t:dir search; allow nscd_t pam_console_t:dir search; allow nscd_t setroubleshootd_t:dir search; allow nscd_t sshd_t:dir search; allow nscd_t xdm_dbusd_t:dir search; allow nscd_t xdm_t:dir search; Don't remember seeing these before. I'll reboot in permissive mode.
Created attachment 308937 [details] /var/log/nscd.log.... nscd stops on boot up OK. Changed loglevel to 9 and rebooted in permissive mode (no more AVCs). nscd seems to stop just as ntpd starts. Believe this from /var/log/messages applies: Jun 11 07:36:43 localhost gdm-simple-slave[2724]: DEBUG: GdmSlave: Stopping slave Jun 11 07:36:43 localhost gdm-simple-slave[2724]: DEBUG: GdmSignalHandler: Finalizing signal handler Jun 11 07:36:43 localhost gdm-simple-slave[2724]: DEBUG: Slave finished Jun 11 07:36:44 localhost avahi-daemon[2607]: Got SIGTERM, quitting. Jun 11 07:36:44 localhost avahi-daemon[2607]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.101. Jun 11 07:36:45 localhost ntpd[2582]: ntpd exiting on signal 15 Jun 11 07:36:45 localhost NetworkManager: <WARN> nm_signal_handler(): Caught signal 15, shutting down normally. Jun 11 07:36:45 localhost NetworkManager: <info> (eth0): now unmanaged Jun 11 07:36:45 localhost NetworkManager: <info> (eth0): device state change: 8 -> 1 Jun 11 07:36:45 localhost NetworkManager: <info> (eth0): deactivating device. Jun 11 07:36:45 localhost NetworkManager: <info> eth0: canceled DHCP transaction, dhclient pid 2518 Jun 11 07:36:45 localhost kernel:<6>nscd[3888]: segfault at 10 ip 00000010 sp af04f800 error 4 in ld-2.8.90.so[110000+20000] Jun 11 07:36:45 localhost NetworkManager: <info> (eth0): cleaning up... Jun 11 07:36:45 localhost NetworkManager: <info> (eth0): taking down device. Jun 11 07:36:45 localhost NetworkManager: <info> (wlan0): now unmanaged Here are the last few lines of nscd.log: 2566: considering GETAI entry "localhost", timeout 1213198701 2566: considering GETAI entry "0.fedora.pool.ntp.org", timeout 1213195245 2566: considering GETAI entry "localhost.localdomain", timeout 1213198702 2566: considering GETAI entry "", timeout 1213195124 2566: handle_request: request received (Version = 2) from PID 2665 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2665 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2704 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: pruning services cache; time 1213195112 2566: considering GETSERVBYNAME entry "ntp/udp", timeout 1213223282 2566: considering GETSERVBYNAME entry "ipp/", timeout 1213219037 2566: handle_request: request received (Version = 2) from PID 2728 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2665 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2665 2566: GETFDGR 2566: provide access to FD 9, for group 2566: handle_request: request received (Version = 2) from PID 2736 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2736 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2764 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2772 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2772 2566: GETFDGR 2566: provide access to FD 9, for group 2566: handle_request: request received (Version = 2) from PID 2773 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2773 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: handle_request: request received (Version = 2) from PID 2777 2566: GETFDPW 2566: provide access to FD 7, for passwd 2566: pruning hosts cache; time 1213195126 2566: considering GETAI entry "localhost", timeout 1213198701 2566: considering GETAI entry "0.fedora.pool.ntp.org", timeout 1213195245 2566: considering GETAI entry "localhost.localdomain", timeout 1213198702 2566: considering GETAI entry "", timeout 1213195124 2566: remove GETAI entry ""
This is wrong: > 2566: considering GETAI entry "", timeout 1213195124 Not that I think it should cause a crash but it's wrong. The entire log as far nscd is concerned would be good. There must be a line for the time when the entry was added. And just to be sure, can you show the content of /proc/PID/maps from a time when nscd is up and running (PID is the PID of the nscd process).
Believe I attached complete nscd log files in #10 above. The nscd process that is started via boot is always dead by the time I log in. Here is the context of /proc/XXXX/maps for nscd when I manually start it after the system has been running for a while. [root@localhost ~]# cat /proc/5038/maps 00110000-00130000 r-xp 00000000 fd:00 7209254 /lib/ld-2.8.90.so 00130000-00131000 r--p 00020000 fd:00 7209254 /lib/ld-2.8.90.so 00131000-00132000 rw-p 00021000 fd:00 7209254 /lib/ld-2.8.90.so 00132000-00133000 r-xp 00132000 00:00 0 [vdso] 00133000-0013b000 r-xp 00000000 fd:00 7209459 /lib/librt-2.8.90.so 0013b000-0013c000 r--p 00007000 fd:00 7209459 /lib/librt-2.8.90.so 0013c000-0013d000 rw-p 00008000 fd:00 7209459 /lib/librt-2.8.90.so 0013d000-00153000 r-xp 00000000 fd:00 7209457 /lib/libpthread-2.8.90.so 00153000-00154000 r--p 00015000 fd:00 7209457 /lib/libpthread-2.8.90.so 00154000-00155000 rw-p 00016000 fd:00 7209457 /lib/libpthread-2.8.90.so 00155000-00157000 rw-p 00155000 00:00 0 00157000-0016d000 r-xp 00000000 fd:00 7209677 /lib/libnsl-2.8.90.so 0016d000-0016e000 r--p 00016000 fd:00 7209677 /lib/libnsl-2.8.90.so 0016e000-0016f000 rw-p 00017000 fd:00 7209677 /lib/libnsl-2.8.90.so 0016f000-00171000 rw-p 0016f000 00:00 0 00171000-0018b000 r-xp 00000000 fd:00 7209428 /lib/libselinux.so.1 0018b000-0018c000 r--p 00019000 fd:00 7209428 /lib/libselinux.so.1 0018c000-0018d000 rw-p 0001a000 fd:00 7209428 /lib/libselinux.so.1 0018d000-001a5000 r-xp 00000000 fd:00 7209705 /lib/libaudit.so.0.0.0 001a5000-001a6000 r--p 00017000 fd:00 7209705 /lib/libaudit.so.0.0.0 001a6000-001a7000 rw-p 00018000 fd:00 7209705 /lib/libaudit.so.0.0.0 001a7000-001ab000 r-xp 00000000 fd:00 7209005 /lib/libcap.so.2.10 001ab000-001ac000 rw-p 00003000 fd:00 7209005 /lib/libcap.so.2.10 001ac000-00319000 r-xp 00000000 fd:00 7209423 /lib/libc-2.8.90.so 00319000-0031b000 r--p 0016d000 fd:00 7209423 /lib/libc-2.8.90.so 0031b000-0031c000 rw-p 0016f000 fd:00 7209423 /lib/libc-2.8.90.so 0031c000-0031f000 rw-p 0031c000 00:00 0 0031f000-00322000 r-xp 00000000 fd:00 7209426 /lib/libdl-2.8.90.so 00322000-00323000 r--p 00002000 fd:00 7209426 /lib/libdl-2.8.90.so 00323000-00324000 rw-p 00003000 fd:00 7209426 /lib/libdl-2.8.90.so 00324000-0032f000 r-xp 00000000 fd:00 10420519 /lib/libnss_files-2.8.90.so 0032f000-00330000 r--p 0000a000 fd:00 10420519 /lib/libnss_files-2.8.90.so 00330000-00331000 rw-p 0000b000 fd:00 10420519 /lib/libnss_files-2.8.90.so aeafd000-aeafe000 ---p aeafd000 00:00 0 aeafe000-aebfe000 rw-p aeafe000 00:00 0 aebfe000-aebff000 ---p aebfe000 00:00 0 aebff000-aecff000 rw-p aebff000 00:00 0 aecff000-aed00000 ---p aecff000 00:00 0 aed00000-aee00000 rw-p aed00000 00:00 0 aee00000-aee01000 ---p aee00000 00:00 0 aee01000-aef01000 rw-p aee01000 00:00 0 aef01000-aef02000 ---p aef01000 00:00 0 aef02000-af002000 rw-p aef02000 00:00 0 af002000-af003000 ---p af002000 00:00 0 af003000-af103000 rw-p af003000 00:00 0 af103000-af104000 ---p af103000 00:00 0 af104000-af204000 rw-p af104000 00:00 0 af204000-af205000 ---p af204000 00:00 0 af205000-af305000 rw-p af205000 00:00 0 af305000-b1305000 rw-s 00000000 fd:00 66827 /var/db/nscd/services b1305000-b3305000 rw-s 00000000 fd:00 66825 /var/db/nscd/hosts b3305000-b5305000 rw-s 00000000 fd:00 66823 /var/db/nscd/group b5305000-b7305000 rw-s 00000000 fd:00 66814 /var/db/nscd/passwd b7305000-b7306000 ---p b7305000 00:00 0 b7306000-b7d06000 rw-p b7306000 00:00 0 b7d06000-b7f06000 r--p 00000000 fd:00 5485971 /usr/lib/locale/locale-archive b7f06000-b7f09000 rw-p b7f06000 00:00 0 b7f22000-b7f23000 rw-p b7f22000 00:00 0 b7f23000-b7f44000 r-xp 00000000 fd:00 5489047 /usr/sbin/nscd b7f44000-b7f45000 r--p 00020000 fd:00 5489047 /usr/sbin/nscd b7f45000-b7f46000 rw-p 00021000 fd:00 5489047 /usr/sbin/nscd b8fd9000-b8ffa000 rw-p b8fd9000 00:00 0 [heap] bfa2f000-bfa44000 rw-p bffeb000 00:00 0 [stack] [root@localhost ~]# Is that helpful? Also, (believe nscd.log is for several "runs"): [tbl@localhost ~]$ grep '""' /var/log/nscd.log 2566: Haven't found "" in hosts cache! 2566: remove GETAI entry "" 2556: Haven't found "" in hosts cache! 2556: add new entry "" of type GETAI for hosts to cache (first) 2556: considering GETAI entry "", timeout 1213194512 2556: considering GETAI entry "", timeout 1213194512 2556: considering GETAI entry "", timeout 1213194512 2556: remove GETAI entry "" 2566: Haven't found "" in hosts cache! 2566: add new entry "" of type GETAI for hosts to cache (first) 2566: considering GETAI entry "", timeout 1213195124 2566: considering GETAI entry "", timeout 1213195124 2566: remove GETAI entry "" 2658: Haven't found "" in hosts cache! 2658: add new entry "" of type GETAI for hosts to cache (first) 2658: considering GETAI entry "", timeout 1213199888 2658: considering GETAI entry "", timeout 1213199888 2658: considering GETAI entry "", timeout 1213199888 2658: remove GETAI entry "" [tbl@localhost ~]$
Oh, and please make sure you remove all /var/db/nscd/* files before starting nscd. We check for validity of the files but who knows, maybe something is missing.
Created attachment 308961 [details] Clean nscd.log after cleaning out /var/db/nscd/* and rebooting OK. Removed /var/db/nscd/* (drat, should have renamed instead.....) and rebooted. This time I have what appears to be a different nscd.log file, and nscd was running for a while after gnome login. But, some DNS look ups failed, that is, my firefox home pages came up with the "can't find this domain" page. However, this time I booted in enforcing mode, and I got a bunch of AVCs, e.g.: Jun 11 09:48:41 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3338 (system_dbusd_t). For complete SELinux messages. run sealert -l 3ccfc02e-bf12-4fe8-8ac2-8021b91626bc Jun 11 09:48:41 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3338 (system_dbusd_t). For complete SELinux messages. run sealert -l 3ccfc02e-bf12-4fe8-8ac2-8021b91626bc Jun 11 09:48:41 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3353 (setroubleshootd_t). For complete SELinux messages. run sealert -l 71303d2f-2b4c-495c-a033-6f64d4265fe7 Jun 11 09:48:44 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3375 (hald_t). For complete SELinux messages. run sealert -l c66afb2f-294c-4009-9fa1-133ff8da3e38 Jun 11 09:48:44 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3375 (hald_t). For complete SELinux messages. run sealert -l c66afb2f-294c-4009-9fa1-133ff8da3e38 Jun 11 09:48:49 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3389 (hald_t). For complete SELinux messages. run sealert -l 1cdf56a2-2ce6-45c4-bf22-321596de42d8 Jun 11 09:48:49 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3389 (hald_t). For complete SELinux messages. run sealert -l 1cdf56a2-2ce6-45c4-bf22-321596de42d8 Jun 11 09:48:56 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3406 (hald_t). For complete SELinux messages. run sealert -l eaea69c7-6404-4172-a4a0-9d6f30d21e9d Jun 11 09:48:57 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3406 (hald_t). For complete SELinux messages. run sealert -l eaea69c7-6404-4172-a4a0-9d6f30d21e9d Jun 11 09:49:02 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3463 (hald_t). For complete SELinux messages. run sealert -l 378f6336-cd56-4828-bde1-00da6dcd2ad5 Jun 11 09:49:02 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3463 (hald_t). For complete SELinux messages. run sealert -l 378f6336-cd56-4828-bde1-00da6dcd2ad5 Jun 11 09:49:07 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3467 (hald_t). For complete SELinux messages. run sealert -l 39dfc119-7cfc-40d8-838b-dd52dea06a59 Jun 11 09:49:07 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3467 (hald_t). For complete SELinux messages. run sealert -l 39dfc119-7cfc-40d8-838b-dd52dea06a59 Jun 11 09:49:13 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3472 (hald_t). For complete SELinux messages. run sealert -l bc1facd9-0d9f-41ae-86ae-bb5c8a4402ca Jun 11 09:49:13 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3472 (hald_t). For complete SELinux messages. run sealert -l bc1facd9-0d9f-41ae-86ae-bb5c8a4402ca Jun 11 09:49:19 localhost setroubleshoot: SELinux is preventing nscd (nscd_t) "search" to ./3476 (hald_t). For complete SELinux messages. run sealert -l 288fd2eb-3149-4099-ac21-3266b26f2df4 <<<<<SNIP>>>>> Here is the complete list: [root@localhost ~]# check-avcs #============= nscd_t ============== allow nscd_t avahi_t:dir search; allow nscd_t hald_t:dir search; allow nscd_t ntpd_t:dir search; allow nscd_t setroubleshootd_t:dir search; allow nscd_t sshd_t:dir search; allow nscd_t system_dbusd_t:dir search; [root@localhost ~]# Is that an issue? I'll reboot in permissive.
Similar behavior with no AVCs when I boot in permissive. Something must be scanning /proc looking for a particular process or something....
Created attachment 308967 [details] Valgrind log of nscd getting an Illegal Instruction fault (SIGILL) I rebooted in permissive mode, killed off the flaky nscd that gets started during boot, and ran "valgrind nscd -d". I then started firefox. I have 4 tabbed home pages, one is news.google.com and another is gmail. I got "Address not found" pages for http://news.google.com/ and for https://mail.google.com, followed quickly by nscd faulting with SIGILL. Here is a snippet from the attached complete valgrind log: vex x86->IR: unhandled instruction bytes: 0x6C 0x69 0x62 0x63 ==4200== valgrind: Unrecognised instruction at address 0xB94663. ==4200== Your program just tried to execute an instruction that Valgrind ==4200== did not recognise. There are two possible reasons for this. ==4200== 1. Your program has a bug and erroneously jumped to a non-code ==4200== location. If you are running Memcheck and you just saw a ==4200== warning about a bad jump, it's probably your program's fault. ==4200== 2. The instruction is legitimate but Valgrind doesn't handle it, ==4200== i.e. it's Valgrind's fault. If you think this is the case or ==4200== you are not sure, please let us know and we'll try to fix it. ==4200== Either way, Valgrind will now raise a SIGILL signal which will ==4200== probably kill your program. ==4200== ==4200== Process terminating with default action of signal 4 (SIGILL) ==4200== Illegal opcode at address 0xB94663 ==4200== at 0xB94663: (within /lib/libc-2.8.90.so) ==4200== by 0x12433: gc (mem.c:542) ==4200== by 0x11197: prune_cache (cache.c:521) ==4200== by 0x5F48: nscd_run_prune (connections.c:1451) ==4200== by 0xC0351E: start_thread (pthread_create.c:297) ==4200== by 0xB39BDD: clone (in /lib/libc-2.8.90.so) ==4200== This follows a bunch of "uninitialized memory" messages from valgrind.
The obstack-related problem should be fixed upstream now. We'll have to see whether there are more problems. Jakub will build a new glibc sometime soon.
With nscd-2.8.90-6.i386 (and related glibc packages), I no longer get seg faults, etc, but DNS searches still don't work: [root@localhost ~]# setenforce 0 [root@localhost ~]# rm -rf /var/db/nscd [root@localhost ~]# curl https://mail.google.com <html><head><meta http-equiv="Refresh" content="0;URL=https://mail.google.com/mail/" /></head><body><script type="text/javascript" language="javascript"><!-- location.replace("https://mail.google.com/mail/") --></script></body></html>[root@localhost ~]# [root@localhost ~]# service nscd start Starting nscd: [ OK ] [root@localhost ~]# curl https://mail.google.com curl: (6) Couldn't resolve host 'mail.google.com' [root@localhost ~]# curl https://mail.google.com curl: (6) Couldn't resolve host 'mail.google.com' [root@localhost ~]# service nscd stop Stopping nscd: [ OK ] [root@localhost ~]# curl https://mail.google.com <html><head><meta http-equiv="Refresh" content="0;URL=https://mail.google.com/mail/" /></head><body><script type="text/javascript" language="javascript"><!-- location.replace("https://mail.google.com/mail/") --></script></body></html>[root@localhost ~]# nslookup mail.google.com Server: 10.10.3.10 Address: 10.10.3.10#53 Non-authoritative answer: mail.google.com canonical name = googlemail.l.google.com. Name: googlemail.l.google.com Address: 209.85.201.19 Name: googlemail.l.google.com Address: 209.85.201.83 Name: googlemail.l.google.com Address: 209.85.201.18 [root@localhost ~]# Some other data/info useful?
(In reply to comment #18) > With nscd-2.8.90-6.i386 (and related glibc packages), I no longer get seg > faults, etc, but DNS searches still don't work: That's a transient problem which is why I didn't see it. All the stars have to align for it. I think it's fixed in cvs. Maybe Jakub has time tomorrow to build another glibc.
Try 2.8.90-7.
2.8.90-7 works in permissive mode!!!!!! But not in enforcing mode. Seems to want: #============= hald_t ============== allow hald_t var_t:file read; #============= nscd_t ============== allow nscd_t var_t:dir { write add_name }; allow nscd_t var_t:file { read write create getattr }; [The directory and create privs are for the case where there is no nscd directory in /var/db, and it and the files under it must be created.] Thanks for tracking this down and fixing it!
I've managed to get nscd "up and running" on boot to runlevel 5. Two sets of issues (at least on my system): 1. nscd needs new SELinux policy to get up and running this way. Examples of allow rules include: allow nscd_t inotifyfs_t:dir read; allow nscd_t hald_acl_t:lnk_file read; and allow nscd_t *:lnk_file read; <---- This is just shorthand.... The last rule appears to come from nscd wanting to read /proc/PROCNUM/exe. I'm guessing it does this for every process that wants to connect to it. I can collect and post all the individual rules I've added, but is there a better way to do this? 2. Often, on reboot, I notice (via "ps agx | grep nscd") several nscd processes running, with the early ones eating up cpu cycles, etc. If I kill all the running nscd processes and restart the service, all appears to work. The best I can figure out from the log is that each of the spinning nscd processes is spinning after the following log entry: 2578: considering GETAI entry "0.fedora.pool.ntp.org", timeout 1213798101 2578: considering GETHOSTBYNAME entry "localhost.localdomain", timeout 1213800081 2578: considering GETAI entry "localhost.localdomain", timeout 1213798631 2578: considering GETAI entry "www.mail.com", timeout 1213798107 2578: considering GETAI entry "scs.msg.yahoo.com", timeout 1213797432 2578: considering GETAI entry "", timeout 1213797342 2578: remove GETAI entry "" 2578: freed 88 bytes in hosts cache 3298: Access Vector Cache (AVC) started 3298: handle_request: request received (Version = 2) from PID 3308 3298: INVALIDATE (hosts) 3298: pruning hosts cache; time 2147483647 3298: considering GETAI entry "weather.noaa.gov", timeout 1213797792 3298: considering GETAI entry "address.yahoo.com", timeout 1213797545 3298: considering GETHOSTBYNAMEv6 entry "localhost", timeout 1213800938 3298: considering GETAI entry "localhost", timeout 1213800491 Is this something particular to my system?
Example of multiple nscd procs on boot: [tbl@localhost ~]$ sudo -i [root@localhost ~]# ps agx | grep nscd 2603 ? Ssl 6:24 /usr/sbin/nscd 2661 ? Ssl 0:36 /usr/sbin/nscd 3515 ? Ssl 0:00 /usr/sbin/nscd 3671 pts/0 S+ 0:00 grep nscd [root@localhost ~]# killall nscd [root@localhost ~]# ps agx | grep nscd 3686 pts/0 S+ 0:00 grep nscd [root@localhost ~]#
Here's another reboot: [tbl@localhost ~]$ sudo -i [root@localhost ~]# ps agx | grep nscd 2605 ? Ssl 10:12 /usr/sbin/nscd 2646 ? Rsl 8:21 /usr/sbin/nscd 3334 ? Rsl 1:20 /usr/sbin/nscd 3744 ? Ssl 1:12 /usr/sbin/nscd 3764 ? Ssl 0:00 /usr/sbin/nscd 3979 pts/0 S+ 0:00 grep nscd [root@localhost ~]# killall nscd [root@localhost ~]# service nscd start Starting nscd: [ OK ] [root@localhost ~]# ps agx | grep nscd 4004 ? Ssl 0:00 /usr/sbin/nscd 4016 pts/0 S+ 0:00 grep nscd [root@localhost ~]#
Happened again, this time way after boot: [root@localhost ~]# ps agx | grep nscd 4004 ? Rsl 77:55 /usr/sbin/nscd 27354 ? Ssl 0:01 /usr/sbin/nscd 30793 pts/0 S+ 0:00 grep nscd [root@localhost ~]# Here are lines from nscd.log at the time 27354 was started: 4004: considering GETAI entry "mail.google.com", timeout 1213824121 4004: considering GETHOSTBYNAMEv6 entry "localhost.localdomain", timeout 1213822450 4004: considering GETHOSTBYNAME entry "localhost.localdomain", timeout 1213822402 4004: considering GETAI entry "localhost.localdomain", timeout 1213822328 4004: considering GETAI entry "news.yahoo.com", timeout 1213821466 4004: considering GETAI entry "suggestqueries.google.com", timeout 1213821584 4004: considering GETAI entry "www.halflifesource.com", timeout 1213828473 4004: considering GETAI entry "jakarta.apache.org", timeout 1213822994 4004: considering GETAI entry "www.bloglines.com", timeout 1213821934 4004: considering GETAI entry "share.skype.com", timeout 1213822172 4004: considering GETHOSTBYADDRv6 entry "::1", timeout 1213823224 4004: remove GETAI entry "cm.my.yahoo.com" 4004: freed 128 bytes in hosts cache 4004: handle_request: request received (Version = 2) from PID 27338 4004: GETFDPW 4004: provide access to FD 8, for passwd 4004: handle_request: request received (Version = 2) from PID 27338 4004: GETFDPW 4004: provide access to FD 8, for passwd 4004: pruning hosts cache; time 1213821433 4004: considering GETAI entry "hv3.webstat.com", timeout 1213823673 4004: considering GETAI entry "now.eloqua.com", timeout 1213900282 4004: considering GETAI entry "weather.noaa.gov", timeout 1213821862 4004: considering GETAI entry "weather.noaa.gov", timeout 1213821862 4004: considering GETAI entry "weather.noaa.gov", timeout 1213821862 4004: considering GETAI entry "weather.noaa.gov", timeout 1213821862 4004: considering GETAI entry "weather.noaa.gov", timeout 1213821862 4004: considering GETAI entry "geo.yahoo.com", timeout 1213823478 4004: considering GETAI entry "koji.fedoraproject.org", timeout 1213823074 4004: considering GETAI entry "www.seamonkey-project.org", timeout 1213824901 4004: considering GETAI entry "localhost", timeout 1213823224 4004: considering GETHOSTBYNAME entry "localhost", timeout 1213822486 4004: considering GETAI entry "ai.vmware.com", timeout 1213852275 4004: considering GETAI entry "www.google.com", timeout 1213821687 4004: considering GETAI entry "www.nccgroup.com", timeout 1213906464 4004: considering GETAI entry "l.yimg.com", timeout 1213821432 4004: considering GETAI entry "vmware-www.baynote.net", timeout 1213835482 4004: considering GETAI entry "ocsp.thawte.com", timeout 1213824266 4004: considering GETAI entry "www.newsgator.com", timeout 1213822190 4004: considering GETAI entry "media3.washingtonpost.com", timeout 1213824689 4004: considering GETAI entry "news.google.com", timeout 1213821673 4004: considering GETAI entry "us.lrd.yahoo.com", timeout 1213821448 4004: considering GETAI entry "mail.google.com", timeout 1213824121 4004: considering GETHOSTBYNAMEv6 entry "localhost.localdomain", timeout 1213822450 4004: considering GETHOSTBYNAME entry "localhost.localdomain", timeout 1213822402 4004: considering GETAI entry "localhost.localdomain", timeout 1213822328 4004: considering GETAI entry "news.yahoo.com", timeout 1213821466 4004: considering GETAI entry "suggestqueries.google.com", timeout 1213821584 4004: considering GETAI entry "www.halflifesource.com", timeout 1213828473 4004: considering GETAI entry "jakarta.apache.org", timeout 1213822994 4004: considering GETAI entry "www.bloglines.com", timeout 1213821934 4004: considering GETAI entry "share.skype.com", timeout 1213822172 4004: considering GETHOSTBYADDRv6 entry "::1", timeout 1213823224 4004: remove GETAI entry "l.yimg.com" 4004: freed 120 bytes in hosts cache 27354: Access Vector Cache (AVC) started 27354: handle_request: request received (Version = 2) from PID 27365 27354: INVALIDATE (hosts) 27354: pruning hosts cache; time 2147483647 27354: considering GETAI entry "hv3.webstat.com", timeout 1213823673 27354: considering GETAI entry "now.eloqua.com", timeout 1213900282 27354: considering GETAI entry "weather.noaa.gov", timeout 1213821862 27354: considering GETAI entry "weather.noaa.gov", timeout 1213821862 27354: considering GETAI entry "weather.noaa.gov", timeout 1213821862 27354: considering GETAI entry "weather.noaa.gov", timeout 1213821862 27354: considering GETAI entry "weather.noaa.gov", timeout 1213821862 If I 'strace -p 4004', I get an endless number of: read(6, 0xbff05350, 100) = -1 EAGAIN (Resource temporarily unavailable) read(6, 0xbff05350, 100) = -1 EAGAIN (Resource temporarily unavailable) read(6, 0xbff05350, 100) = -1 EAGAIN (Resource temporarily unavailable) read(6, 0xbff05350, 100) = -1 EAGAIN (Resource temporarily unavailable) read(6, 0xbff05350, 100) = -1 EAGAIN (Resource temporarily unavailable) read(6, 0xbff05350, 100) = -1 EAGAIN (Resource temporarily unavailable) read(6, 0xbff05350, 100) = -1 EAGAIN (Resource temporarily unavailable) read(6, 0xbff05350, 100) = -1 EAGAIN (Resource temporarily unavailable) If I 'gdb -p 4004', I get this from 'where': 0x00132416 in __kernel_vsyscall () (gdb) where #0 0x00132416 in __kernel_vsyscall () #1 0x0014a13b in read () from /lib/libpthread.so.0 #2 0xb7eea598 in main_loop_epoll (efd=16) at ../posix/bits/unistd.h:45 #3 0xb7eeaad0 in start_threads () at connections.c:2136 #4 0xb7ee92ad in main (argc=1, argv=0xbff055d4) at nscd.c:279 (gdb)
The endless read loop problem should be fixed upstream.
glibc-2.8.90-8 seems to fix endless read loop problem. I now get only one nscd process, and all seems well (no more segfaults either).....
All reported problems should indeed be fixed in 2.8.90-8.
glibc-2.8-7 has been submitted as an update for Fedora 9
glibc-2.8-8 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.