Bug 700507
Summary: | nscd start produce USER_AVC | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Karel Srot <ksrot> | ||||
Component: | glibc | Assignee: | Andreas Schwab <schwab> | ||||
Status: | CLOSED ERRATA | QA Contact: | qe-baseos-tools-bugs | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 6.1 | CC: | dwalsh, eparis, fweimer, jpazdziora, mfranc, mmalik, schwab, sgrubb | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | glibc-2.12-1.47.el6 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2011-12-06 17:47:18 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 752757 | ||||||
Attachments: |
|
Description
Karel Srot
2011-04-28 14:50:02 UTC
Did you modify nscd at all? Does it seem to be working? This looks like nscd is not being allowed to talk to netlink audit socket. Is nscd running as not root user? Do you see this in permissive mode? I agree, it needs to be root or retain CAP_AUDIT_WRITE in order to send events. This looks like 2 problems in one though because once it is sending the USER_AVC...now you have a USER_AVC to deal with. :) You are right, Dan. nscd is running as nscd user, these errors appears also in permissive mode. This is a nscd bug then. [root@rhel61 ~]# getenforce Enforcing [root@rhel61 ~]# service nscd start Starting nscd: [ OK ] [root@rhel61 ~]# service nscd status nscd (pid 1848) is running... [root@rhel61 ~]# ps -eZ | grep nscd unconfined_u:system_r:nscd_t:s0 1848 ? 00:00:00 nscd [root@rhel61 ~]# ps -ef | grep nscd nscd 1848 1 0 03:20 ? 00:00:00 /usr/sbin/nscd [root@rhel61 ~]# cat /var/log/audit/audit.log type=USER_AVC msg=audit(1304061610.473:40): user pid=1848 uid=0 auid=0 ses=2 subj=unconfined_u:system_r:nscd_t:s0 msg='avc: netlink poll: error 4 : exe="/usr/sbin/nscd" sauid=0 hostname=? addr=? terminal=?' [root@rhel61 ~]# cat /var/log/messages Apr 29 03:20:10 rhel61 nscd: 1848 Access Vector Cache (AVC) started Apr 29 03:20:10 rhel61 nscd: Can't send to audit system: USER_AVC avc: netlink poll: error 4#012: exe="?" sauid=28 hostname=? addr=? terminal=? Apr 29 03:20:10 rhel61 nscd: Can't send to audit system: USER_AVC avc: netlink recvfrom: error 1#012: exe="?" sauid=28 hostname=? addr=? terminal=? Apr 29 03:20:10 rhel61 nscd: Can't send to audit system: USER_AVC avc: netlink thread: errors encountered, terminating#012: exe="?" sauid=28 hostname=? addr=? terminal=? [root@rhel61 ~]# [root@rhel61 ~]# service nscd stop Stopping nscd: [ OK ] [root@rhel61 ~]# setenforce 0 [root@rhel61 ~]# echo '' > /var/log/messages ; echo '' > /var/log/audit/audit.log [root@rhel61 ~]# service nscd start Starting nscd: [ OK ] [root@rhel61 ~]# service nscd status nscd (pid 1901) is running... [root@rhel61 ~]# cat /var/log/audit/audit.log type=USER_AVC msg=audit(1304061667.759:42): user pid=1901 uid=0 auid=0 ses=2 subj=unconfined_u:system_r:nscd_t:s0 msg='avc: netlink poll: error 4 : exe="/usr/sbin/nscd" sauid=0 hostname=? addr=? terminal=?' [root@rhel61 ~]# cat /var/log/messages Apr 29 03:21:07 rhel61 nscd: 1901 Access Vector Cache (AVC) started Apr 29 03:21:07 rhel61 nscd: Can't send to audit system: USER_AVC avc: netlink poll: error 4#012: exe="?" sauid=28 hostname=? addr=? terminal=? Apr 29 03:21:07 rhel61 nscd: Can't send to audit system: USER_AVC avc: netlink recvfrom: error 1#012: exe="?" sauid=28 hostname=? addr=? terminal=? Apr 29 03:21:07 rhel61 nscd: Can't send to audit system: USER_AVC avc: netlink thread: errors encountered, terminating#012: exe="?" sauid=28 hostname=? addr=? terminal=? nscd does have CAP_AUDIT_WRITE. If it linked against libcap-ng, its 3 lines of code: capng_clear(CAPNG_SELECT_BOTH); capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_AUDIT_WRITE); if (capng_change_id(uid, uid, CAPNG_DROP_SUPP_GRP | CAPNG_CLEAR_BOUNDING)) error(...); The above code assumes that nscd does not exec child processes. If you do this using libcap, it takes about 60 lines of code. If you do this without either, you have a lot more code to write. Ooops...you are right. pscap shows this: 1 9403 nscd nscd audit_write + Karel, are you still getting this issue? [root@hp-dl2x170g6-02 ~]# rpm -q selinux-policy nscd selinux-policy-3.7.19-98.el6.noarch nscd-2.12-1.31.el6.x86_64 [root@hp-dl2x170g6-02 ~]# semodule -DB [root@hp-dl2x170g6-02 ~]# echo > /var/log/messages; echo > /var/log/audit/audit.log [root@hp-dl2x170g6-02 ~]# service nscd restartStopping nscd: [ OK ] Starting nscd: [ OK ] [root@hp-dl2x170g6-02 ~]# cat /var/log/messages ; echo; grep nscd /var/log/audit/audit.log Jun 15 10:19:36 hp-dl2x170g6-02 nscd: 7566 Access Vector Cache (AVC) started Jun 15 10:19:36 hp-dl2x170g6-02 nscd: Can't send to audit system: USER_AVC avc: netlink poll: error 4#012: exe="?" sauid=28 hostname=? addr=? terminal=? Jun 15 10:19:36 hp-dl2x170g6-02 nscd: Can't send to audit system: USER_AVC avc: netlink recvfrom: error 1#012: exe="?" sauid=28 hostname=? addr=? terminal=? Jun 15 10:19:36 hp-dl2x170g6-02 nscd: Can't send to audit system: USER_AVC avc: netlink thread: errors encountered, terminating#012: exe="?" sauid=28 hostname=? addr=? terminal=? type=USER_AVC msg=audit(1308147576.863:207460): user pid=7566 uid=0 auid=0 ses=8 subj=unconfined_u:system_r:nscd_t:s0 msg='avc: netlink poll: error 4 : exe="/usr/sbin/nscd" sauid=0 hostname=? addr=? terminal=?' There are 2 problems here: 1) nscd is not being allowed to send audit events. To get the 'can't send to the audit system' message means that errno == EPERM. If this is being run in permissive mode, then SE Linux should not be the problem. I see that nscd has CAP_AUDIT_WRITE. But looking at the nscd code, it seems to say something about creating an AVC thread. I wonder if that thread has no capabilities since the uid was changed? I also wonder if that thread disappears real quick and its hard to check the capabilities it has. 2) The other problem is the actual AVC. But fixing that would hide the other problem. nscd has the right permissions, verified all threads. The audit code only writes the can't send message if we get eperm during send. Looking at the kernel code, it seems to be failing a netlink permission check - but it has the right DAC permissions so a problem in policy is suspected. Any chance we can get the full audit log so I can look for other oddities? Created attachment 505084 [details]
console log and audit.log
Adding another console log and the audit.log content since system reboot.
There is one strange thing. Please notice, that the system is in permissive mode and I got nscd errors anyway almost every second nscd restart (sometimes it passes, sometimes it fails, sometime AVC appears, sometimes not). AVC message from audit.log is dontaudited (did I mention it before?). And this behavior is the same for enforcing mode. Is it just because the error wasn't logged? I recall, I know nothing about nscd, maybe I am missing something obvious (I didn't touch /etc/nscd.conf at all).
Not sure how we should fix this one. I think this is a nscd problem. nscd hasn't changed in any way. This bug reminds me of a problem we had on dbus where the audit socket was getting created on a thread and wasn't available on others. I'd have to go back and look what we did to dbus to fix it on it. I think it was moving the init to earlier in the startup. The dbus problem was bug 237289. There is some kind of multi-threaded race that is causing this problem. I can see nscd sending avcs successfully, it then starts changing credentials and then no longer can send avcs. [pid 2497] setgid(28) Gid was just changed [pid 2497] futex(0x7f36e8d01d1c, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 2497] rt_sigreturn(0x7f36e8d01d1c) = -1 EINTR (Interrupted system call) [pid 2497] getuid() = 0 [pid 2497] readlink("/proc/self/exe", "/usr/sbin/nscd", 4096) = 14 [pid 2497] ioctl(0, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 [pid 2497] fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 [pid 2497] readlink("/proc/self/fd/0", "/dev/pts/0", 31) = 10 [pid 2497] stat("/dev/pts/0", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 [pid 2497] lstat("/dev/pts/0", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 [pid 2497] sendto(4, "l\0\0\0S\4\5\0\2\0\0\0\0\0\0\0avc: netlink po"..., 108, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 108 [pid 2497] poll([{fd=4, events=POLLIN}], 1, 500) = 1 ([{fd=4, revents=POLLIN}]) [pid 2497] recvfrom(4, "$\0\0\0\2\0\0\0\2\0\0\0\300\t\0\0\0\0\0\0l\0\0\0S\4\5\0\2\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36 We send audit event and the reply ^^^ is fine [pid 2497] setuid(28) = 0 Changes uid [pid 2497] futex(0x7f36e8d01d1c, FUTEX_WAKE_PRIVATE, 1) = 0 [pid 2497] rt_sigreturn(0x7f36e8d01d1c) = -1 EINTR (Interrupted system call) [pid 2497] getuid() = 28 [pid 2497] readlink("/proc/self/exe", 0x7f36e8cfb490, 4096) = -1 EACCES (Permission denied) [pid 2497] ioctl(0, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 [pid 2497] fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 [pid 2497] readlink("/proc/self/fd/0", 0x7f36e8d00840, 31) = -1 EACCES (Permission denied) [pid 2497] stat("/dev/pts/", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 [pid 2497] open("/dev/pts/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 15 [pid 2497] getdents(15, /* 7 entries */, 32768) = 168 [pid 2497] stat("/dev/pts/0", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 [pid 2497] close(15) = 0 [pid 2497] lstat("/dev/pts/0", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 [pid 2497] sendto(4, "`\0\0\0S\4\5\0\3\0\0\0\0\0\0\0avc: netlink po"..., 96, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 96 [pid 2497] poll([{fd=4, events=POLLIN}], 1, 500) = 1 ([{fd=4, revents=POLLIN}]) [pid 2497] recvfrom(4, "t\0\0\0\2\0\0\0\3\0\0\0\300\t\0\0\377\377\377\377`\0\0\0S\4\5\0\3\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 116 Get another AVC to log, but now we get EPERM (that's the\377\377\377\377 in the recvfrom it was 0 before which is success). What I think is happening is the 2497 thread is the first thread created. It looks like its setting up to do things and the avc's are coming asynchronously (due to the callback) before the thread is done. Looking in nscd/connections.c, there seems to be some credential changing in restart() and it looks like it does not preserve capabilities across a uid change. There is also one place where setuid is called before setgid. I would say that straightening out the credential changing is all that is needed. I don't know exactly the path ncsd takes on start up, but maybe initializing selinux later after everything is setup is better. Also prctl has to be involved in any setgid/setuid calling since it has to retain capabilities. Transferring back to glibc... This is not a bug in libaudit, EPERM comes from the kernel. This problem is probably caused by a multithreaded race. I explained in comment #20 what I am seeing. You probably need to rearrange the startup so that selinux is initialized a little later. Also, do you know what thread the selinux callbacks are handled on? When you get EPERM returned, its because that thread does not have permissions. If the thread has CAP_AUDIT_WRITE, the kernel will allow the audit event. 4 is EINTR, not EPERM. Hmm...if the AVC is getting into the audit logs, then nscd is fixed. The next issue is the selinux policy and that should be a new bug. This bug fixes the transfer of the avc to the audit system, a second bug should fix the policy that cause nscd to want to write to the audit system in the first place. Of course policy may ultimately need code changes rather than a policy fix. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2011-1526.html |