Bug 1287800

Summary:	RFC: audit log prevents PAM login when euid != uid
Product:	Red Hat Enterprise Linux 7	Reporter:	Paulo Andrade <pandrade>
Component:	pam	Assignee:	Tomas Mraz <tmraz>
Status:	CLOSED ERRATA	QA Contact:	Dalibor Pospíšil <dapospis>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	7.1	CC:	cww, dapospis, pandrade, pkis, roland.kaiser, sgrubb, tmraz
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	pam-1.1.8-14.el7	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-11-04 03:31:52 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1203710, 1296594, 1313485

Description Paulo Andrade 2015-12-02 17:22:14 UTC

A user has a Perl script that works on Solaris.

  The script sets its effective GID and UID through the Perl special variables:

$) = $id;
$> = $id;

then authenticates successfully with PAM. The proper audit
message of success is even written, but the PAM authentication
then fails.
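
For readers less familiar with Perl: `$)` is the effective GID and `$>` is the effective UID, so the two assignments roughly correspond to setegid() followed by seteuid(), with the real IDs left untouched. A minimal C sketch of the same credential change, assuming that mapping (set_effective_ids() is a hypothetical helper and not part of the user's script):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: change only the effective IDs and leave the
 * real UID/GID alone, as the two Perl assignments do. */
int set_effective_ids(uid_t uid, gid_t gid)
{
	/* Group first: once the effective UID is no longer 0 the
	 * process may not be allowed to change its GID any more. */
	if (setegid(gid) != 0)
		return -1;
	if (seteuid(uid) != 0)
		return -1;

	/* Sanity check: the effective IDs now have the requested values. */
	assert(geteuid() == uid && getegid() == gid);
	return 0;
}
```

Called as root with an unprivileged ID, this leaves the process with uid == 0 and euid != 0, which is the combination discussed in the rest of this report.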

  The failure is like this:

static int check_ack(int fd, int seq)
[...]
	rc = audit_get_reply(fd, &rep, GET_REPLY_NONBLOCKING, MSG_PEEK);
[...]
	else if (rc > 0 && rep.type == NLMSG_ERROR) {
		int error = rep.error->error;
		/* Eat the message */
		(void)audit_get_reply(fd, &rep, GET_REPLY_NONBLOCKING, 0);

		/* NLMSG_ERROR can indicate success, only report nonzero */ 
		if (error) {
			errno = -error;
			return error;
		}
	}
	return 0;

What happens is that "rc > 0" and "rep.type == NLMSG_ERROR",
where rc is the message length. check_ack() then returns a nonzero
value, and PAM fails.
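
To illustrate the convention that check_ack() relies on: a netlink NLMSG_ERROR message carries a struct nlmsgerr whose error field is 0 for a plain acknowledgement and a negative errno otherwise. A minimal sketch, assuming the standard <linux/netlink.h> definitions; netlink_reply_error() is a hypothetical helper, not a libaudit function:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: return 0 for a plain ACK or for any message that
 * is not NLMSG_ERROR, otherwise the negative errno carried in the reply.
 * buf must be suitably aligned for struct nlmsghdr. */
int netlink_reply_error(const void *buf, size_t len)
{
	const struct nlmsghdr *nlh = buf;
	struct nlmsgerr err;

	assert(buf != NULL);
	if (len < NLMSG_LENGTH(sizeof(err)))	/* too short for an error payload */
		return 0;
	if (nlh->nlmsg_type != NLMSG_ERROR)
		return 0;

	/* The payload starts right after the (aligned) netlink header. */
	memcpy(&err, NLMSG_DATA(nlh), sizeof(err));
	return err.error;	/* 0 is an ACK; e.g. -EPERM is a refusal */
}
```

With a reply whose error field is -EPERM, the helper returns -EPERM, which matches the nonzero return path of check_ack() described above.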

  The rep.error->error value is filled in by the kernel and received in:

int audit_get_reply(int fd, struct audit_reply *rep, reply_t block, int peek)
[...]
	len = recvfrom(fd, &rep->msg, sizeof(rep->msg), block|peek,
		(struct sockaddr*)&nladdr, &nladdrlen);
[...]
	len = adjust_reply(rep, len);

Above, len is the message length.

  The type that marks the reply as an error, however, is set in:

static int adjust_reply(struct audit_reply *rep, int len)
{
	rep->type     = rep->msg.nlh.nlmsg_type;

because rep->msg.nlh.nlmsg_type == NLMSG_ERROR

  This is a Request For Comment bug report because I can only
see that the entire PAM connection dies at that point, but
I could only track the issue following the execution under gdb.

  In the end it is only the audit logging that causes the
entire PAM authentication to return PAM_SYSTEM_ERR, through

int pam_authenticate(pam_handle_t *pamh, int flags)
[...]
    retval = _pam_dispatch(pamh, flags, PAM_AUTHENTICATE);

int _pam_dispatch(pam_handle_t *pamh, int flags, int choice)
[...]
#ifdef HAVE_LIBAUDIT
    if (choice != PAM_CHAUTHTOK || flags & PAM_UPDATE_AUTHTOK || retval != PAM_SUCCESS) {
	retval = _pam_auditlog(pamh, choice, retval, flags, h);
    }
#endif

int
_pam_auditlog(pam_handle_t *pamh, int action, int retval, int flags, struct handler *h)
[...]
  if (_pam_audit_writelog(pamh, audit_fd, type, message,
      grantors ? grantors : "?", retval) < 0)
    retval = PAM_SYSTEM_ERR;

  So, from my understanding, if PAM were built without
HAVE_LIBAUDIT defined, it would have worked.
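
The effect of that last step can be modelled in a few lines. This is only a sketch of the logic quoted above: the constants are stand-ins so that no PAM headers are needed, and result_after_audit() is a hypothetical name, not a Linux-PAM function.

```c
#include <assert.h>

/* Stand-ins for the PAM return codes used in this report. */
enum { SKETCH_PAM_SUCCESS = 0, SKETCH_PAM_SYSTEM_ERR = 4 };

/* Hypothetical model of the tail of _pam_auditlog(): a failed audit
 * write replaces whatever result the modules produced. */
int result_after_audit(int module_retval, int audit_write_rc)
{
	if (audit_write_rc < 0)
		return SKETCH_PAM_SYSTEM_ERR;
	return module_retval;
}

/* The case from this report: the modules succeed, the audit write fails. */
void result_after_audit_demo(void)
{
	assert(result_after_audit(SKETCH_PAM_SUCCESS, -1) == SKETCH_PAM_SYSTEM_ERR);
	assert(result_after_audit(SKETCH_PAM_SUCCESS, 0) == SKETCH_PAM_SUCCESS);
}
```

So a successful authentication is reported as a system error purely because the audit write returned a negative value.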

  Either way, the root of the issue is euid != uid.


  As an extra note, written after the text above: I also looked briefly at
the kernel side, and it looks like the bad value is returned from:

---8<---
SYSCALL_DEFINE6(recvfrom, int, fd, void __user *, ubuf, size_t, size,
		unsigned int, flags, struct sockaddr __user *, addr,
		int __user *, addr_len)
[...]
	err = sock_recvmsg(sock, &msg, size, flags);

	if (err >= 0 && addr != NULL) {
		err2 = move_addr_to_user(&address,
					 msg.msg_namelen, addr, addr_len);
		if (err2 < 0)
			err = err2;
	}

	fput_light(sock->file, fput_needed);
out:
	return err;
---8<---

  Following up, I see:

---8<---
static inline int __sock_recvmsg(struct kiocb *iocb, struct socket *sock,
				 struct msghdr *msg, size_t size, int flags)
{
	int err = security_socket_recvmsg(sock, msg, size, flags);

	return err ?: __sock_recvmsg_nosec(iocb, sock, msg, size, flags);
}
---8<---

int security_socket_recvmsg(struct socket *sock, struct msghdr *msg,
			    int size, int flags)
{
	return security_ops->socket_recvmsg(sock, msg, size, flags);
}

and from there the result depends on the security "backend".
SELinux could plausibly flag an error because euid != uid, e.g.:

static int selinux_socket_recvmsg(struct socket *sock, struct msghdr *msg,
				  int size, int flags)
{
	return sock_has_perm(current, sock->sk, SOCKET__READ);
}

static int sock_has_perm(struct task_struct *task, struct sock *sk, u32 perms)
{
[...]
	return avc_has_perm(tsid, sksec->sid, sksec->sclass, perms, &ad);
}


  I will ask the user about SELinux and any related
messages.

Comment 1 Steve Grubb 2015-12-02 19:22:36 UTC

This is a very confusing bug report. So, you are saying that the user has a perl script that connects to a system and tries to authenticate and pam does not allow the script to access the system? What login service is this script connecting to?

Comment 2 Paulo Andrade 2015-12-02 19:38:43 UTC

  I understand the description is hard to follow.
  I tried to list what I found while debugging the script, and
where the error code came from.

  The error code comes from the recvfrom call in audit_get_reply.
The PAM authentication success message is written to
/var/log/audit/audit.log, but after that, because of the recvfrom
call, PAM ends with the error code PAM_SYSTEM_ERR.

  The RFC bug is an attempt to ask for feedback, as I have
very little knowledge of pam and audit; I just tried to describe
what was going on in as much detail as possible.

  From my understanding, if pam were not built with audit
support, it would likely work, just as the user says it works
on Solaris.

  Maybe Roland Kaiser, the user with the problem :), can make
some extra comments?

Comment 3 Steve Grubb 2015-12-02 20:02:03 UTC

Is the perl script connecting to a system or is it a service that clients are connecting to? If it's connecting to a system, I need to know what the service is to see what's wrong with it. If the script is the service authenticating clients, then it needs to have CAP_AUDIT_WRITE.

Comment 4 Roland 2015-12-04 09:46:01 UTC

The Perl script is a service intended to authenticate incoming RADIUS requests against an external Active Directory using Kerberos.
So it is kind of both...
What is CAP_AUDIT_WRITE and how do I enable it?

Comment 5 Steve Grubb 2015-12-04 18:51:41 UTC

CAP_AUDIT_WRITE is a POSIX capability. But before we go there, what uid & euid is the service running as when it hits this piece of code?
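
As a rough illustration of what holding the capability means in practice, the sketch below asks the kernel for the calling thread's effective capability set through the raw capget system call and tests the CAP_AUDIT_WRITE bit. Both helper names are hypothetical, and a real service would more likely use libcap than the raw system call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/capability.h>

/* Hypothetical helper: test one capability bit in a 64-bit mask. */
int cap_in_mask(uint64_t mask, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (int)((mask >> cap) & 1);
}

/* Hypothetical helper: 1 if the calling thread has CAP_AUDIT_WRITE in
 * its effective set, 0 if it does not, -1 if capget fails. */
int have_cap_audit_write(void)
{
	struct __user_cap_header_struct hdr;
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
	uint64_t effective;

	memset(&hdr, 0, sizeof(hdr));
	memset(data, 0, sizeof(data));
	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0;	/* 0 means the calling thread */

	if (syscall(SYS_capget, &hdr, data) != 0)
		return -1;

	/* Version 3 splits the 64-bit sets into two 32-bit halves. */
	effective = ((uint64_t)data[1].effective << 32) | data[0].effective;
	return cap_in_mask(effective, CAP_AUDIT_WRITE);
}
```

A process that needs to write audit records could call have_cap_audit_write() before and after changing its effective UID to see whether the capability is still in the effective set.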

Comment 6 Roland 2015-12-07 07:38:12 UTC

This happens when uid=0 and euid=70
It works if uid=euid=0 or uid=euid=70

Comment 7 Roland 2015-12-11 13:19:36 UTC

Steve, do you have an update for me?

Comment 8 Steve Grubb 2015-12-14 22:38:44 UTC

The pam code has a loophole like this:

  if (rc < 0) {
      if (rc == -EPERM && getuid() != 0)
          return 0;

In reality, the getuid() != 0 is a quick and dirty replacement for checking the capabilities for CAP_AUDIT_WRITE. This is intended to be used by screensaver applications which run as the user and could not have the right capability.

Looking at man 7 capabilities, "Effect of user ID changes on capabilities", there is this section:

  2. If the effective user ID is changed from  0  to  nonzero,  then  all
     capabilities are cleared from the effective set.

I'm wondering if the loophole for screensavers should be using geteuid() != 0.
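
The difference between the two checks can be seen in a small model. This is a sketch only: both functions are hypothetical stand-ins for the condition quoted above, with the IDs passed in as parameters so the cases from this report can be tried without changing credentials.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Each helper returns 1 when a -EPERM from the audit write would be
 * ignored, 0 when it would be reported as a failure. */
int eperm_ignored_by_uid(int rc, uid_t uid)
{
	return rc == -EPERM && uid != 0;	/* current check: real UID */
}

int eperm_ignored_by_euid(int rc, uid_t euid)
{
	return rc == -EPERM && euid != 0;	/* proposed check: effective UID */
}

void eperm_demo(void)
{
	/* uid=0, euid=70: the failing combination from this report. */
	assert(!eperm_ignored_by_uid(-EPERM, 0));
	assert(eperm_ignored_by_euid(-EPERM, 70));

	/* uid=euid=70: both variants ignore the error. */
	assert(eperm_ignored_by_uid(-EPERM, 70));
	assert(eperm_ignored_by_euid(-EPERM, 70));
}
```

With uid=0 and euid=70 the real-UID check reports the -EPERM as a failure, while the effective-UID check would let it pass, which is consistent with the behaviour reported in comment 6.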

Comment 9 Tomas Mraz 2015-12-15 10:13:19 UTC

The question is whether the loophole should simply be rc == -EPERM. Is there any circumstance other than a missing CAP_AUDIT_WRITE in which the audit call would return -EPERM?

Comment 10 Steve Grubb 2015-12-15 13:57:46 UTC

The only thing I can think of in terms of pam is if selinux blocked it for some reason.

Comment 11 Roland 2015-12-15 14:40:50 UTC

SELinux is disabled (triple-checked it...)
Can I disable writing to audit somewhere, or can I disable the CAP_AUDIT_WRITE check globally just to test?

Comment 12 Steve Grubb 2015-12-15 14:48:03 UTC

Even though you have disabled it, we have to take into consideration everyone that uses it. You really ought to have it on but have your daemon in a permissive domain until there is policy for it. This way you have both selinux protection for most of the system and flexibility to run new things.

That said, I think the correct fix is to change pam's code as Tomas mentioned. I will also change the audit library over to using geteuid() for its checks. But libaudit is not your problem in this particular case.

Comment 13 Steve Grubb 2015-12-15 16:25:04 UTC

Moved this to pam to get the specific problem above fixed in the right component. Similar cases of this issue were fixed in the audit package with upstream commit 1135.

Comment 21 errata-xmlrpc 2016-11-04 03:31:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2314.html