Bug 1352064 - Thread local state doesn't get reset across fork
Summary: Thread local state doesn't get reset across fork
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: libcap-ng
Version: 25
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Steve Grubb
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1351995 1352066
TreeView+ depends on / blocked
 
Reported: 2016-07-01 14:23 UTC by Daniel Berrangé
Modified: 2017-11-14 16:42 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1352066 (view as bug list)
Environment:
Last Closed: 2017-11-14 16:42:01 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Use pthread_atfork to reset global state (833 bytes, patch)
2016-07-01 14:55 UTC, Daniel Berrangé
no flags Details | Diff

Description Daniel Berrangé 2016-07-01 14:23:42 UTC
Description of problem:
The latest version of audit-libs has started using capng_have_capability() before sending audit messages.

This has in turn exposed a bug in libcap-ng's use of thread local state.

Specifically, if you call any capng_* function and then call fork(), the 

static __thread struct cap_ng m

will get initialized with the current PID.

If you then fork() and call another capng API, (for example, you're trying to drop caps for a child process you're spawning), this thread local doesn't get reset. 

As a result capng_apply() will try to set capabilities on the parent PID and gets an error

strace clearly shows the problem:

[pid 26690] capset({_LINUX_CAPABILITY_VERSION_3, 26689}, {0, 0, 0}) = -1 EPERM (Operation not permitted)


not it is runing pid 26690 but setting caps on 26689.

AFAICT, this bug has existed forever. The new audit-libs 2.6.1 has just started calling capng_have_capability() which has in turn broken libvirt which uses capng when spawning processes after fork.

The following demo shows the problem

#include <libaudit.h>
#include <cap-ng.h>

#include <stdio.h>
#include <sys/wait.h>


int main(int argc, char **argv)
{
  int fd = audit_open();

  if (fd < 0) {
    perror("audit_open");
    return 1;
  }

  audit_log_user_message(fd, AUDIT_VIRT_CONTROL, "test", NULL,
			 NULL, NULL, 1);

  close(fd);

  pid_t child = fork();

  if (child == 0) {
    capng_clear(CAPNG_SELECT_CAPS);
    if (capng_apply(CAPNG_SELECT_CAPS) < 0) {
      perror("capng_apply");
      _exit(1);
    }
    
    _exit(0);
  }

  int status = 0;
  waitpid(child, &status, 0);

  fprintf(stderr, "Child exited %d\n", status);
  return 0;
}


$ gcc -lcap-ng -laudit -o cap cap.c
[berrange@t530wlan ~]$ ./cap
capng_apply: Operation not permitted
Child exited 256


If you downgrade to audit-libs-2.6 the problem goes away, though clearly the bug in cap-ng still exists.


Version-Release number of selected component (if applicable):
audit-libs-2.6.1-1.fc24.x86_64
audit-libs-2.6.1-1.fc24.i686
libcap-ng-0.7.7-4.fc24.x86_64
libcap-ng-0.7.7-4.fc24.i686

Comment 1 Steve Grubb 2016-07-01 14:46:14 UTC
As for the libvirt issue, the audit package has been fixed to save and restore state in upstream commit 1304. A new audit package will be released shortly. See bz 1351954.

Comment 2 Daniel Berrangé 2016-07-01 14:55:30 UTC
Created attachment 1174946 [details]
Use pthread_atfork to reset global state

Comment 3 Jan Kurik 2016-07-26 05:04:09 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 4 Steve Grubb 2017-11-14 16:42:01 UTC
Fixed in upstream commit 7759e6f. Going to close this as upstream.

Daniel, thanks for pointing out the bug and a potential solution. What was committed used pthread_atfork as a weak symbol to fix things for people using pthreads. But also to document the issue in a man page explaining the problem and 2 potential solutions.


Note You need to log in before you can comment on or make changes to this bug.