Bug 1063939 - We need a get_peer_cgroup syscall to get container information from calling process. [NEEDINFO]
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1142480
 
Reported: 2014-02-11 16:29 UTC by Daniel Walsh
Modified: 2015-11-23 17:08 UTC
CC List: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-23 17:08:02 UTC
Type: Bug
Embargoed:
jforbes: needinfo?



Description Daniel Walsh 2014-02-11 16:29:52 UTC
As we move towards using docker, we need a mechanism to identify the name of the container that a calling process is running in. Systemd sets this name up based on the default container for the process.

We need a race-free way for tools like sssd to get information about the calling process from another daemon, like the docker daemon or systemd.

sssd wants to allow an administrator to set rules governing the information sssd provides to a process, based on the container the process is running within.

I have heard that there was a patch to provide cgroup and other information based on the calling socket, but it did not get merged upstream.

Comment 1 Josh Boyer 2014-02-12 18:49:58 UTC
Tejun, do you know what patches (and what functionality) Dan is talking about here?

Comment 2 Kay Sievers 2014-02-12 21:14:51 UTC
The patch was about *attaching* SCM data to a sent af_unix packet:
  http://lists.openwall.net/netdev/2014/01/13/32

To *query* peer data, something else is needed, which I don't think
exists at the moment.

This is basically SO_PASSCRED vs. SO_PEERCRED: one *passes* data from the
sender by attaching it to the packet, the other *queries* one end of the
connection from the other.

LSMs can do SO_PASSSEC vs. SO_PEERSEC for the seclabel (string), which is
conceptually similar to cgroup information.

So I guess SO_PASSCGROUP and SO_PEERCGROUP is what Dan wants.
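For illustration, the query side of the existing SO_PEERCRED option looks like the sketch below; a hypothetical SO_PEERCGROUP would presumably follow the same getsockopt() pattern, returning a cgroup path string instead of a struct ucred.

    #define _GNU_SOURCE             /* for struct ucred */
    #include <stdio.h>
    #include <sys/socket.h>

    /* Query the credentials of the peer of a connected AF_UNIX socket.
     * The returned values are those in effect at connect()/socketpair()
     * time. */
    int print_peer_creds(int connfd)
    {
        struct ucred peer;
        socklen_t len = sizeof(peer);

        if (getsockopt(connfd, SOL_SOCKET, SO_PEERCRED, &peer, &len) < 0) {
            perror("getsockopt(SO_PEERCRED)");
            return -1;
        }
        printf("peer pid=%d uid=%d gid=%d\n",
               (int)peer.pid, (int)peer.uid, (int)peer.gid);
        return 0;
    }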

Comment 3 Alexander Larsson 2014-02-17 11:08:14 UTC
I think SO_PEERCGROUP is what we need really. Current users of e.g. SSSD will not be attaching cgroup data when they connect to ask for user data, yet we still want to be able to read back the remote cgroup so that we can match it to a particular container.

Comment 4 Vivek Goyal 2014-02-20 22:28:16 UTC
(In reply to Daniel Walsh from comment #0)
> As we move towards using docker, we need a mechanism to identify the name of
> the container a process is calling from.  Systemd is setting this name up
> based on the default container for the process.  
> 
> We need a race free way for tools like sssd to be able to get information
> about the calling process from another daemon like the docker daemon or
> systemd. 

What does race-free mean here? IOW, why couldn't we look up /proc/pid/cgroup?

If the process is on the other end of the socket, this information should
still be available.

The process might change its cgroup in the meantime, but some kind of
SO_PEERCGROUP mechanism will not protect against that either.

Comment 5 Alexander Larsson 2014-02-20 22:36:23 UTC
It's the standard race involving pids. The process may have exited and the pid been reused, so you'd be looking at data for another process.

Comment 6 Simo Sorce 2014-02-21 03:06:26 UTC
(In reply to Alexander Larsson from comment #5)
> It's the standard race involving pids. The process may have exited and the
> pid been reused, so you'd be looking at data for another process.

Exactly, and to make it more concrete:

PID 1000 connects to the party that wants to do the check; that party gets the peer's pid and then starts trawling /proc.

In the meantime, PID 1000 forks PID 1001 and exits.

The checking party still has the socket open (the child, PID 1001, hasn't closed it), but PID 1000 is gone, and any other process can now be spawned as pid 1000.

You have 2 possible races here:
1. you get the wrong cgroups because another process raced you.
2. you fail to find cgroup information because /proc/1000 is gone before you are done.

In both cases you fail.
The first case opens up potential security issues; the second just leaves very hard-to-manage failure cases.
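To make the racy alternative concrete, here is a sketch (illustrative, not from any real code) of the /proc-based lookup being criticized; the comments mark the two failure modes above.

    #define _GNU_SOURCE             /* for struct ucred */
    #include <stdio.h>
    #include <sys/socket.h>

    int read_peer_cgroup_racy(int connfd, char *buf, size_t buflen)
    {
        struct ucred peer;
        socklen_t len = sizeof(peer);
        char path[64];
        FILE *f;

        if (getsockopt(connfd, SOL_SOCKET, SO_PEERCRED, &peer, &len) < 0)
            return -1;

        snprintf(path, sizeof(path), "/proc/%d/cgroup", (int)peer.pid);

        /* Window: the peer may exit here and its pid be recycled. */
        f = fopen(path, "r");           /* race 1: may open a reused pid's file */
        if (!f)
            return -1;                  /* race 2: /proc/<pid> already gone */
        if (!fgets(buf, (int)buflen, f)) {
            fclose(f);
            return -1;
        }
        fclose(f);
        return 0;
    }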

Comment 7 Vivek Goyal 2014-02-21 13:33:33 UTC
- Which controller's cgroup are you looking for? IIUC, the patch mentioned above returns the cgroup of only one hierarchy, and the lowest-numbered one at that.

- How does a cgroup path map to a container?

Comment 8 Vivek Goyal 2014-02-21 14:05:35 UTC
Can we take the uid, figure out which uid namespace it belongs to, and use that to map to the right container?

This of course assumes that we are using uid namespaces.
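For reference, a sketch of how one might inspect a process's uid namespace today, by reading its /proc/<pid>/ns/user link (illustrative; mapping a namespace to a container is left out, as discussed):

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Read the user-namespace identifier of a pid, e.g. "user:[4026531837]".
     * Two processes are in the same userns iff the strings match. */
    int read_user_ns(pid_t pid, char *buf, size_t buflen)
    {
        char path[64];
        ssize_t n;

        snprintf(path, sizeof(path), "/proc/%d/ns/user", (int)pid);
        n = readlink(path, buf, buflen - 1);
        if (n < 0)
            return -1;
        buf[n] = '\0';
        return 0;
    }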

Comment 9 Daniel Walsh 2014-02-21 15:55:43 UTC
We want the first cgroup.

cat /proc/self/cgroup 
11:hugetlb:/
10:perf_event:/
9:blkio:/
8:net_cls:/
7:freezer:/
6:devices:/
5:memory:/
4:cpu,cpuacct:/
3:cpuset:/
2:name=systemd:/user.slice/user-3267.slice/session-2.scope

Meaning

2:name=systemd:/user.slice/user-3267.slice/session-2.scope

Or give us all of them.

Systemd creates the first cgroup to stick processes in, with a name that can be used to look up docker information when we are running in a container.
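For illustration, a minimal sketch of pulling that name=systemd entry out of /proc/self/cgroup (each line has the form <id>:<controllers>:<path>):

    #include <stdio.h>
    #include <string.h>

    int get_systemd_cgroup(char *out, size_t outlen)
    {
        char line[512];
        FILE *f = fopen("/proc/self/cgroup", "r");

        if (!f)
            return -1;
        while (fgets(line, sizeof(line), f)) {
            /* e.g. "2:name=systemd:/user.slice/user-3267.slice/session-2.scope" */
            char *p = strstr(line, ":name=systemd:");
            if (p) {
                snprintf(out, outlen, "%s", p + strlen(":name=systemd:"));
                out[strcspn(out, "\n")] = '\0';   /* strip trailing newline */
                fclose(f);
                return 0;
            }
        }
        fclose(f);
        return -1;
    }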

Comment 10 Vivek Goyal 2014-02-21 17:58:07 UTC
I am reading man socket(7). It says the following.


       SO_PEERCRED
              Return the credentials of the foreign process connected to  this
              socket.   This  is  possible  only  for connected AF_UNIX stream
              sockets and AF_UNIX stream and  datagram  socket  pairs  created
              using  socketpair(2); see unix(7).  The returned credentials are
              those that were in effect at the time of the call to  connect(2)
              or socketpair(2).  The argument is a ucred structure; define the
              _GNU_SOURCE feature test macro to obtain the definition  of  that
              structure from <sys/socket.h>.  This socket option is read-only.


So SO_PEERCRED gives the credentials that were in effect at the time of the
connect() call.

That means SO_PEERCGROUP would likewise return the cgroup of the process which
called connect().

After that, the process could change cgroups, pass the socket fd to another
process to communicate, etc.

I am wondering if it is possible that, after opening the connection, this
process passes the file descriptor to another process running in a different
container, in effect faking the cgroup/container the request appears to come
from.
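For reference, the fd passing described here is the standard SCM_RIGHTS mechanism; a minimal send-side sketch (illustrative only):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Hand a connected socket (or any fd) to another process over an
     * AF_UNIX socket; the receiver then speaks on a connection whose
     * SO_PEERCRED (or hypothetical SO_PEERCGROUP) data still describe
     * the original opener. */
    int pass_fd(int unix_sock, int fd_to_pass)
    {
        char dummy = 'x';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        char cbuf[CMSG_SPACE(sizeof(int))];
        struct msghdr msg = { 0 };
        struct cmsghdr *cmsg;

        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof(cbuf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        return (int)sendmsg(unix_sock, &msg, 0);
    }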

Comment 11 Simo Sorce 2014-02-21 18:07:07 UTC
(In reply to Daniel Walsh from comment #9)
> We want the first cgroup.
> 
> cat /proc/self/cgroup 
> 11:hugetlb:/
> 10:perf_event:/
> 9:blkio:/
> 8:net_cls:/
> 7:freezer:/
> 6:devices:/
> 5:memory:/
> 4:cpu,cpuacct:/
> 3:cpuset:/
> 2:name=systemd:/user.slice/user-3267.slice/session-2.scope
> 
> Meaning
> 
> 2:name=systemd:/user.slice/user-3267.slice/session-2.scope
> 
> Or give us all of them.
> 
> Systemd creates the first cgroup to stick processes in, with a name that can
> be used to look up docker information when we are running in a container.

For a system call we want all of them; we can't predict what it will be used for beyond our own need.

Comment 12 Simo Sorce 2014-02-21 18:08:52 UTC
(In reply to Vivek Goyal from comment #10)
> I am reading man socket(7). It says the following.
> 
>        SO_PEERCRED
>               Return the credentials of the foreign process connected to this
>               socket.  This is possible only for connected AF_UNIX stream
>               sockets and AF_UNIX stream and datagram socket pairs created
>               using socketpair(2); see unix(7).  The returned credentials are
>               those that were in effect at the time of the call to connect(2)
>               or socketpair(2).  The argument is a ucred structure; define
>               the _GNU_SOURCE feature test macro to obtain the definition of
>               that structure from <sys/socket.h>.  This socket option is
>               read-only.
> 
> So SO_PEERCRED gives the credentials that were in effect at the time of the
> connect() call.
> 
> That means SO_PEERCGROUP would likewise return the cgroup of the process
> which called connect().
> 
> After that, the process could change cgroups, pass the socket fd to another
> process to communicate, etc.
> 
> I am wondering if it is possible that, after opening the connection, this
> process passes the file descriptor to another process running in a different
> container, in effect faking the cgroup/container the request appears to come
> from.

It is ok to just get the original process context. You can always play games with untrusted applications (they can proxy all communication if they are malicious). What matters is that we can reliably detect that it is untrusted (for example by virtue of running in a container).

Comment 13 Vivek Goyal 2014-02-21 18:29:17 UTC
(In reply to Simo Sorce from comment #12)
> (In reply to Vivek Goyal from comment #10)
> > [quote of comment #10 snipped]
> 
> It is ok to just get the original process context. You can always play games
> with untrusted applications (they can proxy all communication if they are
> malicious). What matters is that we can reliably detect that it is untrusted
> (for example by virtue of running in a container).

But the requirement here seems to be that if we find the cgroup, we can find the associated container, and then we can implement container-specific policies and decide whether to service the request or not, or how to service it.

So all requests are coming from containers, but we want to differentiate one container from another.

And passing the socket fd between containers' processes breaks the assumptions here. Then one might as well just read /proc/pid/cgroup. In some cases you will get the wrong container (if the pid got reassigned), and the same thing will happen with file descriptor passing: you will believe the request is coming from one container but it is actually coming from another one.

Comment 14 Vivek Goyal 2014-02-21 18:32:40 UTC
(In reply to Simo Sorce from comment #11)
> > 
> > Systemd creates the first cgroup to stick processes in, with a name that can
> > be used to look up docker information when we are running in a container.
> 
> For a system call we want all of them; we can't predict what it will be used
> for beyond our own need.

I heard a suggestion that a new system call be implemented to get this data. Are there any details from that discussion? What kind of system call, what are the parameters, and what is the system call supposed to return?

And why is a system call better than what has already been suggested, SO_PEERCGROUP?

Comment 15 Simo Sorce 2014-02-21 19:23:20 UTC
(In reply to Vivek Goyal from comment #13)
> (In reply to Simo Sorce from comment #12)
> > [quotes of comments #10 and #12 snipped]
> 
> But the requirement here seems to be that if we find the cgroup, we can find
> the associated container, and then we can implement container-specific
> policies and decide whether to service the request or not, or how to service
> it.
> 
> So all requests are coming from containers, but we want to differentiate one
> container from another.
> 
> And passing the socket fd between containers' processes breaks the
> assumptions here. Then one might as well just read /proc/pid/cgroup. In some
> cases you will get the wrong container (if the pid got reassigned), and the
> same thing will happen with file descriptor passing: you will believe the
> request is coming from one container but it is actually coming from another
> one.

There is nothing we can do in that case anyway; a container can simply "proxy" all the information to another container.

If the container is compromised, all the information and access levels of that container are too; that's a given.

That is not an issue we really (can/should) care about.

Comment 16 Simo Sorce 2014-02-21 19:24:35 UTC
(In reply to Vivek Goyal from comment #14)
> (In reply to Simo Sorce from comment #11)
> > > 
> > > Systemd creates the first cgroup to stick processes in with a name that can
> > > be used to lookup docker information when we are running in a container.
> > 
> > For a system call we want all of them; we can't predict what it will be
> > used for beyond our own need.
> 
> I heard a suggestion that a new system call be implemented to get this data.
> Are there any details from that discussion? What kind of system call, what
> are the parameters, and what is the system call supposed to return?
> 
> And why is a system call better than what has already been suggested,
> SO_PEERCGROUP?

I am not advocating anything different; I just improperly called "SO_PEERCGROUP" a system call, which in my mind it is, as it requires kernel work.
It's just not a new call, but one wrapped into an existing call.

Comment 17 Vivek Goyal 2014-02-21 19:38:59 UTC
(In reply to Simo Sorce from comment #15)
> 
> There is nothing we can do in that case anyway; a container can simply
> "proxy" all the information to another container.
> 
> If the container is compromised, all the information and access levels of
> that container are too; that's a given.
> 
> That is not an issue we really (can/should) care about.

Sorry, I still don't understand. Are we not assuming by default that we do not trust the root process inside the container? If that's not the case, and we are trusting the process inside the container:

- Then we can trust that it will be around until it gets the service, and /proc/pid/cgroup will be valid.

- Or we can just ask the requester for its cgroup in a message (we trust the caller).

And if we don't trust the requester, then it does not matter whether the requester got compromised or not, and SO_PEERCGROUP is no better than reading /proc/pid/cgroup.

Comment 18 Vivek Goyal 2014-02-21 20:10:11 UTC
I am wondering whether it is true that if a short-lived process exits after establishing a connection with the server, the exited process's pid can be reused.

I am looking at SO_PEERCRED code in net/core/sock.c


        case SO_PEERCRED:
        {
                struct ucred peercred;
                if (len > sizeof(peercred))
                        len = sizeof(peercred);
                cred_to_ucred(sk->sk_peer_pid, sk->sk_peer_cred, &peercred);
                if (copy_to_user(optval, &peercred, len))
                        return -EFAULT;
                goto lenout;
        }


sk->sk_peer_pid contains the pid information which is copied to user space.
And sk->sk_peer_pid is a pointer to "struct pid". If that's the case, then we
must have a reference to the "struct pid", which will probably be released
only when the connection/socket is closed.

If we are holding a reference to the "struct pid" of the process which called
connect() on the socket, then I think that pid can't be reused until the last
reference to the "struct pid" is dropped.

If so, then we should not have the concern regarding the race.

I am not familiar with networking code. I will spend some more time on this.

Comment 19 Simo Sorce 2014-02-21 20:14:01 UTC
(In reply to Vivek Goyal from comment #17)

> Sorry, I still don't understand. Are we not assuming by default that we are
> not trusting root process inside container.

We are not; the semantics we need are identical to SO_PEERCRED, which is why we proposed SO_PEERCGROUPS.

> If that's not the case, and we are trusting the process inside the container.

We are not trusting the process.

> - Then we can trust that it will be around till it gets the service and
> /proc/pid/cgroup will be valid.

No, going via /proc has nothing to do with trusting the process or not, it is about races.

> - Or we can just ask the requester for the cgroup in a message. (We trust
> the caller).

We do *not* trust the caller; privilege escalation lies that way.

> And if we don't trust the requester, then it does not matter whether
> requester got compromised or not and SO_PEERCGROUP is as good as doing
> /proc/pid/cgroup.

Please read the documentation about SO_PEERCRED; it really is exactly the same trust scenario.

What we want to know is the "credentials/cgroups" of the process that made the first contact. We trust that a correctly configured system does not allow that process to gain more/different privileges than it has, only drop them or pass information around to other processes of the same privilege level.

In particular, an attacker cannot (through races) pretend to be the original process; that is all we care about.

HTH

Comment 20 Simo Sorce 2014-02-21 20:16:25 UTC
(In reply to Vivek Goyal from comment #18)
> I am wondering whether it is true that if a short-lived process exits after
> establishing a connection with the server, the exited process's pid can be
> reused.
> 
> I am looking at SO_PEERCRED code in net/core/sock.c
> 
> [code snippet snipped]
> 
> sk->sk_peer_pid contains the pid information which is copied to user space.
> And sk->sk_peer_pid is a pointer to "struct pid". If that's the case, then
> we must have a reference to the "struct pid", which will probably be
> released only when the connection/socket is closed.
> 
> If we are holding a reference to the "struct pid" of the process which
> called connect() on the socket, then I think that pid can't be reused until
> the last reference to the "struct pid" is dropped.
> 
> If so, then we should not have the concern regarding the race.
> 
> I am not familiar with networking code. I will spend some more time on this.

Once you qualify that "we", I think you'll find that when "we" is the kernel in a blocking system call, you have completely different semantics than when "we" is a user space process that just has a reference to a file descriptor.

Comment 21 Vivek Goyal 2014-02-21 20:22:41 UTC
> What we want to know is the "credentials/cgroups" of the process that made
> the first contact. We trust that a correctly configured system does not
> allow that process to gain more/different privileges than it has, only drop
> them or pass information around to other processes of the same privilege
> level.
> 
> In particular, an attacker cannot (through races) pretend to be the original
> process; that is all we care about.

I get that. But you are not explaining how the cgroup information is useful given the fact that subsequent messages on the same socket can come from a different process in a different container.

Give me an example of how this information will be used, and why it is not a concern that the next request came from a different process in a different container.

Comment 22 Vivek Goyal 2014-02-21 20:32:45 UTC
(In reply to Simo Sorce from comment #20)
> 
> Once you qualify that "we", I think you'll find that when "we" is the
> kernel in a blocking system call, you have completely different semantics
> than when "we" is a user space process that just has a reference to a file
> descriptor.

By "we", I meant kernel. 

I am not sure what you are trying to say.

What I meant was that it looks like the kernel will have a reference to the "struct pid", and that will be dropped when the server closes the socket fd (which it must have received upon successful completion of accept()).

If that's the case, we should not have the pid reuse race described above.

But again, I am not a networking code expert. I might be wrong here.

I will be glad if somebody with a better understanding of the code can pitch in here and clarify things.

Comment 23 Simo Sorce 2014-02-21 20:39:35 UTC
> I get that. But you are not explaining how the cgroup information is useful
> given the fact that subsequent messages on the same socket can come from a
> different process in a different container.
> 
> Give me an example of how this information will be used, and why it is not
> a concern that the next request came from a different process in a
> different container.

It really is exactly the same as with SO_PEERCRED: what we care about are the original credentials of the process that opened the socket; those bind our "trust level" in the process.

For the docker case, assume we have 2 containers belonging to 2 competitors:

Foo Inc and Bar LLC

What matters for us is to know that process A was executed in the container associated with Foo Inc and not Bar LLC, so that we send information about Foo Inc only.

That information is encoded in the container name (through a mapping of container names in cgroups to who owns them; details out of scope here).

If the Foo Inc container process suddenly decides to betray its own company and hand all the data to a process belonging to Bar LLC by passing it a socket, that is not our problem.

HTH

Comment 24 Dmitri Pal 2014-02-21 22:09:56 UTC
Just to give a bigger picture:

1) A container is started by Docker
2) Docker invokes systemd
3) Systemd starts container processes and records cgroups
4) Processes inside the container connect to a shared service on the host
5) The shared service on the host needs to ask the kernel who is connecting; hence the call requested here
6) Once the cgroup is known to the shared service, it can connect to docker and query metadata related to the container
7) Once the metadata is acquired, the service can decide what level of service to provide to the process coming from that container.
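As an illustrative sketch of steps 4-7 from the shared service's side (SO_PEERCGROUP is the hypothetical option requested in this bug, and the docker-query helpers are placeholders, not a real API):

    #include <sys/socket.h>
    #include <unistd.h>

    struct container_meta;                                  /* placeholder type */
    struct container_meta *query_docker_metadata(const char *cgroup); /* placeholder */
    void serve_with_policy(int fd, struct container_meta *m);         /* placeholder */

    void handle_client(int listen_fd)
    {
        char cgroup[4096];
        socklen_t len = sizeof(cgroup);
        int connfd = accept(listen_fd, NULL, NULL);         /* step 4 */

        if (connfd < 0)
            return;
    #ifdef SO_PEERCGROUP    /* hypothetical option; does not exist today */
        if (getsockopt(connfd, SOL_SOCKET, SO_PEERCGROUP, cgroup, &len) == 0) {
            /* step 5: the kernel names the connecting peer's cgroup */
            struct container_meta *m = query_docker_metadata(cgroup); /* step 6 */
            serve_with_policy(connfd, m);                             /* step 7 */
        }
    #endif
        close(connfd);
    }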

HTH

Dmitri

Comment 25 Vivek Goyal 2014-02-24 20:18:46 UTC
(In reply to Simo Sorce from comment #23)
> 
> It really is exactly the same as with SO_PEERCRED: what we care about are
> the original credentials of the process that opened the socket; those bind
> our "trust level" in the process.
> 
> For the docker case, assume we have 2 containers belonging to 2 competitors:
> 
> Foo Inc and Bar LLC
> 
> What matters for us is to know that process A was executed in the container
> associated with Foo Inc and not Bar LLC, so that we send information about
> Foo Inc only.
> 
> That information is encoded in the container name (through a mapping of
> container names in cgroups to who owns them; details out of scope here).
> 
> If the Foo Inc container process suddenly decides to betray its own company
> and hand all the data to a process belonging to Bar LLC by passing it a
> socket, that is not our problem.

Thanks. This explanation makes a lot more sense.

Vivek

Comment 26 Vivek Goyal 2014-02-24 20:47:14 UTC
(In reply to Vivek Goyal from comment #22)
> 
> What I meant was that it looks like the kernel will have a reference to the
> "struct pid", and that will be dropped when the server closes the socket fd
> (which it must have received upon successful completion of accept()).
> 

Ok, reading the code a little bit more: in short, it looks like even while we
hold a reference to the "struct pid", that does not mean the associated pid_t
has not been freed and can't be reused.

It looks like the pid_t can be freed when the task is exiting while the
"struct pid" is still around. If that's the case, the race described above
will be present.

Comment 27 Lennart Poettering 2014-02-24 21:27:49 UTC
I am pretty sure SO_PEERCGROUPS is pretty useless; it's not any better than just combining SO_PEERCRED with reading /proc/$PID/cgroup.

What would be really interesting, though, is Jan Kaluza's patches for SCM_CGROUPS, since they fix a real race: when we want to know the cgroup membership of the sender of a syslog message at the time it sent it, that is the only race-free way to do it.

I'd really like to see Jan's work finished, it would be of great benefit to the journal.

Also see bugs 963620, 1026830

Comment 28 Vivek Goyal 2014-02-24 21:53:13 UTC
(In reply to Lennart Poettering from comment #27)
> I am pretty sure SO_PEERCGROUPS is pretty useless; it's not any better than
> just combining SO_PEERCRED with reading /proc/$PID/cgroup.

Why do you think it is useless? It tells you the cgroup of the process that
opened the connection. It is better than /proc/$PID/cgroup as it does not seem
susceptible to the pid reuse race.

But it seems to be limited to stream unix sockets only, so it seems less
useful compared to SCM_CREDENTIALS or SCM_CGROUPS.

IIUC, SO_PEERCRED is relatively lightweight as it is a property of the
connection captured when the connection is opened, while all the SCM_* options
are per-message properties of the client, so they also incur more overhead.

Dave Miller does not seem to like Jan's patches because he is worried about
the overhead they will incur for all the other usages which don't require
SCM_CGROUP and friends.

https://lkml.org/lkml/2014/1/15/480

If SO_PEERCRED is useful for the kind of application we are thinking of, it
might impose less overhead, as it is a per-connection property and not a
per-message property.

> 
> What would be really interesting though is Jan Kaluza's patches for
> SCM_CGROUPS, since they fix a real race: when we want to know the cgroup
> membership of a sender of a syslog message at the time when it sent it
> that's the only race-free way to do that.
> 
> I'd really like to see Jan's work finished, it would be of great benefit to
> the journal.
>

If the requirement is being able to get credentials per message, even on
datagram sockets, then yes, SO_PEERCRED is no good. I am not sure, though, how
one would convince Dave Miller.

SO_PASSCRED seems to be a server-side option which decides whether the server
wants to receive ancillary messages or not (see the sketch below). I am
wondering if we could create a client-side socket option where the client opts
in to sending SCM_CGROUP; the client would somehow need to know that it is
operating in a container environment and that, to get service, it needs to
enable the SCM_CGROUP option.

That way, existing applications in existing setups will not incur the penalty
of these new ancillary messages.
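To make the existing mechanism concrete, here is a minimal sketch of the server-side opt-in with SO_PASSCRED and reading SCM_CREDENTIALS from each message's ancillary data; an SCM_CGROUP message would presumably ride the same recvmsg() path (illustrative, not from Jan's patches):

    #define _GNU_SOURCE             /* for struct ucred */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int recv_with_creds(int sock, char *buf, size_t buflen)
    {
        int on = 1;
        struct iovec iov = { .iov_base = buf, .iov_len = buflen };
        char cbuf[CMSG_SPACE(sizeof(struct ucred))];
        struct msghdr msg = { 0 };
        struct cmsghdr *cmsg;

        /* Server-side opt-in: kernel attaches the sender's creds per message. */
        setsockopt(sock, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on));

        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof(cbuf);

        if (recvmsg(sock, &msg, 0) < 0)
            return -1;

        for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET &&
                cmsg->cmsg_type == SCM_CREDENTIALS) {
                struct ucred u;
                memcpy(&u, CMSG_DATA(cmsg), sizeof(u));
                printf("sender pid=%d uid=%d\n", (int)u.pid, (int)u.uid);
            }
        }
        return 0;
    }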

> Also see bugs 963620, 1026830

I will look at these bugs.

Comment 29 Lennart Poettering 2014-02-24 21:57:54 UTC
(In reply to Vivek Goyal from comment #28)
> (In reply to Lennart Poettering from comment #27)
> > I am pretty sure SO_PEERCGROUPS is pretty useless; it's not any better
> > than just combining SO_PEERCRED with reading /proc/$PID/cgroup.
> 
> Why do you think it is useless? It tells you the cgroup of the process that
> opened the connection. It is better than /proc/$PID/cgroup as it does not
> seem susceptible to the pid reuse race.
> 
> But it seems to be limited to stream unix sockets only, so it seems less
> useful compared to SCM_CREDENTIALS or SCM_CGROUPS.
> 
> IIUC, SO_PEERCRED is relatively lightweight as it is a property of the
> connection captured when the connection is opened, while all the SCM_*
> options are per-message properties of the client, so they also incur more
> overhead.
> 
> Dave Miller does not seem to like Jan's patches because he is worried about
> the overhead they will incur for all the other usages which don't require
> SCM_CGROUP and friends.
> 
> https://lkml.org/lkml/2014/1/15/480

Note the suggestions Kay made:

https://lkml.org/lkml/2014/1/23/471

I.e., this could certainly be made opt-in without too much effort. I'd really like to see that implemented. With that logic, journald could enable this, but nobody else would have to pay the price for it.

Comment 30 Jaroslav Reznik 2015-03-03 15:28:26 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 31 Justin M. Forbes 2015-10-20 19:07:20 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through, and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 32 Fedora Kernel Team 2015-11-23 17:08:02 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

