Bug 1462371

Summary: systemd-nspawn cannot change user with EL6 chroot
Product: Red Hat Enterprise Linux 7
Reporter: Orion Poplawski <orion>
Component: systemd
Assignee: Jan Synacek <jsynacek>
Status: CLOSED CANTFIX
QA Contact: qe-baseos-daemons
Severity: medium
Priority: unspecified
Version: 7.3
CC: fweimer, jskarvad, jsynacek, orion, pemensik, ralston, systemd-maint-list
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2018-10-08 10:32:44 UTC
Type: Bug

Description Orion Poplawski 2017-06-16 21:37:12 UTC
Description of problem:

# systemd-nspawn -D /var/lib/mock/epel-6-x86_64/root -u mockbuild /bin/bash
Spawning container root on /var/lib/mock/epel-6-x86_64/root.
Press ^] three times within 1s to kill container.
Container root failed with error code 1.

strace -f reveals:

[pid 29988] execve("/usr/bin/getent", ["getent", "initgroups", "mockbuild"], [/* 0 vars */]) = 0
...
[pid 29988] write(2, "Unknown database: initgroups\n", 29) = 29
[pid 29988] write(1, "Try `getent --help' or `getent -"..., 62) = 62
[pid 29988] exit_group(1)               = ?

Version-Release number of selected component (if applicable):
systemd-219-30.el7_3.9.x86_64

The latest mock uses systemd-nspawn by default, so this causes problems with EL6 builds.

Comment 2 Jan Synacek 2017-06-29 05:37:26 UTC
There's no 'initgroups' database support in RHEL-6. That piece of code should probably be rewritten to use getgrouplist() directly, as spawning getent and then parsing the result seems a bit unnecessary. However, I'm not sure if that would help in case of RHEL-6.
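
One way to check whether a given chroot's getent knows the 'initgroups' database is sketched below. This is only an illustration: the function name getent_has_initgroups is made up here, and the check relies on getent(1) documenting exit status 1 as "missing arguments, or database unknown", which matches the exit_group(1) in the strace above. Any other exit status is treated as "supported", which is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of mock or systemd).
# "$@" is the command prefix used to reach getent, for example:
#   getent_has_initgroups chroot /var/lib/mock/epel-6-x86_64/root getent
# Returns success unless getent exits with status 1, the status that
# getent(1) documents for an unknown database.
getent_has_initgroups() {
    "$@" initgroups root >/dev/null 2>&1
    [ "$?" -ne 1 ]
}
```

On the host, "getent_has_initgroups getent" checks the host's own getent; running it against a chroot needs root privileges.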

Comment 3 Jan Synacek 2017-06-29 05:40:10 UTC
(In reply to Jan Synacek from comment #2)
> However, I'm not sure if that would help in case of RHEL-6.

The getgrouplist() function is present in RHEL-6.

Comment 4 Jan Synacek 2017-06-30 08:57:24 UTC
Orion, could you please try packages from https://jsynacek.fedorapeople.org/systemd/bz1462371/ ? I patched nspawn to not use getent.

Comment 5 Jan Synacek 2017-06-30 09:03:41 UTC
(In reply to Jan Synacek from comment #4)
> not use getent.

... not use 'getent initgroups' that is.

Comment 6 Jan Synacek 2017-06-30 11:31:26 UTC
https://github.com/systemd/systemd/pull/6248

Comment 7 Orion Poplawski 2017-06-30 20:45:25 UTC
That seems to do the trick. I get some other error messages, but the command succeeds:

# systemd-nspawn -D /var/lib/mock/epel-6-x86_64/root -u mockbuild /bin/bash
Spawning container root on /var/lib/mock/epel-6-x86_64/root.
Press ^] three times within 1s to kill container.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
[mockbuild@barry /]$ 

Thanks for the quick response!

Comment 8 Jan Synacek 2017-07-03 06:59:10 UTC
My patch was denied upstream, but I still think that it's safe for RHEL7. The arguments against this patch:

1) Host and container differ in archs.
   I don't really get this one. Such a container wouldn't run on the host without some kind of weird magic in the in-between layers.

2) NSS modules might be linked against a different libc.
   Not applicable / supported on RHEL-7 as far as I know.

3) The container's /etc/nsswitch.conf might list a very different set of modules than the host's, and if the configuration was already loaded before the transition into the container namespace, the lookups will get confused.
   This might be a problem, but I can't think of a way to break nspawn using this. Furthermore, "if the configuration is already loaded" is way too speculative. It either is, or is not loaded.

If we ever manage to show that the patch is still a bad idea in RHEL-7, we can write a small binary that does exactly what "getent initgroups" does and ship it. Or the best solution: persuade the glibc team to backport the getent functionality.

Comment 9 Jan Synacek 2017-07-03 07:02:16 UTC
Florian, would it be possible to backport "getent initgroups" to RHEL-7? It looks just like a tiny wrapper around getgrouplist() to me.

Comment 10 Jan Synacek 2017-07-03 07:08:39 UTC
Orion, I'm just curious: do you also have the user "mockbuild" on the host, or only inside the container?

Comment 11 Florian Weimer 2017-07-03 08:55:02 UTC
(In reply to Jan Synacek from comment #9)
> Florian, would it be possible to backport "getent initgroups" to RHEL-7? It
> looks  just like a tiny wrapper to getgrouplist() to me.

What exactly do you need?  I see this on Red Hat Enterprise Linux 7.3:

$ getent initgroups fweimer
fweimer               1076 1070 5356 5845 18797 235 5319

Comment 12 Jan Synacek 2017-07-03 10:10:04 UTC
Oh, my bad... I meant RHEL-6. And to answer myself: Since RHEL-6 is in Production Phase 3 now, the backport is not going to happen. Sorry for the confusion.

Comment 13 Jan Synacek 2017-07-03 10:14:07 UTC
To work around the original problem, you would have to create a /usr/local/bin/getent wrapper in the container that calls the real getent for everything except the "initgroups" argument. The initgroups lookup would then have to be implemented in the script itself. But that's way too hairy.
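
For illustration only, such a wrapper might look roughly like the sketch below. It is not a tested fix. The function names, the REAL_GETENT variable and the use of "id -G" to obtain the group list are all assumptions made for this example. The output shape (user name, then space-separated GIDs) follows the "getent initgroups" sample shown in comment 11. Note that "id -G" also lists the primary group, which the real "getent initgroups" may omit, and that the strace in the description shows nspawn executing /usr/bin/getent, so the install location may need adjusting.

```shell
#!/bin/sh
# Hypothetical getent wrapper for old chroots (illustrative sketch).
# REAL_GETENT is an assumed variable pointing at the distribution's getent.
REAL_GETENT=${REAL_GETENT:-/usr/bin/getent}

# Print "<user> <gid> <gid> ..." in the shape of "getent initgroups" output.
format_initgroups() {
    user=$1
    shift
    printf '%-21s' "$user"
    for gid in "$@"; do
        printf ' %s' "$gid"
    done
    printf '\n'
}

# Emulate the initgroups lookup with id -G (assumption: close enough
# for a caller that only reads the GIDs after the first word).
emulate_initgroups() {
    user=$1
    gids=$(id -G "$user" 2>/dev/null) || return 2
    # Word splitting of $gids is intentional: one argument per GID.
    format_initgroups "$user" $gids
}

# Handle "initgroups" locally and pass everything else to the real getent.
getent_wrapper() {
    if [ "$1" = "initgroups" ]; then
        shift
        rc=0
        for user in "$@"; do
            emulate_initgroups "$user" || rc=2
        done
        return "$rc"
    fi
    "$REAL_GETENT" "$@"
}

# Only act as a command when installed under the name "getent".
case "${0##*/}" in
    getent) getent_wrapper "$@" ;;
esac
```

Installed under the name getent, the script answers "getent initgroups mockbuild" itself and forwards calls such as "getent passwd mockbuild" unchanged.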

Comment 14 Jan Synacek 2017-07-03 10:15:42 UTC
(In reply to Jan Synacek from comment #8)
> If we ever manage to show that the patch is still a bad idea in RHEL-7, we
> can write a small binary that exactly what "getent initgroups" does and ship
> it. Or the best solution - persuade glibc team to backport the getent
> functionality.

To make things clear, the above would have to be done for RHEL-6 in this case, which is not possible anymore. Again, sorry for the confusion.

Comment 16 James Ralston 2017-07-21 00:04:21 UTC
This problem isn't limited to RHEL6. We have RHEL5 ELS, and are still building RHEL5 packages. And systemd-nspawn breaks RHEL5 mock builds, too.

Here's the fundamental issue: when systemd-nspawn executes a command (instead of a full OS instance), it assumes that the filesystem in the container is based on a Linux distro that is reasonably recent.

But mock is a tool to build packages for arbitrary Linux distributions, including ones that aren't in any way recent (like RHEL5).

Therefore, mock MUST NOT use systemd-nspawn to execute build commands. systemd-nspawn is not simply a "better" or "more powerful" flavor of chroot; it is a DIFFERENT flavor of chroot. And those differences break mock.

If the mock developers really want to use systemd-nspawn to perform builds, then they need to do it the right way: build a full OS image and use "systemd-nspawn --boot".

If they're unwilling to do that, then mock should revert to just using chroot(1) to perform builds.

Hacking systemd to not break mock's [mis]use of systemd-nspawn to execute simple commands is the wrong approach. This is a mock bug, not a systemd bug.

Comment 17 Jan Synacek 2017-07-24 08:00:19 UTC
(In reply to James Ralston from comment #16)
> Therefore, mock MUST NOT use systemd-nspawn to execute build commands.

Feel free to file a bugzilla for mock. I can't do anything with that in this one.

Comment 18 Petr Menšík 2018-01-17 15:45:26 UTC
Bug #1535550 for mock was filed.

Comment 19 Jaroslav Škarvada 2018-01-17 15:50:09 UTC
(In reply to Petr Menšík from comment #18)
> Bug #1535550 for mock was filed.

It's filed against rhpkg.