Bug 961052 - lvm2-activation-generator causes errors inside containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Peter Rajnoha
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 927331
Depends On:
Blocks: 987753
Reported: 2013-05-08 15:54 UTC by Daniel Berrangé
Modified: 2013-07-24 05:48 UTC (History)

Fixed In Version: lvm2-2.02.98-9.fc19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 987753
Environment:
Last Closed: 2013-05-24 20:37:58 UTC
Type: Bug
Embargoed:


Attachments
Strace of /usr/lib/systemd/system-generators/lvm2-activation-generator (50.47 KB, text/plain)
2013-05-09 09:30 UTC, Daniel Berrangé
Skip generator if inside a container (1.31 KB, patch)
2013-05-09 09:38 UTC, Daniel Berrangé

Description Daniel Berrangé 2013-05-08 15:54:36 UTC
Description of problem:
When starting a Linux container sandbox on Fedora 19, we get permission denied errors about '/dev/mapper/control'

# virt-sandbox-service start myhttpd
RTNETLINK answers: Invalid argument
systemd 202 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Detected virtualization 'lxc-libvirt'.

Welcome to Fedora 19 (Schrödinger’s Cat)!

Set hostname to <myhttpd>.
  /dev/mapper/control: mknod failed: Operation not permitted
  Failure to communicate with kernel device-mapper driver.
  Check that device-mapper is available in the kernel.


Running strace revealed that this comes from

/usr/lib/systemd/system-generators/lvm2-activation-generator

With containers the 'mknod' capability is blocked from everything inside the container, including systemd and its helper programs, since containers are not allowed to create device nodes. Any allowed device nodes are pre-created by libvirt, and this does *not* include /dev/mapper/control.

Systemd itself is careful to look at whether it has the 'mknod' capability before attempting to run something that requires that privilege, so it doesn't cause boot errors.

The lvm2-activation-generator is not being so careful though.  We need it to be fixed so that it is silent + error-free when run in a container environment where /dev/mapper/control does not exist & is not allowed to be created.


Version-Release number of selected component (if applicable):
lvm2-2.02.98-7.fc19.x86_64

How reproducible:
Always

Steps to Reproduce:
1. virt-sandbox-service create -C -u httpd.service myhttpd
2. virt-sandbox-service start myhttpd
  
Actual results:
During container start the following errors appear

  /dev/mapper/control: mknod failed: Operation not permitted
  Failure to communicate with kernel device-mapper driver.
  Check that device-mapper is available in the kernel.


Expected results:
No errors from LVM

Additional info:

Comment 1 Alasdair Kergon 2013-05-08 16:53:25 UTC
So why is this trying to use lvm in a container, if you know it isn't allowed?  Shouldn't you be excluding lvm?

Comment 2 Alasdair Kergon 2013-05-08 16:56:07 UTC
IOW the error message seems correct: you're trying to run something that you know isn't going to work in the restricted environment and it's correctly reporting an error.  If you don't want the error, don't run any lvm code!

Comment 3 Daniel Berrangé 2013-05-08 16:59:19 UTC
We're not trying to use LVM in the container. The LVM RPM has put this binary into the /lib/systemd/system-generators directory which causes it to be run unconditionally by systemd whether the admin/sandbox wants LVM or not. Binaries which are installed as generators need to respect the environment that they are run in, and not throw errors like those shown above.

Comment 4 Alasdair Kergon 2013-05-08 17:03:55 UTC
So which component controls the set up of the sandbox?  virt- something?  and why is that then including lvm?

Comment 5 Alasdair Kergon 2013-05-08 17:06:02 UTC
- you're saying lvm *cannot* possibly work in the sandbox, so why hasn't virt- whatever been configured to exclude it?

Comment 6 Alasdair Kergon 2013-05-08 17:18:18 UTC
Put this another way.

Can this problem be reproduced *without* the complication of virt- sandboxes?

IOW Is there some normal system environment where the activation generator is doing something wrong?

And where are the errors you reported coming from?
- Are they coming from the activation generator itself, or are they appearing later on during boot as a result of the *output* of the activation generator being used?
(Perhaps you could attach the relevant strace?)

Comment 7 Alasdair Kergon 2013-05-08 17:20:53 UTC
And if you want messages suppressed specially in some virt- sandbox, how would you anticipate the code should identify this is the case?  And is it just messages, or is some error state returned that also you would need returning 'success'?

Comment 8 Alasdair Kergon 2013-05-08 17:41:42 UTC
Options:

1)  Fix virt- not to put/use things in the sandbox that it knows can't possibly work and it doesn't even want;

2)  Show that the lvm2 activation generator sometimes does the wrong things on a normal system and get that fixed;

3)  Find that some special code is required for lvm2 to work with virt- sandboxes and turn this into an RFE from virt- that specifies how such sandboxes should be detected and have lvm2 add the special code just for use in these cases (without affecting normal systems).

Comment 9 Daniel Berrangé 2013-05-08 18:40:34 UTC
(In reply to comment #8)
> Options:
> 
> 1)  Fix virt- not to put/use things in the sandbox that it knows can't
> possibly work and it doesn't even want;

The LVM2 RPM can be pulled in via a dependency of some package, even if the admin is not wishing/intending to use LVM in a particular deployment. As such we cannot assume that existence of the LVM2 RPM on a system implies that LVM2 is able/going to be used. 

> 2)  Show that the lvm2 activation generator sometimes does the wrong things
> on a normal system and get that fixed;
> 
> 3)  Find that some special code is required for lvm2 to work with virt-
> sandboxes and turn this into an RFE from virt- that specifies how such
> sandboxes should be detected and have lvm2 add the special code just for use
> in these cases (without affecting normal systems).

The sandbox creation tools do take care to configure systemd unit files such that only desired features are started. The problem is that the generator programs are not systemd configuration items. They are programs for auto-generating unit files, and it is not possible to turn individual generators on/off via configuration options. Because the generator programs are not configurable items, they need to take care to operate without error in any environment in which systemd can be run, which includes containers. 

This is not a libvirt / sandbox specific problem - it can be demonstrated via systemd-nspawn too and will affect any other Linux virtualization technology which blocks the CAP_MKNOD capability (which they all want to do, to ensure security of the container).

# yum -y --releasever=19 --nogpg --installroot=/srv/mycontainer --disablerepo='*' --enablerepo=fedora install systemd passwd yum fedora-release vim-minimal lvm2
# systemd-nspawn -bD /srv/mycontainer
Spawning namespace container on /srv/mycontainer (console is /dev/pts/9).
Init process in the container running as PID 10591.
systemd 202 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Detected virtualization 'systemd-nspawn'.

Welcome to Fedora 19 (Schrödinger’s Cat)!

  /dev/mapper/control: mknod failed: Operation not permitted
  Failure to communicate with kernel device-mapper driver.
  Check that device-mapper is available in the kernel.
Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory. See system logs and 'systemctl status display-manager.service' for details.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Reached target Encrypted Volumes.
[  OK  ] Listening on Journal Socket.
         Mounting Huge Pages File System...
         Mounting Debug File System...
         Starting Journal Service...
...snip rest of boot messages...

Comment 10 Alasdair Kergon 2013-05-09 00:53:04 UTC
We work on the principle of always providing errors except in exactly-defined situations where they are known *never* to be needed.

If lvm is to suppress them here, we need first to know exactly what produces the errors (please attach the relevant part of the strace), then we can start to try to answer point (2) (which we can't yet as we don't know what the tool is doing when they appear - maybe it's running unnecessary code even in the normal case), and then define these *precise* circumstances.

"Because the generator programs are not configurable items, they need to take care to operate without error in any environment in which systemd can be run, which includes containers."

The *intent* not to use lvm in the container needs to be passed through somehow: what information about the container is available for query?  (Someone else might have a container where they *do* want to use lvm and do want to see any error messages, surely?)

Elsewhere, for example, we handled this matter of intent by checking for a special environment variable.

Comment 11 Daniel Berrangé 2013-05-09 09:30:56 UTC
Created attachment 745591 [details]
Strace of /usr/lib/systemd/system-generators/lvm2-activation-generator

The interesting part of the trace is

stat("/dev/mapper/control", 0x7fff0f3368c0) = -1 ENOENT (No such file or directory)
umask(022)                              = 077
stat("/dev/mapper", 0x7fff0f3367d0)     = -1 ENOENT (No such file or directory)
mkdir("/dev", 0777)                     = -1 EEXIST (File exists)
mkdir("/dev/mapper", 0777)              = 0
umask(077)                              = 022
umask(0177)                             = 077
mknod("/dev/mapper/control", S_IFCHR|0600, makedev(10, 236)) = -1 EPERM (Operation not permitted)
write(2, "  ", 2)                       = 2
write(2, "/dev/mapper/control: mknod failed: Operation not permitted", 58) = 58
write(2, "\n", 1)                       = 1
write(2, "  ", 2)                       = 2
write(2, "Failure to communicate with kernel device-mapper driver.", 56) = 56
write(2, "\n", 1)                       = 1
geteuid()                               = 0
write(2, "  ", 2)                       = 2
write(2, "Check that device-mapper is available in the kernel.", 52) = 52
write(2, "\n", 1)                       = 1

which corresponds to the code in _open_control and _create_control in libdm/ioctl/libdm-iface.c

Comment 12 Daniel Berrangé 2013-05-09 09:34:04 UTC
(In reply to comment #10)
> The *intent* not to use lvm in the container needs to be passed through
> somehow: what information about the container is available for query? 

Per the Systemd container interface specification, containers do not have CAP_MKNOD permission and the container manager will pre-create any device nodes in /dev that are permitted to be used. A container can be identified by the existence of the  'container' environment variable.

  http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface

So if the admin intends to use LVM they will have ensured /dev/mapper/control exists. If it doesn't exist inside a container, then LVM should not try to create it, since that will fail due to aforementioned lack of CAP_MKNOD.
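
The rule described above can be sketched roughly as follows. This is an illustration in Python rather than lvm2's C code; the function name and its parameters are hypothetical, and only the 'container' variable and the /dev/mapper/control path come from the discussion.

```python
import os


def should_skip_control_node(environ, control_path="/dev/mapper/control"):
    """Return True when the control node is missing inside a container.

    Hypothetical helper: `environ` is a mapping such as os.environ, and a
    non-empty 'container' entry is taken to mean "running in a container".
    """
    in_container = bool(environ.get("container"))
    node_exists = os.path.exists(control_path)
    # Inside a container the manager pre-creates the permitted nodes, so a
    # missing node means device-mapper is not meant to be used here.
    return in_container and not node_exists
```

On bare metal there is no 'container' variable, so the function returns False and the existing mknod path would still run.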

Comment 13 Daniel Berrangé 2013-05-09 09:38:18 UTC
Created attachment 745593 [details]
Skip generator if inside a container

This is a proof of concept patch which causes  lvm2-activation-generator to be a no-op if /dev/mapper/control does not exist, and the program is running inside a container.

If it wanted to be super paranoid, then it'd also check whether it had CAP_MKNOD before exiting. LVM2 doesn't currently link to libcap.so though, so I didn't add such code in this proof of concept patch. Technically (per the systemd container spec) it should also be checking /proc/1/environ rather than its own getenv(), but getenv() is probably ok for systemd generators since they'll be fine inheriting the env.
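
The two extra checks mentioned above could probably be done without libcap by reading procfs. A rough sketch in Python (not part of the attached patch): the helper names are made up for illustration, while the CapEff field of /proc/<pid>/status, the NUL-separated format of /proc/<pid>/environ and the CAP_MKNOD bit number (27 in <linux/capability.h>) are standard Linux details.

```python
CAP_MKNOD = 27  # bit number from <linux/capability.h>


def has_cap_mknod(status_text):
    """Test the CAP_MKNOD bit in the CapEff line of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # value is a hex bitmask
            return bool(mask & (1 << CAP_MKNOD))
    return False  # no CapEff line found: assume the capability is absent


def parse_environ(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env
```

A caller would feed these the contents of /proc/self/status and /proc/1/environ respectively; reading the latter needs sufficient privilege, which a generator started by PID 1 would normally have.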

Comment 14 Alasdair Kergon 2013-05-09 12:22:10 UTC
Thanks for the information.  I still suspect number (2) is the problem though "Show that the lvm2 activation generator sometimes does the wrong things on a normal system and get that fixed;" and we might not even need to progress to (3).

Comment 15 Daniel Berrangé 2013-05-09 12:38:39 UTC
(In reply to comment #14)
> Thanks for the information.  I still suspect number (2) is the problem
> though "Show that the lvm2 activation generator sometimes does the wrong
> things on a normal system and get that fixed;" and we might not even need to
> progress to (3).

I guess that depends on who/what sets up /dev/mapper/control on a bare metal system. Given that the generators run before systemd launches any services, you'd probably have to be relying on the initrd to have created /dev/mapper/control. It isn't clear to me that this is a reasonable assumption, so it looks like the mknod(/dev/mapper/control) is needed in bare metal, but not wanted in containers.

Comment 16 Lennart Poettering 2013-05-09 12:59:07 UTC
What does the generator do with /dev/mapper/control anyway?

Generators are primarily intended to convert some form of foreign configuration files into native units, and beyond that there are only very few other uses for them. What precisely does LVM do from the generator?

Note that generators are run really really early at boot, where udev hasn't been run yet, hence /dev is not populated properly yet or at least access modes/yadda yadda have not been applied yet. More specifically this means that if DM can be compiled as a module (can it?) the device node won't exist at all. Only if DM has been built into the kernel will it have been created via devtmpfs.

So in general I have the suspicion that LVM's usage of a generator is not a good idea anyway.

Comment 17 Peter Rajnoha 2013-05-09 13:06:30 UTC
This is because of the lvm2app initialization that we use to open lvm.conf (reading lvm.conf also supports cascading, so it's better to do the parsing via the library).

There's a check during lvm2app init for the dm driver version (which in turn tries to open mapper/control). That check should be done later, so that it isn't triggered when doing simple things like reading the config file...

We're just inspecting this so we can postpone that driver check and do it later when really needed....

Comment 18 Lennart Poettering 2013-05-09 14:10:36 UTC
I see. I guess in that case it would be good to either move the version check somewhere out of the generator codepaths, or to make it fail silently if run in a container. Best would probably be to silently skip this thing if you get ENOENT. Another option is to check for CAP_MKNOD, as Dan suggested.
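
A minimal sketch of the "silently skip on ENOENT" idea, written in Python for brevity rather than in libdevmapper's C. The function name is hypothetical; the point is only that a missing node is treated as "nothing to do" while every other error is still reported.

```python
import errno
import os


def open_control_or_skip(path="/dev/mapper/control"):
    """Return an open file descriptor, or None if the node does not exist."""
    try:
        return os.open(path, os.O_RDWR)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # Node absent: stay silent and let the caller skip DM work.
            return None
        raise  # any other failure (EACCES, EPERM, ...) is still reported
```

Note that this variant never attempts mkdir or mknod, so it would not create /dev/mapper as a side effect.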

Comment 19 Peter Rajnoha 2013-05-09 14:22:31 UTC
I'll move the driver version check so it's done lazily on first use. That check is actually needed only if the activation code is used, and that code isn't hit if the only thing we need is to read the config.
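
The lazy check can be pictured with a toy model like the one below. It is a Python sketch, not the lvm2app API: the class, its methods and the check_driver callable are all hypothetical, and the 'key = value' parser merely stands in for real lvm.conf parsing.

```python
class DmSession:
    """Toy model of a library handle that defers the driver check."""

    def __init__(self, check_driver):
        # check_driver is a hypothetical callable standing in for the code
        # that opens mapper/control and queries the dm driver version.
        self._check_driver = check_driver
        self._driver_ok = None  # None means "not checked yet"

    def read_config(self, text):
        """Parse 'key = value' lines; never touches the driver."""
        config = {}
        for line in text.splitlines():
            key, sep, value = line.partition("=")
            if sep:
                config[key.strip()] = value.strip()
        return config

    def activate(self, volume):
        """Activation is the first point where the driver check runs."""
        if self._driver_ok is None:
            self._driver_ok = bool(self._check_driver())
        if not self._driver_ok:
            raise RuntimeError("device-mapper driver unavailable")
        return "activated " + volume
```

With this shape, a caller that only reads the configuration (as the generator does) never triggers the check, and the result is cached after the first activation.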

Comment 20 Peter Rajnoha 2013-05-13 09:57:29 UTC
An upstream patch that checks the dm version lazily (and prevents premature mapper/control access if no lvm2 activation code is run):

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=44071331138a7ca3a7d775d8a091933404ee7509

Comment 21 Fedora Update System 2013-05-14 11:31:49 UTC
lvm2-2.02.98-9.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/lvm2-2.02.98-9.fc19

Comment 22 Peter Rajnoha 2013-05-14 13:12:02 UTC
*** Bug 927331 has been marked as a duplicate of this bug. ***

Comment 23 Fedora Update System 2013-05-24 20:37:58 UTC
lvm2-2.02.98-9.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

