Bug 1770763 - libvirt + lxc = Failed to create symlink /sys/fs/cgroup/net_cls: Operation not permitted
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 32
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
Reported: 2019-11-11 10:08 UTC by lejeczek
Modified: 2020-11-18 16:49 UTC (History)
Last Closed: 2020-11-18 16:47:02 UTC
Type: Bug

Description lejeczek 2019-11-11 10:08:22 UTC
Description of problem:

$ virt-install  --connect lxc:/ --os-variant fedora30 --memory 2096 --filesystem /var/lib/lxc/chrome1/rootfs,/ --network network=default,model=virtio --graphics=vnc,listen=0.0.0.0 --video qxl --accelerate --cpuset 2 --cpu host-model --transient --import --name chrommee 
WARNING  Unable to connect to graphical console: virt-viewer not installed. Please install the 'virt-viewer' package.
WARNING  No console to launch for the guest, defaulting to --wait -1

Starting install...
ERROR    internal error: Unable to find 'devices' cgroups controller mount


$ virsh -c lxc:// start chrome1
error: Failed to start domain chrome1
error: internal error: Unable to find 'devices' cgroups controller mount


..
Nov 11 10:05:48 ccnr20180909 libvirtd[27045]: internal error: Unable to find 'devices' cgroups controller mount
..

Version-Release number of selected component (if applicable):

lxc-3.0.4-2.fc31.x86_64
libvirt-daemon-lxc-5.6.0-4.fc31.x86_64
libvirt-libs-5.6.0-4.fc31.x86_64
5.3.9-300.fc31.x86_64

Comment 1 Cole Robinson 2019-11-11 13:28:02 UTC
I believe cgroupv2 doesn't have a devices cgroup, and f31 uses cgroupv2 by default. I'm not sure whether lxc works in that case; maybe this failure just needs to be made non-fatal.

CCing phrdina
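
For anyone who wants to check this on their own host: on a cgroup v2 mount the available controllers are listed in the cgroup.controllers file, and "devices" is not among them because device access control is done through BPF programs instead. The following is only a rough sketch. The function names are made up for illustration, and the default path assumes the usual /sys/fs/cgroup mount point.

```python
from pathlib import Path


def available_controllers(controllers_text: str) -> set:
    """Split the space-separated contents of a cgroup.controllers file."""
    return set(controllers_text.split())


def has_devices_controller(controllers_text: str) -> bool:
    """True only if a controller literally named 'devices' is listed."""
    return "devices" in available_controllers(controllers_text)


def read_root_controllers(root: str = "/sys/fs/cgroup") -> str:
    """Return the file contents, or "" when the file cannot be read
    (for example when no cgroup v2 hierarchy is mounted at `root`)."""
    try:
        return (Path(root) / "cgroup.controllers").read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    text = read_root_controllers()
    print("controllers:", sorted(available_controllers(text)))
    print("devices listed:", has_devices_controller(text))
```

An empty controller list here most likely means the host is still on cgroup v1, where the file does not exist at that path.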

Comment 2 Daniel Berrangé 2019-11-11 14:08:52 UTC
We give containers a private /dev instance pre-populated with the devices they requested. By default they also lack mknod() permission, so that's two lines of defence. The devices cgroup is a third line of defence, covering the case where the container has been pointed at a pre-built image that already contains device nodes. We could/should mount such images with the "nodev" mount option to block that.

IOW, running without devices cgroup is possible/reasonable from a technical POV.

Comment 3 lejeczek 2019-11-17 11:20:35 UTC
Is there any workaround someone can suggest which would work for now?
This is rather urgent and the fact that it slipped through into F31 is a bit.. I'd say embarrassing?

many thanks, L.

Comment 4 Cole Robinson 2019-11-17 19:20:06 UTC
You can boot your host kernel with systemd.unified_cgroup_hierarchy=0 to get cgroupv1 back until libvirt is fixed
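
To confirm which hierarchy the host actually ended up with after a reboot, something like the sketch below may help. It classifies the layout from the contents of /proc/mounts and looks for the option on the kernel command line. The function names and the "unified"/"hybrid"/"legacy" labels are my own choices for illustration, and treating a bare option without "=" as true is an assumption about how systemd parses it.

```python
def cgroup_mode(mounts_text: str) -> str:
    """Classify the cgroup layout from /proc/mounts-style text.

    "unified": only cgroup2 mounts (cgroup v2)
    "hybrid":  cgroup v1 controllers plus a cgroup2 mount
    "legacy":  only cgroup v1 mounts
    "none":    no cgroup mounts found
    """
    has_v1 = False
    has_v2 = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        fs_type = fields[2]  # third column is the filesystem type
        if fs_type == "cgroup2":
            has_v2 = True
        elif fs_type == "cgroup":
            has_v1 = True
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "unified"
    if has_v1:
        return "legacy"
    return "none"


def unified_hierarchy_flag(cmdline_text: str):
    """Return True/False if systemd.unified_cgroup_hierarchy is present on
    the kernel command line, or None if it is absent. The last occurrence wins."""
    value = None
    for token in cmdline_text.split():
        key, sep, val = token.partition("=")
        if key == "systemd.unified_cgroup_hierarchy":
            # Assumption: a bare option with no "=value" counts as enabled.
            value = val.lower() in ("1", "yes", "true", "on") if sep else True
    return value


def _read(path: str) -> str:
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print("layout:", cgroup_mode(_read("/proc/mounts")))
    print("cmdline flag:", unified_hierarchy_flag(_read("/proc/cmdline")))
```

With the workaround applied, the flag should read False and the layout should no longer be "unified".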

Comment 5 Cole Robinson 2019-12-10 16:07:17 UTC
Issue still exists in f31. Error is different with upstream libvirtd using upstream libvirt_lxc binary:

error: internal error: guest failed to start: Failure in libvirt_lxc startup: failed to initialize device BPF map: Operation not permitted

Comment 6 Ryutaroh Matsumoto 2020-01-03 01:46:17 UTC
LXC (without libvirt) actually does work, and has since version 3.0.3 at the latest.
The problem is the lack of clear instructions on how to use
LXC on a Linux host booted with systemd.unified_cgroup_hierarchy=1.

I have filed a bug report against the Fedora documentation for the
lack of documentation at
https://bugzilla.redhat.com/show_bug.cgi?id=1787209

Comment 7 Ryutaroh Matsumoto 2020-01-03 22:30:26 UTC
This problem is fixed in 

[root@localhost ~]# dnf list --installed | grep lxc
libvirt-daemon-driver-lxc.x86_64 5.10.0-2.fc32 @rawhide 
libvirt-daemon-lxc.x86_64        5.10.0-2.fc32 @rawhide 
lxc.x86_64                       3.2.1-1.fc32  @rawhide 
lxc-libs.x86_64                  3.2.1-1.fc32  @rawhide 
lxc-templates.x86_64             3.2.1-1.fc32  @rawhide 

and should be closed. Specifically,

[root@localhost ~]# lxc-create -n fedora31test -t download -- -d fedora -r 31 -a amd64

[root@localhost ~]# virt-install --memory 2048  --connect lxc:/ --os-variant fedora31 --filesystem /var/lib/lxc/fedora31test/rootfs,/ --network none   --transient --import --name fedora31test

worked just fine.

Comment 8 Cole Robinson 2020-01-24 19:44:52 UTC
rawhide still does not work for me, I am getting the error from comment #5

  error: internal error: guest failed to start: Failure in libvirt_lxc startup: failed to initialize device BPF map: Operation not permitted

In stock F31 libvirt, there is no cgroupv2 'devices' cgroup support, so any attempt to interact with it fails immediately.
If that was the only problem, we could extend lxc_cgroup.c to skip any devices operations if the DEVICES cgroup is not available, with:

    if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES))
        return 0;

as is done for the qemu driver in qemu_cgroup.c.

However, in libvirt 5.10 Pavel added a cgroupv2 'devices' implementation based on eBPF, and I can't get
it to work with LXC. cgroups are set up inside the container context by libvirt_lxc, after dropping privileges
and doing some other container magic, and the eBPF map create call fails with the error above.

With this new support, virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES) returns true, but
actually trying to use the controller fails for LXC. So the workaround proposed above is not enough.

Pavel, Danpb, suggestions on how to proceed?

Comment 9 Ryutaroh Matsumoto 2020-01-25 10:02:25 UTC
(In reply to Cole Robinson from comment #8)
> rawhide still does not work for me, I am getting the error from comment #5

Today I again freshly installed Fedora Rawhide Server on qemu x86-64 and libvirt + lxc worked fine for me. Anyone can see the full transcript and qcow2 disk image at

https://drive.google.com/drive/folders/1BRs4rgFs3Kh5q9k2UgURfD788NxqZJIX

The root password of qemu disk image is root.

Comment 10 Cole Robinson 2020-01-28 20:22:20 UTC
(In reply to Ryutaroh Matsumoto from comment #9)
> (In reply to Cole Robinson from comment #8)
> > rawhide still does not work for me, I am getting the error from comment #5
> 
> Today I again freshly installed Fedora Rawhide Server on qemu x86-64 and
> libvirt + lxc worked fine for me. Anyone can see the full transcript and
> qcow2 disk image at
> 
> https://drive.google.com/drive/folders/1BRs4rgFs3Kh5q9k2UgURfD788NxqZJIX
> 
> The root password of qemu disk image is root.

Thanks for retesting. Indeed when I install a fresh rawhide VM and try your steps, LXC containers work!
But then I rebooted the VM and lxc containers fail like comment #5.

Turns out the culprit is firewalld. After 'systemctl stop firewalld', LXC containers will start
again. I don't know enough about bpf and firewalld/nftables usage of it to know if this
is expected or it is a bug somewhere.

Comment 11 Ryutaroh Matsumoto 2020-01-31 20:59:32 UTC
(In reply to Cole Robinson from comment #10)
> Thanks for retesting. Indeed when I install a fresh rawhide VM and try your
> steps, LXC containers work!
> But then I rebooted the VM and lxc containers fail like comment #5.

Umm... (after a dnf upgrade on Fedora Rawhide) I was able to run libvirt-lxc with firewalld running, as shown below:

[root@localhost ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
     Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2020-01-31 15:37:18 EST; 11min ago
       Docs: man:firewalld(1)
   Main PID: 576 (firewalld)
      Tasks: 2 (limit: 2303)
     Memory: 36.3M
        CPU: 8.202s
     CGroup: /system.slice/firewalld.service
             └─576 /usr/bin/python3 /usr/sbin/firewalld --nofork --nopid

Jan 31 15:37:00 localhost.localdomain systemd[1]: Starting firewalld - dynamic firewall daemon...
Jan 31 15:37:18 localhost.localdomain systemd[1]: Started firewalld - dynamic firewall daemon.
[root@localhost ~]# firewall-cmd --list-all
FedoraServer (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp1s0
  sources: 
  services: cockpit dhcpv6-client ssh
  ports: 
  protocols: 
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 
	
[root@localhost ~]# virt-install --memory 2048 --connect lxc:/ --os-variant fedora31 --filesystem /var/lib/lxc/fedora31/rootfs,/ --network none --transient --import --name fedora31test

Starting install...
Connected to domain fedora31test
Escape character is ^]
systemd v243.5-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization lxc-libvirt.
Detected architecture x86-64.

Welcome to Fedora 31 (Container Image)!

Set hostname to <fedora31>.
Couldn't move remaining userspace processes, ignoring: Input/output error
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[  OK  ] Started Forward Password R…uests to Wall Directory Watch.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Process Core Dump Socket.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Audit Socket.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on Network Service Netlink Socket.
         Mounting POSIX Message Queue File System...
         Mounting Kernel Debug File System...
         Mounting Temporary Directory (/tmp)...
Attaching device control BPF program to cgroup /sys/fs/cgroup/machine.slice/machine-lxc\x2d3868\x2dfedora31test.scope/system.slice/systemd-journald.service failed: Operation not permitted
Unit systemd-journald.service configures device ACL, but the local system doesn't seem to support the BPF-based device controller.
Proceeding WITHOUT applying ACL (all devices will be accessible)!
(This warning is only shown for the first loaded unit using device ACL.)

Comment 12 Cole Robinson 2020-01-31 21:23:28 UTC
Yeah doesn't look like firewalld is related, my bad. But I can still hit the error, though it seems related to when another container is already running:

[root@localhost ~]# virt-install --connect lxc:/// --init /bin/bash --memory 2048 --network none --transient --noautoconsole
Using default --name container1

Starting install...
Domain creation completed.
[root@localhost ~]# virt-install --connect lxc:/// --init /bin/bash --memory 2048 --network none --transient --noautoconsole
Using default --name container2

Starting install...
ERROR    internal error: guest failed to start: Failure in libvirt_lxc startup: failed to initialize device BPF map: Operation not permitted

Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
  virsh --connect lxc:/// start container2
otherwise, please restart your installation.
[root@localhost ~]# sudo virsh destroy container1
Domain container1 destroyed

[root@localhost ~]# virt-install --connect lxc:/// --init /bin/bash --memory 2048 --network none --transient --noautoconsole
Using default --name container1

Starting install...
Domain creation completed.

Comment 13 Ryutaroh Matsumoto 2020-01-31 23:04:51 UTC
(In reply to Cole Robinson from comment #12)
> Yeah doesn't look like firewalld is related, my bad. But I can still hit the
> error, though it seems related to when another container is already running:

Thanks. Yes, I agree and I was also able to reproduce the problem.
In addition, the same problem seems to exist with lxc-start -F...
So it can be a problem in LXC itself, independently of libvirt-lxc and virt-install.

Comment 14 Ryutaroh Matsumoto 2020-02-01 00:56:45 UTC
Sorry I was probably wrong.
Instead of lxc-copy, doing

lxc-create -n fedora31-1 -t download 
lxc-create -n fedora31-2 -t download

allowed me to run two Fedora31 containers simultaneously by lxc-start.
On the other hand, I observed the following. There seems to exist some
problem in libvirt-lxc....

[root@localhost ~]# lxc-start -n fedora31-1
[root@localhost ~]# lxc-start -n fedora31-2
[root@localhost ~]# lxc-info -n fedora31-1
Name:           fedora31-1
State:          RUNNING
PID:            1073
[root@localhost ~]# lxc-info -n fedora31-2
Name:           fedora31-2
State:          RUNNING
PID:            1106
[root@localhost ~]# lxc-stop fedora31-2
[root@localhost ~]# virt-install --memory 2048 --connect lxc:/ --os-variant fedora31 --filesystem /var/lib/lxc/fedora31-2/rootfs,/ --network none --transient --import --name fedora31-2

Starting install...
ERROR    internal error: guest failed to start: Failure in libvirt_lxc startup: failed to initialize device BPF map: Operation not permitted

Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
  virsh --connect lxc:/ start fedora31-2
otherwise, please restart your installation.

Comment 15 Pavel Hrdina 2020-02-26 15:12:43 UTC
This seems to be related to BZ 1807090, and the solution should be the same. I've just posted an upstream patch to fix this.

https://www.redhat.com/archives/libvir-list/2020-February/msg01086.html

Comment 16 Michal Privoznik 2020-02-27 12:01:28 UTC
Pushed upstream:

b379fee117 daemon: set default memlock limit for systemd service

v6.1.0-rc1-4-gb379fee117
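
Going by the commit title, the fix raises the locked-memory limit of the libvirtd service. As far as I know, kernels of that era charged BPF map memory against RLIMIT_MEMLOCK and refused map creation with "Operation not permitted" once the limit was reached, which would be consistent with the second container failing in comment 12. The sketch below reads the limit a running process actually has from /proc/<pid>/limits. The function names are made up for illustration, and the value the patch sets is not stated in this bug, so none is assumed here.

```python
from pathlib import Path


def parse_proc_limits(limits_text: str, name: str = "Max locked memory"):
    """Return (soft, hard) as strings for the named row of a
    /proc/<pid>/limits table, or None if the row is missing.
    Values are either a byte count or the word 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith(name):
            fields = line[len(name):].split()
            if len(fields) >= 2:
                return fields[0], fields[1]
    return None


def memlock_of(pid: str = "self"):
    """Read the locked-memory limits of a process; `pid` may be a PID
    string or 'self'. Returns None if the file cannot be read."""
    try:
        text = Path("/proc", pid, "limits").read_text()
    except OSError:
        return None
    return parse_proc_limits(text)


if __name__ == "__main__":
    # Pass the PID of libvirtd instead of "self" to inspect the daemon.
    print("Max locked memory (soft, hard):", memlock_of("self"))
```

Comparing the daemon's values before and after upgrading should show whether the new default took effect.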

Comment 17 Ben Cotton 2020-11-03 15:47:31 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 18 Alex Villacís Lasso 2020-11-04 16:21:19 UTC
I am still unable to run LXC containers under Fedora 32 with virt-manager and kernel-5.8.17-200.fc32.x86_64 unless I add systemd.unified_cgroup_hierarchy=0 to the kernel parameters.

Comment 19 Michal Privoznik 2020-11-05 15:33:30 UTC
Can you please try Fedora 33? There were some fixes around that area.

Comment 20 Alex Villacís Lasso 2020-11-05 16:09:16 UTC
I would like to update to Fedora 33 myself, but I am still waiting on an issue with VirtualBox that prevents me from upgrading: https://www.virtualbox.org/ticket/19989

Meanwhile, I thought Fedora 32 was still supported, and not yet EOL, so a solution for Fedora 32 should exist as well.

Comment 21 Cole Robinson 2020-11-08 20:44:13 UTC
Alex, you can grab newer libvirt+qemu from the virt-preview repo: https://fedoraproject.org/wiki/Virtualization_Preview_Repository

This is similar to getting f33 virt on top of f32. It would be helpful to report if that fixes your issues or not

f32 is still supported, but the amount of work in this area is difficult to backport, and we don't update libvirt version in stable releases, so it may not be fixed in f32

Comment 22 Alex Villacís Lasso 2020-11-09 16:17:41 UTC
(In reply to Cole Robinson from comment #21)
> Alex, you can grab newer libvirt+qemu from the virt-preview repo:
> https://fedoraproject.org/wiki/Virtualization_Preview_Repository
> 
> This is similar to getting f33 virt on top of f32. It would be helpful to
> report if that fixes your issues or not
> 
> f32 is still supported, but the amount of work in this area is difficult to
> backport, and we don't update libvirt version in stable releases, so it may
> not be fixed in f32

I have just updated to the packages provided by this repository and it does NOT fix the bug in any way.

When opening the LXC console window, this is the only text that appears in the window, in red font:

Failed to create symlink /sys/fs/cgroup/net_cls: Operation not permitted

Comment 23 Pavel Hrdina 2020-11-18 16:47:02 UTC
(In reply to Alex Villacís Lasso from comment #22)
> (In reply to Cole Robinson from comment #21)
> > Alex, you can grab newer libvirt+qemu from the virt-preview repo:
> > https://fedoraproject.org/wiki/Virtualization_Preview_Repository
> > 
> > This is similar to getting f33 virt on top of f32. It would be helpful to
> > report if that fixes your issues or not
> > 
> > f32 is still supported, but the amount of work in this area is difficult to
> > backport, and we don't update libvirt version in stable releases, so it may
> > not be fixed in f32
> 
> I have just updated to the packages provided by this repository and it does
> NOT fix the bug in any way.
> 
> When opening the LXC console window, this is the only text that appears in
> the window, in red font:
> 
> Failed to create symlink /sys/fs/cgroup/net_cls: Operation not permitted

So my guess is that your host OS has cgroups v2 enabled and the OS inside the
container is old and doesn't know anything about cgroups v2. An example would be
trying to start a CentOS 7 container on Fedora 32 with cgroups v2 enabled.

In this case there is nothing that libvirt can do to make this work, because
the OS inside the container doesn't know there is anything other than cgroups v1
and blindly assumes it will work. Your only options are:

  - switch back to cgroups v1 in your host OS,

  - stop using old OSes inside your containers (probably not possible),

  - configure the OS inside the container to not join cgroup controllers together;
    for example, on a systemd-based OS you can edit "/etc/systemd/system.conf" and
    make sure that the option "JoinControllers=" is empty.

Because this is a known limitation regarding cgroups and controllers I'm closing
this BZ as CANTFIX.

Pavel
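
For the third option, a quick way to see what a container image currently has configured is to parse its system.conf. This is only a sketch: the function name is made up for illustration, and it assumes the file follows the usual INI-like layout with the option in the [Manager] section. Point it at the file inside the container's root filesystem rather than the host's copy.

```python
def join_controllers_setting(conf_text: str):
    """Return the JoinControllers= value from system.conf-style text.

    None  -> not set (absent or commented out), so the built-in default applies
    ""    -> explicitly empty, the state suggested in the comment above
    other -> the configured controller groups
    """
    value = None
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, val = line.partition("=")
        if sep and section == "Manager" and key.strip() == "JoinControllers":
            value = val.strip()  # a later assignment overrides an earlier one
    return value


if __name__ == "__main__":
    example = "[Manager]\n#LogLevel=info\nJoinControllers=\n"
    print(repr(join_controllers_setting(example)))
```

A return value of None means the line still has to be added or uncommented with an empty value.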

Comment 24 Pavel Hrdina 2020-11-18 16:49:19 UTC
I'll update the subject to the issue mentioned in comment 22 as users will probably
hit this one on a regular basis with old OSes.

