Bug 1946982

Summary: unknown capability CAP_PERFMON
Product: Red Hat Enterprise Linux 8
Reporter: lejeczek <peljasz>
Component: libcap
Assignee: Zoltan Fridrich <zfridric>
Status: CLOSED ERRATA
QA Contact: Martin Zelený <mzeleny>
Severity: urgent
Docs Contact:
Priority: medium
Version: CentOS Stream
CC: bstinson, carl, chref, daniel.vanderster, dapospis, djuarezg, esa, jlayton, johfulto, joseph.tingiris, jwboyer, mail, max.reininghaus, michele, morgan, mzeleny, pasteur, rkitover, rsroka
Target Milestone: beta
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libcap-2.26-5.el8
Doc Type: Bug Fix
Doc Text:
Cause: The header file capability.h does not contain a definition of the CAP_PERFMON capability.
Consequence: Tools that use the CAP_PERFMON capability might not work and might print an error saying that this capability is unknown.
Fix: Add the CAP_PERFMON definition to capability.h.
Result: Tools that use CAP_PERFMON now have a definition of this capability.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-09 20:06:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Adds CAP_PERFMON capability none

Description lejeczek 2021-04-07 12:36:37 UTC
Description of problem:

Possibly a libcap issue?
I see the problem when running "rootless".

many thanks, L.

Version-Release number of selected component (if applicable):

podman-3.1.0-0.13.module_el8.5.0+733+9bb5dffa.x86_64
4.18.0-294.el8.x86_64 

How reproducible:

> $ podman network create
/home/podmania/.config/cni/net.d/cni-podman0.conflist

then created a pod:

-> $ podman pod create --network cni-podman0 --hostname some --name some

then tried to create a container:

-> $ podman run -it --pod=some --name some-nettols docker.io/pmietlicki/nettools
ERRO[0000] error starting some container dependencies
ERRO[0000] "container_linux.go:370: starting container process caused: unknown capability \"CAP_PERFMON\": OCI runtime error"
Error: error starting some containers: internal libpod error


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Giuseppe Scrivano 2021-04-07 12:45:19 UTC
We need an updated libcap for the CAP_PERFMON definition.

Comment 2 Łukasz Posadowski 2021-04-19 10:53:52 UTC
Hi. 

> we need an updated libcap for the CAP_PERFMON definition

THANK YOU. :) I have the same result on the Podman 3.1.x branch on CentOS Stream. I mentioned that on the CentOS mailing list:
https://lists.centos.org/pipermail/centos/2021-April/353847.html

I rebuilt libcap from Fedora 33 locally: https://koji.fedoraproject.org/koji/buildinfo?buildID=1709471 . Pods created with the older libcap are still broken, but pods created after the libcap upgrade are working. I haven't noticed (yet) any downsides of using Fedora packages. The system boots fine and all services are running fine.

libcap has very few build-deps, so anyone stuck with non-working containers can give it a try.

cheers

Comment 3 John Fulton 2021-04-21 18:17:45 UTC
This is affecting OpenStack TripleO too: https://bugs.launchpad.net/tripleo/+bug/1922537

We're working around it with:

 sudo dnf module enable -y container-tools:3.0

Comment 5 Zoltan Fridrich 2021-04-23 10:18:56 UTC
Created attachment 1774743 [details]
Adds CAP_PERFMON capability

Comment 6 Joseph Tingiris 2021-04-25 21:57:16 UTC
This is affecting cephadm for octopus, too

# systemctl status "ceph-$(cephadm shell ceph fsid)@<service name>.service";
Inferring fsid fc4b10a0-a60f-11eb-8c0d-18c04d074e4f
Inferring config /var/lib/ceph/fc4b10a0-a60f-11eb-8c0d-18c04d074e4f/mon.atl-mit-cnc01/config
Using recent ceph image docker.io/ceph/ceph@sha256:7bda5ef5bf4c06e8b720afefe24b22dd0fe2fdf7f3c34da265dc9238578563ff
Error: OCI runtime error: container_linux.go:370: starting container process caused: unknown capability "CAP_PERFMON"
Invalid unit name "ceph-@<service name>.service" was escaped as "ceph-@\x3cservice\x20name\x3e.service" (maybe you should use systemd-escape?)
Unit ceph-@\x3cservice\x20name\x3e.service could not be found.

Comment 7 Zoltan Fridrich 2021-04-26 12:00:47 UTC
Hello,

I was not able to reproduce this issue with the provided steps; however, I think the patch I have attached resolves it. I have built libcap with the patch for CAP_PERFMON, and the rpm is available here: https://copr-be.cloud.fedoraproject.org/results/zfridric/libcap-CAP_PERFMON/centos-stream-8-x86_64/02148511-libcap/

Could you please try installing this rpm and tell me if this fixed the issue?

Comment 8 Łukasz Posadowski 2021-04-26 12:28:45 UTC
Hello Zoltan.

I can confirm it works. Thank you.

Downgrading my rebuilt Fedora package did not go well. It seems that I really need at least one libcap library installed. :) But that's my problem, and I figured it out.

Comment 10 Jeff Layton 2021-04-26 13:20:44 UTC
The new package did not work for me. I'm trying to use cephadm to install a new cluster, and I get this in the log. The host has your test package installed:

[jlayton@cephadm1 ~]$ sudo ./cephadm shell
Using recent ceph image docker.io/ceph/ceph@sha256:15b15fb7a708970f1b734285ac08aef45dcd76e86866af37412d041e00853743
Error: OCI runtime error: container_linux.go:370: starting container process caused: unknown capability "CAP_PERFMON"
[jlayton@cephadm1 ~]$ rpm -q libcap
libcap-2.26-5.el8.x86_64

Comment 11 Jeff Layton 2021-04-26 13:48:30 UTC
FWIW, this is the podman command being run by the cephadm script:

7360  execve("/bin/podman", ["/bin/podman", "run", "--rm", "--ipc=host", "--net=host", "--privileged", "--group-add=disk", "--init", "-i", "-t", "-e", "LANG=C", "-e", "PS1=[ceph: \\u@\\h \\W]\\$ ", "-e", "CONTAINER_IMAGE=docker.io/ceph/ceph@sha256:15b15fb7a708970f1b734285ac08aef45dcd76e86866af37412d041e00853743", "-e", "NODE_NAME=cephadm1", "-e", "CEPH_USE_RANDOM_NONCE=1", "-v", "/dev:/dev", "-v", "/run/udev:/run/udev", "-v", "/sys:/sys", "-v", "/var/lib/ceph/None/selinux:/sys/fs/selinux:ro", "-v", "/run/lvm:/run/lvm", "-v", "/run/lock/lvm:/run/lock/lvm", "--entrypoint", "bash", "docker.io/ceph/ceph@sha256:15b15fb7a708970f1b734285ac08aef45dcd76e86866af37412d041e00853743"], ["LS_COLORS=rs=0:di=38;5;33:ln=38;5;51:mh=00:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=01;05;37;41:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5"..., "LANG=en_US.UTF-8", "HOSTNAME=cephadm1", "MAIL=/var/spool/mail/jlayton", "TERM=xterm-256color", "HISTSIZE=1000", "PATH=/sbin:/bin:/usr/sbin:/usr/bin", "LOGNAME=root", "USER=root", "HOME=/root", "SHELL=/bin/bash", "SUDO_COMMAND=/bin/strace -f -s 256 -v -o /tmp/cephadm.strace ./cephadm shell", "SUDO_USER=jlayton", "SUDO_UID=4447", "SUDO_GID=4447"] <unfinished ...>

Comment 12 Jeff Layton 2021-04-26 14:11:07 UTC
Narrowing it down further, this fails:

    $ sudo podman run --rm  --privileged  -t -i --entrypoint bash docker.io/ceph/ceph@sha256:15b15fb7a708970f1b734285ac08aef45dcd76e86866af37412d041e00853743

...but if I remove the --privileged flag, then it works.

Comment 13 Joseph Tingiris 2021-04-26 15:08:01 UTC
The same workaround is valid for cephadm, too.

dnf -y remove podman
dnf -y module reset container-tools
dnf -y module enable container-tools:3.0
dnf -y install podman

Comment 14 Jeff Layton 2021-04-26 16:14:47 UTC
Confirmed. Downgrading podman to 3.0.1 seems to fix cephadm for me too. The question I guess is why Zoltan's fix doesn't fix privileged containers.

Comment 15 Zoltan Fridrich 2021-06-10 07:25:27 UTC
I wonder if this might be caused by the kernel being too old. The patch adds the CAP_PERFMON definition to libcap, so if there is still an error that the capability CAP_PERFMON is unknown, then it is not an issue within libcap and is probably caused by the kernel.
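
One way to check the kernel side of this question is to compare the kernel's highest supported capability number with CAP_PERFMON. The following is a minimal Go sketch, not part of the libcap patch. It assumes the standard procfs file /proc/sys/kernel/cap_last_cap and the upstream kernel value CAP_PERFMON = 38; the helper names parseLastCap and kernelKnows are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capPerfmon is the number assigned to CAP_PERFMON in the upstream
// kernel's include/uapi/linux/capability.h.
const capPerfmon = 38

// parseLastCap (hypothetical helper) converts the contents of
// /proc/sys/kernel/cap_last_cap, a decimal number followed by a
// newline, into an int.
func parseLastCap(contents string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(contents))
}

// kernelKnows (hypothetical helper) reports whether a kernel whose
// highest capability number is lastCap supports capability capNum.
func kernelKnows(lastCap, capNum int) bool {
	return capNum <= lastCap
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/cap_last_cap")
	if err != nil {
		fmt.Println("cannot read cap_last_cap:", err)
		return
	}
	lastCap, err := parseLastCap(string(data))
	if err != nil {
		fmt.Println("unexpected cap_last_cap contents:", err)
		return
	}
	fmt.Printf("kernel last capability: %d, knows CAP_PERFMON: %v\n",
		lastCap, kernelKnows(lastCap, capPerfmon))
}
```

If the reported number is below 38, the running kernel does not define CAP_PERFMON. If it is 38 or higher, the "unknown capability" error more likely comes from a userspace component whose capability table is older than the kernel.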

Comment 16 Rafael Kitover 2021-06-13 05:35:54 UTC
For anyone hitting this on a web search, I'd like to add my experience.

For me the error was also triggered only with --privileged.

It turned out that I had an older version of runc, 1.0.0_rc92, and upgrading to 1.0.0_rc95 fixed the problem.
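
Why an older runtime can produce this error even with an updated libcap can be illustrated with a small Go sketch. It is not runc's actual code: the table knownCaps is deliberately abbreviated and hypothetical, and validateCaps is an invented helper. It only shows how a name table that ends at CAP_AUDIT_READ rejects a newer capability name.

```go
package main

import (
	"fmt"
	"strings"
)

// knownCaps is an abbreviated, hypothetical name table that stops at
// CAP_AUDIT_READ (37), as an older capability library would.
var knownCaps = map[string]int{
	"CAP_CHOWN":      0,
	"CAP_SYS_ADMIN":  21,
	"CAP_AUDIT_READ": 37,
}

// validateCaps (hypothetical helper) returns an error for the first
// requested capability name that is missing from the table.
func validateCaps(requested []string, table map[string]int) error {
	for _, name := range requested {
		if _, ok := table[strings.ToUpper(name)]; !ok {
			return fmt.Errorf("unknown capability %q", name)
		}
	}
	return nil
}

func main() {
	err := validateCaps([]string{"CAP_CHOWN", "CAP_PERFMON"}, knownCaps)
	fmt.Println(err) // unknown capability "CAP_PERFMON"
}
```

In this sketch the failure depends only on the runtime's own table, which would be consistent with a runtime upgrade fixing the problem independently of libcap.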

HTH

Comment 21 Andrew G. Morgan 2021-09-09 18:12:29 UTC
I came across this in the moby-engine tar.gz for the RPM sources. Could this be the reason these capabilities don't actually appear in a privileged container?

./oci/caps/utils.go:

        if last > capability.CAP_AUDIT_READ {
                // Prevents docker from setting CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE
                // capabilities on privileged (or CAP_ALL) containers on Kernel 5.8 and up.
                // While these kernels support these capabilities, the current release of
                // runc ships with an older version of /gocapability/capability, and does
                // not know about them, causing an error to be produced.
                //
                // FIXME remove once https://github.com/opencontainers/runc/commit/6dfbe9b80707b1ca188255e8def15263348e0f9a
                //       is included in a runc release and once we stop supporting containerd 1.3.x
                //       (which ships with runc v1.0.0-rc92)
                last = capability.CAP_AUDIT_READ
        }
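
The effect of the clamp quoted above can be sketched in isolation. This is a simplified Go illustration, not the moby source: the function names clampLastCap and privilegedCaps are hypothetical, and the capability numbers are taken from the upstream kernel's capability.h (CAP_AUDIT_READ = 37, CAP_CHECKPOINT_RESTORE = 40).

```go
package main

import "fmt"

// Capability numbers from the upstream kernel's capability.h.
const (
	capAuditRead         = 37
	capCheckpointRestore = 40
)

// clampLastCap (hypothetical helper) lowers the kernel-reported last
// capability to the highest one the container runtime is assumed to
// understand.
func clampLastCap(kernelLast, runtimeLast int) int {
	if kernelLast > runtimeLast {
		return runtimeLast
	}
	return kernelLast
}

// privilegedCaps (hypothetical helper) lists the capability numbers
// 0..last that a privileged container would be granted.
func privilegedCaps(last int) []int {
	caps := make([]int, 0, last+1)
	for c := 0; c <= last; c++ {
		caps = append(caps, c)
	}
	return caps
}

func main() {
	last := clampLastCap(capCheckpointRestore, capAuditRead)
	fmt.Println(len(privilegedCaps(last))) // 38 capabilities: 0 through 37
}
```

With the clamp in place, a privileged container on a newer kernel is never handed capabilities 38 through 40, so the runtime never sees a name it cannot resolve. The cost is that those capabilities do not appear in the container at all.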

Comment 23 errata-xmlrpc 2021-11-09 20:06:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libcap bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4515