Bug 1985499

Summary: podman: Cannot run Fedora 35/RHEL 9 Beta images due to clone3 incompatibility
Product: Red Hat Enterprise Linux 8
Component: podman
Version: 8.5
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Keywords: Triaged
Target Milestone: beta
Reporter: Florian Weimer <fweimer>
Assignee: Jindrich Novy <jnovy>
QA Contact: Alex Jia <ajia>
CC: afield, averi, bbaude, bdobreli, dwalsh, jligon, jnovy, jpretori, lsm5, mheon, mhofmann, pasik, praiskup, pthomas, redhat-bugzilla, tsweeney, umohnani, walters, ypu
Fixed In Version: podman-3.3.0-0.16.el8 or newer, skopeo-1.3.1-7.el8 or newer
Clones: 1985525 1986339
Last Closed: 2021-11-09 17:40:16 UTC

Description Florian Weimer 2021-07-23 18:11:28 UTC
The glibc in Fedora 35 and RHEL 9 Beta first attempts to use the clone3 system call for thread creation. Under podman on RHEL 8, that call is rejected and thread creation fails:

$ podman run fedora:rawhide rpm -q glibc
glibc-2.33.9000-50.fc35.x86_64
$ podman run fedora:rawhide python3 -c 'import threading; threading.Thread(None, lambda: 0).start()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.10/threading.py", line 928, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

strace from outside the container shows the problematic EPERM error:

2667529 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fd5c687f910, parent_tid=0x7fd5c687f910, exit_signal=0, stack=0x7fd5c607f000, stack_size=0x7fff00, tls=0x7fd5c687f640}, 88) = -1 EPERM (Operation not permitted)
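
Why EPERM in particular matters can be sketched with a small model. It assumes (this is not stated in the report) that glibc retries with the older clone call only when clone3 fails with ENOSYS, and treats any other error as final. The function and parameter names are hypothetical, and the model only simulates error codes; it makes no real system calls.

```python
import errno


def create_thread(clone3_result, clone_result=0):
    """Model of the assumed thread-creation fallback (simulation only).

    clone3_result / clone_result are 0 for success or a positive errno
    value, standing in for what the kernel or a seccomp filter returns.
    Returns a (syscall_used, errno_or_0) tuple.
    """
    if clone3_result == 0:
        return ("clone3", 0)
    if clone3_result == errno.ENOSYS:
        # "Not implemented": assumed signal to retry with the older call.
        return ("clone", clone_result)
    # Any other error (such as EPERM from a filter) is treated as final.
    return ("clone3", clone3_result)


# A filter answering EPERM: no fallback, thread creation fails.
print(create_thread(errno.EPERM))
# A filter or old kernel answering ENOSYS: fallback to clone succeeds.
print(create_thread(errno.ENOSYS))
```

Under this assumption, a seccomp profile that does not know clone3 breaks thread creation if it answers EPERM, while an ENOSYS answer would let the fallback path run.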

Comment 1 Florian Weimer 2021-07-23 18:14:08 UTC
The package versions that I forgot to mention (from the RHEL-8.5.0-20210721.n.0 compose):

podman-3.1.0-0.8.module+el8.5.0+10387+8d85dbaf.x86_64
libseccomp-2.5.1-1.el8.x86_64
containers-common-1.2.2-4.module+el8.5.0+10387+8d85dbaf.x86_64

Comment 2 Daniel Walsh 2021-07-24 10:36:43 UTC
Could you examine the audit.log to see which syscalls seccomp is blocking?

ausearch -m seccomp -ts recent

It looks like clone3 should be allowed in the current seccomp.json

grep clone3 /usr/share/containers/seccomp.json
				"clone3",

 rpm -qf /usr/share/containers/seccomp.json
containers-common-1-20.fc34.noarch

It could also be SELinux:

ausearch -m avc -ts recent
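
The grep check on seccomp.json above can also be done structurally. This is a minimal sketch: it assumes the profile is a JSON object with a "defaultAction" and a "syscalls" list whose rules carry either a "names" list or a single "name". The helper names and the inline sample profile are made up for illustration and are not taken from any shipped file.

```python
import json


def allowed_syscalls(profile):
    """Return the set of syscall names covered by SCMP_ACT_ALLOW rules."""
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        # Newer profiles group names in a "names" list; older ones are
        # assumed to use a single "name" string per rule.
        names.update(rule.get("names", []))
        if "name" in rule:
            names.add(rule["name"])
    return names


def load_profile(path="/usr/share/containers/seccomp.json"):
    """Load a profile from disk (path taken from the commands above)."""
    with open(path) as handle:
        return json.load(handle)


# Inline sample (hypothetical) so the sketch runs without podman installed.
sample = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["clone", "read", "write"], "action": "SCMP_ACT_ALLOW"},
    {"name": "openat", "action": "SCMP_ACT_ALLOW"}
  ]
}
""")
print("clone3" in allowed_syscalls(sample))  # False: clone3 is not listed
print(sample["defaultAction"])               # action applied to unlisted calls
```

On a real system, allowed_syscalls(load_profile()) would be the call to make. Rules may also carry argument or capability conditions, so a name appearing in an allow rule does not by itself prove the call is permitted in every container.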

Comment 3 Florian Weimer 2021-07-24 11:29:45 UTC
(In reply to Daniel Walsh from comment #2)
> Could you examine the audit.log to see which syscalls seccomp is blocking.
> 
> ausearch -m seccomp -ts recent
> 
> It looks like clone3 should be allowed in the current seccomp.json
> 
> grep clone3 /usr/share/containers/seccomp.json
> 				"clone3",
> 
>  rpm -qf /usr/share/containers/seccomp.json
> containers-common-1-20.fc34.noarch

Fedora 34 and Fedora 35 hosts work. Fedora 33 and RHEL 8.5 hosts (and presumably earlier releases, though I haven't checked) do not.

RHEL 8.5 currently has this in the buildroot:

$ rpm -qf /usr/share/containers/seccomp.json
containers-common-0.1.31-9.git0144aa8.el8.x86_64
$ grep -c clone3 /usr/share/containers/seccomp.json
0

This is a version from 2018, which isn't good at all.

A RHEL 8.5 test system from Beaker has this after “dnf install podman”:

$ rpm -qf /usr/share/containers/seccomp.json
containers-common-1.2.2-4.module+el8.5.0+10387+8d85dbaf.x86_64
$ grep -c clone3 /usr/share/containers/seccomp.json
0

This is from the container-tools:rhel8 module, as far as I can see.

With this containers-common version, I see no errors:

# ausearch -m seccomp -ts recent
<no matches>
# ausearch -m avc -ts recent
<no matches>

But strace reveals the EPERM error in the description. Furthermore, launching the container with 

# podman run --security-opt seccomp=unconfined -i -t registry.fedoraproject.org/fedora:rawhide

works, so this really does point towards a seccomp issue.
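
The comparison above (default profile versus seccomp=unconfined) can be written down as a small harness. This is only a sketch: the helper names are hypothetical, the --rm flag is an addition to keep test containers from piling up, and nothing here invokes podman unless run_with_subprocess is passed to compare explicitly.

```python
import subprocess

IMAGE = "registry.fedoraproject.org/fedora:rawhide"
REPRODUCER = ["python3", "-c",
              "import threading; threading.Thread(None, lambda: 0).start()"]


def podman_cmd(unconfined):
    """Build the podman argv, optionally disabling the seccomp filter."""
    cmd = ["podman", "run", "--rm"]
    if unconfined:
        cmd += ["--security-opt", "seccomp=unconfined"]
    return cmd + [IMAGE] + REPRODUCER


def diagnose(default_ok, unconfined_ok):
    """Interpret the two outcomes of the A/B comparison."""
    if default_ok:
        return "not reproduced"
    if unconfined_ok:
        return "seccomp profile is the likely cause"
    return "failure is independent of seccomp"


def run_with_subprocess(cmd):
    """Real runner: True if the container command exits with status 0."""
    return subprocess.run(cmd).returncode == 0


def compare(runner):
    """Run the reproducer under both settings using the given runner."""
    return diagnose(runner(podman_cmd(False)), runner(podman_cmd(True)))


# Dry run with a fake runner that mimics the behaviour reported here:
# the reproducer only succeeds when the filter is disabled.
print(compare(lambda cmd: "seccomp=unconfined" in cmd))
```

On an affected host, compare(run_with_subprocess) would be expected to give the same answer as the dry run.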


The container-tools:3.0 module has a higher containers-common version, but still does not list clone3:

$ rpm -qf /usr/share/containers/seccomp.json
containers-common-1.2.2-7.module+el8.4.0+11310+8c67a752.x86_64
$ grep -c clone3 /usr/share/containers/seccomp.json
0

Comment 4 Daniel Walsh 2021-07-25 10:32:28 UTC
Jindrich, can you make sure the latest containers-common is available for RHEL 8.5 and RHEL 9?

Comment 5 Florian Weimer 2021-07-25 22:07:16 UTC
The current fedora:rawhide image has been updated with a glibc that uses clone3, so this should be much easier to test. I have updated the description.

Comment 6 Jindrich Novy 2021-07-26 08:21:30 UTC
Dan, do you mean to sync seccomp.json to the Fedora state for RHEL9 and RHEL8.5?

Florian, the package versions in comment #1 are ancient (from 18th March). They are still the 'latest' because gating tests have been failing since that build. More recent versions, closer to the 8.5.0 state, can be found here: http://shell.bos.redhat.com/~santiago/mbhistory/module-container-tools.html

Comment 10 Daniel Walsh 2021-07-28 13:22:08 UTC
Yes, this should be fixed. You will definitely be able to run RHEL 9 container images on RHEL 8 for a while. But NOTE: this is not guaranteed forever. There could be features in RHEL 9 kernels that RHEL 8 does not support. If applications in RHEL 9 container images need these new features, they will NOT be able to run on a RHEL 8 machine.

Container images should always run on newer kernels, but there is no guarantee that they will run on older kernels.

Comment 13 Alex Jia 2021-08-03 11:24:35 UTC
This bug has been verified on podman-3.3.0-0.17.module+el8.5.0+12014+438a5746.

[root@kvm-07-guest24 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)

[root@kvm-07-guest24 ~]# rpm -q podman runc skopeo kernel
podman-3.3.0-0.17.module+el8.5.0+12014+438a5746.x86_64
runc-1.0.1-3.module+el8.5.0+12014+438a5746.x86_64
skopeo-1.3.1-7.module+el8.5.0+12014+438a5746.x86_64
kernel-4.18.0-325.el8.x86_64

[root@kvm-07-guest24 ~]# podman run fedora:rawhide rpm -q glibc
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:rawhide...
Getting image source signatures
Copying blob f47f10b64957 done  
Copying config 47c4c56f32 done  
Writing manifest to image destination
Storing signatures
glibc-2.33.9000-56.fc35.x86_64

[root@kvm-07-guest24 ~]# grep -c clone3 /usr/share/containers/seccomp.json
0
[root@kvm-07-guest24 ~]# ausearch -m seccomp -ts recent
<no matches>
[root@kvm-07-guest24 ~]# ausearch -m avc -ts recent
<no matches>
[root@kvm-07-guest24 ~]# podman run --security-opt seccomp=unconfined -i -t registry.fedoraproject.org/fedora:rawhide
[root@ac998bdcae31 /]# rpm -q glibc
glibc-2.33.9000-56.fc35.x86_64
[root@ac998bdcae31 /]# exit
exit
[root@kvm-07-guest24 ~]# echo $?
0

Comment 15 errata-xmlrpc 2021-11-09 17:40:16 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4154