
Bug 1914884

Summary: Fix gating tests of container-tools for 8.4.0
Product: Red Hat Enterprise Linux 8
Component: container-tools-rhel8-module
Version: 8.4
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Target Release: 8.0
Reporter: Jindrich Novy <jnovy>
Assignee: Jindrich Novy <jnovy>
QA Contact: Joy Pu <ypu>
CC: cevich, dwalsh, gscrivan, jligon, jnovy, lmiksik, lsm5, mheon, mitr, mmarusak, nalin, pehunt, santiago, tsweeney, vrothber, ypu
Keywords: Triaged
Flags: pm-rhel: mirror+
Type: Bug
Cloned To: 1937863 (view as bug list)
Bug Blocks: 1937863
Last Closed: 2021-03-18 18:26:20 UTC
Attachments: Conmon Core Dump

Description Jindrich Novy 2021-01-11 12:14:19 UTC
Description of problem:
All gating tests in 8.4.0 are currently failing.

Version-Release number of selected component (if applicable):
container-tools-8.4.0

How reproducible:
always

Steps to Reproduce:
1. make module build of container-tools-8.4.0


Actual results:
http://shell.bos.redhat.com/~santiago/mbhistory/09358.html

CI dashboard here:
https://dashboard.osci.redhat.com/#/artifact/redhat-module/aid/9358

Expected results:
All gating tests pass

Additional info:
We need to address these issues before Jan 18, when the dev/feature freeze takes place.

Comment 6 Ed Santiago 2021-01-12 20:51:39 UTC
/tmp/bats.79244.src: line 45: socat: command not found

Solution: add 'Requires: socat' to the podman-tests specfile (this has already been done on some branches; it needs to be done on all).
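
A quick way to confirm the dependency made it into a built package (a sketch; `podman-tests` is the package named above):

```
# verify that the built podman-tests package pulls in socat (sketch)
rpm -q --requires podman-tests | grep -w socat || echo "missing Requires: socat"
```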

That's just one of the errors; I'll keep looking.

Comment 9 Ed Santiago 2021-01-12 21:30:40 UTC
skopeo: looks like it needs the same openssl tweak.

Comment 10 Ed Santiago 2021-01-12 21:34:31 UTC
Cockpit: sorry, no clue.

I think this is as far as I can go. Clearing needinfo.

Comment 12 Ed Santiago 2021-01-14 01:55:14 UTC
podman: no idea

buildah: 13: looks like it needs the SANs/CN openssl fix, maybe somewhere else too. I can't look right now.

buildah: 209 (capabilities), 244 (ABCD): see these two upstream commits:
   https://github.com/containers/buildah/pull/2631/commits/c74084a21a602d4f03977b6c7043d315bfa3f67f
   https://github.com/containers/buildah/pull/2631/commits/abf3f0d554145f343932bf277145e2e5225c6914

buildah (all the others): needs @gscrivan's attention; I believe he's working on exactly this problem in this podman PR:
   https://github.com/containers/podman/pull/8949

Comment 13 Ed Santiago 2021-01-14 02:05:21 UTC
Reminder: test logs are completely unreadable without my Greasemonkey extension:

    https://github.com/edsantiago/greasemonkey/tree/master/podman-ginkgo-highlight

Comment 14 Ed Santiago 2021-01-14 18:57:20 UTC
podman ping: please apply https://github.com/containers/podman/pull/8975/files

Comment 15 Matej Marušák 2021-01-15 09:58:36 UTC
> Tom, Matej, is there anything else we can do about cockpit-podman? http://artifacts.osci.redhat.com/baseos-ci/redhat-module/94/25/9425/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2226/work-tests.ymlGI3Dyy/tests-sRkftb/FAIL-str_cockpit-podman.log

Ahh, there is a massive thinko!
https://src.osci.redhat.com/rpms/cockpit-podman/blob/stream-container-tools-rhel8-rhel-8.4.0/f/tests/browser.sh#_53
This line deletes all __root__ containers, not the admin ones! It was supposed to delete all admin containers; as written it makes no sense, and of course it fails to save any container.
I think replacing it with `sudo -i -u admin podman rmi --all` should do the trick.
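
For contrast, a minimal sketch of the two behaviors (assuming an `admin` test user, as in the script above):

```
# as root: operates on root's containers/images (the accidental behavior)
sudo podman rmi --all
# as the admin user: operates on admin's images (the intended behavior)
sudo -i -u admin podman rmi --all
```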

Jindrich, can you please fix it or should I adjust it?

Comment 16 Jindrich Novy 2021-01-15 10:21:34 UTC
Thanks Matej, that makes sense! I've just applied your fix and am doing a fresh build; fingers crossed it fixes the gating tests for cockpit-podman. I will let you know if not.

Comment 18 Matej Marušák 2021-01-16 06:38:03 UTC
It fails because we expect the container state to be `exited`, but now it is `stopped`.
Has there been any change lately regarding these states?
I wonder if our tests should just not care about this distinction, as podman itself does in https://github.com/containers/podman/commit/88cd6488166bc799645efac8f2df389a352b2653 (explained in https://github.com/containers/podman/issues/5336).
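
A minimal sketch of an assertion that tolerates both values (hypothetical container name; `podman inspect --format` is the real CLI):

```
# read the state string the test asserts on
state=$(podman inspect --format '{{.State.Status}}' mycontainer)
# accept either spelling of "not running", as the upstream fix does
[ "$state" = "exited" ] || [ "$state" = "stopped" ] || echo "unexpected state: $state"
```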

Comment 19 Matej Marušák 2021-01-16 18:19:56 UTC
In upstream we now accept both states: https://github.com/cockpit-project/cockpit-podman/pull/653

Comment 20 Jindrich Novy 2021-01-20 11:18:45 UTC
Thanks Matej, that seems to do the trick:
http://shell.bos.redhat.com/~santiago/mbhistory/09515.html - though I'm not sure whether the cockpit-podman gating test ran at all?

Ed, Tom, for skopeo there still seem to be docker.io references in the tests that we might want to get rid of (see the sketch after the grep output below):

jnovy@localhost .../skopeo-bdb117ded6d37f0a6b0a2e28ba3213c20264ab43/systemtest (test_docker_remove *)$ grep -r docker\.io
helpers.bash:REGISTRY_FQIN=${SKOPEO_TEST_REGISTRY_FQIN:-docker.io/library/registry:2}
060-delete.bats:    local remote_image=docker://docker.io/library/busybox:latest
050-signing.bats:    run_skopeo copy docker://docker.io/library/busybox:latest \
040-local-registry-auth.bats:               docker://docker.io/library/busybox:latest \
040-local-registry-auth.bats:               docker://docker.io/library/busybox:latest \
030-local-registry-tls.bats:               docker://docker.io/library/busybox:latest \
020-copy.bats:    local remote_image=docker://docker.io/library/busybox:latest
020-copy.bats:    local remote_image=docker://docker.io/library/busybox:latest
020-copy.bats:    local remote_image=docker://docker.io/library/busybox:latest
020-copy.bats:    local remote_image=docker://docker.io/library/busybox:latest
020-copy.bats:    local alpine=docker.io/library/alpine:latest
010-inspect.bats:        remote_image=docker://docker.io/$arch/golang
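
One conceivable one-off cleanup, sketched below; the real fix landed later via containers/skopeo#1169, and the quay.io target here is only an assumed stand-in:

```
# rewrite hard-coded docker.io busybox references to a non-rate-limited mirror (sketch)
sed -i 's|docker://docker.io/library/busybox|docker://quay.io/libpod/busybox|g' \
    systemtest/*.bats
```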

At least three tests are failing because of this:  http://artifacts.osci.redhat.com/baseos-ci/redhat-module/95/15/9515/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2265/work-tests.ymlbLF0sj/tests-mnKQ78/test.skopeo-root.bats.log

Dan, Matt, for podman-3.0.0rc1 root there is a mount error:
not ok 54 podman mount - basic test
# (from function `die' in file ./helpers.bash, line 363,
#  in test file ./060-mount.bats, line 33)
#   `die "Mounted file exists even after umount: $mount_path/$f_path"' failed
# # /usr/bin/podman rm --all --force
# # /usr/bin/podman ps --all --external --format {{.ID}} {{.Names}}
# # /usr/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# quay.io/libpod/testimage:20200929 766ff5a3a7e4
# # /usr/bin/podman run --name mount_test_qrlnV quay.io/libpod/testimage:20200929 sh -c echo OrY1H3NkdByHSXrRN1Zu9lEAVV3piC > /tmp/tmpfile_X1XJZP1O
# # /usr/bin/podman mount mount_test_qrlnV
# /var/lib/containers/storage/overlay/dcadffbd6bf6f53b893dcc42ca89f28deed37dd99e268b85b928cbe1469b51d3/merged
# # /usr/bin/podman mount --notruncate
# 72fcbfa10fdb646d81099ec647f1160546a0cf887422fd6a4b65a99c46d11db0  /var/lib/containers/storage/overlay/dcadffbd6bf6f53b893dcc42ca89f28deed37dd99e268b85b928cbe1469b51d3/merged
# # /usr/bin/podman umount mount_test_qrlnV
# mount_test_qrlnV
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: Mounted file exists even after umount: /var/lib/containers/storage/overlay/dcadffbd6bf6f53b893dcc42ca89f28deed37dd99e268b85b928cbe1469b51d3/merged//tmp/tmpfile_X1XJZP1O
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
# # /usr/bin/podman pod rm --all --force
# # /usr/bin/podman rm --all --force
# 72fcbfa10fdb646d81099ec647f1160546a0cf887422fd6a4b65a99c46d11db0

Ed, Giuseppe, for podman-3.0.0rc1 rootless, one test seems to need fixing, as /sys/* entries got added?

not ok 98 podman diff
# (from function `is' in file ./helpers.bash, line 406,
#  in test file ./140-diff.bats, line 29)
#   `is "$result" "${expect[$field]}" "$field"' failed
# $ /usr/bin/podman rm --all --force
# $ /usr/bin/podman ps --all --external --format {{.ID}} {{.Names}}
# $ /usr/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# quay.io/libpod/testimage:20200929 766ff5a3a7e4
# $ /usr/bin/podman run --name iutGMFcPkR quay.io/libpod/testimage:20200929 sh -c touch /Z9Uphvxwil;rm /etc/services
# $ /usr/bin/podman diff --format json -l
# {"changed":["/etc"],"added":["/sys/fs","/sys/fs/cgroup","/Z9Uphvxwil"],"deleted":["/etc/services"]}
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: added
# #| expected: '/Z9Uphvxwil'
# #|   actual: '/sys/fs'
# #|         > '/sys/fs/cgroup'
# #|         > '/Z9Uphvxwil'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
# $ /usr/bin/podman pod rm --all --force
# $ /usr/bin/podman rm --all --force
# 2cc19d9cb8813906f9fa9fb1955232ca17f3a0a7a8252862125b3318f93dc116
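
If the tests merely wanted to ignore the new kernel-managed entries rather than fix the root cause, a hypothetical filter could look like this (the jq invocation is an assumption, not the actual upstream fix):

```
# drop /sys-rooted entries from the "added" list before comparing (hypothetical)
podman diff --format json -l | jq '.added |= map(select(startswith("/sys") | not))'
```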

Nalin, Tom, for buildah, does the cert check need to be amended any further?

not ok 13 authenticate: cert and credentials
# (from function `expect_output' in file ./helpers.bash, line 269,
#  in test file ./authenticate.bats, line 37)
#   `expect_output --substring " x509: certificate signed by unknown authority" \' failed with status 125
# /usr/share/buildah/test/system /usr/share/buildah/test/system
# # [checking for: alpine]
# # [podman pull alpine]
# Resolved short name "alpine" to a recorded short-name alias (origin: /etc/containers/registries.conf.d/shortnames.conf)
# Trying to pull docker.io/library/alpine:latest...
# Getting image source signatures
# Copying blob sha256:9d16cba9fb961d1aafec9542f2bf7cb64acfc55245f9e4eb5abecd4cdc38d749
# Copying config sha256:961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4
# Writing manifest to image destination
# Storing signatures
# 961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4
# # [podman save --format oci-archive alpine >/tmp/buildah-image-cache.180179/alpine-.tar ]
# $ /usr/bin/buildah push --signature-policy /usr/share/buildah/test/system/./policy.json --tls-verify=false --creds testuser:testpassword alpine localhost:5000/my-alpine
# Getting image source signatures
# Copying blob sha256:03901b4a2ea88eeaad62dbe59b072b28b6efa00491962b8741081c5df50c65e0
# Copying config sha256:961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4
# Writing manifest to image destination
# Storing signatures
# $ /usr/bin/buildah push --signature-policy /usr/share/buildah/test/system/./policy.json --tls-verify=true alpine localhost:5000/my-alpine
# Getting image source signatures
# Get "https://localhost:5000/v2/": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
# [ rc=125 (expected) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: push with --tls-verify=true
# #| expected: ' x509: certificate signed by unknown authority'
# #|   actual: 'Getting image source signatures'
# #|         > 'Get "https://localhost:5000/v2/": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# /usr/share/buildah/test/system

Note that:
-                -subj "/C=US/ST=Foo/L=Bar/O=Red Hat, Inc./CN=localhost"
+                -subj "/C=US/ST=Foo/L=Bar/O=Red Hat, Inc./CN=registry host certificate" \
+                -addext subjectAltName=DNS:localhost
is already applied in test_buildah.sh
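
For context, a sketch of the full self-signed certificate invocation those two lines belong to (everything except the -subj/-addext arguments is an assumption; -addext needs OpenSSL 1.1.1+):

```
# generate a registry cert whose SAN, not just its CN, names localhost (sketch)
openssl req -newkey rsa:4096 -nodes -sha256 -x509 -days 365 \
    -keyout registry.key -out registry.crt \
    -subj "/C=US/ST=Foo/L=Bar/O=Red Hat, Inc./CN=registry host certificate" \
    -addext subjectAltName=DNS:localhost
```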

There are more sysfs-related issues in the log, plus the "ABCD" thing that needs fixing: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/95/15/9515/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2265/work-tests.ymlbLF0sj/tests-mnKQ78/test.buildah-root.bats.log

Comment 21 Tom Sweeney 2021-01-20 15:30:56 UTC
Jindrich, 

Just want to verify that I'm not reading between the lines too much.  The gating tests for Podman and Buildah are both passing in RHEL 8.4 now?  Only Skopeo is having problems still?

Comment 22 Jindrich Novy 2021-01-20 15:49:47 UTC
Tom, you read it correctly, basically all gating tests are currently failing for 8.4.0: http://shell.bos.redhat.com/~santiago/mbhistory/09515.html

Comment 23 Matthew Heon 2021-01-20 20:33:45 UTC
Digging into the Podman tests: it seems to be an issue with `podman mount`. I can't replicate it locally. The rootless system test failure is a separate issue with `podman diff`. I have seen neither before, though rootless `podman diff` has been known to do weird things.

Comment 24 Ed Santiago 2021-01-20 20:58:57 UTC
Buildah:
   https://github.com/containers/buildah/pull/2924
   - I've just submitted it now, so it'll take a while to go through CI.

Skopeo:
   https://github.com/containers/skopeo/pull/1169
   - failing CI due to problems (integration tests) beyond my control

Comment 25 Matthew Heon 2021-01-20 21:00:33 UTC
Podman mount issue: the cleanup process is failing to fire. Cause: Conmon is dumping core.

Jan 20 15:57:58 ci-vm-10-0-139-150.hosted.upshift.rdu2.redhat.com systemd-coredump[14175]: Process 14168 (conmon) of user 1000 dumped core.

Stack trace of thread 14168:
#0  0x00007f799284f37f raise (libc.so.6)
#1  0x00007f7992839db5 abort (libc.so.6)
#2  0x00007f79928924e7 __libc_message (libc.so.6)
#3  0x00007f79928995ec malloc_printerr (libc.so.6)
#4  0x00007f799289b35d _int_free (libc.so.6)
#5  0x0000563bc6b18497 main (conmon)
#6  0x00007f799283b493 __libc_start_main (libc.so.6)
#7  0x0000563bc6b1892e _start (conmon)

Stack trace of thread 14170:
#0  0x00007f7992909a31 __poll (libc.so.6)
#1  0x00007f7993187ab6 g_main_context_iterate.isra.21 (libglib-2.0.so.0)
#2  0x00007f7993187be0 g_main_context_iteration (libglib-2.0.so.0)
#3  0x00007f7993187c31 glib_worker_main (libglib-2.0.so.0)
#4  0x00007f79931afe1a g_thread_proxy (libglib-2.0.so.0)
#5  0x00007f7991f9e14a start_thread (libpthread.so.0)
#6  0x00007f7992914db3 __clone (libc.so.6)

Comment 26 Matthew Heon 2021-01-20 21:03:52 UTC
Created attachment 1749173
Conmon Core Dump

Core dump for Conmon from 9515 build

Comment 27 Matthew Heon 2021-01-20 21:04:39 UTC
Adding CC to Peter since we're in Conmon now.

Comment 28 Matthew Heon 2021-01-20 21:38:35 UTC
https://github.com/containers/conmon/pull/233 should fix Conmon

Comment 29 Jindrich Novy 2021-01-21 18:38:16 UTC
The situation is looking a bit better now: http://shell.bos.redhat.com/~santiago/mbhistory/09546.html

The last thing in podman (with conmon fixed) is:

not ok 129 sensitive mount points are masked without --privileged
# (from function `die' in file ./helpers.bash, line 363,
#  from function `run_podman' in file ./helpers.bash, line 181,
#  in test file ./400-unprivileged-access.bats, line 135)
#   `run_podman run --rm $IMAGE stat -c'%n:%F:%h:%T:%t' /dev/null ${subset[@]}' failed
# $ /usr/bin/podman rm --all --force
# $ /usr/bin/podman ps --all --external --format {{.ID}} {{.Names}}
# $ /usr/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# quay.io/libpod/testimage:20200929 766ff5a3a7e4
# $ /usr/bin/podman run --rm quay.io/libpod/testimage:20200929 stat -c%n:%F:%h:%T:%t /dev/null /proc/acpi /proc/kcore /proc/keys /proc/timer_list /proc/sched_debug /proc/scsi /sys/firmware /sys/fs/selinux /sys/dev/block
# /dev/null:character special file:1:3:1
# /proc/acpi:directory:2:0:0
# /proc/kcore:character special file:1:3:1
# /proc/keys:character special file:1:3:1
# /proc/timer_list:character special file:1:3:1
# /proc/sched_debug:character special file:1:3:1
# /proc/scsi:directory:2:0:0
# stat: can't stat '/sys/firmware': No such file or directory
# stat: can't stat '/sys/fs/selinux': No such file or directory
# stat: can't stat '/sys/dev/block': No such file or directory
# [ rc=1 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 1; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
# $ /usr/bin/podman pod rm --all --force
# $ /usr/bin/podman rm --all --force

And there are multiple issues with buildah: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/95/46/9546/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2271/work-tests.yml7fLqIA/tests-teM05k/test.buildah-root.bats.log

Comment 30 Giuseppe Scrivano 2021-01-21 19:38:51 UTC
I don't think this is a real failure: if these mounts are not present, they don't need to be masked.

# stat: can't stat '/sys/firmware': No such file or directory
# stat: can't stat '/sys/fs/selinux': No such file or directory
# stat: can't stat '/sys/dev/block': No such file or directory

I think we need to change the test to verify only the paths that exist.
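
A minimal sketch of what such a test change could look like (paths and image taken from the failure above; the actual fix was made upstream in podman's test suite):

```
# only assert on masked paths that actually exist inside the container (sketch)
for path in /proc/acpi /proc/kcore /sys/firmware /sys/fs/selinux /sys/dev/block; do
    podman run --rm quay.io/libpod/testimage:20200929 sh -c "test -e $path" || continue
    podman run --rm quay.io/libpod/testimage:20200929 stat -c '%n:%F' "$path"
done
```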

Comment 31 Jindrich Novy 2021-01-22 09:00:15 UTC
Ed, do you mind having a look at comment #30?

Comment 32 Ed Santiago 2021-01-22 17:15:33 UTC
>I think we need to change the test to verify only the paths that exist

@gscrivan can you elaborate on what you mean by that?

Those paths all exist on the host: the test is careful to check.

Do you mean paths that exist in the container? If so, why do they not exist? Did your PR get rid of everything under /sys? If so, why? I thought the purpose of defaultMaskPaths in pkg/specgen/generate/config_linux.go was to block only a subset of /proc and /sys? Why is this test failing all of a sudden, and why only on RHEL?

Comment 33 Giuseppe Scrivano 2021-01-24 10:32:07 UTC
These paths previously existed in the container because we were always bind-mounting /sys from the host: if they existed on the host, they also existed in the container.

Now we don't bind-mount /sys (unless it is necessary, e.g. when using --net host), so these file systems are not created in the container at all. The OCI runtimes check whether a path is present before masking it, so in this case /sys/firmware, /sys/fs/selinux, and /sys/dev/block are ignored, as they don't exist under /sys.
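
A quick way to observe the difference described above (assuming rootless podman and any small image):

```
# fresh sysfs: masked-path candidates may simply be absent
podman run --rm alpine ls -d /sys/firmware   # may report: No such file or directory
# --net host forces /sys to be bind-mounted from the host, so the path exists (and is masked)
podman run --rm --net host alpine ls -d /sys/firmware
```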

Comment 34 Jindrich Novy 2021-01-26 08:29:32 UTC
Ed, thanks for looking into the gating tests for podman. I noticed https://github.com/containers/podman/pull/9091 has landed in master. Do you think it could be merged into the v3.0 branch (the to-be 8.4.0) so that I can run it through the RHEL gating tests?

Comment 35 Ed Santiago 2021-01-26 18:19:05 UTC
https://github.com/containers/podman/pull/9108 has been merged into the v3.0 branch; it is a subset of #9091 that should address the RHEL gating-test failures.

Comment 36 Jindrich Novy 2021-01-28 12:51:17 UTC
*** Bug 1857367 has been marked as a duplicate of this bug. ***

Comment 38 Ed Santiago 2021-01-28 20:29:44 UTC
Buildah:

- not ok 13 authenticate: cert and credentials
  - this is https://github.com/containers/buildah/pull/2924/files as I mentioned in comment 24

- not ok 209 bud capabilities test
  - this is https://github.com/containers/buildah/pull/2631/files as I mentioned in comment 12

- not ok 244 config-flags-verification
  - same as 209 (not the same root cause, but fixed in the same PR)

The rest are /sys failures and I really just have no idea. 

Why are the above fixes not in the RHEL build yet? Do they need to be backported to some specific buildah branch on github? Do they need to be added as patches to the specfile? @jnovy please assign someone to make sure this happens; I don't want to look at those failures again.

Comment 39 Jindrich Novy 2021-01-29 10:51:15 UTC
Ed, the 8.4.0 buildah was not yet switched to the release-1.19 branch. I've switched it now; let's see with the next build...

Comment 40 Jindrich Novy 2021-02-02 17:22:43 UTC
Tom, do you mind merging https://github.com/containers/buildah/pull/2924 and https://github.com/containers/buildah/pull/2631 into the buildah release-1.19 branch so that I can consume them in RHEL? Thanks!

Ed, there are a bunch of other failures in the last build:

not ok 389 user-namespace
not ok 391 combination-namespaces

http://artifacts.osci.redhat.com/baseos-ci/redhat-module/97/69/9769/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2350/work-tests.yml8FbnCZ/tests-DFJxxk/test.buildah-root.bats.log

Do you mind having a quick look?

Then only the podman nonroot failure remains:
not ok 99 podman diff

http://artifacts.osci.redhat.com/baseos-ci/redhat-module/97/69/9769/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2350/work-tests.yml8FbnCZ/tests-DFJxxk/test.podman-nonroot.bats.log

Comment 41 Ed Santiago 2021-02-02 21:48:03 UTC
podman rootless diff: should be addressed by https://github.com/containers/podman/pull/9209

buildah (several failures): I responded to those already, in comment 38

buildah (/sys failures): that's not something I can help with. @gscrivan, can you PTAL? There are two failures of the form:

    buildah run ...
    # error running container: error creating container for [some-command]: mount `sysfs` to '/sys': Operation not permitted

The failures don't seem to be consistent: other `buildah run` commands work just fine. This is weird, but matches my experience with #9209 in which the `podman diff` failure only happens about half the time.

Comment 42 Giuseppe Scrivano 2021-02-03 07:27:25 UTC
I think buildah is missing the logic podman has for deciding whether /sys must be a fresh sysfs or bind-mounted from the host.

Comment 43 Ed Santiago 2021-02-03 14:32:31 UTC
https://github.com/containers/podman/pull/9213 is merged into v3.0; this should resolve the podman rootless failures.

Comment 44 Tom Sweeney 2021-02-03 22:46:09 UTC
Giuseppe, do you think changes like Matt's https://github.com/containers/podman/pull/8561/files ("Do not mount sysfs as rootless in more cases") or your PR https://github.com/containers/podman/pull/8949/files ("specgen: improve heuristic for /sys bind mount") would fix this?

Comment 45 Tom Sweeney 2021-02-03 22:47:24 UTC
And Giuseppe, if it's one of those, especially if it's the one that you did, would you mind trying to put in a fix?

Comment 46 Giuseppe Scrivano 2021-02-04 13:35:11 UTC
the error "mount `sysfs` to '/sys': Operation not permitted" is coming from crun.  In this case having the libpod logic into buildah won't help as crun already attempts to bind mount /sys from the host if mounting a fresh sysfs fails. 

it must be something different to cause that error.  I've not managed to reproduce locally yet.
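
In shell terms, the fallback Giuseppe describes is roughly this (a paraphrase of the described behavior, not crun's actual code):

```
# crun's handling of the container's /sys, per the comment above (paraphrased sketch)
if ! mount -t sysfs sysfs "$ROOTFS/sys"; then
    # fresh sysfs refused (e.g. rootless without a new netns): fall back to the host's /sys
    mount --rbind /sys "$ROOTFS/sys"
fi
```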

Comment 48 Matej Marušák 2021-02-08 12:53:00 UTC
There were indeed two issues; I addressed them in Fedora: https://src.fedoraproject.org/rpms/cockpit-podman/pull-request/22
Can you please sync this to RHEL as well?

Comment 49 Jindrich Novy 2021-02-09 07:24:52 UTC
Matej, with that applied the gating test now fails with: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/98/53/9853/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2382/work-tests.ymlhr7eyf/tests-T_pC3_/FAIL-str_cockpit-podman.log

+ dnf install -y https://kojipkgs.fedoraproject.org//packages/chromium/87.0.4280.141/1.el8/x86_64/chromium-common-87.0.4280.141-1.el8.x86_64.rpm https://kojipkgs.fedoraproject.org//packages/chromium/87.0.4280.141/1.el8/x86_64/chromium-headless-87.0.4280.141-1.el8.x86_64.rpm
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.

Last metadata expiration check: 0:50:56 ago on Mon Feb  8 22:54:10 2021.
chromium-common-87.0.4280.141-1.el8.x86_64.rpm   14 MB/s |  17 MB     00:01    
chromium-headless-87.0.4280.141-1.el8.x86_64.rp  35 MB/s |  61 MB     00:01    
Error: 
 Problem 1: conflicting requests
  - nothing provides minizip(x86-64) needed by chromium-common-87.0.4280.141-1.el8.x86_64
 Problem 2: conflicting requests
  - nothing provides libminizip.so.1()(64bit) needed by chromium-headless-87.0.4280.141-1.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages)
cp: cannot stat '/var/str/logs/*': No such file or directory

Comment 50 Matej Marušák 2021-02-09 10:55:15 UTC
Meh, it's annoying that we cannot just open a PR and let the tests run, and instead have to do guesswork in RHEL...

Anyway, I think this might work:
```
 if grep -q 'ID=.*rhel' /etc/os-release; then
+    dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
+    dnf config-manager --enable epel 
     dnf install -y \
```

Comment 51 Jindrich Novy 2021-02-10 07:14:22 UTC
Matej, the installation of Chromium is now working: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/98/73/9873/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2394/work-tests.ymlCog447/tests-UakUC9/FAIL-str_cockpit-podman.log

But there seems to be another dependency missing?

DevTools listening on ws://127.0.0.1:10037/devtools/browser/9641b79d-6b09-4d68-93db-349164b90c97
[0209/221958.811658:ERROR:egl_util.cc(70)] Failed to load GLES library: /usr/lib64/chromium-browser/swiftshader/libGLESv2.so: /usr/lib64/chromium-browser/swiftshader/libGLESv2.so: cannot open shared object file: No such file or directory
[0209/221958.844921:ERROR:viz_main_impl.cc(150)] Exiting GPU process due to errors during initialization
CDP: {"source":"network","level":"error","text":"Failed to load resource: the server responded with a status of 404 (Not Found)","timestamp":1612909200076.4092,"url":"http://127.0.0.1:9090/podman","networkRequestId":"A7F796D6744BE03B5CAE217FB85E3211"}
Traceback (most recent call last):
  File "/var/str/source/test/check-application", line 170, in testBasicSystem
    self._testBasic(True)
  File "/var/str/source/test/check-application", line 294, in _testBasic
    self.login_and_go("/podman", superuser=auth)
  File "/var/str/source/test/common/testlib.py", line 975, in login_and_go
    self.browser.login_and_go(path, user=user, host=host, superuser=superuser, urlroot=urlroot, tls=tls)
  File "/var/str/source/test/common/testlib.py", line 580, in login_and_go
    self.wait_present('#content')
  File "/var/str/source/test/common/testlib.py", line 396, in wait_present
    self.wait_js_func('ph_is_present', selector)
  File "/var/str/source/test/common/testlib.py", line 390, in wait_js_func
    self.wait_js_cond("%s(%s)" % (func, ','.join(map(jsquote, args))))
  File "/var/str/source/test/common/testlib.py", line 387, in wait_js_cond
    self.raise_cdp_exception("timeout\nwait_js_cond", cond, result["exceptionDetails"], trailer)
  File "/var/str/source/test/common/testlib.py", line 176, in raise_cdp_exception
    raise Error("%s(%s): %s" % (func, arg, msg))
testlib.Error: timeout

Comment 53 Matej Marušák 2021-02-10 14:33:29 UTC
Yeah, it's `cockpit-system`, which I included as per comment #48, but yesterday you kicked it out: https://src.osci.redhat.com/rpms/cockpit-podman/c/6806f1e44f344f72a22a35ba7a12e14aedae4303?branch=stream-container-tools-rhel8-rhel-8.4.0
Why did you remove it? It is needed for testing.

Comment 54 Jindrich Novy 2021-02-10 14:40:42 UTC
This issue, https://github.com/cockpit-project/cockpit-podman/pull/663, misled me into thinking I needed to remove it.

Comment 56 Matej Marušák 2021-02-11 13:39:52 UTC
This is due to https://github.com/containers/podman/issues/9251

In upstream we did a few hacks in https://github.com/cockpit-project/cockpit-podman/pull/669 which also works around this issue.

I just did upstream release 28.1, which includes these fixes.

Comment 57 Jindrich Novy 2021-02-12 16:41:14 UTC
Thanks Matej, gating tests for cockpit-podman work now!

Chris, I still don't see these commits:
https://github.com/containers/buildah/commit/abf3f0d554145f343932bf277145e2e5225c6914.patch
https://github.com/containers/buildah/commit/c74084a21a602d4f03977b6c7043d315bfa3f67f.patch
https://patch-diff.githubusercontent.com/raw/containers/buildah/pull/2924.patch

merged into the release-1.19 branch of buildah; could you please merge them? This is the reason these tests fail in RHEL 8.4.

Ed, do you mind merging https://patch-diff.githubusercontent.com/raw/containers/skopeo/pull/1169.patch into the release-1.2 skopeo branch so that we get rid of the docker pull rate-limit warnings in RHEL 8.4?

Thanks!

Comment 58 Tom Sweeney 2021-02-13 23:20:28 UTC
Jindrich,

Ed's enjoying some well-earned PTO this week. I've just backported the patch to the Skopeo release-1.2 branch here: https://github.com/containers/skopeo/pull/1195

Comment 59 Ed Santiago 2021-02-15 15:29:34 UTC
Backport LGTM. Thank you very much Tom!

Comment 60 Tom Sweeney 2021-02-15 22:46:17 UTC
Chris provided the backport to Buildah's release-1.19 branch with: https://github.com/containers/buildah/pull/3014

Comment 61 Jindrich Novy 2021-02-16 17:03:59 UTC
Thanks Chris and Tom!

We are almost there: everything passes except two buildah tests:

http://artifacts.osci.redhat.com/baseos-ci/redhat-module/99/80/9980/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2430/work-tests.ymlOMrjZn/tests-7tekKH/test.buildah-root.bats.log

not ok 392 user-namespace
not ok 394 combination-namespaces

Giuseppe was looking at these in comment #42; do you think we can fix them somehow in the release-1.19 branch?

Comment 62 Giuseppe Scrivano 2021-02-16 17:25:11 UTC
After further investigation, I think we are hitting the same kernel issue as https://bugzilla.redhat.com/show_bug.cgi?id=1903983

Comment 65 Matej Marušák 2021-02-19 19:54:57 UTC
> Matej, seems cockpit-podman started to fail again

It "seems **podman** started to fail again", cockpit-podman is just consumer of this failing service.
From screenshots it seems that podman user service fails to start.
We also store journal when tests fail: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/10/06/10067/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2443/work-tests.ymlAu8v9H/tests-EMLhta/cockpit-podman/TestApplication-testDownloadImage-rhel-8-4-127.0.0.1-22-FAIL.log.gz

It says `/run/user/1001/podman/podman.sock/v1.12/libpod/info?: couldn't connect: Could not connect: No such file or directory`
Then it seems it actually starts, but we never get a proper reply from it?
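
A manual probe of that socket, for anyone reproducing this (curl's `--unix-socket` flag is real; the socket and API paths are taken from the log line above):

```
# talk to the rootless podman API socket directly (sketch)
curl --unix-socket /run/user/1001/podman/podman.sock \
     http://d/v1.12/libpod/info
```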

Last week, when I got tired of trying to make the rootless service usable and put in one big hack, I hoped that was the end of it. But it seems not.
This service is just too brittle, and there are 100 ways one can break it.

Here are a few of my reports showing that the service is brittle, not tested properly, and that c-podman is the testing ground:
https://github.com/containers/podman/issues/9251
https://github.com/containers/podman/issues/8762
https://github.com/containers/podman/issues/8751
https://github.com/containers/podman/issues/6660
https://github.com/containers/podman/issues/5840

Back from my rant to this specific issue. Our CI does not see this yet, as the new version is not yet available. Or has the testing machine just gotten a bit busier, so things take longer (this one looks like a race in podman)?
Are these podman builds available somewhere so that I can easily install them into a VM? I am convinced this is going to be #9251.
Also, can you please retry to make sure this is reproducible before I waste one more day on it?
P.S. Today Martin did another release of c-podman, so it can be included. It does not contain any fixes for this specific issue, though.

Comment 73 Red Hat Bugzilla 2023-09-15 00:58:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days