Bug 1914884
| Summary: | Fix gating tests of container-tools for 8.4.0 |
|---|---|
| Product: | Red Hat Enterprise Linux 8 |
| Component: | container-tools-rhel8-module |
| Version: | 8.4 |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | medium |
| Priority: | unspecified |
| Reporter: | Jindrich Novy <jnovy> |
| Assignee: | Jindrich Novy <jnovy> |
| QA Contact: | Joy Pu <ypu> |
| CC: | cevich, dwalsh, gscrivan, jligon, jnovy, lmiksik, lsm5, mheon, mitr, mmarusak, nalin, pehunt, santiago, tsweeney, vrothber, ypu |
| Target Milestone: | rc |
| Target Release: | 8.0 |
| Keywords: | Triaged |
| Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| Clones: | 1937863 (view as bug list) |
| Bug Blocks: | 1937863 |
| Last Closed: | 2021-03-18 18:26:20 UTC |
| Type: | Bug |
| Regression: | --- |
Description (Jindrich Novy, 2021-01-11 12:14:19 UTC)
/tmp/bats.79244.src: line 45: socat: command not found

Solution: add 'Requires: socat' to the podman-tests specfile (this has already been done on some branches; it needs to be done on all of them). That's just one of the errors, I'll continue looking.

skopeo: looks like it needs the same openssl tweak. Cockpit: sorry, no clue. I think this is as far as I can go. Clearing needinfo.

Thanks Ed. I made all the changes you suggested. podman-root and nonroot have only one (and the same) failure now: https://baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ci-openstack-mbs-sti/2226/artifact/work-tests.ymlGI3Dyy/tests-sRkftb/test.podman-root.bats.log

For buildah there are still plenty of failures: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/94/25/9425/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2226/work-tests.ymlGI3Dyy/tests-sRkftb/test.buildah-root.bats.log

Tom, Matej, is there anything else we can do about cockpit-podman? http://artifacts.osci.redhat.com/baseos-ci/redhat-module/94/25/9425/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2226/work-tests.ymlGI3Dyy/tests-sRkftb/FAIL-str_cockpit-podman.log

podman: no idea.

buildah 13: looks like it needs the SANs/CN openssl fix. Maybe somewhere else too; I can't look right now.

buildah 209 (capabilities) and 244 (ABCD): see these two upstream commits:
https://github.com/containers/buildah/pull/2631/commits/c74084a21a602d4f03977b6c7043d315bfa3f67f
https://github.com/containers/buildah/pull/2631/commits/abf3f0d554145f343932bf277145e2e5225c6914

buildah (all the others): needs @gscrivan's attention; I believe he's working on exactly this problem in this podman PR: https://github.com/containers/podman/pull/8949

Reminder: test logs are completely unreadable without my Greasemonkey extension:
https://github.com/edsantiago/greasemonkey/tree/master/podman-ginkgo-highlight
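For context, the SANs/CN openssl fix mentioned above amounts to putting the registry hostname into a subjectAltName instead of only the CN, since Go-based tools stopped honoring CN-only certificates. A minimal sketch of generating such a test certificate (the paths are placeholders; the exact `-subj`/`-addext` flags match the diff quoted in a later comment, and `-addext` needs OpenSSL 1.1.1+):

```shell
# Generate a self-signed test cert that carries the hostname in a SAN.
# Paths are illustrative; -addext requires OpenSSL 1.1.1 or newer.
tmpdir=$(mktemp -d)
openssl req -newkey rsa:2048 -nodes -sha256 -x509 -days 2 \
    -subj "/C=US/ST=Foo/L=Bar/O=Red Hat, Inc./CN=registry host certificate" \
    -addext subjectAltName=DNS:localhost \
    -keyout "$tmpdir/registry.key" \
    -out "$tmpdir/registry.crt"
# Confirm the SAN made it into the certificate:
openssl x509 -in "$tmpdir/registry.crt" -noout -ext subjectAltName
```

With the hostname in a SAN, a `--tls-verify=true` push against an unknown CA fails with the expected "certificate signed by unknown authority" instead of the legacy-CN error.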
podman ping: please apply https://github.com/containers/podman/pull/8975/files

> Tom, Matej, is there anything else we can do about cockpit-podman? http://artifacts.osci.redhat.com/baseos-ci/redhat-module/94/25/9425/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2226/work-tests.ymlGI3Dyy/tests-sRkftb/FAIL-str_cockpit-podman.log

Ahh, there is a massive thinko! https://src.osci.redhat.com/rpms/cockpit-podman/blob/stream-container-tools-rhel8-rhel-8.4.0/f/tests/browser.sh#_53

This line deletes all __root__ containers, not all admin ones! It was supposed to delete all admin containers; as it stands it does not make any sense and of course it fails to save any container. I think replacing it with `sudo -i -u admin podman rmi --all` should do the trick. Jindrich, can you please fix it or should I adjust it?

Thanks Matej, makes sense! I just applied your fix and am doing a fresh build - fingers crossed it fixes the gating tests for cockpit-podman. I will let you know if not.

Matej, it fails differently now: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/94/67/9467/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2245/work-tests.ymlnJGWQE/tests-79cSSp/FAIL-str_cockpit-podman.log Can you please take a look?

It fails because we expect the container state to be `exited` but now it is `stopped`. Has there been any change lately regarding these states? I wonder if our tests should just not care about this distinction, as done in podman in https://github.com/containers/podman/commit/88cd6488166bc799645efac8f2df389a352b2653 (and explained in https://github.com/containers/podman/issues/5336). In upstream we now accept both states: https://github.com/cockpit-project/cockpit-podman/pull/653

Thanks Matej, that seems to do the trick: http://shell.bos.redhat.com/~santiago/mbhistory/09515.html - but I'm not sure whether the cockpit-podman gating test was running at all?
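The "accept both states" approach can be sketched in shell. This is only an illustration, not the upstream cockpit-podman code; the `state` value is hard-coded here, where a real test would read it from `podman inspect`:

```shell
# Sketch: treat "exited" and "stopped" as the same not-running condition.
# In a real test, the state would come from something like:
#   podman inspect --format '{{.State.Status}}' <container>
state="stopped"   # hard-coded for illustration
case "$state" in
    exited|stopped) echo "container is down" ;;
    *)              echo "container is up: $state" ;;
esac
```

Matching both values makes the test insensitive to which of the two equivalent terminal states a given podman version reports.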
Ed, Tom, for skopeo there still seem to be docker.io references in the tests that we might want to get rid of:

```
jnovy@localhost .../skopeo-bdb117ded6d37f0a6b0a2e28ba3213c20264ab43/systemtest (test_docker_remove *)$ grep -r docker\.io
helpers.bash:REGISTRY_FQIN=${SKOPEO_TEST_REGISTRY_FQIN:-docker.io/library/registry:2}
060-delete.bats:    local remote_image=docker://docker.io/library/busybox:latest
050-signing.bats:    run_skopeo copy docker://docker.io/library/busybox:latest \
040-local-registry-auth.bats:        docker://docker.io/library/busybox:latest \
040-local-registry-auth.bats:        docker://docker.io/library/busybox:latest \
030-local-registry-tls.bats:        docker://docker.io/library/busybox:latest \
020-copy.bats:    local remote_image=docker://docker.io/library/busybox:latest
020-copy.bats:    local remote_image=docker://docker.io/library/busybox:latest
020-copy.bats:    local remote_image=docker://docker.io/library/busybox:latest
020-copy.bats:    local remote_image=docker://docker.io/library/busybox:latest
020-copy.bats:    local alpine=docker.io/library/alpine:latest
010-inspect.bats:    remote_image=docker://docker.io/$arch/golang
```

At least three tests are failing because of this: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/95/15/9515/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2265/work-tests.ymlbLF0sj/tests-mnKQ78/test.skopeo-root.bats.log

Dan, Matt, for podman-3.0.0rc1 root there is a mount error:

```
not ok 54 podman mount - basic test
# (from function `die' in file ./helpers.bash, line 363,
#  in test file ./060-mount.bats, line 33)
#   `die "Mounted file exists even after umount: $mount_path/$f_path"' failed
# # /usr/bin/podman rm --all --force
# # /usr/bin/podman ps --all --external --format {{.ID}} {{.Names}}
# # /usr/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# quay.io/libpod/testimage:20200929 766ff5a3a7e4
# # /usr/bin/podman run --name mount_test_qrlnV quay.io/libpod/testimage:20200929 sh -c echo OrY1H3NkdByHSXrRN1Zu9lEAVV3piC > /tmp/tmpfile_X1XJZP1O
# # /usr/bin/podman mount mount_test_qrlnV
# /var/lib/containers/storage/overlay/dcadffbd6bf6f53b893dcc42ca89f28deed37dd99e268b85b928cbe1469b51d3/merged
# # /usr/bin/podman mount --notruncate
# 72fcbfa10fdb646d81099ec647f1160546a0cf887422fd6a4b65a99c46d11db0 /var/lib/containers/storage/overlay/dcadffbd6bf6f53b893dcc42ca89f28deed37dd99e268b85b928cbe1469b51d3/merged
# # /usr/bin/podman umount mount_test_qrlnV
# mount_test_qrlnV
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: Mounted file exists even after umount: /var/lib/containers/storage/overlay/dcadffbd6bf6f53b893dcc42ca89f28deed37dd99e268b85b928cbe1469b51d3/merged//tmp/tmpfile_X1XJZP1O
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
# # /usr/bin/podman pod rm --all --force
# # /usr/bin/podman rm --all --force
# 72fcbfa10fdb646d81099ec647f1160546a0cf887422fd6a4b65a99c46d11db0
```

Ed, Giuseppe, for podman-3.0.0rc1 rootless there seems to be one test that needs fixing, as /sys/* got added?
```
not ok 98 podman diff
# (from function `is' in file ./helpers.bash, line 406,
#  in test file ./140-diff.bats, line 29)
#   `is "$result" "${expect[$field]}" "$field"' failed
# $ /usr/bin/podman rm --all --force
# $ /usr/bin/podman ps --all --external --format {{.ID}} {{.Names}}
# $ /usr/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# quay.io/libpod/testimage:20200929 766ff5a3a7e4
# $ /usr/bin/podman run --name iutGMFcPkR quay.io/libpod/testimage:20200929 sh -c touch /Z9Uphvxwil;rm /etc/services
# $ /usr/bin/podman diff --format json -l
# {"changed":["/etc"],"added":["/sys/fs","/sys/fs/cgroup","/Z9Uphvxwil"],"deleted":["/etc/services"]}
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: added
# #| expected: '/Z9Uphvxwil'
# #|   actual: '/sys/fs'
# #|         > '/sys/fs/cgroup'
# #|         > '/Z9Uphvxwil'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
# $ /usr/bin/podman pod rm --all --force
# $ /usr/bin/podman rm --all --force
# 2cc19d9cb8813906f9fa9fb1955232ca17f3a0a7a8252862125b3318f93dc116
```

Nalin, Tom, for buildah, does the cert check need to be amended any further?

```
not ok 13 authenticate: cert and credentials
# (from function `expect_output' in file ./helpers.bash, line 269,
#  in test file ./authenticate.bats, line 37)
#   `expect_output --substring " x509: certificate signed by unknown authority" \' failed with status 125
# /usr/share/buildah/test/system /usr/share/buildah/test/system
# # [checking for: alpine]
# # [podman pull alpine]
# Resolved short name "alpine" to a recorded short-name alias (origin: /etc/containers/registries.conf.d/shortnames.conf)
# Trying to pull docker.io/library/alpine:latest...
# Getting image source signatures
# Copying blob sha256:9d16cba9fb961d1aafec9542f2bf7cb64acfc55245f9e4eb5abecd4cdc38d749
# Copying config sha256:961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4
# Writing manifest to image destination
# Storing signatures
# 961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4
# # [podman save --format oci-archive alpine >/tmp/buildah-image-cache.180179/alpine-.tar ]
# $ /usr/bin/buildah push --signature-policy /usr/share/buildah/test/system/./policy.json --tls-verify=false --creds testuser:testpassword alpine localhost:5000/my-alpine
# Getting image source signatures
# Copying blob sha256:03901b4a2ea88eeaad62dbe59b072b28b6efa00491962b8741081c5df50c65e0
# Copying config sha256:961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4
# Writing manifest to image destination
# Storing signatures
# $ /usr/bin/buildah push --signature-policy /usr/share/buildah/test/system/./policy.json --tls-verify=true alpine localhost:5000/my-alpine
# Getting image source signatures
# Get "https://localhost:5000/v2/": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
# [ rc=125 (expected) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: push with --tls-verify=true
# #| expected: ' x509: certificate signed by unknown authority'
# #|   actual: 'Getting image source signatures'
# #|         > 'Get "https://localhost:5000/v2/": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# /usr/share/buildah/test/system
```

Note that:

```
-    -subj "/C=US/ST=Foo/L=Bar/O=Red Hat, Inc./CN=localhost"
+    -subj "/C=US/ST=Foo/L=Bar/O=Red Hat, Inc./CN=registry host certificate" \
+    -addext subjectAltName=DNS:localhost
```

is already applied in test_buildah.sh. There are more sysfs-related issues in the log, plus the "ABCD" thing that needs
fixing: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/95/15/9515/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2265/work-tests.ymlbLF0sj/tests-mnKQ78/test.buildah-root.bats.log

Jindrich, just want to verify that I'm not reading between the lines too much. The gating tests for Podman and Buildah are both passing in RHEL 8.4 now? Only Skopeo is still having problems?

Tom, you read it correctly; basically all gating tests are currently failing for 8.4.0: http://shell.bos.redhat.com/~santiago/mbhistory/09515.html

Digging into the Podman tests - it seems to be an issue with `podman mount`. I can't replicate it locally. The rootless system test is a separate issue with `podman diff`. I have seen neither before, though rootless `podman diff` has been known to do weird things.

Buildah: https://github.com/containers/buildah/pull/2924 - I've just submitted it now, so it'll take a while to go through CI.

Skopeo: https://github.com/containers/skopeo/pull/1169 - failing CI due to problems (integration tests) beyond my control.

Podman mount issue: the cleanup process is failing to fire. Cause: Conmon is dumping core.
```
Jan 20 15:57:58 ci-vm-10-0-139-150.hosted.upshift.rdu2.redhat.com systemd-coredump[14175]: Process 14168 (conmon) of user 1000 dumped core.

Stack trace of thread 14168:
#0  0x00007f799284f37f raise (libc.so.6)
#1  0x00007f7992839db5 abort (libc.so.6)
#2  0x00007f79928924e7 __libc_message (libc.so.6)
#3  0x00007f79928995ec malloc_printerr (libc.so.6)
#4  0x00007f799289b35d _int_free (libc.so.6)
#5  0x0000563bc6b18497 main (conmon)
#6  0x00007f799283b493 __libc_start_main (libc.so.6)
#7  0x0000563bc6b1892e _start (conmon)

Stack trace of thread 14170:
#0  0x00007f7992909a31 __poll (libc.so.6)
#1  0x00007f7993187ab6 g_main_context_iterate.isra.21 (libglib-2.0.so.0)
#2  0x00007f7993187be0 g_main_context_iteration (libglib-2.0.so.0)
#3  0x00007f7993187c31 glib_worker_main (libglib-2.0.so.0)
#4  0x00007f79931afe1a g_thread_proxy (libglib-2.0.so.0)
#5  0x00007f7991f9e14a start_thread (libpthread.so.0)
#6  0x00007f7992914db3 __clone (libc.so.6)
```
Created attachment 1749173 [details]: Conmon Core Dump - core dump for Conmon from the 9515 build.
Adding CC to Peter since we're in Conmon now.

https://github.com/containers/conmon/pull/233 should fix Conmon.

The situation is now looking a bit better: http://shell.bos.redhat.com/~santiago/mbhistory/09546.html The last thing in podman (with conmon fixed) is:

```
not ok 129 sensitive mount points are masked without --privileged
# (from function `die' in file ./helpers.bash, line 363,
#  from function `run_podman' in file ./helpers.bash, line 181,
#  in test file ./400-unprivileged-access.bats, line 135)
#   `run_podman run --rm $IMAGE stat -c'%n:%F:%h:%T:%t' /dev/null ${subset[@]}' failed
# $ /usr/bin/podman rm --all --force
# $ /usr/bin/podman ps --all --external --format {{.ID}} {{.Names}}
# $ /usr/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# quay.io/libpod/testimage:20200929 766ff5a3a7e4
# $ /usr/bin/podman run --rm quay.io/libpod/testimage:20200929 stat -c%n:%F:%h:%T:%t /dev/null /proc/acpi /proc/kcore /proc/keys /proc/timer_list /proc/sched_debug /proc/scsi /sys/firmware /sys/fs/selinux /sys/dev/block
# /dev/null:character special file:1:3:1
# /proc/acpi:directory:2:0:0
# /proc/kcore:character special file:1:3:1
# /proc/keys:character special file:1:3:1
# /proc/timer_list:character special file:1:3:1
# /proc/sched_debug:character special file:1:3:1
# /proc/scsi:directory:2:0:0
# stat: can't stat '/sys/firmware': No such file or directory
# stat: can't stat '/sys/fs/selinux': No such file or directory
# stat: can't stat '/sys/dev/block': No such file or directory
# [ rc=1 (** EXPECTED 0 **) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: exit code is 1; expected 0
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
# $ /usr/bin/podman pod rm --all --force
# $ /usr/bin/podman rm --all --force
```

And there are multiple issues with buildah: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/95/46/9546/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2271/work-tests.yml7fLqIA/tests-teM05k/test.buildah-root.bats.log

I don't think it is a failure; if these mounts are not present they don't need to be masked:

```
# stat: can't stat '/sys/firmware': No such file or directory
# stat: can't stat '/sys/fs/selinux': No such file or directory
# stat: can't stat '/sys/dev/block': No such file or directory
```

I think we need to change the test to verify only the paths that exist.

Ed, do you mind having a look at comment #30?

> I think we need to change the test to verify only the paths that exist
@gscrivan can you elaborate on what you mean by that?
Those paths all exist on the host: the test is careful to check.
Do you mean paths that exist in the container? If so, why do they not exist? Did your PR get rid of everything under /sys? If so, why? I thought the purpose of defaultMaskPaths in pkg/specgen/generate/config_linux.go was to block only a subset of /proc and /sys? Why is this test failing all of a sudden, and why only on RHEL?
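For reference, "verify only the paths that exist" could be sketched as below. This is not the actual bats test code, just the shape of the filtering step: build the candidate list, drop absent entries, then assert only on what remains. The path list is abbreviated, and `/does/not/exist` stands in for a masked path missing from the container:

```shell
# Keep only candidate paths that actually exist before asserting on them.
candidates="/tmp /proc/keys /sys/firmware /does/not/exist"
present=""
for p in $candidates; do
    # -e: path exists (file, directory, or special file)
    if [ -e "$p" ]; then
        present="$present $p"
    fi
done
echo "would check:$present"
```

The test would then run its `stat` assertions only on `$present`, so a path the runtime never created (and therefore never needed to mask) cannot fail the check.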
These paths previously existed in the container because we were always bind-mounting /sys from the host: if they existed on the host, they also existed in the container. Now we don't bind-mount /sys (unless it is necessary, e.g. when using --net host), so these file systems are not created at all in the container. The OCI runtimes check whether a path is present before masking it, so in this case /sys/firmware, /sys/fs/selinux and /sys/dev/block are ignored as they don't exist under /sys.

Ed, thanks for looking into the gating tests for podman - I noticed https://github.com/containers/podman/pull/9091 has landed in master. Do you think this could be merged into the v3.0 branch - which is the to-be 8.4.0 - so that I can run this through the RHEL gating tests?

https://github.com/containers/podman/pull/9108 has been merged into the v3.0 branch; it is a subset of #9091 that should address the RHEL gating-test failures.

*** Bug 1857367 has been marked as a duplicate of this bug. ***

The latest build of 8.4.0 container-tools (http://shell.bos.redhat.com/~santiago/mbhistory/09653.html) shows buildah as the only package failing RHEL gating tests: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/96/53/9653/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2305/work-tests.ymlwGW2ER/tests-rikjah/test.buildah-root.bats.log So the effort should shift there now.

Buildah:
- not ok 13 authenticate: cert and credentials - this is https://github.com/containers/buildah/pull/2924/files as I mentioned in comment 24
- not ok 209 bud capabilities test - this is https://github.com/containers/buildah/pull/2631/files as I mentioned in comment 12
- not ok 244 config-flags-verification - same as 209 (not the same root cause, but fixed in the same PR)

The rest are /sys failures and I really just have no idea. Why are the above fixes not in the RHEL build yet? Do they need to be backported to some specific buildah branch on github? Do they need to be added as patches to the specfile?
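On the specfile question: one common way to carry not-yet-released upstream fixes in an RPM build is to list them as patches in the spec and apply them in %prep. This fragment is purely illustrative - the patch file names are placeholders, and whether the container-tools module builds take patches this way or only consume upstream release branches is exactly the open question here:

```
# Hypothetical buildah.spec fragment; patch names are placeholders.
Patch0: 0001-authenticate-cert-test-use-SAN-cert.patch
Patch1: 0002-bud-capabilities-and-config-flags-fixes.patch

%prep
# %autosetup applies all listed patches after unpacking the source
%autosetup -S git -n %{name}-%{version}
```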
@jnovy please assign someone to make sure this happens; I don't want to look at those failures again.

Ed, 8.4.0 buildah was not switched to the release-1.19 branch yet. I switched it; let's see with the next build...

Tom, do you mind merging https://github.com/containers/buildah/pull/2924 and https://github.com/containers/buildah/pull/2631 into the release-1.19 buildah branch so that I can consume this in RHEL? Thanks!

Ed, there are a bunch of others in the last build:

not ok 389 user-namespace
not ok 391 combination-namespaces

http://artifacts.osci.redhat.com/baseos-ci/redhat-module/97/69/9769/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2350/work-tests.yml8FbnCZ/tests-DFJxxk/test.buildah-root.bats.log

Do you mind having a quick look? Then we have only the podman nonroot remaining:

not ok 99 podman diff

http://artifacts.osci.redhat.com/baseos-ci/redhat-module/97/69/9769/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2350/work-tests.yml8FbnCZ/tests-DFJxxk/test.podman-nonroot.bats.log

podman rootless diff: should be addressed by https://github.com/containers/podman/pull/9209

buildah (several failures): I responded to those already, in comment 38.

buildah (/sys failures): that's not something I can help with. @gscrivan can you PTAL? There are two failures of the form:

```
buildah run ...
# error running container: error creating container for [some-command]: mount `sysfs` to '/sys': Operation not permitted
```

The failures don't seem to be consistent: other `buildah run` commands work just fine. This is weird, but matches my experience with #9209, in which the `podman diff` failure only happens about half the time.

I think buildah is missing the logic podman has for deciding whether /sys must be sysfs or bind-mounted from the host.

https://github.com/containers/podman/pull/9213 is merged into v3.0; this should resolve the podman rootless failures.
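The "logic podman has" mentioned above is a heuristic for choosing between mounting a fresh sysfs and bind-mounting the host's /sys. The toy function below only illustrates the shape of that decision (a rootless container sharing the host network namespace cannot mount a fresh sysfs, so it falls back to a bind mount); the real implementation lives in podman's pkg/specgen/generate and considers more conditions than this, and the input flags here are hypothetical:

```shell
# Toy illustration, NOT podman's actual code.
choose_sys_mount() {
    rootless=$1   # "rootless" or "root" (hypothetical inputs)
    netns=$2      # "private" or "host"
    # A rootless user who keeps the host netns may not mount a new sysfs,
    # so bind-mount /sys from the host instead.
    if [ "$rootless" = "rootless" ] && [ "$netns" = "host" ]; then
        echo "bind:/sys"
    else
        echo "sysfs"
    fi
}
choose_sys_mount rootless host
choose_sys_mount root private
```

This also matches the earlier explanation in this thread: once /sys is a fresh sysfs rather than a bind mount, paths like /sys/firmware simply never appear inside the container.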
Giuseppe, do you think changes like Matt's https://github.com/containers/podman/pull/8561/files ("Do not mount sysfs as rootless in more cases") or your PR https://github.com/containers/podman/pull/8949/files ("specgen: improve heuristic for /sys bind mount") would fix this? And Giuseppe, if it's one of those, especially if it's the one you did, would you mind putting in a fix?

The error "mount `sysfs` to '/sys': Operation not permitted" is coming from crun. In this case having the libpod logic in buildah won't help, as crun already attempts to bind-mount /sys from the host if mounting a fresh sysfs fails. It must be something different causing that error. I've not managed to reproduce it locally yet.

Matej, do you mind having a look at the gating test errors for cockpit-podman-28? http://artifacts.osci.redhat.com/baseos-ci/redhat-module/98/42/9842/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2381/work-tests.ymlZhmBRP/tests-rPuKVL/FAIL-str_cockpit-podman.log

There were indeed two issues; I addressed them in Fedora: https://src.fedoraproject.org/rpms/cockpit-podman/pull-request/22 Can you please sync this to RHEL as well?

Matej, with that applied the gating test now fails with: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/98/53/9853/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2382/work-tests.ymlhr7eyf/tests-T_pC3_/FAIL-str_cockpit-podman.log

```
+ dnf install -y https://kojipkgs.fedoraproject.org//packages/chromium/87.0.4280.141/1.el8/x86_64/chromium-common-87.0.4280.141-1.el8.x86_64.rpm https://kojipkgs.fedoraproject.org//packages/chromium/87.0.4280.141/1.el8/x86_64/chromium-headless-87.0.4280.141-1.el8.x86_64.rpm
Updating Subscription Management repositories.
Unable to read consumer identity
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Last metadata expiration check: 0:50:56 ago on Mon Feb 8 22:54:10 2021.
chromium-common-87.0.4280.141-1.el8.x86_64.rpm   14 MB/s |  17 MB  00:01
chromium-headless-87.0.4280.141-1.el8.x86_64.rp  35 MB/s |  61 MB  00:01
Error:
 Problem 1: conflicting requests
  - nothing provides minizip(x86-64) needed by chromium-common-87.0.4280.141-1.el8.x86_64
 Problem 2: conflicting requests
  - nothing provides libminizip.so.1()(64bit) needed by chromium-headless-87.0.4280.141-1.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages)
cp: cannot stat '/var/str/logs/*': No such file or directory
```

Meh, it is annoying that we cannot just open a PR and let the tests run, but instead have to do guesswork in RHEL... Anyway, I think this might work:

```
  if grep -q 'ID=.*rhel' /etc/os-release; then
+     dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
+     dnf config-manager --enable epel
      dnf install -y \
```

Matej, the installation of Chromium is now working: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/98/73/9873/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2394/work-tests.ymlCog447/tests-UakUC9/FAIL-str_cockpit-podman.log But there seems to be another dependency missing?
```
DevTools listening on ws://127.0.0.1:10037/devtools/browser/9641b79d-6b09-4d68-93db-349164b90c97
[0209/221958.811658:ERROR:egl_util.cc(70)] Failed to load GLES library: /usr/lib64/chromium-browser/swiftshader/libGLESv2.so: /usr/lib64/chromium-browser/swiftshader/libGLESv2.so: cannot open shared object file: No such file or directory
[0209/221958.844921:ERROR:viz_main_impl.cc(150)] Exiting GPU process due to errors during initialization
CDP: {"source":"network","level":"error","text":"Failed to load resource: the server responded with a status of 404 (Not Found)","timestamp":1612909200076.4092,"url":"http://127.0.0.1:9090/podman","networkRequestId":"A7F796D6744BE03B5CAE217FB85E3211"}
Traceback (most recent call last):
  File "/var/str/source/test/check-application", line 170, in testBasicSystem
    self._testBasic(True)
  File "/var/str/source/test/check-application", line 294, in _testBasic
    self.login_and_go("/podman", superuser=auth)
  File "/var/str/source/test/common/testlib.py", line 975, in login_and_go
    self.browser.login_and_go(path, user=user, host=host, superuser=superuser, urlroot=urlroot, tls=tls)
  File "/var/str/source/test/common/testlib.py", line 580, in login_and_go
    self.wait_present('#content')
  File "/var/str/source/test/common/testlib.py", line 396, in wait_present
    self.wait_js_func('ph_is_present', selector)
  File "/var/str/source/test/common/testlib.py", line 390, in wait_js_func
    self.wait_js_cond("%s(%s)" % (func, ','.join(map(jsquote, args))))
  File "/var/str/source/test/common/testlib.py", line 387, in wait_js_cond
    self.raise_cdp_exception("timeout\nwait_js_cond", cond, result["exceptionDetails"], trailer)
  File "/var/str/source/test/common/testlib.py", line 176, in raise_cdp_exception
    raise Error("%s(%s): %s" % (func, arg, msg))
testlib.Error: timeout
```

Yeah, `cockpit-system`, which I included as per comment #48, but yesterday you kicked it out: https://src.osci.redhat.com/rpms/cockpit-podman/c/6806f1e44f344f72a22a35ba7a12e14aedae4303?branch=stream-container-tools-rhel8-rhel-8.4.0 Why did you remove it? It is needed for testing.

This issue https://github.com/cockpit-project/cockpit-podman/pull/663 misled me into thinking I needed to remove it.

Matej, it is still failing even after cockpit-system was added back; please take a look: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/98/92/9892/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2403/work-tests.ymlPiWmcc/tests-v2nnvW/FAIL-str_cockpit-podman.log

This is due to https://github.com/containers/podman/issues/9251 In upstream we did a few hacks in https://github.com/cockpit-project/cockpit-podman/pull/669 which also work around this issue. I just did upstream release 28.1, where these fixes are included.

Thanks Matej, the gating tests for cockpit-podman work now!

Chris, I still don't see these commits:
https://github.com/containers/buildah/commit/abf3f0d554145f343932bf277145e2e5225c6914.patch
https://github.com/containers/buildah/commit/c74084a21a602d4f03977b6c7043d315bfa3f67f.patch
https://patch-diff.githubusercontent.com/raw/containers/buildah/pull/2924.patch
merged into the release-1.19 branch of buildah; could you please merge them? It is the reason why these tests fail in RHEL 8.4.

Ed, do you mind merging https://patch-diff.githubusercontent.com/raw/containers/skopeo/pull/1169.patch into the release-1.2 skopeo branch so that we get rid of the docker pull rate-limit warnings in RHEL 8.4? Thanks!

Jindrich, Ed's enjoying some well-earned PTO this week. I've just backported the patch to the Skopeo release-1.2 branch here: https://github.com/containers/skopeo/pull/1195

Backport LGTM. Thank you very much Tom! Chris provided the backport to Buildah's release-1.19 branch with https://github.com/containers/buildah/pull/3014 - thanks Chris and Tom!
We are almost there - everything passes except two tests for buildah: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/99/80/9980/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2430/work-tests.ymlOMrjZn/tests-7tekKH/test.buildah-root.bats.log

not ok 392 user-namespace
not ok 394 combination-namespaces

Giuseppe was looking at these in comment #42 - do you think we can fix them somehow in the release-1.19 branch?

After further investigation I think we are hitting the same kernel issue as https://bugzilla.redhat.com/show_bug.cgi?id=1903983

> Matej, seems cockpit-podman started to fail again

It "seems **podman** started to fail again"; cockpit-podman is just a consumer of this failing service. From the screenshots it seems that the podman user service fails to start. We also store the journal when tests fail: http://artifacts.osci.redhat.com/baseos-ci/redhat-module/10/06/10067/https___baseos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com-ci-openstack-mbs-sti/2443/work-tests.ymlAu8v9H/tests-EMLhta/cockpit-podman/TestApplication-testDownloadImage-rhel-8-4-127.0.0.1-22-FAIL.log.gz

It says `/run/user/1001/podman/podman.sock/v1.12/libpod/info?: couldn't connect: Could not connect: No such file or directory` Then it seems it actually is started, but we never get a proper reply from it?

I had hoped that last week, when I got tired of trying to make the rootless service usable with one big hack, was the end of it. But it seems not. This service is just too brittle and there are a hundred ways one can break it. Here are a few of my reports to show that the service is brittle and not tested properly, and that c-podman is the testing ground:

https://github.com/containers/podman/issues/9251
https://github.com/containers/podman/issues/8762
https://github.com/containers/podman/issues/8751
https://github.com/containers/podman/issues/6660
https://github.com/containers/podman/issues/5840

Back from my rant to this specific issue. Our CI does not see this yet, as the new version is not yet available. Or did the testing machine just get a bit busier so that things take longer (this one seems like a race in podman)? Are these podman builds available somewhere so that I can easily install them into a VM? But I am convinced this is going to be #9251. Also, can you please retry to make sure this is reproducible before I waste one more day on it?

P.S. Today Martin did another release of c-podman, so it can be included. It does not contain any fixes for this specific issue though.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days