job: periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-workers-rhel7 is failing frequently in CI, see testgrid results:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-workers-rhel7

It looks like this started failing consistently around 6/30.
RHEL scaleup completes, but the step fails waiting for machineconfigpools/worker to update:

2021-07-07 13:48:47+00:00 - Waiting for worker machineconfigpool to update
+ oc wait machineconfigpool/worker --for=condition=Updated=True --timeout=10m
error: timed out waiting for the condition on machineconfigpools/worker

Also, before this step we attempt to delete the RHCOS nodes, which also reports failure waiting for nodes to delete:

2021-07-07 13:36:45+00:00 - Waiting for CoreOS nodes to be removed
+ oc wait node --for=delete --timeout=10m --selector node.openshift.io/os_id=rhcos,node-role.kubernetes.io/worker
node/ip-10-0-136-97.us-west-1.compute.internal condition met
node/ip-10-0-139-212.us-west-1.compute.internal condition met
error: timed out waiting for the condition on nodes/ip-10-0-195-255.us-west-1.compute.internal
Needs prioritized.
Will review again for a future sprint.
Looking at current failures, the problem mentioned in comment 1 is no longer happening. The job is now failing on many tests, mostly related to [sig-builds]:
https://search.ci.openshift.org/?search=failed%3A.*sig-builds&maxAge=336h&context=1&type=build-log&name=nightly-4.9-e2e-aws-workers-rhel7&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
It seems like there may be problems with how images are being built on the rhel workers. From https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-workers-rhel7/1453429013811826688,
~~~
Oct 27 19:48:50.266: INFO: Running 'oc --namespace=e2e-test-custom-build-mg99l --kubeconfig=/tmp/configfile033660499 logs -f build.build.openshift.io/custom-builder-image-1 --timestamps --v 10'
Oct 27 19:48:50.703: INFO: 2021-10-27T19:48:32.928390267Z Receiving source from STDIN as archive ...
2021-10-27T19:48:36.018068153Z time="2021-10-27T19:48:36Z" level=info msg="metacopy option not supported on this kernelmetacopy=on"
2021-10-27T19:48:36.028271597Z time="2021-10-27T19:48:36Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: failed to mount overlay: invalid argument"
2021-10-27T19:48:36.033946337Z I1027 19:48:36.033922 1 defaults.go:102] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].
2021-10-27T19:48:36.060829210Z Caching blobs under "/var/cache/blobs".
2021-10-27T19:48:36.062453705Z
2021-10-27T19:48:36.062453705Z Pulling image registry.redhat.io/rhel8/buildah:latest ...
2021-10-27T19:48:37.137004711Z Getting image source signatures
2021-10-27T19:48:37.352238984Z Copying blob sha256:06038631a24a25348b51d1bfc7d0a0ee555552a8998f8328f9b657d02dd4c64c
2021-10-27T19:48:37.359880877Z Copying blob sha256:262268b65bd5f33784d6a61514964887bc18bc00c60c588bc62bfae7edca46f1
2021-10-27T19:48:40.364347355Z Copying blob sha256:b794e6c09d5c032e7e212bc66f7b125b429381e722d87502e48253ef580f54d8
2021-10-27T19:48:43.185523971Z Copying config sha256:d19c0a0e81fa7244281d1df2f85594408ceeac80101cac33fc115393dbdacc8e
2021-10-27T19:48:43.195644584Z Writing manifest to image destination
2021-10-27T19:48:43.197240176Z Storing signatures
2021-10-27T19:48:49.189933108Z Adding transient rw bind mount for /run/secrets/rhsm
2021-10-27T19:48:49.191279569Z STEP 1: FROM registry.redhat.io/rhel8/buildah:latest
2021-10-27T19:48:49.230217051Z time="2021-10-27T19:48:49Z" level=error msg="error unmounting /var/lib/containers/storage/overlay/d456bbc62bcb2af7a7f9ee6f146e8ee71d65bedff6b221fe92e23488b8b2f04e/merged: invalid argument"
2021-10-27T19:48:49.319969178Z error: build error: error mounting new container: error mounting build container "0ad7cb573de4467bd0e4980e266fdd05ceaccecc1495dd9558cb4d88aed66968": error creating overlay mount to /var/lib/containers/storage/overlay/d456bbc62bcb2af7a7f9ee6f146e8ee71d65bedff6b221fe92e23488b8b2f04e/merged, mount_data="metacopy=on,lowerdir=/var/lib/containers/storage/overlay/l/TCAZGT2TQQ6LTJXT34DEBHF2J6:/var/lib/containers/storage/overlay/l/MFETL4Y53FI7A3OLWO6L6LUXLR:/var/lib/containers/storage/overlay/l/MN4J7IGIFV3UR43ZVJ3BEOHATY,upperdir=/var/lib/containers/storage/overlay/d456bbc62bcb2af7a7f9ee6f146e8ee71d65bedff6b221fe92e23488b8b2f04e/diff,workdir=/var/lib/containers/storage/overlay/d456bbc62bcb2af7a7f9ee6f146e8ee71d65bedff6b221fe92e23488b8b2f04e/work": invalid argument
~~~
Could we enlist the Build team to help diagnose why builds are failing on rhel7 workers?
*** Bug 2023942 has been marked as a duplicate of this bug. ***
Not completed during this sprint.
Root cause:
Recent upgrades to buildah and its related libraries cause buildah to set incorrect options for the overlayfs storage driver on RHEL 7 hosts. This currently only impacts OCP 4.9 clusters with RHEL7 worker nodes - 4.8 and earlier versions are not impacted.

Workaround:
Builds continue to function on RHCOS worker nodes if such nodes can be provisioned. The RHEL7 worker nodes do not need to be torn down - developers can use the following NodeSelector on their BuildConfig objects to ensure that builds only run on RHCOS nodes [1]:

"node.openshift.io/os_id: rhcos"

This same NodeSelector can be applied to all builds cluster-wide using the buildOverride configuration option [2].

[1] https://docs.openshift.com/container-platform/4.9/cicd/builds/advanced-build-operations.html#builds-assigning-builds-to-nodes_advanced-build-operations
[2] https://docs.openshift.com/container-platform/4.9/cicd/builds/build-configuration.html
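
For illustration, the cluster-wide variant of the workaround might look roughly like the fragment below. This is a sketch, not a tested configuration: it assumes the cluster build configuration resource (`build.config.openshift.io/cluster`) covered in [2] and its `buildOverrides.nodeSelector` field, so check the field names against the 4.9 documentation before applying anything.

```yaml
# Sketch only: pin all build pods cluster-wide to RHCOS nodes.
# Assumption: the cluster Build config resource and its
# spec.buildOverrides.nodeSelector field, as documented in [2].
apiVersion: config.openshift.io/v1
kind: Build
metadata:
  name: cluster
spec:
  buildOverrides:
    nodeSelector:
      # Same label used in the per-BuildConfig workaround above.
      node.openshift.io/os_id: rhcos
```

A change like this would typically be made with `oc edit build.config.openshift.io/cluster`. Removing the override once the fix is installed should let builds schedule on RHEL workers again.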
Comment 17 dropped UpgradeBlocker, so I'm clearing ImpactStatementRequested.
Verify ocp build with rhel, ocp 4.10.0-0.nightly-2021-12-10-033652
========================
1. Create a project testing-rhel
2. Applied buildconfig with nodeSelector `node.openshift.io/os_id: rhel`

buildconfig.yaml
```
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: example
  namespace: testing-rhel
spec:
  nodeSelector:
    node.openshift.io/os_id: rhel
  source:
    git:
      ref: master
      uri: 'https://github.com/openshift/ruby-ex.git'
    type: Git
  strategy:
    type: Source
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: 'ruby:2.7'
        namespace: openshift
      env: []
  triggers:
    - type: ImageChange
      imageChange: {}
    - type: ConfigChange
```

3. Build gets completed

$ oc get builds
NAME        TYPE     FROM          STATUS     STARTED          DURATION
example-1   Source   Git@01effef   Complete   54 seconds ago   54s

$ oc get bc
NAME      TYPE     FROM         LATEST
example   Source   Git@master   1

$ oc get pods -o wide
NAME              READY   STATUS      RESTARTS   AGE   IP     NODE                       NOMINATED NODE   READINESS GATES
example-1-build   0/1     Completed   0          69s   <ip>   jshjs-lshvl-w-a-l-rhel-1   <none>           <none>

$ oc logs pod/example-1-build
time="2022-01-06T10:09:53Z" level=info msg="metacopy [...] "/var/cache/blobs".
Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/ruby@sha256:19f0b4c21e1b5e77d5442515719a543d0dca5c6b7f57bfaeea8c5e100ae63232...
Getting image source signatures
Copying blob sha256:b46ca46c303b49d886a7585735ebd1dc8651e83d0fab5823300cf3a9fd2febc1
Copying blob sha256:ac08ca107ad9ed699cbd28339749dd6463a84c73aa1d468a4241385fc4ec3876
[...]
Writing manifest to image destination
Storing signatures
Generating dockerfile with builder image image-registry.openshift-image-registry.svc:5000/openshift/ruby@sha256:19f0b4c21e1b5e77d5442515719a543d0dca5c6b7f57bfaeea8c5e100ae63232
Adding transient rw bind mount for /run/secrets/rhsm
Adding transient rw bind mount for /run/secrets/redhat.repo
STEP 1/9: FROM image-registry.openshift-image-registry.svc:5000/openshift/ruby@sha256:19f0b4c21e1b5e77d5442515719a543d0dca5c6b7f57bfaeea8c5e100ae63232
time="2022-01-06T10:10:07Z" level=warning msg="Ignoring global metacopy option, not supported with booted kernel"
STEP 2/9: LABEL "io.openshift.build.image"="image-registry.openshift-image-registry.svc:5000/openshift/ruby@sha256:19f0b4c21e1b5e77d5442515719a543d0dca5c6b7f57bfaeea8c5e100ae63232" "io.openshift.build.commit.author"="Honza Horak <hhorak>" "io.openshift.build.commit.date"="Fri Aug 21 13:44:47 2020 +0200" "io.openshift.build.commit.id"="01effef3a23935c1a83110d4b074b0738d677c44" "io.openshift.build.commit.ref"="master" "io.openshift.build.commit.message"="Merge pull request #35 from pvalena/bundler" "io.openshift.build.source-location"="https://github.com/openshift/ruby-ex.git"
STEP 3/9: ENV OPENSHIFT_BUILD_NAME="example-1" OPENSHIFT_BUILD_NAMESPACE="testing-rhel" OPENSHIFT_BUILD_SOURCE="https://github.com/openshift/ruby-ex.git" OPENSHIFT_BUILD_REFERENCE="master" OPENSHIFT_BUILD_COMMIT="01effef3a23935c1a83110d4b074b0738d677c44"
STEP 4/9: USER root
STEP 5/9: COPY upload/src /tmp/src
STEP 6/9: RUN chown -R 1001:0 /tmp/src
STEP 7/9: USER 1001
STEP 8/9: RUN /usr/libexec/s2i/assemble
---> Installing application source ...
---> Building your Ruby application from source ...
---> Running 'bundle install --retry 2 --deployment --without development:test' ...
[DEPRECATED] The `--deployment` flag is deprecated because it relies on being remembered across bundler invocations, which bundler will no longer do in future versions. Instead please use `bundle config set --local deployment 'true'`, and stop using this flag
[DEPRECATED] The `--path` flag is deprecated because it relies on being remembered across bundler invocations, which bundler will no longer do in future versions. Instead please use `bundle config set --local path './bundle'`, and stop using this flag
[DEPRECATED] The `--without` flag is deprecated because it relies on being remembered across bundler invocations, which bundler will no longer do in future versions. Instead please use `bundle config set --local without 'development:test'`, and stop using this flag
Fetching gem metadata from https://rubygems.org/
Fetching gem metadata from https://rubygems.org/..
Fetching gem metadata from https://rubygems.org/..
Using bundler 2.2.24
Fetching nio4r 2.5.2
Fetching rack 2.2.3
Installing nio4r 2.5.2 with native extensions
Installing rack 2.2.3
Fetching puma 4.3.5
Installing puma 4.3.5 with native extensions
Bundle complete! 2 Gemfile dependencies, 4 gems now installed.
Gems in the groups 'development' and 'test' were not installed.
Bundled gems are installed into `./bundle`
---> Cleaning up unused ruby gems ...
Running `bundle clean --verbose` with bundler 2.2.24
Frozen, using resolution from the lockfile
STEP 9/9: CMD /usr/libexec/s2i/run
COMMIT temp.builder.openshift.io/testing-rhel/example-1:c434382c
time="2022-01-06T10:10:18Z" level=warning msg="Ignoring global metacopy option, not supported with booted kernel"
Getting image source signatures
Copying blob sha256:cc423b2000aec40199a4f4e1012f2e9b573d4ce6bc1ca416a598f8e1d45f3d13
Copying blob sha256:41d099875e8768dcadb9f7e388d68c50eb25f6160c8a3858b966d12d89e4d288
Copying blob sha256:3cd3b63408eccc3f9a1ffb740cf311d927927f94247e952af1c9b67c1ad2db4f
Copying blob sha256:3dca2e66497972abbd6a7796a701296ada6bb53013b52d4432bc0d3f1cf0e7bd
Copying blob sha256:83b76fb61d8095ec96901a654c78ecb24246378f905d96eb152348af72089f70
Copying blob sha256:f008aacb05a5e87c49ca50c5f5ac03b6c1b633f38249ed0617e984381b426c27
Copying config sha256:0c3c9936d566ecca29f4e8b92dfcceca341d632056ff056e628a1f612fe7f50b
Writing manifest to image destination
Storing signatures
--> 0c3c9936d56
Successfully tagged temp.builder.openshift.io/testing-rhel/example-1:c434382c
0c3c9936d566ecca29f4e8b92dfcceca341d632056ff056e628a1f612fe7f50b
Build complete, no image push requested
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056