Bug 2026806 - release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci fails due to missing admin ack
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Test Framework
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.10.0
Assignee: W. Trevor King
QA Contact:
URL:
Whiteboard:
Duplicates: 2028761
Depends On:
Blocks: 2027929
Reported: 2021-11-25 22:55 UTC by Aravindh Puthiyaparambil
Modified: 2021-12-07 23:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2027929
Environment:
job=release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci=all
Last Closed: 2021-12-07 23:36:32 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 26649 0 None Merged Bug 2026806: test/e2e/upgrade/adminack: Poll gates for duration of update 2021-12-01 04:55:57 UTC
Github openshift origin pull 26661 0 None open Bug 2026806: Admin ack ignore unrelated upgradeable false 2021-12-02 07:02:26 UTC
Github openshift origin pull 26668 0 None open Bug 2026806: clusterversionoperator/adminack.go: Check for nil cm map 2021-12-03 19:03:44 UTC

Description Aravindh Puthiyaparambil 2021-11-25 22:55:39 UTC
release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci

is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci

Job URI: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1463537431096594432

Log:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1463537431096594432/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")'
2021-11-24T16:06:34Z RetrievedUpdates=False NoChannel: The update channel has not been configured.
2021-11-24T16:30:08Z Available=True -: Done applying 4.8.0-0.nightly-2021-11-24-020113
2021-11-24T19:32:46Z Failing=False -: -
2021-11-24T19:03:45Z Progressing=True ClusterOperatorUpdating: Working towards 4.9.0-0.ci-2021-11-24-092816: 205 of 738 done (27% complete), waiting on openshift-apiserver
2021-11-24T19:04:01Z Upgradeable=False AdminAckRequired: Kubernetes 1.22 and therefore OpenShift 4.9 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6329921 for details and instructions.
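The Upgradeable=False / AdminAckRequired condition in the output above is the one relevant to this bug. A minimal Go sketch of picking it out of a clusterversion.json-shaped payload, assuming only the fields that the jq filter reads. The type and function names are hypothetical, and this is not code from origin:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition mirrors the fields the jq filter above reads from each entry.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// clusterVersionList is a hypothetical, trimmed-down view of clusterversion.json.
type clusterVersionList struct {
	Items []struct {
		Status struct {
			Conditions []condition `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// adminAckBlocked reports whether any ClusterVersion item carries
// Upgradeable=False with reason AdminAckRequired.
func adminAckBlocked(raw []byte) (bool, error) {
	var list clusterVersionList
	if err := json.Unmarshal(raw, &list); err != nil {
		return false, err
	}
	for _, item := range list.Items {
		for _, c := range item.Status.Conditions {
			if c.Type == "Upgradeable" && c.Status == "False" && c.Reason == "AdminAckRequired" {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	// Hand-written sample input, not the real CI artifact.
	raw := []byte(`{"items":[{"status":{"conditions":[
		{"type":"Failing","status":"False"},
		{"type":"Upgradeable","status":"False","reason":"AdminAckRequired","message":"sample"}
	]}}]}`)
	blocked, err := adminAckBlocked(raw)
	fmt.Println(blocked, err)
}
```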

Comment 1 W. Trevor King 2021-12-01 04:52:58 UTC
I closed this CURRENTRELEASE, but then realized that we should see improvements in the 4.7 -> ... -> 4.10 job [1] now that the master/4.10 PR has landed with this bug.  I'm agnostic about whether we stay in CURRENTRELEASE or move back to MODIFIED/ON_QA as we wait for that to get some new runs.  I've also opened the 4.9 backport bug 2027929, and I'm agnostic about how long we cook in 4.10 before moving ahead and landing that backport.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci

Comment 3 Jack Ottofaro 2021-12-03 19:09:26 UTC
*** Bug 2028761 has been marked as a duplicate of this bug. ***

Comment 5 shweta 2021-12-06 07:33:21 UTC
This e2e test is failing on a direct install of 4.8.23, as well as on the upgrade job from 4.8.22 to 4.8.23, for ppc64le:
" [sig-cluster-lifecycle] TestAdminAck should succeed [Suite:openshift/conformance/parallel]"

Comment 6 Surender Yadav 2021-12-06 14:14:46 UTC
S390x upgrade jobs from 4.8 to 4.9 are continuously failing on the test "disruption_tests: [bz-Cluster Version Operator] Verify presence of admin ack gate blocks upgrade until acknowledged"

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-upgrade-from-nightly-4.8-ocp-remote-libvirt-s390x/1467690345897660416

Comment 8 W. Trevor King 2021-12-06 19:46:44 UTC
Checking a 4.7 -> ... -> 4.10 run, now that the third PR is in [1,2].  It has Jack's map-nil fix:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1467585900341891072/artifacts/release/artifacts/release-images-latest | jq -r '.spec.tags[] | select(.name == "tests").annotations["io.openshift.build.commit.id"]'
  bc643fa990bef62359eeaf8c54e1aa475f642193
  $ git --no-pager log --oneline -1 bc643fa990bef62359eeaf8c54e1aa475f642193
  bc643fa990 Merge pull request #26668 from jottofar/check-for-nil-map

But there's still a failure due to a DNS/networking hiccup:

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1467585900341891072/build-log.txt | grep FAIL:
  Dec  6 00:00:10.440: FAIL: Error accessing configmap openshift-config-managed/admin-gates: Get "https://api.ci-op-g2m38jp7-eafe9.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/openshift-config-managed/configmaps/admin-gates": dial tcp: lookup api.ci-op-g2m38jp7-eafe9.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host

We may want to relax the test case a bit so that it retries when that sort of thing happens, instead of calling framework.Fail.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1467585900341891072

Comment 9 W. Trevor King 2021-12-06 19:50:00 UTC
The failures mentioned in comment 6 are from adminack.go line 193, and that's the nil-map thing Jack's fixed in master in origin#26668:

$ git cat-file -p origin/release-4.9:test/extended/util/adminack.go | grep -n . | grep -5 ^193:
188:func setAdminGate(ctx context.Context, gateName string, gateValue string, oc *CLI) string {
189:    ackCm, errMsg := getAdminAcksConfigMap(ctx, oc)
190:    if len(errMsg) != 0 {
191:            framework.Failf(errMsg)
192:    }
193:    ackCm.Data[gateName] = gateValue
194:    _, err := oc.AdminKubeClient().CoreV1().ConfigMaps("openshift-config").Update(ctx, ackCm, metav1.UpdateOptions{})
195:    if err != nil {
196:            return fmt.Sprintf("Unable to update configmap openshift-config/admin-acks, err=%v.", err)
197:    }
198:    return ""
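Line 193 above writes into ackCm.Data without checking it, and in Go assigning into a nil map panics. A minimal sketch of the kind of guard that avoids this, assuming a ConfigMap whose Data comes back nil. The configMap type and setGate function are hypothetical stand-ins, not the actual change in origin#26668:

```go
package main

import "fmt"

// configMap is a hypothetical stand-in for corev1.ConfigMap, keeping only
// the Data field that matters here.
type configMap struct {
	Data map[string]string
}

// setGate records gateValue under gateName, allocating Data first if the
// ConfigMap came back with no data (a nil map).
func setGate(cm *configMap, gateName, gateValue string) {
	if cm.Data == nil {
		cm.Data = make(map[string]string)
	}
	cm.Data[gateName] = gateValue
}

func main() {
	cm := &configMap{} // Data is nil, as for a ConfigMap with no data section
	setGate(cm, "example-gate", "true")
	fmt.Println(cm.Data)
}
```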

Comment 10 W. Trevor King 2021-12-07 23:36:32 UTC
I'm punting the "no such host" hiccup from comment 8 to follow-up work, and marking this CLOSED CURRENTRELEASE based on the other changes made under this bug, which fixed issues we had been seeing in CI.

