Bug 2026806

Summary: release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci fails due to missing admin ack
Product: OpenShift Container Platform
Reporter: Aravindh Puthiyaparambil <aravindh>
Component: Test Framework
Assignee: W. Trevor King <wking>
Status: CLOSED CURRENTRELEASE
Severity: low
Priority: medium
Version: 4.8
CC: jack.ottofaro, sbiragda, sippy, suryadav, wking
Target Milestone: ---
Keywords: Reopened
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clones: 2027929
Environment: job=release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci=all
Last Closed: 2021-12-07 23:36:32 UTC
Type: Bug
Bug Blocks: 2027929

Description Aravindh Puthiyaparambil 2021-11-25 22:55:39 UTC
release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci

is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci

Job URI: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1463537431096594432

Log:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci/1463537431096594432/artifacts/e2e-aws-upgrade/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")'
2021-11-24T16:06:34Z RetrievedUpdates=False NoChannel: The update channel has not been configured.
2021-11-24T16:30:08Z Available=True -: Done applying 4.8.0-0.nightly-2021-11-24-020113
2021-11-24T19:32:46Z Failing=False -: -
2021-11-24T19:03:45Z Progressing=True ClusterOperatorUpdating: Working towards 4.9.0-0.ci-2021-11-24-092816: 205 of 738 done (27% complete), waiting on openshift-apiserver
2021-11-24T19:04:01Z Upgradeable=False AdminAckRequired: Kubernetes 1.22 and therefore OpenShift 4.9 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6329921 for details and instructions.

Comment 1 W. Trevor King 2021-12-01 04:52:58 UTC
I closed this CURRENTRELEASE, but then realized that we should see improvements in the 4.7 -> ... -> 4.10 job [1] now that the master/4.10 PR has landed with this bug.  I'm agnostic about whether we stay in CURRENTRELEASE or move back to MODIFIED/ON_QA as we wait for that to get some new runs.  I've also opened the 4.9 backport bug 2027929, and I'm agnostic about how long we cook in 4.10 before moving ahead and landing that backport.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci

Comment 3 Jack Ottofaro 2021-12-03 19:09:26 UTC
*** Bug 2028761 has been marked as a duplicate of this bug. ***

Comment 5 shweta 2021-12-06 07:33:21 UTC
This e2e test is failing on a direct install of 4.8.23, as well as on the upgrade job from 4.8.22 to 4.8.23, for ppc64le:
"[sig-cluster-lifecycle] TestAdminAck should succeed [Suite:openshift/conformance/parallel]"

Comment 6 Surender Yadav 2021-12-06 14:14:46 UTC
s390x upgrade jobs for 4.9 from 4.8 are continuously failing with the test "disruption_tests: [bz-Cluster Version Operator] Verify presence of admin ack gate blocks upgrade until acknowledged":

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-upgrade-from-nightly-4.8-ocp-remote-libvirt-s390x/1467690345897660416

Comment 8 W. Trevor King 2021-12-06 19:46:44 UTC
Checking a 4.7 -> ... -> 4.10 run, now that the third PR is in [1,2].  It has Jack's map-nil fix:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1467585900341891072/artifacts/release/artifacts/release-images-latest | jq -r '.spec.tags[] | select(.name == "tests").annotations["io.openshift.build.commit.id"]'
  bc643fa990bef62359eeaf8c54e1aa475f642193
  $ git --no-pager log --oneline -1 bc643fa990bef62359eeaf8c54e1aa475f642193
  bc643fa990 Merge pull request #26668 from jottofar/check-for-nil-map

But there's still a failure due to a DNS/networking hiccup:

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1467585900341891072/build-log.txt | grep FAIL:
  Dec  6 00:00:10.440: FAIL: Error accessing configmap openshift-config-managed/admin-gates: Get "https://api.ci-op-g2m38jp7-eafe9.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/openshift-config-managed/configmaps/admin-gates": dial tcp: lookup api.ci-op-g2m38jp7-eafe9.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host

We may want to relax the test-case a bit so that it comes back and tries again when that sort of thing happens, instead of calling framework.Fail.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.7-to-4.8-to-4.9-to-4.10-ci/1467585900341891072

Comment 9 W. Trevor King 2021-12-06 19:50:00 UTC
The failures mentioned in comment 6 are from adminack.go line 193, and that's the nil-map thing Jack's fixed in master in origin#26668:

$ git cat-file -p origin/release-4.9:test/extended/util/adminack.go | grep -n . | grep -5 ^193:
188:func setAdminGate(ctx context.Context, gateName string, gateValue string, oc *CLI) string {
189:    ackCm, errMsg := getAdminAcksConfigMap(ctx, oc)
190:    if len(errMsg) != 0 {
191:            framework.Failf(errMsg)
192:    }
193:    ackCm.Data[gateName] = gateValue
194:    _, err := oc.AdminKubeClient().CoreV1().ConfigMaps("openshift-config").Update(ctx, ackCm, metav1.UpdateOptions{})
195:    if err != nil {
196:            return fmt.Sprintf("Unable to update configmap openshift-config/admin-acks, err=%v.", err)
197:    }
198:    return ""
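
For context on why line 193 can fail: in Go, assigning into a nil map panics, and a ConfigMap with no data entries has a nil `Data` map. The sketch below shows one way to guard against that. It uses a hypothetical `configMap` type as a stand-in for `corev1.ConfigMap`, plus hypothetical `setGate` and `example-gate` names; it is not the code from origin#26668.

```go
package main

import "fmt"

// configMap is a hypothetical stand-in for corev1.ConfigMap, trimmed to the
// single field that matters for this example.
type configMap struct {
	Data map[string]string
}

// setGate stores gateName=gateValue in cm.Data, allocating the map first
// when it is nil so that the assignment cannot panic.
func setGate(cm *configMap, gateName, gateValue string) {
	if cm.Data == nil {
		cm.Data = make(map[string]string)
	}
	cm.Data[gateName] = gateValue
}

func main() {
	cm := &configMap{} // Data is nil, as for a ConfigMap with no entries
	setGate(cm, "example-gate", "true")
	fmt.Println(cm.Data) // prints "map[example-gate:true]"
}
```

Without the nil check, the assignment in `setGate` would panic with "assignment to entry in nil map" whenever the ConfigMap starts out empty.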

Comment 10 W. Trevor King 2021-12-07 23:36:32 UTC
I'm punting the "no such host" hiccup from comment 8 to follow-up work, and marking this CLOSED CURRENTRELEASE based on the other changes made for this bug, which fixed issues we had been seeing in CI.