Bug 1781290
| Summary: | During a 4.2 to 4.3 upgrade skew test, build e2e tests continuously fail | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Build | Assignee: | Adam Kaplan <adam.kaplan> |
| Status: | CLOSED ERRATA | QA Contact: | wewang <wewang> |
| Severity: | urgent | Priority: | unspecified |
| Version: | 4.3.0 | Target Release: | 4.3.0 |
| Keywords: | Reopened | CC: | aos-bugs, wzheng |
| Hardware: | Unspecified | OS: | Unspecified |
| Type: | Bug | Last Closed: | 2020-01-23 11:18:20 UTC |
| Bug Blocks: | 1781302 | | |
Description
Clayton Coleman
2019-12-09 17:24:55 UTC
Dec 9 12:47:39.809: INFO: Waiting up to 2 minutes for the internal registry hostname to be published
Dec 9 12:47:42.186: INFO: did not find the sequence in the OCM pod logs around the build controller getting started after the internal registry hostname has been set in the OCM config

> There are three cases, this bug must be fixed in the appropriate place:
>
> 1. We don't correctly run these e2e tests when the control plane is 4.3 - fix in the 4.2 tests to tolerate a 4.3 control plane so that we can see it pass
> 2. We regressed a product API that a 4.2 client fails against a 4.3 server - must fix for ship because we don't regress APIs
> 3. We don't correctly work with the registry when the control plane is 4.3 and the nodes are 4.2 - must fix because you would break if someone ran during an upgrade.

I suspect #1: on 4.3 we found that the test started flaking because it depended on a specific controller start sequence. We are in fact getting the right internal registry hostname synced:
```
2019-12-09T12:34:29.970319148Z I1209 12:34:29.970252 1 build_controller.go:474] Starting build controller
2019-12-09T12:34:29.970319148Z I1209 12:34:29.970305 1 build_controller.go:476] OpenShift image registry hostname: image-registry.openshift-image-registry.svc:5000
2019-12-09T12:34:30.033985328Z I1209 12:34:30.033931 1 deleted_token_secrets.go:72] caches synced
2019-12-09T12:34:30.034172215Z I1209 12:34:30.034145 1 docker_registry_service.go:154] caches synced
2019-12-09T12:34:30.034209042Z I1209 12:34:30.034188 1 create_dockercfg_secrets.go:220] urls found
2019-12-09T12:34:30.034209042Z I1209 12:34:30.034202 1 create_dockercfg_secrets.go:226] caches synced
2019-12-09T12:34:30.034705813Z I1209 12:34:30.034664 1 docker_registry_service.go:284] Updating registry URLs from map[172.30.107.118:5000:{} image-registry.openshift-image-registry.svc.cluster.local:5000:{} image-registry.openshift-image-registry.svc:5000:{}] to map[172.30.107.118:5000:{} image-registry.openshift-image-registry.svc.cluster.local:5000:{} image-registry.openshift-image-registry.svc:5000:{}]
```
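
For illustration, a log check that does not depend on the exact position of surrounding lines could look roughly like the sketch below. It is not the test code in openshift/origin. The function name `registry_hostname_at_start` and both regular expressions are hypothetical, derived only from the two `build_controller.go` lines in the excerpt above. It also assumes that an unset hostname would be logged as an empty value after the colon.

```python
import re

# Patterns modelled on the two build_controller.go lines in the excerpt above.
# The line numbers (474, 476) can change between releases, so they are not pinned.
START_RE = re.compile(r"build_controller\.go:\d+\] Starting build controller")
HOSTNAME_RE = re.compile(
    r"build_controller\.go:\d+\] OpenShift image registry hostname: (\S+)"
)


def registry_hostname_at_start(log_text):
    """Return the registry hostname logged after the most recent
    build controller start, or None if no such hostname was logged."""
    hostname = None
    started = False
    for line in log_text.splitlines():
        if START_RE.search(line):
            # A (re)start invalidates any hostname seen before it.
            started = True
            hostname = None
            continue
        if started:
            match = HOSTNAME_RE.search(line)
            # Assumption: an unset hostname is logged as an empty value,
            # which does not match (\S+) and is treated as "not published".
            if match:
                hostname = match.group(1)
    return hostname


if __name__ == "__main__":
    sample = (
        "I1209 12:34:29.970252 1 build_controller.go:474] Starting build controller\n"
        "I1209 12:34:29.970305 1 build_controller.go:476] "
        "OpenShift image registry hostname: image-registry.openshift-image-registry.svc:5000\n"
        "I1209 12:34:30.033931 1 deleted_token_secrets.go:72] caches synced\n"
    )
    print(registry_hostname_at_start(sample))
```

The check accepts unrelated log lines before, between, or after the two controller lines, which is the kind of tolerance a test needs if other controllers log in a different order on a 4.3 control plane.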
Moving to MODIFIED - we changed the logic that detects that the image registry was published in https://github.com/openshift/origin/pull/24048

"[Feature:Builds][pruning] prune builds based on settings in the buildconfig [Conformance] buildconfigs should have a default history limit set when created via the group api [Suite:openshift/conformance/parallel/minimal]" no longer fails in https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-old-rhcos-e2e-aws-4.3/161/162/163 and it also passed in a local e2e run.

STEP: waiting for openshift namespace imagestreams
Dec 11 11:31:19.484: INFO: Waiting up to 2 minutes for the internal registry hostname to be published
Dec 11 11:31:22.553: INFO: the OCM pod logs indicate the build controller was started after the internal registry hostname has been set in the OCM config
Dec 11 11:31:23.202: INFO: OCM rollout progressing status reports complete
Dec 11 11:31:23.202: INFO: Scanning openshift ImageStreams

If you need me to check anything further, please feel free to contact me.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062