Description of problem:
We are seeing a lot of CI jobs failing with 'that name is already in use'.
Sample log - https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console/5032/pull-ci-openshift-console-master-e2e-gcp-console/16623/build-log.txt
Search query - https://search.svc.ci.openshift.org/?search=container+.*installer.*+that+name+is+already+in+use.*&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520
Version-Release number of selected component (if applicable):
How reproducible:
Typically seen in upgrade CI jobs.
https://github.com/cri-o/cri-o/pull/3586 https://github.com/cri-o/cri-o/pull/3588 Two fixes for error defers, and a missing defer for the storage provider.
*** Bug 1824353 has been marked as a duplicate of this bug. ***
Is this fix already in 4.5?
Ryan's and my PRs were merged to master yesterday. Today the backport was temporarily blocked by failing tests (unrelated to the patches).
The symptom that led us to this is still appearing as of 54 minutes ago on 4.5:
https://search.svc.ci.openshift.org/?search=Timed+out+waiting+for+internal+registry+hostname+to+be+published&maxAge=48h&context=1&type=bug%2Bjunit&name=4.5&maxMatches=5&maxBytes=20971520
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/798
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/798/artifacts/e2e-gcp/pods/openshift-controller-manager-operator_openshift-controller-manager-operator-6f7788dfd9-bjdd4_operator.log
I0416 19:52:58.305495 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-controller-manager-operator", Name:"openshift-controller-manager-operator", UID:"74b3c310-b4a6-4f01-ae77-9604639510d4", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-controller-manager changed: Progressing message changed from "" to "Progressing: daemonset/controller-manager: updated number scheduled is 2, desired number scheduled is 3"
'that name is already in use' didn't show up in openshift-controller-manager-operator_openshift-controller-manager-operator-6f7788dfd9-bjdd4_operator.log. The log line quoted above is informational. Was there an error or failure? Thanks
Yes, the failure is that the openshift controller manager doesn't consider itself fully rolled out (because it's got unscheduled pods in its daemonset) which cascades down to failures elsewhere for things that are waiting to see the OCM fully rolled out.
We don't integrate master directly into 4.5. The fixes won't make it into 4.5 until we integrate cri-o 1.18 into 4.5, or we get the fixes into 1.17. The latter should happen today; the former should happen next week.
This is absolutely critical to be resolved for 4.4 to ship, what's the latest update?
With https://github.com/cri-o/cri-o/pull/3613/ merged, I expect a new cri-o build to be made soon.
Fixes mentioned in comment #1 are in the latest 1.17 build.
*** Bug 1825946 has been marked as a duplicate of this bug. ***
*** Bug 1825949 has been marked as a duplicate of this bug. ***
You should check newer CI runs for such logs (2020-04-21 or later).
I don't see any container `.*installer.* that name is already in use.*` in the last 24 hours. Moving to modified and ready for QE.
*** Bug 1788741 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581