Bug 1823949

Summary: ErrDuplicateName is being returned from c/storage in CI jobs
Product: OpenShift Container Platform
Reporter: Mrunal Patel <mpatel>
Component: Node
Assignee: Ted Yu <zyu>
Status: CLOSED ERRATA
QA Contact: Sunil Choudhary <schoudha>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.4
CC: adam.kaplan, aos-bugs, bparees, dcbw, dwalsh, gmontero, jokerman, mifiedle, pehunt, rphillips, umohnani, wking, wsun, zyu
Target Milestone: ---
Target Release: 4.4.0
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-04 11:49:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mrunal Patel 2020-04-14 20:32:40 UTC
Description of problem:
We are seeing a lot of CI jobs failing with 'that name is already in use'.
Sample log - https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console/5032/pull-ci-openshift-console-master-e2e-gcp-console/16623/build-log.txt

Search query -
https://search.svc.ci.openshift.org/?search=container+.*installer.*+that+name+is+already+in+use.*&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520




Version-Release number of selected component (if applicable):


How reproducible:
Typically seen in upgrade CI jobs.

Comment 1 Ryan Phillips 2020-04-15 19:00:27 UTC
https://github.com/cri-o/cri-o/pull/3586
https://github.com/cri-o/cri-o/pull/3588

Two fixes for error defers, and a missing defer for the storage provider.
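
For readers unfamiliar with the pattern, here is a minimal Go sketch of how a missing error defer can produce a duplicate-name failure on retry. It is not the code from the PRs above: nameStore, createLeaky, createSafe and the local ErrDuplicateName are hypothetical stand-ins invented for illustration, and the actual cri-o and c/storage changes may look quite different.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDuplicateName is a local stand-in for the error named in the bug
// summary; it is not the c/storage symbol itself.
var ErrDuplicateName = errors.New("that name is already in use")

// nameStore is a hypothetical in-memory registry of reserved names.
type nameStore struct{ names map[string]bool }

func (s *nameStore) reserve(name string) error {
	if s.names[name] {
		return ErrDuplicateName
	}
	s.names[name] = true
	return nil
}

func (s *nameStore) release(name string) { delete(s.names, name) }

// createLeaky reserves the name but has no deferred cleanup, so a failure
// in setup leaves the name reserved.
func createLeaky(s *nameStore, name string, setup func() error) error {
	if err := s.reserve(name); err != nil {
		return err
	}
	return setup()
}

// createSafe uses a named return value so the deferred closure sees the
// final error and releases the name when any later step fails.
func createSafe(s *nameStore, name string, setup func() error) (err error) {
	if err = s.reserve(name); err != nil {
		return err
	}
	defer func() {
		if err != nil {
			s.release(name)
		}
	}()
	return setup()
}

func main() {
	fail := func() error { return errors.New("setup failed") }
	ok := func() error { return nil }

	leaky := &nameStore{names: map[string]bool{}}
	_ = createLeaky(leaky, "installer", fail)
	fmt.Println("leaky retry:", createLeaky(leaky, "installer", ok))

	safe := &nameStore{names: map[string]bool{}}
	_ = createSafe(safe, "installer", fail)
	fmt.Println("safe retry:", createSafe(safe, "installer", ok))
}
```

In this sketch the leaky retry fails with the duplicate-name error, while the safe retry succeeds because the deferred cleanup released the name after the first failure.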

Comment 2 Ryan Phillips 2020-04-16 19:28:30 UTC
*** Bug 1824353 has been marked as a duplicate of this bug. ***

Comment 3 Ben Parees 2020-04-16 19:36:22 UTC
Is this fix already in 4.5?

Comment 4 Ted Yu 2020-04-16 19:44:03 UTC
Ryan's and my PRs were merged to master yesterday.

Today the backport was temporarily blocked by failing tests (unrelated to the patches).

Comment 5 Ben Parees 2020-04-16 21:03:43 UTC
The symptom that led us to this is still appearing as of 54 minutes ago on 4.5:

https://search.svc.ci.openshift.org/?search=Timed+out+waiting+for+internal+registry+hostname+to+be+published&maxAge=48h&context=1&type=bug%2Bjunit&name=4.5&maxMatches=5&maxBytes=20971520

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/798

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/798/artifacts/e2e-gcp/pods/openshift-controller-manager-operator_openshift-controller-manager-operator-6f7788dfd9-bjdd4_operator.log

I0416 19:52:58.305495       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-controller-manager-operator", Name:"openshift-controller-manager-operator", UID:"74b3c310-b4a6-4f01-ae77-9604639510d4", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-controller-manager changed: Progressing message changed from "" to "Progressing: daemonset/controller-manager: updated number scheduled is 2, desired number scheduled is 3"

Comment 6 Ted Yu 2020-04-16 21:11:42 UTC
'that name is already in use' didn't show up in openshift-controller-manager-operator_openshift-controller-manager-operator-6f7788dfd9-bjdd4_operator.log

The log quoted above was an informational log.
Was there an error or failure?

Thanks

Comment 7 Ben Parees 2020-04-16 21:19:04 UTC
Yes, the failure is that the openshift controller manager doesn't consider itself fully rolled out (because it's got unscheduled pods in its daemonset) which cascades down to failures elsewhere for things that are waiting to see the OCM fully rolled out.

Comment 8 Peter Hunt 2020-04-17 13:57:54 UTC
We don't integrate master directly into 4.5. The fixes won't make it into 4.5 until we integrate cri-o 1.18 into 4.5, or we get the fixes into 1.17. The latter should happen today; the former should happen next week.

Comment 9 Ben Parees 2020-04-20 13:39:08 UTC
Resolving this is absolutely critical for 4.4 to ship. What's the latest update?

Comment 10 Ted Yu 2020-04-20 13:58:47 UTC
With https://github.com/cri-o/cri-o/pull/3613/ merged, I expect a new cri-o build to be made soon.

Comment 11 Ted Yu 2020-04-20 18:51:04 UTC
The fixes mentioned in comment #1 are in the latest 1.17 build.

Comment 12 Ryan Phillips 2020-04-20 22:08:33 UTC
*** Bug 1825946 has been marked as a duplicate of this bug. ***

Comment 13 Ryan Phillips 2020-04-20 22:09:50 UTC
*** Bug 1825949 has been marked as a duplicate of this bug. ***

Comment 15 Ted Yu 2020-04-22 03:26:22 UTC
You should check newer CI runs (2020-04-21 or later) for such logs.

Comment 16 Ryan Phillips 2020-04-22 15:20:50 UTC
I don't see any container `.*installer.* that name is already in use.*` in the last 24 hours. Moving to modified and ready for QE.
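
For anyone who wants to run the same check against a locally downloaded log, a rough Go sketch follows. Only the pattern comes from this bug; countMatches is a hypothetical helper, and the sample lines are made up for illustration rather than taken from any CI run.

```go
package main

import (
	"fmt"
	"regexp"
)

// dupNamePattern is the search expression quoted in the comment above.
var dupNamePattern = regexp.MustCompile(`container .*installer.* that name is already in use.*`)

// countMatches returns how many of the given log lines match the pattern.
func countMatches(lines []string) int {
	n := 0
	for _, line := range lines {
		if dupNamePattern.MatchString(line) {
			n++
		}
	}
	return n
}

func main() {
	// Made-up lines for illustration only; not copied from a real log.
	lines := []string{
		"error creating container installer-1: that name is already in use",
		"container installer-1 started",
	}
	fmt.Println("matching lines:", countMatches(lines))
}
```

A count of zero over recent logs would correspond to the "no hits in the last 24 hours" observation, assuming the logs searched cover the same jobs.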

Comment 21 Gabe Montero 2020-04-23 15:30:43 UTC
*** Bug 1788741 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2020-05-04 11:49:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581