Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1823949

Summary:	ErrDuplicateName is being returned from c/storage in CI jobs
Product:	OpenShift Container Platform	Reporter:	Mrunal Patel <mpatel>
Component:	Node	Assignee:	Ted Yu <zyu>
Status:	CLOSED ERRATA	QA Contact:	Sunil Choudhary <schoudha>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	4.4	CC:	adam.kaplan, aos-bugs, bparees, dcbw, dwalsh, gmontero, jokerman, mifiedle, pehunt, rphillips, umohnani, wking, wsun, zyu
Target Milestone:	---
Target Release:	4.4.0
Hardware:	Unspecified
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-05-04 11:49:19 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Mrunal Patel 2020-04-14 20:32:40 UTC

Description of problem:
We are seeing a lot of CI jobs failing with 'that name is already in use'.
Sample log - https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console/5032/pull-ci-openshift-console-master-e2e-gcp-console/16623/build-log.txt

Search query -
https://search.svc.ci.openshift.org/?search=container+.*installer.*+that+name+is+already+in+use.*&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520




Version-Release number of selected component (if applicable):


How reproducible:
Typically seen in upgrade CI jobs.

Comment 1 Ryan Phillips 2020-04-15 19:00:27 UTC

https://github.com/cri-o/cri-o/pull/3586
https://github.com/cri-o/cri-o/pull/3588

Two fixes for error defers, and a missing defer for the storage provider.
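
The linked PRs are not reproduced here. As a rough illustration of the kind of pattern this comment refers to, the sketch below shows a deferred cleanup that runs only when the function returns an error, so a reserved name is released and a retry does not hit "that name is already in use". All names (nameStore, createContainer, errDuplicateName) are hypothetical stand-ins, not cri-o or c/storage APIs, and the assumption that a leaked name reservation is the trigger is an illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// errDuplicateName is a local stand-in carrying the message seen in this bug.
// It is not the c/storage error value.
var errDuplicateName = errors.New("that name is already in use")

// nameStore is a hypothetical in-memory registry of reserved container names.
type nameStore struct {
	names map[string]bool
}

func newNameStore() *nameStore { return &nameStore{names: map[string]bool{}} }

// reserve claims name, or fails if it is already claimed.
func (s *nameStore) reserve(name string) error {
	if s.names[name] {
		return errDuplicateName
	}
	s.names[name] = true
	return nil
}

// release frees a previously reserved name.
func (s *nameStore) release(name string) { delete(s.names, name) }

// createContainer reserves name and then runs setup. The named return value
// err lets the deferred function observe the final error and undo the
// reservation when any later step fails.
func createContainer(s *nameStore, name string, setup func() error) (err error) {
	if err = s.reserve(name); err != nil {
		return err
	}
	defer func() {
		if err != nil {
			s.release(name)
		}
	}()
	if err = setup(); err != nil {
		return fmt.Errorf("setting up %q: %w", name, err)
	}
	return nil
}

func main() {
	s := newNameStore()
	failing := func() error { return errors.New("setup failed") }
	ok := func() error { return nil }

	fmt.Println(createContainer(s, "installer", failing)) // first attempt fails in setup
	fmt.Println(createContainer(s, "installer", ok))      // retry succeeds: the name was released
	fmt.Println(createContainer(s, "installer", ok))      // the name is now genuinely taken
}
```

Without the deferred release (or with a defer that checks a shadowed error variable instead of the named return), the second call would fail with the duplicate-name error even though no container exists.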

Comment 2 Ryan Phillips 2020-04-16 19:28:30 UTC

*** Bug 1824353 has been marked as a duplicate of this bug. ***

Comment 3 Ben Parees 2020-04-16 19:36:22 UTC

Is this fix already in 4.5?

Comment 4 Ted Yu 2020-04-16 19:44:03 UTC

Ryan's and my PRs were merged to master yesterday.

Today the backport was temporarily blocked by failing tests (unrelated to the patches).

Comment 5 Ben Parees 2020-04-16 21:03:43 UTC

The symptom that led us to this is still appearing as of 54 minutes ago on 4.5:

https://search.svc.ci.openshift.org/?search=Timed+out+waiting+for+internal+registry+hostname+to+be+published&maxAge=48h&context=1&type=bug%2Bjunit&name=4.5&maxMatches=5&maxBytes=20971520

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/798

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/798/artifacts/e2e-gcp/pods/openshift-controller-manager-operator_openshift-controller-manager-operator-6f7788dfd9-bjdd4_operator.log

I0416 19:52:58.305495       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-controller-manager-operator", Name:"openshift-controller-manager-operator", UID:"74b3c310-b4a6-4f01-ae77-9604639510d4", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-controller-manager changed: Progressing message changed from "" to "Progressing: daemonset/controller-manager: updated number scheduled is 2, desired number scheduled is 3"

Comment 6 Ted Yu 2020-04-16 21:11:42 UTC

'that name is already in use' didn't show up in openshift-controller-manager-operator_openshift-controller-manager-operator-6f7788dfd9-bjdd4_operator.log

The log quoted above was an informational log.
Was there an error or failure?

Thanks

Comment 7 Ben Parees 2020-04-16 21:19:04 UTC

Yes, the failure is that the openshift controller manager doesn't consider itself fully rolled out (because it's got unscheduled pods in its daemonset) which cascades down to failures elsewhere for things that are waiting to see the OCM fully rolled out.
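
As a minimal sketch of the condition described here, the check below compares the two counters named in the quoted log message and reports the rollout as still progressing while fewer pods are updated than desired. The daemonSetStatus type and progressingMessage function are hypothetical and local to this example; the operator's actual logic is not shown in this bug.

```go
package main

import "fmt"

// daemonSetStatus is a local stand-in holding the two counters named in the
// log message from comment 5. It is not the Kubernetes API type.
type daemonSetStatus struct {
	UpdatedNumberScheduled int32
	DesiredNumberScheduled int32
}

// progressingMessage returns an empty string when the rollout is complete,
// otherwise a message in the style of the one quoted in comment 5.
func progressingMessage(name string, st daemonSetStatus) string {
	if st.UpdatedNumberScheduled >= st.DesiredNumberScheduled {
		return ""
	}
	return fmt.Sprintf(
		"Progressing: daemonset/%s: updated number scheduled is %d, desired number scheduled is %d",
		name, st.UpdatedNumberScheduled, st.DesiredNumberScheduled)
}

func main() {
	st := daemonSetStatus{UpdatedNumberScheduled: 2, DesiredNumberScheduled: 3}
	fmt.Println(progressingMessage("controller-manager", st))
}
```

Anything that waits for an empty message would keep waiting for as long as one pod cannot be scheduled, which is the cascade described above.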

Comment 8 Peter Hunt 2020-04-17 13:57:54 UTC

We don't integrate master directly into 4.5. The fixes won't make it into 4.5 until we integrate cri-o 1.18 into 4.5 or we get the fixes into 1.17. The latter should happen today; the former should happen next week.

Comment 9 Ben Parees 2020-04-20 13:39:08 UTC

Resolving this is absolutely critical for 4.4 to ship. What's the latest update?

Comment 10 Ted Yu 2020-04-20 13:58:47 UTC

With https://github.com/cri-o/cri-o/pull/3613/ merged, I expect a new cri-o build to be made soon.

Comment 11 Ted Yu 2020-04-20 18:51:04 UTC

The fixes mentioned in comment #1 are in the latest 1.17 build.

Comment 12 Ryan Phillips 2020-04-20 22:08:33 UTC

*** Bug 1825946 has been marked as a duplicate of this bug. ***

Comment 13 Ryan Phillips 2020-04-20 22:09:50 UTC

*** Bug 1825949 has been marked as a duplicate of this bug. ***

Comment 15 Ted Yu 2020-04-22 03:26:22 UTC

You should check newer CI runs (2020-04-21 or later) for such logs.

Comment 16 Ryan Phillips 2020-04-22 15:20:50 UTC

I don't see any matches for `container .*installer.* that name is already in use.*` in the last 24 hours. Moving to MODIFIED; ready for QE.
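
To run the same check against a locally downloaded build log instead of the CI search page, a small filter along these lines could be used. It is a sketch that assumes the log is piped on standard input; the pattern is the search query from the description with "+" decoded to spaces, and the matches helper is a name chosen for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// pattern is the search query from the description with "+" decoded to spaces.
var pattern = regexp.MustCompile(`container .*installer.* that name is already in use.*`)

// matches reports whether a single log line contains the failure signature.
func matches(line string) bool { return pattern.MatchString(line) }

func main() {
	count := 0
	sc := bufio.NewScanner(os.Stdin)
	// Raise the per-line limit, since CI log lines can be long.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		if matches(sc.Text()) {
			count++
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
		os.Exit(1)
	}
	fmt.Printf("%d matching lines\n", count)
}
```

A count of zero across logs from 2020-04-21 or later would be consistent with the result reported in this comment.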

Comment 21 Gabe Montero 2020-04-23 15:30:43 UTC

*** Bug 1788741 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2020-05-04 11:49:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581