Bug 1823949 - ErrDuplicateName is being returned from c/storage in CI jobs
Summary: ErrDuplicateName is being returned from c/storage in CI jobs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.4
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.4.0
Assignee: Ted Yu
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 1788741 1825949
Depends On:
Blocks:
Reported: 2020-04-14 20:32 UTC by Mrunal Patel
Modified: 2020-05-04 11:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 11:49:19 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:49:41 UTC

Description Mrunal Patel 2020-04-14 20:32:40 UTC
Description of problem:
We are seeing a lot of CI jobs failing with 'that name is already in use'.
Sample log - https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console/5032/pull-ci-openshift-console-master-e2e-gcp-console/16623/build-log.txt

Search query -
https://search.svc.ci.openshift.org/?search=container+.*installer.*+that+name+is+already+in+use.*&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520




Version-Release number of selected component (if applicable):


How reproducible:
Typically seen in upgrade CI jobs.

Comment 1 Ryan Phillips 2020-04-15 19:00:27 UTC
https://github.com/cri-o/cri-o/pull/3586
https://github.com/cri-o/cri-o/pull/3588

Two fixes for error defers, and a missing defer for the storage provider.

Comment 2 Ryan Phillips 2020-04-16 19:28:30 UTC
*** Bug 1824353 has been marked as a duplicate of this bug. ***

Comment 3 Ben Parees 2020-04-16 19:36:22 UTC
Is this fix already in 4.5?

Comment 4 Ted Yu 2020-04-16 19:44:03 UTC
Ryan's and my PRs were merged to master yesterday.

Today the backport was temporarily blocked by failing tests (unrelated to the patches).

Comment 5 Ben Parees 2020-04-16 21:03:43 UTC
The symptom that led us to this is still appearing as of 54 minutes ago on 4.5:

https://search.svc.ci.openshift.org/?search=Timed+out+waiting+for+internal+registry+hostname+to+be+published&maxAge=48h&context=1&type=bug%2Bjunit&name=4.5&maxMatches=5&maxBytes=20971520

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/798

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/798/artifacts/e2e-gcp/pods/openshift-controller-manager-operator_openshift-controller-manager-operator-6f7788dfd9-bjdd4_operator.log

I0416 19:52:58.305495       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-controller-manager-operator", Name:"openshift-controller-manager-operator", UID:"74b3c310-b4a6-4f01-ae77-9604639510d4", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/openshift-controller-manager changed: Progressing message changed from "" to "Progressing: daemonset/controller-manager: updated number scheduled is 2, desired number scheduled is 3"

Comment 6 Ted Yu 2020-04-16 21:11:42 UTC
'that name is already in use' didn't show up in openshift-controller-manager-operator_openshift-controller-manager-operator-6f7788dfd9-bjdd4_operator.log

The log quoted above was an informational log.
Was there an error or failure?

Thanks

Comment 7 Ben Parees 2020-04-16 21:19:04 UTC
Yes, the failure is that the openshift controller manager doesn't consider itself fully rolled out (because it's got unscheduled pods in its daemonset) which cascades down to failures elsewhere for things that are waiting to see the OCM fully rolled out.

Comment 8 Peter Hunt 2020-04-17 13:57:54 UTC
We don't integrate master directly into 4.5. The fixes won't make it into 4.5 until we integrate cri-o 1.18 into 4.5, or we get the fixes into 1.17. The latter should happen today; the former should happen next week.

Comment 9 Ben Parees 2020-04-20 13:39:08 UTC
Resolving this is absolutely critical for 4.4 to ship. What's the latest update?

Comment 10 Ted Yu 2020-04-20 13:58:47 UTC
With https://github.com/cri-o/cri-o/pull/3613/ merged, I expect a new cri-o build to be made soon.

Comment 11 Ted Yu 2020-04-20 18:51:04 UTC
The fixes mentioned in comment #1 are in the latest 1.17 build.

Comment 12 Ryan Phillips 2020-04-20 22:08:33 UTC
*** Bug 1825946 has been marked as a duplicate of this bug. ***

Comment 13 Ryan Phillips 2020-04-20 22:09:50 UTC
*** Bug 1825949 has been marked as a duplicate of this bug. ***

Comment 15 Ted Yu 2020-04-22 03:26:22 UTC
You should check newer CI runs (2020-04-21 or later) for such logs.

Comment 16 Ryan Phillips 2020-04-22 15:20:50 UTC
I don't see any matches for `container .*installer.* that name is already in use.*` in the last 24 hours. Moving to MODIFIED and ready for QE.
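
The same check can be approximated locally against a downloaded build log. The Go sketch below is only an illustration and is not how the CI search service works: it reuses the search expression from the query in this bug, while `countMatches` and reading the log from standard input are assumptions made for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// pattern is the search expression from the CI query quoted in this bug.
var pattern = regexp.MustCompile(`container .*installer.* that name is already in use.*`)

// countMatches returns how many lines contain a match for pattern.
func countMatches(lines []string) int {
	n := 0
	for _, line := range lines {
		if pattern.MatchString(line) {
			n++
		}
	}
	return n
}

func main() {
	// Read log lines from standard input, allowing lines up to 1 MiB.
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)

	var lines []string
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	fmt.Printf("%d matching lines\n", countMatches(lines))
}
```

Piping a saved build-log.txt into the program prints the number of matching lines; zero matches on recent runs would be consistent with the observation above.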

Comment 21 Gabe Montero 2020-04-23 15:30:43 UTC
*** Bug 1788741 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2020-05-04 11:49:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

