Bug 1771747
| Summary: | Operator installed via OperatorHub/marketplace crashing on pod creation with network error | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Rogerio Bastos <rbastos> |
| Component: | OLM | Assignee: | Evan Cordell <ecordell> |
| OLM sub component: | OperatorHub | QA Contact: | Fan Jia <jfan> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | cblecker, dageoffr, haowang, jeder, nhale, nmalik, rbastos |
| Version: | 4.2.0 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.2.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-11-13 15:01:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Created attachment 1635509
oc describe pod

On this cluster, an error was noticed with the CatalogSourceConfig:

    $ oc get catalogsourceconfig -n openshift-marketplace
    NAME                                 STATUS        MESSAGE                                 AGE
    installed-certified-brandon-images   Configuring   namespaces "brandon-images" not found   50d

A test was done to confirm that the bad CatalogSourceConfig was causing the issue:

1) Problem in the CatalogSourceConfig object:

    $ oc get catalogsourceconfig installed-certified-rhopp1
    NAME                         STATUS        MESSAGE                         AGE
    installed-certified-rhopp1   Configuring   namespaces "rhopp1" not found   33d

2) Missing NS created:

    $ oc new-project rhopp1

3) Issue fixed:

    $ oc get catalogsourceconfig installed-certified-rhopp1
    NAME                         STATUS      MESSAGE                                       AGE
    installed-certified-rhopp1   Succeeded   The object has been successfully reconciled   33d

    $ oc get pods
    NAME                                               READY   STATUS    RESTARTS   AGE
    installed-certified-rhopp1-7bdfd77876-st8bg        1/1     Running   0          64m
    marketplace-operator-d6774f7b6-4pcv4               1/1     Running   0          2d16h
    osd-curated-certified-operators-66488dd446-vn2lh   1/1     Running   0          2d16h
    osd-curated-community-operators-6bdcf8d74c-mr4kc   1/1     Running   0          2d16h
    osd-curated-redhat-operators-54944c5bc4-59t45      1/1     Running   0          18h

*** This bug has been marked as a duplicate of bug 1767547 ***

For easy reference, the cloned 4.2.z version of the duplicate bug above is https://bugzilla.redhat.com/show_bug.cgi?id=1769841 and is awaiting cherry pick for the next 4.2.z release.
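
The CatalogSourceConfig check described above could be repeated across a cluster with a small filter. This is only a sketch: the function name `missing_namespaces` is hypothetical, and it assumes the MESSAGE column keeps the `namespaces "<name>" not found` wording quoted in this report.

```shell
# Hypothetical helper: reads `oc get catalogsourceconfig` table output on stdin
# and prints each namespace reported as "not found", one per line, deduplicated.
# Assumes the MESSAGE text has the form: namespaces "<name>" not found
missing_namespaces() {
  sed -n 's/.*namespaces "\([^"]*\)" not found.*/\1/p' | sort -u
}

# Assumed usage against a live cluster:
#   oc get catalogsourceconfig -n openshift-marketplace | missing_namespaces
```

Each name printed is a candidate for the `oc new-project <name>` workaround shown in step 2; whether creating the namespace is appropriate depends on the cluster.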

Created attachment 1635508
operator deployment

Description of problem:
An operator deployed by a customer from the certified-operators repo is crashing with a network error when a pod is created from the deployment object. Multiple bogus ReplicaSets are being created in the cluster, to the point where they start impacting the responsiveness of the cluster API.

Version-Release number of selected component (if applicable):
OpenShift 4.2.2

How reproducible:
The same behavior was identified in 2 customer clusters, always in deployments created in the openshift-marketplace NS when customers deploy operators from the marketplace/OperatorHub. It was observed with different operators.

Steps to Reproduce:
1. Customer deploys an operator into the openshift-marketplace NS
2. Wait until the deployment is created and its pods start crashing
3. After a couple of days the cluster will have thousands of failed ReplicaSets, impacting cluster API responsiveness

Actual results:
Example pods from the openshift-marketplace NS:

    NAME                                                  READY   STATUS        RESTARTS   AGE
    installed-certified-brandon-images-58bf78b4c6-c59n6   1/1     Running       0          131m
    installed-certified-brandon-images-5d557857fb-xflrp   0/1     Terminating   0          3s
    installed-certified-brandon-images-65cc8f96f8-n9pjr   0/1     Terminating   0          5s
    installed-certified-brandon-images-6c6bfc9f66-xsdzs   0/1     Terminating   0          1s
    installed-certified-brandon-images-6f7d85fb78-ftmqn   0/1     Terminating   0          6s
    installed-certified-brandon-images-7b4ff448cb-9rbzb   0/1     Terminating   0          8s

Constant errors are seen when running `oc get events`:

    46m   Normal    Scheduled                pod/installed-certified-brandon-images-fff78bc57-fjctc   Successfully assigned openshift-marketplace/installed-certified-brandon-images-fff78bc57-fjctc to ip-10-0-130-71.ca-central-1.compute.internal
    45m   Warning   FailedCreatePodSandBox   pod/installed-certified-brandon-images-fff78bc57-fjctc   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installed-certified-brandon-images-fff78bc57-fjctc_openshift-marketplace_77985f6e-058d-11ea-8b30-02637b2b3cea_0(e81726545e10a158414a8438639d33224de03421eca39607e80b252ff305d08e): Multus: Err adding pod to network "openshift-sdn": cannot set "openshift-sdn" ifname to "eth0": no netns: failed to Statfs "/proc/98478/ns/net": no such file or directory
    46m   Normal    SuccessfulCreate         replicaset/installed-certified-brandon-images-fff78bc57   Created pod: installed-certified-brandon-images-fff78bc57-fjctc
    46m   Normal    SuccessfulDelete         replicaset/installed-certified-brandon-images-fff78bc57   Deleted pod: installed-certified-brandon-images-fff78bc57-fjctc
    109m  Normal    Scheduled                pod/installed-certified-brandon-images-fffb5c476-l5rfb   Successfully assigned openshift-marketplace/installed-certified-brandon-images-fffb5c476-l5rfb to ip-10-0-138-18.ca-central-1.compute.internal
    109m  Warning   FailedCreatePodSandBox   pod/installed-certified-brandon-images-fffb5c476-l5rfb   Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installed-certified-brandon-images-fffb5c476-l5rfb_openshift-marketplace_a17da02a-0584-11ea-8b30-02637b2b3cea_0(7f8779b10a58ed39ef373eb43f99a7a4807b6df31cb76d70ae9d7f72b7053ba4): Multus: Err adding pod to network "openshift-sdn": cannot set "openshift-sdn" ifname to "eth0": no netns: failed to Statfs "/proc/5807/ns/net": no such file or directory

Operator Deployment: (attached to this BZ)

Count of ReplicaSets:

    $ oc get replicaset | grep brandon | wc -l
    6532

In another cluster, more than 27k bogus ReplicaSets were identified.

Additional info:
An example of `oc describe pod` is attached as well.
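
The ReplicaSet count reported under Actual results can be broken down per deployment, which may help spot every affected operator rather than grepping for one known name. The following is a minimal sketch, not a verified procedure: `replicasets_per_deployment` is a hypothetical name, and it assumes the usual `oc get replicaset` table layout where the first column is `<deployment>-<hash>`.

```shell
# Hypothetical helper: reads `oc get replicaset` table output on stdin and
# prints "<count> <deployment>" per deployment, highest count first.
# Assumes column 1 is the ReplicaSet name in the form <deployment>-<hash>.
replicasets_per_deployment() {
  awk 'NR > 1 && NF > 0 {
         name = $1
         sub(/-[^-]+$/, "", name)   # drop the trailing pod-template hash
         count[name]++
       }
       END { for (k in count) print count[k], k }' | sort -rn
}

# Assumed usage against a live cluster:
#   oc get replicaset -n openshift-marketplace | replicasets_per_deployment
```

A deployment with a count in the thousands, like the 6532 shown above, would stand out at the top of the output.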