Bug 1859240
Summary: Network Operator degraded with `openshift-multus: could not update object (/v1, Kind=Namespace) /openshift-multus: Internal error occurred: admission plugin "MutatingAdmissionWebhook" failed to complete mutation in 13s)`

| Field | Value |
| --- | --- |
| Product | OpenShift Container Platform |
| Component | Etcd |
| Version | 4.3.z |
| Status | CLOSED DUPLICATE |
| Severity | high |
| Priority | low |
| Reporter | rvanderp |
| Assignee | Sam Batschelet <sbatsche> |
| QA Contact | ge liu <geliu> |
| CC | abudavis, aos-bugs, bbennett, cruhm, dosmith, fpan, mfojtik, rkshirsa, sbatsche, sttts, xxia |
| Target Milestone | --- |
| Target Release | 4.7.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Clones | 1954032 (view as bug list) |
| Bug Blocks | 1954032 |
| Type | Bug |
| Regression | --- |
| Last Closed | 2020-08-31 15:17:47 UTC |
Description (rvanderp, 2020-07-21 14:25:20 UTC)
OVS flows, a sosreport from master-0, and a must-gather to follow.

Thanks for the report. I'm trying to get some eyes on this for a second opinion. I don't recognize this error, and from the output it looks like an API connectivity issue for the multus-admission-controller.
> We tried deleting the SDN, OVS, and Multus pods. This did not address the issue. The only way to resolve the issue was to restart master-0.
To me, this also points at a possible problem on the API side.
The console also seems relevant; it may be suffering from the same problem.
I need another opinion on this; if the multus-admission-controller is a symptom of a deeper cause, I'd like to get the BZ assigned to the right engineers.
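As a hedged illustration of why a healthy webhook can still produce this error when API connectivity is bad: the API server calls a mutating admission webhook under a fixed deadline (13 seconds in this bug), and if the call does not return in time, the whole request fails with a "failed to complete mutation" error. The sketch below is not OpenShift or kube-apiserver code; `call_mutating_webhook` is a hypothetical name used only to model the timeout semantics.

```python
import concurrent.futures
import time

# Deadline the API server gives the webhook call (13s in this bug report).
WEBHOOK_TIMEOUT_S = 13.0

def call_mutating_webhook(mutate, obj, timeout=WEBHOOK_TIMEOUT_S):
    """Run mutate(obj) under a deadline, mimicking the admission timeout.

    Returns the mutated object, or raises RuntimeError with an error text
    shaped like the one in this bug when the deadline is exceeded.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(mutate, obj)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            raise RuntimeError(
                f'admission plugin "MutatingAdmissionWebhook" failed to '
                f"complete mutation in {timeout:g}s"
            )

# A responsive webhook succeeds.
print(call_mutating_webhook(lambda o: {**o, "mutated": True}, {"kind": "Namespace"}))

# A stalled webhook (e.g. the node cannot reach the API server, or vice
# versa) surfaces the same error text seen in the bug summary.
try:
    call_mutating_webhook(lambda o: time.sleep(0.3) or o, {"kind": "Pod"}, timeout=0.05)
except RuntimeError as err:
    print(err)
```

The point of the model: the error names the webhook, but the timeout fires on the caller's side, so a slow network path alone is enough to trigger it.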
Thanks to some investigation from Aniket Bhat, we also found this error:

```
2020-07-10T21:53:58.671180642Z W0710 21:53:58.670857 1 reflector.go:302] github.com/k8snetworkplumbingwg/net-attach-def-admission-controller/pkg/controller/controller.go:181: watch of *v1.Pod ended with: very short watch: github.com/k8snetworkplumbingwg/net-attach-def-admission-controller/pkg/controller/controller.go:181: Unexpected watch close - watch lasted less than a second and no items received
```

It's still not a smoking gun, but it provides some more insight. Research on this message points at the possibility that the node cannot communicate with the API server.

Ever since we upgraded our OpenShift cluster (Azure IPI) from OpenShift 4.3.22 to 4.3.28, we have seen a similar issue: "Error creating build pod: Internal error occurred: admission plugin "MutatingAdmissionWebhook" failed to complete mutation in 13s". At the same time, we also see the errors below reported on the dashboard:

- "The API server has an abnormal latency of 13.016189253374991 seconds for POST pods."
- "API server is returning errors for 100% of requests for POST pods"
- "The API server has a 99th percentile latency of 14.95 seconds for POST pods."

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
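The "very short watch" warning in the log above comes from the client-side reflector: a watch connection that closes in under a second without delivering any events, which typically indicates the connection to the API server is being dropped immediately. As a hedged sketch of that classification logic (this is not the actual client-go reflector; `run_watch` and its backoff constants are made up for illustration):

```python
import time

# A watch that closes in under this long with no events is "very short".
VERY_SHORT_WATCH_S = 1.0

def run_watch(open_watch, max_retries=3):
    """Re-open a watch with exponential backoff, counting 'very short' closes.

    open_watch() is assumed to block until the watch closes and return the
    list of events it delivered (a stand-in for a real watch stream).
    Returns (number_of_very_short_closes, all_events_received).
    """
    backoff, short_closes, events = 0.01, 0, []
    for _ in range(max_retries):
        started = time.monotonic()
        delivered = open_watch()
        events.extend(delivered)
        if time.monotonic() - started < VERY_SHORT_WATCH_S and not delivered:
            short_closes += 1      # the warning case from the log above
            time.sleep(backoff)    # back off before re-dialing
            backoff *= 2
    return short_closes, events

# A watch that is always closed immediately with no items trips the
# "very short watch" path on every retry.
print(run_watch(lambda: []))  # prints (3, [])
```

This is why the warning repeating in the logs is consistent with the node being unable to hold a connection to the API server, rather than with anything wrong in the watched resources themselves.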