Bug 2047844
| Summary: | Leaking sessions to vCenter causing vpxd to crash | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Matthew Robson <mrobson> |
| Component: | Cloud Compute | Assignee: | dmoiseev |
| Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | high | CC: | aos-bugs, hekumar, jspeed |
| Version: | 4.8 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.8.z | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-02-24 15:26:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2004953, 2048496 | ||
| Bug Blocks: | |||
Description of problem:

The vSphere team reported problems with their clusters due to high load and vpxd crashing. While debugging with VMware, they could see thousands of sessions from their 'osedeploy' account across their 40+ IPI clusters.

We start to see 503 errors from pods such as the vSphere problem detector and the cluster storage operator:

./var/log/pods/openshift-cluster-storage-operator_cluster-storage-operator-8558ccf8dd-rlsd9_bc544c60-4f94-438f-aa81-daddb8d9b691/cluster-storage-operator/0.log:2022-01-22T01:48:20.811341202+00:00 stderr F I0122 01:48:20.811270 1 status_controller.go:211] clusteroperator/storage diff {"status":{"conditions":[{"lastTransitionTime":"2022-01-21T16:55:43Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2022-01-21T17:48:04Z","message":"All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2022-01-21T17:48:18Z","message":"VSphereProblemDetectorControllerAvailable: failed to connect to server.company.com: POST \"/sdk\": 503 Service Unavailable","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2022-01-21T16:55:56Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}

Version-Release number of selected component (if applicable):
4.8.27

How reproducible:
Always

Steps to Reproduce:
1. Let the clusters run
2.
3.

Actual results:
Many sessions are left open against vCenter, causing performance issues.

Expected results: