Bug 1906194
| Summary: | OpenShift cluster on OpenStack lost api ingress after ~12 hours | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alex Krzos <akrzos> |
| Component: | Installer | Assignee: | Adolfo Duarte <adduarte> |
| Installer sub component: | OpenShift on OpenStack | QA Contact: | weiwei jiang <wjiang> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | low | | |
| Priority: | medium | CC: | aos-bugs, dblack, egarcia, eparis, jokerman, mkarg, pprinett |
| Version: | 4.6 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-17 17:01:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description — Alex Krzos, 2020-12-09 21:30:20 UTC

Comment (egarcia):

Assigning this to the Installer team, since the issue seems to be with OpenStack components being mis-configured or degraded after install.

Comment (Alex Krzos):

(In reply to egarcia from comment #2)
> Assigning this to the Installer team, since the issue seems to be with
> OpenStack components being mis-configured or degraded after install.

This is recurring in our environment: rebooting the master nodes and allowing some time for things to normalize will permit the API to work again. I believe this is an issue with the keepalived pods in the "openshift-openstack-infra" namespace.

Comment:

@Alex Krzos: Has this defect been seen or reproduced on other systems (more than once)? Also, in the environment where this was seen and the system recovered, was rebooting the system a requirement? In other words, will the system recover if left to normalize for a period of time *without* rebooting it? Do you happen to have an estimate of how long the system takes to normalize before the API services start working correctly again? Thanks.

Comment:

Created attachment 1749824 [details]
Grafana network dashboard showing TCP retransmit rate out of all sent segments

Comment:

Created attachment 1749865 [details]
master-0 haproxy

Comment:

Closing as a duplicate, since we believe this is a symptom of https://bugzilla.redhat.com/show_bug.cgi?id=1915080

*** This bug has been marked as a duplicate of bug 1915080 ***