Bug 1848264
| Summary: | Upgrade test suite fails on ppc64le environment | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Basheer <bkhadars> |
| Component: | Networking | Assignee: | Rafael Fonseca <rdossant> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | medium | | |
| Priority: | unspecified | CC: | adahiya, amcdermo, aos-bugs, bleanhar, danili, dslavens, mkumatag, mmasters, obulatov, rdossant, smilner, wking |
| Version: | 4.4 | Keywords: | UpcomingSprint |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | ppc64le | | |
| OS: | Unspecified | | |
| Whiteboard: | multi-arch | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-01-21 14:35:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Basheer
2020-06-18 06:17:30 UTC
I'm wondering if we have any tool or guide to debug what is going on, especially why frontend services are down for longer than the configured threshold.

Frontends in upgrade tests usually refer to the ingress controller, so moving to the Network Edge team.

Target reset to 4.7 while investigation is either ongoing or not yet started. Will be considered for earlier release versions when diagnosed and resolved.

Upgrade from 4.2.36 to 4.3 hit the same problem:

Frontends were unreachable during disruption for at least 14m51s of 48m26s (31%)

Logs and artefacts are available at https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1302866183778734080

Dan Li, this report is assigned to the "Routing" component, which is the responsibility of the Network Edge team. However, I see that you've been changing the assignment of this report among several people who are not on the Network Edge team. Are you expecting the Network Edge team to take action on this report, or is this issue being handled by the multi-arch folks?

Hi Miciah, at the moment this bug is assigned to Rafael Dos Santos, our Multi-Arch CI engineer, so I believe it should be handled by the multi-arch team (since this bug was reported by our IBM partner engineer).

Setting to assigned re: comment #10.

Adding "UpcomingSprint" as the team will not have bandwidth to look at this bug during this sprint.

We were able to reproduce this with a 4.7 nightly image on ppc64le but not on s390x. The difference between the two arches was how the cluster was configured: in the s390x case there is a load balancer configured, whereas there is none for ppc64le. So what happens is that the "frontend" operators are hard-coded to a specific worker, and while that worker is being upgraded the frontends are unavailable for longer than the 20% threshold allows. Basheer, can you confirm if that's the case in your setup?

Closing. Re-open in case the solution from the last comment doesn't work.
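For reference, the 31% figure in the failure message can be checked against the 20% threshold mentioned above with a few lines of Python. This is only a sketch of the arithmetic, not the upgrade test suite's own code: the function names and the `THRESHOLD` constant are made up for illustration, and the duration format is assumed to be the `14m51s` style shown in the message.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '14m51s' (optionally with hours) to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def disruption_ratio(down: str, total: str) -> float:
    """Fraction of the total window during which the frontends were unreachable."""
    return parse_duration(down) / parse_duration(total)


# Assumption: the 20% threshold quoted in the thread, expressed as a fraction.
THRESHOLD = 0.20

ratio = disruption_ratio("14m51s", "48m26s")
print(f"{ratio:.0%} unreachable, over threshold: {ratio > THRESHOLD}")
```

With the values from the failure message (891 s of 2906 s), the ratio rounds to the reported 31%, which is above the 20% threshold.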
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.