Bug 1751756
| Summary: | [IPI][OSP][Kuryr] The installer times out when running in OSP 13 with Kuryr sdn | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jon Uriarte <juriarte> |
| Component: | Networking | Assignee: | Maysa Macedo <mdemaced> |
| Networking sub component: | kuryr | QA Contact: | Jon Uriarte <juriarte> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | low | | |
| Priority: | unspecified | CC: | adahiya, asimonel, bbennett, benl, eduen, itbrown, ltomasbo, mdemaced, mdulko, oblaut, wking, wzheng, xtian |
| Version: | 4.2.z | | |
| Target Milestone: | --- | | |
| Target Release: | 4.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1758669 (view as bug list) | Environment: | |
| Last Closed: | 2020-01-09 08:16:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1783258 | | |
| Bug Blocks: | 1758669 | | |
|
Description
Jon Uriarte
2019-09-12 14:12:43 UTC
Based on previous conversations with the devs, this bug should not be considered a blocker, as it's a matter of timing. The cluster-initialization phase of the installer allows 30 minutes, and Kuryr may sometimes require more; all the operators will eventually become ready and available. The environments used for development and QE had a single compute node running all the VMs for masters, workers, and load balancers, which led us to believe that installation with Kuryr could be failing around 50% of the time due to heavy resource usage.

After the addition of the two compute nodes and the recent addition of a health monitor to the API LB, we've observed that the time required for installation with Kuryr has decreased significantly compared with our previous tries. We still need more runs to validate whether those were the only causes of the timeouts. Times required for installation today, and the images used:

- 38m10.150s - 4.2.0-0.nightly-2019-09-20-040328
- 36m55.677s - latest release img 20/09 + Health Monitor
- 33m3.899s - latest release img 20/09 + Health Monitor

After applying the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1736854 I'm having consistent success with just 1 compute node, taking around 43-48 minutes total.

Pull landed in 4.2 [1]:

```
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.0-rc.1 | grep network-operator
  cluster-network-operator  https://github.com/openshift/cluster-network-operator  bf92b4ec6a0e1595e9d6fdbab32ed9266f46b3d8
$ git log --oneline bf92b4ec6a0e1595e9d6fdbab32ed9266f46b3d8 | grep bf92b4e
bf92b4ec Merge pull request #233 from dulek/octavia-healthmonitor
```

But with the blocking bug 1736854 in MODIFIED (and lacking a Target Release), maybe it makes sense for this to still be POST? 4.2.0 GA is very close; can we punt this to 4.2.z or get bug 1736854 resolved?
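The "did the fix land in this release image" check above can be scripted. The helper below (hypothetical name `commit_for_repo`; the column layout is assumed to match the `oc adm release info --commits` output shown above, i.e. name, repo URL, commit) extracts the commit shipped for a given operator, so it can be compared against the merge commit of the fix. This is a sketch, not an official tool.

```shell
# commit_for_repo NAME
# Reads `oc adm release info --commits IMAGE` output on stdin and prints the
# commit hash recorded for the operator called NAME (column 1 = name,
# column 3 = commit). Hypothetical helper; column layout is an assumption
# based on the output quoted in this bug.
commit_for_repo() {
  local name=$1
  awk -v n="$name" '$1 == n { print $3 }'
}

# Example usage against a live release image (requires `oc` and registry access):
#   oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.0-rc.1 \
#     | commit_for_repo cluster-network-operator
```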
I dunno how to track whether bug 1736854 is still MODIFIED or has since been sucked into 4.2 nightlies (or wherever it has to go to become ON_QA). The referenced pull request was for master, so I am making this the master BZ and will clone it for 4.2.z.

> The referenced pull request was for master.
But it landed in master before the 4.2/4.3 fork, so we only need to verify for 4.2.0.
You are completely right. I just realized that when I checked the code. Moving to ON_QA.

(Set to MODIFIED so ART picks it up for errata generation.)

4.2.0-0.nightly-2019-10-02-122541

Moving back to ON_QA, as it depends on BZ https://bugzilla.redhat.com/show_bug.cgi?id=1736854, which is still in MODIFIED status. Once that BZ is ON_QA we will be able to verify this one as well.

The bug can't be verified because it depends on https://bugzilla.redhat.com/show_bug.cgi?id=1736854, which is not yet in ON_QA status.
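Polling the status of the blocking bug, as described above, can be done against the Bugzilla REST API rather than the web UI. The sketch below assumes the standard Bugzilla 5 REST endpoint (`/rest/bug/ID?include_fields=status`) and uses a small text filter (hypothetical name `bug_status`) to pull the status out of the JSON reply without requiring `jq`.

```shell
# bug_status
# Reads a Bugzilla REST JSON reply on stdin and prints the first "status"
# value found (e.g. MODIFIED, ON_QA). Crude text extraction; a hypothetical
# helper, not part of any official tooling.
bug_status() {
  grep -o '"status":"[A-Z_]*"' | head -n1 | cut -d'"' -f4
}

# Example usage (assumes the Bugzilla 5 REST API; requires curl and network):
#   curl -s 'https://bugzilla.redhat.com/rest/bug/1736854?include_fields=status' \
#     | bug_status
```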