Kuryr components often contact the K8s API through a load balancer (e.g. Octavia LB in DevStack deployments, HAProxy in OpenShift), and we've often seen these load balancers drop connections silently, effectively leaving our requests hanging forever. This was fixed in `K8sClient.watch` by setting a read timeout there, which helped a lot, but we now seem to see it happening with other requests that don't have a read timeout set.
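To illustrate the failure mode, here is a minimal sketch that uses only the Python standard library, not Kuryr's actual client code. The helper names `start_silent_server` and `get_with_read_timeout` are hypothetical. A local socket accepts connections but never answers, standing in for a load balancer that silently drops the connection. Without a timeout the request would block indefinitely; with one, the client gets an exception it can handle or retry.

```python
import http.client
import socket
import threading


def start_silent_server():
    """Accept TCP connections on a free local port and never reply.

    Hypothetical stand-in for a load balancer that silently drops traffic.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)
    held = []  # keep accepted sockets open so the client sees no FIN/RST

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def get_with_read_timeout(host, port, timeout):
    """Issue a GET and return "ok", or "timed out" if no response arrives."""
    # The timeout applies to connect and to each blocking socket read.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        conn.getresponse().read()
        return "ok"
    except socket.timeout:
        return "timed out"
    finally:
        conn.close()


if __name__ == "__main__":
    _srv, port = start_silent_server()
    print(get_with_read_timeout("127.0.0.1", port, timeout=0.5))
```

The same idea applies to any HTTP client library: each request path that can go through the load balancer needs its own read timeout, otherwise a single dropped connection can block the caller forever.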
Verified on 4.6.0-0.nightly-2020-10-02-001427 over OSP16.1 (RHOS-16.1-RHEL-8-20200917.n.3) with OVN-octavia provider and OSP13 (2020-09-16.1) with amphora provider.

On OSP13, the installation works fine:

    (shiftstack) [stack@undercloud-0 ~]$ oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.6.0-0.nightly-2020-10-02-001427   True        False         13m     Cluster version is 4.6.0-0.nightly-2020-10-02-001427

    (shiftstack) [stack@undercloud-0 ~]$ oc get pods -n openshift-kuryr
    NAME                               READY   STATUS    RESTARTS   AGE
    kuryr-cni-2prfj                    1/1     Running   1          45m
    kuryr-cni-4l5p9                    1/1     Running   0          45m
    kuryr-cni-cgflf                    1/1     Running   0          30m
    kuryr-cni-j28qm                    1/1     Running   0          33m
    kuryr-cni-k68zp                    1/1     Running   0          33m
    kuryr-cni-vtwg2                    1/1     Running   1          45m
    kuryr-controller-9999f7ffd-ttsqm   1/1     Running   1          45m

Timing during the installation:

    DEBUG Time elapsed per stage:
    DEBUG     Infrastructure: 1m50s
    DEBUG Bootstrap Complete: 14m31s
    DEBUG                API: 2m52s
    DEBUG  Bootstrap Destroy: 47s
    DEBUG  Cluster Operators: 23m26s
    INFO Time elapsed: 41m23s

On OSP16.1, the installation also worked fine:

    (shiftstack) [stack@undercloud-0 ~]$ oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.6.0-0.nightly-2020-10-02-001427   True        False         78s     Cluster version is 4.6.0-0.nightly-2020-10-02-001427

    (shiftstack) [stack@undercloud-0 ~]$ oc get pods -n openshift-kuryr
    NAME                              READY   STATUS    RESTARTS   AGE
    kuryr-cni-89pm9                   1/1     Running   1          17m
    kuryr-cni-jfltp                   1/1     Running   4          43m
    kuryr-cni-k4j95                   1/1     Running   0          43m
    kuryr-cni-l87vw                   1/1     Running   0          17m
    kuryr-cni-zhvzc                   1/1     Running   0          17m
    kuryr-cni-zpmfv                   1/1     Running   0          43m
    kuryr-controller-775ff4bb-bgpml   1/1     Running   1          43m

Timing during the installation:

    DEBUG Time elapsed per stage:
    DEBUG     Infrastructure: 1m47s
    DEBUG Bootstrap Complete: 27m30s
    DEBUG                API: 3m41s
    DEBUG  Bootstrap Destroy: 39s
    DEBUG  Cluster Operators: 24m28s
    INFO Time elapsed: 55m14s
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196