Bug 1929168

Summary: UPI installation with Kuryr timing out on bootstrap stage
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking
Assignee: Maysa Macedo <mdemaced>
Networking sub component: kuryr
QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: high
CC: ltomasbo, mbridges, mdulko, rlobillo, wking
Version: 4.7
Keywords: Upgrades
Target Milestone: ---
Target Release: 4.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-16 08:42:46 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1927244
Bug Blocks:

Description OpenShift BugZilla Robot 2021-02-16 10:45:05 UTC
+++ This bug was initially created as a clone of Bug #1927244 +++

Description of problem:

Following the documented UPI installation procedure for Kuryr:

https://docs.openshift.com/container-platform/4.7/installing/installing_openstack/installing-openstack-user-kuryr.html#installation-osp-converting-ignition-resources_installing-openstack-user-kuryr

the openshift-install wait-for bootstrap-complete command times out:

INFO Waiting up to 20m0s for the Kubernetes API at https://api.ostest.shiftstack.com:6443...
INFO API v1.20.0+ba45583 up
INFO Waiting up to 30m0s for bootstrapping to complete...
ERROR Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get "https://api.ostest.shiftstack.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 10.46.44.166:6443: connect: connection refused
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
FATAL failed to wait for bootstrapping to complete: timed out waiting for the condition

The keepalived API VIP (10.196.0.5) has moved to master-2, but no kube-apiserver containers are running on that node:

$ openstack port list | grep api
| a8cee914-c40d-4578-b781-99634aeb0ce4 | ostest-vmzfj-api-port                                | fa:16:3e:95:74:9d | ip_address='10.196.0.5', subnet_id='de581745-c45f-4a9c-8ee8-0cec3b8bacdb'     | DOWN   |

[core@ostest-vmzfj-master-2 ~]$ ip a | grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
    inet 10.196.3.104/16 brd 10.196.255.255 scope global dynamic noprefixroute ens3
    inet 10.196.0.5/32 scope global ens3
    inet6 fe80::1b90:80d6:f7e:dced/64 scope link noprefixroute 

[core@ostest-vmzfj-master-2 ~]$ sudo crictl ps 
CONTAINER           IMAGE                                                                                                                    CREATED             STATE               NAME                 ATTEMPT             POD ID
fb6dd5f32ed1f       5af7159d316af17f38072eef0e7745389989017725a8c320cbd168cfaefe070d                                                         2 minutes ago       Running             kuryr-cni            2                   615c95db5f094
b074bef3bc3c8       97c854b8868a24ef3e5a538145ecbecbba48ee6370be09ae164a3a35bef2932d                                                         22 minutes ago      Running             kube-multus          0                   4820cc531e120
02af4286eb352       0a0c7e16e7894a279f968f623f0f31d1280369bb72e29072292f56bf153d3be4                                                         25 minutes ago      Running             haproxy              1                   ca44cfcd9eacb
2dde7cbca8d75       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e123461c26c61423ad0d4b9e12f231f100369aadf3fdd1ba28aba211f4c222df   26 minutes ago      Running             mdns-publisher       0                   75b0f2690833c
9f3c567e64ee2       f513bff2bbca49470048b7f39d65544d8090270061c667bc3e1b3545863aa2c2                                                         26 minutes ago      Running             keepalived-monitor   0                   7191d9c878c03
11f673d4f3d5c       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:38787bc323485664a97880ab37d0d51cdc13d50df8ffd58fa95be8196a16b0d6   26 minutes ago      Running             keepalived           0                   7191d9c878c03
ea2d11b0bbff2       f513bff2bbca49470048b7f39d65544d8090270061c667bc3e1b3545863aa2c2                                                         26 minutes ago      Running             coredns-monitor      0                   3321725141fca
a9f1c88f94768       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d159c7e01d99c9ccaf26e1d997a5f56830b9e2e7b2799928b3b8663e04903d8   26 minutes ago      Running             coredns              0                   3321725141fca
8337c04ff07e8       f513bff2bbca49470048b7f39d65544d8090270061c667bc3e1b3545863aa2c2                                                         27 minutes ago      Running             haproxy-monitor      0                   ca44cfcd9eacb
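
The two checks above (does the node hold the API VIP, and is an API server container running on it) can be scripted. The following is only a minimal sketch, not part of the original report: the function names are hypothetical, the container name kube-apiserver is an assumption about how the API server container is named, and the parsing assumes the column layout shown in the outputs above. The sample data is abbreviated from those outputs, with image IDs shortened.

```python
from typing import Set

# Abbreviated from the `ip a | grep inet` output on master-2.
SAMPLE_IP_A = """\
    inet 127.0.0.1/8 scope host lo
    inet 10.196.3.104/16 brd 10.196.255.255 scope global dynamic noprefixroute ens3
    inet 10.196.0.5/32 scope global ens3
"""

# Abbreviated from the `sudo crictl ps` output on master-2 (image IDs shortened).
SAMPLE_CRICTL_PS = """\
CONTAINER       IMAGE           CREATED          STATE     NAME                 ATTEMPT   POD ID
fb6dd5f32ed1f   5af7159d316af   2 minutes ago    Running   kuryr-cni            2         615c95db5f094
02af4286eb352   0a0c7e16e7894   25 minutes ago   Running   haproxy              1         ca44cfcd9eacb
9f3c567e64ee2   f513bff2bbca4   26 minutes ago   Running   keepalived-monitor   0         7191d9c878c03
"""


def holds_address(ip_a_output: str, address: str) -> bool:
    """Return True if an `inet` line in the output carries the given IPv4 address."""
    for line in ip_a_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet":
            # fields[1] looks like "10.196.0.5/32"; compare the part before the prefix length.
            if fields[1].split("/")[0] == address:
                return True
    return False


def running_container_names(crictl_ps_output: str) -> Set[str]:
    """Return the NAME column of every row whose STATE column is Running."""
    names: Set[str] = set()
    for line in crictl_ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # CREATED spans several words ("2 minutes ago"), so locate STATE by value;
        # NAME is assumed to be the column immediately after it.
        if "Running" in fields:
            idx = fields.index("Running")
            if idx + 1 < len(fields):
                names.add(fields[idx + 1])
    return names


if __name__ == "__main__":
    has_vip = holds_address(SAMPLE_IP_A, "10.196.0.5")
    # "kube-apiserver" is an assumed container name, see the note above.
    has_api = "kube-apiserver" in running_container_names(SAMPLE_CRICTL_PS)
    if has_vip and not has_api:
        print("node holds the API VIP but runs no kube-apiserver container")
```

On a live node the two strings would come from the same commands shown above rather than from the embedded samples.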


Version-Release number of selected component (if applicable):

Observed on 4.7.0-0.nightly-2021-02-09-024347

The last successful UPI installation took place with 4.7.0-0.nightly-2021-01-27-110023 (https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_ci/job/DFG-osasinfra-shiftstack_ci-ocp_verification-osp16.1-ocp4.7-upi/4)

The installation also succeeds when OpenShiftSDN is configured instead of Kuryr.

How reproducible: Always

Steps to Reproduce: Run the Kuryr CI job:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_ci/job/DFG-osasinfra-shiftstack_ci-ocp_verification-osp16.1-ocp4.7-upi/

Actual results: Installation failure.


Expected results: Successful installation.


Additional info: Attaching sosreport and OCP installation logs

--- Additional comment from rlobillo on 2021-02-10 13:11:08 UTC ---

sos-report: http://rhos-release.virt.bos.redhat.com/log/bz1927244/

--- Additional comment from rlobillo on 2021-02-10 13:29:34 UTC ---

Created attachment 1756207
openshift-installer log bundle

Comment 2 Michał Dulko 2021-02-22 09:06:50 UTC
*** Bug 1931347 has been marked as a duplicate of this bug. ***

Comment 8 rlobillo 2021-03-09 09:14:43 UTC
Verified on OCP4.7.0-0.nightly-2021-03-06-183610.

- On OSP16.1 (RHOS-16.1-RHEL-8-20201214.n.3) with OVN-octavia: CI job installation and tests passed:

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_ci/job/DFG-osasinfra-shiftstack_ci-ocp_verification-osp16.1-ocp4.7-upi/12/

- On OSP13 (2021-01-20.1) with the Amphora provider: CI job installation and tests passed:

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_ci/job/DFG-osasinfra-shiftstack_ci-ocp_verification-osp13-ocp4.7-upi/9/

Comment 10 errata-xmlrpc 2021-03-16 08:42:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.2 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0749