Created attachment 1487703 [details]
upgrade log with inventory file embedded

Description of problem:

Version-Release number of the following components:
openshift-ansible-3.11.16-1.git.0.4ac6f81.el7.noarch
atomic-openshift-3.11.16-1.git.0.b48b8f8.el7.x86_64
cri-o-1.11.5-2.rhaos3.11.git1c8a4b1.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install a 3.10 cluster with crio runtime enabled
2. Trigger upgrade
3.

Actual results:
Upgrade failed.

TASK [openshift_node : Approve node certificates when bootstrapping] ***********
<--snip-->
FAILED - RETRYING: Approve node certificates when bootstrapping (1 retries left).
fatal: [qe-jialiu3101-master-etcd-1.0927-nqc.qe.rhcloud.com -> qe-jialiu3101-master-etcd-1.0927-nqc.qe.rhcloud.com]: FAILED! => {"all_subjects_found": [], "attempts": 30, "changed": false, "client_approve_results": [], "client_csrs": null, "msg": "The connection to the server qe-jialiu3101-master-etcd-1:8443 was refused - did you specify the right host or port?\n", "oc_get_nodes": null, "rc": 0, "server_approve_results": [], "server_csrs": null, "state": "unknown", "unwanted_csrs": []}

Checking the API server log:
[root@qe-jialiu3101-master-etcd-1 ~]# crictl logs b4200a17855cc
<--snip-->
I0927 08:21:14.253799 1 master_config.go:414] Initializing cache sizes based on 0MB limit
I0927 08:21:14.253907 1 master_config.go:476] Using the lease endpoint reconciler with TTL=15s and interval=10s
I0927 08:21:14.253955 1 storage_factory.go:285] storing { apiServerIPInfo} in v1, reading as __internal from storagebackend.Config{Type:"etcd3", Prefix:"kubernetes.io", ServerList:[]string{"https://qe-jialiu3101-master-etcd-1:2379"}, KeyFile:"/etc/origin/master/master.etcd-client.key", CertFile:"/etc/origin/master/master.etcd-client.crt", CAFile:"/etc/origin/master/master.etcd-ca.crt", Quorum:true, Paging:true, DeserializationCacheSize:0, Codec:runtime.Codec(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000}
F0927 08:21:24.255210 1 start_api.go:68] context deadline exceeded

Restarting dnsmasq brings the etcd connection back.

Expected results:
The upgrade succeeds.

Additional info:
This bug is very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1623145#c7 and https://bugzilla.redhat.com/show_bug.cgi?id=1624448, so I also tried an upgrade without the crio runtime; that upgrade completed successfully.
QE ran a crio cluster upgrade successfully some days ago, but I cannot remember the exact version now.
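Because restarting dnsmasq restores the etcd connection, it may help to tell a name-resolution failure apart from a refused or timed-out connection when reproducing this. The following is only a rough diagnostic sketch and is not part of openshift-ansible. The function name check_endpoint and its status strings are made up for illustration; the host name and the ports 2379 (etcd) and 8443 (API) are taken from the log above.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Classify reachability of host:port.

    Returns one of 'dns_failure', 'refused', 'timeout', 'unreachable', 'ok'.
    These labels are illustrative only.
    """
    try:
        # Resolve through the system resolver (dnsmasq on these nodes).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    last = "unreachable"
    for family, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return "ok"
        except ConnectionRefusedError:
            last = "refused"
        except socket.timeout:
            last = "timeout"
        except OSError:
            last = "unreachable"
    return last


if __name__ == "__main__":
    # Run on the master; host and ports are the ones seen in this report.
    for port in (2379, 8443):
        print(port, check_endpoint("qe-jialiu3101-master-etcd-1", port))
```

Running this before and after "systemctl restart dnsmasq" on the master would show whether the failure is at the DNS step or at the TCP connection step. It only tests TCP reachability and does not perform the TLS handshake that etcd and the API server require.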
Re-tested this bug with openshift-ansible-3.11.39-1.git.0.fe42b3b.el7.noarch; it still reproduces. The upgrade log is attached.
Created attachment 1501783 [details] upgrade log with inventory file embedded
Possibly related, given the upgrade is being performed on OpenStack: https://bugzilla.redhat.com/show_bug.cgi?id=1661232
Can this be tested again with the information provided in https://bugzilla.redhat.com/show_bug.cgi?id=1661232#c6?
Fixed. With openshift-ansible-3.11.82-1.git.0.f29227a.el7.noarch, the upgrade succeeds on OpenStack v10 and v13, AWS, and GCE.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0407