Bug 1633570 - cluster with crio runtime upgrade failed at [openshift_node : Approve node certificates when bootstrapping] due to missing connection to etcd hostname.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Russell Teague
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-09-27 10:10 UTC by Johnny Liu
Modified: 2019-03-14 02:18 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-14 02:17:59 UTC
Target Upstream Version:
Embargoed:


Attachments
upgrade log with inventory file embedded (1.49 MB, text/plain)
2018-09-27 10:10 UTC, Johnny Liu
upgrade log with inventory file embedded (6.36 MB, text/plain)
2018-11-05 11:00 UTC, Johnny Liu


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0407 0 None None None 2019-03-14 02:18:07 UTC

Description Johnny Liu 2018-09-27 10:10:59 UTC
Created attachment 1487703 [details]
upgrade log with inventory file embedded

Description of problem:


Version-Release number of the following components:
openshift-ansible-3.11.16-1.git.0.4ac6f81.el7.noarch
atomic-openshift-3.11.16-1.git.0.b48b8f8.el7.x86_64
cri-o-1.11.5-2.rhaos3.11.git1c8a4b1.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install a 3.10 cluster with crio runtime enabled
2. Trigger upgrade

Actual results:
Upgrade failed.
TASK [openshift_node : Approve node certificates when bootstrapping] ***********
<--snip-->
FAILED - RETRYING: Approve node certificates when bootstrapping (1 retries left).

fatal: [qe-jialiu3101-master-etcd-1.0927-nqc.qe.rhcloud.com -> qe-jialiu3101-master-etcd-1.0927-nqc.qe.rhcloud.com]: FAILED! => {"all_subjects_found": [], "attempts": 30, "changed": false, "client_approve_results": [], "client_csrs": null, "msg": "The connection to the server qe-jialiu3101-master-etcd-1:8443 was refused - did you specify the right host or port?\n", "oc_get_nodes": null, "rc": 0, "server_approve_results": [], "server_csrs": null, "state": "unknown", "unwanted_csrs": []}
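When triaging a long upgrade log, it can help to pull the structured result out of the failure line rather than read it by eye. The following is a minimal sketch, not part of openshift-ansible: `parse_failure` is a hypothetical helper, and it assumes the line has the `FAILED! => {json}` shape shown above.

```python
import json
import re

# The failure line from the upgrade log above, copied verbatim.
FATAL_LINE = (
    'fatal: [qe-jialiu3101-master-etcd-1.0927-nqc.qe.rhcloud.com -> '
    'qe-jialiu3101-master-etcd-1.0927-nqc.qe.rhcloud.com]: FAILED! => '
    '{"all_subjects_found": [], "attempts": 30, "changed": false, '
    '"client_approve_results": [], "client_csrs": null, '
    '"msg": "The connection to the server qe-jialiu3101-master-etcd-1:8443 '
    'was refused - did you specify the right host or port?\\n", '
    '"oc_get_nodes": null, "rc": 0, "server_approve_results": [], '
    '"server_csrs": null, "state": "unknown", "unwanted_csrs": []}'
)


def parse_failure(line):
    """Hypothetical helper: return (result_dict, (host, port) or None)."""
    _, sep, payload = line.partition("FAILED! => ")
    if not sep:
        raise ValueError("not an Ansible failure line")
    result = json.loads(payload)
    # Look for the "connection refused" wording that oc prints in "msg".
    match = re.search(
        r"connection to the server (\S+):(\d+) was refused",
        result.get("msg") or "",
    )
    endpoint = (match.group(1), int(match.group(2))) if match else None
    return result, endpoint


result, endpoint = parse_failure(FATAL_LINE)
print(result["attempts"], endpoint)
```

For this line the helper reports 30 attempts and the refused endpoint `qe-jialiu3101-master-etcd-1:8443`, which is the API server port, not etcd itself.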

Check api log:
[root@qe-jialiu3101-master-etcd-1 ~]# crictl logs b4200a17855cc
<--snip-->
I0927 08:21:14.253799       1 master_config.go:414] Initializing cache sizes based on 0MB limit
I0927 08:21:14.253907       1 master_config.go:476] Using the lease endpoint reconciler with TTL=15s and interval=10s
I0927 08:21:14.253955       1 storage_factory.go:285] storing { apiServerIPInfo} in v1, reading as __internal from storagebackend.Config{Type:"etcd3", Prefix:"kubernetes.io", ServerList:[]string{"https://qe-jialiu3101-master-etcd-1:2379"}, KeyFile:"/etc/origin/master/master.etcd-client.key", CertFile:"/etc/origin/master/master.etcd-client.crt", CAFile:"/etc/origin/master/master.etcd-ca.crt", Quorum:true, Paging:true, DeserializationCacheSize:0, Codec:runtime.Codec(nil), Transformer:value.Transformer(nil), CompactionInterval:300000000000, CountMetricPollPeriod:60000000000}
F0927 08:21:24.255210       1 start_api.go:68] context deadline exceeded
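
The `storage_factory.go` line above shows which etcd endpoint the API server was trying to reach just before the `context deadline exceeded` failure. As a rough sketch (the helper name `etcd_endpoints` is hypothetical, and the regular expression assumes the Go struct formatting seen in this log), the endpoints can be extracted like this:

```python
import re
from urllib.parse import urlsplit

# Leading part of the storage_factory.go line from the API log above;
# the fields after ServerList are left out because they are not needed here.
LOG_LINE = (
    'I0927 08:21:14.253955       1 storage_factory.go:285] storing '
    '{ apiServerIPInfo} in v1, reading as __internal from '
    'storagebackend.Config{Type:"etcd3", Prefix:"kubernetes.io", '
    'ServerList:[]string{"https://qe-jialiu3101-master-etcd-1:2379"}'
)


def etcd_endpoints(line):
    """Hypothetical helper: return [(host, port), ...] from ServerList."""
    match = re.search(r'ServerList:\[\]string\{([^}]*)\}', line)
    if not match:
        return []
    urls = re.findall(r'"([^"]+)"', match.group(1))
    endpoints = []
    for url in urls:
        parts = urlsplit(url)
        endpoints.append((parts.hostname, parts.port))
    return endpoints


print(etcd_endpoints(LOG_LINE))
```

Note that the endpoint uses the short hostname `qe-jialiu3101-master-etcd-1`, not the fully qualified name that appears in the Ansible output.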


Restarting dnsmasq brings the etcd connection back.
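
Because a dnsmasq restart restores the connection, one plausible reading (not confirmed in this report) is that the short etcd hostname stops resolving on the master during the upgrade. A minimal sketch for telling a resolution failure apart from a refused or timed-out connection follows; `check_endpoint` and its `resolve`/`connect` parameters are hypothetical names introduced here so the logic can be exercised without a live cluster.

```python
import socket


def check_endpoint(host, port, timeout=3.0,
                   resolve=socket.getaddrinfo,
                   connect=socket.create_connection):
    """Classify reachability as 'dns_failure', 'connect_failure', or 'ok'."""
    try:
        # Step 1: can the name be resolved at all?
        resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        # Step 2: does a TCP connection to host:port succeed?
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections and timeouts alike.
        return "connect_failure"
    conn.close()
    return "ok"
```

On the affected master, calling `check_endpoint("qe-jialiu3101-master-etcd-1", 2379)` before and after restarting dnsmasq would show whether the result moves from `dns_failure` to `ok`. The check only tests TCP reachability; it does not perform the TLS handshake that etcd requires.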

Expected results:
The upgrade completes successfully.

Additional info:
This bug is very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1623145#c7 and https://bugzilla.redhat.com/show_bug.cgi?id=1624448, so I also tried an upgrade without the crio runtime; that upgrade completed successfully.

QE successfully upgraded a crio cluster some days ago, but I cannot remember the exact version now.

Comment 10 Johnny Liu 2018-11-05 10:58:36 UTC
Re-tested this bug with openshift-ansible-3.11.39-1.git.0.fe42b3b.el7.noarch; it still reproduces.

The upgrade log is attached.

Comment 11 Johnny Liu 2018-11-05 11:00:08 UTC
Created attachment 1501783 [details]
upgrade log with inventory file embedded

Comment 19 Russell Teague 2018-12-20 19:40:49 UTC
Possibly related, given the upgrade is being performed on OpenStack: https://bugzilla.redhat.com/show_bug.cgi?id=1661232

Comment 20 Russell Teague 2019-01-30 16:50:12 UTC
Can this be tested again with the information provided in https://bugzilla.redhat.com/show_bug.cgi?id=1661232#c6?

Comment 21 Weihua Meng 2019-02-12 10:25:17 UTC
Fixed.

openshift-ansible-3.11.82-1.git.0.f29227a.el7.noarch

Upgrade succeeded on OpenStack v10 and v13, AWS, and GCE.

Comment 23 errata-xmlrpc 2019-03-14 02:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0407

