Description of problem:
While performing an upgrade of a cri-o based installation, the upgrade failed because nodes did not become ready. Checking the logs showed that the cri-o package had not been updated at that point, which caused the failure.
Jun 01 04:20:45 qe-ghuang-master-etcd-1 atomic-openshift-node: E0601 04:20:45.939342 99814 remote_runtime.go:69] Version from runtime service failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
Jun 01 04:20:45 qe-ghuang-master-etcd-1 atomic-openshift-node: E0601 04:20:45.939431 99814 kuberuntime_manager.go:172] Get runtime version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
Jun 01 04:20:45 qe-ghuang-master-etcd-1 atomic-openshift-node: F0601 04:20:45.939454 99814 server.go:233] failed to run Kubelet: failed to create kubelet: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
Jun 01 04:20:45 qe-ghuang-master-etcd-1 systemd: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Jun 01 04:20:45 qe-ghuang-master-etcd-1 systemd: Failed to start OpenShift Node.
Version-Release number of the following components:
Steps to Reproduce:
1. Spin up a 3.9 cluster with cri-o enabled
2. Upgrade to 3.10
Failed at task:
TASK [openshift_node : Wait for node to be ready] ******************************
FAILED - RETRYING: Wait for node to be ready (36 retries left).
The cri-o version was still the 3.9 release at this point:
# crio -version
crio version 1.9.12
Once the cri-o package was updated and the cri-o service restarted, the node became ready again.
I've reproduced this locally with a clean install. It looks to me like I had a systemd unit from a previous system container based install of cri-o lingering around. As soon as I removed that, cri-o started properly and the node came up as well.
We'll look at whether we need to clean up a system container based cri-o install, but since that was never officially supported I'm not sure we should consider this a blocker.
Let's verify that you don't have system container leftovers:
rm /etc/systemd/system/cri-o.service && systemctl daemon-reload
atomic containers delete cri-o
then re-run the installer
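The leftover check above can be sketched as a small script. This is a minimal sketch assuming a POSIX shell; the `check_leftover` helper name is mine, not part of the installer, and the unit path is the one from the cleanup commands above.

```shell
#!/bin/sh
# Sketch: detect a leftover systemd unit from an unsupported
# system-container based cri-o install before re-running the installer.

check_leftover() {
    # Print whether the given stale unit file is still present.
    if [ -e "$1" ]; then
        echo "leftover: $1 (rm it, then systemctl daemon-reload)"
        return 1
    fi
    echo "clean: $1"
    return 0
}

check_leftover /etc/systemd/system/cri-o.service
```

A non-zero return from `check_leftover` makes the script usable as a pre-flight gate in automation.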
https://github.com/openshift/openshift-ansible/pull/8612 cleans up a few issues I ran into while testing crio and 3.10 installs. These problems were introduced very recently due to the oreg_url refactoring.
Scott, this is an rpm cri-o installation. We won't test the cri-o system container installation, as it was not officially supported.
The issue is that we should upgrade the cri-o rpm package whenever the node is updated, because cri-o isn't compatible across minor versions of the kubelet.
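That compatibility rule can be expressed as a simple version check. This is an illustrative sketch only: the `minor_of` helper and the hard-coded version strings are mine, standing in for the output of `crio --version` and the post-upgrade kubelet version.

```shell
#!/bin/sh
# Sketch: cri-o tracks the kubelet's minor version (cri-o 1.9.x pairs with
# OpenShift 3.9 / Kubernetes 1.9, cri-o 1.10.x with 3.10), so a node upgrade
# must also upgrade the cri-o package.

minor_of() {
    # Reduce a version string like "1.9.12" to its "major.minor" prefix.
    echo "$1" | cut -d. -f1,2
}

crio_version="1.9.12"     # illustrative; e.g. taken from `crio --version`
kubelet_version="1.10.0"  # illustrative; the post-upgrade kubelet

if [ "$(minor_of "$crio_version")" != "$(minor_of "$kubelet_version")" ]; then
    echo "mismatch: crio $crio_version vs kubelet $kubelet_version"
else
    echo "ok"
fi
```

With the sample values above the check reports a mismatch, which is exactly the state the failed upgrade left the node in.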
Let me know if you still need something from me.
https://github.com/openshift/openshift-ansible/pull/8628 ensures that the cri-o package is updated when upgrading the node.
Verified in openshift-ansible-3.10.0-0.60.0
The rpm cri-o package is updated successfully.