Description of problem:
While performing an upgrade of a cri-o based installation, the upgrade failed because the nodes did not become ready. The logs show that the cri-o package had not been updated at this point, which caused the failure.

Jun 01 04:20:45 qe-ghuang-master-etcd-1 atomic-openshift-node[99814]: E0601 04:20:45.939342 99814 remote_runtime.go:69] Version from runtime service failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
Jun 01 04:20:45 qe-ghuang-master-etcd-1 atomic-openshift-node[99814]: E0601 04:20:45.939431 99814 kuberuntime_manager.go:172] Get runtime version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
Jun 01 04:20:45 qe-ghuang-master-etcd-1 atomic-openshift-node[99814]: F0601 04:20:45.939454 99814 server.go:233] failed to run Kubelet: failed to create kubelet: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
Jun 01 04:20:45 qe-ghuang-master-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Jun 01 04:20:45 qe-ghuang-master-etcd-1 systemd[1]: Failed to start OpenShift Node.

Version-Release number of the following components:
openshift-ansible-3.10.0-0.56.0.git.0.b921fb9.el7.noarch.rpm

How reproducible:
always

Steps to Reproduce:
1. Spin up a 3.9 cluster with cri-o enabled
2. Upgrade to 3.10
3.

Actual results:
Failed at task:
TASK [openshift_node : Wait for node to be ready] ******************************
FAILED - RETRYING: Wait for node to be ready (36 retries left).

The cri-o version was still the 3.9 one at this moment:
# crio -version
crio version 1.9.12

Expected results:

Additional info:
After updating the cri-o package and restarting the cri-o service, the node returns to ready.
I've reproduced this locally with a clean install. It looks to me like I had a systemd unit lingering around from a previous system container based install of crio. As soon as I removed that unit, crio started properly and the node started as well.
We'll look into whether we need to clean up a system container based crio, but since it was never officially supported I'm not sure we should consider this a blocker. Let's verify that you don't have system container leftovers.

rm /etc/systemd/system/cri-o.service && systemctl daemon-reload
atomic containers delete cri-o
reboot

Then re-run the installer.
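Before removing anything, it may help to confirm that the leftover unit is actually present. The following is only a minimal sketch: the helper name `leftover_unit` is hypothetical, and the default path is simply the one mentioned in this comment.

```shell
#!/bin/sh
# Hypothetical helper: report whether a unit file exists at the given path.
# The default path is the one named in this thread; pass another path to override it.
leftover_unit() {
    unit="${1:-/etc/systemd/system/cri-o.service}"
    if [ -e "$unit" ]; then
        echo "leftover unit found: $unit"
        return 0
    fi
    echo "no leftover unit at $unit"
    return 1
}

# Check the default location; "|| true" keeps a "not found" result from aborting a script.
leftover_unit || true
```

If the helper reports a leftover unit, the cleanup commands above apply; otherwise the failure likely has a different cause.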
https://github.com/openshift/openshift-ansible/pull/8612 cleans up a few issues I ran into while testing crio and 3.10 installs. These problems were introduced very recently due to the oreg_url refactoring.
Scott, this is an rpm cri-o installation. We won't test cri-o system container installation, as it was not officially supported. The issue is that the cri-o rpm package should be upgraded whenever the node is updated, because the cri-o package isn't compatible across minor versions of the kubelet. Let me know if you still need something from me.
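The minor-version mismatch described here can be checked mechanically. The sketch below is an illustration, not part of the installer: the function names `minor_of` and `same_minor` are hypothetical, the cri-o string is the one from the report, and `v1.10.0` is an assumed kubelet version for a 3.10 node.

```shell
#!/bin/sh
# Hypothetical helper: print "MAJOR.MINOR" for a version string such as
# "crio version 1.9.12" or "v1.10.0".
minor_of() {
    # Take the last whitespace-separated field, drop a leading "v",
    # then keep the first two dot-separated components.
    printf '%s\n' "$1" | awk '{ ver = $NF; sub(/^v/, "", ver); split(ver, p, "."); print p[1] "." p[2] }'
}

# Hypothetical helper: succeed only when both versions share the same MAJOR.MINOR.
same_minor() {
    [ "$(minor_of "$1")" = "$(minor_of "$2")" ]
}

# cri-o version from the report against an assumed 3.10 kubelet version.
if same_minor "crio version 1.9.12" "v1.10.0"; then
    echo "cri-o and kubelet minor versions match"
else
    echo "minor version mismatch: update cri-o before restarting the node"
fi
```

On a live node the two arguments would come from the installed cri-o and kubelet; how those strings are obtained is left out here, since the exact commands and output formats vary.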
https://github.com/openshift/openshift-ansible/pull/8628 ensures that the cri-o package is updated when upgrading the node.
Verified in openshift-ansible-3.10.0-0.60.0: the cri-o rpm package is updated successfully.