Filed a new bug for further investigation of this issue on Atomic Host.
Fixed the ASB issue. Now I can see the reported issue. Investigating.
QE, there's suspicion that this may be related to a bug in the container runtime. Does this problem exist in the latest versions of Atomic Host?
(In reply to Scott Dodson from comment #16)
> QE, there's suspicion that this may be related to a bug in the container
> runtime. Does this problem exist in the latest versions of Atomic Host?

@wmeng, please check this once more on the latest Atomic Host.
Reproduced this issue on the latest Atomic Host as well.

openshift-ansible-3.10.165-1.git.0.5ef95e3.el7
Red Hat Enterprise Linux Atomic Host 7.7.0
Linux 3.10.0-1062.el7.x86_64
docker-1.13.1-103.git7f2769b.el7.x86_64

When the upgrade failed:

# oc get nodes
NAME                                 STATUS                        ROLES     AGE   VERSION
wmengug4ah770-master-etcd-zone1-1    Ready                         master    15h   v1.10.0+b81c8f8
wmengug4ah770-master-etcd-zone2-1    Ready                         master    15h   v1.10.0+b81c8f8
wmengug4ah770-master-etcd-zone2-2    Ready                         master    15h   v1.10.0+b81c8f8
wmengug4ah770-node-zone1-primary-1   Ready                         compute   15h   v1.9.1+a0ce1bc657
wmengug4ah770-node-zone2-primary-1   Ready                         compute   15h   v1.9.1+a0ce1bc657
wmengug4ah770-nrriz-1                NotReady,SchedulingDisabled   infra     15h   v1.10.0+b81c8f8
wmengug4ah770-nrriz-2                Ready                         <none>    15h   v1.9.1+a0ce1bc657

Upgrade log: https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Run-Ansible-Playbooks-Nextge/470/consoleFull
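For quick triage after a failed upgrade run, something like the following isolates the stuck nodes (a minimal sketch; the grep pattern is an assumption based on the STATUS column format shown above, and the node name is taken from that output):

# List only nodes whose STATUS column contains NotReady
oc get nodes --no-headers | grep NotReady

# Show conditions and recent events for a stuck node
oc describe node wmengug4ah770-nrriz-1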
Created attachment 1608562 [details]
Node NotReady

atomic-openshift-node logs, ec2-23-20-104-227.compute-1.amazonaws.com
Aug 26 21:52:42 ip-172-18-10-19.ec2.internal atomic-openshift-node[432]: I0826 21:52:42.130753  444 container_manager_linux.go:266] Creating device plugin manager: true
Aug 26 21:52:42 ip-172-18-10-19.ec2.internal atomic-openshift-node[432]: I0826 21:52:42.130766  444 manager.go:102] Creating Device Plugin manager at /var/lib/kubelet/device-plugins/kubelet.sock

[root@ip-172-18-10-19 ~]# ls -alh /var/lib/kubelet/device-plugins/
total 0
drwxr-xr-x. 2 root root  6 Aug 26 21:52 .
drwxr-x---. 3 root root 28 Aug 26 21:52 ..

The kubelet logs that it is creating the device plugin manager at /var/lib/kubelet/device-plugins/kubelet.sock, yet the directory is empty: the socket was never created.
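A quick way to confirm this state on an affected node (a sketch, assuming the atomic-openshift-node systemd unit seen in the logs above):

# Check whether the kubelet device-plugin socket exists
test -S /var/lib/kubelet/device-plugins/kubelet.sock \
  && echo "socket present" || echo "socket missing"

# Inspect the node service and its recent device-plugin log lines
systemctl status atomic-openshift-node
journalctl -u atomic-openshift-node --since "10 min ago" | grep -i "device plugin"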
This was root-caused to be the same issue as Bug 1508040. The suggested workaround is to reboot the affected node and restart the upgrade.
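In practice the workaround looks roughly like this (a sketch; the node name is taken from the output above, and the inventory and upgrade playbook paths are placeholders for the affected environment):

# Reboot the affected node
ssh root@wmengug4ah770-nrriz-1 systemctl reboot

# Wait until the node reports Ready again
oc get nodes -w

# Then re-run the upgrade playbook against the same inventory
ansible-playbook -i <inventory> <upgrade playbook>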