Description of problem:

I've been using cri-o as the OCP runtime for a few releases now. In the last week or so, attempting to rebuild with cri-o fails: the nodes fail to start because the kubelet cannot talk to a Docker daemon.

Version-Release number of selected component (if applicable):

OCP: atomic-openshift-node-3.11.43-1.git.0.647ac05.el7.x86_64
openshift-ansible: commit 8ce8a45542ed29f0b325417a9aab1b673f33c2e1 (HEAD -> release-3.11, tag: openshift-ansible-3.11.52-1, origin/release-3.11)

How reproducible:
Always

Steps to Reproduce:
1. Configure the inventory file for cri-o:

openshift_use_crio=True
openshift_use_crio_only=True
openshift_crio_enable_docker_gc=True
openshift_crio_docker_gc_node_selector={'runtime': 'cri-o'}
# add runtime="cri-o" to node labels

2. Run openshift-ansible/playbooks/deploy_cluster.yml

Actual results:

- Install fails with the following message:

Failure summary:

  1. Hosts:    dc-ocp-m0.cloud.lab.eng.bos.redhat.com
     Play:     Approve any pending CSR requests from inventory nodes
     Task:     Approve node certificates when bootstrapping
     Message:  Could not find csr for nodes: dc-ocp-n0.cloud.lab.eng.bos.redhat.com, dc-ocp-m1.cloud.lab.eng.bos.redhat.com, dc-ocp-n4.cloud.lab.eng.bos.redhat.com, dc-ocp-n3.cloud.lab.eng.bos.redhat.com, dc-ocp-n2.cloud.lab.eng.bos.redhat.com, dc-ocp-n1.cloud.lab.eng.bos.redhat.com, dc-ocp-m2.cloud.lab.eng.bos.redhat.com

There are other BZs that mention this message along with hostname/hostname -f differences, but that doesn't seem to be the case here. The actual error, from the node:

Dec 06 21:43:01 dc-ocp-n0.cloud.lab.eng.bos.redhat.com atomic-openshift-node[55550]: E1206 21:43:01.918501   55550 kube_docker_client.go:91] failed to retrieve docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Dec 06 21:43:01 dc-ocp-n0.cloud.lab.eng.bos.redhat.com atomic-openshift-node[55550]: W1206 21:43:01.918539   55550 kube_docker_client.go:92] Using empty version for docker client, this may sometimes cause compatibility issue.
Dec 06 21:43:01 dc-ocp-n0.cloud.lab.eng.bos.redhat.com atomic-openshift-node[55550]: F1206 21:43:01.918872   55550 server.go:262] failed to run Kubelet: failed to create kubelet: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Output of oc get nodes (master0 with docker, rest as cri-o???):

# oc get nodes -o wide
NAME                                     STATUS     ROLES                    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE    KERNEL-VERSION              CONTAINER-RUNTIME
dc-ocp-m0.cloud.lab.eng.bos.redhat.com   Ready      master                   26m   v1.11.0+d4cacc0   10.19.138.166   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   docker://1.13.1
dc-ocp-m1.cloud.lab.eng.bos.redhat.com   NotReady   master                   26m   v1.11.0+d4cacc0   10.19.138.167   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-m2.cloud.lab.eng.bos.redhat.com   NotReady   master                   26m   v1.11.0+d4cacc0   10.19.138.168   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n0.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   21m   v1.11.0+d4cacc0   10.19.138.161   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n1.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   21m   v1.11.0+d4cacc0   10.19.138.162   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n2.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   21m   v1.11.0+d4cacc0   10.19.138.163   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n3.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   21m   v1.11.0+d4cacc0   10.19.138.164   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n4.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   21m   v1.11.0+d4cacc0   10.19.138.165   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8

Since the master0 is reporting runtime as docker, OCP is trying to start up those pods with docker:

# oc get pods -n kube-system -o wide
NAME                                                        READY   STATUS             RESTARTS   AGE   IP              NODE                                     NOMINATED NODE
master-api-dc-ocp-m0.cloud.lab.eng.bos.redhat.com           0/1     CrashLoopBackOff   9          27m   10.19.138.166   dc-ocp-m0.cloud.lab.eng.bos.redhat.com   <none>
master-api-dc-ocp-m1.cloud.lab.eng.bos.redhat.com           1/1     Running            0          27m   10.19.138.167   dc-ocp-m1.cloud.lab.eng.bos.redhat.com   <none>
master-api-dc-ocp-m2.cloud.lab.eng.bos.redhat.com           1/1     Running            0          28m   10.19.138.168   dc-ocp-m2.cloud.lab.eng.bos.redhat.com   <none>
master-controllers-dc-ocp-m0.cloud.lab.eng.bos.redhat.com   0/1     CrashLoopBackOff   9          28m   10.19.138.166   dc-ocp-m0.cloud.lab.eng.bos.redhat.com   <none>
master-controllers-dc-ocp-m1.cloud.lab.eng.bos.redhat.com   1/1     Running            0          27m   10.19.138.167   dc-ocp-m1.cloud.lab.eng.bos.redhat.com   <none>
master-controllers-dc-ocp-m2.cloud.lab.eng.bos.redhat.com   1/1     Running            0          27m   10.19.138.168   dc-ocp-m2.cloud.lab.eng.bos.redhat.com   <none>
master-etcd-dc-ocp-m0.cloud.lab.eng.bos.redhat.com          0/1     CrashLoopBackOff   9          27m   10.19.138.166   dc-ocp-m0.cloud.lab.eng.bos.redhat.com   <none>

But they are already running under crio:

crictl ps
W1206 21:49:35.826966   14179 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
CONTAINER ID        IMAGE                                                              CREATED          STATE     NAME          ATTEMPT
6f7cbb0b5a1e0       901c817d48ccadd98b0bcd9f9d3f16738c8dbaee0e0a6d5fb85217a616493d4a   26 minutes ago   Running   sync          0
452da364d0521       e043f4037c7ff202ac1ae302bb4990d1f398f3a80f22ab02e3a13b389499f963   29 minutes ago   Running   api           0
ff9b3adc42728       e043f4037c7ff202ac1ae302bb4990d1f398f3a80f22ab02e3a13b389499f963   29 minutes ago   Running   controllers   0
50f0e74ff4ba4       635bb36d7fc7b0199d318dcb4fde1aaadf5654b9ad4f9a4a3a1c5fe94c23339f   29 minutes ago   Running   etcd          0

So they keep crashing:

# oc logs master-api-dc-ocp-m0.cloud.lab.eng.bos.redhat.com -n kube-system | tail
I1206 21:48:41.154781       1 plugins.go:84] Registered admission plugin "PodTolerationRestriction"
I1206 21:48:41.154794       1 plugins.go:84] Registered admission plugin "ResourceQuota"
I1206 21:48:41.154807       1 plugins.go:84] Registered admission plugin "PodSecurityPolicy"
I1206 21:48:41.154819       1 plugins.go:84] Registered admission plugin "Priority"
I1206 21:48:41.154842       1 plugins.go:84] Registered admission plugin "SecurityContextDeny"
I1206 21:48:41.154859       1 plugins.go:84] Registered admission plugin "ServiceAccount"
I1206 21:48:41.154871       1 plugins.go:84] Registered admission plugin "DefaultStorageClass"
I1206 21:48:41.154885       1 plugins.go:84] Registered admission plugin "PersistentVolumeClaimResize"
I1206 21:48:41.154896       1 plugins.go:84] Registered admission plugin "StorageObjectInUseProtection"
F1206 21:48:41.155462       1 start_api.go:68] failed to create listener: failed to listen on 0.0.0.0:8443: listen tcp4 0.0.0.0:8443: bind: address already in use

Expected results:
- OCP/the kubelet should not be attempting to talk to docker in a cri-o environment.
- All nodes should be using the cri-o runtime.
I think this is a side effect of the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1647516. QE also opened a doc bug, https://bugzilla.redhat.com/show_bug.cgi?id=1656359, to request a documentation update.
More information might be needed, but taking a guess:

1. With OpenShift using cri-o, docker still gets installed (or should be). Docker is not used as the runtime; it is only used if container builds need to happen on that node.

2. If oc get node -o wide shows the runtime as docker, the node is likely not using a node-config that has cri-o configured as the runtime.
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node_group/templates/node-config.yaml.j2#L22-L31

3. In OpenShift, nodes now get their node-config.yaml from a configmap based on the node group they belong to. A label on the node sets the group.

# oc get nodes --show-labels

To see the config a node will use, run the following:

# oc get cm -n openshift-node <GROUP_NAME> -o yaml

Most likely the 1st master belongs to the wrong node group and is using a node-config that does not have cri-o set as the runtime, so it falls back to docker, which is installed.

dc-ocp-m0.cloud.lab.eng.bos.redhat.com   Ready   master   26m   v1.11.0+d4cacc0   10.19.138.166   <none>   OpenShift   3.10.0-957.1.3.el7.x86_64   docker://1.13.1

Next Steps:
- Confirm the node group that this host belongs to.
- Make sure that node group has its config set to use cri-o. If not, change the group this node belongs to by changing the label.
- Locally on the node, check both configs, /etc/origin/node/{bootstrap-,}node-config.yaml, and make sure they have cri-o configured (see the verification sketch at the end of this comment).
- /etc/origin/node/node-config.yaml will get replaced by the node-sync pod based on the configmap that is linked to the node group this host belongs to.
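A minimal verification sketch, assuming cluster-admin access on a master; the node and group names are the ones from this bug, and the grep pattern simply looks for the cri-o kubelet arguments (container-runtime / container-runtime-endpoint).

From a master, see which group the node is labelled with, then check whether that group's configmap carries the cri-o runtime settings:

# oc get node dc-ocp-m0.cloud.lab.eng.bos.redhat.com --show-labels
# oc get cm -n openshift-node node-config-master -o yaml | grep -A1 'container-runtime'

On the node itself, both configs should point the kubelet at the remote runtime and the crio socket:

# grep -A1 'container-runtime' /etc/origin/node/bootstrap-node-config.yaml /etc/origin/node/node-config.yaml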
Here's some more output....

# oc get nodes --show-labels
NAME                                     STATUS     ROLES                    AGE   VERSION           LABELS
dc-ocp-m0.cloud.lab.eng.bos.redhat.com   Ready      master                   39m   v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=dc-ocp-m0.cloud.lab.eng.bos.redhat.com,node-role.kubernetes.io/master=true
dc-ocp-m1.cloud.lab.eng.bos.redhat.com   NotReady   master                   39m   v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=dc-ocp-m1.cloud.lab.eng.bos.redhat.com,node-role.kubernetes.io/master=true
dc-ocp-m2.cloud.lab.eng.bos.redhat.com   NotReady   master                   39m   v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=dc-ocp-m2.cloud.lab.eng.bos.redhat.com,node-role.kubernetes.io/master=true
dc-ocp-n0.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   34m   v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=dc-ocp-n0.cloud.lab.eng.bos.redhat.com,node-role.kubernetes.io/compute=true,node-role.kubernetes.io/infra=true,node-role.kubernetes.io/kubevirt=true
dc-ocp-n1.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   34m   v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=dc-ocp-n1.cloud.lab.eng.bos.redhat.com,node-role.kubernetes.io/compute=true,node-role.kubernetes.io/infra=true,node-role.kubernetes.io/kubevirt=true
dc-ocp-n2.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   34m   v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=dc-ocp-n2.cloud.lab.eng.bos.redhat.com,node-role.kubernetes.io/compute=true,node-role.kubernetes.io/infra=true,node-role.kubernetes.io/kubevirt=true
dc-ocp-n3.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   34m   v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=dc-ocp-n3.cloud.lab.eng.bos.redhat.com,node-role.kubernetes.io/compute=true,node-role.kubernetes.io/infra=true,node-role.kubernetes.io/kubevirt=true
dc-ocp-n4.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   34m   v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=dc-ocp-n4.cloud.lab.eng.bos.redhat.com,node-role.kubernetes.io/compute=true,node-role.kubernetes.io/infra=true,node-role.kubernetes.io/kubevirt=true

# oc get cm -n openshift-node node-config-master -o yaml
apiVersion: v1
data:
  node-config.yaml: |
    apiVersion: v1
    authConfig:
      authenticationCacheSize: 1000
      authenticationCacheTTL: 5m
      authorizationCacheSize: 1000
      authorizationCacheTTL: 5m
    dnsBindAddress: 127.0.0.1:53
    dnsDomain: cluster.local
    dnsIP: 0.0.0.0
    dnsNameservers: null
    dnsRecursiveResolvConf: /etc/origin/node/resolv.conf
    dockerConfig:
      dockerShimRootDirectory: /var/lib/dockershim
      dockerShimSocket: /var/run/dockershim.sock
      execHandlerName: native
    enableUnidling: true
    imageConfig:
      format: registry.redhat.io/openshift3/ose-${component}:${version}
      latest: false
    iptablesSyncPeriod: 30s
    kind: NodeConfig
    kubeletArguments:
      bootstrap-kubeconfig:
      - /etc/origin/node/bootstrap.kubeconfig
      cert-dir:
      - /etc/origin/node/certificates
      enable-controller-attach-detach:
      - 'true'
      feature-gates:
      - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
      node-labels:
      - node-role.kubernetes.io/master=true
      pod-manifest-path:
      - /etc/origin/node/pods
      rotate-certificates:
      - 'true'
    masterClientConnectionOverrides:
      acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
      burst: 40
      contentType: application/vnd.kubernetes.protobuf
      qps: 20
    masterKubeConfig: node.kubeconfig
    networkConfig:
      mtu: 1450
      networkPluginName: redhat/openshift-ovs-subnet
    proxyArguments:
      cluster-cidr:
      - 10.128.0.0/14
    servingInfo:
      bindAddress: 0.0.0.0:10250
      bindNetwork: tcp4
      clientCA: client-ca.crt
    volumeConfig:
      localQuota:
        perFSGroup: null
    volumeDirectory: /var/lib/origin/openshift.local.volumes
kind: ConfigMap
metadata:
  creationTimestamp: 2018-12-07T19:41:11Z
  name: node-config-master
  namespace: openshift-node
  resourceVersion: "1249"
  selfLink: /api/v1/namespaces/openshift-node/configmaps/node-config-master
  uid: 0d064984-fa58-11e8-ac5d-beeffeed0062

# oc get cm -n openshift-node node-config-infra -o yaml
apiVersion: v1
data:
  node-config.yaml: |
    apiVersion: v1
    authConfig:
      authenticationCacheSize: 1000
      authenticationCacheTTL: 5m
      authorizationCacheSize: 1000
      authorizationCacheTTL: 5m
    dnsBindAddress: 127.0.0.1:53
    dnsDomain: cluster.local
    dnsIP: 0.0.0.0
    dnsNameservers: null
    dnsRecursiveResolvConf: /etc/origin/node/resolv.conf
    dockerConfig:
      dockerShimRootDirectory: /var/lib/dockershim
      dockerShimSocket: /var/run/dockershim.sock
      execHandlerName: native
    enableUnidling: true
    imageConfig:
      format: registry.redhat.io/openshift3/ose-${component}:${version}
      latest: false
    iptablesSyncPeriod: 30s
    kind: NodeConfig
    kubeletArguments:
      bootstrap-kubeconfig:
      - /etc/origin/node/bootstrap.kubeconfig
      cert-dir:
      - /etc/origin/node/certificates
      enable-controller-attach-detach:
      - 'true'
      feature-gates:
      - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
      node-labels:
      - node-role.kubernetes.io/compute=true,node-role.kubernetes.io/infra=true
      pod-manifest-path:
      - /etc/origin/node/pods
      rotate-certificates:
      - 'true'
    masterClientConnectionOverrides:
      acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
      burst: 40
      contentType: application/vnd.kubernetes.protobuf
      qps: 20
    masterKubeConfig: node.kubeconfig
    networkConfig:
      mtu: 1450
      networkPluginName: redhat/openshift-ovs-subnet
    proxyArguments:
      cluster-cidr:
      - 10.128.0.0/14
    servingInfo:
      bindAddress: 0.0.0.0:10250
      bindNetwork: tcp4
      clientCA: client-ca.crt
    volumeConfig:
      localQuota:
        perFSGroup: null
    volumeDirectory: /var/lib/origin/openshift.local.volumes
kind: ConfigMap
metadata:
  creationTimestamp: 2018-12-07T19:41:16Z
  name: node-config-infra
  namespace: openshift-node
  resourceVersion: "1259"
  selfLink: /api/v1/namespaces/openshift-node/configmaps/node-config-infra
  uid: 0fe61bd6-fa58-11e8-ac5d-beeffeed0062

# oc get cm -n openshift-node node-config-infra-compute -o yaml
apiVersion: v1
data:
  node-config.yaml: |
    apiVersion: v1
    authConfig:
      authenticationCacheSize: 1000
      authenticationCacheTTL: 5m
      authorizationCacheSize: 1000
      authorizationCacheTTL: 5m
    dnsBindAddress: 127.0.0.1:53
    dnsDomain: cluster.local
    dnsIP: 0.0.0.0
    dnsNameservers: null
    dnsRecursiveResolvConf: /etc/origin/node/resolv.conf
    dockerConfig:
      dockerShimRootDirectory: /var/lib/dockershim
      dockerShimSocket: /var/run/dockershim.sock
      execHandlerName: native
    enableUnidling: true
    imageConfig:
      format: registry.redhat.io/openshift3/ose-${component}:${version}
      latest: false
    iptablesSyncPeriod: 30s
    kind: NodeConfig
    kubeletArguments:
      bootstrap-kubeconfig:
      - /etc/origin/node/bootstrap.kubeconfig
      cert-dir:
      - /etc/origin/node/certificates
      enable-controller-attach-detach:
      - 'true'
      feature-gates:
      - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
      node-labels:
      - node-role.kubernetes.io/compute=true,node-role.kubernetes.io/infra=true,node-role.kubernetes.io/kubevirt=true
      pod-manifest-path:
      - /etc/origin/node/pods
      rotate-certificates:
      - 'true'
    masterClientConnectionOverrides:
      acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
      burst: 40
      contentType: application/vnd.kubernetes.protobuf
      qps: 20
    masterKubeConfig: node.kubeconfig
    networkConfig:
      mtu: 1450
      networkPluginName: redhat/openshift-ovs-subnet
    proxyArguments:
      cluster-cidr:
      - 10.128.0.0/14
    servingInfo:
      bindAddress: 0.0.0.0:10250
      bindNetwork: tcp4
      clientCA: client-ca.crt
    volumeConfig:
      localQuota:
        perFSGroup: null
    volumeDirectory: /var/lib/origin/openshift.local.volumes
kind: ConfigMap
metadata:
  creationTimestamp: 2018-12-07T19:41:21Z
  name: node-config-infra-compute
  namespace: openshift-node
  resourceVersion: "1269"
  selfLink: /api/v1/namespaces/openshift-node/configmaps/node-config-infra-compute
  uid: 12b2c79f-fa58-11e8-ac5d-beeffeed0062

Here's the relevant bits from my inventory:

openshift_use_crio=True
openshift_use_crio_only=True
openshift_crio_enable_docker_gc=True
openshift_crio_docker_gc_node_selector={'runtime': 'cri-o'}
openshift_node_groups=[{'name': 'node-config-master', 'labels': ['node-role.kubernetes.io/master=true']}, {'name': 'node-config-infra', 'labels': ['node-role.kubernetes.io/compute=true', 'node-role.kubernetes.io/infra=true']}, {'name': 'node-config-infra-compute', 'labels': ['node-role.kubernetes.io/compute=true', 'node-role.kubernetes.io/infra=true', 'node-role.kubernetes.io/kubevirt=true']}]

<...snip...>

[nodes]
dc-ocp-m0.cloud.lab.eng.bos.redhat.com runtime="cri-o"
dc-ocp-m1.cloud.lab.eng.bos.redhat.com runtime="cri-o"
dc-ocp-m2.cloud.lab.eng.bos.redhat.com runtime="cri-o"
dc-ocp-n0.cloud.lab.eng.bos.redhat.com openshift_node_group_name="node-config-infra-compute" runtime="cri-o"
dc-ocp-n1.cloud.lab.eng.bos.redhat.com openshift_node_group_name="node-config-infra-compute" runtime="cri-o"
dc-ocp-n2.cloud.lab.eng.bos.redhat.com openshift_node_group_name="node-config-infra-compute" runtime="cri-o"
dc-ocp-n3.cloud.lab.eng.bos.redhat.com openshift_node_group_name="node-config-infra-compute" runtime="cri-o"
dc-ocp-n4.cloud.lab.eng.bos.redhat.com openshift_node_group_name="node-config-infra-compute" runtime="cri-o"

It seems like the bootstrap node config is configured for cri-o, but the actual node config isn't:

# grep -ri kubeletArguments /etc/origin/node/* -A10
/etc/origin/node/bootstrap-node-config.yaml:kubeletArguments:
/etc/origin/node/bootstrap-node-config.yaml-  bootstrap-kubeconfig:
/etc/origin/node/bootstrap-node-config.yaml-  - /etc/origin/node/bootstrap.kubeconfig
/etc/origin/node/bootstrap-node-config.yaml-  cert-dir:
/etc/origin/node/bootstrap-node-config.yaml-  - /etc/origin/node/certificates
/etc/origin/node/bootstrap-node-config.yaml-  container-runtime:
/etc/origin/node/bootstrap-node-config.yaml-  - remote
/etc/origin/node/bootstrap-node-config.yaml-  container-runtime-endpoint:
/etc/origin/node/bootstrap-node-config.yaml-  - /var/run/crio/crio.sock
/etc/origin/node/bootstrap-node-config.yaml-  enable-controller-attach-detach:
/etc/origin/node/bootstrap-node-config.yaml-  - 'true'
--
/etc/origin/node/node-config.yaml:kubeletArguments:
/etc/origin/node/node-config.yaml-  bootstrap-kubeconfig:
/etc/origin/node/node-config.yaml-  - /etc/origin/node/bootstrap.kubeconfig
/etc/origin/node/node-config.yaml-  cert-dir:
/etc/origin/node/node-config.yaml-  - /etc/origin/node/certificates
/etc/origin/node/node-config.yaml-  enable-controller-attach-detach:
/etc/origin/node/node-config.yaml-  - 'true'
/etc/origin/node/node-config.yaml-  feature-gates:
/etc/origin/node/node-config.yaml-  - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
/etc/origin/node/node-config.yaml-  node-labels:
/etc/origin/node/node-config.yaml-  - node-role.kubernetes.io/master=true
--
/etc/origin/node/tmp/node-config.yaml:kubeletArguments:
/etc/origin/node/tmp/node-config.yaml-  bootstrap-kubeconfig:
/etc/origin/node/tmp/node-config.yaml-  - /etc/origin/node/bootstrap.kubeconfig
/etc/origin/node/tmp/node-config.yaml-  cert-dir:
/etc/origin/node/tmp/node-config.yaml-  - /etc/origin/node/certificates
/etc/origin/node/tmp/node-config.yaml-  enable-controller-attach-detach:
/etc/origin/node/tmp/node-config.yaml-  - 'true'
/etc/origin/node/tmp/node-config.yaml-  feature-gates:
/etc/origin/node/tmp/node-config.yaml-  - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
/etc/origin/node/tmp/node-config.yaml-  node-labels:
/etc/origin/node/tmp/node-config.yaml-  - node-role.kubernetes.io/master=true

I captured oc get nodes a couple of times during the ansible run to see how things change. master0 flips from cri-o to docker at some point:

# cat oc.get_nodes
Fri Dec 7 19:42:10 UTC 2018
NAME                                     STATUS     ROLES                    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE    KERNEL-VERSION              CONTAINER-RUNTIME
dc-ocp-m0.cloud.lab.eng.bos.redhat.com   Ready      master                   3m    v1.11.0+d4cacc0   10.19.138.166   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-m1.cloud.lab.eng.bos.redhat.com   Ready      master                   3m    v1.11.0+d4cacc0   10.19.138.167   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-m2.cloud.lab.eng.bos.redhat.com   Ready      master                   3m    v1.11.0+d4cacc0   10.19.138.168   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8

Fri Dec 7 19:42:35 UTC 2018
NAME                                     STATUS     ROLES                    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE    KERNEL-VERSION              CONTAINER-RUNTIME
dc-ocp-m0.cloud.lab.eng.bos.redhat.com   NotReady   master                   3m    v1.11.0+d4cacc0   10.19.138.166   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-m1.cloud.lab.eng.bos.redhat.com   NotReady   master                   3m    v1.11.0+d4cacc0   10.19.138.167   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-m2.cloud.lab.eng.bos.redhat.com   NotReady   master                   3m    v1.11.0+d4cacc0   10.19.138.168   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8

Fri Dec 7 19:44:57 UTC 2018
NAME                                     STATUS     ROLES                    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE    KERNEL-VERSION              CONTAINER-RUNTIME
dc-ocp-m0.cloud.lab.eng.bos.redhat.com   Ready      master                   5m    v1.11.0+d4cacc0   10.19.138.166   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   docker://1.13.1
dc-ocp-m1.cloud.lab.eng.bos.redhat.com   NotReady   master                   5m    v1.11.0+d4cacc0   10.19.138.167   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-m2.cloud.lab.eng.bos.redhat.com   NotReady   master                   5m    v1.11.0+d4cacc0   10.19.138.168   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n0.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   24s   v1.11.0+d4cacc0   10.19.138.161   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n1.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   23s   v1.11.0+d4cacc0   10.19.138.162   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n2.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   23s   v1.11.0+d4cacc0   10.19.138.163   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n3.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   23s   v1.11.0+d4cacc0   10.19.138.164   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
dc-ocp-n4.cloud.lab.eng.bos.redhat.com   NotReady   compute,infra,kubevirt   23s   v1.11.0+d4cacc0   10.19.138.165   <none>        OpenShift   3.10.0-957.1.3.el7.x86_64   cri-o://1.11.8
If I omit the openshift_use_crio_only=True setting, then all of the nodes end up running docker, with the master/api pods crashing as before since they were started with cri-o.
I am pretty sure you are hitting the issue I described in comment 1.

@Russell, this issue is caused by https://github.com/openshift/openshift-ansible/pull/10645, which fixed BZ#1647516. Do you think the installer should update a user-customized node config automatically based on the openshift_use_crio setting?
(In reply to Johnny Liu from comment #4)
> I am pretty sure you are hitting the issue I described in comment 1.
>
> @Russell, this issue is caused by
> https://github.com/openshift/openshift-ansible/pull/10645, which fixed
> BZ#1647516. Do you think the installer should update a user-customized node
> config automatically based on the openshift_use_crio setting?

Confirmed. I reverted to the previous commit of the node-config.yaml.j2 template, and the install proceeded as expected.
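For anyone who wants to repeat that check, a rough sketch of the revert; the commands are standard git, and <PRIOR_COMMIT> is a placeholder for whichever commit precedes the PR above in your openshift-ansible checkout, not a real hash.

List the commits that touched the template:

# git log --oneline -- roles/openshift_node_group/templates/node-config.yaml.j2

Restore the template from the earlier commit, then re-run deploy_cluster.yml:

# git checkout <PRIOR_COMMIT> -- roles/openshift_node_group/templates/node-config.yaml.j2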
I think you need to assign the nodes you intend to run cri-o on to one of the cri-o node groups, or, when crafting your own groups, make sure that the relevant edits are applied to the kubelet config so that it uses a remote runtime and the cri-o socket is provided (a sketch of the resulting kubelet arguments follows below).
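For reference, a minimal sketch of what those edits need to produce in the rendered node-config: the same two kubelet arguments that already appear in the bootstrap-node-config.yaml captured earlier in this bug (this shows the rendered result, not the 'edits' syntax itself):

kubeletArguments:
  container-runtime:
  - remote
  container-runtime-endpoint:
  - /var/run/crio/crio.sock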
@Johnny, you are correct. This is a side effect of fixing BZ#1647516. Previously, if a cluster was deployed and the first master had openshift_use_crio=True, ALL configmaps were created with cri-o settings regardless of whether those hosts were supposed to use cri-o.

In order to use cri-o, you must specify a node config which has the cri-o edits. We have default groups available: node-config-master-crio, node-config-infra-crio, node-config-compute-crio, node-config-master-infra-crio, node-config-all-in-one-crio.

https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_facts/defaults/main.yml#L144-L209

This is a docs issue, as we've required proper use of node configs for a while; the bug mentioned above just allowed a loophole for not using the right node config. Please assign cri-o node configs to your hosts and redeploy to confirm this fixes your issue.
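As an illustration of assigning hosts to the predefined cri-o groups, the masters in the [nodes] section of your inventory would look roughly like this (the group name is one of the defaults linked above; nodes that need extra labels, such as your kubevirt nodes, need custom groups that carry the cri-o edits instead):

[nodes]
dc-ocp-m0.cloud.lab.eng.bos.redhat.com openshift_node_group_name="node-config-master-crio" runtime="cri-o"
dc-ocp-m1.cloud.lab.eng.bos.redhat.com openshift_node_group_name="node-config-master-crio" runtime="cri-o"
dc-ocp-m2.cloud.lab.eng.bos.redhat.com openshift_node_group_name="node-config-master-crio" runtime="cri-o"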
I can confirm that fixes my issue. I updated my inventory so the openshift_node_groups look like this:

openshift_node_groups=[{'name': 'node-config-master', 'labels': ['node-role.kubernetes.io/master=true'], 'edits': '{{ openshift_node_group_edits_crio }}'}, {'name': 'node-config-infra', 'labels': ['node-role.kubernetes.io/compute=true', 'node-role.kubernetes.io/infra=true'], 'edits': '{{ openshift_node_group_edits_crio }}'}, {'name': 'node-config-infra-compute', 'labels': ['node-role.kubernetes.io/compute=true', 'node-role.kubernetes.io/infra=true', 'node-role.kubernetes.io/kubevirt=true'], 'edits': '{{ openshift_node_group_edits_crio }}'}]

And the deploy works from the latest git release-3.11 without modifications. Thanks!