+++ This bug was initially created as a clone of Bug #1647516 +++

Description of problem:

Using the openstack playbooks I'm trying to have an environment such as:

* Masters with cri-o only
* App nodes with cri-o only
* Infra nodes with docker only

So I have:

$ grep osm_use_cockpit all.yml
osm_use_cockpit: false

$ cat inventory/group_vars/masters.yml
openshift_use_crio_only: true
openshift_use_crio: true
openshift_node_group_name: node-config-master-crio

$ cat inventory/group_vars/openstack_compute_nodes.yml
openshift_use_crio_only: true
openshift_use_crio: true
openshift_node_group_name: node-config-compute-crio

$ cat inventory/group_vars/openstack_infra_nodes.yml
openshift_use_crio_only: false
openshift_use_crio: false
openshift_node_group_name: node-config-infra

Also, I need to 'patch' roles/openshift_node/defaults/main.yml until this is released:
https://github.com/openshift/openshift-ansible/pull/10501/files

The installer skips the docker installation on cri-o nodes, but when creating the node-config.yaml file for the infra nodes it edits node-config-infra to add the cri-o settings even though it shouldn't.
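For context, the intended per-group resolution can be sketched in a few lines of Python — a toy model of the inventory above. The group and variable names come from the inventory; the resolve() helper is a hypothetical illustration, not Ansible's actual variable-precedence machinery:

```python
# Toy model of the inventory above: each host should see the variables
# of its own group, so infra nodes end up docker-only while masters and
# compute nodes end up cri-o-only.
group_vars = {
    "masters": {
        "openshift_use_crio_only": True,
        "openshift_use_crio": True,
        "openshift_node_group_name": "node-config-master-crio",
    },
    "openstack_compute_nodes": {
        "openshift_use_crio_only": True,
        "openshift_use_crio": True,
        "openshift_node_group_name": "node-config-compute-crio",
    },
    "openstack_infra_nodes": {
        "openshift_use_crio_only": False,
        "openshift_use_crio": False,
        "openshift_node_group_name": "node-config-infra",
    },
}

# Hypothetical host-to-group mapping for illustration only.
host_groups = {
    "master-0": "masters",
    "app-node-0": "openstack_compute_nodes",
    "infra-node-0": "openstack_infra_nodes",
}

def resolve(host):
    """Return the variables a host should see after its group_vars apply."""
    return dict(group_vars[host_groups[host]])

for host, group in host_groups.items():
    v = resolve(host)
    runtime = "crio" if v["openshift_use_crio"] else "docker"
    print(host, v["openshift_node_group_name"], runtime)
```

The bug reported below is that this per-group separation is not honored when the node-config-infra configmap is generated.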
Logs:

----8<----
2018-11-07 06:48:33,684 p=18330 u=cloud-user | TASK [openshift_node_group : fetch node configmap] *****************************
2018-11-07 06:48:33,684 p=18330 u=cloud-user | Wednesday 07 November 2018 06:48:33 -0500 (0:00:00.071) 1:14:20.993 ****
2018-11-07 06:48:35,090 p=18330 u=cloud-user | ok: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:35,111 p=18330 u=cloud-user | TASK [openshift_node_group : debug node config] ********************************
2018-11-07 06:48:35,111 p=18330 u=cloud-user | Wednesday 07 November 2018 06:48:35 -0500 (0:00:01.427) 1:14:22.421 ****
2018-11-07 06:48:35,155 p=18330 u=cloud-user | ok: [master-0.shiftstack.automated.lan] => {
    "configout": {
        "changed": false,
        "failed": false,
        "results": {
            "cmd": "/bin/oc get configmap node-config-infra -o json -n openshift-node",
            "results": [
                {}
            ],
            "returncode": 0,
            "stderr": "Error from server (NotFound): configmaps \"node-config-infra\" not found\n",
            "stdout": ""
        },
        "state": "list"
    }
}
2018-11-07 06:48:35,177 p=18330 u=cloud-user | TASK [openshift_node_group : create a temp dir for this work] ******************
2018-11-07 06:48:35,177 p=18330 u=cloud-user | Wednesday 07 November 2018 06:48:35 -0500 (0:00:00.065) 1:14:22.487 ****
2018-11-07 06:48:35,875 p=18330 u=cloud-user | changed: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:35,899 p=18330 u=cloud-user | TASK [openshift_node_group : create node config template] **********************
2018-11-07 06:48:35,899 p=18330 u=cloud-user | Wednesday 07 November 2018 06:48:35 -0500 (0:00:00.721) 1:14:23.209 ****
2018-11-07 06:48:38,791 p=18330 u=cloud-user | changed: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:38,815 p=18330 u=cloud-user | TASK [openshift_node_group : lay down the config from the existing configmap] ***
2018-11-07 06:48:38,816 p=18330 u=cloud-user | Wednesday 07 November 2018 06:48:38 -0500 (0:00:02.916) 1:14:26.125 ****
2018-11-07 06:48:38,836 p=18330 u=cloud-user | skipping: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:38,859 p=18330 u=cloud-user | TASK [openshift_node_group : specialize the generated configs for node-config-infra] ***
2018-11-07 06:48:38,859 p=18330 u=cloud-user | Wednesday 07 November 2018 06:48:38 -0500 (0:00:00.043) 1:14:26.169 ****
2018-11-07 06:48:39,712 p=18330 u=cloud-user | changed: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:39,737 p=18330 u=cloud-user | TASK [openshift_node_group : show the yeditout debug var] **********************
2018-11-07 06:48:39,737 p=18330 u=cloud-user | Wednesday 07 November 2018 06:48:39 -0500 (0:00:00.877) 1:14:27.047 ****
2018-11-07 06:48:39,786 p=18330 u=cloud-user | ok: [master-0.shiftstack.automated.lan] => {
    "yeditout": {
        "changed": true,
        "failed": false,
        "result": [
            {
                "edit": {
                    "apiVersion": "v1",
                    "authConfig": {
                        "authenticationCacheSize": 1000,
                        "authenticationCacheTTL": "5m",
                        "authorizationCacheSize": 1000,
                        "authorizationCacheTTL": "5m"
                    },
                    "dnsBindAddress": "127.0.0.1:53",
                    "dnsDomain": "cluster.local",
                    "dnsIP": "0.0.0.0",
                    "dnsNameservers": null,
                    "dnsRecursiveResolvConf": "/etc/origin/node/resolv.conf",
                    "dockerConfig": {
                        "dockerShimRootDirectory": "/var/lib/dockershim",
                        "dockerShimSocket": "/var/run/dockershim.sock",
                        "execHandlerName": "native"
                    },
                    "enableUnidling": true,
                    "imageConfig": {
                        "format": "registry.redhat.io/openshift3/ose-${component}:${version}",
                        "latest": false
                    },
                    "iptablesSyncPeriod": "30s",
                    "kind": "NodeConfig",
                    "kubeletArguments": {
                        "bootstrap-kubeconfig": ["/etc/origin/node/bootstrap.kubeconfig"],
                        "cert-dir": ["/etc/origin/node/certificates"],
                        "cloud-config": ["/etc/origin/cloudprovider/openstack.conf"],
                        "cloud-provider": ["openstack"],
                        "container-runtime": ["remote"],
                        "container-runtime-endpoint": ["/var/run/crio/crio.sock"],
                        "enable-controller-attach-detach": ["true"],
                        "feature-gates": ["RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true"],
                        "image-service-endpoint": ["/var/run/crio/crio.sock"],
                        "node-labels": ["node-role.kubernetes.io/infra=true"],
                        "pod-manifest-path": ["/etc/origin/node/pods"],
                        "rotate-certificates": ["true"],
                        "runtime-request-timeout": ["10m"]
                    },
                    "masterClientConnectionOverrides": {
                        "acceptContentTypes": "application/vnd.kubernetes.protobuf,application/json",
                        "burst": 40,
                        "contentType": "application/vnd.kubernetes.protobuf",
                        "qps": 20
                    },
                    "masterKubeConfig": "node.kubeconfig",
                    "networkConfig": {
                        "mtu": 1450,
                        "networkPluginName": "redhat/openshift-ovs-subnet"
                    },
                    "servingInfo": {
                        "bindAddress": "0.0.0.0:10250",
                        "bindNetwork": "tcp4",
                        "clientCA": "client-ca.crt"
                    },
                    "volumeConfig": {
                        "localQuota": {
                            "perFSGroup": null
                        }
                    },
                    "volumeDirectory": "/var/lib/origin/openshift.local.volumes"
                },
                "key": "kubeletArguments.node-labels"
            }
        ],
        "state": "present"
    }
}
---->8----

Version-Release number of the following components:

$ rpm -q openshift-ansible
openshift-ansible-3.11.16-1.git.0.4ac6f81.el7.noarch

$ rpm -q ansible
ansible-2.5.7-1.el7ae.noarch

$ ansible --version
ansible 2.5.7
  config file = /home/cloud-user/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible:
See above.

Steps to Reproduce:
1.
2.
3.

Actual results:
See above.

Expected results:
The node-config-infra configmap is not modified.

Additional info:
BOOTSTRAP_CONFIG_NAME in /etc/sysconfig/atomic-openshift-node is properly set to 'node-config-infra'; the issue is that the node-config is modified when it shouldn't be.

--- Additional comment from Eduardo Minguez on 2018-11-08 04:07:40 EST ---

I've tested the same scenario setting per-host variables instead of per-group variables, with the same result.
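The expected result can be checked mechanically. Below is a hypothetical Python helper (not part of openshift-ansible or any QE tooling) that inspects a parsed node-config, like the yeditout result in the logs above, and reports whether it carries cri-o runtime arguments; a docker-only node-config-infra should report False, while the buggy output above reports True:

```python
# kubelet arguments that indicate a cri-o (remote runtime) node-config;
# taken from the yeditout debug output above.
CRIO_KEYS = ("container-runtime", "container-runtime-endpoint",
             "image-service-endpoint")

def has_crio_settings(node_config):
    """Return True if the parsed node-config carries cri-o runtime args."""
    args = node_config.get("kubeletArguments", {})
    return any(key in args for key in CRIO_KEYS)

# Trimmed-down version of the node-config-infra produced by the buggy run.
infra_config = {
    "kind": "NodeConfig",
    "kubeletArguments": {
        "container-runtime": ["remote"],
        "container-runtime-endpoint": ["/var/run/crio/crio.sock"],
        "node-labels": ["node-role.kubernetes.io/infra=true"],
    },
}

print(has_crio_settings(infra_config))  # True: the bug; expected False
```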
* Masters with cri-o only
* App nodes with cri-o only
* Infra nodes with docker only

$ grep osm_use_cockpit all.yml
osm_use_cockpit: false

$ cat inventory/group_vars/masters.yml
openshift_use_crio_only: true
openshift_use_crio: true
openshift_node_group_name: node-config-master-crio

$ cat inventory/group_vars/openstack_compute_nodes.yml
openshift_use_crio_only: true
openshift_use_crio: true
openshift_node_group_name: node-config-compute-crio

$ cat inventory/host_vars/infra-node-0.shiftstack.automated.lan.yml
openshift_use_crio_only: false
openshift_use_crio: false
openshift_node_group_name: node-config-infra

$ cat inventory/host_vars/infra-node-1.shiftstack.automated.lan.yml
openshift_use_crio_only: false
openshift_use_crio: false
openshift_node_group_name: node-config-infra

$ cat inventory/host_vars/infra-node-2.shiftstack.automated.lan.yml
openshift_use_crio_only: false
openshift_use_crio: false
openshift_node_group_name: node-config-infra

--- Additional comment from Eduardo Minguez on 2018-11-08 04:36:30 EST ---

I apologize, as I copy/pasted the wrong ansible version. These are the correct values:

$ rpm -q openshift-ansible
openshift-ansible-3.11.16-1.git.0.4ac6f81.el7.noarch

$ rpm -q ansible
ansible-2.6.7-1.el7ae.noarch

$ ansible --version
ansible 2.6.7
  config file = /home/cloud-user/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

--- Additional comment from Russell Teague on 2018-11-08 08:12:46 EST ---

I've been looking into this and found there is a possible issue with the oc_configmap module. When it attempts to retrieve the configmaps, it fails, and then the default configmap template from the openshift_node_group role is used.
However, the template is processed as if it were on the master, which is set to use cri-o, and therefore it ends up including the cri-o settings. Looking into a fix.

--- Additional comment from Russell Teague on 2018-11-16 09:43:12 EST ---

WIP Proposed: https://github.com/openshift/openshift-ansible/pull/10645

--- Additional comment from Johnny Liu on 2018-11-19 04:33:47 EST ---

Reproduced this bug with openshift-ansible-3.11.44-1.git.0.11d174e.el7.noarch.

[nodes]
master-node openshift_use_crio=True openshift_use_crio_only=True openshift_node_group_name='node-config-master-crio'
infra-node openshift_node_group_name='node-config-infra'
pure-crio-node openshift_use_crio=True openshift_use_crio_only=True openshift_node_group_name='node-config-compute-crio'

[root@qe-jialiu311-mep-1 ~]# oc get node
NAME                                    STATUS     ROLES     AGE   VERSION
qe-jialiu311-mep-1                      Ready      master    26m   v1.11.0+d4cacc0
qe-jialiu311-node-infra-1               NotReady   infra     23m   v1.11.0+d4cacc0
qe-jialiu311-node-pure-crio-runtime-1   Ready      compute   23m   v1.11.0+d4cacc0

[root@qe-jialiu311-node-infra-1 ~]# journalctl -f -u atomic-openshift-node.service
-- Logs begin at Mon 2018-11-19 01:00:43 EST. --
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: I1119 01:42:00.919032 46639 kubelet.go:299] Watching apiserver
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: W1119 01:42:00.925647 46639 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: W1119 01:42:00.925718 46639 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: E1119 01:42:00.926073 46639 remote_runtime.go:69] Version from runtime service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: E1119 01:42:00.926147 46639 kuberuntime_manager.go:172] Get runtime version failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: F1119 01:42:00.926160 46639 server.go:262] failed to run Kubelet: failed to create kubelet: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Nov 19 01:42:00 qe-jialiu311-node-infra-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Nov 19 01:42:00 qe-jialiu311-node-infra-1 systemd[1]: Failed to start OpenShift Node.
Nov 19 01:42:00 qe-jialiu311-node-infra-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Nov 19 01:42:00 qe-jialiu311-node-infra-1 systemd[1]: atomic-openshift-node.service failed.

The docker-only infra node stays NotReady because its kubelet is trying to reach the cri-o socket, which is not running there. Added one more TC to cover it.

--- Additional comment from Russell Teague on 2018-11-19 10:42:37 EST ---

Waiting for build:

$ git tag --contains 63e84e757e781a19da8b8cdac151c78922ae4ebc
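The diagnosis from comment 3 can be illustrated with a small sketch: when the configmap lookup fails, the fallback template is effectively rendered with the variables of the host running the task (the master, which uses cri-o) instead of the variables matching the target node group. The function and variable names below are illustrative assumptions, not openshift-ansible code:

```python
# Sketch of the root cause: which host's variables the fallback
# template sees determines whether cri-o arguments leak into the
# generated node-config. render_node_config() is hypothetical.
def render_node_config(node_group_name, rendering_host_vars):
    config = {"kind": "NodeConfig", "kubeletArguments": {}}
    if rendering_host_vars.get("openshift_use_crio"):
        # cri-o settings, mirroring the yeditout output in the logs
        config["kubeletArguments"]["container-runtime"] = ["remote"]
        config["kubeletArguments"]["container-runtime-endpoint"] = \
            ["/var/run/crio/crio.sock"]
    return config

master_vars = {"openshift_use_crio": True}   # the task delegates to the master
infra_vars = {"openshift_use_crio": False}   # what node-config-infra should use

# Buggy behavior: rendered with the master's vars, so cri-o leaks in.
buggy = render_node_config("node-config-infra", master_vars)
# Intended behavior: rendered with the infra group's vars, docker-only.
fixed = render_node_config("node-config-infra", infra_vars)

print("container-runtime" in buggy["kubeletArguments"])  # True (the bug)
print("container-runtime" in fixed["kubeletArguments"])  # False (expected)
```

This matches the journalctl output above: the infra node's kubelet was started with `container-runtime-endpoint` pointing at `/var/run/crio/crio.sock`, so it fails on a node that only runs docker.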
https://github.com/openshift/openshift-ansible/pull/10743
Fixed.

openshift-ansible-3.10.83-1.git.0.12699eb.el7.noarch

The node-configs are correct for both cri-o container-runtime nodes and docker container-runtime nodes.

Kernel Version: 3.10.0-957.1.3.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3750