Bug 1652191 - [3.10] Setting openshift_use_crio=false per group is ignored and node-config is modified
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.10.z
Assignee: Russell Teague
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On: 1647516
Blocks:
Reported: 2018-11-21 16:23 UTC by Russell Teague
Modified: 2018-12-13 17:09 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Node configmaps are all created by running create tasks on the first master.
Consequence: If the first master has openshift_use_crio=True, all configmaps are created with crio settings, because the node-config template includes crio settings based on that host var.
Fix: The crio settings have been removed from the node-config template so that crio settings are only added if they are part of the openshift_node_group edits. Additionally, the bootstrap-node-config is updated directly if the host has openshift_use_crio=True.
Result: Node configmaps are generated correctly based on openshift_node_group edits, allowing nodes to be properly configured with crio settings.
Clone Of: 1647516
Clones: 1656359
Environment:
Last Closed: 2018-12-13 17:09:08 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:3750 None None None 2018-12-13 17:09:16 UTC

Description Russell Teague 2018-11-21 16:23:40 UTC
+++ This bug was initially created as a clone of Bug #1647516 +++

Description of problem:

Using openstack playbooks I'm trying to have an environment such as:

* Masters with cri-o only
* App nodes with cri-o only
* Infra nodes with docker only

So I have:

$ grep osm_use_cockpit all.yml 
osm_use_cockpit: false

$ cat inventory/group_vars/masters.yml 
openshift_use_crio_only: true
openshift_use_crio: true
openshift_node_group_name: node-config-master-crio

$ cat inventory/group_vars/openstack_compute_nodes.yml 
openshift_use_crio_only: true
openshift_use_crio: true
openshift_node_group_name: node-config-compute-crio

$ cat inventory/group_vars/openstack_infra_nodes.yml 
openshift_use_crio_only: false
openshift_use_crio: false
openshift_node_group_name: node-config-infra

Also, I need to patch roles/openshift_node/defaults/main.yml until this change is released: https://github.com/openshift/openshift-ansible/pull/10501/files

The installer skips the docker installation on cri-o nodes, but when creating the node-config.yaml file for infra nodes it edits the node-config-infra configmap to add the cri-o settings even though it shouldn't. Logs:


----8<----
2018-11-07 06:48:33,684 p=18330 u=cloud-user |  TASK [openshift_node_group : fetch node configmap] *****************************
2018-11-07 06:48:33,684 p=18330 u=cloud-user |  Wednesday 07 November 2018  06:48:33 -0500 (0:00:00.071)       1:14:20.993 **** 
2018-11-07 06:48:35,090 p=18330 u=cloud-user |  ok: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:35,111 p=18330 u=cloud-user |  TASK [openshift_node_group : debug node config] ********************************
2018-11-07 06:48:35,111 p=18330 u=cloud-user |  Wednesday 07 November 2018  06:48:35 -0500 (0:00:01.427)       1:14:22.421 **** 
2018-11-07 06:48:35,155 p=18330 u=cloud-user |  ok: [master-0.shiftstack.automated.lan] => {
    "configout": {
        "changed": false, 
        "failed": false, 
        "results": {
            "cmd": "/bin/oc get configmap node-config-infra -o json -n openshift-node", 
            "results": [
                {}
            ], 
            "returncode": 0, 
            "stderr": "Error from server (NotFound): configmaps \"node-config-infra\" not found\n", 
            "stdout": ""
        }, 
        "state": "list"
    }
}
2018-11-07 06:48:35,177 p=18330 u=cloud-user |  TASK [openshift_node_group : create a temp dir for this work] ******************
2018-11-07 06:48:35,177 p=18330 u=cloud-user |  Wednesday 07 November 2018  06:48:35 -0500 (0:00:00.065)       1:14:22.487 **** 
2018-11-07 06:48:35,875 p=18330 u=cloud-user |  changed: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:35,899 p=18330 u=cloud-user |  TASK [openshift_node_group : create node config template] **********************
2018-11-07 06:48:35,899 p=18330 u=cloud-user |  Wednesday 07 November 2018  06:48:35 -0500 (0:00:00.721)       1:14:23.209 **** 
2018-11-07 06:48:38,791 p=18330 u=cloud-user |  changed: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:38,815 p=18330 u=cloud-user |  TASK [openshift_node_group : lay down the config from the existing configmap] ***
2018-11-07 06:48:38,816 p=18330 u=cloud-user |  Wednesday 07 November 2018  06:48:38 -0500 (0:00:02.916)       1:14:26.125 **** 
2018-11-07 06:48:38,836 p=18330 u=cloud-user |  skipping: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:38,859 p=18330 u=cloud-user |  TASK [openshift_node_group : specialize the generated configs for node-config-infra] ***
2018-11-07 06:48:38,859 p=18330 u=cloud-user |  Wednesday 07 November 2018  06:48:38 -0500 (0:00:00.043)       1:14:26.169 **** 
2018-11-07 06:48:39,712 p=18330 u=cloud-user |  changed: [master-0.shiftstack.automated.lan]
2018-11-07 06:48:39,737 p=18330 u=cloud-user |  TASK [openshift_node_group : show the yeditout debug var] **********************
2018-11-07 06:48:39,737 p=18330 u=cloud-user |  Wednesday 07 November 2018  06:48:39 -0500 (0:00:00.877)       1:14:27.047 **** 
2018-11-07 06:48:39,786 p=18330 u=cloud-user |  ok: [master-0.shiftstack.automated.lan] => {
    "yeditout": {
        "changed": true, 
        "failed": false, 
        "result": [
            {
                "edit": {
                    "apiVersion": "v1", 
                    "authConfig": {
                        "authenticationCacheSize": 1000, 
                        "authenticationCacheTTL": "5m", 
                        "authorizationCacheSize": 1000, 
                        "authorizationCacheTTL": "5m"
                    }, 
                    "dnsBindAddress": "127.0.0.1:53", 
                    "dnsDomain": "cluster.local", 
                    "dnsIP": "0.0.0.0", 
                    "dnsNameservers": null, 
                    "dnsRecursiveResolvConf": "/etc/origin/node/resolv.conf", 
                    "dockerConfig": {
                        "dockerShimRootDirectory": "/var/lib/dockershim", 
                        "dockerShimSocket": "/var/run/dockershim.sock", 
                        "execHandlerName": "native"
                    }, 
                    "enableUnidling": true, 
                    "imageConfig": {
                        "format": "registry.redhat.io/openshift3/ose-${component}:${version}", 
                        "latest": false
                    }, 
                    "iptablesSyncPeriod": "30s", 
                    "kind": "NodeConfig", 
                    "kubeletArguments": {
                        "bootstrap-kubeconfig": [
                            "/etc/origin/node/bootstrap.kubeconfig"
                        ], 
                        "cert-dir": [
                            "/etc/origin/node/certificates"
                        ], 
                        "cloud-config": [
                            "/etc/origin/cloudprovider/openstack.conf"
                        ], 
                        "cloud-provider": [
                            "openstack"
                        ], 
                        "container-runtime": [
                            "remote"
                        ], 
                        "container-runtime-endpoint": [
                            "/var/run/crio/crio.sock"
                        ], 
                        "enable-controller-attach-detach": [
                            "true"
                        ], 
                        "feature-gates": [
                            "RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true"
                        ], 
                        "image-service-endpoint": [
                            "/var/run/crio/crio.sock"
                        ], 
                        "node-labels": [
                            "node-role.kubernetes.io/infra=true"
                        ], 
                        "pod-manifest-path": [
                            "/etc/origin/node/pods"
                        ], 
                        "rotate-certificates": [
                            "true"
                        ], 
                        "runtime-request-timeout": [
                            "10m"
                        ]
                    }, 
                    "masterClientConnectionOverrides": {
                        "acceptContentTypes": "application/vnd.kubernetes.protobuf,application/json", 
                        "burst": 40, 
                        "contentType": "application/vnd.kubernetes.protobuf", 
                        "qps": 20
                    }, 
                    "masterKubeConfig": "node.kubeconfig", 
                    "networkConfig": {
                        "mtu": 1450, 
                        "networkPluginName": "redhat/openshift-ovs-subnet"
                    }, 
                    "servingInfo": {
                        "bindAddress": "0.0.0.0:10250", 
                        "bindNetwork": "tcp4", 
                        "clientCA": "client-ca.crt"
                    }, 
                    "volumeConfig": {
                        "localQuota": {
                            "perFSGroup": null
                        }
                    }, 
                    "volumeDirectory": "/var/lib/origin/openshift.local.volumes"
                }, 
                "key": "kubeletArguments.node-labels"
            }
        ], 
        "state": "present"
    }
}
---->8----
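The cri-o entries in the log output above can be spotted programmatically. The following is a minimal sketch, not part of the installer: the function name crio_kubelet_arguments is hypothetical, and the sample dict reproduces only the relevant keys from the "edit" section of the log.

```python
# Sketch only: report which kubeletArguments point the kubelet at cri-o.
# The socket path is the one shown in the log above.
CRIO_SOCKET = "/var/run/crio/crio.sock"


def crio_kubelet_arguments(node_config):
    """Return the kubeletArguments entries that select the cri-o runtime."""
    args = node_config.get("kubeletArguments", {})
    found = {}
    # "remote" tells the kubelet to use a CRI socket instead of docker.
    if args.get("container-runtime") == ["remote"]:
        found["container-runtime"] = args["container-runtime"]
    for key in ("container-runtime-endpoint", "image-service-endpoint"):
        values = args.get(key, [])
        # endswith() also matches the unix:// URL form of the same socket.
        if any(value.endswith(CRIO_SOCKET) for value in values):
            found[key] = values
    return found


# Relevant subset of the node-config-infra "edit" from the log above.
node_config_infra = {
    "kubeletArguments": {
        "container-runtime": ["remote"],
        "container-runtime-endpoint": ["/var/run/crio/crio.sock"],
        "image-service-endpoint": ["/var/run/crio/crio.sock"],
        "node-labels": ["node-role.kubernetes.io/infra=true"],
    }
}

print(sorted(crio_kubelet_arguments(node_config_infra)))
```

For a docker-only group such as node-config-infra the expected outcome is an empty result; here all three cri-o arguments are reported.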



Version-Release number of the following components:
$ rpm -q openshift-ansible
openshift-ansible-3.11.16-1.git.0.4ac6f81.el7.noarch

$ rpm -q ansible
ansible-2.5.7-1.el7ae.noarch

$ ansible --version
ansible 2.5.7
  config file = /home/cloud-user/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible:
See above.

Steps to Reproduce:
1.
2.
3.

Actual results:
See above.

Expected results:
node-config-infra configmap is not modified

Additional info:
BOOTSTRAP_CONFIG_NAME in /etc/sysconfig/atomic-openshift-node is properly set to 'node-config-infra'. The issue is that the node-config-infra configmap is modified when it shouldn't be.

--- Additional comment from Eduardo Minguez on 2018-11-08 04:07:40 EST ---

I've tested the same scenario, setting the variables per host instead of per group, with the same result.

* Masters with cri-o only
* App nodes with cri-o only
* Infra nodes with docker only

$ grep osm_use_cockpit all.yml 
osm_use_cockpit: false

$ cat inventory/group_vars/masters.yml 
openshift_use_crio_only: true
openshift_use_crio: true
openshift_node_group_name: node-config-master-crio

$ cat inventory/group_vars/openstack_compute_nodes.yml 
openshift_use_crio_only: true
openshift_use_crio: true
openshift_node_group_name: node-config-compute-crio

$ cat inventory/host_vars/infra-node-0.shiftstack.automated.lan.yml
openshift_use_crio_only: false
openshift_use_crio: false
openshift_node_group_name: node-config-infra

$ cat inventory/host_vars/infra-node-1.shiftstack.automated.lan.yml
openshift_use_crio_only: false
openshift_use_crio: false
openshift_node_group_name: node-config-infra

$ cat inventory/host_vars/infra-node-2.shiftstack.automated.lan.yml
openshift_use_crio_only: false
openshift_use_crio: false
openshift_node_group_name: node-config-infra

--- Additional comment from Eduardo Minguez on 2018-11-08 04:36:30 EST ---

I apologize, I copy/pasted the wrong ansible version. These are the correct values:

$ rpm -q openshift-ansible
openshift-ansible-3.11.16-1.git.0.4ac6f81.el7.noarch


$ rpm -q ansible
ansible-2.6.7-1.el7ae.noarch

$ ansible --version
ansible 2.6.7
  config file = /home/cloud-user/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

--- Additional comment from Russell Teague on 2018-11-08 08:12:46 EST ---

I've been looking into this and found a possible issue with the oc_configmap module. When it attempts to retrieve the configmaps, it fails and then falls back to the default configmap template from the openshift_node_group role. However, the template is processed as if it were on the master, which has use_crio set, and therefore ends up including the crio settings. Looking into a fix.
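The behaviour described in this comment can be illustrated with a small simulation. This is a sketch under stated assumptions, not the openshift_node_group role: the function names, the shortened host names and the reduced template are all hypothetical.

```python
# Sketch only: a missing configmap is rendered from a default template,
# but with the variables of the host running the task (the first master).
HOSTVARS = {
    # Shortened, illustrative host names; values follow the report's inventory.
    "master-0": {"openshift_use_crio": True},
    "infra-node-0": {"openshift_use_crio": False},
}


def render_default_template(rendering_hostvars):
    """Stand-in for the template: crio settings depend on the rendering host."""
    kubelet_args = {"pod-manifest-path": ["/etc/origin/node/pods"]}
    if rendering_hostvars["openshift_use_crio"]:
        kubelet_args["container-runtime"] = ["remote"]
        kubelet_args["container-runtime-endpoint"] = ["/var/run/crio/crio.sock"]
    return {"kind": "NodeConfig", "kubeletArguments": kubelet_args}


def create_missing_configmap(existing, name, rendering_host):
    """Return the existing configmap, or fall back to the default template."""
    if name in existing:
        return existing[name]
    return render_default_template(HOSTVARS[rendering_host])


# node-config-infra does not exist yet, and the task runs on the master.
infra_cm = create_missing_configmap({}, "node-config-infra", "master-0")
print("container-runtime" in infra_cm["kubeletArguments"])
```

Because the rendering host is the master, the docker-only infra configmap ends up with the cri-o arguments, which matches the yeditout output in the description.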

--- Additional comment from Russell Teague on 2018-11-16 09:43:12 EST ---

WIP Proposed: https://github.com/openshift/openshift-ansible/pull/10645

--- Additional comment from Johnny Liu on 2018-11-19 04:33:47 EST ---

Reproduced this bug with openshift-ansible-3.11.44-1.git.0.11d174e.el7.noarch.

[nodes]
master-node openshift_use_crio=True openshift_use_crio_only=True openshift_node_group_name='node-config-master-crio'
infra-node openshift_node_group_name='node-config-infra'
pure-crio-node openshift_use_crio=True openshift_use_crio_only=True  openshift_node_group_name='node-config-compute-crio'

[root@qe-jialiu311-mep-1 ~]# oc get node
NAME                                    STATUS     ROLES     AGE       VERSION
qe-jialiu311-mep-1                      Ready      master    26m       v1.11.0+d4cacc0
qe-jialiu311-node-infra-1               NotReady   infra     23m       v1.11.0+d4cacc0
qe-jialiu311-node-pure-crio-runtime-1   Ready      compute   23m       v1.11.0+d4cacc0

[root@qe-jialiu311-node-infra-1 ~]# journalctl -f  -u atomic-openshift-node.service 
-- Logs begin at Mon 2018-11-19 01:00:43 EST. --
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: I1119 01:42:00.919032   46639 kubelet.go:299] Watching apiserver
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: W1119 01:42:00.925647   46639 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: W1119 01:42:00.925718   46639 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: E1119 01:42:00.926073   46639 remote_runtime.go:69] Version from runtime service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: E1119 01:42:00.926147   46639 kuberuntime_manager.go:172] Get runtime version failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Nov 19 01:42:00 qe-jialiu311-node-infra-1 atomic-openshift-node[46639]: F1119 01:42:00.926160   46639 server.go:262] failed to run Kubelet: failed to create kubelet: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Nov 19 01:42:00 qe-jialiu311-node-infra-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Nov 19 01:42:00 qe-jialiu311-node-infra-1 systemd[1]: Failed to start OpenShift Node.
Nov 19 01:42:00 qe-jialiu311-node-infra-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Nov 19 01:42:00 qe-jialiu311-node-infra-1 systemd[1]: atomic-openshift-node.service failed.

Added one more test case to cover it.

--- Additional comment from Russell Teague on 2018-11-19 10:42:37 EST ---

Waiting for build
$ git tag --contains 63e84e757e781a19da8b8cdac151c78922ae4ebc

Comment 5 Weihua Meng 2018-12-05 01:52:17 UTC
Fixed.

openshift-ansible-3.10.83-1.git.0.12699eb.el7.noarch

The node-config configmaps are correct for both crio container-runtime nodes and docker container-runtime nodes.

Kernel Version: 3.10.0-957.1.3.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
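The fix recorded in the Doc Text (crio settings removed from the node-config template and added only through openshift_node_group edits) can be sketched as follows. This is an illustrative model, not the installer code: the helper names, the reduced base template and the exact edit lists are assumptions, and the compute label is a placeholder.

```python
import copy

# Sketch only: after the fix the base template carries no crio settings.
BASE_NODE_CONFIG = {
    "kind": "NodeConfig",
    "kubeletArguments": {"pod-manifest-path": ["/etc/origin/node/pods"]},
}

# Hypothetical crio edits, using the values seen in the bug description.
CRIO_EDITS = [
    {"key": "kubeletArguments.container-runtime", "value": ["remote"]},
    {"key": "kubeletArguments.container-runtime-endpoint",
     "value": ["/var/run/crio/crio.sock"]},
    {"key": "kubeletArguments.image-service-endpoint",
     "value": ["/var/run/crio/crio.sock"]},
]

# Hypothetical per-group edit lists; only crio groups carry CRIO_EDITS.
NODE_GROUPS = {
    "node-config-infra": [
        {"key": "kubeletArguments.node-labels",
         "value": ["node-role.kubernetes.io/infra=true"]},
    ],
    "node-config-compute-crio": [
        {"key": "kubeletArguments.node-labels",
         "value": ["node-role.kubernetes.io/compute=true"]},
    ] + CRIO_EDITS,
}


def apply_edits(base, edits):
    """Apply dotted-key edits to a deep copy of the base config."""
    config = copy.deepcopy(base)
    for edit in edits:
        *parents, leaf = edit["key"].split(".")
        target = config
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = edit["value"]
    return config


def build_configmaps(groups):
    """Build one configmap per node group, independent of any host variable."""
    return {name: apply_edits(BASE_NODE_CONFIG, edits)
            for name, edits in groups.items()}


configmaps = build_configmaps(NODE_GROUPS)
for name, config in sorted(configmaps.items()):
    print(name, "container-runtime" in config["kubeletArguments"])
```

In this model the result depends only on the group's edit list, so a crio-enabled first master no longer affects the docker-only node-config-infra configmap.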

Comment 7 errata-xmlrpc 2018-12-13 17:09:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3750

