Bug 1638519

Summary: [3.10] Ability to install a cluster with a mix of Docker and CRI-O nodes
Product: OpenShift Container Platform
Reporter: Scott Dodson <sdodson>
Component: Installer
Assignee: Russell Teague <rteague>
Installer sub component: openshift-ansible
QA Contact: Johnny Liu <jialiu>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: unspecified
CC: aos-bugs, cshereme, dmoessne, dornelas, jialiu, jokerman, mmccomas, nschuetz, pdwyer, rhowe, rteague, scuppett, wmeng, wsun, xtian
Version: 3.10.0
Target Milestone: ---
Target Release: 3.10.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1615884
Environment:
Last Closed: 2018-11-11 16:39:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1615884
Bug Blocks:

Comment 1 Scott Dodson 2018-10-11 19:18:30 UTC
https://github.com/openshift/openshift-ansible/pull/10138 backport to release-3.10

In openshift-ansible-3.10.50-1 and later

Comment 3 Johnny Liu 2018-10-12 10:17:23 UTC
Verified this bug with openshift-ansible-3.10.53-1.git.0.ba2c2ec.el7.noarch. Result: PASS.


[nodes]
master-node  openshift_node_group_name='node-config-master'
infra-node openshift_node_group_name='node-config-infra'
pure-docker-node openshift_use_crio=False  openshift_node_group_name='node-config-compute'
pure-crio-node openshift_use_crio=True openshift_use_crio_only=True  openshift_node_group_name='node-config-compute-crio'
both-docker-crio-node openshift_use_crio=True openshift_use_crio_only=False  openshift_node_group_name='node-config-compute-crio'
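
For readers mapping these inventory variables to node behaviour, here is a minimal Python sketch. It is not part of openshift-ansible: the function name runtime_mode and the returned strings are hypothetical, and it assumes that an unset variable behaves like False.

```python
def runtime_mode(host_vars):
    """Classify a node from its inventory variables (illustrative only).

    Assumption: a variable that is not set behaves like False.
    """
    use_crio = host_vars.get("openshift_use_crio", False)
    crio_only = host_vars.get("openshift_use_crio_only", False)
    if not use_crio:
        return "docker"
    return "crio-only" if crio_only else "docker+crio"


# The three compute hosts from the [nodes] section above.
nodes = {
    "pure-docker-node": {"openshift_use_crio": False},
    "pure-crio-node": {"openshift_use_crio": True, "openshift_use_crio_only": True},
    "both-docker-crio-node": {"openshift_use_crio": True, "openshift_use_crio_only": False},
}

for name, host_vars in nodes.items():
    print(name, "->", runtime_mode(host_vars))
```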

Installation completed successfully.
# oc version
oc v3.10.51
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-jialiu-master-etcd-1:8443
openshift v3.10.45
kubernetes v1.10.0+b81c8f8


# oc get po
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-bp6sn    1/1       Running   0          19m
dockergc-gzwn7             1/1       Running   0          18m
dockergc-z7xw5             1/1       Running   0          18m
registry-console-1-hrr6b   1/1       Running   0          18m
router-1-q99tz             1/1       Running   0          19m

pure-docker-node:
kubeletArguments:
  bootstrap-kubeconfig:
  - /etc/origin/node/bootstrap.kubeconfig
  cert-dir:
  - /etc/origin/node/certificates
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
  enable-controller-attach-detach:
  - 'true'
  feature-gates:
  - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
  node-labels:
  - node-role.kubernetes.io/compute=true
  pod-manifest-path:
  - /etc/origin/node/pods
  rotate-certificates:
  - 'true'


pure-crio-node:
kubeletArguments:
  bootstrap-kubeconfig:
  - /etc/origin/node/bootstrap.kubeconfig
  cert-dir:
  - /etc/origin/node/certificates
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
  container-runtime:
  - remote
  container-runtime-endpoint:
  - unix:///var/run/crio/crio.sock
  enable-controller-attach-detach:
  - 'true'
  feature-gates:
  - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
  image-service-endpoint:
  - unix:///var/run/crio/crio.sock
  node-labels:
  - node-role.kubernetes.io/compute=true,runtime=qe-crio
  pod-manifest-path:
  - /etc/origin/node/pods
  rotate-certificates:
  - 'true'
  runtime-request-timeout:
  - 10m


both-docker-crio-node:
kubeletArguments:
  bootstrap-kubeconfig:
  - /etc/origin/node/bootstrap.kubeconfig
  cert-dir:
  - /etc/origin/node/certificates
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
  container-runtime:
  - remote
  container-runtime-endpoint:
  - unix:///var/run/crio/crio.sock
  enable-controller-attach-detach:
  - 'true'
  feature-gates:
  - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
  image-service-endpoint:
  - unix:///var/run/crio/crio.sock
  node-labels:
  - node-role.kubernetes.io/compute=true,runtime=qe-crio
  pod-manifest-path:
  - /etc/origin/node/pods
  rotate-certificates:
  - 'true'
  runtime-request-timeout:
  - 10m
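
To make the difference between the Docker and CRI-O node configs easier to see, the sketch below compares the kubeletArguments shown above. The dictionaries are copied by hand from this comment, and the helper diff_kubelet_args is a hypothetical name, not an installer function.

```python
# kubeletArguments from pure-docker-node, transcribed from the output above.
docker_node = {
    "bootstrap-kubeconfig": ["/etc/origin/node/bootstrap.kubeconfig"],
    "cert-dir": ["/etc/origin/node/certificates"],
    "cloud-config": ["/etc/origin/cloudprovider/gce.conf"],
    "cloud-provider": ["gce"],
    "enable-controller-attach-detach": ["true"],
    "feature-gates": [
        "RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true"
    ],
    "node-labels": ["node-role.kubernetes.io/compute=true"],
    "pod-manifest-path": ["/etc/origin/node/pods"],
    "rotate-certificates": ["true"],
}

# The two CRI-O nodes show identical arguments; start from the Docker
# settings and apply what differs in the output above.
crio_node = dict(docker_node)
crio_node.update({
    "container-runtime": ["remote"],
    "container-runtime-endpoint": ["unix:///var/run/crio/crio.sock"],
    "image-service-endpoint": ["unix:///var/run/crio/crio.sock"],
    "node-labels": ["node-role.kubernetes.io/compute=true,runtime=qe-crio"],
    "runtime-request-timeout": ["10m"],
})


def diff_kubelet_args(base, other):
    """Return (keys only in other, keys whose values differ), both sorted."""
    added = sorted(k for k in other if k not in base)
    changed = sorted(k for k in other if k in base and other[k] != base[k])
    return added, changed


added, changed = diff_kubelet_args(docker_node, crio_node)
print("added on CRI-O nodes:", added)
print("changed on CRI-O nodes:", changed)
```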


Labeled pure-docker-node with "a=b" to mark it as the S2I build node, then modified the master-config file so that S2I builder pods run on that node:
admissionConfig:
  pluginConfig:
    BuildDefaults:
      configuration:
        apiVersion: v1
        env: []
        kind: BuildDefaultsConfig
        nodeSelector:
          a: b
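
As a rough illustration of why this selector pins builds to pure-docker-node, the sketch below applies the usual nodeSelector rule: every key/value pair in the selector must be present in the node's labels. The function selector_matches is hypothetical, and the label sets are assembled from the node-labels values and the "a=b" label mentioned in this comment.

```python
def selector_matches(node_labels, node_selector):
    """True when every selector key/value pair is present in the node labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Labels assumed from this comment: compute role everywhere, "a=b" only on
# the Docker node, "runtime=qe-crio" on the CRI-O nodes.
labels = {
    "pure-docker-node": {"node-role.kubernetes.io/compute": "true", "a": "b"},
    "pure-crio-node": {"node-role.kubernetes.io/compute": "true", "runtime": "qe-crio"},
    "both-docker-crio-node": {"node-role.kubernetes.io/compute": "true", "runtime": "qe-crio"},
}

build_selector = {"a": "b"}  # nodeSelector from BuildDefaultsConfig above

eligible = [name for name, node_labels in labels.items()
            if selector_matches(node_labels, build_selector)]
print(eligible)
```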

The S2I build completed successfully.

# oc get po -n install-test -o wide
NAME                             READY     STATUS      RESTARTS   AGE       IP           NODE
mongodb-1-n6dk5                  1/1       Running     0          16m       10.131.0.3   qe-jialiu-node-pure-crio-runtime-1
nodejs-mongodb-example-1-build   0/1       Completed   0          16m       10.128.2.3   qe-jialiu-node-pure-docker-runtime-1
nodejs-mongodb-example-1-g2qx2   1/1       Running     0          3m        10.131.0.4   qe-jialiu-node-pure-crio-runtime-1
nodejs-mongodb-example-1-sh8wq   1/1       Running     0          12m       10.128.2.4   qe-jialiu-node-pure-docker-runtime-1
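
The placement in this listing can be checked mechanically. The sketch below parses the table text pasted above; it assumes whitespace-separated columns with no spaces inside any field, which holds for this output but not for every `oc get` listing. The helper name pods_by_node is hypothetical.

```python
OUTPUT = """\
NAME                             READY     STATUS      RESTARTS   AGE       IP           NODE
mongodb-1-n6dk5                  1/1       Running     0          16m       10.131.0.3   qe-jialiu-node-pure-crio-runtime-1
nodejs-mongodb-example-1-build   0/1       Completed   0          16m       10.128.2.3   qe-jialiu-node-pure-docker-runtime-1
nodejs-mongodb-example-1-g2qx2   1/1       Running     0          3m        10.131.0.4   qe-jialiu-node-pure-crio-runtime-1
nodejs-mongodb-example-1-sh8wq   1/1       Running     0          12m       10.128.2.4   qe-jialiu-node-pure-docker-runtime-1
"""


def pods_by_node(text):
    """Group pod names by the NODE column of a wide pod listing."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    placement = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        placement.setdefault(row["NODE"], []).append(row["NAME"])
    return placement


placement = pods_by_node(OUTPUT)
for node, pods in placement.items():
    print(node, pods)
```

In this output the builder pod (the one ending in "-build") is listed on the pure-docker node, while application pods are spread across both runtimes.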


The only issue is that this PR changes the kubeletArguments.container-runtime-endpoint setting to "unix:///var/run/crio/crio.sock"; the "unix://" prefix is newly added. In 3.11, by comparison, the same setting is "/var/run/crio/crio.sock".
Is this change by design, or is it a regression?
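
For comparing the two forms of the endpoint, a small sketch follows. It only strips a leading "unix://" scheme so the 3.10 and 3.11 values can be compared as plain socket paths; it makes no claim about how the kubelet itself parses the setting, and the name socket_path is hypothetical.

```python
UNIX_PREFIX = "unix://"


def socket_path(endpoint):
    """Return the filesystem path of a socket endpoint, with or without the scheme."""
    if endpoint.startswith(UNIX_PREFIX):
        return endpoint[len(UNIX_PREFIX):]
    return endpoint


value_3_10 = "unix:///var/run/crio/crio.sock"  # value produced by this PR
value_3_11 = "/var/run/crio/crio.sock"         # value reported for 3.11

# Both forms name the same socket file; only the scheme prefix differs.
print(socket_path(value_3_10) == socket_path(value_3_11))
```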

Comment 4 Scott Dodson 2018-10-12 13:45:40 UTC
https://github.com/openshift/openshift-ansible/pull/10396 to backport that

Comment 8 Johnny Liu 2018-10-31 05:57:36 UTC
The PR in comment 4 is already merged, so moving this bug to ON_QA.

Comment 9 Johnny Liu 2018-10-31 08:17:56 UTC
Verified this bug with openshift-ansible-3.10.66-1.git.0.3c3a83a.el7.noarch. Result: PASS.

Following up on comment 3: kubeletArguments.container-runtime-endpoint is now set to "[/var/run/crio/crio.sock]", without the "unix://" prefix.

kubeletArguments:
  bootstrap-kubeconfig:
  - /etc/origin/node/bootstrap.kubeconfig
  cert-dir:
  - /etc/origin/node/certificates
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
  container-runtime:
  - remote
  container-runtime-endpoint:
  - /var/run/crio/crio.sock
  enable-controller-attach-detach:
  - 'true'
  feature-gates:
  - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
  image-service-endpoint:
  - /var/run/crio/crio.sock
  node-labels:
  - node-role.kubernetes.io/compute=true,runtime=qe-crio
  pod-manifest-path:
  - /etc/origin/node/pods
  rotate-certificates:
  - 'true'
  runtime-request-timeout:
  - 10m

Comment 12 errata-xmlrpc 2018-11-11 16:39:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2709