Bug 1609027

Summary: installer does not handle openshift_crio_docker_gc_node_selector={"node-role.kubernetes.io/compute": "true"} well.
Product: OpenShift Container Platform
Reporter: Johnny Liu <jialiu>
Component: Installer
Assignee: Russell Teague <rteague>
Status: CLOSED ERRATA
QA Contact: Johnny Liu <jialiu>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.10.0
CC: aos-bugs, jokerman, mmccomas, wmeng
Target Milestone: ---
Target Release: 3.10.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: If a node selector value was given as 'true', it was interpreted as a boolean instead of a string, which caused the daemonset deployment to fail.
Fix: The template for creating the daemonset now quotes the provided value so that it is interpreted as a string.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-31 06:18:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: installation log with inventory file embedded
Flags: none

Description Johnny Liu 2018-07-26 18:43:29 UTC
Created attachment 1470867
installation log with inventory file embedded

Description of problem:
See the following details.

Version-Release number of the following components:
openshift-ansible-3.10.21-1.git.0.6446011.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install a cluster with CRI-O enabled and set the following parameter:
openshift_crio_docker_gc_node_selector={"node-role.kubernetes.io/compute": "true"}
2. Trigger the installation.

Actual results:
Installation failed:
TASK [openshift_docker_gc : Apply dockergc DaemonSet] **************************
Thursday 26 July 2018  03:28:10 -0400 (0:00:00.537)       0:23:52.802 ********* 
fatal: [host-8-245-162.host.centralci.eng.rdu2.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": {"cmd": "/usr/bin/oc create -f /tmp/tmp.FizoYr71Pd/dockergc-ds.yaml -n default", "results": {}, "returncode": 1, "stderr": "Error from server (BadRequest): DaemonSet in version \"v1beta1\" cannot be handled as a DaemonSet: [pos 797]: json: expect char '\"' but got char 't'\n", "stdout": "serviceaccount \"dockergc\" created\n"}}

Expected results:
Installation succeeds.

Additional info:
After the failure, I logged in to the cluster and checked /tmp/tmp.FizoYr71Pd/dockergc-ds.yaml. Its node selector is set to:
      nodeSelector:
        node-role.kubernetes.io/compute: true

If true is quoted as in the following, the dockergc DaemonSet is created successfully:
      nodeSelector:
        node-role.kubernetes.io/compute: "true"

Workaround:
openshift_crio_docker_gc_node_selector={'node-role.kubernetes.io/compute': '"true"'}
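
The error message is consistent with a Kubernetes nodeSelector being a map of string keys to string values: an unquoted YAML true becomes a JSON boolean, so the decoder meets the character t where it expects an opening quote. A minimal Python sketch of that difference, using only the standard json module. The dictionaries are illustrative stand-ins for the parsed dockergc-ds.yaml, and all_values_are_strings is a hypothetical helper, not part of the installer:

```python
import json

# Unquoted YAML `true` is parsed as a boolean (illustrative stand-in data).
unquoted = {"nodeSelector": {"node-role.kubernetes.io/compute": True}}
# Quoted YAML `"true"` is parsed as a string.
quoted = {"nodeSelector": {"node-role.kubernetes.io/compute": "true"}}

# In JSON the boolean is emitted bare, while the string keeps its quotes.
unquoted_json = json.dumps(unquoted)
quoted_json = json.dumps(quoted)
print(unquoted_json)
print(quoted_json)


def all_values_are_strings(selector):
    """Hypothetical check: True only if every selector value is a str."""
    return all(isinstance(value, str) for value in selector.values())


print(all_values_are_strings(unquoted["nodeSelector"]))
print(all_values_are_strings(quoted["nodeSelector"]))
```

A check like this could be run against an inventory value before installation to catch non-string selector values early.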

Comment 1 Russell Teague 2018-08-08 18:43:40 UTC
Proposed: https://github.com/openshift/openshift-ansible/pull/9486

Comment 2 openshift-github-bot 2018-08-10 08:04:05 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/5c7c5527a1d5540c932d6fc427f155b0996944ba
Merge pull request #9486 from mtnbikenc/fix-1609027

[Bug 1609027] Add quotes to docker gc node selector
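
The idea named in the commit title, quoting the node selector value when the DaemonSet file is rendered, can be sketched in Python. This is not the code from the pull request: render_node_selector is a hypothetical helper, and json.dumps is used only because a JSON string is also a valid YAML double-quoted scalar.

```python
import json


def render_node_selector(selector, indent=6):
    """Render a nodeSelector mapping as YAML text with every value quoted.

    Hypothetical helper for illustration. Quoting each value keeps a
    value such as "true" a string instead of a YAML boolean.
    """
    pad = " " * indent
    lines = [pad + "nodeSelector:"]
    for key, value in selector.items():
        # json.dumps adds the surrounding double quotes and escapes as needed.
        lines.append("{}  {}: {}".format(pad, key, json.dumps(str(value))))
    return "\n".join(lines)


rendered = render_node_selector({"node-role.kubernetes.io/compute": "true"})
print(rendered)
```

With the selector from this report, the rendered value is "true" in double quotes, which matches the form that was reported to work when edited by hand.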

Comment 3 Russell Teague 2018-08-14 14:09:28 UTC
release-3.10: https://github.com/openshift/openshift-ansible/pull/9587

Comment 5 Johnny Liu 2018-08-21 09:43:27 UTC
Verified with openshift-ansible-3.10.32-1.git.0.100156fNone: PASS.

When openshift_crio_docker_gc_node_selector={"node-role.kubernetes.io/compute": "true"}, the installation completes successfully:
# oc get ds dockergc -o yaml|grep nodeSelec -A 3
      nodeSelector:
        node-role.kubernetes.io/compute: "true"
      restartPolicy: Always
      schedulerName: default-scheduler

When openshift_crio_docker_gc_node_selector={"role": "node"}, the installation also completes successfully:
# oc get ds dockergc -o yaml|grep nodeSelec -A 3
      nodeSelector:
        role: node
      restartPolicy: Always
      schedulerName: default-scheduler

Once the correct RPM is attached to the errata, I will move this bug to VERIFIED.

Comment 6 Johnny Liu 2018-08-22 03:51:22 UTC
openshift-ansible-3.10.34-1.git.0.48df172None.noarch, which includes the fix PR, is now attached to the errata. Based on comment 5, moving this bug to VERIFIED.

Comment 8 errata-xmlrpc 2018-08-31 06:18:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2376