Created attachment 1424506 [details]
Ansible hosts file inventory

Description of problem:

Version-Release number of the following components:

rpm -q openshift-ansible
openshift-ansible-3.9.14-1.git.3.c62bc34.el7.noarch

rpm -q ansible
ansible-2.4.3.0-1.el7ae.noarch

ansible --version
ansible 2.4.3.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb 20 2018, 09:19:12) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

How reproducible:
Install OCP 3.9 with two CNS clusters (app + infra).

Steps to Reproduce:
1. Run the deploy from the bastion:
   ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
2. It fails with an error in the "GlusterFS Install" phase.
3. Check the pods in the namespace where heketi-registry is trying to run:

# oc get pod -n infra-storage -o wide
glusterfs-registry-htqwt   1/1   Running            0   16h   10.100.102.91   ocp1-sto-infra02.noc.empresas.da.tdcorp
glusterfs-registry-ng2mx   1/1   Running            0   16h   10.100.102.92   ocp1-sto-infra03.noc.empresas.da.tdcorp
glusterfs-registry-thn5n   1/1   Running            0   16h   10.100.102.90   ocp1-sto-infra01.noc.empresas.da.tdcorp
heketi-registry-1-deploy   1/1   Running            0   1m    10.16.13.4      ocp1-sto-infra02.noc.empresas.da.tdcorp
heketi-registry-1-g4bnq    0/1   CrashLoopBackOff   6   6m    10.16.11.5      ocp1-sto-infra01.noc.empresas.da.tdcorp

4. Look at the pod's log:

# oc logs heketi-registry-1-g4bnq
Heketi 6.0.0
ERROR: Unable to parse /etc/heketi/heketi.json: invalid character '#' after object key:value pair

5. Look at the pod's config volume in /var/lib/origin:

[root@ocp1-sto-app02 devicemapper]# cat /var/lib/origin/openshift.local.volumes/pods/a57641a3-449c-11e8-8315-00505693112a/volumes/kubernetes.io~secret/config/heketi.json
{
  "_port_comment": "Heketi Server Port Number",
  "port" : "8080",

  "_use_auth": "Enable JWT authorization. Please enable for deployment",
  "use_auth" : false,

  "_jwt" : "Private keys for access",
  "jwt" : {
    "_admin" : "Admin has access to all APIs",
    "admin" : {
      "key" : "My Secret"
    },
    "_user" : "User only has access to /volumes endpoint",
    "user" : {
      "key" : "My Secret"
    }
  },

  "_glusterfs_comment": "GlusterFS Configuration",
  "glusterfs" : {
    "_executor_comment": "Execute plugin. Possible choices: mock, kubernetes, ssh",
    "executor" : "kubernetes",

    "_db_comment": "Database file name",
    "db" : "/var/lib/heketi/heketi.db",

    "sshexec" : {
      "keyfile" : "/etc/heketi/private_key",
      "port" : "22",
      "user" : "root",
      "sudo" : false
    },

    "_auto_create_block_hosting_volume": "Creates Block Hosting volumes automatically if not found or exsisting volume exhausted",
    "auto_create_block_hosting_volume": true # enable automatic creation of block-hosting volumes in heketi upon receiving the first request for a block volume,

    "_block_hosting_volume_size": "New block hosting volume will be created in size mentioned, This is considered only if auto-create is enabled.",
    "block_hosting_volume_size": 210
  }
}

Actual results:

TASK [openshift_storage_glusterfs : Wait for heketi pod] *********************************************************************************************
FAILED - RETRYING: Wait for heketi pod (30 retries left).
FAILED - RETRYING: Wait for heketi pod (29 retries left).
FAILED - RETRYING: Wait for heketi pod (28 retries left).
FAILED - RETRYING: Wait for heketi pod (27 retries left).
FAILED - RETRYING: Wait for heketi pod (26 retries left).
FAILED - RETRYING: Wait for heketi pod (25 retries left).
FAILED - RETRYING: Wait for heketi pod (24 retries left).
FAILED - RETRYING: Wait for heketi pod (23 retries left).
FAILED - RETRYING: Wait for heketi pod (22 retries left).
FAILED - RETRYING: Wait for heketi pod (21 retries left).
FAILED - RETRYING: Wait for heketi pod (20 retries left).
FAILED - RETRYING: Wait for heketi pod (19 retries left).
FAILED - RETRYING: Wait for heketi pod (18 retries left).
FAILED - RETRYING: Wait for heketi pod (17 retries left).
FAILED - RETRYING: Wait for heketi pod (16 retries left).
FAILED - RETRYING: Wait for heketi pod (15 retries left).
FAILED - RETRYING: Wait for heketi pod (14 retries left).
FAILED - RETRYING: Wait for heketi pod (13 retries left).
FAILED - RETRYING: Wait for heketi pod (12 retries left).
FAILED - RETRYING: Wait for heketi pod (11 retries left).
FAILED - RETRYING: Wait for heketi pod (10 retries left).
FAILED - RETRYING: Wait for heketi pod (9 retries left).
FAILED - RETRYING: Wait for heketi pod (8 retries left).
FAILED - RETRYING: Wait for heketi pod (7 retries left).
FAILED - RETRYING: Wait for heketi pod (6 retries left).
FAILED - RETRYING: Wait for heketi pod (5 retries left).
FAILED - RETRYING: Wait for heketi pod (4 retries left).
FAILED - RETRYING: Wait for heketi pod (3 retries left).
FAILED - RETRYING: Wait for heketi pod (2 retries left).
FAILED - RETRYING: Wait for heketi pod (1 retries left).
fatal: [ocp1-master01.noc.empresas.da.tdcorp]: FAILED! => {"attempts": 30, "changed": false, "results": {"cmd": "/usr/bin/oc get pod --selector=glusterfs=heketi-registry-pod -o json -n infra-storage", "results": [{"apiVersion": "v1", "items": [{"apiVersion": "v1", "kind": "Pod", "metadata": {"annotations": {"openshift.io/deployment-config.latest-version": "1", "openshift.io/deployment-config.name": "heketi-registry", "openshift.io/deployment.name": "heketi-registry-1", "openshift.io/scc": "privileged"}, "creationTimestamp": "2018-04-20T12:46:17Z", "generateName": "heketi-registry-1-", "labels": {"deployment": "heketi-registry-1", "deploymentconfig": "heketi-registry", "glusterfs": "heketi-registry-pod", "heketi": "registry-pod"}, "name": "heketi-registry-1-g4bnq", "namespace": "infra-storage", "ownerReferences": [{"apiVersion": "v1", "blockOwnerDeletion": true, "controller": true, "kind": "ReplicationController", "name": "heketi-registry-1", "uid": "a238406f-4498-11e8-8315-00505693112a"}], "resourceVersion": "187425", "selfLink": "/api/v1/namespaces/infra-storage/pods/heketi-registry-1-g4bnq", "uid": "d1b7095c-4498-11e8-8315-00505693112a"}, "spec": {"containers": [{"env": [{"name": "HEKETI_USER_KEY", "value": "+H0WztUwkzVcIm2qeRQRKDSWMsOZGCEJktAIs8+fmSs="}, {"name": "HEKETI_ADMIN_KEY", "value": "oqspib4KT7DN6ADasWVVRExNwTV2XqP7+LJwLnN8+XM="}, {"name": "HEKETI_EXECUTOR", "value": "kubernetes"}, {"name": "HEKETI_FSTAB", "value": "/var/lib/heketi/fstab"}, {"name": "HEKETI_SNAPSHOT_LIMIT", "value": "14"}, {"name": "HEKETI_KUBE_GLUSTER_DAEMONSET", "value": "1"}, {"name": "HEKETI_IGNORE_STALE_OPERATIONS", "value": "true"}], "image": "rhgs3/rhgs-volmanager-rhel7:latest", "imagePullPolicy": "IfNotPresent", "livenessProbe": {"failureThreshold": 3, "httpGet": {"path": "/hello", "port": 8080, "scheme": "HTTP"}, "initialDelaySeconds": 30, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 3}, "name": "heketi", "ports": [{"containerPort": 8080, "protocol": "TCP"}], "readinessProbe": {"failureThreshold": 3, "httpGet": {"path": "/hello", "port": 8080, "scheme": "HTTP"}, "initialDelaySeconds": 3, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 3}, "resources": {}, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [{"mountPath": "/var/lib/heketi", "name": "db"}, {"mountPath": "/etc/heketi", "name": "config"}, {"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount", "name": "heketi-registry-service-account-token-2g88w", "readOnly": true}]}], "dnsPolicy": "ClusterFirst", "imagePullSecrets": [{"name": "heketi-registry-service-account-dockercfg-phdtk"}], "nodeName": "ocp1-sto-infra01.noc.empresas.da.tdcorp", "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {}, "serviceAccount": "heketi-registry-service-account", "serviceAccountName": "heketi-registry-service-account", "terminationGracePeriodSeconds": 30, "volumes": [{"glusterfs": {"endpoints": "heketi-db-registry-endpoints", "path": "heketidbstorage"}, "name": "db"}, {"name": "config", "secret": {"defaultMode": 420, "secretName": "heketi-registry-config-secret"}}, {"name": "heketi-registry-service-account-token-2g88w", "secret": {"defaultMode": 420, "secretName": "heketi-registry-service-account-token-2g88w"}}]}, "status": {"conditions": [{"lastProbeTime": null, "lastTransitionTime": "2018-04-20T12:46:17Z", "status": "True", "type": "Initialized"}, {"lastProbeTime": null, "lastTransitionTime": "2018-04-20T12:46:17Z", "message": "containers with unready status: [heketi]", "reason": "ContainersNotReady", "status": "False", "type": "Ready"}, {"lastProbeTime": null, "lastTransitionTime": "2018-04-20T12:46:17Z", "status": "True", "type": "PodScheduled"}], "containerStatuses": [{"containerID": "docker://783dd779b6eda05671d42d77a86235723a210e16ec926648544b3f1242fb5c54", "image": "registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7:latest", "imageID": "docker-pullable://registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7@sha256:02408e79aa76cc2031438a218b5ce8368c5e498b6dc1d5455e94e03da0a12456", "lastState": {"terminated": {"containerID": "docker://783dd779b6eda05671d42d77a86235723a210e16ec926648544b3f1242fb5c54", "exitCode": 1, "finishedAt": "2018-04-20T12:49:35Z", "reason": "Error", "startedAt": "2018-04-20T12:49:34Z"}}, "name": "heketi", "ready": false, "restartCount": 5, "state": {"waiting": {"message": "Back-off 2m40s restarting failed container=heketi pod=heketi-registry-1-g4bnq_infra-storage(d1b7095c-4498-11e8-8315-00505693112a)", "reason": "CrashLoopBackOff"}}}], "hostIP": "10.100.102.90", "phase": "Running", "podIP": "10.16.11.5", "qosClass": "BestEffort", "startTime": "2018-04-20T12:46:17Z"}}], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}}], "returncode": 0}, "state": "list"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry

PLAY RECAP *******************************************************************************************************************************************
localhost                                 : ok=14   changed=0    unreachable=0   failed=0
ocp1-app01.noc.empresas.da.tdcorp         : ok=122  changed=14   unreachable=0   failed=0
ocp1-app02.noc.empresas.da.tdcorp         : ok=122  changed=14   unreachable=0   failed=0
ocp1-app03.noc.empresas.da.tdcorp         : ok=122  changed=14   unreachable=0   failed=0
ocp1-infra01.noc.empresas.da.tdcorp       : ok=122  changed=14   unreachable=0   failed=0
ocp1-infra02.noc.empresas.da.tdcorp       : ok=122  changed=14   unreachable=0   failed=0
ocp1-infra03.noc.empresas.da.tdcorp       : ok=122  changed=14   unreachable=0   failed=0
ocp1-lb01.noc.empresas.da.tdcorp          : ok=49   changed=2    unreachable=0   failed=0
ocp1-master01.noc.empresas.da.tdcorp      : ok=481  changed=84   unreachable=0   failed=1
ocp1-master02.noc.empresas.da.tdcorp      : ok=292  changed=40   unreachable=0   failed=0
ocp1-master03.noc.empresas.da.tdcorp      : ok=292  changed=40   unreachable=0   failed=0
ocp1-sto-app01.noc.empresas.da.tdcorp     : ok=125  changed=15   unreachable=0   failed=0
ocp1-sto-app02.noc.empresas.da.tdcorp     : ok=125  changed=15   unreachable=0   failed=0
ocp1-sto-app03.noc.empresas.da.tdcorp     : ok=125  changed=15   unreachable=0   failed=0
ocp1-sto-infra01.noc.empresas.da.tdcorp   : ok=125  changed=15   unreachable=0   failed=0
ocp1-sto-infra02.noc.empresas.da.tdcorp   : ok=125  changed=15   unreachable=0   failed=0
ocp1-sto-infra03.noc.empresas.da.tdcorp   : ok=125  changed=15   unreachable=0   failed=0

INSTALLER STATUS *************************************************************************************************************************************
Initialization              : Complete (0:00:51)
Health Check                : Complete (0:00:47)
etcd Install                : Complete (0:00:47)
Load balancer Install       : Complete (0:00:12)
Master Install              : Complete (0:02:35)
Master Additional Install   : Complete (0:00:32)
Node Install                : Complete (0:03:35)
GlusterFS Install           : In Progress (0:07:45)
	This phase can be restarted by running: playbooks/openshift-glusterfs/config.yml

Failure summary:

  1. Hosts:    ocp1-master01.noc.empresas.da.tdcorp
     Play:     Configure GlusterFS
     Task:     Wait for heketi pod
     Message:  Failed without returning a message.

Expected results:
The heketi-registry pod should be in the Running state.

Additional info:
heketi.json looks wrong; it contains this line with a trailing '#' comment:

...
"auto_create_block_hosting_volume": true # enable automatic creation of block-hosting volumes in heketi upon receiving the first request for a block volume
...
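A quick way to confirm that the rendered config is the problem is to run it through any JSON parser; JSON has no comment syntax at all, so the trailing '#' text makes the whole file unparseable. A minimal check from the node (a sketch assuming the stock python 2.7 shown above; the path is the one from step 5):

# python -c 'import json, sys; json.load(open(sys.argv[1])); print("valid JSON")' /var/lib/origin/openshift.local.volumes/pods/a57641a3-449c-11e8-8315-00505693112a/volumes/kubernetes.io~secret/config/heketi.json

This should fail with a parse error at the '#', matching what the heketi log reports.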
Found the issue. The '#' comment is in the inventory file, and Ansible is copying it into heketi.json. The line in the hosts/inventory file:

openshift_storage_glusterfs_registry_block_host_vol_create=true # enable automatic creation of block-hosting volumes in heketi upon receiving the first request for a block volume
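The behavior is easy to reproduce outside the installer. A minimal sketch (the [test] group, localhost entry, and myvar variable are made up for illustration):

$ cat ./test-inventory
[test]
localhost ansible_connection=local

[test:vars]
myvar=true # this text is not treated as a comment

$ ansible test -i ./test-inventory -m debug -a 'var=myvar'

On the Ansible version used here (2.4.3), the debug output should show myvar holding the whole string "true # this text is not treated as a comment": the INI inventory parser keeps everything after the '=' as part of the value, and the installer then writes that value into heketi.json verbatim.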
What happens if you remove the comment from the end of the variable definition?
I removed the comment from the end of the variable and the heketi-registry pod is Running:

[root@ocp1-master01 ~]# oc get pod -n infra-storage -o wide -w
NAME                                           READY     STATUS    RESTARTS   AGE       IP              NODE
glusterblock-registry-provisioner-dc-1-rzbqx   1/1       Running   0          1h        10.16.2.2       ocp1-infra01.noc.empresas.da.tdcorp
glusterfs-registry-2stm9                       1/1       Running   0          1h        10.100.102.92   ocp1-sto-infra03.noc.empresas.da.tdcorp
glusterfs-registry-wkz7b                       1/1       Running   0          1h        10.100.102.91   ocp1-sto-infra02.noc.empresas.da.tdcorp
glusterfs-registry-xk4jc                       1/1       Running   0          1h        10.100.102.90   ocp1-sto-infra01.noc.empresas.da.tdcorp
heketi-registry-1-rjtdv                        1/1       Running   0          1h        10.16.7.2       ocp1-infra03.noc.empresas.da.tdcorp
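For completeness, one way to double-check the rendered config after the fix is to decode it straight from the secret (a sketch; the secret name and key come from the pod spec in the failure output above):

# oc get secret heketi-registry-config-secret -n infra-storage -o 'jsonpath={.data.heketi\.json}' | base64 -d | python -c 'import json, sys; json.load(sys.stdin); print("heketi.json is valid JSON")'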
Sounds like a misconfigured inventory file, then. :) Problem solved?
(In reply to Jose A. Rivera from comment #4)
> Sounds like a misconfigured inventory file, then. :) Problem solved?

The problem is solved. I don't know whether this is expected behavior when you put a comment after a value like this:

variable=value # comment
I believe Ansible does not support inline comments like that. Closing the BZ.
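For anyone landing here later, the safe pattern is to keep comments on their own line in the inventory, e.g. (using the variable from this report):

# Enable automatic creation of block-hosting volumes in heketi upon
# receiving the first request for a block volume.
openshift_storage_glusterfs_registry_block_host_vol_create=true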