Description of problem:
CNS installation failed with "Unable to add device: Device /dev/vsda not found."

Version-Release number of the following components:
openshift-ansible-3.11.72-1.git.0.7c8b4f0.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. Deploy OCP 3.11 with the glusterfs group

Actual results:
CNS installation failed.

fatal: [*.host.com]: FAILED! => {"changed": true, "cmd": ["oc", "--config=/tmp/openshift-glusterfs-ansible-BQ4KQX/admin.kubeconfig", "rsh", "--namespace=glusterfs", "deploy-heketi-storage-1-562fd", "heketi-cli", "-s", "http://localhost:8080", "--user", "admin", "--secret", "ufPFoif6F5UP/e/w8vOIKZ5xQUfimkyvg+wunYcT6/Q=", "topology", "load", "--json=/tmp/openshift-glusterfs-ansible-BQ4KQX/topology.json", "2>&1"], "delta": "0:00:04.942068", "end": "2019-01-21 05:02:11.234561", "failed_when_result": true, "rc": 0, "start": "2019-01-21 05:02:06.292493", "stderr": "", "stderr_lines": [], "stdout": "Creating cluster ... ID: 8934e337e78817423a56e7d6cc7d0e3f\n\tAllowing file volumes on cluster.\n\tAllowing block volumes on cluster.\n\tCreating node flexy-gluster-mchomaglusterfs-node-1 ... ID: fdff635ada4832278cea78a827dc1042\n\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.\n\tCreating node flexy-gluster-mchomaglusterfs-node-2 ... ID: 2232b6d20102586b56210a1200c2028b\n\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.\n\tCreating node flexy-gluster-mchomaglusterfs-node-3 ... ID: 63f4b4900c1e2b8bd64d8a16fb72aa6e\n\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.", "stdout_lines": ["Creating cluster ... ID: 8934e337e78817423a56e7d6cc7d0e3f", "\tAllowing file volumes on cluster.", "\tAllowing block volumes on cluster.", "\tCreating node flexy-gluster-mchomaglusterfs-node-1 ... ID: fdff635ada4832278cea78a827dc1042", "\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.", "\tCreating node flexy-gluster-mchomaglusterfs-node-2 ... ID: 2232b6d20102586b56210a1200c2028b", "\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.", "\tCreating node flexy-gluster-mchomaglusterfs-node-3 ... ID: 63f4b4900c1e2b8bd64d8a16fb72aa6e", "\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found."]}

Expected results:
CNS installation succeeds.

Additional info:

$ cat /tmp/openshift-glusterfs-ansible-BQ4KQX/topology.json
{
    "clusters": [{
        "nodes": [{
            "node": {
                "hostnames": {
                    "manage": ["flexy-gluster-mchomaglusterfs-node-1"],
                    "storage": ["172.16.120.54"]
                },
                "zone": 1
            },
            "devices": ["/dev/vsda"]
        }, {
            "node": {
                "hostnames": {
                    "manage": ["flexy-gluster-mchomaglusterfs-node-2"],
                    "storage": ["172.16.120.4"]
                },
                "zone": 1
            },
            "devices": ["/dev/vsda"]
        }, {
            "node": {
                "hostnames": {
                    "manage": ["flexy-gluster-mchomaglusterfs-node-3"],
                    "storage": ["172.16.120.88"]
                },
                "zone": 1
            },
            "devices": ["/dev/vsda"]
        }]
    }]
}

Device on the gluster node host:
# ls /dev/vsda -l
lrwxrwxrwx. 1 root root 10 Jan 21 04:37 /dev/vsda -> /dev/loop0

Device in the gluster-storage pod:
# oc exec glusterfs-storage-kjmfm -- ls /dev/vsda
ls: cannot access /dev/vsda: No such file or directory
command terminated with exit code 2
# oc exec glusterfs-storage-kjmfm -- ls /mnt/host-dev/vsda -l
lrwxrwxrwx. 1 root root 10 Jan 21 09:37 /mnt/host-dev/vsda -> /dev/loop0

# oc get ds glusterfs-storage -oyaml
apiVersion: extensions/v1beta1
...
        volumeMounts:
        ...
        - mountPath: /mnt/host-dev
          name: glusterfs-dev
        ...
      volumes:
      ...
      - hostPath:
          path: /dev
          type: ""
        name: glusterfs-dev
...
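The outputs above show the mechanism of the failure: the host's /dev (where the /dev/vsda symlink lives) is mounted into the pod at /mnt/host-dev, so the path /dev/vsda that heketi is given simply does not exist inside the container. A minimal sketch of this, using temp directories as stand-ins for the host's and the pod's /dev (all paths here are hypothetical):

```shell
host_dev=$(mktemp -d)          # stands in for the host's /dev
pod_dev=$(mktemp -d)           # stands in for the pod's own /dev

: > "$host_dev/loop0"                      # the backing loop device
ln -s "$host_dev/loop0" "$host_dev/vsda"   # host: vsda -> loop0

# On the host, the symlinked name resolves fine:
readlink -f "$host_dev/vsda"

# In the pod, the same name is absent from the pod's /dev, which is
# exactly the "Device /dev/vsda not found" heketi reports:
test -e "$pod_dev/vsda" || echo "vsda: not found in pod /dev"
```

This is only an illustration of the path mismatch; in the real cluster the host /dev reaches the pod via the glusterfs-dev hostPath mount shown in the DaemonSet above.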
Does the problem only occur when the specified glusterfs device is a symlink?
I believe this PR should resolve things: https://github.com/openshift/openshift-ansible/pull/11068
What version of OCS are you installing? Anything before ocs-3.11.1 should work with the PR from comment #2, ocs-3.11.1 and newer are expected to work with openshift-ansible-3.11.72.
Qin Ping is reporting 3.11.72 in the description. The past failures I see happened with v3.11.59.
(In reply to Niels de Vos from comment #3)
> What version of OCS are you installing? Anything before ocs-3.11.1 should
> work with the PR from comment #2, ocs-3.11.1 and newer are expected to work
> with openshift-ansible-3.11.72.

The inventory has this:
openshift_storage_glusterfs_image=registry.access.redhat.com/rhgs3/rhgs-server-rhel7

And https://access.redhat.com/containers/?tab=overview#/registry.access.redhat.com/rhgs3/rhgs-server-rhel7 currently lists 3.11.0-6 as the version. This means that https://github.com/openshift/openshift-ansible/pull/11068 is expected to address the problem.

You would need to provide the :3.11.0-6 tag to openshift_storage_glusterfs_image, as (the working) :latest is not released and available from registry.access.redhat.com yet.
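Based on the comment above, the inventory line would pin the tag explicitly rather than rely on :latest (the tag value is the one listed in the registry at the time of the comment; verify the current one before use):

```ini
openshift_storage_glusterfs_image=registry.access.redhat.com/rhgs3/rhgs-server-rhel7:3.11.0-6
```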
Agree. Workaround: if you have an openshift-ansible version that consumes the HOST_DEV_DIR variable in the template, you can set it to "/dev" when it gets processed. That is what https://github.com/openshift/openshift-ansible/pull/11068/files does too.
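A local sketch of what the HOST_DEV_DIR=/dev workaround amounts to: the glusterfs-dev hostPath volume must end up mounted at /dev inside the pod instead of /mnt/host-dev, so the device names in topology.json resolve. The file names below are hypothetical; in practice the change happens when openshift-ansible processes the DaemonSet template with HOST_DEV_DIR set.

```shell
# A stand-in for the relevant fragment of the glusterfs DaemonSet:
snippet=$(mktemp)
cat > "$snippet" <<'EOF'
- mountPath: /mnt/host-dev
  name: glusterfs-dev
EOF

# The effective change: host /dev lands at /dev inside the container.
patched=$(mktemp)
sed 's|mountPath: /mnt/host-dev|mountPath: /dev|' "$snippet" > "$patched"
grep 'mountPath:' "$patched"    # -> - mountPath: /dev
```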
Moving to ON_QA, as the referenced PR is in openshift-ansible-3.11.74-1 and later.
Still got the same error.

# oc version
oc v3.11.74
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.11.74
kubernetes v1.11.0+d4cacc0

# oc exec glusterfs-storage-8pw87 -- rpm -qa|grep gluster
python2-gluster-3.12.2-25.el7rhgs.x86_64
glusterfs-server-3.12.2-25.el7rhgs.x86_64
gluster-block-0.2.1-28.el7rhgs.x86_64
glusterfs-api-3.12.2-25.el7rhgs.x86_64
glusterfs-cli-3.12.2-25.el7rhgs.x86_64
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64
glusterfs-libs-3.12.2-25.el7rhgs.x86_64
glusterfs-3.12.2-25.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64

Inventory:
[glusterfs]
host-1 ansible_user=root ansible_ssh_user=root glusterfs_devices="['/dev/vsda']"
host-2 ansible_user=root ansible_ssh_user=root glusterfs_devices="['/dev/vsda']"
host-3 ansible_user=root ansible_ssh_user=root glusterfs_devices="['/dev/vsda']"
Please reproduce the issue and grab the output of "oc logs <heketi_pod>".
Sorry for the last comment. I re-installed CNS today and it succeeded, so I am marking this as verified.

Verified with:
openshift-ansible-3.11.74-1.git.0.cde4c69.el7.noarch.rpm
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0407