Bug 1669080 - CNS installation failed with "Unable to add device: Device /dev/vsda not found."
Summary: CNS installation failed with "Unable to add device: Device /dev/vsda not found."
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Jose A. Rivera
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks: 1668316 1668335
 
Reported: 2019-01-24 09:44 UTC by Qin Ping
Modified: 2019-03-14 02:18 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-14 02:17:59 UTC
Target Upstream Version:
Embargoed:




Links
GitHub: openshift/openshift-ansible pull 11068 (closed) "gluster: detect intent to deploy legacy OpenShift Container Storage", last updated 2020-08-25 11:15:17 UTC
Red Hat Product Errata: RHBA-2019:0407, last updated 2019-03-14 02:18:07 UTC

Description Qin Ping 2019-01-24 09:44:15 UTC
Description of problem:
CNS installation failed with "Unable to add device: Device /dev/vsda not found."

Version-Release number of the following components:
openshift-ansible-3.11.72-1.git.0.7c8b4f0.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. Deploy OCP 3.11 with the [glusterfs] host group in the inventory (see the sketch below)
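
A minimal inventory sketch for step 1 (host names are illustrative and the full [OSEv3] variable set is omitted; the glusterfs_devices value matches this report):

[OSEv3:children]
masters
etcd
nodes
glusterfs

[glusterfs]
node-1 glusterfs_devices="['/dev/vsda']"
node-2 glusterfs_devices="['/dev/vsda']"
node-3 glusterfs_devices="['/dev/vsda']"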

Actual results:
CNS installation failed.
fatal: [*.host.com]: FAILED! => {"changed": true, "cmd": ["oc", "--config=/tmp/openshift-glusterfs-ansible-BQ4KQX/admin.kubeconfig", "rsh", "--namespace=glusterfs", "deploy-heketi-storage-1-562fd", "heketi-cli", "-s", "http://localhost:8080", "--user", "admin", "--secret", "ufPFoif6F5UP/e/w8vOIKZ5xQUfimkyvg+wunYcT6/Q=", "topology", "load", "--json=/tmp/openshift-glusterfs-ansible-BQ4KQX/topology.json", "2>&1"], "delta": "0:00:04.942068", "end": "2019-01-21 05:02:11.234561", "failed_when_result": true, "rc": 0, "start": "2019-01-21 05:02:06.292493", "stderr": "", "stderr_lines": [], "stdout": "Creating cluster ... ID: 8934e337e78817423a56e7d6cc7d0e3f\n\tAllowing file volumes on cluster.\n\tAllowing block volumes on cluster.\n\tCreating node flexy-gluster-mchomaglusterfs-node-1 ... ID: fdff635ada4832278cea78a827dc1042\n\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.\n\tCreating node flexy-gluster-mchomaglusterfs-node-2 ... ID: 2232b6d20102586b56210a1200c2028b\n\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.\n\tCreating node flexy-gluster-mchomaglusterfs-node-3 ... ID: 63f4b4900c1e2b8bd64d8a16fb72aa6e\n\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.", "stdout_lines": ["Creating cluster ... ID: 8934e337e78817423a56e7d6cc7d0e3f", "\tAllowing file volumes on cluster.", "\tAllowing block volumes on cluster.", "\tCreating node flexy-gluster-mchomaglusterfs-node-1 ... ID: fdff635ada4832278cea78a827dc1042", "\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.", "\tCreating node flexy-gluster-mchomaglusterfs-node-2 ... ID: 2232b6d20102586b56210a1200c2028b", "\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found.", "\tCreating node flexy-gluster-mchomaglusterfs-node-3 ... ID: 63f4b4900c1e2b8bd64d8a16fb72aa6e", "\t\tAdding device /dev/vsda ... Unable to add device: Device /dev/vsda not found."]}

Expected results:
CNS installation succeeds.

Additional info:

$ cat /tmp/openshift-glusterfs-ansible-BQ4KQX/topology.json 
{
  "clusters": [{
      "nodes": [{
          "node": {
            "hostnames": {
              "manage": ["flexy-gluster-mchomaglusterfs-node-1"],
              "storage": ["172.16.120.54"]
            },
            "zone": 1
          },
          "devices": ["/dev/vsda"]
        },{
          "node": {
            "hostnames": {
              "manage": ["flexy-gluster-mchomaglusterfs-node-2"],
              "storage": ["172.16.120.4"]
            },
            "zone": 1
          },
          "devices": ["/dev/vsda"]
        },{
          "node": {
            "hostnames": {
              "manage": ["flexy-gluster-mchomaglusterfs-node-3"],
              "storage": ["172.16.120.88"]
            },
            "zone": 1
          },
          "devices": ["/dev/vsda"]
        }]
    }]
}
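
The failing step can be re-run by hand against the deploy-heketi pod while debugging. This mirrors the command from the failure output above; the pod name, secret, and file path are placeholders for the values in your own run:

oc rsh --namespace=glusterfs <deploy-heketi-pod> \
    heketi-cli -s http://localhost:8080 --user admin --secret '<admin-secret>' \
    topology load --json=<path-to-topology.json>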

Device on the gluster node host:
# ls /dev/vsda -l
lrwxrwxrwx. 1 root root 10 Jan 21 04:37 /dev/vsda -> /dev/loop0

Device in the glusterfs-storage pod:
# oc exec glusterfs-storage-kjmfm -- ls /dev/vsda
ls: cannot access /dev/vsda: No such file or directory
command terminated with exit code 2

# oc exec glusterfs-storage-kjmfm -- ls /mnt/host-dev/vsda -l
lrwxrwxrwx. 1 root root 10 Jan 21 09:37 /mnt/host-dev/vsda -> /dev/loop0

# oc get ds glusterfs-storage -oyaml
apiVersion: extensions/v1beta1
...
        volumeMounts:
        ...
        - mountPath: /mnt/host-dev
          name: glusterfs-dev
        ...
      volumes:
      ...
      - hostPath:
          path: /dev
          type: ""
        name: glusterfs-dev
...
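
The daemonset excerpt above shows the root cause: the pod mounts the host's /dev at /mnt/host-dev, so a host-only device path such as /dev/vsda is not visible under /dev inside the pod. A quick way to check every gluster pod at once (a sketch; the glusterfs=storage-pod label selector is an assumption based on the default openshift-ansible template):

for pod in $(oc get pods -n glusterfs -l glusterfs=storage-pod -o name); do
  echo "== ${pod} =="
  # the first path is expected to fail, the second to resolve
  oc exec -n glusterfs "${pod#pod/}" -- ls -l /dev/vsda /mnt/host-dev/vsda
done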

Comment 1 Marek Schmidt 2019-01-24 09:58:57 UTC
Does the problem only occur when the specified glusterfs device is a symlink?

Comment 2 Jose A. Rivera 2019-01-24 13:38:45 UTC
I believe this PR should resolve things: https://github.com/openshift/openshift-ansible/pull/11068

Comment 3 Niels de Vos 2019-01-24 14:14:27 UTC
What version of OCS are you installing? Anything before ocs-3.11.1 should work with the PR from comment #2; ocs-3.11.1 and newer are expected to work with openshift-ansible-3.11.72.

Comment 4 Aleksandar Kostadinov 2019-01-24 14:36:01 UTC
Qin Ping reports 3.11.72 in the description. Past failures I have seen show it also happening with v3.11.59.

Comment 6 Niels de Vos 2019-01-24 15:01:36 UTC
(In reply to Niels de Vos from comment #3)
> What version of OCS are you installing? Anything before ocs-3.11.1 should
> work with the PR from comment #2; ocs-3.11.1 and newer are expected to work
> with openshift-ansible-3.11.72.

The inventory has this:

openshift_storage_glusterfs_image=registry.access.redhat.com/rhgs3/rhgs-server-rhel7

And https://access.redhat.com/containers/?tab=overview#/registry.access.redhat.com/rhgs3/rhgs-server-rhel7 currently lists 3.11.0-6 as version.

This means that https://github.com/openshift/openshift-ansible/pull/11068 is expected to address the problem. You would need to provide the :3.11.0-6 tag to openshift_storage_glusterfs_image, as (the working) :latest is not yet released and available from registry.access.redhat.com.
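
In the inventory, that would look like the following line (a sketch; exact variable handling can differ between openshift-ansible releases, some of which split the tag into a separate openshift_storage_glusterfs_version variable):

openshift_storage_glusterfs_image=registry.access.redhat.com/rhgs3/rhgs-server-rhel7:3.11.0-6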

Comment 7 Qin Ping 2019-01-25 06:59:14 UTC
Agree.

Workaround:
If you have an openshift-ansible version whose template consumes the HOST_DEV_DIR variable, you can set it to "/dev" when the template gets processed. That is what https://github.com/openshift/openshift-ansible/pull/11068/files does as well.
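
A sketch of that workaround (both the template path and the assumption that HOST_DEV_DIR is exposed as a template parameter depend on the openshift-ansible release in use):

# re-process the GlusterFS template so the host's /dev is mounted at /dev
# (the template path may differ in your checkout)
oc process -n glusterfs -f roles/openshift_storage_glusterfs/files/glusterfs-template.yml \
    -p HOST_DEV_DIR=/dev | oc apply -n glusterfs -f -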

Comment 8 Scott Dodson 2019-01-28 14:46:53 UTC
Moving to ON_QA as the referenced PR is in openshift-ansible-3.11.74-1 and later.

Comment 10 Qin Ping 2019-02-11 03:37:15 UTC
Still got the same error.

# oc version
oc v3.11.74
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
openshift v3.11.74
kubernetes v1.11.0+d4cacc0

# oc exec glusterfs-storage-8pw87 -- rpm -qa|grep gluster
python2-gluster-3.12.2-25.el7rhgs.x86_64
glusterfs-server-3.12.2-25.el7rhgs.x86_64
gluster-block-0.2.1-28.el7rhgs.x86_64
glusterfs-api-3.12.2-25.el7rhgs.x86_64
glusterfs-cli-3.12.2-25.el7rhgs.x86_64
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64
glusterfs-libs-3.12.2-25.el7rhgs.x86_64
glusterfs-3.12.2-25.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64


[glusterfs]
host-1 ansible_user=root ansible_ssh_user=root glusterfs_devices="['/dev/vsda']"
host-2 ansible_user=root ansible_ssh_user=root glusterfs_devices="['/dev/vsda']"
host-3 ansible_user=root ansible_ssh_user=root glusterfs_devices="['/dev/vsda']"

Comment 11 Jose A. Rivera 2019-02-13 13:32:40 UTC
Please reproduce the issue and grab the output of "oc logs <heketi_pod>".

Comment 12 Qin Ping 2019-02-14 03:57:45 UTC
Sorry for the last comment.

I re-installed CNS today and it succeeded.

So I am marking this as verified.

Verified with:
openshift-ansible-3.11.74-1.git.0.cde4c69.el7.noarch.rpm

Comment 14 errata-xmlrpc 2019-03-14 02:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0407

