Bug 1165215 - [vdsm] connectStorageServer fails on a RHEL7 host while trying to connect to a gluster domain which was created using a RHEL6 host
Summary: [vdsm] connectStorageServer fails on a RHEL7 host while trying to connect to ...
Keywords:
Status: CLOSED DUPLICATE of bug 1177651
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Adam Litke
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Duplicates: 1165243 (view as bug list)
Depends On:
Blocks: 1165243
 
Reported: 2014-11-18 15:17 UTC by Elad
Modified: 2016-02-10 19:12 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1165243 (view as bug list)
Environment:
Last Closed: 2015-01-15 10:12:46 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
amureini: needinfo-


Attachments
/var/log from both hosts (9.00 MB, application/x-gzip)
2014-11-18 15:17 UTC, Elad
sosreport (11.70 MB, application/x-gzip)
2014-12-02 11:37 UTC, Elad
debug level logs (13.02 MB, application/x-gzip)
2014-12-04 14:38 UTC, Elad
gluster logs (439.75 KB, application/x-gzip)
2014-12-07 09:21 UTC, Elad

Description Elad 2014-11-18 15:17:21 UTC
Created attachment 958622 [details]
/var/log from both hosts

Description of problem:
I tried to add a new RHEL7 host to a DC that already had a host with RHEL6.6 installed. The DC contained a gluster domain which had been created using the RHEL6.6 host. The new RHEL7 host was unable to connect to the storage pool due to a mount failure for the gluster domain.

Version-Release number of selected component (if applicable):

rhev 3.5 vt10

On the RHEL7 host:
Red Hat Enterprise Linux Server release 7.0 (Maipo)
vdsm-4.16.7.4-1.el7ev.x86_64
scsi-target-utils-gluster-1.0.46-3.el7.x86_64
glusterfs-api-devel-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-api-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-3.4.0.65rhs-1.el7_0.x86_64
samba-vfs-glusterfs-4.1.1-37.el7_0.x86_64
glusterfs-rdma-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-debuginfo-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-libs-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-fuse-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-devel-3.4.0.65rhs-1.el7_0.x86_64
libvirt-daemon-1.1.1-29.el7_0.3.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.10.x86_64
sanlock-3.1.0-2.el7.x86_64

On the RHEL6.6 host:
Red Hat Enterprise Linux Server release 6.6 (Santiago)
vdsm-4.16.7.4-1.el6ev.x86_64
glusterfs-api-devel-3.6.0.28-2.el6.x86_64
glusterfs-cli-3.6.0.28-2.el6.x86_64
glusterfs-devel-3.6.0.28-2.el6.x86_64
samba-glusterfs-3.6.23-12.el6.x86_64
glusterfs-rdma-3.6.0.28-2.el6.x86_64
glusterfs-debuginfo-3.4.0.57rhs-1.el6_5.x86_64
glusterfs-libs-3.6.0.28-2.el6.x86_64
glusterfs-api-3.6.0.28-2.el6.x86_64
glusterfs-3.6.0.28-2.el6.x86_64
glusterfs-fuse-3.6.0.28-2.el6.x86_64
libvirt-0.10.2-46.el6_6.1.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
sanlock-2.8-1.el6.x86_64

rhevm-3.5.0-0.20.el6ev.noarch


How reproducible:
Always

Steps to Reproduce:
Have 2 hosts: one with RHEL6.6 and one with RHEL7. Install all the gluster packages mentioned above.
1. Create a gluster domain using a RHEL6.6 host 

Thread-52::DEBUG::2014-11-18 15:40:15,924::fileUtils::142::Storage.fileUtils::(createdir) Creating directory: /rhev/data-center/mnt/glusterSD/10.35.160.202:_elad1
Thread-52::DEBUG::2014-11-18 15:40:15,925::mount::227::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /bin/mount -t glusterfs 10.35.160.202:/elad1 /rhev/data-center/mnt/glusterSD/10.35.160.202:_elad1 (cwd None)

2. Add a new cluster to the DC and add a RHEL7 host to it

Actual results:
Host is unable to connect to the storage pool:

vdsm.log:

Thread-342::DEBUG::2014-11-18 17:00:04,106::mount::227::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/mount -t glusterfs 10.35.160.202:/elad1 /rhev/data-center/mnt/glusterSD/10.35.160.202:_elad1 (cwd None)
Thread-342::ERROR::2014-11-18 17:00:04,189::storageServer::211::Storage.StorageServer.MountConnection::(connect) Mount failed: (1, 'Mount failed. Please check the log file for more details.\n;')
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storageServer.py", line 209, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 223, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 239, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (1, 'Mount failed. Please check the log file for more details.\n;')
Thread-342::ERROR::2014-11-18 17:00:04,189::hsm::2433::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2430, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 217, in connect
    raise e
MountError: (1, 'Mount failed. Please check the log file for more details.\n;')
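
Note: the generic 'Mount failed. Please check the log file for more details.' message comes from the mount.glusterfs helper; the log it points to is the GlusterFS client mount log on the host, not vdsm.log. Assuming the default client log directory (/var/log/glusterfs) and the file naming derived from the VDSM mount point (see also comment 8), it can be located with something like:

# ls /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log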


engine.log:

2014-11-18 16:50:03,570 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-37) [3a4fb9aa] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Failed to connect Host green-vdsc to the Storage Domains gluster1.
2014-11-18 16:50:03,571 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-7-thread-37) [3a4fb9aa] FINISH, ConnectStorageServerVDSCommand, return: {a768d625-9c70-4440-b81b-18a1d9494221=477}, log id: b513854
2014-11-18 16:50:03,588 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-37) [3a4fb9aa] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: The error message for connection 10.35.160.202:/elad1 returned by VDSM was: Problem while trying to mount target
2014-11-18 16:50:03,612 ERROR [org.ovirt.engine.core.bll.storage.GLUSTERFSStorageHelper] (org.ovirt.thread.pool-7-thread-37) [3a4fb9aa] The connection with details 10.35.160.202:/elad1 failed because of error code 477 and error message is: problem while trying to mount target



Tried to add a new host with RHEL6 installed to the DC. It succeeded.

The problem occurs only on RHEL7 hosts.
Mounting the gluster volume manually from the host succeeds:


[root@green-vdsc ~]# mount -t glusterfs 10.35.160.202:/elad1 /mnt
[root@green-vdsc ~]# mount
10.35.160.202:/elad1 on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
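
Note that the manual mount above is not identical to the way VDSM runs it; per the vdsm.log line quoted earlier, VDSM invokes the mount through sudo into the domain's mount point. Reproducing that exact invocation (paths and address copied from the log, shown here only for illustration) would look roughly like:

[root@green-vdsc ~]# mkdir -p /rhev/data-center/mnt/glusterSD/10.35.160.202:_elad1
[root@green-vdsc ~]# sudo -n /usr/bin/mount -t glusterfs 10.35.160.202:/elad1 /rhev/data-center/mnt/glusterSD/10.35.160.202:_elad1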

Expected results:
Connecting a RHEL7 host to a gluster domain should succeed even if the domain was created by a RHEL6 host.

Additional info: 
/var/log from both hosts
engine.log

Comment 1 Allon Mureinik 2014-11-24 17:23:25 UTC
*** Bug 1165243 has been marked as a duplicate of this bug. ***

Comment 2 Allon Mureinik 2014-11-30 13:50:46 UTC
Seems like an inconsistency in glusterfs - can be handled async if needed.

Comment 3 Timothy Asir 2014-12-02 11:04:53 UTC
Can you also provide all the glusterd and brick logs?
You can run sosreport to get the logs.
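
For example, a minimal non-interactive run (exact options may vary between sosreport versions):

# sosreport --batch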

Comment 4 Elad 2014-12-02 11:37:55 UTC
Created attachment 963654 [details]
sosreport

A sosreport from the Gluster storage server is attached.
vdsm.log is also attached.

The relevant volume is named 'elad'. It has bricks in /export/elad on the peers.

Comment 5 Elad 2014-12-02 11:39:28 UTC
A reference from vdsm.log:

Thread-5539::ERROR::2014-12-02 13:28:57,096::hsm::2531::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2527, in disconnectStorageServer
    conObj.disconnect()
  File "/usr/share/vdsm/storage/storageServer.py", line 235, in disconnect
    self._mount.umount(True, True)
  File "/usr/share/vdsm/storage/mount.py", line 254, in umount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 239, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';umount: /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad: not mounted\n')

Comment 6 Timothy Asir 2014-12-04 07:24:58 UTC
Could you please set the client log level to DEBUG before recreating the issue
and attach the log?

To set client log level to DEBUG for a volume,
#gluster volume set VOLNAME diagnostics.client-log-level DEBUG
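
For example, assuming the volume from comment 4 ('elad'), and resetting the option back to its default after the debug logs have been collected:

# gluster volume set elad diagnostics.client-log-level DEBUG
# gluster volume info elad
# gluster volume reset elad diagnostics.client-log-level

While the option is set, 'gluster volume info' should list diagnostics.client-log-level under "Options Reconfigured".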

Comment 7 Elad 2014-12-04 14:38:50 UTC
Created attachment 964694 [details]
debug level logs

(In reply to Timothy Asir from comment #6)
> Could you please set the client log level to DEBUG before recreating the issue
> and attach the log?
> 
> To set client log level to DEBUG for a volume,
> #gluster volume set VOLNAME diagnostics.client-log-level DEBUG

Reproduced with the DEBUG log level set on the volume. The volume is 'elad2'.

Comment 8 Timothy Asir 2014-12-05 07:45:04 UTC
Please attach the latest rhev mount log from the el7 host
(rhev-data-center-mnt-glusterSD-10.35.160.202:_elad2.log).

Comment 9 Elad 2014-12-07 09:21:05 UTC
Created attachment 965528 [details]
gluster logs

Comment 10 Elad 2014-12-08 10:36:33 UTC
I think the issue here occurs due to SELinux on the RHEL7 host.
I checked it again with SELinux in permissive mode and the problem does not reproduce; the domain is created successfully.
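
For reference, the permissive-mode check and confirmation of AVC denials can be done with the standard SELinux tooling, e.g.:

[root@puma20 ~]# getenforce
[root@puma20 ~]# setenforce 0                # permissive until the next reboot
[root@puma20 ~]# ausearch -m avc -ts recent  # recent AVC denials (requires auditd)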

These are the SELinux RPMs installed on my RHEL7 host:

[root@puma20 ~]# rpm -qa |grep selinux
libselinux-2.2.2-6.el7.x86_64
libselinux-ruby-2.2.2-6.el7.x86_64
libselinux-utils-2.2.2-6.el7.x86_64
libselinux-python-2.2.2-6.el7.x86_64
selinux-policy-3.12.1-153.el7_0.12.noarch
selinux-policy-targeted-3.12.1-153.el7_0.12.noarch

[root@puma20 ~]# rpm -qa |grep gluster
glusterfs-api-3.5.3-1.el7.x86_64
glusterfs-rdma-3.5.3-1.el7.x86_64
glusterfs-debuginfo-3.5.3-1.el7.x86_64
glusterfs-3.5.3-1.el7.x86_64
glusterfs-extra-xlators-3.5.3-1.el7.x86_64
glusterfs-api-devel-3.5.3-1.el7.x86_64
glusterfs-libs-3.5.3-1.el7.x86_64
glusterfs-devel-3.5.3-1.el7.x86_64
glusterfs-fuse-3.5.3-1.el7.x86_64


vdsm-4.16.8.1-2.el7ev.x86_64


Adam, can you take a look?

Comment 11 Yaniv Lavi 2015-01-14 15:33:00 UTC
Is this a dup of BZ #1177651?
That was also an SELinux issue.

Comment 12 Elad 2015-01-15 07:54:58 UTC
Seems like an SELinux issue in both BZs.

Comment 13 Yaniv Lavi 2015-01-15 09:04:42 UTC
(In reply to Elad from comment #12)
> Seems like an SELinux issue in both BZs.

Can you try to recreate using 7.1 hosts?

Comment 14 Elad 2015-01-15 09:58:11 UTC
This was already tested by Ori as stated here https://bugzilla.redhat.com/show_bug.cgi?id=1181111#c10.

Comment 15 Yaniv Lavi 2015-01-15 10:12:46 UTC

*** This bug has been marked as a duplicate of bug 1177651 ***

