Bug 1177651 - [RHEL7.0] oVirt fails to create glusterfs domain
Summary: [RHEL7.0] oVirt fails to create glusterfs domain
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Allon Mureinik
QA Contact: Ori Gofen
URL:
Whiteboard:
Duplicates: 1165215 (view as bug list)
Depends On: 1181111 1185867
Blocks: 1165243 1205583
TreeView+ depends on / blocked
 
Reported: 2014-12-29 16:30 UTC by Ori Gofen
Modified: 2016-04-20 01:34 UTC (History)
20 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1181111 1205583 (view as bug list)
Environment:
Last Closed: 2016-04-20 01:34:02 UTC
oVirt Team: Gluster
Target Upstream Version:
Embargoed:
amureini: needinfo+


Attachments
logs (5.91 MB, application/x-gzip)
2014-12-29 16:30 UTC, Ori Gofen
no flags Details
logs (1.16 MB, application/x-gzip)
2015-01-04 08:39 UTC, Ori Gofen
no flags Details
logs (558.19 KB, application/x-gzip)
2015-01-07 11:33 UTC, Ori Gofen
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 38414 0 master MERGED spec: upgrade selinux dependency for gluster 2020-11-05 18:00:43 UTC

Description Ori Gofen 2014-12-29 16:30:08 UTC
Created attachment 974100 [details]
logs

Description of problem:

Creating a glusterfs domain fails if the target host is running RHEL 7.0.
I have tried most glusterfs package versions from 3.4.0 to 3.6; RHEL 7.1 and RHEL 6.5 work fine.

The error on the UI:
Error while executing action Add Storage Connection: Problem while trying to mount target

Mounting the target manually from the host succeeds:

root@dhcp-2-53 ~ # mount gluster-storage-01.scl.lab.tlv.redhat.com:ogofen5 /mnt
root@dhcp-2-53 ~ # cd /mnt  
root@dhcp-2-53 /mnt # ls   
9d4ba905-858b-465a-a1e1-59441d62ecec  __DIRECT_IO_TEST__    
  
Version-Release number of selected component (if applicable):
vdsm-xmlrpc-4.16.8.1-4.el7ev.noarch
vdsm-jsonrpc-4.16.8.1-4.el7ev.noarch
vdsm-python-zombiereaper-4.16.8.1-4.el7ev.noarch
vdsm-cli-4.16.8.1-4.el7ev.noarch
vdsm-4.16.8.1-4.el7ev.x86_64
vdsm-python-4.16.8.1-4.el7ev.noarch
vdsm-yajsonrpc-4.16.8.1-4.el7ev.noarch

glusterfs-api-devel-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-fuse-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-rdma-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-libs-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-debuginfo-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-devel-3.4.0.65rhs-1.el7_0.x86_64
glusterfs-api-3.4.0.65rhs-1.el7_0.x86_64

*Note:* I have tried all available versions.

How reproducible:
100%

Steps to Reproduce:
1. Create a glusterfs domain on a DC with one host running RHEL 7.0


Actual results:
oVirt displays an error window stating there was a problem mounting the target, although there is no mounting problem at all.

Expected results:
The operation succeeds.

Additional info:

Comment 2 Tal Nisan 2014-12-30 11:34:00 UTC
As seen in VDSM log, VDSM issues the correct mount command, however, this command fails with the error:

Thread-640::DEBUG::2014-12-29 18:20:27,473::mount::227::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/mount -t glusterfs 10.35.160.6:ogofen4 /rhev/data-center/mnt/glusterSD/10.35.160.6:ogofen4 (cwd None)
Thread-640::ERROR::2014-12-29 18:20:27,784::storageServer::211::Storage.StorageServer.MountConnection::(connect) Mount failed: (1, 'Mount failed. Please check the log file for more details.\n;')
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storageServer.py", line 209, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 223, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 239, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (1, 'Mount failed. Please check the log file for more details.\n;')
Thread-640::ERROR::2014-12-29 18:20:27,785::hsm::2424::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2421, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 217, in connect
    raise e
MountError: (1, 'Mount failed. Please check the log file for more details.\n;')




In the Gluster logs (rhev-data-center-mnt-glusterSD-10.35.160.6:ogofen4.log) there is a permission denied error:

[2014-12-29 16:20:27.756326] I [rpc-clnt.c:1690:rpc_clnt_reconfig] 0-ogofen4-client-0: changing port to 49259 (from 0)
[2014-12-29 16:20:27.759098] E [socket.c:2793:socket_connect] 0-ogofen4-client-0: connection attempt on 10.35.160.203:24007 failed, (Permission denied)
[2014-12-29 16:20:27.759530] I [rpc-clnt.c:1690:rpc_clnt_reconfig] 0-ogofen4-client-1: changing port to 49315 (from 0)
[2014-12-29 16:20:27.763127] E [socket.c:2793:socket_connect] 0-ogofen4-client-1: connection attempt on 10.35.160.6:24007 failed, (Permission denied)


Not sure if this is an environment problem or an actual Gluster issue, moving to the Gluster team for further inspection. Sahina, can someone from your team please have a look?

Comment 3 Bala.FA 2015-01-02 05:26:00 UTC
Looks like the volume is not started, as per the log. Can you confirm whether it's running with

# gluster volume status <volume-name>

?

Comment 4 Sahina Bose 2015-01-02 05:57:39 UTC
Also, does the volume have the "server.allow-insecure on" option set, and is "rpc-auth-allow-insecure on" set in glusterd.vol?

Comment 5 Ori Gofen 2015-01-04 08:39:20 UTC
Created attachment 975918 [details]
logs

(In reply to Bala.FA from comment #3)
> Looks like the volume is not started as per the log.  Can you confirm
> whether its running by
> 
> # gluster volume status <volume-name>
> 
> ?

[root@gluster-storage-01 ~]# gluster volume info ogofen
 
Volume Name: ogofen
Type: Distribute
Volume ID: 8e65ff13-552f-46d7-853b-d46e43d25b37
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.35.160.6:/export/ogofen
Brick2: 10.35.160.203:/export/ogofen
Brick3: 10.35.160.202:/export/ogofen
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36

The volume is started. I have also successfully created a domain with a host running RHEL 7.1 (10.35.102.86), and failed with the host running RHEL 7.0 (10.35.2.51).

Comment 6 Ori Gofen 2015-01-04 08:56:33 UTC
(In reply to Sahina Bose from comment #4)
> Also, does the volume have the "server.allow-insecure on" option set and the
> option "rpc-auth-allow-insecure on" in glusterd.vol

I have attempted to create a glusterfs domain with the flags you suggested (even though we have never used them before and it worked fine):

gluster volume set ogofen1 server.allow-insecure on
gluster volume set ogofen1 rpc-auth-allow on 

with the same results as above.

Comment 7 Sahina Bose 2015-01-05 07:29:16 UTC
KP, could you help?

thanks!

Comment 8 Vijay Bellur 2015-01-07 06:29:18 UTC
Does /etc/glusterfs/glusterd.vol on all nodes contain "option rpc-auth-allow-insecure on"? If not, please add this line and restart glusterd.
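Taken together with comment 4, the change described here can be scripted. The sketch below is mine, not from the bug: the `enable_insecure_rpc` helper and its test-path parameter are invented for illustration, and the commented restart line assumes the RHEL 7 service name.

```shell
#!/usr/bin/env bash
# Sketch only: idempotently add "option rpc-auth-allow-insecure on" to
# glusterd.vol. Run on every gluster node, then restart glusterd.
enable_insecure_rpc() {
    local conf="${1:-/etc/glusterfs/glusterd.vol}"
    if ! grep -q 'option rpc-auth-allow-insecure on' "$conf"; then
        # insert the option just before the closing end-volume line
        sed -i '/^end-volume/i\    option rpc-auth-allow-insecure on' "$conf"
    fi
}

# enable_insecure_rpc && systemctl restart glusterd   # "service glusterd restart" on RHEL 6
```

Running it twice leaves a single copy of the option, so it is safe to re-run while debugging.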

Comment 9 Ori Gofen 2015-01-07 11:33:34 UTC
Created attachment 977219 [details]
logs

Reproduced again after rechecking that the whole flow follows the documentation and the comments here.

Verified the volume is started:

[root@gluster-storage-03 ~]# gluster volume info ogofen4
 
Volume Name: ogofen4
Type: Distribute
Volume ID: 8e65ff13-552f-46d7-853b-d46e43d25b37
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.35.160.6:/export/ogofen4
Brick2: 10.35.160.203:/export/ogofen4
Brick3: 10.35.160.202:/export/ogofen4
Options Reconfigured:
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36


Verified the glusterd configuration on all servers:

[root@gluster-storage-01 ogofen4]# cat /etc/glusterfs/glusterd.vol                                                                                                                              
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option rpc-auth-allow-insecure on
#   option base-port 49152
end-volume

[root@gluster-storage-02 ogofen4]# cat /etc/glusterfs/glusterd.vol                                                                                                                              
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option rpc-auth-allow-insecure on
#   option base-port 49152
end-volume

[root@gluster-storage-03 ogofen4]# cat /etc/glusterfs/glusterd.vol                                                                                                                              
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option rpc-auth-allow-insecure on
#   option base-port 49152
end-volume

Have attempted to create glusterfs Domain twice, first time using a Rhel7.0 host (echoed "Add glusterfs domain with option rpc-auth-allow-insecure on [RHEL7]" to engine's log), second time using a Rhel6.6 host (echoed "Add glusterfs domain with option rpc-auth-allow-insecure on [RHEL7]" to engine's log).

the results are the same, while Rhel6.6 running same vdsm version succeed to create domain, Rhel7 host fails.

Comment 10 Vijay Bellur 2015-01-08 17:49:20 UTC
(In reply to Ori Gofen from comment #9)

> 
> Have attempted to create glusterfs Domain twice, first time using a Rhel7.0
> host (echoed "Add glusterfs domain with option rpc-auth-allow-insecure on
> [RHEL7]" to engine's log), second time using a Rhel6.6 host (echoed "Add
> glusterfs domain with option rpc-auth-allow-insecure on [RHEL7]" to engine's
> log).
> 
> the results are the same, while Rhel6.6 running same vdsm version succeed to
> create domain, Rhel7 host fails.

I still see this in the log file:

    [2015-01-07 11:01:47.440336] E [socket.c:2903:socket_connect] 0-ogofen4-client-0: connection attempt on 10.35.160.203:24007 failed, (Permission denied)
    [2015-01-07 11:01:47.440398] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-ogofen4-client-1: changing port to 49315 (from 0)
    [2015-01-07 11:01:47.444724] E [socket.c:2903:socket_connect] 0-ogofen4-client-1: connection attempt on 10.35.160.6:24007 failed, (Permission denied)

    Can we check if telnet 10.35.160.203:24007 from the RHEL 7 host to see if it goes through? If not, it might be good to check the firewall rules on the RHEL 7 host.

Comment 11 Ori Gofen 2015-01-11 12:56:11 UTC
(In reply to Vijay Bellur from comment #10)
> I still see this in the log file:
> 
>     [2015-01-07 11:01:47.440336] E [socket.c:2903:socket_connect]
> 0-ogofen4-client-0: connection attempt on 10.35.160.203:24007 failed,
> (Permission denied)
>     [2015-01-07 11:01:47.440398] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
> 0-ogofen4-client-1: changing port to 49315 (from 0)
>     [2015-01-07 11:01:47.444724] E [socket.c:2903:socket_connect]
> 0-ogofen4-client-1: connection attempt on 10.35.160.6:24007 failed,
> (Permission denied)
> 
>     Can we check if telnet 10.35.160.203:24007 from the RHEL 7 host to see
> if it goes through? If not, it might be good to check the firewall rules on
> the RHEL 7 host.

The telnet result is similar on both the RHEL 7 and RHEL 6.6 hosts:

root@purple-vds1 ~ # telnet 10.35.160.203:24007   <-- RHEL7
telnet: 10.35.160.203:24007: Name or service not known
10.35.160.203:24007: Unknown host

root@adder ~ # telnet 10.35.160.203:24007         <-- RHEL6
telnet: 10.35.160.203:24007: Name or service not known
10.35.160.203:24007: Unknown host

In addition, I have removed all iptables exceptions (from the hypervisor and the glusterfs servers) in order to avoid any firewall issues.

Comment 12 Vijay Bellur 2015-01-11 16:11:47 UTC
(In reply to Ori Gofen from comment #11)
> the telnet result is similar on both RHEL7, RHEL6.6 hosts
> 
> root@purple-vds1 ~ # telnet 10.35.160.203:24007   <-- RHEL7
> telnet: 10.35.160.203:24007: Name or service not known
> 10.35.160.203:24007: Unknown host
> 
> root@adder ~ # telnet 10.35.160.203:24007         <-- RHEL6
> telnet: 10.35.160.203:24007: Name or service not known
> 10.35.160.203:24007: Unknown host
> 

Please use a space between hostname and port instead of ':'. In addition, are there any selinux denials observed in RHEL 7?
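The two checks suggested here can be combined in a small sketch. The `check_port` helper is mine, not from the bug; bash's built-in `/dev/tcp` stands in for telnet, and the point of the correction above is that host and port are separate words, not `host:port`.

```shell
# Sketch only: probe the glusterd management port the way comment 12 suggests.
check_port() {
    local host="$1" port="$2"
    # bash's /dev/tcp avoids needing telnet installed; give up after 3 seconds
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

if check_port 10.35.160.203 24007; then
    echo "glusterd port reachable"
else
    echo "blocked -- check firewall and SELinux"
fi

# Recent SELinux denials on the RHEL 7 host can be listed with:
#   ausearch -m avc -ts recent
```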

Comment 13 Ori Gofen 2015-01-12 12:04:09 UTC
(In reply to Vijay Bellur from comment #12)
> Please use a space between hostname and port instead of ':'. In addition,
> are there any selinux denials observed in RHEL 7?

You are right about the selinux denials; this issue has been moved to the RHEL 7
bz #1181111. Setting this bug as blocked by bz #1181111, or it can also be closed as far as I know.

Comment 14 Allon Mureinik 2015-01-14 14:06:15 UTC
Bug 1181111 will (probably) be solved by building a new selinux-policy rpm. This bug will be used to track the need for a patch to vdsm's spec file to require it.

Comment 15 Yaniv Lavi 2015-01-15 10:12:47 UTC
*** Bug 1165215 has been marked as a duplicate of this bug. ***

Comment 16 Eyal Edri 2015-01-25 09:18:07 UTC
Allon, can we move it to 3.5.0-1 for now, since we're not going to be respinning for GA even if the selinux-policy rpm is ready?

Comment 17 Allon Mureinik 2015-01-25 17:28:15 UTC
(In reply to Eyal Edri from comment #16)
> Allon, can we move it to 3.5.0-1 for now, since we're not going to be
> respinning for GA even if the selinux-policy rpm is ready?
Agreed. Tentatively targeting to 3.5.0-1 with hope of getting the selinux-policy RPM. 

IMHO, BTW, this should not be a blocker for 3.5.0-1 either, as there's an easy workaround (see doctext).

Comment 18 Eyal Edri 2015-01-29 14:02:31 UTC
Still blocked on selinux; doesn't seem to be converging for 3.5.0-1.
Moving to z-stream.

Comment 19 Eyal Edri 2015-02-25 08:40:24 UTC
3.5.1 is already full of bugs (over 80), and since none of these bugs was marked urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.

Comment 20 Allon Mureinik 2015-03-25 09:10:03 UTC
Andrew - the enclosed doctext is perfect for RHEV 3.5.0, but should be amended looking forward.

In RHEV 3.6.0 (and in some 3.5.z zstream, not sure which yet - probably 3.5.3), "yum install vdsm" or "yum upgrade vdsm" will pull the appropriate selinux dependency for the RHEL channel and turn this bug into a moot point.

Not quite sure how this should be handled process-wise - please advise.

Comment 22 Andrew Dahms 2015-03-25 10:35:25 UTC
Hi Allon,

Thank you for letting me know about this bug. Perhaps we can ensure it is added as a known issue to the release notes for now, and track its resolution in future releases? When it is resolved, we can remove the release note.

Let me know what you think.

Kind regards,

Andrew

Comment 23 Allon Mureinik 2015-03-25 12:14:14 UTC
(In reply to Andrew Dahms from comment #22)
> Hi Allon,
> 
> Thank you for letting me know about this bug. Perhaps we can ensure it is
> added as a known issue to the release notes for now, and track its
> resolution in future releases? When it is resolved, we can remove the
> release note.

Sounds good to me, thanks.
How do you want to track this?
Currently, RHEV engineering has two bugs for the issue:
1. bug 1177651 (this issue) for 3.6.0 - by then the issue will already be fixed (note the bug's status is MODIFIED)
2. bug 1205583 for 3.5.3. When 3.5.3 is released, the issue will be fixed. Unless it is moved back to 3.5.1 (unlikely), we'll need the release note you mentioned for 3.5.1.

Are these two enough, or do we need a separate docs bug for 3.5.1 in case 1205583 is only solved in 3.5.3?

Comment 24 Andrew Dahms 2015-03-27 05:48:24 UTC
Hi Allon,

The two bugs should be enough.

I will keep track of the two bugs and their status, and will also make sure to speak with you about whether we can remove the note in further z-stream releases as well.

Does that sound ok?

For the time being, I will add the doc text to the release notes so that the known issue is covered there.

Kind regards,

Andrew

Comment 25 Ori Gofen 2015-04-16 11:27:46 UTC
Verified.

Comment 26 Lucy Bopf 2015-09-28 05:50:51 UTC
Hi Allon,

I am tracking this bug for the 3.6 beta release notes. As your conversation with Andrew in comment 22 and comment 23 suggests, should I remove the doc text for this bug now that it is verified? Is there anything here that users need to know that we should include in the release notes?

Kind Regards,

Lucy

Comment 27 Allon Mureinik 2015-10-06 11:30:31 UTC
Hi Lucy,

Sorry for the late reply - I was on PTO for the local holidays. In RHEV 3.6.0 there's nothing to document for this bug. Simply installing VDSM will pull the relevant gluster and selinux libraries so creating a gluster storage domain is possible.

In short - this release note should be removed in 3.6.0 [beta].
Thanks!

Comment 28 Lucy Bopf 2015-10-07 03:14:39 UTC
Thanks, Allon. I'll set the 'requires_release_note' flag back to blank (I can't seem to set it to '-'), and remove the text now.

