Bug 1245147

Summary: Error while adding second gluster storage domain
Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Version: 3.5.4
Target Release: 3.5.5
Status: CLOSED WORKSFORME
Severity: medium
Priority: unspecified
Reporter: Raz Tamir <ratamir>
Assignee: Nobody <nobody>
CC: acanan, ahino, amureini, danken, ecohen, eedri, fdeutsch, gklein, lpeer, lsurette, prasanna.kalever, ratamir, rbalakri, Rhev-m-bugs, sabose, yeylon, ylavi
Keywords: Automation, AutomationTriaged, Regression
Whiteboard: gluster
oVirt Team: Gluster
Doc Type: Bug Fix
Type: Bug
Hardware: Unspecified
OS: Unspecified
Last Closed: 2015-08-16 08:29:35 UTC
Attachments:
- logs
- vdsm log
- gluster log
- user logs

Description Raz Tamir 2015-07-21 10:25:26 UTC
Created attachment 1054276 [details]
logs

Description of problem:
When trying to add a second gluster storage domain, the operation fails with:
"Error while executing action Add Storage Connection: Problem while trying to mount target"

All relevant gluster packages are installed:
glusterfs-3.6.0.53-1.el7.x86_64
glusterfs-api-3.6.0.53-1.el7.x86_64
glusterfs-fuse-3.6.0.53-1.el7.x86_64
glusterfs-devel-3.6.0.53-1.el7.x86_64
glusterfs-libs-3.6.0.53-1.el7.x86_64
glusterfs-rdma-3.6.0.53-1.el7.x86_64

The volume is started and its ownership is set to vdsm:kvm.
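
For reference, the mount vdsm attempts can be reproduced by hand, taking the engine out of the picture (a minimal sketch; the server address and volume name are the ones from the attached logs, the mount point is arbitrary):

# Reproduce the mount vdsm performs, outside the engine/vdsm flow.
# Server and volume (10.35.160.6:/compute_volume01) are from the logs;
# substitute your own values as needed.
mkdir -p /tmp/gluster-test
mount -t glusterfs 10.35.160.6:/compute_volume01 /tmp/gluster-test

# On failure, the gluster FUSE client logs to /var/log/glusterfs/,
# in a file named after the mount point with slashes turned into dashes:
tail -n 50 /var/log/glusterfs/tmp-gluster-test.log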


Version-Release number of selected component (if applicable):
vdsm-4.16.21-1.el7ev.x86_64
vt16.3

How reproducible:
100%

Steps to Reproduce:
1. Add a gluster storage domain
2. Add a second gluster storage domain

Actual results:
Described above


Expected results:
The second gluster storage domain is added successfully.

Additional info:

Comment 1 Raz Tamir 2015-07-21 10:41:32 UTC
Created attachment 1054288 [details]
vdsm log

New vdsm log attached

Comment 2 Allon Mureinik 2015-07-21 11:06:07 UTC
From the logs:

Thread-61097::DEBUG::2015-07-21 08:05:00,255::mount::227::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/mount -t glusterfs 10.35.160.6:/compute_volume01 /rhev/data-center/mnt/glusterSD/10.35.160.6:_compute__volume01 (cwd None)
Thread-61097::ERROR::2015-07-21 08:05:00,383::storageServer::213::Storage.StorageServer.MountConnection::(connect) Mount failed: (1, ';Mount failed. Please check the log file for more details.\n')
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storageServer.py", line 211, in connect
  File "/usr/share/vdsm/storage/mount.py", line 223, in mount
  File "/usr/share/vdsm/storage/mount.py", line 239, in _runcmd
MountError: (1, ';Mount failed. Please check the log file for more details.\n')

Raz - please attach the relevant gluster log too.

Anyway, when I tried to mount the same volume on my machine, I received the following error:

[2015-07-21 11:01:06.728960] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-compute_volume01-client-0: changing port to 49248 (from 0)
[2015-07-21 11:01:06.733607] E [socket.c:2332:socket_connect_finish] 0-compute_volume01-client-0: connection to 10.35.160.6:49248 failed (Connection refused)
[2015-07-21 11:01:06.737247] I [fuse-bridge.c:5086:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-07-21 11:01:06.737779] I [fuse-bridge.c:4012:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.23
[2015-07-21 11:01:06.738520] W [fuse-bridge.c:780:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2015-07-21 11:01:06.760107] I [fuse-bridge.c:4933:fuse_thread_proc] 0-fuse: unmounting /root/raz/g
[2015-07-21 11:01:06.760611] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (15), shutting down

This looks like either a configuration issue with the machine's firewall or a change in gluster's packages.
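
A quick way to test the firewall theory is to check whether the brick port from the log above (49248) is reachable at all; a sketch, assuming the same server and port:

# On the gluster server: show the volume's bricks and the ports they use.
gluster volume status compute_volume01

# From the client/hypervisor: probe the brick port seen in the log.
nc -zv 10.35.160.6 49248

# On the server: list what the firewall currently allows.
firewall-cmd --list-all

If the probe is refused while the brick is reported online, the firewall is the likely culprit.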

Comment 3 Raz Tamir 2015-07-21 11:26:15 UTC
Gluster log attached.
From a quick look, it seems the operation didn't reach gluster.

Comment 4 Raz Tamir 2015-07-21 11:26:59 UTC
Created attachment 1054305 [details]
gluster log

Comment 5 Sahina Bose 2015-07-24 06:35:36 UTC
Are both volumes on the same host? That is, are the first and the second storage domains connecting to gluster volumes on the same hosts?
If not, are the ports open for the second storage domain's volume (on 10.35.160.6)?

Can you provide the output of "gluster volume status compute_volume01"?
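
(For context: glusterd itself listens on TCP 24007, and bricks are assigned ports from 49152 upward, so if the second volume lives on a different host, that range must be open there as well. A firewalld sketch, assuming the default zone; the upper bound of the range is only an example:)

# Allow gluster management and a brick port range on the volume's host.
firewall-cmd --permanent --add-port=24007/tcp
firewall-cmd --permanent --add-port=49152-49251/tcp
firewall-cmd --reload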

Comment 6 Aharon Canan 2015-07-26 08:40:01 UTC
After fixing the gluster configuration issues we no longer see this happen; if it comes back, we will change the severity back.

For now we are not closing this, as an oVirt community member is facing the issue as well.

Comment 7 Allon Mureinik 2015-07-26 09:55:41 UTC
(In reply to Aharon Canan from comment #6)
> For now we are not closing this, as an oVirt community member is facing the issue as well.
The attached logs are from the gluster setup with the aforementioned configuration issues, if I understand correctly.
Can we get the logs from the issue you're referencing here, please?

Comment 8 Raz Tamir 2015-07-28 14:53:19 UTC
Created attachment 1057050 [details]
user logs

user logs attached

Comment 9 Allon Mureinik 2015-07-29 15:13:35 UTC
This seems to be the crux of it:

Thread-205::INFO::2015-07-20 16:23:30,644::nfsSD::69::Storage.StorageDomain::(create) sdUUID=d6df7930-342a-493a-b70b-fb1c52b0828c domainName=ovirtprd01 remotePath=superstore001-stor.cs.example.com:/ovirtprd01 domClass=1
Thread-205::ERROR::2015-07-20 16:23:30,659::task::866::Storage.TaskManager.Task::(_setError) Task=`050e9378-ba78-4e6f-b986-0dda7bb09aa7`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
  File "/usr/share/vdsm/storage/hsm.py", line 2670, in createStorageDomain
  File "/usr/share/vdsm/storage/nfsSD.py", line 80, in create
  File "/usr/share/vdsm/storage/nfsSD.py", line 49, in _preCreateValidation
  File "/usr/share/vdsm/storage/fileSD.py", line 88, in validateFileSystemFeatures
  File "/usr/share/vdsm/storage/outOfProcess.py", line 320, in directTouch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 507, in touch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 391, in _sendCommand
OSError: [Errno 2] No such file or directory

Offhand, it looks like we're sending a wrong path; not sure what we can do here.
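
The errno itself is easy to provoke: touching a file under a directory that does not exist fails exactly like this, which fits a mount point that was never created or a wrong remotePath. A trivial illustration (hypothetical path; the probe file name, __DIRECT_IO_TEST__, is what vdsm's fileSD uses, if I read the sources right):

# touch(2) under a non-existent parent directory fails with ENOENT,
# the same "[Errno 2] No such file or directory" seen in the traceback.
touch /rhev/data-center/mnt/no-such-mount/__DIRECT_IO_TEST__
# touch: cannot touch '/rhev/data-center/mnt/no-such-mount/__DIRECT_IO_TEST__': No such file or directory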

Comment 10 Yaniv Lavi 2015-08-12 08:53:38 UTC
What was the issue you were having?
What fix do you still need on this?
Can you attach the config that caused this issue?

Comment 11 Aharon Canan 2015-08-16 08:29:35 UTC
Cannot reproduce.

Will reopen if we face it again.