Description of problem:

While adding a glusterfs volume as a Storage Domain to a POSIXFS Data Center, the mount option 'backupvolfile-server=<secondary RHS server>' may be used. However, when that option is used and the primary RHS server used for the mount is unavailable, the Storage Domain add operation fails even though the secondary RHS server node is available, with the message: "Error: Cannot add Storage. Internal error, Storage Connection doesn't exist."

The issue is the same whether the host used is RHEL 6.4, RHEL 6.3, or RHEV-H 6.4. The following is logged in the 'messages' log of the hosts.

-----------------------------------------------------------------------
Red Hat Enterprise Linux Server release 6.3 (Santiago):

Mar 18 13:51:50 rhs-client1 vdsm Storage.HSM ERROR Could not connect to storageServer#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/hsm.py", line 2237, in connectStorageServer#012 conObj.connect()#012 File "/usr/share/vdsm/storage/storageServer.py", line 208, in connect#012 fileSD.validateDirAccess(self.getMountObj().getRecord().fs_file)#012 File "/usr/share/vdsm/storage/mount.py", line 244, in getRecord#012 (self.fs_spec, self.fs_file))#012OSError: [Errno 2] Mount of `rhs-client15.lab.eng.blr.redhat.com:/RHS_minTest` at `/rhev/data-center/mnt/rhs-client15.lab.eng.blr.redhat.com:_RHS__minTest` does not exist

----------------------------
Red Hat Enterprise Linux Server release 6.4 (Santiago):

Mar 18 13:39:20 rhs-gp-srv15 vdsm Storage.HSM ERROR Could not connect to storageServer#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/hsm.py", line 2237, in connectStorageServer#012 conObj.connect()#012 File "/usr/share/vdsm/storage/storageServer.py", line 208, in connect#012 fileSD.validateDirAccess(self.getMountObj().getRecord().fs_file)#012 File "/usr/share/vdsm/storage/mount.py", line 244, in getRecord#012 (self.fs_spec, self.fs_file))#012OSError: [Errno 2] Mount of `rhs-client15.lab.eng.blr.redhat.com:/RHS_minTest` at `/rhev/data-center/mnt/rhs-client15.lab.eng.blr.redhat.com:_RHS__minTest` does not exist

----------------------------
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130306.2.el6_4):

Mar 18 08:22:33 localhost vdsm Storage.HSM ERROR Could not connect to storageServer#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/hsm.py", line 2237, in connectStorageServer#012 File "/usr/share/vdsm/storage/storageServer.py", line 208, in connect#012 File "/usr/share/vdsm/storage/mount.py", line 244, in getRecord#012OSError: [Errno 2] Mount of `rhs-client15.lab.eng.blr.redhat.com:/RHS_minTest` at `/rhev/data-center/mnt/rhs-client15.lab.eng.blr.redhat.com:_RHS__minTest` does not exist
-----------------------------------------------------------------------

However, the mount option works as expected when used manually on the hosts.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Ensure that one of the RHS server nodes hosting a glusterfs volume is inaccessible.
2. Attempt to add the glusterfs volume as a Storage Domain to a POSIXFS Data Center, using the inaccessible RHS server node as the primary server, with the mount option 'backupvolfile-server' pointing to an available RHS server node.

Actual results:
When the option 'backupvolfile-server=<secondary RHS server>' is used, and the primary RHS server used for the mount is unavailable, the Storage Domain add operation fails, even though the secondary RHS server is available.
Expected results:
When the option 'backupvolfile-server=<secondary RHS server>' is used, and the primary RHS server used for the mount is unavailable, the Storage Domain add operation should still succeed when the secondary RHS server is available.

Additional info:

RHEV-M: 3.1.0-49.el6ev

Hypervisors:
RHEV-H 6.4 (20130306.2.el6_4)
RHEL 6.4
RHEL 6.3

RHS servers: RHS-2.0-20130311.0-RHS-x86_64
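For reference, a minimal Python sketch of the mount invocation vdsm issues - illustrative only, not vdsm source, and mount_gluster is a made-up name; the exact command line appears in the excCmd log entries quoted further down. Run by hand on a host, the equivalent command succeeds even when the primary server is down, because the glusterfs client falls back to the backup server.

--------------------------------------------------------
# Illustrative only: mirrors the command line seen in the vdsm excCmd log
# entries below; mount_gluster() is a hypothetical helper, not a vdsm function.
import subprocess

def mount_gluster(primary, backup, volume, mountpoint):
    cmd = [
        "/usr/bin/sudo", "-n", "/bin/mount",
        "-t", "glusterfs",
        "-o", "backupvolfile-server=%s" % backup,
        "%s:%s" % (primary, volume),       # e.g. "rhs-server:/RHS_minTest"
        mountpoint,
    ]
    # Returns 0 on success; the glusterfs client is expected to fall back
    # to 'backup' when 'primary' is unreachable.
    return subprocess.call(cmd)
--------------------------------------------------------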
Can you upload the logs (glusterfs-specific)? (We want to see the 'command-line' option in the logs to understand what's wrong.)
Additional Info:

[root@rhs-client45 ~]# gluster peer status
Number of Peers: 3

Hostname: rhs-client37.lab.eng.blr.redhat.com
Port: 24007
Uuid: 1f13b836-1bf9-4df2-ba42-c5bdf12a3c54
State: Peer in Cluster (Connected)

Hostname: rhs-client15.lab.eng.blr.redhat.com
Port: 24007
Uuid: 6d82cb77-39cf-48c6-8c9d-dfee5cebf30a
State: Peer in Cluster (Disconnected)

Hostname: rhs-client10.lab.eng.blr.redhat.com
Port: 24007
Uuid: c1eb3946-771a-4347-b771-4722b2e4321d
State: Peer in Cluster (Connected)

[root@rhs-client45 ~]# gluster volume info

Volume Name: RHS_vmstore
Type: Distribute
Volume ID: a39f5dd4-104d-4812-bbd3-3f7d7dd40b92
Status: Started
Number of Bricks: 12
Transport-type: tcp
Bricks:
Brick1: rhs-client45.lab.eng.blr.redhat.com:/brick1
Brick2: rhs-client37.lab.eng.blr.redhat.com:/brick1
Brick3: rhs-client15.lab.eng.blr.redhat.com:/brick1
Brick4: rhs-client10.lab.eng.blr.redhat.com:/brick1
Brick5: rhs-client45.lab.eng.blr.redhat.com:/brick2
Brick6: rhs-client37.lab.eng.blr.redhat.com:/brick2
Brick7: rhs-client15.lab.eng.blr.redhat.com:/brick2
Brick8: rhs-client10.lab.eng.blr.redhat.com:/brick2
Brick9: rhs-client45.lab.eng.blr.redhat.com:/brick3
Brick10: rhs-client37.lab.eng.blr.redhat.com:/brick3
Brick11: rhs-client15.lab.eng.blr.redhat.com:/brick3
Brick12: rhs-client10.lab.eng.blr.redhat.com:/brick3
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
network.remote-dio: on
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
Please note: The volume was a 6x2 Distribute-Replicate volume when the issue was first noticed and reported. When I reproduced the issue again, for which the information is given above, I used a 12-brick Distribute volume. So the issue occurs regardless of volume type. The logs for the failures with both volume types should be available in the attached Hypervisor sosreports, since the Hypervisors were not changed between the tests.
(In reply to comment #0)
> 
> However, the mount option works as expected, when used manually on the hosts.
> 

Rejy,
I'm a little confused by this. So you mean when you use the 'backupvolfile-server' option with a mount command (from cli) on the same hosts it works, but when used from the RHEV ui it doesn't?

Can you clarify this?
(In reply to comment #8)
> (In reply to comment #0)
> > 
> > However, the mount option works as expected, when used manually on the hosts.
> > 
> 
> Rejy,
> I'm a little confused by this. So you mean when you use the
> 'backupvolfile-server' option with a mount command (from cli) on the same
> hosts it works, but when used from the RHEV ui it doesn't?
> 
> Can you clarify this?

Your statement is correct. And from our discussion offline, there is a possibility that this could be caused by an error in processing by RHEV-M. So I am going to raise this BZ issue on the rhev-gluster mailing list, and see if we can get someone from RHEV to have a look at it. I will keep you in 'cc'.

Cheers!
rejy (rmc)
'/usr/bin/sudo -n /bin/mount -t glusterfs -o backupvolfile-server=rhs-client37.lab.eng.blr.redhat.com rhs-client45.lab.eng.blr.redhat.com:/RHS_vmstore /rhev/data-center/mnt/rhs-client45.lab.eng.blr.redhat.com:_RHS__vmstore' (cwd None)

If the above is the mount, vdsm/rhev has respected the option.
Thread-227::INFO::2013-03-18 20:32:57,933::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=6, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': 'rhs-client37.lab.eng.blr.redhat.com:/RHS_vmstore', 'mnt_options': 'backupvolfile-server=rhs-client45.lab.eng.blr.redhat.com', 'portal': '', 'user': '', 'iqn': '', 'vfs_type': 'glusterfs', 'password': '******', 'id': '00000000-0000-0000-0000-000000000000'}], options=None)

Thread-227::DEBUG::2013-03-18 20:32:57,953::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t glusterfs -o backupvolfile-server=rhs-client45.lab.eng.blr.redhat.com rhs-client37.lab.eng.blr.redhat.com:/RHS_vmstore /rhev/data-center/mnt/rhs-client37.lab.eng.blr.redhat.com:_RHS__vmstore' (cwd None)

Thread-227::ERROR::2013-03-18 20:33:02,307::hsm::2241::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2237, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 208, in connect
    fileSD.validateDirAccess(self.getMountObj().getRecord().fs_file)
  File "/usr/share/vdsm/storage/mount.py", line 244, in getRecord
    (self.fs_spec, self.fs_file))
OSError: [Errno 2] Mount of `rhs-client37.lab.eng.blr.redhat.com:/RHS_vmstore` at `/rhev/data-center/mnt/rhs-client37.lab.eng.blr.redhat.com:_RHS__vmstore` does not exist

^^^^ So, the above is the story. Unfortunately, the code path involved is too vague to pin down the exact cause of the issue; however, I am looking into it further.
It needs further inspection, but this smells like a dup of bug 883877.
(In reply to comment #12)
> It needs further inspection, but this smells like a dup of bug 883877.

Indeed. The current error strings are not useful. We should have more detailed error strings, as mentioned in 883877 (http://gerrit.ovirt.org/#/c/12042/2/vdsm/storage/storageServer.py).
Does this environment still exist? If yes, we could include the above-mentioned patch and get a more detailed error, which could help us find the root cause.
(In reply to comment #14)
> Does this environment still exist? If yes, we could include the above-mentioned
> patch and get a more detailed error, which could help us find the root cause.

The exact set-up on which the issue was reported has been dismantled. However, the issue is easily reproducible in my current set-up as well. I do not know how to apply the patch to get the detailed error. I can provide access to my set-up if you think it will be helpful for debugging the issue.
FYI - It appears that the other glusterfs mount options are not being honoured before they reach the 'mount' call. I have opened another bug for it - BZ 927262
Referring to Comment 12: It may be possible to get more information about the cause of the issue by using the patch given at https://bugzilla.redhat.com/show_bug.cgi?id=883877#c10
Moving the state to ON_QA as per comment 18
(In reply to comment #19)
> Moving the state to ON_QA as per comment 18

The patch provided by the RHEV team only provides more debugging info, and DOES NOT resolve the issue. I believe that the Devel team needs to use the patch, diagnose the issue from the extra debugging info available, and work to resolve it. Since the reported issue is still unresolved, moving back to ASSIGNED.
Rejy,
From the updates on the related bug (927262) you'd filed, it is clear that this is not a glusterfs/RHS issue. It would make sense either to close this bug as notabug (this is not an RHS bug) or to move it to the correct product.

What do you think would be the better option?

- Kaushal
(In reply to comment #21)
> Rejy,
> From the updates on the related bug (927262) you'd filed, it is clear that
> this is not a glusterfs/RHS issue. It would make sense either to close this
> bug as notabug (this is not an RHS bug) or to move it to the correct product.
> 
> What do you think would be the better option?
> 
> - Kaushal

Kaushal,

I think that moving this BZ to the correct product and component would be the best option. I will set the product and component similar to Bug 927262. I will also send out a mail about this BZ to the related mailing list, so that someone from the related group can have a look.

- rejy (rmc)
Updates at the related BZ - Bug 927262, "mount options of glusterfs not being honoured, while adding POSIX compliant FS Storage Domain at RHEV-M" - suggest that the cause of the issue reported in this BZ may not be RHS-related, but rather RHEV-M-related. So I am moving this BZ to the relevant product and component, and moving the BZ state to 'NEW'.
I re-tested the issue on a set-up with the following components:

RHEVM 3.2 - 3.2.0-10.21.master.el6ev
Hypervisor - RHEL 6.4 with glusterfs-fuse-3.4.0.6rhs-1.el6.x86_64 and glusterfs-3.4.0.6rhs-1.el6.x86_64 installed
Red Hat Storage with glusterfs-server-3.4.0.6rhs-1.el6rhs.x86_64 version

The 'backupvolfile-server=<secondary RHS server>' mount option of glusterfs still works *only* on manual invocation of the mount command and option on the hypervisor.

--------------------------------------------------------
[root@rhs-gp-srv12 ~]# /bin/mount -t glusterfs -o backupvolfile-server=rhs-client45.lab.eng.blr.redhat.com rhs-client15.lab.eng.blr.redhat.com:/RHEV-BigBend /mnt

[root@rhs-gp-srv12 ~]# df -Th /mnt
Filesystem           Type            Size  Used Avail Use% Mounted on
rhs-client45.lab.eng.blr.redhat.com:/RHEV-BigBend
                     fuse.glusterfs  1.2T  206M  1.2T   1% /mnt
--------------------------------------------------------

When the option is given over RHEVM, while adding a Storage Domain, the process still fails. The following is seen in the vdsm logs.

--------------------------------------------------------
Thread-26253::DEBUG::2013-05-14 12:47:40,213::task::579::TaskManager.Task::(_updateState) Task=`e5444dea-aeb3-435e-9b3b-dd4a278ab26b`::moving from state init -> state preparing
Thread-26253::INFO::2013-05-14 12:47:40,213::logUtils::40::dispatcher::(wrapper) Run and protect: validateStorageServerConnection(domType=6, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': 'rhs-client15.lab.eng.blr.redhat.com:/RHEV-BigBend', 'mnt_options': 'backupvolfile-server=rhs-client45.lab.eng.blr.redhat.com', 'portal': '', 'user': '', 'iqn': '', 'vfs_type': 'glusterfs', 'password': '******', 'id': '00000000-0000-0000-0000-000000000000'}], options=None)
Thread-26253::INFO::2013-05-14 12:47:40,213::logUtils::42::dispatcher::(wrapper) Run and protect: validateStorageServerConnection, Return response: {'statuslist': [{'status': 0, 'id': '00000000-0000-0000-0000-000000000000'}]}
Thread-26253::DEBUG::2013-05-14 12:47:40,213::task::1168::TaskManager.Task::(prepare) Task=`e5444dea-aeb3-435e-9b3b-dd4a278ab26b`::finished: {'statuslist': [{'status': 0, 'id': '00000000-0000-0000-0000-000000000000'}]}
Thread-26253::DEBUG::2013-05-14 12:47:40,213::task::579::TaskManager.Task::(_updateState) Task=`e5444dea-aeb3-435e-9b3b-dd4a278ab26b`::moving from state preparing -> state finished
Thread-26253::DEBUG::2013-05-14 12:47:40,213::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-26253::DEBUG::2013-05-14 12:47:40,213::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-26253::DEBUG::2013-05-14 12:47:40,214::task::974::TaskManager.Task::(_decref) Task=`e5444dea-aeb3-435e-9b3b-dd4a278ab26b`::ref 0 aborting False
Thread-26254::DEBUG::2013-05-14 12:47:40,273::BindingXMLRPC::161::vds::(wrapper) [10.70.34.108]
Thread-26254::DEBUG::2013-05-14 12:47:40,273::task::579::TaskManager.Task::(_updateState) Task=`206c2ea5-dd67-4825-b544-c44e7566c974`::moving from state init -> state preparing
Thread-26254::INFO::2013-05-14 12:47:40,274::logUtils::40::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=6, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': 'rhs-client15.lab.eng.blr.redhat.com:/RHEV-BigBend', 'mnt_options': 'backupvolfile-server=rhs-client45.lab.eng.blr.redhat.com', 'portal': '', 'user': '', 'iqn': '', 'vfs_type': 'glusterfs', 'password': '******', 'id': '00000000-0000-0000-0000-000000000000'}], options=None)
Thread-26254::DEBUG::2013-05-14 12:47:40,278::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t glusterfs -o backupvolfile-server=rhs-client45.lab.eng.blr.redhat.com rhs-client15.lab.eng.blr.redhat.com:/RHEV-BigBend /rhev/data-center/mnt/rhs-client15.lab.eng.blr.redhat.com:_RHEV-BigBend' (cwd None)
Thread-26254::ERROR::2013-05-14 12:47:40,462::hsm::2300::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2297, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 208, in connect
    fileSD.validateDirAccess(self.getMountObj().getRecord().fs_file)
  File "/usr/share/vdsm/storage/mount.py", line 244, in getRecord
    (self.fs_spec, self.fs_file))
OSError: [Errno 2] Mount of `rhs-client15.lab.eng.blr.redhat.com:/RHEV-BigBend` at `/rhev/data-center/mnt/rhs-client15.lab.eng.blr.redhat.com:_RHEV-BigBend` does not exist
--------------------------------------------------------

Screen-shots from RHEVM, and vdsm logs from the hypervisor, being attached to this BZ
I believe the mount itself actually succeeded. However, vdsm assumes that the resulting mount will point to the server it asked for, and it validates directory access after mounting; that validation fails because the mount record points to the backup server.
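Putting the above in code - a hedged sketch of the lookup logic only, not vdsm's actual storage/mount.py: the connect path asks for a mount record keyed on the primary server, but once the glusterfs client has fallen back, the mount table records the backup server as the source (the df output earlier in this bug shows exactly that), so the lookup misses and raises the OSError seen in the tracebacks.

--------------------------------------------------------
import errno

def find_mount_record(fs_spec, fs_file):
    # Scan /proc/mounts for the exact (source, target) pair that was requested.
    with open("/proc/mounts") as mounts:
        for line in mounts:
            spec, path = line.split()[:2]
            if spec == fs_spec and path == fs_file:
                return spec, path
    # After a fallback, the recorded source is the backup server rather than
    # the requested fs_spec, so the pair is never matched and the connect
    # attempt is reported as a failure.
    raise OSError(errno.ENOENT,
                  "Mount of `%s` at `%s` does not exist" % (fs_spec, fs_file))
--------------------------------------------------------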
Hi Rejy,

I don't have a glusterfs environment.
Do you think you can help with verifying the patch for this bug (http://gerrit.ovirt.org/#/c/16534/)?

Thanks,

Yeela
(In reply to Yeela Kaplan from comment #29)
> Hi Rejy,
> 
> I don't have a glusterfs environment.
> Do you think you can help with verifying the patch for this bug
> (http://gerrit.ovirt.org/#/c/16534/)?
> 
> Thanks,
> 
> Yeela

Yeela,

I would be glad to help you with verifying the patch. But I would need the patch incorporated into an rpm with which I can update my RHEV environment and test it. And would I need to update just the Hypervisors for the test?

- rejy (rmc)
We do not have a glusterFS setup; we need to configure or find one, and then I will ACK.
(In reply to Aharon Canan from comment #31)
> We do not have a glusterFS setup; we need to configure or find one, and then
> I will ACK.

Please see comment 30.

I am ready to help you guys :-)

- rejy (rmc)
(In reply to Rejy M Cyriac from comment #32)
> (In reply to Aharon Canan from comment #31)
> > We do not have a glusterFS setup; we need to configure or find one, and
> > then I will ACK.
> 
> Please see comment 30.
> 
> I am ready to help you guys :-)
> 
> - rejy (rmc)

Thanks. We need to test it using an official build and not only the patch. Let's wait for the official build and check then.
Note that in RHS 2.1 this has been replaced by the option "backup-volfile-servers=<server-name1>:<server-name2>:..."

Usage:
mount -t glusterfs -obackup-volfile-servers=<server2>: \
      <server3>:...:<serverN> <server1>:/<volname> <mount_point>
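A tiny, purely illustrative Python helper (based only on the syntax quoted above) for composing the new-style option value from a list of backup servers:

--------------------------------------------------------
def backup_volfile_servers_opt(servers):
    # e.g. ["server2", "server3"] -> "backup-volfile-servers=server2:server3"
    return "backup-volfile-servers=" + ":".join(servers)
--------------------------------------------------------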
Itamar,

Please be advised that the option 'backupvolfile-server' has now been changed to 'backup-volfile-servers'. The following documents this.

https://access.redhat.com/site/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/sect-Administration_Guide-GlusterFS_Client-GlusterFS_Client-Mounting_Volumes.html

You might want to use the new option in the doc-text for this BZ.

- rejy (rmc)
(In reply to Rejy M Cyriac from comment #39)
> Itamar,
> 
> Please be advised that the option 'backupvolfile-server' has now been
> changed to 'backup-volfile-servers'. The following documents this.
> 
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Storage/2.1/html/
> Administration_Guide/sect-Administration_Guide-GlusterFS_Client-
> GlusterFS_Client-Mounting_Volumes.html
> 
> You might want to use the new option in the doc-text for this BZ.
> 
> - rejy (rmc)

It looks like backward compatibility is coming up, to prevent regression due to the change - BZ 1023950
Should only be implemented in a glusterfs domain (as the abandoned patch suggests).
This option should be added to the gluster connection details.
*** Bug 1177777 has been marked as a duplicate of this bug. ***
Required change: The engine currently allows gluster mounts to specify only one node for providing images to the data center. This feature is to remove that limitation.
Ala, please provide some doctext for this feature. Thanks!
Verified with 3.6.0.20
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html