Created attachment 1512961 [details]
logs

Description of problem:
Cannot create a Gluster storage domain

Version-Release number of selected component (if applicable):

Gluster:
glusterfs-5.1-1.el7.x86_64
glusterfs-server-5.1-1.el7.x86_64

Hypervisor:
vdsm-4.30.3-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.3.x86_64
sanlock-3.6.0-1.el7.x86_64
selinux-policy-3.13.1-229.el7_6.6.noarch
kernel-3.10.0-957.1.3.el7.x86_64
glusterfs-3.12.2-18.el7.x86_64

Engine:
ovirt-engine-4.3.0-0.5.alpha1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a Gluster replica 3 volume and set the volume owner user and group to 36:36
2. Create a storage domain with that volume

Actual results:

2018-12-10 10:22:10,046+0200 INFO (jsonrpc/2) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0' (storageServer:168)
2018-12-10 10:22:10,046+0200 INFO (jsonrpc/2) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0 mode: None (fileUtils:199)
2018-12-10 10:22:10,047+0200 INFO (jsonrpc/2) [storage.Mount] mounting gluster01.scl.lab.tlv.redhat.com:/storage_local_ge6_volume_0 at /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0 (mount:204)
2018-12-10 10:22:10,451+0200 ERROR (jsonrpc/2) [storage.HSM] Could not connect to storageServer (hsm:2413)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2410, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 180, in connect
    six.reraise(t, v, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 172, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 207, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
MountError: (1, ';Running scope as unit run-20541.scope.\nMount failed. Please check the log file for more details.\n')

Expected results:
Should work

Additional info:
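For reference, step 1 was prepared roughly as follows. This is a minimal sketch; the brick hostnames and paths below are illustrative, not the exact ones used in this environment:

# Create a replica 3 volume (brick paths are examples)
gluster volume create storage_local_ge6_volume_0 replica 3 \
    gluster01:/bricks/brick1 gluster02:/bricks/brick1 gluster03:/bricks/brick1
gluster volume start storage_local_ge6_volume_0

# Set the volume owner to vdsm:kvm (36:36), as required for an oVirt/RHV storage domain
gluster volume set storage_local_ge6_volume_0 storage.owner-uid 36
gluster volume set storage_local_ge6_volume_0 storage.owner-gid 36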
The same happens on 4.2.8 (vdsm-4.20.43-1.el7ev.x86_64). The hypervisors are RHEL 7.6.
After discussing with Elad: this also occurs when mounting the volume directly from the host, so it's a Gluster bug.
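For reproduction outside vdsm, the direct mount was essentially of this form (the mount point below is just an example):

mkdir -p /mnt/gluster-test
mount -t glusterfs gluster01.scl.lab.tlv.redhat.com:/storage_local_ge6_volume_0 /mnt/gluster-test

It fails with the same "Mount failed. Please check the log file for more details." message.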
Moving to RHGS. Not sure the component I set is correct though.
RHGS does not provide glusterfs-server-5.1, so this must be a Gluster Community bug. Moving it there now.

On the hypervisor you are using an older version of Gluster (3.12) than on the storage servers (5.1). I am not aware of (or cannot remember) any issues between these versions, but in general we recommend using the same version everywhere. Is there a reason you upgraded the storage servers before the clients?
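A quick way to compare what is actually in use on the client and the servers (a minimal sketch; output will vary per install, and cluster.op-version is only meaningful on the servers):

# On the hypervisor (client) and on each storage server
rpm -qa | grep glusterfs
glusterfs --version

# On any storage server: the cluster-wide operating version
gluster volume get all cluster.op-version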
From the log it looks like glusterd is not up:

error: Connection failed. Please check if gluster daemon is operational.
return code: 1 (storageServer:332)

2018-12-10 10:22:10,046+0200 INFO (jsonrpc/2) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0' (storageServer:168)
2018-12-10 10:22:10,046+0200 INFO (jsonrpc/2) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0 mode: None (fileUtils:199)
2018-12-10 10:22:10,047+0200 INFO (jsonrpc/2) [storage.Mount] mounting gluster01.scl.lab.tlv.redhat.com:/storage_local_ge6_volume_0 at /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0 (mount:204)
2018-12-10 10:22:10,451+0200 ERROR (jsonrpc/2) [storage.HSM] Could not connect to storageServer (hsm:2413)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2410, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 180, in connect
    six.reraise(t, v, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 172, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 207, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)

Can you please check that glusterd is up and running?
(In reply to Niels de Vos from comment #5)
> Is there a reason you have upgraded the storage before the clients?

No, we just installed the latest Gluster on the servers; on the client we have what vdsm (RHV) requires, which is 3.12.

(In reply to Gobinda Das from comment #6)
> Can you please check glusterd is up and running?

Glusterd is running on all the peers:

[root@gluster01 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2018-12-09 11:38:11 IST; 1 day 22h ago
  Process: 14607 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 14608 (glusterd)
   CGroup: /system.slice/glusterd.service
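Since glusterd is up, the next hint would be the client-side FUSE mount log on the hypervisor, which is what "Mount failed. Please check the log file for more details." refers to. A sketch of where to look (assuming the default log location; the file name is derived from the mount point, so the exact name may differ):

# On the hypervisor
ls /var/log/glusterfs/
less /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0.log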
Team, we just made a few fixes to the glusterfs-5.x series and are in the process of making the next glusterfs release (5.4.1). Can we upgrade to a 5.4+ release and see if the issue persists?
Did the newer releases of glusterfs work? We have fixed some issues in the glusterfs 5.x series and made the 5.6 release.
Elad, can you check and update?
Re-assigning the needinfo to Avihai.
Hi Sahina,

We currently have glusterfs-server-3.12.6-1.el7.x86_64. Last time Elad tried to upgrade, this bug broke our/QE Gluster mounts, and he then had to downgrade/reinstall Gluster back to 3.12 to get things working again. I do not want to go through that again. Do you happen to have an environment with Gluster 5.1 or higher where I can try to reproduce this?
Yossi, please upgrade our (Raanana site) Gluster to the latest upstream (v6) and see if this issue reproduces.
Gluster upgraded to version 6.3:

glusterfs-libs-6.3-1.el7.x86_64
glusterfs-fuse-6.3-1.el7.x86_64
glusterfs-client-xlators-6.3-1.el7.x86_64
glusterfs-api-6.3-1.el7.x86_64
glusterfs-cli-6.3-1.el7.x86_64
glusterfs-6.3-1.el7.x86_64
glusterfs-server-6.3-1.el7.x86_64

Tried to reproduce this scenario and all went fine. From the VDSM log:

2019-06-18 17:37:31,886+0300 INFO (jsonrpc/2) [storage.Mount] mounting gluster01.scl.lab.tlv.redhat.com:/storage_local_ge6_volume_3 at /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__3 (mount:204)
2019-06-18 17:37:32,395+0300 DEBUG (check/loop) [storage.check] START check '/dev/c5f7e0ee-b117-4f62-8d2d-bcda1f61bd08/metadata' (delay=0.00) (check:289)
2019-06-18 17:37:32,435+0300 DEBUG (jsonrpc/2) [storage.Mount] /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__3 mounted: 0.55 seconds (utils:454)
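For completeness, the domain mount on the hypervisor can also be confirmed directly with something like the following (a minimal check, not part of the original verification):

mount | grep glusterSD
df -h /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__3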