Created attachment 1512961 [details]
logs

Description of problem:
Cannot create a Gluster storage domain

Version-Release number of selected component (if applicable):

Gluster:
glusterfs-5.1-1.el7.x86_64
glusterfs-server-5.1-1.el7.x86_64

Hypervisor:
vdsm-4.30.3-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.3.x86_64
sanlock-3.6.0-1.el7.x86_64
selinux-policy-3.13.1-229.el7_6.6.noarch
kernel-3.10.0-957.1.3.el7.x86_64
glusterfs-3.12.2-18.el7.x86_64

Engine:
ovirt-engine-4.3.0-0.5.alpha1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a Gluster replica 3 volume and set the volume owner user and group to 36:36
2. Create a storage domain with that volume

Actual results:

2018-12-10 10:22:10,046+0200 INFO (jsonrpc/2) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0' (storageServer:168)
2018-12-10 10:22:10,046+0200 INFO (jsonrpc/2) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0 mode: None (fileUtils:199)
2018-12-10 10:22:10,047+0200 INFO (jsonrpc/2) [storage.Mount] mounting gluster01.scl.lab.tlv.redhat.com:/storage_local_ge6_volume_0 at /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0 (mount:204)
2018-12-10 10:22:10,451+0200 ERROR (jsonrpc/2) [storage.HSM] Could not connect to storageServer (hsm:2413)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2410, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 180, in connect
    six.reraise(t, v, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 172, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 207, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
MountError: (1, ';Running scope as unit run-20541.scope.\nMount failed. Please check the log file for more details.\n')

Expected results:
Should work

Additional info:
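For reference, step 1 was prepared roughly as follows. This is a minimal sketch; the brick hostnames and paths below are illustrative, not the exact ones used in this environment:

# Create a replica 3 volume (brick paths are examples)
gluster volume create storage_local_ge6_volume_0 replica 3 \
    gluster01:/bricks/brick1 gluster02:/bricks/brick1 gluster03:/bricks/brick1
gluster volume start storage_local_ge6_volume_0

# Set the volume owner to vdsm:kvm (36:36), as required for an oVirt/RHV storage domain
gluster volume set storage_local_ge6_volume_0 storage.owner-uid 36
gluster volume set storage_local_ge6_volume_0 storage.owner-gid 36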
The same happens on 4.2.8 (vdsm-4.20.43-1.el7ev.x86_64). The hypervisors are RHEL 7.6.
After discussing with Elad: this also occurs when mounting the volume directly from the host, so it's a Gluster bug.
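For reproduction outside vdsm, the direct mount was essentially of this form (the mount point below is just an example):

mkdir -p /mnt/gluster-test
mount -t glusterfs gluster01.scl.lab.tlv.redhat.com:/storage_local_ge6_volume_0 /mnt/gluster-test

It fails with the same "Mount failed. Please check the log file for more details." message.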
Moving to RHGS. Not sure the component I set is correct though.
RHGS does not provide glusterfs-server-5.1, so this must be a Gluster Community bug. Moving it there now.

On the hypervisor you are using an older version of Gluster (3.12) than on the storage servers (5.1). I am not aware of (or cannot remember) any issues between these versions, but in general we recommend using the same version everywhere. Is there a reason you upgraded the storage servers before the clients?
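A quick way to compare what is actually in use on the client and the servers (a minimal sketch; output will vary per install, and cluster.op-version is only meaningful on the servers):

# On the hypervisor (client) and on each storage server
rpm -qa | grep glusterfs
glusterfs --version

# On any storage server: the cluster-wide operating version
gluster volume get all cluster.op-version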
From the log it looks like glusterd is not up:

error: Connection failed. Please check if gluster daemon is operational.
return code: 1 (storageServer:332)

2018-12-10 10:22:10,046+0200 INFO (jsonrpc/2) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0' (storageServer:168)
2018-12-10 10:22:10,046+0200 INFO (jsonrpc/2) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0 mode: None (fileUtils:199)
2018-12-10 10:22:10,047+0200 INFO (jsonrpc/2) [storage.Mount] mounting gluster01.scl.lab.tlv.redhat.com:/storage_local_ge6_volume_0 at /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0 (mount:204)
2018-12-10 10:22:10,451+0200 ERROR (jsonrpc/2) [storage.HSM] Could not connect to storageServer (hsm:2413)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2410, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 180, in connect
    six.reraise(t, v, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 172, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 207, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)

Can you please check that glusterd is up and running?
(In reply to Niels de Vos from comment #5)
> Is there a reason you have upgraded the storage before the clients?

No, we just installed the latest Gluster on the servers; on the client we have what vdsm (RHV) requires, which is 3.12.

(In reply to Gobinda Das from comment #6)
> Can you please check glusterd is up and running?

Glusterd is running on all the peers:

[root@gluster01 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2018-12-09 11:38:11 IST; 1 day 22h ago
  Process: 14607 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 14608 (glusterd)
   CGroup: /system.slice/glusterd.service
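Since glusterd is up, the next hint would be the client-side FUSE mount log on the hypervisor, which is what "Mount failed. Please check the log file for more details." refers to. A sketch of where to look (assuming the default log location; the file name is derived from the mount point, so the exact name may differ):

# On the hypervisor
ls /var/log/glusterfs/
less /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__0.log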
Team, we just made a few fixes to the glusterfs-5.x series and are in the process of making the next glusterfs release (5.4.1). Can we upgrade to a 5.4+ release and see if the issue persists?
Did the newer releases of glusterfs work? We have fixed some issues in the glusterfs 5.x series and made the 5.6 release.
Elad, can you check and update?
Re-assigning the needinfo to Avihai.
Hi Sahina,

We currently have glusterfs-server-3.12.6-1.el7.x86_64. Last time Elad tried to upgrade, this bug broke our/QE Gluster mounts, and he then had to downgrade/reinstall Gluster back to 3.12 to get things working again. I do not want to go through that again. Do you happen to have an environment with Gluster 5.1 or higher where I can try to reproduce this?
Yossi, please upgrade our (Raanana site) Gluster to the latest upstream (v6) and see if this issue reproduces.
Gluster upgraded to version 6.3:

glusterfs-libs-6.3-1.el7.x86_64
glusterfs-fuse-6.3-1.el7.x86_64
glusterfs-client-xlators-6.3-1.el7.x86_64
glusterfs-api-6.3-1.el7.x86_64
glusterfs-cli-6.3-1.el7.x86_64
glusterfs-6.3-1.el7.x86_64
glusterfs-server-6.3-1.el7.x86_64

Tried to reproduce this scenario and all went fine. From the VDSM log:

2019-06-18 17:37:31,886+0300 INFO (jsonrpc/2) [storage.Mount] mounting gluster01.scl.lab.tlv.redhat.com:/storage_local_ge6_volume_3 at /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__3 (mount:204)
2019-06-18 17:37:32,395+0300 DEBUG (check/loop) [storage.check] START check '/dev/c5f7e0ee-b117-4f62-8d2d-bcda1f61bd08/metadata' (delay=0.00) (check:289)
2019-06-18 17:37:32,435+0300 DEBUG (jsonrpc/2) [storage.Mount] /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__3 mounted: 0.55 seconds (utils:454)
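For completeness, the domain mount on the hypervisor can also be confirmed directly with something like the following (a minimal check, not part of the original verification):

mount | grep glusterSD
df -h /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge6__volume__3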