Bug 1706842 - Hard Failover with Samba and Glusterfs fails [NEEDINFO]
Summary: Hard Failover with Samba and Glusterfs fails
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-smb
Version: 5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Anoop C S
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-06 11:39 UTC by david.spisla
Modified: 2020-03-12 12:30 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:30:26 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
anoopcs: needinfo? (david.spisla)


Attachments (Terms of Use)
Backtrace of the SMBD and GLUSTER communication (2.78 KB, text/plain)
2019-05-06 11:39 UTC, david.spisla
Logfiles from all nodes of glusterfs-plugin (SMB) (59.75 KB, application/gzip)
2019-05-07 10:14 UTC, david.spisla

Description david.spisla 2019-05-06 11:39:59 UTC
Created attachment 1564378 [details]
Backtrace of the SMBD and GLUSTER communication

Description of problem:

I have this setup: a 4-node GlusterFS v5.5 cluster, using Samba/CTDB v4.8 to access the volumes via the vfs_glusterfs plugin (each node has a VIP).

I was testing this failover scenario:

1. Start writing 940 GB of small files (64K-100K) from a Win10 client to node1.
2. During the write process, hard-shutdown node1 (where the client is connected via VIP) by cutting its power.

My expectation is that the write process stops and after a while the Win10 client offers me a Retry, so I can continue the write on a different node (which has now taken over the VIP of node1). In the past (with Gluster v3.12) I observed exactly this, but now the system shows a strange behaviour:

The Win10 client does nothing and Explorer freezes; in the backend, CTDB cannot perform the failover and throws errors. glusterd on node2 and node3 logs these messages:

[2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held
[2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1
[2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held
[2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2
[2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held
[2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage
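
The warnings above follow a fixed pattern, so the affected volumes can be pulled out of a glusterd log with a few lines of Python (a throwaway triage sketch, not part of any Gluster tooling; the sample lines are shortened copies of the messages quoted here):

```python
import re

# Sample glusterd warnings, abbreviated from the log excerpt in this report
log_lines = [
    "0-management: Lock for vol archive1 not held",
    "0-management: Lock not released for archive1",
    "0-management: Lock for vol archive2 not held",
    "0-management: Lock not released for archive2",
    "0-management: Lock for vol gluster_shared_storage not held",
    "0-management: Lock not released for gluster_shared_storage",
]

# Both message variants name the volume; capture it from either form
pattern = re.compile(r"Lock (?:for vol (\S+) not held|not released for (\S+))")

volumes = set()
for line in log_lines:
    m = pattern.search(line)
    if m:
        volumes.add(m.group(1) or m.group(2))

print(sorted(volumes))  # ['archive1', 'archive2', 'gluster_shared_storage']
```
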

In my opinion Samba/CTDB cannot perform the failover correctly and continue the write process because glusterfs did not release the lock. But this is not clear to me.

Additional info:
I made a network trace on the Windows machine.
It shows that the client retries a TreeConnect several times; the TreeConnect is the connection to a share. Samba answers each attempt with NT_STATUS_UNSUCCESSFUL, which is unfortunately not a very meaningful message.

Similarly, I "caught" the smbd in a debugger and was able to pull a backtrace while it hung in the futex call we found in /proc/<pid>/stack. The backtrace smbd-gluster-bt.txt (attached) shows that the smbd hangs in the gluster module. Frame 9 shows that Samba is hanging in the TCON (smbd_smb2_tree_connect). Frame 2 shows glfs_init(), whose call you can find in source3/modules/vfs_glusterfs.c, line 342 (in samba master). Then comes another frame in the gluster library and then immediately the pthread_cond_wait call, which ends up in the kernel in a futex call (see /proc/<pid>/stack).

Quintessence: Samba is waiting for gluster, apparently for almost exactly 3 seconds each time. Gluster then returns an error and the client tries again, and keeps doing so for about 8 minutes.
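
For reference, a backtrace like the one described above can be captured on a hung node roughly like this (a sketch assuming root access and installed debug symbols for samba and glusterfs; picking the first smbd PID via pidof is my assumption, the report does not say how the process was selected):

```shell
# Pick one smbd worker process (assumption: first PID reported by pidof)
pid=$(pidof smbd | awk '{print $1}')

# Kernel-side view: shows the futex_wait the process is stuck in
cat /proc/$pid/stack

# User-side view: non-interactive backtrace of all threads
gdb -p "$pid" -batch -ex 'thread apply all bt'
```

On a busy CTDB node there are many smbd processes; in practice one would pick the child serving the stuck client connection (e.g. via `smbstatus`) rather than the first PID.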

Comment 1 david.spisla 2019-05-06 11:43:21 UTC
Here is the Volume configuration:
Volume Name: archive1
Type: Replicate
Volume ID: 0ed37705-e817-49c6-95c8-32f4931b597a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: fs-sernet-c2-n1:/gluster/brick1/glusterbrick
Brick2: fs-sernet-c2-n2:/gluster/brick1/glusterbrick
Brick3: fs-sernet-c2-n3:/gluster/brick1/glusterbrick
Brick4: fs-sernet-c2-n4:/gluster/brick1/glusterbrick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
user.smb: disable
features.read-only: off
features.worm: off
features.worm-file-level: on
features.retention-mode: enterprise
features.default-retention-period: 120
network.ping-timeout: 10
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
performance.nl-cache-timeout: 600
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-ima-xattrs: on
performance.io-thread-count: 64
cluster.use-compound-fops: on
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
storage.build-pgfid: on
features.utime: on
storage.ctime: on
cluster.quorum-type: fixed
cluster.quorum-count: 2
features.bitrot: on
features.scrub: Active
features.scrub-freq: daily
cluster.enable-shared-storage: enable
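
Two of the options above bear directly on this failover scenario and can be inspected or changed per volume with the standard gluster CLI (commands shown for illustration only; the values in the comments are the ones from this configuration):

```shell
# How long clients wait before declaring the dead node's brick unreachable
gluster volume get archive1 network.ping-timeout    # 10 in this setup

# Quorum settings that decide whether writes may continue with node1 down
gluster volume get archive1 cluster.quorum-type     # fixed
gluster volume get archive1 cluster.quorum-count    # 2

# Example of changing a value (same syntax as used to set the options above)
gluster volume set archive1 network.ping-timeout 10
```
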

Comment 2 david.spisla 2019-05-07 10:11:28 UTC
Additional information: the glusterd log entries shown in the "Description of problem" section above are from 2019-04-16, whereas the backtrace was created on 2019-04-30 and the attached glusterfs-plugin logs from all nodes also cover 2019-04-30. Don't be confused by this: the glusterd messages are reproducible, so the same entries can also be found for 2019-04-30.

Comment 3 david.spisla 2019-05-07 10:14:19 UTC
Created attachment 1565074 [details]
Logfiles from all nodes of glusterfs-plugin (SMB)

Comment 4 david.spisla 2019-06-06 06:44:04 UTC
Additional information: my setup was a 4-node cluster of virtual machines (VMware).

Comment 5 Anoop C S 2019-11-18 06:23:55 UTC
Did you get a chance to test this situation with later GlusterFS and/or Samba releases?

Comment 6 david.spisla 2019-11-18 15:57:57 UTC
No, unfortunately not yet.

Comment 7 Worker Ant 2020-03-12 12:30:26 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/897 and will be tracked there from now on. Visit the GitHub issue URL for further details.

