Bug 1706842

Summary: Hard Failover with Samba and Glusterfs fails
Product: [Community] GlusterFS
Component: gluster-smb
Version: 5
Status: CLOSED UPSTREAM
Severity: medium
Priority: high
Reporter: david.spisla
Assignee: Anoop C S <anoopcs>
CC: anoopcs, bugs, gdeschner, info, ryan
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2020-03-12 12:30:26 UTC
Attachments:
- Backtrace of the SMBD and GLUSTER communication
- Logfiles from all nodes of glusterfs-plugin (SMB)

Description david.spisla 2019-05-06 11:39:59 UTC
Created attachment 1564378 [details]
Backtrace of the SMBD and GLUSTER communication

Description of problem:

I have this setup: a 4-node GlusterFS v5.5 cluster, using Samba/CTDB v4.8 to access the volumes via the vfs_glusterfs plugin (each node has a VIP).
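
For reference, the shares are exported through the Samba vfs_glusterfs module. A minimal share section for such a setup would look roughly like the sketch below (share and volume names are illustrative, not my exact configuration):

    [archive1]
        path = /
        read only = no
        # route I/O through libgfapi instead of a local FUSE mount
        vfs objects = glusterfs
        glusterfs:volume = archive1
        glusterfs:volfile_server = localhost
        glusterfs:logfile = /var/log/samba/glusterfs-archive1.log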

I was testing this failover scenario:

1. Start writing 940 GB of small files (64K-100K) from a Win10 client to node1.
2. During the write process, hard-shutdown node1 (where the client is connected via VIP) by turning off the power.

My expectation is that the write process stops and after a while the Win10 client offers me a Retry, so I can continue the write on a different node (which now holds the VIP of node1). I observed exactly this in the past (with Gluster v3.12), but now the system shows a strange behaviour:

The Win10 client does nothing and the Explorer freezes; in the backend, CTDB cannot perform the failover and throws errors. The glusterd on node2 and node3 logs these messages:

[2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held
[2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1
[2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held
[2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2
[2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held
[2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage

In my opinion, Samba/CTDB cannot perform the failover correctly and continue the write process because glusterfs didn't release the lock. But it's not clear to me.
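
For anyone reproducing this: during the test the CTDB side can be watched with the standard CTDB tools, for example:

    ctdb status        # node states: OK / UNHEALTHY / DISCONNECTED / BANNED
    ctdb ip            # which node currently holds which public IP (VIP)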

Additional info:
I made a network trace on the Windows machine.
There it is visible that the client retries a Tree Connect several times. This Tree Connect is the connection to a share. Samba answers each attempt with NT_STATUS_UNSUCCESSFUL, which is unfortunately not a very meaningful message.
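
The repeated attempts are easy to isolate in the trace; in Wireshark a display filter such as

    smb2.cmd == 3

shows only the SMB2 Tree Connect requests and their responses (command code 3 is TREE_CONNECT).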

Similarly, I "caught" the smbd in the debugger and was able to pull a backtrace while hangs in the futex-call we found in / proc / <pid> / stack. The backtrace smbd-gluster-bt.txt (attached) shows that the smbd hangs in the gluster module. You can see in Frame 9 that Samba is hanging in the TCON (smbd_smb2_tree_connect). In frame 2 the function appears
glfs_init () whose call you can find in source3 / modules / vfs_glusterfs.c, line 342 (in samba master). Then comes another frame in the gluster-lib and then immediately the pthread_condwait call, which ends up in the kernel in a futex call (see / proc / <pid> / stack).
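
A backtrace like the attached one can be pulled by attaching gdb to the hanging smbd, for example:

    cat /proc/<pid>/stack                  # confirms the futex wait in the kernel
    gdb -p <pid> -batch -ex "thread apply all bt full" > smbd-gluster-bt.txt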

Quintessence: Samba is waiting for gluster, apparently for roughly 3 seconds each time. Gluster then returns an error and the client tries again, and this goes on for about 8 minutes.

Comment 1 david.spisla 2019-05-06 11:43:21 UTC
Here is the Volume configuration:
Volume Name: archive1
Type: Replicate
Volume ID: 0ed37705-e817-49c6-95c8-32f4931b597a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: fs-sernet-c2-n1:/gluster/brick1/glusterbrick
Brick2: fs-sernet-c2-n2:/gluster/brick1/glusterbrick
Brick3: fs-sernet-c2-n3:/gluster/brick1/glusterbrick
Brick4: fs-sernet-c2-n4:/gluster/brick1/glusterbrick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
user.smb: disable
features.read-only: off
features.worm: off
features.worm-file-level: on
features.retention-mode: enterprise
features.default-retention-period: 120
network.ping-timeout: 10
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
performance.nl-cache-timeout: 600
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-ima-xattrs: on
performance.io-thread-count: 64
cluster.use-compound-fops: on
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
storage.build-pgfid: on
features.utime: on
storage.ctime: on
cluster.quorum-type: fixed
cluster.quorum-count: 2
features.bitrot: on
features.scrub: Active
features.scrub-freq: daily
cluster.enable-shared-storage: enable
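
(The output above is what "gluster volume info archive1" prints; options of this kind are set one at a time with "gluster volume set", e.g.:

    gluster volume set archive1 network.ping-timeout 10
    gluster volume set archive1 cluster.quorum-type fixed
    gluster volume set archive1 cluster.quorum-count 2
)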

Comment 2 david.spisla 2019-05-07 10:11:28 UTC
Additional information: The section "Description of problem" above shows glusterd log entries from while the failover happens. Those logs are from 2019-04-16, whereas the backtrace was created on 2019-04-30 and the attached logs of the glusterfs-plugin from all nodes contain information from 2019-04-30. Don't be confused: the glusterd messages are reproducible, so they can also be found on 2019-04-30.

Comment 3 david.spisla 2019-05-07 10:14:19 UTC
Created attachment 1565074 [details]
Logfiles from all nodes of glusterfs-plugin (SMB)

Comment 4 david.spisla 2019-06-06 06:44:04 UTC
Additional information: My setup was a 4-node cluster of virtual machines (VMware).

Comment 5 Anoop C S 2019-11-18 06:23:55 UTC
Did you get a chance to test this situation with later GlusterFS and/or Samba releases?

Comment 6 david.spisla 2019-11-18 15:57:57 UTC
No, not yet unfortunately

Comment 7 Worker Ant 2020-03-12 12:30:26 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/897 and will be tracked there from now on. Visit the GitHub issue URL for further details.

Comment 8 Red Hat Bugzilla 2023-09-14 05:28:12 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days