Bug 1706842
Summary: Hard Failover with Samba and Glusterfs fails
Product: [Community] GlusterFS
Component: gluster-smb
Status: CLOSED UPSTREAM
Severity: medium
Priority: high
Version: 5
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: david.spisla
Assignee: Anoop C S <anoopcs>
CC: anoopcs, bugs, gdeschner, info, ryan
Keywords: Triaged
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2020-03-12 12:30:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Here is the volume configuration:

Volume Name: archive1
Type: Replicate
Volume ID: 0ed37705-e817-49c6-95c8-32f4931b597a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: fs-sernet-c2-n1:/gluster/brick1/glusterbrick
Brick2: fs-sernet-c2-n2:/gluster/brick1/glusterbrick
Brick3: fs-sernet-c2-n3:/gluster/brick1/glusterbrick
Brick4: fs-sernet-c2-n4:/gluster/brick1/glusterbrick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
user.smb: disable
features.read-only: off
features.worm: off
features.worm-file-level: on
features.retention-mode: enterprise
features.default-retention-period: 120
network.ping-timeout: 10
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
performance.nl-cache-timeout: 600
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-ima-xattrs: on
performance.io-thread-count: 64
cluster.use-compound-fops: on
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
storage.build-pgfid: on
features.utime: on
storage.ctime: on
cluster.quorum-type: fixed
cluster.quorum-count: 2
features.bitrot: on
features.scrub: Active
features.scrub-freq: daily
cluster.enable-shared-storage: enable

Additional information: The section "Description of problem" below shows log entries from glusterd while the failover happens. Those logs are from 2019-04-16, whereas the backtrace was created on 2019-04-30 and the attached glusterfs-plugin logs from all nodes contain information from 2019-04-30. Don't be confused: the glusterd messages are reproducible, so they can also be found on 2019-04-30.

Created attachment 1565074 [details]
Logfiles from all nodes of glusterfs-plugin (SMB)
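As a side note, the failover-relevant options in the volume info above can be reproduced with the stock gluster CLI. A minimal sketch, assuming the volume name archive1 from this report (these are the standard `gluster volume set` invocations, not commands taken from the original report):

    # Client-side dead-peer detection: how long clients wait before
    # declaring a brick unreachable (10 seconds in this setup).
    gluster volume set archive1 network.ping-timeout 10

    # Client-side write quorum: with a fixed quorum of 2 out of 4 replicas,
    # writes are allowed as long as at least two bricks are reachable.
    gluster volume set archive1 cluster.quorum-type fixed
    gluster volume set archive1 cluster.quorum-count 2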
Additional information: My setup was a 4-node cluster with VM machines (VMware).

Did you get a chance to test this situation with later GlusterFS and/or Samba releases?

No, not yet unfortunately.

This bug is moved to https://github.com/gluster/glusterfs/issues/897, and will be tracked there from now on. Visit the GitHub issues URL for further details.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
Created attachment 1564378 [details]
Backtrace of the SMBD and GLUSTER communication

Description of problem: I have this setup: a 4-node GlusterFS v5.5 cluster, using Samba/CTDB v4.8 to access the volumes via the vfs-glusterfs plugin (each node has a VIP).

I was testing this failover scenario:
1. Start writing 940 GB of small files (64K-100K) from a Win10 client to node1.
2. During the write process, hard-shutdown node1 (where the client is connected via VIP) by turning off the power.

My expectation is that the write process stops and after a while the Win10 client offers me a Retry, so I can continue the write on a different node (which by then holds the VIP of node1). I observed exactly this in the past (with Gluster v3.12), but now the system shows strange behaviour: the Win10 client does nothing and the Explorer freezes, and in the backend CTDB cannot perform the failover and throws errors. The glusterd on node2 and node3 logs these messages:

[2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held
[2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1
[2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held
[2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2
[2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held
[2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage

In my opinion Samba/CTDB cannot perform the failover correctly and continue the write process because glusterfs didn't release the lock, but this is not entirely clear to me.

Additional info: I made a network trace on the Windows machine. It shows that the client retries a TreeConnect several times; the TreeConnect is the connection to a share. Samba answers each attempt with NT_STATUS_UNSUCCESSFUL, which is unfortunately not a very meaningful message. Similarly, I "caught" the smbd in the debugger and was able to pull a backtrace while it hung in the futex call we found in /proc/<pid>/stack. The backtrace smbd-gluster-bt.txt (attached) shows that the smbd hangs in the gluster module. You can see in frame 9 that Samba is hanging in the TCON (smbd_smb2_tree_connect). In frame 2 the function glfs_init() appears; its call can be found in source3/modules/vfs_glusterfs.c, line 342 (in samba master).
Then comes another frame in the gluster library and then immediately the pthread_cond_wait call, which ends up in a futex call in the kernel (see /proc/<pid>/stack). The bottom line: Samba is waiting for gluster, apparently for about 3 seconds. Gluster then returns an error and the client retries, apparently for about 8 minutes.
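For anyone reproducing this, the state described above can be inspected on a surviving node with stock tools. A minimal sketch (<pid> is a placeholder; these are the generic CTDB/kernel/gdb commands, not the exact ones from the original session):

    # Check whether CTDB completed the IP takeover after node1 went down
    ctdb status
    ctdb ip

    # Find the smbd serving the stuck client and see where it sleeps in the kernel
    pidof smbd
    cat /proc/<pid>/stack   # should show the futex wait mentioned above

    # Pull a userspace backtrace from the hung process without killing it
    gdb -p <pid> -batch -ex 'thread apply all bt'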