Bug 1217576
| Summary: | [HC] Gluster volume locks the whole cluster | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Christopher Pereira <kripper> |
| Component: | core | Assignee: | bugs <bugs> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | amukherj, amureini, bugs, dfediuck, gluster-bugs, kripper, nsoffer, sabose, sasundar, sbonazzo, ylavi |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-05-25 21:36:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1175354 | | |
Description
Christopher Pereira
2015-04-30 17:21:42 UTC
Using statedump, we found out that Sanlock was holding leases. This was a replica-2 volume, which is not supported by Sanlock. Please mark this bug as "Won't Fix" or as a "Sanlock + Gluster" duplicate. In case you want to support replica-2 in the future: the problem seems to be that Sanlock takes and holds the lease, but then fails itself because the file is locked at the glusterfs level.

A gluster statedump reveals these locks:

```
[xlator.features.locks.vdisks-locks.inode]
path=/ba7be27f-aee5-4436-ae9a-0764f551f9a7/dom_md/ids
mandatory=0
conn.1.id=<Host H5>-3016-2015/05/01-17:54:57:109200-vdisks-client-0-0-0
conn.1.ref=1
--- NOTE: conn.1 is Sanlock after a reboot ---
--- 17:54:45 is UTC-0 = 14:54:41 is UTC-3 = last reboot ---
conn.1.bound_xl=/mnt/disk1/gluster-bricks/vdisks
conn.2.id=<Host H6>-3369-2015/04/30-05:40:59:928550-vdisks-client-0-0-0
conn.2.ref=1
conn.2.bound_xl=/mnt/disk1/gluster-bricks/vdisks
conn.3.id=<Host H4>-31780-2015/04/30-05:57:15:152009-vdisks-client-0-0-0
conn.3.ref=1
conn.3.bound_xl=/mnt/disk1/gluster-bricks/vdisks
conn.4.id=<Host H6>-16034-2015/04/30-16:40:26:355759-vdisks-client-0-0-0
conn.4.ref=1
conn.4.bound_xl=/mnt/disk1/gluster-bricks/vdisks
```

Sanlock was failing before and after the reboot (but took the lease). [...]
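As an aside, the locked path and the client connections holding it can be pulled out of a statedump (produced with `gluster volume statedump <VOLNAME>`) mechanically. A minimal sketch, assuming the `path=` / `conn.N.id=` layout shown in the excerpt from this report; the `lock_connections` helper is hypothetical, not part of gluster:

```python
import re

# Sample input: the lock section from this report's statedump excerpt.
STATEDUMP = """\
[xlator.features.locks.vdisks-locks.inode]
path=/ba7be27f-aee5-4436-ae9a-0764f551f9a7/dom_md/ids
mandatory=0
conn.1.id=<Host H5>-3016-2015/05/01-17:54:57:109200-vdisks-client-0-0-0
conn.1.ref=1
conn.1.bound_xl=/mnt/disk1/gluster-bricks/vdisks
conn.2.id=<Host H6>-3369-2015/04/30-05:40:59:928550-vdisks-client-0-0-0
conn.2.ref=1
conn.2.bound_xl=/mnt/disk1/gluster-bricks/vdisks
"""

def lock_connections(dump: str) -> dict:
    """Map each locked path to the client IDs holding connections on it."""
    path, result = None, {}
    for line in dump.splitlines():
        if line.startswith("path="):
            path = line.split("=", 1)[1]
            result[path] = []
        elif path and re.match(r"conn\.\d+\.id=", line):
            # Keep the full client ID: host, PID, and connection timestamp.
            result[path].append(line.split("=", 1)[1])
    return result

for path, ids in lock_connections(STATEDUMP).items():
    print(path, "held by", len(ids), "connections")
```

Comparing the connection timestamps in each ID against the hosts' reboot times is what exposed the stale Sanlock connection here.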
```
2015-04-30 15:11:55-0300 15473 [4535]: s732 add_lockspace fail result -5
2015-04-30 15:12:04-0300 15482 [4535]: s733 lockspace ba7be27f-aee5-4436-ae9a-0764f551f9a7:2:/rhev/data-center/mnt/glusterSD/h4.imatronix.com:vdisks/ba7be27f-aee5-4436-ae9a-0764f551f9a7/dom_md/ids:0
2015-04-30 15:12:05-0300 15482 [27586]: ba7be27f aio collect 0 0x7f8ac80008c0:0x7f8ac80008d0:0x7f8ac8101000 result -5:0 match res
2015-04-30 15:12:05-0300 15482 [27586]: read_sectors delta_leader offset 512 rv -5 /rhev/data-center/mnt/glusterSD/h4.imatronix.com:vdisks/ba7be27f-aee5-4436-ae9a-0764f551f9a7/dom_md/ids
2015-04-30 15:12:05-0300 15483 [4535]: s733 add_lockspace fail result -5
```

---- Here Sanlock started ----
NOTE: 14:54:41 is UTC-3, and it is the same time the lease was taken in gluster:

```
2015-05-01 14:54:41-0300 1 [637]: sanlock daemon started 3.2.2 host e727194d-c2cb-4785-bd79-24277674bd2c.h5.imatron
2015-05-01 14:54:58-0300 19 [644]: s1 lockspace 3233144b-7be1-445f-9ea6-6aebbacbb93f:2:/rhev/data-center/mnt/glusterSD/h4.imatronix.com:vdisks-ssd/3233144b-7be1-445f-9ea6-6aebbacbb93f/dom_md/ids:0
2015-05-01 14:54:58-0300 19 [643]: s2 lockspace ba7be27f-aee5-4436-ae9a-0764f551f9a7:2:/rhev/data-center/mnt/glusterSD/h4.imatronix.com:vdisks/ba7be27f-aee5-4436-ae9a-0764f551f9a7/dom_md/ids:0
2015-05-01 14:54:58-0300 19 [3300]: ba7be27f aio collect 0 0x7f0f280008c0:0x7f0f280008d0:0x7f0f28001000 result -5:0 match res
2015-05-01 14:54:58-0300 19 [3300]: read_sectors delta_leader offset 512 rv -5 /rhev/data-center/mnt/glusterSD/h4.imatronix.com:vdisks/ba7be27f-aee5-4436-ae9a-0764f551f9a7/dom_md/ids
2015-05-01 14:54:59-0300 20 [643]: s2 add_lockspace fail result -5
2015-05-01 14:55:08-0300 29 [643]: s3 lockspace ba7be27f-aee5-4436-ae9a-0764f551f9a7:2:/rhev/data-center/mnt/glusterSD/h4.imatronix.com:vdisks/ba7be27f-aee5-4436-ae9a-0764f551f9a7/dom_md/ids:0
```

Anyway, in the brick logs we had been seeing, since some days before:

```
[2015-04-26 17:20:36.773252] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 544: READ => -1 (Input/output error)
[2015-04-26 17:20:46.740232] W [fuse-bridge.c:1262:fuse_err_cbk] 0-glusterfs-fuse: 570: REMOVEXATTR() /__DIRECT_IO_TEST__ => -1 (No data available)
[2015-04-26 17:20:46.808687] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 592: READ => -1 (Input/output error)
[2015-04-26 17:20:56.821828] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 613: READ => -1 (Input/output error)
[2015-04-26 17:21:06.834420] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 629: READ => -1 (Input/output error)
[2015-04-26 17:21:16.835813] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 645: READ => -1 (Input/output error)
[2015-04-26 17:21:27.088096] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 661: READ => -1 (Input/output error)
[2015-04-26 17:21:37.071644] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 677: READ => -1 (Input/output error)
[2015-04-26 17:21:47.274038] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 693: READ => -1 (Input/output error)
[2015-04-26 17:21:57.236858] W [fuse-bridge.c:1262:fuse_err_cbk] 0-glusterfs-fuse: 739: REMOVEXATTR() /__DIRECT_IO_TEST__ => -1 (No data available)
[2015-04-26 17:21:57.306904] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 761: READ => -1 (Input/output error)
[2015-04-26 17:22:07.337893] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 790: READ => -1 (Input/output error)
[2015-04-26 17:22:17.331690] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 806: READ => -1 (Input/output error)
[2015-04-26 17:22:27.343796] W [fuse-bridge.c:2190:fuse_readv_cbk] 0-glusterfs-fuse: 822: READ => -1 (Input/output error)
```

---

Comment 0 gives the glusterfs version as 3.8dev, so this bug belongs to the "GlusterFS" product and not to "Red Hat Gluster Storage". Moving it accordingly.

---

Did you have the same issue with a replica-3 volume as well?
Hi Sahina, I'm not sure, because it seems we corrupted the volume at some point and got overwhelmed by many different issues and workarounds, so testing again would be required. I didn't see this issue again. I will close this and reopen if necessary. Thanks.
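A closing note on the error codes that tie the two logs above together: sanlock reports results as negated errno values, so the repeated `result -5` / `rv -5` is consistent with `-EIO`, the same "Input/output error" the FUSE client logs for its failed READs. A minimal Python sketch of that mapping; the `sanlock_errno` helper is hypothetical, not part of sanlock:

```python
import errno
import os

def sanlock_errno(result):
    """Translate a negative sanlock result code into (errno name, message)."""
    code = -result
    # errno.errorcode maps the number to its symbolic name (e.g. 5 -> 'EIO');
    # os.strerror gives the human-readable message for this platform.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

# "add_lockspace fail result -5" and "read_sectors ... rv -5" from the log:
print(sanlock_errno(-5))  # on Linux: ('EIO', 'Input/output error')
```

So the sanlock failures are not lease contention as such but raw read errors on the `ids` file, matching the brick-side `fuse_readv_cbk` warnings.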