Bug 1329466 - Gluster brick got inode-locked and froze the whole cluster
Summary: Gluster brick got inode-locked and froze the whole cluster
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: locks
Version: 3.7.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1330997 1344836 1360576 1361402
 
Reported: 2016-04-22 02:50 UTC by Chen Chen
Modified: 2023-09-14 03:21 UTC
CC List: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-08-19 07:39:00 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
gluster volume info (1.33 KB, text/plain)
2016-04-22 02:50 UTC, Chen Chen
gluster volume statedump [nfs] when the volume is frozen (5.88 MB, application/x-xz)
2016-04-22 02:52 UTC, Chen Chen
/var/log/glusterfs of sm11 (whose brick reported blocked) after "start force" (3.26 MB, application/x-xz)
2016-04-22 02:55 UTC, Chen Chen

Description Chen Chen 2016-04-22 02:50:14 UTC
Created attachment 1149632 [details]
gluster volume info

Description of problem:

The volume is Distributed-Disperse 2 x (4 + 2) = 12.

When I write to the same file on the volume in parallel, a brick will occasionally get locked down and freeze the cluster. Sometimes one of the peers' OS also becomes unreachable via ssh (it can still be reached by ping).

"volume status" reports all bricks are online. (even if it cannot be sshed)

"volume start force" (suggested by <aspandey>) could resume the cluster, if and only if all peers are reachable via ssh. Otherwise, it reports operation timed out.

I've discussed this in the mailing list:
http://www.gluster.org/pipermail/gluster-users/2016-April/026122.html

How reproducible:

Occasionally, under heavy parallel I/O load. Encountered 4 times in the last month.
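A rough reproducer sketch (the mount path and write pattern here are hypothetical; the actual trigger was GATK CombineVariants with -nt 16, see comment 4):

# spawn 16 concurrent writers against the same file on a FUSE mount of the volume
for i in $(seq 1 16); do
    dd if=/dev/zero of=/mnt/mainvol/shared.file bs=1M count=512 conv=notrunc &
done
wait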

Additional info:

Snapshot of inode-lock in statedump:
[xlator.features.locks.mainvol-locks.inode]
path=<gfid:2092ae08-81de-4717-a7d5-6ad955e18b58>/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf
mandatory=0
inodelk-count=2
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc3dbfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, granted at 2016-04-21 11:45:30
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=d433bfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, blocked at 2016-04-21 11:45:33
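For reference, statedumps like the snapshot above can be generated with the gluster CLI (the default dump directory is assumed):

# dump the state of every brick process of the volume
gluster volume statedump mainvol
# dump files are written to the default statedump directory, one per brick
ls /var/run/gluster/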

Comment 1 Chen Chen 2016-04-22 02:52:00 UTC
Created attachment 1149635 [details]
gluster volume statedump [nfs] when the volume is frozen

Comment 2 Chen Chen 2016-04-22 02:55:54 UTC
Created attachment 1149637 [details]
/var/log/glusterfs of sm11 (whose brick reported blocked) after "start force"

Comment 3 Niels de Vos 2016-04-26 12:44:09 UTC
Could you provide the script or program that does the parallel I/O on the same file? Are you executing this from one client system, or from multiple clients?

Comment 4 Chen Chen 2016-04-27 04:27:17 UTC
I was running GATK CombineVariants (multi-threaded mode, -nt 16) when I noticed this inode lock. I executed this from one client system.
https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php
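For context, the invocation was along these lines (reference and VCF file names are hypothetical; -nt enables multi-threaded mode in GATK 3):

java -jar GenomeAnalysisTK.jar -T CombineVariants -nt 16 \
    -R reference.fasta --variant a.g.vcf --variant b.g.vcf -o combined.vcf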

Additionally, during another lockup (which failed to recover with "start force"), I was running GATK HaplotypeCaller (single-threaded) from multiple clients.

The following is the statedump snapshot from that lockup. I don't really know why there were *write* operations on the GATK jar. Both the brick and the volume were mounted with the noatime flag, and "ls -la" showed the jar has not been modified since I downloaded it.
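For completeness, the timestamp check amounts to something like this (the path is taken from the statedump below; this is just the verification step):

# compare the Modify/Change times of the jar against its download date
stat /home/analyzer/softs/bin/GenomeAnalysisTK.jar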

[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
mandatory=0
inodelk-count=4
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09

Comment 5 Chen Chen 2016-05-12 02:40:47 UTC
Is there a scheduled update? I don't know which GlusterFS release is the equivalent of RHGS 3.1.3.

Again, another tight lock here. It couldn't be released by "start force", so I cold-reset the affected node (which showed a noticeably huge 1-minute load).

[xlator.features.locks.mainvol-locks.inode]
path=<gfid:62adaa3a-a1b8-458c-964f-5742f942cd0f>/.WGC037694D_combined_R1.fastq.gz.gzT9LR
mandatory=0
inodelk-count=38
lock-dump.domain.domain=mainvol-disperse-1:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c4dd93ee0c7f0000, client=0x7f3eb4887cc0, connection-id=hw10-18694-2016/05/02-05:57:47:620063-mainvol-client-6-0, blocked at 2016-05-11 08:22:56, granted at 2016-05-11 08:27:07
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=98acff20d77f0000, client=0x7f3ea869e250, connection-id=sm16-23349-2016/05/02-05:57:01:49902-mainvol-client-6-0, blocked at 2016-05-11 08:27:07
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=8004a958097f0000, client=0x7f3ebc66e630, connection-id=sm15-28555-2016/05/02-05:57:53:27608-mainvol-client-6-0, blocked at 2016-05-11 08:27:07
......(remaining lock entries trimmed)

Comment 6 Chen Chen 2016-07-15 03:33:25 UTC
Got another blocked lock here. I'm a bit puzzled.

I'm only running *ONE* rsync process on *ONE* client connecting to *ONE* GlusterFS server via NFS. Why would *ALL* of the nodes want to lock it?

[xlator.features.locks.mainvol-locks.inode]
path=/home/support/bak2t/camel.alone/mapping/camel62/.camel62.clean.r1.fastq.SnlBzz
mandatory=0
inodelk-count=8
lock-dump.domain.domain=dht.file.migrate
lock-dump.domain.domain=mainvol-disperse-1:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=a85c3672497f0000, client=0x7f3910091b20, connection-id=sm14-20329-2016/06/28-05:35:35:487901-mainvol-client-6-0, blocked at 2016-07-14 11:44:42, granted at 2016-07-14 12:13:25
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=606f8a77c07f0000, client=0x7f39101d1c80, connection-id=sm13-14349-2016/06/28-05:35:35:486931-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=2cbb10b1b57f0000, client=0x7f3910004ca0, connection-id=hw10-63151-2016/06/28-05:35:33:427463-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=28b725b5337f0000, client=0x7f39100ce4c0, connection-id=sm11-5958-2016/06/28-05:35:35:510742-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c8e18ca77c7f0000, client=0x7f391024c340, connection-id=sm16-16031-2016/06/28-05:35:35:487112-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=08026bee887f0000, client=0x7f391c701700, connection-id=sm15-29608-2016/06/28-05:35:35:523099-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=005ad35f5b7f0000, client=0x7f39102b24f0, connection-id=sm12-22762-2016/06/28-05:35:35:487941-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=507f26b5337f0000, client=0x7f39100ce4c0, connection-id=sm11-5958-2016/06/28-05:35:35:510742-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
lock-dump.domain.domain=mainvol-disperse-1

Comment 7 Atin Mukherjee 2016-07-15 04:52:31 UTC
(In reply to Chen Chen from comment #6)
> Got another blocked lock here. I'm a bit puzzled.
> 
> I'm only running *ONE* rsync process on *ONE* client connecting to *ONE*
> GlusterFS server via NFS. Why would *ALL* of the nodes want to lock it?

Are you still running 3.7.10? How about upgrading to 3.7.13 and retesting?

Comment 8 Chen Chen 2016-07-15 05:09:48 UTC
(In reply to Atin Mukherjee from comment #7)
> Are you still running 3.7.10? How about upgrading to 3.7.13 and retesting?

Yes, I'm still on 3.7.10.

Are you sure this issue is addressed by those updates? The two BZs blocked by this one are still in ON_QA/POST status. If so, I'll schedule a downtime.

I'm hesitant to upgrade, fearing it might introduce some new bugs.

Comment 9 Atin Mukherjee 2016-07-15 05:16:00 UTC
(In reply to Chen Chen from comment #8)
> Yes, I'm still on 3.7.10.
> 
> Are you sure this issue is addressed by those updates? The two BZs blocked by
> this one are still in ON_QA/POST status. If so, I'll schedule a downtime.
> 
> I'm hesitant to upgrade, fearing it might introduce some new bugs.

Well, the fix for 1344836 is definitely in mainline, but not in the 3.7 branch.

@Pranith - Would you mind backporting this to 3.7?

Comment 10 Chen Chen 2016-07-15 05:23:38 UTC
(In reply to Atin Mukherjee from comment #9)
> Well, the fix for 1344836 is definitely in mainline, but not in the 3.7 branch.
> 
> @Pranith - Would you mind backporting this to 3.7?

I'm willing to jump to 3.8, if the upgrade process won't cause much turbulence (such as extensive configuration modification, possible data loss, etc.).

Comment 11 Chen Chen 2016-07-18 05:16:20 UTC
Error: Package: glusterfs-ganesha-3.8.1-1.el7.x86_64 (centos-gluster38)
           Requires: nfs-ganesha-gluster
 You could try using --skip-broken to work around the problem

So I'll fall back to Gluster's native NFSv3 server first.
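The fallback itself is roughly the following (a sketch; nfs.disable is the standard volume option controlling the built-in NFS server):

# re-enable the built-in NFSv3 server instead of nfs-ganesha
gluster volume set mainvol nfs.disable off
# the NFS Server entries should now show up as online
gluster volume status mainvol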

Comment 12 Chen Chen 2016-08-19 07:39:00 UTC
After updating to 3.8.2, the cluster still hangs sometimes. However, there are no longer blocked "inodelk" entries in the statedump, so I figure it is another bug. I'll close this bug report and open a new one.
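One way to verify the absence of blocked inode locks in a statedump (default dump directory assumed; the exact grep is a sketch):

# take a fresh statedump, then count blocked inodelk entries per dump file
gluster volume statedump mainvol
grep -c 'inodelk.*(BLOCKED)' /var/run/gluster/*.dump.*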

Comment 13 Red Hat Bugzilla 2023-09-14 03:21:30 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

