Bug 1329466

Summary: Gluster brick got inode-locked and froze the whole cluster
Product: [Community] GlusterFS
Component: locks
Version: 3.7.10
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Chen Chen <aflyhorse>
Assignee: bugs <bugs>
CC: aflyhorse, amukherj, aspandey, bugs, jbyers, ndevos, pkarampu, skoduri
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-08-19 07:39:00 UTC
Bug Blocks: 1330997, 1344836, 1360576, 1361402
Attachments:
  gluster volume info
  gluster volume statedump [nfs] when the volume is frozen
  /var/log/glusterfs of sm11 (whose brick reported blocked) after "start force"

Description Chen Chen 2016-04-22 02:50:14 UTC
Created attachment 1149632 [details]
gluster volume info

Description of problem:

The volume is Distributed-Disperse, 2 x (4 + 2) = 12 bricks.

When I do parallel writes to the same file on the volume, a brick occasionally gets locked down and freezes the cluster. Sometimes one of the peers' OS also becomes unreachable via ssh (it can still be reached by ping).

"volume status" reports all bricks are online. (even if it cannot be sshed)

"volume start force" (suggested by <aspandey>) could resume the cluster, if and only if all peers are reachable via ssh. Otherwise, it reports operation timed out.

I've discussed this in the mailing list:
http://www.gluster.org/pipermail/gluster-users/2016-April/026122.html

How reproducible:

Occasional, under heavy parallel I/O load. Hit it 4 times in the last month.
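
There is no dedicated reproducer; a rough sketch of the kind of concurrent write load that triggers it (hypothetical paths and sizes, purely illustrative) would be:

  # 16 concurrent writers to the same file on the FUSE mount (mount point assumed)
  for i in $(seq 1 16); do
      dd if=/dev/zero of=/mnt/mainvol/shared.file bs=1M count=1024 conv=notrunc &
  done
  wait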

Additional info:

Snapshot of inode-lock in statedump:
[xlator.features.locks.mainvol-locks.inode]
path=<gfid:2092ae08-81de-4717-a7d5-6ad955e18b58>/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf
mandatory=0
inodelk-count=2
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc3dbfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, granted at 2016-04-21 11:45:30
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=d433bfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, blocked at 2016-04-21 11:45:33

Comment 1 Chen Chen 2016-04-22 02:52:00 UTC
Created attachment 1149635 [details]
gluster volume statedump [nfs] when the volume is frozen

Comment 2 Chen Chen 2016-04-22 02:55:54 UTC
Created attachment 1149637 [details]
/var/log/glusterfs of sm11 (whose brick reported blocked) after "start force"

Comment 3 Niels de Vos 2016-04-26 12:44:09 UTC
Could you provide the script or program that does the parallel I/O on the same file? Are you executing this from one client system, or from multiple clients?

Comment 4 Chen Chen 2016-04-27 04:27:17 UTC
I was running GATK CombineVariants (multi-threaded mode, -nt 16) when I noticed this inode lock. I executed this from one client system.
https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php
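
The invocation was roughly of this form (illustrative only; the reference and input file names here are placeholders, not the actual inputs):

  java -jar GenomeAnalysisTK.jar -T CombineVariants -nt 16 \
      -R reference.fasta \
      --variant sample1.g.vcf --variant sample2.g.vcf \
      -o combined.g.vcf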

Besides, in another lockup (which did not recover after "start force"), I was running GATK HaplotypeCaller (single-threaded) from multiple clients.

The following is the statedump snapshot from that lockup. I don't really know why there were *write* operations on the GATK jar. Both the brick and the volume were mounted with the noatime flag, and "ls -la" showed the jar had not been modified since I downloaded it.

[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
mandatory=0
inodelk-count=4
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09

Comment 5 Chen Chen 2016-05-12 02:40:47 UTC
Any scheduled update? I don't know which GlusterFS release is the equivalent of RHGS 3.1.3.

Again, another stuck lock here. It couldn't be released by "start force", so I cold-reset the affected node (which had a noticeably huge 1-minute load average).

[xlator.features.locks.mainvol-locks.inode]
path=<gfid:62adaa3a-a1b8-458c-964f-5742f942cd0f>/.WGC037694D_combined_R1.fastq.gz.gzT9LR
mandatory=0
inodelk-count=38
lock-dump.domain.domain=mainvol-disperse-1:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c4dd93ee0c7f0000, client=0x7f3eb4887cc0, connection-id=hw10-18694-2016/05/02-05:57:47:620063-mainvol-client-6-0, blocked at 2016-05-11 08:22:56, granted at 2016-05-11 08:27:07
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=98acff20d77f0000, client=0x7f3ea869e250, connection-id=sm16-23349-2016/05/02-05:57:01:49902-mainvol-client-6-0, blocked at 2016-05-11 08:27:07
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=8004a958097f0000, client=0x7f3ebc66e630, connection-id=sm15-28555-2016/05/02-05:57:53:27608-mainvol-client-6-0, blocked at 2016-05-11 08:27:07
...... (remaining blocked entries trimmed)

Comment 6 Chen Chen 2016-07-15 03:33:25 UTC
Got another block here. I'm a bit puzzled.

I'm only running *ONE* rsync process on *ONE* client, connected to *ONE* GlusterFS node via NFS. Why would *ALL* of the nodes want to lock it?
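
The rsync was of roughly this form (the source path is a placeholder; the destination corresponds to the path in the statedump below, assuming the volume is NFS-mounted at /mnt/nfs):

  rsync -av /data/camel.alone/ /mnt/nfs/home/support/bak2t/camel.alone/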

[xlator.features.locks.mainvol-locks.inode]
path=/home/support/bak2t/camel.alone/mapping/camel62/.camel62.clean.r1.fastq.SnlBzz
mandatory=0
inodelk-count=8
lock-dump.domain.domain=dht.file.migrate
lock-dump.domain.domain=mainvol-disperse-1:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=a85c3672497f0000, client=0x7f3910091b20, connection-id=sm14-20329-2016/06/28-05:35:35:487901-mainvol-client-6-0, blocked at 2016-07-14 11:44:42, granted at 2016-07-14 12:13:25
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=606f8a77c07f0000, client=0x7f39101d1c80, connection-id=sm13-14349-2016/06/28-05:35:35:486931-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=2cbb10b1b57f0000, client=0x7f3910004ca0, connection-id=hw10-63151-2016/06/28-05:35:33:427463-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=28b725b5337f0000, client=0x7f39100ce4c0, connection-id=sm11-5958-2016/06/28-05:35:35:510742-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c8e18ca77c7f0000, client=0x7f391024c340, connection-id=sm16-16031-2016/06/28-05:35:35:487112-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=08026bee887f0000, client=0x7f391c701700, connection-id=sm15-29608-2016/06/28-05:35:35:523099-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=005ad35f5b7f0000, client=0x7f39102b24f0, connection-id=sm12-22762-2016/06/28-05:35:35:487941-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=507f26b5337f0000, client=0x7f39100ce4c0, connection-id=sm11-5958-2016/06/28-05:35:35:510742-mainvol-client-6-0, blocked at 2016-07-14 12:13:25
lock-dump.domain.domain=mainvol-disperse-1

Comment 7 Atin Mukherjee 2016-07-15 04:52:31 UTC
(In reply to Chen Chen from comment #6)
> Got another block here. I'm a bit puzzled.
> 
> I'm only running *ONE* rsync process on *ONE* client, connected to *ONE*
> GlusterFS node via NFS. Why would *ALL* of the nodes want to lock it?
> 
> [statedump from comment #6 snipped]

Are you still running 3.7.10? How about upgrading to 3.7.13 and retesting?

Comment 8 Chen Chen 2016-07-15 05:09:48 UTC
(In reply to Atin Mukherjee from comment #7)
> (In reply to Chen Chen from comment #6)
> > Got another block here. I'm a bit puzzled.
> > 
> > I'm only running *ONE* rsync process on *ONE* client, connected to *ONE*
> > GlusterFS node via NFS. Why would *ALL* of the nodes want to lock it?
>
> Are you still running 3.7.10? How about upgrading to 3.7.13 and retesting?

Yes, I'm still on 3.7.10.

Are you sure this issue is addressed in those updates? The two BZs blocked by this one are still in ON_QA/POST status. If it is, I'll schedule a downtime.

I'm hesitant to upgrade, fearing it might introduce some new bugs.

Comment 9 Atin Mukherjee 2016-07-15 05:16:00 UTC
(In reply to Chen Chen from comment #8)
> (In reply to Atin Mukherjee from comment #7)
> > (In reply to Chen Chen from comment #6)
> > > Got another block here. I'm a bit puzzled.
> > > 
> > > I'm only running *ONE* rsync process on *ONE* client, connected to *ONE*
> > > GlusterFS node via NFS. Why would *ALL* of the nodes want to lock it?
> >
> > Are you still running 3.7.10? How about upgrading to 3.7.13 and retesting?
> 
> Yes, I'm still on 3.7.10.
> 
> Are you sure this issue is addressed in those updates? The two BZs blocked by
> this one are still in ON_QA/POST status. If it is, I'll schedule a downtime.
> 
> I'm hesitant to upgrade, fearing it might introduce some new bugs.

Well, the fix for bug 1344836 is definitely in mainline, but not in the 3.7 branch.

@Pranith - Do you mind backporting this to 3.7?

Comment 10 Chen Chen 2016-07-15 05:23:38 UTC
(In reply to Atin Mukherjee from comment #9)
> (In reply to Chen Chen from comment #8)
> > (In reply to Atin Mukherjee from comment #7)
> > > (In reply to Chen Chen from comment #6)
> > > > Got another block here. I'm a bit puzzled.
> > > > 
> > > > I'm only running *ONE* rsync process on *ONE* client, connected to *ONE*
> > > > GlusterFS node via NFS. Why would *ALL* of the nodes want to lock it?
> > >
> > > Are you still running 3.7.10? How about upgrading to 3.7.13 and retesting?
> > 
> > Yes, I'm still on 3.7.10.
> > 
> > Are you sure this issue is addressed in those updates? The two BZs blocked
> > by this one are still in ON_QA/POST status. If it is, I'll schedule a downtime.
> > 
> > I'm hesitant to upgrade, fearing it might introduce some new bugs.
> 
> Well, the fix for bug 1344836 is definitely in mainline, but not in the 3.7 branch.
> 
> @Pranith - Do you mind backporting this to 3.7?

I'm willing to jump to 3.8, provided the upgrade process won't cause much turbulence (such as extensive configuration changes, possible data loss, etc.).
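
For what it's worth, the upgrade I have in mind would be roughly the following (assuming CentOS 7 and the CentOS Storage SIG gluster38 repository; package names are assumptions, run node by node):

  yum install centos-release-gluster38
  yum update glusterfs-server glusterfs-fuse
  systemctl restart glusterd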

Comment 11 Chen Chen 2016-07-18 05:16:20 UTC
Error: Package: glusterfs-ganesha-3.8.1-1.el7.x86_64 (centos-gluster38)
           Requires: nfs-ganesha-gluster
 You could try using --skip-broken to work around the problem

So I'll fall back to Gluster's native NFSv3 server first.
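
Falling back should just be a matter of re-enabling the built-in NFS server on the volume, something like:

  gluster volume set mainvol nfs.disable off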

Comment 12 Chen Chen 2016-08-19 07:39:00 UTC
After updating to 3.8.2, the cluster still hangs sometimes. However, there are no longer blocked inodelk entries in the statedump, so I figure it is another bug. I'll close this bug report and open a new one.

Comment 13 Red Hat Bugzilla 2023-09-14 03:21:30 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days