Bug 1427131

Summary:	[Scale] : I/O hangs on Ganesha mounts during add-brick/remove-brick on DR volumes.
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Ambarish <asoman>
Component:	nfs-ganesha	Assignee:	Kaleb KEITHLEY <kkeithle>
Status:	CLOSED ERRATA	QA Contact:	Manisha Saini <msaini>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.2	CC:	amukherj, bcodding, bturner, dang, ffilz, jthottan, mbenjamin, msaini, rhinduja, rhs-bugs, sheggodu, skoduri, steved, storage-qa-internal
Target Milestone:	---	Keywords:	ZStream
Target Release:	RHGS 3.4.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	nfs-ganesha-2.5.4-1	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-09-04 06:53:35 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1503134

Description Ambarish 2017-02-27 12:27:05 UTC

Description of problem:
-----------------------

I had two 4 node clusters,with one DR volume each.I was doing add-brick on one(with a  change in replica count from 2 to3) and remove brick on another..

I could see that my I/O was hung on 2 of my clients in both setups  during rebalance for nearly 14 hours,and continues to be in hung state now,even when rebalance is over.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-3.8.4-15.el7rhgs.x86_64
nfs-ganesha-2.4.1-7.el7rhgs.x86_64


How reproducible:
------------------

1/1

Steps to Reproduce:
------------------

1. Mount volume via v4.

2. Run Bonnie/kernel untars.

3. add/remove brick

Actual results:
----------------

IO hangs indefinitely.

Expected results:
-----------------

Zero error status on clients on rebalance.

Additional info:
----------------

CLUSTER 1 (ADD BRICK) :

[root@gqas009 ~]# gluster v info
 
Volume Name: butcher
Type: Distributed-Replicate
Volume ID: b14eca7f-a1a2-4c6e-8380-355652ecf7fa
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: gqas010.sbu.lab.eng.bos.redhat.com:/bricks1/b1
Brick2: gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/b1
Brick3: gqas010.sbu.lab.eng.bos.redhat.com:/bricks6/b1
Brick4: gqas010.sbu.lab.eng.bos.redhat.com:/bricks2/b1
Brick5: gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/b1
Brick6: gqas009.sbu.lab.eng.bos.redhat.com:/bricks6/b1
Brick7: gqas010.sbu.lab.eng.bos.redhat.com:/bricks3/b1
Brick8: gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/b1
Brick9: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/b1
Brick10: gqas010.sbu.lab.eng.bos.redhat.com:/bricks4/b1
Brick11: gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/b1
Brick12: gqas015.sbu.lab.eng.bos.redhat.com:/bricks6/b1
Options Reconfigured:
ganesha.enable: on
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.cache-samba-metadata: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable


[root@gqas009 ~]# gluster v status
Status of volume: butcher
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks1/b1                                   49152     0          Y       28581
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks1/b1                                   49153     0          Y       29496
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks6/b1                                   49157     0          Y       1396 
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks2/b1                                   49153     0          Y       28601
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks2/b1                                   49154     0          Y       29516
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks6/b1                                   49158     0          Y       2439 
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks3/b1                                   49154     0          Y       28621
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks3/b1                                   49155     0          Y       29536
Brick gqas014.sbu.lab.eng.bos.redhat.com:/b
ricks6/b1                                   49154     0          Y       702  
Brick gqas010.sbu.lab.eng.bos.redhat.com:/b
ricks4/b1                                   49155     0          Y       28641
Brick gqas009.sbu.lab.eng.bos.redhat.com:/b
ricks4/b1                                   49156     0          Y       29556
Brick gqas015.sbu.lab.eng.bos.redhat.com:/b
ricks6/b1                                   49154     0          Y       2015 
Snapshot Daemon on localhost                49157     0          Y       30595
Self-heal Daemon on localhost               N/A       N/A        Y       23256
Quota Daemon on localhost                   N/A       N/A        Y       23264
Snapshot Daemon on gqas014.sbu.lab.eng.bos.
redhat.com                                  49153     0          Y       29299
Self-heal Daemon on gqas014.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       17428
Quota Daemon on gqas014.sbu.lab.eng.bos.red
hat.com                                     N/A       N/A        Y       17436
Snapshot Daemon on gqas015.sbu.lab.eng.bos.
redhat.com                                  49153     0          Y       30507
Self-heal Daemon on gqas015.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       19700
Quota Daemon on gqas015.sbu.lab.eng.bos.red
hat.com                                     N/A       N/A        Y       19708
Snapshot Daemon on gqas010.sbu.lab.eng.bos.
redhat.com                                  49156     0          Y       29606
Self-heal Daemon on gqas010.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       20404
Quota Daemon on gqas010.sbu.lab.eng.bos.red
hat.com                                     N/A       N/A        Y       20412
 
Task Status of Volume butcher
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : d065363f-a3f8-4256-9d42-ef165e46e250
Status               : completed  



CLUSTER 2 - REMOVE BRICK :

[root@gqas013 ~]# gluster v info
 
Volume Name: butcher
Type: Distributed-Replicate
Volume ID: 0ff07f58-4097-4a99-9a5c-545851678f7d
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/b1
Brick2: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/b1
Brick3: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/b1
Brick4: gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/b1
Brick5: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/b1
Brick6: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/b1
Brick7: gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/b1
Brick8: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/b1
Brick9: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/b1
Options Reconfigured:
ganesha.enable: on
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.cache-samba-metadata: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable



[root@gqas013 ~]# gluster v status
Status of volume: butcher
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gqas005.sbu.lab.eng.bos.redhat.com:/b
ricks1/b1                                   49153     0          Y       1532 
Brick gqas006.sbu.lab.eng.bos.redhat.com:/b
ricks1/b1                                   49153     0          Y       1538 
Brick gqas013.sbu.lab.eng.bos.redhat.com:/b
ricks1/b1                                   49153     0          Y       2938 
Brick gqas005.sbu.lab.eng.bos.redhat.com:/b
ricks2/b1                                   49154     0          Y       1554 
Brick gqas006.sbu.lab.eng.bos.redhat.com:/b
ricks2/b1                                   49154     0          Y       1558 
Brick gqas013.sbu.lab.eng.bos.redhat.com:/b
ricks2/b1                                   49154     0          Y       2958 
Brick gqas005.sbu.lab.eng.bos.redhat.com:/b
ricks3/b1                                   49155     0          Y       1574 
Brick gqas006.sbu.lab.eng.bos.redhat.com:/b
ricks3/b1                                   49155     0          Y       1578 
Brick gqas013.sbu.lab.eng.bos.redhat.com:/b
ricks3/b1                                   49155     0          Y       2978 
Snapshot Daemon on localhost                49156     0          Y       4096 
Self-heal Daemon on localhost               N/A       N/A        Y       27036
Quota Daemon on localhost                   N/A       N/A        Y       27044
Snapshot Daemon on gqas008.sbu.lab.eng.bos.
redhat.com                                  49152     0          Y       2553 
Self-heal Daemon on gqas008.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       21030
Quota Daemon on gqas008.sbu.lab.eng.bos.red
hat.com                                     N/A       N/A        Y       21138
Snapshot Daemon on gqas006.sbu.lab.eng.bos.
redhat.com                                  49156     0          Y       2693 
Self-heal Daemon on gqas006.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       26922
Quota Daemon on gqas006.sbu.lab.eng.bos.red
hat.com                                     N/A       N/A        Y       26930
Snapshot Daemon on gqas005.sbu.lab.eng.bos.
redhat.com                                  49156     0          Y       2667 
Self-heal Daemon on gqas005.sbu.lab.eng.bos
.redhat.com                                 N/A       N/A        Y       26059
Quota Daemon on gqas005.sbu.lab.eng.bos.red
hat.com                                     N/A       N/A        Y       26072
 
Task Status of Volume butcher
------------------------------------------------------------------------------
Task                 : Remove brick        
ID                   : 11222ad2-3492-499e-95a6-57c41b9bfa69
Removed bricks:     
gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/b1
gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/b1
gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/b1
Status               : completed

Comment 3 Ambarish 2017-02-27 13:01:55 UTC

Workload Description :

Bonnie on 3 clients,kernel untar on one.

Comment 5 Ambarish 2017-02-28 09:04:11 UTC

I've reproduced this issue consistently on :

> Adding a brick and changing the replica count

> Adding a brick without changing the replica count.

> Remove-brick.


Proposing this to block RHGS 3.2,to be taken up for consideration in the next blocker triage.

Comment 6 Soumya Koduri 2017-02-28 13:38:20 UTC

tcpdumps collected were after few hours of hung being observed and it contained only RENEW fops. pstack on server process also dint show any threads waiting or in hung state. Moreover, other client are able to access the same volume via same server. 


Looking at client logs -

Feb 27 23:00:39 gqac015 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 27 23:00:39 gqac015 kernel: bonnie++        D ffffffff816895f0     0 11340  11338 0x00000080
Feb 27 23:00:39 gqac015 kernel: ffff880036477c30 0000000000000086 ffff880c1732ce70 ffff880036477fd8
Feb 27 23:00:39 gqac015 kernel: ffff880036477fd8 ffff880036477fd8 ffff880c1732ce70 ffff880617a56c40
Feb 27 23:00:39 gqac015 kernel: 0000000000000000 7fffffffffffffff ffff88062ff95de8 ffffffff816895f0
Feb 27 23:00:39 gqac015 kernel: Call Trace:
Feb 27 23:00:39 gqac015 kernel: [<ffffffff816895f0>] ? bit_wait+0x50/0x50
Feb 27 23:00:39 gqac015 kernel: [<ffffffff8168b579>] schedule+0x29/0x70
Feb 27 23:00:39 gqac015 kernel: [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
Feb 27 23:00:39 gqac015 kernel: [<ffffffffa066d124>] ? nfs_initiate_pgio+0xd4/0x160 [nfs]
Feb 27 23:00:39 gqac015 kernel: [<ffffffff810eaa6c>] ? ktime_get_ts64+0x4c/0xf0
Feb 27 23:00:39 gqac015 kernel: [<ffffffff816895f0>] ? bit_wait+0x50/0x50
Feb 27 23:00:39 gqac015 kernel: [<ffffffff8168ab1e>] io_schedule_timeout+0xae/0x130
Feb 27 23:00:39 gqac015 kernel: [<ffffffff8168abb8>] io_schedule+0x18/0x20
Feb 27 23:00:39 gqac015 kernel: [<ffffffff81689601>] bit_wait_io+0x11/0x50
Feb 27 23:00:39 gqac015 kernel: [<ffffffff81689125>] __wait_on_bit+0x65/0x90
Feb 27 23:00:39 gqac015 kernel: [<ffffffff8117fb61>] wait_on_page_bit+0x81/0xa0
Feb 27 23:00:39 gqac015 kernel: [<ffffffff810b16c0>] ? wake_bit_function+0x40/0x40
Feb 27 23:00:39 gqac015 kernel: [<ffffffff8117fc91>] filemap_fdatawait_range+0x111/0x1b0
Feb 27 23:00:39 gqac015 kernel: [<ffffffff81181f96>] filemap_write_and_wait_range+0x56/0x90
Feb 27 23:00:39 gqac015 kernel: [<ffffffffa06bca71>] nfs4_file_fsync+0x81/0x150 [nfsv4]
Feb 27 23:00:39 gqac015 kernel: [<ffffffff811fe1e2>] ? vfs_write+0x172/0x1e0
Feb 27 23:00:39 gqac015 kernel: [<ffffffff8122fdb5>] do_fsync+0x65/0xa0
Feb 27 23:00:39 gqac015 kernel: [<ffffffff81230080>] SyS_fsync+0x10/0x20
Feb 27 23:00:39 gqac015 kernel: [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b


NFS-Client may have got hung. Ben (cced) from NFS-Client team mentioned that recent v4 clients will wait forever if the server drops or fails to respond to a WRITE. Probably in this case as well server may have not responded or dropped response to WRITE request.

Below warnings were observed on the server machine but cannot be sure if they are related - 

27/02/2017 11:09:27 : epoch 48620000 : gqas013.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-22354[work-208] nfs_dupreq_finish :DUPREQ :WARN :DRC retire entries: unable to find reclaimable dupreq LRU entry after 5 tries on DRC=0x7fda2406ce40, drc->size=1025
27/02/2017 11:09:27 : epoch 48620000 : gqas013.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-22354[work-202] nfs_dupreq_finish :DUPREQ :WARN :DRC retire entries: unable to find reclaimable dupreq LRU entry after 5 tries on DRC=0x7fda2406ce40, drc->size=1026
27/02/2017 11:09:27 : epoch 48620000 : gqas013.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-22354[work-75] nfs_dupreq_finish :DUPREQ :WARN :DRC retire entries: unable to find reclaimable dupreq LRU entry after 5 tries on DRC=0x7fda2406ce40, drc->size=1124
27/02/2017 11:09:27 : epoch 48620000 : gqas013.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-22354[work-198] nfs_dupreq_finish :DUPREQ :WARN :DRC retire entries: unable to find reclaimable dupreq LRU entry after 5 tries on DRC=0x7fda2406ce40, drc->size=1125
27/02/2017 11:10:00 : epoch 48620000 : gqas013.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-22354[work-152] nfs_dupreq_finish :DUPREQ :WARN :DRC retire entries: unable to find reclaimable dupreq LRU entry after 5 tries on DRC=0x7fda2406ce40, drc->size=1136


@Frank/Dan/Matt,

Do you infer anything from above warnings which may have caused server not to respond to the client. Please note that many of the drc fixes which went in upstream next branch are missing in the current downstream codebase (which is at 2.4.1). 


Also when Ambarish reproduced the issue today, the client was hung for long time (almost half a day) but seem to have recovered now (not sure how, could be network restart), but bonnie++ test-suite seem to be still in hung state. Could there be any issues with bonnie test-suite itself?

[root@gqac015 ~]# 
[root@gqac015 ~]# ls /gluster-mount/
d1  dir1  dir4                 new      run11309  run1352  run2753  run3666  run5004
d2  dir2  dir5                 run1011  run11315  run1477  run3030  run3739  run5177
d3  dir3  linux-4.9.12.tar.xz  run1108  run1201   run1672  run3391  run4897  T
[root@gqac015 ~]# 


[root@gqac015 ~]# ps aux | grep bonnie
root     11315  0.0  0.0 113260  1568 pts/1    S+   Feb27   0:00 /bin/bash /opt/qa/tools/system_light/run.sh -w /gluster-mount -t bonnie -l /var/tmp/bonnie.log
root     11338  0.0  0.0 113128  1428 pts/1    S+   Feb27   0:00 /bin/bash /opt/qa/tools/system_light/scripts/bonnie/bonnie.sh
root     11340  0.1  0.0  16960  1380 pts/1    D+   Feb27   0:58 bonnie++ -u root -d /gluster-mount/run11315/
root     16791  0.0  0.0 112656   964 pts/3    S+   06:59   0:00 grep --color=auto bonnie
[root@gqac015 ~]#

Comment 7 Daniel Gryniewicz 2017-02-28 15:45:08 UTC

I'm not sure if the warning has anything to do with the hang.  I suspect it may be possible.  Regardless, 1100 is a *lot* of dupreq entries, so something is wrong with the dupreq cache.

There have been a lot of fixes recently relating to dupreq refcounting and races, mostly done by Malahal.  Since this is reproducible, trying with a backport of these fixes may be a good idea.

I haven't gone through all these fixes to see which ones might be necessary, but here's all of the potentially relevant ones, from oldest to newest:

223d1304dc920da574bfa4e203a19f602ae22c07
8abb27f582f67d018989a9033bd17697fdae85bc
44f9a0f56aaa258d2221729e878e7479342aa70f
e3e88ab565d906b3283c3655985c22f83ced0ac2
9da22db00c28edc0e56fb6fbc105d5b82ea4cc71
269f0c3c23b4c0b82b18a9b5a5af466111cdbc10 (just changes log messages)
1e8e3158a57f6631e4b581bf8e4016b9989d2d17
d2acc676428e2cb504410799fb5eff920a3c1277
9cf5429fda1bfd07d98c8807d51754aadb26d8a0
b190d195279158e59867738904697ff125bce0b2

Comment 10 Kaleb KEITHLEY 2017-08-21 12:27:47 UTC

retest after rebase to 2.5.x

Comment 11 Kaleb KEITHLEY 2017-10-05 11:29:18 UTC

POST with rebase to nfs-ganesha-2.5.x

Comment 18 errata-xmlrpc 2018-09-04 06:53:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2610