Bug 1330997 - [Disperse volume]: IO hang seen on mount with file ops
Summary: [Disperse volume]: IO hang seen on mount with file ops
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: disperse
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Ashish Pandey
QA Contact: Upasana
Duplicates: 1334860 1339465 (view as bug list)
Depends On: 1329466 1330132 1342426 1344675 1345909 1345911
Blocks: 1344836 1360576 1361402
Reported: 2016-04-27 12:21 UTC by krishnaram Karthick
Modified: 2018-10-28 15:57 UTC (History)

Fixed In Version: glusterfs-3.7.9-7
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1344836 (view as bug list)
Last Closed: 2018-10-28 15:57:27 UTC


Description krishnaram Karthick 2016-04-27 12:21:25 UTC
Description of problem:
An IO hang is seen on an NFS-Ganesha mount during file operations such as file creation, directory creation, and lookups.

Version-Release number of selected component (if applicable):


How reproducible:
Not yet determined.

Steps to Reproduce:
1. Create an EC (disperse) volume
2. Mount the volume over NFS-Ganesha using the NFSv3 protocol
3. Create 100000 files and 10000 directories with multiple sub-dirs
4. Run lookups on the mount point
5. Allow file and directory creation to complete
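The steps above can be sketched as follows; the volume, server, and brick names are hypothetical, and a 4+2 disperse layout is assumed:

```shell
# Hypothetical names; assumes a 4+2 (disperse 6, redundancy 2) layout.
gluster volume create ecvol disperse 6 redundancy 2 \
    server{1..6}:/bricks/ecvol/brick force
gluster volume start ecvol

# Mount over NFS-Ganesha with the NFSv3 protocol.
mkdir -p /mnt/ec-nfsganesha
mount -t nfs -o vers=3 server1:/ecvol /mnt/ec-nfsganesha

# Generate the directory/file load, then run a recursive lookup.
cd /mnt/ec-nfsganesha
mkdir -p dir{1..100}/sub{1..100}                    # ~10000 directories
for d in dir*/sub*; do touch "$d"/file{1..10}; done # ~100000 files
ls -lRt .
```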

Actual results:
IO operation hung

Expected results:
No disruption to IO should be seen

Additional info:
sosreports and statedumps shall be attached shortly

Comment 3 krishnaram Karthick 2016-04-27 16:26:06 UTC
sosreports are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1330997/

Comment 4 krishnaram Karthick 2016-04-28 06:24:15 UTC
Changing the summary to reflect the issue: EC volume - IO hang seen on ganesha mount with file ops

Comment 11 Nithya Balachandran 2016-05-12 08:30:52 UTC
*** Bug 1334860 has been marked as a duplicate of this bug. ***

Comment 13 Ashish Pandey 2016-05-25 09:02:39 UTC
*** Bug 1339465 has been marked as a duplicate of this bug. ***

Comment 19 nchilaka 2016-06-06 10:40:46 UTC
While verifying this bug I hit Bug 1342426 - self heal deamon killed due to oom kills on a dist-disperse volume using nfs ganesha 
Hence this bug verification is blocked till Bug 1342426 is fixed

Comment 20 nchilaka 2016-06-10 13:29:38 UTC
Restarted validating this bug after "Bug 1342426 - self heal deamon killed due to oom kills on a dist-disperse volume using nfs ganesha" was fixed.

I hit an issue again: ls -lRt hung, and a stale file handle error appeared along with an oom kill of the ganesha process. Raised a bug for the same: 1344675 - Stale file handle seen on the mount of dist-disperse volume when doing IOs with nfs-ganesha protocol

So, after discussing with stakeholders, I unmounted the volume and cleaned up the client side,
then remounted the same volume on only one client and issued an ls -lRt on the root of the mount.
The ls -lRt hung (observed for at least 1 hr and still not responding at the end of the day):

[root@dhcp35-126 ~]# mkdir /mnt/ec-nfsganesha;mount -t nfs -o vers=3 /mnt/ec-nfsganesha
[root@dhcp35-126 ~]# cd /mnt/ec-nfsganesha
[root@dhcp35-126 ec-nfsganesha]# ls -lRt |& tee -a refreshed.ls.log

Comment 22 Soumya Koduri 2016-06-10 13:42:15 UTC
A plain 'ls' on the root directory seems to work fine; the issue is only with deep directory lookups via 'ls -lRt'. As part of readdir(/plus), NFS-Ganesha caches all dirents along with their attributes.

Since the volume contains millions of files, it seems to take NFS-Ganesha a long time to cache all of them before responding. Also, in the case of disperse volumes, we see that for each file lookup requests are sent to almost all of the ec sub-volumes (bricks), adding to the latency. Attached the packet trace.

Comment 27 Soumya Koduri 2016-06-10 19:00:05 UTC

Any thoughts on the lock deadlock mentioned above?

Comment 29 Pranith Kumar K 2016-06-11 02:16:06 UTC
Disabling the readdirp volume option doesn't fully disable readdirp: both md-cache and dht still issue readdirp, so we need to disable it in those layers too. I will take a look at the setup regarding the locks.
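For reference, a sketch of turning readdirp off at each layer; the volume name is hypothetical, and the option names should be double-checked against the installed glusterfs version:

```shell
gluster volume set ecvol performance.force-readdirp off  # md-cache
gluster volume set ecvol dht.force-readdirp off          # dht
gluster volume set ecvol nfs.rdirplus off                # gluster NFS server
# For a fuse mount, readdirp can also be disabled at mount time:
# mount -t glusterfs -o use-readdirp=no server1:/ecvol /mnt/fuse
```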

Comment 30 Atin Mukherjee 2016-06-11 09:20:22 UTC
Did you get a chance to look at the setup, Pranith?

Comment 31 Pranith Kumar K 2016-06-11 09:34:24 UTC
Yes, there is a stale lock. I am trying to find the reasons why it could get into this state and will update the bz as soon as I find something. This is multi-threaded code and there could be races; I need to find which race could lead to this state. It may take a while to find the problem.

Comment 32 Pranith Kumar K 2016-06-11 10:35:32 UTC
TL;DR: I recreated the hang on my laptop.

The locks were acquired at the time the bricks were going down because of ping timeouts. 4 of the 6 bricks went down at that time; the remaining 2 bricks have locks which were not unlocked for some reason and were left stale. Reaching this conclusion took almost 7 hours :-). I had to look at the statedumps and also take a statedump of the nfs-ganesha process using gdb.

Steps to recreate the issue:
1) Create a plain disperse volume
2) Put a breakpoint at ec_wind_inodelk
3) From the fuse mount, issue ls -laR <mount>
4) As soon as the breakpoint is hit in gdb, kill 4 of the 6 bricks from another terminal
5) Quit gdb
6) Wait a second or two and check whether there are stale locks on the remaining bricks
7) In my case there were, so I issued ls -laR on the mount and it hung
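The gdb and lock-inspection steps can be sketched roughly as below; the volume name, brick pids, and the statedump path are hypothetical and version-dependent:

```shell
# Terminal 1: attach gdb to the fuse client, break before the lock is wound.
gdb -p "$(pgrep -f 'glusterfs.*fuse')" -ex 'break ec_wind_inodelk' -ex continue

# Terminal 2: trigger lookups; once the breakpoint is hit,
# kill 4 of the 6 bricks, then quit gdb in terminal 1.
ls -laR /mnt/ecvol &
kill -9 <brick-pid-1> <brick-pid-2> <brick-pid-3> <brick-pid-4>

# Check the surviving bricks for stale locks via a statedump.
gluster volume statedump ecvol
grep -A4 inodelk /var/run/gluster/*brick*.dump.*   # dump path varies by version
```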

Comment 33 Pranith Kumar K 2016-06-11 13:34:28 UTC
Posted the patch upstream: http://review.gluster.org/14703

This is a day-1 bug in ec; I was able to recreate it on 3.1.1 as well. I don't remember seeing disconnects the first time we looked at this bz, so I think the root cause is different.

Comment 34 Atin Mukherjee 2016-06-11 16:31:59 UTC
It seems the ec stale-lock issue is *only* hit if a brick disconnection (ping-timer expiry) happens. Question for Nagpavan: are you hitting brick disconnections every time while verifying this bug, and if so, does that mean you are blocked on verifying it?

If Nagpavan confirms this, can you shed some light on why we are seeing the ping-timer expiry? Does it have anything to do with the access protocol?

Comment 35 Raghavendra G 2016-06-11 17:11:41 UTC
Pranith and I looked into the client and brick logs.

1. Clients were able to reconnect without any issues after the disconnect.
2. There is not much information in the logs to help figure out what caused the ping-timer expiry.

To summarize, the ping-timer expiry is not a problem in rpc/transport. It can have many causes, one of which is the poller thread not being able to respond to ping requests. More investigation is needed, and the bug need not be in the rpc/transport layer. What would help here is:

1. Brick logs at DEBUG log level, so that we can try to analyse what the brick process is doing during the ping-timer expiry.
2. The exact steps/workload during which the ping timer expired.
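For the first item, something along these lines should do; the volume name is hypothetical:

```shell
# Raise brick log verbosity before reproducing the ping-timer expiry.
gluster volume set ecvol diagnostics.brick-log-level DEBUG
# ... reproduce, then collect /var/log/glusterfs/bricks/*.log from each
# brick node, and revert to avoid log flooding:
gluster volume set ecvol diagnostics.brick-log-level INFO
```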


Comment 36 nchilaka 2016-06-12 07:10:06 UTC
I didn't observe any brick disconnections.
Also, the dd IO was progressing successfully (when I ran it as part of validation before comment 20). Only ls -lRt was hanging (with tar/untar failing after a certain number of iterations, due to an oom kill of nfs-ganesha).

Comment 37 nchilaka 2016-06-13 12:33:27 UTC
Unable to mark this as verified for the following reason:
the writes had no issues, but an ls -lRt either hung or started to display contents only after a very long delay of more than 15 hrs;
hence the use case is incomplete.
Marking this bug as blocked on verification due to the BZs below:
1344675 - Stale file handle seen on the mount of dist-disperse volume when doing IOs with nfs-ganesha protocol
1345911 - locks on file in dist-disperse not released leading to IO hangs
1345909 - ls -lRt taking very long time to display contents on an dist-disperse volume with nfs ganesha protocol

Comment 44 Atin Mukherjee 2018-10-28 15:57:27 UTC
Since the fix is already available in the latest release of RHGS, I am closing this bug.
