Bug 667707 - [xfs] Stress testing with swap usage resulted in unresponsive processes
Status: ASSIGNED
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Dave Chinner
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 640580
 
Reported: 2011-01-06 10:18 EST by Boris Ranto
Modified: 2016-07-05 14:45 EDT
5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
Few calltraces from the machine (12.73 KB, text/plain)
2011-01-07 11:05 EST, Boris Ranto

Description Boris Ranto 2011-01-06 10:18:24 EST
Description of problem:
Based on the testing suggested in bug 666477 for the xfs filesystem, with memory filled by memhog and several (10) dd's running, the dd's and memhog get stuck after a while (within an hour) and become unresponsive.

Version-Release number of selected component (if applicable):
kernel-2.6.18-238.el5

How reproducible:
I've only tried twice, but I managed to reproduce the issue both times.

Steps to Reproduce:
1. Create a ~2 TB sparse file
dd if=/dev/zero of=xfs.img count=0 bs=4096 seek=500M
2. Start running memhog in a while loop so that memory gets completely filled
while true; do memhog 11g; done
# The machine I used had 8 GB RAM and 10 GB swap
3. Make an xfs filesystem on the sparse file, mount it over loopback, and change into the mount point
mkfs.xfs xfs.img && mount xfs.img /xfs -o loop && cd /xfs
4. Run several (in my case 10) dd's in the background
for i in $(seq 1 10); do dd if=/dev/zero of=test$i bs=4096 count=2M & done
5. After a while (approximately in the middle of the writes), kill the dd's (I'm not sure this step is needed, but both times the issue reproduced after I did this)
killall -9 dd
6. Run the dd's again (with the same command) so that the files get overwritten (possibly repeatedly, until the issue reproduces)
7. While the dd's are running, run ll -sh /xfs to watch their progress (the steps are also strung together in the sketch below)
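
This is only a sketch of the steps above, based on the commands as reported; the xfs.img path, /xfs mount point and sizes come from the steps, while the mkdir, sleep and watch lines are added for convenience.

#!/bin/bash
# Sketch of the reproduction above; paths and sizes as in the report.

# 1. ~2 TB sparse file backing the loop device
dd if=/dev/zero of=xfs.img count=0 bs=4096 seek=500M

# 2. keep memory and swap full (the test machine had 8 GB RAM + 10 GB swap)
while true; do memhog 11g; done &

# 3. make and mount the xfs filesystem over loopback, then work inside it
mkdir -p /xfs
mkfs.xfs xfs.img && mount -o loop xfs.img /xfs && cd /xfs

# 4. ten background writers
for i in $(seq 1 10); do dd if=/dev/zero of=test$i bs=4096 count=2M & done

# 5./6. optionally kill the writers mid-way and start them again
sleep 600          # roughly the middle of the writes; timing is approximate
killall -9 dd
for i in $(seq 1 10); do dd if=/dev/zero of=test$i bs=4096 count=2M & done

# 7. watch progress until the dd's and memhog stop making any
watch -n 10 'ls -lsh /xfs'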

Actual results:
After a while (within an hour), memhog gets stuck (no new progress marks in its output) and the dd's get stuck (the size of the test$i files no longer changes). Both the dd and memhog processes are unkillable (not even with the -9 option).

Expected results:
The dd's and memhog finish successfully.

Additional info:
Machine: hp-bl685cg6-01.rhts.eng.bos.redhat.com
Comment 1 CAI Qian 2011-01-06 22:19:41 EST
Boris, can you please generate sysrq-t output when this is happening?
Comment 2 Boris Ranto 2011-01-07 07:05:17 EST
I've managed to reproduce it the same way on a different machine, so I suppose it is not machine-specific.
The problem is that after another while the machine becomes completely unresponsive: even if you already had an unused terminal open, it accepts input but nothing more; any command you try to run never starts and cannot be interrupted (not even with ctrl-c). That makes it quite difficult to generate sysrq-t output (I can't enter any command, and I have no idea how to send the key combination remotely).
I'll try to catch the window between the memhog + dd freeze and the complete freeze the next time I reproduce the problem. If I catch it, I'll post the sysrq-t output here.
Comment 3 Boris Ranto 2011-01-07 11:05:05 EST
Created attachment 472253 [details]
Few calltraces from the machine

I didn't manage to hit the window in which I could run echo t >/proc/sysrq-trigger, but I can at least provide the call traces that pop up because of hung_task_timeout.
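
For reference, once that window is caught, the standard sysrq interface can dump the task states into the kernel log without needing a console key combination; a sketch, assuming a local shell still responds:

# enable sysrq and dump all task states plus memory info into the ring buffer
echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
echo m > /proc/sysrq-trigger
dmesg > sysrq-dump.txt

# the hung_task watchdog that produced the attached traces can be made to
# fire sooner by lowering its timeout, if this kernel exposes the knob
echo 30 > /proc/sys/kernel/hung_task_timeout_secs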
Comment 7 Ric Wheeler 2011-01-10 16:22:45 EST
Eric, can you summarize where we are with this? thanks!
Comment 8 Eric Sandeen 2011-01-10 16:41:28 EST
I don't see this as a blocker; I doubt that it is a regression (can that be tested?), and it is a bit of an odd use case.  How often will we badly stress a sparse loopback file containing xfs?

BTW, what filesystem hosts the sparse file?

Dave, do the backtraces speak to you at all?
Comment 9 CAI Qian 2011-01-10 19:34:19 EST
It would be better to use a real XFS filesystem partition rather than a loopback one.

It might also be possible to generate sysrq-t and sysrq-m via conserver, like this.

# echo 1 >/proc/sys/kernel/sysrq

Then, from the conserver serial console,
ctrl-e, c, l, 0, t

ctrl-e, c, l, 0, m
Comment 10 Boris Ranto 2011-01-11 08:06:43 EST
The sparse file is hosted on an ext3 filesystem, as installed by default on RHEL 5.

I reproduced it again, and it seems that the step in which the dd's are killed is not required for reproduction (although it might shorten the time needed). I also tried the conserver console commands, but they didn't generate anything:
[root@nec-em19 ~]# cat /proc/sys/kernel/sysrq 
1
[root@nec-em19 ~]# [halt sent]
[halt sent]
t
-bash: t: command not found
[root@nec-em19 ~]# [halt sent]
m
Comment 11 CAI Qian 2011-01-11 08:42:01 EST
> reproduction). I also tried the conserver console commands but they didn't
> generate anything:
> [root@nec-em19 ~]# cat /proc/sys/kernel/sysrq 
> 1
> [root@nec-em19 ~]# [halt sent]
> [halt sent]
> t
> -bash: t: command not found
> [root@nec-em19 ~]# [halt sent]
> m
It is unlucky that it was reproduced on one of those boxes that does not support conserver sysrq.
Comment 12 Dave Chinner 2011-01-12 00:31:44 EST
Looks familiar. Probably a different manifestation of the problem that the upstream commit below (which is in RHEL 6) fixes. In this case, it is writeback holding the ilock while waiting for metadata buffer IO completion, which can't occur because all the IO completion queues are blocked on the ilock held by writeback.

$ gl 77d7a0c2eeb285c9069e15396703d0cb9690ac50 -n 1
commit 77d7a0c2eeb285c9069e15396703d0cb9690ac50
Author: Dave Chinner <david@fromorbit.com>
Date:   Wed Feb 17 05:36:29 2010 +0000

    xfs: Non-blocking inode locking in IO completion
    
    The introduction of barriers to loop devices has created a new IO
    order completion dependency that XFS does not handle. The loop
    device implements barriers using fsync and so turns a log IO in the
    XFS filesystem on the loop device into a data IO in the backing
    filesystem. That is, the completion of log IOs in the loop
    filesystem are now dependent on completion of data IO in the backing
    filesystem.
    
    This can cause deadlocks when a flush daemon issues a log force with
    an inode locked because the IO completion of IO on the inode is
    blocked by the inode lock. This in turn prevents further data IO
    completion from occuring on all XFS filesystems on that CPU (due to
    the shared nature of the completion queues). This then prevents the
    log IO from completing because the log is waiting for data IO
    completion as well.
    
    The fix for this new completion order dependency issue is to make
    the IO completion inode locking non-blocking. If the inode lock
    can't be grabbed, simply requeue the IO completion back to the work
    queue so that it can be processed later. This prevents the
    completion queue from being blocked and allows data IO completion on
    other inodes to proceed, hence avoiding completion order dependent
    deadlocks.
    
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Alex Elder <aelder@sgi.com>
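
If a backport of this fix ever lands in a RHEL 5 kernel, one way to check whether a given kernel build carries it would be to grep the package changelog for the commit subject; the exact changelog wording is an assumption:

# look for the upstream subject line in a kernel package's changelog
rpm -q --changelog kernel-2.6.18-238.el5 | grep -i 'non-blocking inode locking'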
Comment 16 RHEL Product and Program Management 2011-06-20 18:26:42 EDT
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.
Comment 18 RHEL Product and Program Management 2012-01-09 09:18:44 EST
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.8, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.
