Bug 719415 - dd on GFS gets stuck in glock_wait_internal
Summary: dd on GFS gets stuck in glock_wait_internal
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs-kmod
Version: 5.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Robert Peterson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-06 18:27 UTC by Harald Klein
Modified: 2018-11-14 12:28 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-01 11:46:46 UTC
Target Upstream Version:



Description Harald Klein 2011-07-06 18:27:08 UTC
Description of problem:

When running the following "stress test" on GFS, the dd processes get stuck after about 30 minutes:

root@nodea:~# touch /tmp/dd_running; for i in $(seq 1 48); do (while [ -e /tmp/dd_running ]; do dd if=/dev/mpath/P9500_0548 of=/mnt/gfstest/lilc066/dd.out.$i bs=64k count=16384 iflag=direct skip=$(echo "16384*$i"|bc) oflag=direct >/dev/null 2>&1; done& ) ; done

root@nodeb:~# touch /tmp/dd_running; for i in $(seq 1 48); do (while [ -e /tmp/dd_running ]; do dd if=/dev/mpath/P9500_0548 of=/mnt/gfstest/lilc067/dd.out.$i bs=64k count=16384 iflag=direct skip=$(echo "16384*$i"|bc) oflag=direct >/dev/null 2>&1; done& ) ; done
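For reference, the same stress loop written out as a script (a sketch only; the device path, per-node target directory, loop count, block size, and offsets are taken unchanged from the reproducer above):

#!/bin/bash
# 48 background loops, each repeatedly copying 1 GiB (16384 x 64 KiB) from the
# multipath device into its own file on the GFS mount, with O_DIRECT on both
# the read and the write side. Remove /tmp/dd_running to stop all loops.
SRC=/dev/mpath/P9500_0548        # source block device (from the report)
DST=/mnt/gfstest/lilc066         # per-node target directory (lilc067 on nodeb)
touch /tmp/dd_running
for i in $(seq 1 48); do
    (
        while [ -e /tmp/dd_running ]; do
            dd if="$SRC" of="$DST/dd.out.$i" bs=64k count=16384 \
               skip=$((16384 * i)) iflag=direct oflag=direct >/dev/null 2>&1
        done
    ) &
done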

Version-Release number of selected component (if applicable):


How reproducible:
Run the two commands listed above, one on each node.
  
Actual results:
After less than 30 minutes, all I/O to the GFS filesystem stops. All dd processes are waiting in glock_wait_internal:

19694 D dd glock_wait_internal
19701 D dd glock_wait_internal
19702 D dd glock_wait_internal
19706 D dd glock_wait_internal
19710 D dd glock_wait_internal
19714 D dd glock_wait_internal
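The listing above shows pid, state, command, and wait channel; a similar snapshot can be taken with ps (a sketch, assuming procps exposes the wchan column, as it does on RHEL 5):

# List dd processes in uninterruptible sleep (D state) together with the
# kernel function they are blocked in, similar to the listing above.
ps -eo pid,stat,comm,wchan:32 | awk '$3 == "dd" && $2 ~ /^D/'

# With sysrq enabled (kernel.sysrq=1), this dumps the stack of every task,
# including the stuck dd processes, to the kernel log.
echo t > /proc/sysrq-trigger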

Expected results:
dd should not get stuck

Additional info:
2-Node Cluster: lilc066, lilc067
Storage: HP P9500

Comment 10 Steve Whitehouse 2011-08-01 11:46:46 UTC
I don't think we can realistically figure out what is going on here if the customer has given up on it. We don't have the daemon which appears to be at the root of the problem. Also, the dd test is a very strange one:

1. It reads from a block device (is this separate from the one the fs is on? At least I hope it is).
2. It reads and writes with the O_DIRECT flag.
3. It does not appear that the destination files are pre-allocated, so all the benefits of writing with O_DIRECT are lost, since the write turns into a buffered sync write in that case.

That makes no sense to me as a use case unless the destination files have been preallocated.
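For what it's worth, preallocating the destination files before the run would avoid that fallback; a minimal sketch, reusing the paths and sizes from the reproducer (not something the reporter did):

# Write each destination file out to its full 1 GiB size once, buffered, so
# that the later O_DIRECT runs overwrite already-allocated blocks instead of
# extending the file.
for i in $(seq 1 48); do
    dd if=/dev/zero of=/mnt/gfstest/lilc066/dd.out.$i bs=64k count=16384
done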

As a result I'm going to close this. If you think that is wrong, then please reopen.

