Bug 719415

Summary: dd on GFS gets stuck in glock_wait_internal
Summary: dd on GFS gets stuck in glock_wait_internal
Product: Red Hat Enterprise Linux 5
Component: gfs-kmod
Version: 5.6
Hardware: x86_64
OS: Linux
Severity: medium
Priority: medium
Status: CLOSED INSUFFICIENT_DATA
Reporter: Harald Klein <hklein>
Assignee: Robert Peterson <rpeterso>
QA Contact: Cluster QE <mspqa-list>
CC: adas, anprice, bmarzins, rpeterso, swhiteho, teigland
Target Milestone: rc
Doc Type: Bug Fix
Last Closed: 2011-08-01 11:46:46 UTC

Description Harald Klein 2011-07-06 18:27:08 UTC
Description of problem:

When running the following "stress test" on GFS, the dd processes get stuck after about 30 minutes:

root@nodea:~# touch /tmp/dd_running; for i in $(seq 1 48); do (while [ -e /tmp/dd_running ]; do dd if=/dev/mpath/P9500_0548 of=/mnt/gfstest/lilc066/dd.out.$i bs=64k count=16384 iflag=direct skip=$(echo "16384*$i"|bc) oflag=direct >/dev/null 2>&1; done& ) ; done

root@nodeb:~# touch /tmp/dd_running; for i in $(seq 1 48); do (while [ -e /tmp/dd_running ]; do dd if=/dev/mpath/P9500_0548 of=/mnt/gfstest/lilc067/dd.out.$i bs=64k count=16384 iflag=direct skip=$(echo "16384*$i"|bc) oflag=direct >/dev/null 2>&1; done& ) ; done
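For readability, the same loop in multi-line form (one node shown; the second node uses /mnt/gfstest/lilc067 as the output directory). The arithmetic expansion replaces the call to bc but is otherwise equivalent to the one-liners above:

touch /tmp/dd_running
for i in $(seq 1 48); do
    (
        while [ -e /tmp/dd_running ]; do
            dd if=/dev/mpath/P9500_0548 \
               of=/mnt/gfstest/lilc066/dd.out.$i \
               bs=64k count=16384 skip=$((16384 * i)) \
               iflag=direct oflag=direct >/dev/null 2>&1
        done
    ) &
done
# Removing /tmp/dd_running stops all loops.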

Version-Release number of selected component (if applicable):


How reproducible:
Run the commands listed above, one on each node.
  
Actual results:
After a while (less than 30 minutes), all I/O to the GFS filesystem stops. All dd processes are waiting in glock_wait_internal:

19694 D dd glock_wait_internal
19701 D dd glock_wait_internal
19702 D dd glock_wait_internal
19706 D dd glock_wait_internal
19710 D dd glock_wait_internal
19714 D dd glock_wait_internal
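(For reference, a sketch of how such a list can be gathered; the exact commands the reporter used are not recorded in this bug, so treat these as assumptions. Both require root, which matches the reproduction steps above.)

# List processes in uninterruptible sleep (D state) with their wait channel:
ps -eo pid,stat,comm,wchan:32 | awk '$2 ~ /D/'

# Dump stack traces of all blocked tasks to the kernel log:
echo w > /proc/sysrq-trigger
dmesg | tail -n 200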

Expected results:
dd should not get stuck

Additional info:
2-Node Cluster: lilc066, lilc067
Storage: HP P9500

Comment 10 Steve Whitehouse 2011-08-01 11:46:46 UTC
I don't think we can realistically figure out what is going on here if the customer has given up on it. We don't have the daemon which appears to be at the root of the problem. Also, the dd test is a very strange one:

1. It reads from a block device (is this separate from the device the filesystem is on? At least I hope it is)
2. It reads and writes with the O_DIRECT flag
3. The destination files do not appear to be pre-allocated, which loses all the benefits of writing with O_DIRECT, since the write falls back to a buffered sync write in that case (see the preallocation sketch below).

That makes no sense to me as a use case unless the destination files have been preallocated.
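A minimal sketch of the preallocation described above, reusing the reporter's paths and sizes purely for illustration (these commands are not from the original report). The file is written once buffered so its blocks are allocated, then overwritten in place with O_DIRECT; conv=notrunc keeps dd from truncating the preallocated file:

dd if=/dev/zero of=/mnt/gfstest/lilc066/dd.out.1 bs=64k count=16384
sync
dd if=/dev/mpath/P9500_0548 of=/mnt/gfstest/lilc066/dd.out.1 \
   bs=64k count=16384 iflag=direct oflag=direct conv=notrunc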

As a result I'm going to close this. If you think that is wrong, then please reopen.