Bug 212627
| Summary: | GFS2 Direct IO deadlock | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Wendy Cheng <nobody+wcheng> | ||||||
| Component: | kernel | Assignee: | Steve Whitehouse <swhiteho> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | GFS Bugs <gfs-bugs> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 5.0 | CC: | dzickus, kanderso, lwang, rkenna | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | beta2 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2006-12-23 01:48:55 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 218883 | ||||||||
| Attachments: | |||||||||
Description
Wendy Cheng
2006-10-27 19:20:51 UTC
Devel ACK, not setting blocker flag at this point but should fix before GA.

This looks odd: i_mutex is taken in generic_file_write, followed by the glock, while gfs2_direct_IO obtains i_mutex before the glock. Nothing wrong with that. Then what is __blockdev_direct_IO waiting for? I'll re-run the test.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.

The deadlock comes from:
1. Read side (i_mutex->glock->i_mutex):
GFS2 uses XFS's DIO_OWN_LOCKING, which releases i_mutex in the
middle of the __blockdev_direct_IO call, after the SHARED-mode
glock has already been obtained on entry to gfs2_direct_IO:
    if (dio_lock_type == DIO_OWN_LOCKING) {
        mutex_unlock(&inode->i_mutex);
        acquire_i_mutex = 1;
    }
It then tries to reacquire i_mutex before returning:
    out:
        if (release_i_mutex)
            mutex_unlock(&inode->i_mutex);
        else if (acquire_i_mutex)
            mutex_lock(&inode->i_mutex);
        return retval;
    }
    EXPORT_SYMBOL(__blockdev_direct_IO);
2. Write side (i_mutex->glock):
A GFS2 direct write obtains i_mutex (after the reader has temporarily
released it in the middle of __blockdev_direct_IO) and then tries to
acquire the EXCLUSIVE glock, where it blocks on the reader's SHARED
glock. The reader in turn blocks trying to reacquire i_mutex.
So the headache here is that gfs2 has been written around the DIO_OWN_LOCKING
logic, so the tricks used in GFS1 cannot be applied here.
Created attachment 143748
Patch to fix the hang
This is very similar to Wendy's patch except for the alterations to the logic
determining when DIO is skipped. This should allow us to build on the
possibility of allocating writes for DIO in the future more easily.
I've been doing some testing with verify-data and it seems that this fixes the
problem.
184896

ignore previous useless comment

in 2.6.18-1.2910.el5

A package has been built which should help the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.