Bug 165989
Field | Value
---|---
Summary: | The msync(MS_SYNC) call should fail after cable pulled from scsi disk
Product: | Red Hat Enterprise Linux 3
Component: | kernel
Version: | 3.0
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Reporter: | Wendy Cheng <nobody+wcheng>
Assignee: | Peter Staubach <staubach>
QA Contact: | Brian Brock <bbrock>
CC: | dff, kkruzich, lwang, petrides, sct, tao, tkincaid
Fixed In Version: | RHSA-2006-0144
Doc Type: | Bug Fix
Last Closed: | 2006-03-15 16:24:40 UTC
Bug Blocks: | 168424

Description
Wendy Cheng
2005-08-15 15:10:16 UTC
Created attachment 117753
Test program to demonstrate the problem.
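
The attachment itself is not reproduced in this report. As a rough illustration only, a test of this kind might dirty a shared file mapping and call msync(MS_SYNC) in a loop, checking the return value each time. The helper name `dirty_and_sync`, the one-page mapping size, and the iteration parameter are assumptions made for this sketch and are not taken from the attached program.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_LEN 4096  /* assumed mapping size: one typical page */

/*
 * Hypothetical helper (not the attached test program): dirty a shared
 * file mapping and flush it with msync(MS_SYNC) `iterations` times.
 * Returns 0 if every call succeeded, or the errno of the first failure.
 */
int dirty_and_sync(const char *path, int iterations)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return errno;

    /* The file must be at least as long as the mapping. */
    if (ftruncate(fd, MAP_LEN) < 0) {
        int err = errno;
        close(fd);
        return err;
    }

    char *map = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        int err = errno;
        close(fd);
        return err;
    }

    int result = 0;
    for (int i = 0; i < iterations; i++) {
        /* Dirty the page, then ask for a synchronous write-back. */
        memset(map, 'A' + (i % 26), MAP_LEN);
        if (msync(map, MAP_LEN, MS_SYNC) < 0) {
            /* The report expects a failure here once the disk is gone. */
            result = errno;
            break;
        }
    }

    munmap(map, MAP_LEN);
    close(fd);
    return result;
}
```

With the file on the affected disk, one would call this helper with a large iteration count, pull the cable while it runs, and expect a nonzero return. On a healthy disk it should return 0.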
I've been looking at the test program. What is the typical number of iterations passed through the second command line argument?

1) Thanks for the information!
2) Makes sense to me. Once we figure out what needs to be done, then we can figure out where to put it.

Yes, Stephen's comment had to do with ways of reproducing problems associated with issues in the storage subsystem, _without_ having to physically change the hardware.

I have prototyped a solution and am currently discussing it with some of the engineers to get some feedback on it.

Created attachment 117974
Proposed patch
The msync(2) code in RHEL-3 works by walking a list of dirty pages and arranging to have them written out to storage. When the i/o on a page is done, the page is moved to a clean_pages list. However, when ext3 finds that the file system is read-only, it sets the PG_dirty bit in the page struct again. Thus, the page is marked as dirty, i.e. it needs to be written to storage, but it sits on the clean_pages list. Having the dirty bit already set prevents the page from moving from the clean_pages list to the dirty_pages list, which in turn prevents msync from finding the page again. Since ext3 is not called into again, the read-only file system error is not returned again.

The solution is to place the page on the dirty_pages list instead of the clean_pages list if PG_dirty is set in the page struct. Thus, msync will continue to find the page and attempt to write it out. This will call into ext3, and ext3 can then return the read-only file system error.

Please note that RHEL-4 did not suffer from this issue.

A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.1.EL).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html
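
For readers who want to see the list-placement change from the proposed-patch comment in miniature, here is a toy user-space model. It is not the kernel patch: `struct page`, `struct mapping`, `io_done_old`, `io_done_fixed`, and `sync_pass` are simplified, hypothetical stand-ins that only mimic the clean_pages/dirty_pages behaviour described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; names echo the report, not the real kernel structures. */
struct page {
    bool pg_dirty;            /* stands in for the PG_dirty bit */
    struct page *next;
};

struct mapping {
    struct page *clean_pages;
    struct page *dirty_pages;
};

static void push(struct page **list, struct page *pg)
{
    pg->next = *list;
    *list = pg;
}

/* Behaviour described as the bug: always file the page as clean after i/o. */
void io_done_old(struct mapping *m, struct page *pg)
{
    push(&m->clean_pages, pg);
}

/* Behaviour described as the fix: keep the page on dirty_pages if the
 * dirty bit was set again while the write was being attempted. */
void io_done_fixed(struct mapping *m, struct page *pg)
{
    if (pg->pg_dirty)
        push(&m->dirty_pages, pg);
    else
        push(&m->clean_pages, pg);
}

/*
 * One msync-like pass over the dirty list. `fs_readonly` models ext3
 * re-setting the dirty bit because the write could not be done.
 * Returns how many pages the pass found, i.e. how many times the
 * file system would have been called and could report its error.
 */
int sync_pass(struct mapping *m, bool fs_readonly, bool fixed)
{
    int found = 0;
    struct page *todo = m->dirty_pages;  /* detach: re-queued pages wait for the next pass */
    m->dirty_pages = NULL;

    while (todo != NULL) {
        struct page *pg = todo;
        todo = pg->next;
        pg->pg_dirty = fs_readonly;      /* failed write leaves the page dirty */
        if (fixed)
            io_done_fixed(m, pg);
        else
            io_done_old(m, pg);
        found++;
    }
    return found;
}
```

In this model, with the old behaviour a second pass on a read-only file system finds no pages, so the error is never reported again. With the fixed behaviour every pass finds the page again.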