Bug 537475
| Summary: | Write barrier operations not working for libata and general SCSI disks | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Brent Holden <bholden> | ||||
| Component: | kernel | Assignee: | Josef Bacik <jbacik> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Eryu Guan <eguan> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 4.6 | CC: | dhoward, eguan, emcnabb, jbacik, phan, plyons, rwheeler, yugzhang | ||||
| Target Milestone: | rc | Keywords: | ZStream | ||||
| Target Release: | 4.9 | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: |
Red Hat Enterprise Linux 4 kernel 2.6.9-67 does not enable barriers if it detects WCE (Write Cache Enabled) as being disabled. However, the PERC 6/i controller on Dell 2950 hardware improperly reports WCE as being disabled while still maintaining a write cache. This could have caused file system corruption in the event of a power outage due to data being maintained in the controller cache even though the kernel was told that data was being written through to disk.
The solution to this issue provides proper software ordering, with the result that proper data integrity is achieved with devices that have no write caching (or for which write caching is disabled), and no command queuing. If command queuing or write caching is enabled, there is no guarantee of data integrity following a crash.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-02-16 15:40:48 UTC | Type: | --- | ||||
| Bug Blocks: | 560563 | ||||||
Description
Brent Holden
2009-11-13 19:17:30 UTC
Note that this is a generic issue - not just a PERC controller one. We do not enable barriers by default and recommend that users run their local storage with write cache disabled when they do not have battery backup. However, if you do run with write cache enabled and mount with barriers, this fix is required for any drive exported via libata or normal SCSI.

Created attachment 369486
proposed patch

proposed fix.
posted 11/13

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
The solution to this issue doesn't actually give us cache flushes, it just provides proper software ordering. So this will give us proper data integrity with devices that have no write caching or have write caching disabled, but if the device has write cache enabled it will not get cache flushes and leave the possibility for data loss.

Technical note updated.
Diffed Contents:
@@ -1 +1 @@
-The solution to this issue doesn't actually give us cache flushes, it just provides proper software ordering. So this will give us proper data integrity with devices that have no write caching or have write caching disabled, but if the device has write cache enabled it will not get cache flushes and leave the possibility for data loss.
+The solution to this issue doesn't actually give us cache flushes, it just provides proper software ordering. So this will give us proper data integrity with devices that have no write caching (or write caching is disabled) and no command queuing. If you have command queuing or write cache enabled there is no guarantee of data integrity.

Technical note updated.
Diffed Contents:
@@ -1 +1 @@
-The solution to this issue doesn't actually give us cache flushes, it just provides proper software ordering. So this will give us proper data integrity with devices that have no write caching (or write caching is disabled) and no command queuing. If you have command queuing or write cache enabled there is no guarantee of data integrity.
+The solution to this issue doesn't actually give us cache flushes, it just provides proper software ordering. So this will give us proper data integrity with devices that have no write caching (or write caching is disabled) and no command queuing. If you have command queuing or write cache enabled there is no guarantee of data integrity after a crash.

Committed in 89.21.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Please help to verify this bug with the rhel4 kernel -89.21.EL. Thanks.

Followed the simple test from bug 560563 comment 7. The file/sec rates showed no big difference between barrier=1 and barrier=0 on the 2.6.9-99 kernel. However, I did notice that a message showed up on the 2.6.9-89 kernel when mounting ext3 with the barrier=1 mount option:
"JBD: barrier-based sync failed on sda1 - disabling barriers"
On the 2.6.9-99 kernel, there was no such message. I also ran fsstress and found no issues. Set it to VERIFIED.
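
The check described in the verification comment above (looking for the JBD message after mounting ext3 with barrier=1) could be sketched as a small log scan. This is only an illustrative sketch and not part of the fix: the function name `find_barrier_failures` and the surrounding sample log lines are hypothetical, and only the JBD message format is taken from this report.

```python
import re

# Matches the JBD message quoted in this report and captures the device name.
BARRIER_FAILURE = re.compile(
    r"JBD: barrier-based sync failed on (\S+) - disabling barriers"
)


def find_barrier_failures(log_text):
    """Return the devices for which JBD reported that it disabled barriers."""
    devices = []
    for line in log_text.splitlines():
        match = BARRIER_FAILURE.search(line)
        if match and match.group(1) not in devices:
            devices.append(match.group(1))
    return devices


if __name__ == "__main__":
    # Hypothetical kernel log excerpt; only the JBD line comes from the report.
    sample = "\n".join([
        "kjournald starting.  Commit interval 5 seconds",
        "JBD: barrier-based sync failed on sda1 - disabling barriers",
        "EXT3-fs: mounted filesystem with ordered data mode.",
    ])
    print(find_barrier_failures(sample))
```

An empty result for a file system mounted with barrier=1 would match what was observed on the 2.6.9-99 kernel. A non-empty result would match the 2.6.9-89 behaviour.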

Technical note updated.
Diffed Contents:
@@ -1 +1,3 @@
-The solution to this issue doesn't actually give us cache flushes, it just provides proper software ordering. So this will give us proper data integrity with devices that have no write caching (or write caching is disabled) and no command queuing. If you have command queuing or write cache enabled there is no guarantee of data integrity after a crash.
+Red Hat Enterprise Linux 4 kernel 2.6.9-67 does not enable barriers if it detects WCE (Write Cache Enabled) as being disabled. However, the PERC 6/i controller on Dell 2950 hardware improperly reports WCE as being disabled while still maintaining a write cache. This could have caused file system corruption in the event of a power outage due to data being maintained in the controller cache even though the kernel was told that data was being written through to disk.
+
+The solution to this issue provides proper software ordering, with the result that proper data integrity is achieved with devices that have no write caching (or for which write caching is disabled), and no command queuing. If command queuing or write caching is enabled, there is no guarantee of data integrity following a crash.
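
The guarantee stated in the final technical note can be written as a small predicate. This is a sketch of the note's wording only. The function and parameter names are hypothetical and do not correspond to any kernel interface.

```python
def ordering_guarantees_integrity(write_cache_enabled, command_queuing_enabled):
    """Return True only in the case where the note promises data integrity.

    The fix provides software ordering rather than cache flushes, so an
    enabled write cache or enabled command queuing voids the guarantee.
    """
    return not write_cache_enabled and not command_queuing_enabled


if __name__ == "__main__":
    # Print the full truth table for the two device settings.
    for write_cache in (False, True):
        for queuing in (False, True):
            result = ordering_guarantees_integrity(write_cache, queuing)
            print(write_cache, queuing, result)
```

Only the combination with both settings disabled yields a guarantee, which is why the report recommends running without write cache when there is no battery backup.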

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html