Bug 692746 - Hard shutdown of a machine with fs operations on LVM on top of hardware RAID caused fs corruption
Summary: Hard shutdown of a machine with fs operations on LVM on top of hardware RAID caused fs corruption
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks: 846704 961026
 
Reported: 2011-04-01 02:39 UTC by Igor Zhang
Modified: 2013-05-09 03:58 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-09 03:58:53 UTC
Target Upstream Version:


Attachments
dmesg from the first time fs corruption was found (101.24 KB, application/octet-stream)
2011-04-01 02:41 UTC, Igor Zhang
Test log from the first time fs corruption was found (4.59 KB, application/octet-stream)
2011-04-01 02:42 UTC, Igor Zhang
Test log from the second time fs corruption was found (56.19 KB, application/octet-stream)
2011-04-01 02:43 UTC, Igor Zhang

Description Igor Zhang 2011-04-01 02:39:57 UTC
Description of problem:
A hard shutdown of a machine with fs operations running on LVM on top of hardware RAID caused fs corruption.
While doing power failure testing for RHEL 6.1 (https://tcms.engineering.redhat.com/run/18635/?from_plan=1232), I found:
For the test scenario "nobarriers and local write cache off" with the workload "fs_mark -d /media/vol1/dir -d /media/vol2/dir -s 51200 -n 4096 -L 10 -r 8 -D 128", the test failed twice. Logs are attached.
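For reference, the failing scenario roughly amounts to the sequence below. This is only a sketch: the second volume name, the mount points, and the MegaCli LD/adapter indices are inferred from the test plan and the fs_mark paths, and may differ on other setups.

/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp -DisDskCache -L0 -a0   # turn off the controller's disk cache policy
mkfs.ext4 /dev/mapper/vg1-vol1
mkfs.ext4 /dev/mapper/vg1-vol2
mount -o nobarrier /dev/mapper/vg1-vol1 /media/vol1
mount -o nobarrier /dev/mapper/vg1-vol2 /media/vol2
fs_mark -d /media/vol1/dir -d /media/vol2/dir -s 51200 -n 4096 -L 10 -r 8 -D 128
# cut power to the machine while fs_mark is running, then boot and check the filesystems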

Version-Release number of selected component (if applicable):
RHEL6.1-20110311.3
Kernel 2.6.32-122.el6.x86_64

How reproducible:
Occasionally

Steps to Reproduce:
1. See https://tcms.engineering.redhat.com/run/18635/?from_plan=1232, case "power testing: fs build on LVM on hard RAID".

  
Actual results:
Filesystem corruption was found.

Expected results:
Filesystems remain consistent.

Additional info:

Comment 1 Igor Zhang 2011-04-01 02:41:44 UTC
Created attachment 489285 [details]
dmesg from the first time fs corruption was found.

Comment 3 Igor Zhang 2011-04-01 02:42:45 UTC
Created attachment 489286 [details]
Test log from the first time fs corruption was found.

Comment 4 Igor Zhang 2011-04-01 02:43:59 UTC
Created attachment 489287 [details]
Test log from the second time fs corruption was found.

Comment 5 Igor Zhang 2011-04-01 02:45:18 UTC
For the second failure of this case, I forgot to collect its dmesg. Sorry.

Comment 6 Ric Wheeler 2011-04-01 12:38:01 UTC
Hi Igor,

What hardware raid card do we have in this test?

Thanks!

Comment 7 Eric Sandeen 2011-04-01 14:30:08 UTC
When you say local write caches are off - are they off on both the raid card and the drive behind it?

Comment 8 Tom Coughlan 2011-04-01 15:14:52 UTC
The boot log in comment 1 shows: 

megaraid_sas 0000:07:00.0: irq 88 for MSI/MSI-X
megaraid_sas: fw state:c0000000
megasas: fwstate:c0000000, dis_OCR=0
scsi0 : LSI SAS based MegaRAID driver
scsi 0:0:10:0: Direct-Access     SEAGATE  ST9146802SS      0003 PQ: 0 ANSI: 5
scsi 0:0:11:0: Direct-Access     SEAGATE  ST9146802SS      0003 PQ: 0 ANSI: 5
scsi 0:0:12:0: Direct-Access     SEAGATE  ST9146802SS      0003 PQ: 0 ANSI: 5
scsi 0:0:13:0: Direct-Access     SEAGATE  ST9146802SS      0003 PQ: 0 ANSI: 5
scsi 0:2:0:0: Direct-Access     INTEL    RS2BL080         2.90 PQ: 0 ANSI: 5
...
sd 0:2:0:0: [sda] 1140621312 512-byte logical blocks: (583 GB/543 GiB)
sd 0:2:0:0: [sda] Write Protect is off
sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08
sd 0:2:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
 sda: sda1 sda2 sda3 sda4 < sda5 >
sd 0:2:0:0: [sda] Attached SCSI disk

The tcms test description referenced shows commands like:

/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp  -EnDskCache -L0 -a0
mkfs.ext4 /dev/mapper/vg1-vol1

and

/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp  -DisDskCache -L0 -a0
mkfs.ext4 /dev/mapper/vg1-vol1

Igor: as Eric said, these commands control the state of the cache on the RAID card. It is also necessary to determine the state of the volatile cache on the back-end disk drives. Please take a look at MegaCli64 and see if there is a way to determine this. 

Eric/Ric: Should we be doing some sort of re-scan of the sd device, to update the state in the o.s. block layer, after the MegaCli64 utility is used to change the state of the device's cache? Or is it adequate to just use mount -o barrier etc.?
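For reference, a few ways to query the cache state. This is hedged: the exact MegaCli option spelling varies between versions, and the LD/adapter indices and the 0:2:0:0 sd address are taken from the boot log above, so adjust as needed.

# Controller-side: report the disk cache policy (back-end drives) for the logical drive(s)
/opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -DskCache -LAll -aAll
# Controller-side: report the LD write-back/write-through cache policy
/opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -Cache -LAll -aAll
# OS-side view of the exported virtual drive's write cache, as sd reported it at attach time
sdparm --get=WCE /dev/sda
cat /sys/class/scsi_disk/0:2:0:0/cache_type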

Comment 9 Eric Sandeen 2011-04-01 15:33:01 UTC
Tom, well, extN and XFS will both give up on sending barriers once they fail; but the filesystems themselves don't care directly about write cache state, I >think<.

So from the fs perspective, I don't think we need a rescan, but maybe lower levels care?

Comment 10 Ric Wheeler 2011-04-01 15:58:18 UTC
I think that we toggle barrier behavior correctly when the WCE bit changes for whatever reason. Christoph, is that correct?

Comment 11 Tom Coughlan 2011-04-01 16:25:21 UTC
(In reply to comment #10)
> I think that we toggle barrier behavior correctly when the WCE bit changes for
> whatever reason. Christoph, is that correct?

No. Christoph and Mike S. discussed this on IRC a bit. If the device's cache state changes (like if the battery dies, or someone uses an out-of-band utility to change it), the device should return a Unit Attention to the o.s. (which UA is an interesting question, since there is not one I know of for this specific event. Probably just Parameters Changed...). Linux currently ignores these UAs. (I thought that UA handling was proposed for the LSF agenda, but I don't see it there now...)

I do not know to what extent it matters, if the FC is explicit about barrier on/off when it is mounted. I'm sure Christoph can help with that.

Comment 12 Tom Coughlan 2011-04-01 16:27:18 UTC
I mean "...if the FS is explicit..."

Comment 13 Ric Wheeler 2011-04-01 16:36:43 UTC
It would be nice for us to do something with those.

Absent that support, I suppose that user space needs to monitor and remount (which is certainly not the best way to handle this).
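A minimal sketch of that monitor-and-remount approach, purely illustrative and not shipped tooling: it assumes the sd address from the boot log in comment 8 and the mount points from the original test, and simply re-enables barriers whenever the write cache no longer looks disabled.

#!/bin/bash
# Sketch of the "monitor and remount" workaround described above (not part of any
# shipped tooling). Device path and mount points are assumptions from this test.
DISK=/sys/class/scsi_disk/0:2:0:0/cache_type
while sleep 60; do
    state=$(cat "$DISK")
    if [ "$state" != "write through" ]; then
        # Write cache appears to have been (re)enabled out of band: make sure the
        # filesystems are mounted with barriers so ext4 issues cache flushes.
        mount -o remount,barrier=1 /media/vol1
        mount -o remount,barrier=1 /media/vol2
    fi
done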

Comment 14 RHEL Program Management 2011-04-04 02:42:30 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 15 Igor Zhang 2011-04-07 01:48:39 UTC
(In reply to comment #6)
> Hi Igor,
> 
> What hardware raid card do we have in this test?
> 
> Thanks!

One RAID 1 volume built from two SEAGATE ST9146802SS 145 GB drives.
It's an LSI SAS based MegaRAID controller.

Comment 16 Igor Zhang 2011-04-07 01:52:18 UTC
(In reply to comment #7)
> When you say local write caches are off - are they off on both the raid card
> and the drive behind it?

I only set the RAID card cache and didn't touch the drives.

Comment 17 RHEL Program Management 2011-10-07 15:28:31 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 18 RHEL Program Management 2012-12-14 07:41:12 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

