Bug 678359
| Field | Value |
|---|---|
| Summary | online disk resizing may cause data corruption |
| Product | Red Hat Enterprise Linux 5 |
| Component | kernel |
| Version | 5.6 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Target Milestone | rc |
| Reporter | Jeff Moyer <jmoyer> |
| Assignee | Jeff Moyer <jmoyer> |
| QA Contact | Eryu Guan <eguan> |
| CC | eguan, qcai |
| Doc Type | Bug Fix |
| Clone Of | 678357 |
| Last Closed | 2011-07-21 09:22:56 UTC |
| Attachments | test WIP (attachment 479974) |
Description (Jeff Moyer, 2011-02-17 17:10:57 UTC)
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Here's the email I got from Neil Brown outlining how he was able to reproduce the file system corruption. I was unable to make things break using this. I'll attach a script that I was writing for this purpose as well, though that hasn't triggered corruption for me either.

This script, using the mdadm from git://neil.brown.name/mdadm devel-3.2, triggers it quite reliably for me. I haven't tried to reproduce with native metadata; that does take a slightly different code path, so it shouldn't be too hard. The important steps are:

1/ create a smallish array (so reshape only takes a few seconds)
2/ mkfs; mount; copy some files. This leaves you with some dirty data in RAM.
3/ reshape the array. When the reshape finishes it changes the size of the device. flush_disk will then try to prune dentries, which makes the inodes dirty, and then will invalidate those inodes.
4/ unmount
5/ fsck - to discover the corruption.

NeilBrown

```
export IMSM_NO_PLATFORM=1
export IMSM_DEVNAME_AS_SERIAL=1
export MDADM_EXPERIMENTAL=1

umount /mnt/vol
mdadm -Ss
rm -f /backup.bak

#create container
mdadm -C /dev/md/imsm0 -amd -e imsm -n 3 /dev/sda /dev/sdb /dev/sdc -R

#create volume
mdadm -C /dev/md/raid5vol_0 -amd -l 5 --chunk 64 --size 104857 -n 3 /dev/sda /dev/sdb /dev/sdc -R

mkfs /dev/md/raid5vol_0
mount /dev/md/raid5vol_0 /mnt/vol

#copy some files from current directory
cp * /mnt/vol

#add spare
mdadm --add /dev/md/imsm0 /dev/sdd
mdadm --wait /dev/md/raid5vol_0

#start reshape
mdadm --grow /dev/md/imsm0 --raid-devices 4 --backup-file=/backup.bak
#mdadm --wait /dev/md/raid5vol_0
sleep 10
while grep reshape /proc/mdstat > /dev/null
do
    sleep 1
done
while ps axgu | grep 'md[a]dm' > /dev/null
do
    sleep 1
done

umount /mnt/vol
fsck -f -n /dev/md/raid5vol_0
```

Created attachment 479974 [details]: test WIP
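The key event in step 3 above is the block device changing size underneath a still-mounted filesystem: flush_disk then prunes dentries, which dirties their inodes, and then invalidates those inodes. As a small illustrative sketch (not part of the attached reproducer), something like the following can confirm the size change while the grow runs; the device name is the one used in the script above, and `blockdev --getsz` reports the size in 512-byte sectors:

```
#!/bin/bash
# Sketch only: watch the md device size across the reshape.  Assumes the
# volume from the reproducer above is assembled at /dev/md/raid5vol_0.
DEV=/dev/md/raid5vol_0

# Size in 512-byte sectors before the grow is started.
before=$(blockdev --getsz "$DEV")
echo "size before reshape: $before sectors"

# Wait for the reshape to disappear from /proc/mdstat, as the reproducer does.
while grep reshape /proc/mdstat > /dev/null
do
    sleep 1
done

after=$(blockdev --getsz "$DEV")
echo "size after reshape:  $after sectors"
```

A size change here is the trigger for the flush_disk path described above; the corruption itself comes from the dirtied inodes being invalidated before they are written back.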
I tried both loop devices (as they will be easier to automate) and several partitions on the same physical device. I have not yet tried using 4 separate devices.
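For reference, here is a rough loop-device setup that could stand in for /dev/sd[a-d] when automating the reproducer; the backing-file paths, sizes, and loop device numbers are illustrative assumptions, not taken from the attached test script:

```
#!/bin/bash
# Sketch only: build four loop devices to use in place of real disks.
mkdir -p /var/tmp/md-resize-test

for i in 0 1 2 3
do
    # 200 MB backing file per fake disk (size chosen arbitrarily, but small
    # enough that the reshape only takes a few seconds).
    dd if=/dev/zero of=/var/tmp/md-resize-test/disk$i bs=1M count=200
    losetup /dev/loop$i /var/tmp/md-resize-test/disk$i
done

# The reproducer can then be pointed at /dev/loop0../dev/loop3 instead of
# /dev/sda../dev/sdd.  Tear down afterwards with:
#   for i in 0 1 2 3; do losetup -d /dev/loop$i; done
```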
I was under the impression that the test was trying to perform 16 separate streaming reads from different areas of the disk. From the blktrace data, it appears that all 16 threads are reading the same locations on disk. Is this expected or not? The reads are coming from scsi_id. I'm not sure what is running scsi_id, but we should try to track that down and stop it so it doesn't interfere with the results.

(In reply to comment #7) Ignore that update... wrong bug! ;-)

Patch(es) available in kernel-2.6.18-250.el5. Detailed testing feedback is always welcomed.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html
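As a hedged sketch of how the fix might be confirmed on a test box (the kernel version string comes from the comment above; the device name matches the reproducer, and nothing here is taken from the errata itself):

```
#!/bin/bash
# Sketch only: confirm a kernel at or past the fix is booted, then check the
# filesystem again after repeating the grow/umount sequence.
echo "running kernel: $(uname -r)   (fix landed in kernel-2.6.18-250.el5)"

# On the fixed kernel, a forced read-only fsck after the reshape should come
# back clean (exit status 0).
if fsck -f -n /dev/md/raid5vol_0
then
    echo "no corruption detected after reshape"
else
    echo "fsck reported errors; corruption still present"
fi
```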