Bug 808189 - split off raid images don't appear to contain proper mirrored data from the primary leg
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64 Linux
Priority: low    Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
 
Reported: 2012-03-29 14:51 EDT by Corey Marthaler
Modified: 2012-04-17 09:30 EDT
CC: 10 users

Doc Type: Bug Fix
Last Closed: 2012-04-17 09:30:10 EDT
Description Corey Marthaler 2012-03-29 14:51:40 EDT
Description of problem:
This looks like a major data corruption issue. I was writing an fs data verification test case for raid split images with tracking enabled and noticed that none of the files I had put on the raid appeared on the split-off raid image.

SCENARIO - [split_w_tracking_io_merge]
Create a 3-way raid1 with fs data, verify data, split image with tracking, change data on raid vol, merge split image data back, verify origin data
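(The tracked-split and merge steps in this scenario correspond roughly to the following lvconvert calls; a sketch only, using the VG/LV names from the log below, since the test log does not print these commands verbatim.)

    lvconvert --splitmirrors 1 --trackchanges split_image/split_tracking
    # ... change data on the origin volume ...
    lvconvert --merge split_image/split_tracking_rimage_2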
taft-01: lvcreate --type raid1 -m 2 -n split_tracking -L 500M split_image
Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 63.55% )
   1/1 mirror(s) are fully synced: ( 100.00% )

Placing an ext filesystem on raid1 volume
mke2fs 1.41.12 (17-May-2010)
Mounting raid1 volume

Writing files to /mnt/split_tracking
checkit starting with:
CREATE
Num files:          500
Random Seed:        31292
Verify XIOR Stream: /tmp/split_tracking.1376
Working dir:        /mnt/split_tracking

Checking files on /mnt/split_tracking
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/split_tracking.1376
Working dir:        /mnt/split_tracking


splitting off leg from raid with tracking...

+++ Mounting and verifying split image data +++
mount: block device /dev/mapper/split_image-split_tracking_rimage_2 is write-protected, mounting read-only
Checking files on /mnt/split_tracking_rimage_2
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/split_tracking.1376
Working dir:        /mnt/split_tracking_rimage_2
Can not stat hqhywtvurjlwncpubifhmgfcnchenfswxavvw: No such file or directory
checkit verify failed

# HERE THERE WERE NO FILES IN THAT MOUNTED RAID IMAGE

So I then attempted to boil this down a bit and found the following:

[root@taft-01 ~]# pvscan
  PV /dev/sdb1   VG VG          lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdc1   VG VG          lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdd1   VG VG          lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sde1   VG VG          lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdf1   VG VG          lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdg1   VG VG          lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sdh1   VG VG          lvm2 [135.66 GiB / 135.66 GiB free]
  PV /dev/sda2   VG vg_taft01   lvm2 [67.75 GiB / 0    free]
  Total: 8 [1017.40 GiB] / in use: 8 [1017.40 GiB] / in no VG: 0 [0   ]

[root@taft-01 ~]# lvcreate -m 2 --type raid1 -L 500M -n raid VG
  Logical volume "raid" created

[root@taft-01 ~]# lvs -a -o +devices
  LV              VG  Attr     LSize   Copy%  Devices
  raid            VG  rwi-a-m- 500.00m  18.40 raid_rimage_0(0),raid_rimage_1(0),raid_rimage_2(0)
  [raid_rimage_0] VG  Iwi-aor- 500.00m        /dev/sdb1(1)
  [raid_rimage_1] VG  Iwi-aor- 500.00m        /dev/sdc1(1)
  [raid_rimage_2] VG  Iwi-aor- 500.00m        /dev/sdd1(1)
  [raid_rmeta_0]  VG  ewi-aor-   4.00m        /dev/sdb1(0)
  [raid_rmeta_1]  VG  ewi-aor-   4.00m        /dev/sdc1(0)
  [raid_rmeta_2]  VG  ewi-aor-   4.00m        /dev/sdd1(0)

[root@taft-01 ~]# lvs -a -o +devices
  LV              VG  Attr     LSize   Copy%  Devices
  raid            VG  rwi-a-m- 500.00m 100.00 raid_rimage_0(0),raid_rimage_1(0),raid_rimage_2(0)
  [raid_rimage_0] VG  iwi-aor- 500.00m        /dev/sdb1(1)
  [raid_rimage_1] VG  iwi-aor- 500.00m        /dev/sdc1(1)
  [raid_rimage_2] VG  iwi-aor- 500.00m        /dev/sdd1(1)
  [raid_rmeta_0]  VG  ewi-aor-   4.00m        /dev/sdb1(0)
  [raid_rmeta_1]  VG  ewi-aor-   4.00m        /dev/sdc1(0)
  [raid_rmeta_2]  VG  ewi-aor-   4.00m        /dev/sdd1(0)

[root@taft-01 ~]# mkfs /dev/VG/raid 
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
128016 inodes, 512000 blocks
25600 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67633152
63 block groups
8192 blocks per group, 8192 fragments per group
2032 inodes per group
Superblock backups stored on blocks: 
        8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@taft-01 ~]# mkdir /mnt/raid /mnt/raid1 /mnt/raid2
[root@taft-01 ~]# mount /dev/VG/raid /mnt/raid
[root@taft-01 ~]# touch /mnt/raid/foobar
[root@taft-01 ~]# mount
/dev/mapper/VG-raid on /mnt/raid type ext2 (rw)

[root@taft-01 ~]# ls /mnt/raid
foobar  lost+found

[root@taft-01 ~]# lvconvert --splitmirrors 1 --name new1 VG/raid
[root@taft-01 ~]# lvconvert --splitmirrors 1 --name new2 VG/raid

[root@taft-01 ~]# mount /dev/VG/new1 /mnt/raid1
[root@taft-01 ~]# mount /dev/VG/new2 /mnt/raid2

[root@taft-01 ~]# ls /mnt/raid1
ls: cannot access /mnt/raid1/foobar: Input/output error
foobar  lost+found
[root@taft-01 ~]# ls /mnt/raid2
ls: cannot access /mnt/raid2/foobar: Input/output error
foobar  lost+found

### I/O ERRORS !?!?

[root@taft-01 ~]# lvs -a -o +devices
  LV      VG  Attr     LSize   Copy%  Devices
  new1    VG  -wi-ao-- 500.00m        /dev/sdd1(1)
  new2    VG  -wi-ao-- 500.00m        /dev/sdc1(1)
  raid    VG  -wi-ao-- 500.00m        /dev/sdb1(1)

### I then removed everything and tried again:

[root@taft-01 ~]# lvs -a -o +devices
  LV              VG  Attr     LSize   Copy%  Devices
  raid            VG  rwi-a-m- 500.00m 100.00 raid_rimage_0(0),raid_rimage_1(0),raid_rimage_2(0)
  [raid_rimage_0] VG  iwi-aor- 500.00m        /dev/sdb1(1)
  [raid_rimage_1] VG  iwi-aor- 500.00m        /dev/sdc1(1)
  [raid_rimage_2] VG  iwi-aor- 500.00m        /dev/sdd1(1)
  [raid_rmeta_0]  VG  ewi-aor-   4.00m        /dev/sdb1(0)
  [raid_rmeta_1]  VG  ewi-aor-   4.00m        /dev/sdc1(0)
  [raid_rmeta_2]  VG  ewi-aor-   4.00m        /dev/sdd1(0)

[root@taft-01 ~]# mkfs /dev/VG/raid 
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
128016 inodes, 512000 blocks
25600 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67633152
63 block groups
8192 blocks per group, 8192 fragments per group
2032 inodes per group
Superblock backups stored on blocks: 
        8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@taft-01 ~]# mount /dev/VG/raid /mnt/raid
[root@taft-01 ~]# touch /mnt/raid/foobar
[root@taft-01 ~]# touch /mnt/raid/glarch
[root@taft-01 ~]# echo "11111111" > /mnt/raid/foobar
[root@taft-01 ~]# echo "22222222" > /mnt/raid/glarch
[root@taft-01 ~]# cat /mnt/raid/foobar
11111111
[root@taft-01 ~]# cat /mnt/raid/glarch
22222222

[root@taft-01 ~]# lvconvert --splitmirrors 1 --name new1 VG/raid 
[root@taft-01 ~]# lvconvert --splitmirrors 1 --name new2 VG/raid 
[root@taft-01 ~]# lvs -a -o +devices
  LV      VG  Attr     LSize   Copy%  Devices
  new1    VG  -wi-a--- 500.00m        /dev/sdd1(1)
  new2    VG  -wi-a--- 500.00m        /dev/sdc1(1)
  raid    VG  -wi-ao-- 500.00m        /dev/sdb1(1)

[root@taft-01 ~]# mount /dev/VG/new1 /mnt/raid1
[root@taft-01 ~]# mount /dev/VG/new2 /mnt/raid2

[root@taft-01 ~]# mount
/dev/mapper/VG-raid on /mnt/raid type ext2 (rw)
/dev/mapper/VG-new1 on /mnt/raid1 type ext2 (rw)
/dev/mapper/VG-new2 on /mnt/raid2 type ext2 (rw)


[root@taft-01 ~]# ls -l /mnt/raid1
total 15
-rw-r--r--. 1 root root     9 Mar 29 11:52 foobar
-rw-r--r--. 1 root root     0 Mar 29 11:51 glarch
drwx------. 2 root root 12288 Mar 29 11:50 lost+found
[root@taft-01 ~]# ls -l /mnt/raid2
total 15
-rw-r--r--. 1 root root     9 Mar 29 11:52 foobar
-rw-r--r--. 1 root root     0 Mar 29 11:51 glarch
drwx------. 2 root root 12288 Mar 29 11:50 lost+found
[root@taft-01 ~]# cat /mnt/raid1/foobar
11111111
[root@taft-01 ~]# cat /mnt/raid2/foobar
11111111
[root@taft-01 ~]# cat /mnt/raid1/glarch
[root@taft-01 ~]# cat /mnt/raid2/glarch

### WHERE'S THE DATA ON THAT 2ND FILE !?!


Version-Release number of selected component (if applicable):
2.6.32-251.el6.x86_64
lvm2-2.02.95-2.el6    BUILT: Fri Mar 16 08:39:54 CDT 2012
lvm2-libs-2.02.95-2.el6    BUILT: Fri Mar 16 08:39:54 CDT 2012
lvm2-cluster-2.02.95-2.el6    BUILT: Fri Mar 16 08:39:54 CDT 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.74-2.el6    BUILT: Fri Mar 16 08:39:54 CDT 2012
device-mapper-libs-1.02.74-2.el6    BUILT: Fri Mar 16 08:39:54 CDT 2012
device-mapper-event-1.02.74-2.el6    BUILT: Fri Mar 16 08:39:54 CDT 2012
device-mapper-event-libs-1.02.74-2.el6    BUILT: Fri Mar 16 08:39:54 CDT 2012
cmirror-2.02.95-2.el6    BUILT: Fri Mar 16 08:39:54 CDT 2012
Comment 1 Corey Marthaler 2012-03-29 16:59:11 EDT
Looks like this may just be a buffering issue and that the data just takes a while to get to disk. Issuing a sync before the splitmirrors seems to solve the missing data problem.

I don't remember having to issue syncs, though, when doing lvm mirror I/O verification.
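(In other words, a sequence along these lines appears to produce consistent split images; just a sketch using the names from the reproducer above, not the exact test commands.)

    sync
    lvconvert --splitmirrors 1 --name new1 VG/raid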
Comment 3 Jonathan Earl Brassow 2012-04-17 09:30:10 EDT
I failed to check earlier, but the 'mirror' segment type also has this limitation.  You must call 'sync' (or similar) before splitting off the mirror leg.  It is expected that the file system is caching data and that we need to flush it.

A feature request could be created to make 'splitmirror' behave more like snapshots, which also flush the file system when a suspend is issued.  For now, though, this is not a bug.
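(Until then, flushing or freezing the file system by hand before the split gives the same guarantee a snapshot's suspend provides. A hedged sketch, using the VG/raid names from the reproducer; fsfreeze is only an option where util-linux provides it.)

    fsfreeze -f /mnt/raid        # flush dirty data and block new writes, as a suspend would
    lvconvert --splitmirrors 1 --name new1 VG/raid
    fsfreeze -u /mnt/raid        # resume writes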
