Bug 598135

Summary: LVM detection of MD volumes is broken
Product: [Fedora] Fedora
Reporter: Mikuláš Patočka <mpatocka>
Component: lvm2
Assignee: Alasdair Kergon <agk>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: agk, bmarzins, bmr, dwysocha, heinzm, jonathan, lvm-team, mbroz, msnitzer, prajnoha, prockai
Hardware: All
OS: Linux
Fixed In Version: 2.02.67
Doc Type: Bug Fix
Last Closed: 2010-07-01 19:56:23 UTC

Description Mikuláš Patočka 2010-05-31 14:20:43 UTC
LVM should detect MD component devices and not treat them as LVM physical volumes if "md_component_detection = 1" is set (which is the default). However, this detection is bypassed when a given device is already stored in the cache in /etc/lvm/cache/.cache. Thus, LVM can access the underlying MD component devices as its physical volumes.
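
To see whether a device is affected, it helps to check whether it is already listed in the cache file. The following is a minimal sketch, not part of LVM: in_lvm_cache is a hypothetical helper, and it assumes the cache file lists devices as quoted paths, one per line. The demo runs against a sample file rather than the real cache.

```shell
#!/bin/sh
# Hypothetical helper: report whether a device path is listed in LVM's
# persistent filter cache. The default path is the one named in this report.
in_lvm_cache() {
    dev=$1
    cache=${2:-/etc/lvm/cache/.cache}
    # Assumption: cache entries are quoted device paths, one per line.
    [ -r "$cache" ] && grep -qF "\"$dev\"" "$cache"
}

# Demo against a sample file, so nothing on the real system is touched.
sample=$(mktemp)
cat > "$sample" <<'EOF'
persistent_filter_cache {
	valid_devices=[
		"/dev/sdc1",
		"/dev/md1"
	]
}
EOF

for d in /dev/sdc1 /dev/sdd1; do
    if in_lvm_cache "$d" "$sample"; then
        echo "$d: cached"
    else
        echo "$d: not cached"
    fi
done
rm -f "$sample"
```

On a real system the second argument can be omitted, in which case the helper reads /etc/lvm/cache/.cache.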

This can even lead to data corruption. For example, suppose the admin creates a PV and VG on an MD-RAID0 device. If the MD-RAID0 device is then deactivated for some reason (for example, an unplugged cable on one of the disks), LVM treats the first RAID0 leg as its physical volume and allows accesses to it. But the sector numbers on a RAID0 leg don't match the sector numbers on the combined RAID0 device (where the VG was originally created), so the situation can result in data corruption.
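
The mismatch in sector numbers can be illustrated with a little arithmetic. The sketch below uses a simplified model of a two-disk RAID0 with a 256 KiB chunk: it assumes equal-sized legs and data starting at sector 0 of each leg, and the function names raid0_map and raid0_unmap are made up for this example.

```shell
#!/bin/sh
# Simplified RAID0 model: equal-sized legs, data starting at sector 0 of
# each leg (an assumption for illustration, not read from a real array).
CHUNK=512   # 256 KiB chunk expressed in 512-byte sectors
DISKS=2

# Map an array sector to "<leg> <sector on that leg>".
raid0_map() {
    sector=$1
    chunk_index=$((sector / CHUNK))
    within=$((sector % CHUNK))
    leg=$((chunk_index % DISKS))
    leg_sector=$(( (chunk_index / DISKS) * CHUNK + within ))
    echo "$leg $leg_sector"
}

# Inverse: which array sector is stored at a given sector of a given leg?
raid0_unmap() {
    leg=$1
    leg_sector=$2
    stripe=$((leg_sector / CHUNK))
    within=$((leg_sector % CHUNK))
    echo $(( (stripe * DISKS + leg) * CHUNK + within ))
}

raid0_map 1024       # prints "0 512": array sector 1024 lives at leg 0, sector 512
raid0_unmap 0 1024   # prints "2048": leg 0, sector 1024 holds array sector 2048
```

In this model, a write that LVM aims at sector 1024 of the volume, but issues directly to the first leg, lands on what the array considers sector 2048.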

A script that triggers this behavior:

# Cleanup from previous script runs
modprobe dm-mod
vgchange -an
mdadm -S /dev/md1
# The next line adds /dev/sdc1 to lvm cache. This PV is ignored and overwritten
# later.
# Alternatively, you can just add /dev/sdc1 to the cache manually.
pvcreate -ff /dev/sdc1
# Create the striped volume.
mdadm -C /dev/md1 -l 0 -c 256 -n 2 /dev/sdc1 /dev/sdd1
# Make a PV and VG on this volume
pvcreate -ff /dev/md1
vgcreate vg /dev/md1
# Make a LV and deactivate it
lvcreate -L 4M -n lv vg
lvchange -an vg/lv
# Stop the MD volume
mdadm -S /dev/md1
# Now, these three commands show that LVM treats /dev/sdc1 as a physical volume
# and reads VG metadata from it (it shouldn't because it has MD superblock).
# The sector offsets in /dev/sdc1 don't match the offsets in /dev/md1, thus
# if someone tried to manipulate VG in this state, it could cause data
# corruption.
pvs
vgs
lvs
# Here, LVM thinks that /dev/sdc1 is the underlying physical volume, not
# /dev/md1, thus this command overwrites random sectors of the original volume
# group.
lvcreate -L 4M -n lv2 vg

Comment 1 Alasdair Kergon 2010-06-01 12:00:37 UTC
1) mdadm should detect the PV and either refuse to create the array or wipe the PV signature;

2) lvm should detect the md signature more frequently - it can't rely on the cached result of an earlier check.  Would it eliminate races if we found a way to remove /dev/sdc1 from the cache when adding /dev/md1 to it?
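
The idea in point 2 of dropping a component from the cache once the array built on it is added can be modelled roughly as follows. This is a toy sketch only and makes no claim about how LVM's persistent filter works: prune_components is a hypothetical helper, the one-path-per-line list is an invented stand-in for the cache, and the component directory is assumed to look like /sys/block/md1/slaves (one entry per component device).

```shell
#!/bin/sh
# Toy model only: this is not how LVM's persistent filter is implemented.
# prune_components LIST SLAVES_DIR
#   LIST        file with one cached device path per line (hypothetical format)
#   SLAVES_DIR  directory with one entry per component, like /sys/block/md1/slaves
prune_components() {
    list=$1
    slaves=$2
    while IFS= read -r dev; do
        # Keep the entry unless its basename appears among the components.
        if [ ! -e "$slaves/$(basename "$dev")" ]; then
            echo "$dev"
        fi
    done < "$list"
}

# Demo with stand-in files instead of the real cache and sysfs.
tmp=$(mktemp -d)
mkdir "$tmp/slaves"
touch "$tmp/slaves/sdc1" "$tmp/slaves/sdd1"
printf '%s\n' /dev/sdc1 /dev/sde1 /dev/md1 > "$tmp/list"
prune_components "$tmp/list" "$tmp/slaves"   # prints /dev/sde1 and /dev/md1
rm -rf "$tmp"
```

A model like this only helps while the array is assembled and its components are visible; it does not by itself address the races mentioned above.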

Comment 2 Alasdair Kergon 2010-06-01 12:09:08 UTC
(If all else fails we'd need to add a 'is_md_device' flag to the PV header.)

Comment 3 Alasdair Kergon 2010-06-01 19:03:46 UTC
Doesn't address all the races, but try http://sourceware.org/cgi-bin/cvsweb.cgi/LVM2/lib/filters/filter-persistent.c.diff?cvsroot=lvm2&r1=1.41&r2=1.42

Comment 4 Mikuláš Patočka 2010-06-02 03:11:10 UTC
The patch works ... but I don't know what it does :)