Bug 176623

Summary: lvm2 - pvcreate refuses to work
Product: [Fedora] Fedora
Reporter: Michal Jaegermann <michal>
Component: lvm2
Assignee: Alasdair Kergon <agk>
Status: CLOSED CANTFIX
Severity: medium
Priority: medium
Version: rawhide
CC: dwysocha, mbroz
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Last Closed: 2006-10-01 19:15:03 UTC

Description Michal Jaegermann 2005-12-27 20:49:17 UTC
Description of problem:

On an unused disk I am trying to run

   pvcreate -M2 /dev/sdb1

This invariably fails with "Can't open /dev/sdb1 exclusively.  Mounted
filesystem?"  The filesystem is not mounted, SELinux is turned off, and lsof
and fuser do not show any users, but pvcreate still fails.  This happens even
when booted single-user.  In strace one can see the following:

........
stat("/dev/sdb1", {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 17), ...}) = 0
stat("/dev/sdb1", {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 17), ...}) = 0
stat("/dev/sdb1", {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 17), ...}) = 0
open("/dev/sdb1", O_RDWR|O_DIRECT|O_NOATIME) = 4
fstat(4, {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 17), ...}) = 0
ioctl(4, BLKBSZGET, 0x688020)           = 0
lseek(4, 0, SEEK_SET)                   = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
close(4)                                = 0
stat("/dev/sdb1", {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 17), ...}) = 0
stat("/dev/sdb1", {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 17), ...}) = 0
stat("/dev/sdb1", {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 17), ...}) = 0
open("/dev/sdb1", O_RDWR|O_EXCL|O_DIRECT|O_NOATIME) = -1 EBUSY (Device or resource busy)
........

What gives?  A change in the meaning of the O_EXCL flag?  A bug somewhere else?

If I instead run 'pvcreate -M 2 /dev/sdb', the results are even stranger.
I see "Device /dev/sdb not found." and strace shows

........
open("/dev/sdb", O_RDWR|O_DIRECT|O_NOATIME) = 4
fstat(4, {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 16), ...}) = 0
ioctl(4, BLKBSZGET, 0x68a990)           = 0
lseek(4, 0, SEEK_SET)                   = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
close(4)                                = 0
write(2, "  ", 2)                       = 2
write(2, "Device /dev/sdb not found.", 26) = 26
........

Huh?

Version-Release number of selected component (if applicable):
lvm2-2.02.01-1.1

How reproducible:
always

Comment 1 Michal Jaegermann 2005-12-27 20:58:57 UTC
I should add that if I erase the partition table on /dev/sdb, then
'pvcreate -M 2 /dev/sdb' results in "Can't open /dev/sdb exclusively...."
instead.

Comment 2 Alasdair Kergon 2006-02-01 16:39:30 UTC
Two separate issues here.

Firstly, the "Device not found" message is confusing: the lowest device library
layer in lvm2 has determined that it must not use that device (because it is
partitioned) so hides it from the upper layers.
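
As a rough illustration of the kind of check involved, the Python sketch below tests whether a first sector carries the 0x55 0xAA boot signature of an MBR partition table. This is only an assumption about how "partitioned" might be detected; lvm2's real test may look at more than this, and `has_mbr_signature` and `read_first_sector` are hypothetical helpers, not lvm2 functions.

```python
def has_mbr_signature(sector: bytes) -> bool:
    """Return True if the 512-byte sector ends with the MBR boot signature."""
    # An MBR stores the bytes 0x55 0xAA at offsets 510 and 511.
    return len(sector) >= 512 and sector[510:512] == b"\x55\xaa"


def read_first_sector(path: str) -> bytes:
    """Read the first 512 bytes of a device node or image file.

    Reading a /dev node normally requires root.
    """
    with open(path, "rb") as dev:
        return dev.read(512)
```

For example, `has_mbr_signature(read_first_sector("/dev/sdb"))` would report whether such a signature is present on the disk. Note that the zero bytes at the start of the read() calls in the strace output say nothing about offsets 510 and 511.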

Secondly, the O_EXCL failure implies something on your system is already using
that device.  If you can't see what, step through the boot/startup process
manually and keep checking to see at what point things go wrong.
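
For reference, the Linux open(2) manual page documents that O_EXCL without O_CREAT on a block device fails with EBUSY when the device is in use by the system. "In use" is not limited to mounted filesystems; a claim held inside the kernel would not show up in lsof or fuser. The check can be repeated at each boot stage with a small Python sketch such as the one below, where `probe_exclusive` is a hypothetical helper. It opens read-only, so it does not write to the device.

```python
import errno
import os


def probe_exclusive(path: str):
    """Try to open path with O_EXCL.

    Returns None if the open succeeded, otherwise the errno name
    (for example "EBUSY" or "ENOENT").
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_EXCL)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None
```

Run as root, `probe_exclusive("/dev/sdb")` returning "EBUSY" would correspond to the failure pvcreate reports.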

Comment 3 Michal Jaegermann 2006-02-01 17:08:57 UTC
> has determined that it must not use that device (because it is partitioned)

Well, no.  Originally that device was not partitioned, but 'system-config-lvm'
strongly advised creating a partition, and it put in one with id 0x8e covering
the whole disk.  At first I tried to skip that step, but ended up with the
same complaint from pvcreate, albeit about /dev/sdb and not /dev/sdb1.

> Secondly, the O_EXCL implies something on your system is already using that
> device.

This disk is not mounted and, at this moment, not partitioned.  Neither lsof
nor fuser reports anything.  The only thing I can think of that might
look at that disk is hal/dbus.  Doing '/etc/init.d/haldaemon stop'
does not change anything.  Going to runlevel 3 and stopping both hald and dbus
also does not help.

Comment 4 Alasdair Kergon 2006-02-01 18:01:40 UTC
For the first point, note that you often have to reboot after changing the
partition table before changes take effect in the kernel - but lvm2 works on the
basis of what it sees in the on-disk partition table.  So if you remove a
partition table but don't reboot, lvm2 might think that 'sdb' should be used -
but the partitions are still present in the kernel and lvm2 will get the O_EXCL
error until you reboot.
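
To see which partitions the kernel currently knows about, independently of what is on disk, one can inspect /proc/partitions. The Python sketch below is a minimal parser for that file; `kernel_partitions` is a hypothetical helper, and it assumes the simple "disk name plus digits" naming used by sd devices.

```python
def kernel_partitions(proc_partitions_text: str, disk: str) -> list:
    """Return the partitions of `disk` listed in /proc/partitions text."""
    names = []
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data lines have four fields: major, minor, #blocks, name.
        # The header line and blank lines are skipped here.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        name = fields[3]
        # Simplification: a partition is the disk name followed by digits
        # (sdb1, sdb2, ...). The whole-disk entry itself is not included.
        if name.startswith(disk) and name[len(disk):].isdigit():
            names.append(name)
    return names
```

Calling `kernel_partitions(open("/proc/partitions").read(), "sdb")` after wiping the partition table shows whether the kernel still holds stale entries such as sdb1.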

If you see O_EXCL after a reboot, then, as I said, track down what's causing it
by booting to single user mode, checking it's not in use yet, then starting
things up one at a time.

Comment 5 Michal Jaegermann 2006-02-02 20:39:25 UTC
> For the first point, note that you often have to reboot

Sigh!  To quote myself: "Originally that device was not partitioned...".
That partition table was written by 'system-config-lvm' when I tried to see
whether this would make it happier after the initial failures.

In the meantime this partition table has been removed and recreated a few
times, and the test machine was rebooted many times.  It really does not matter.

> track down what's causing it by booting to single user mode ...

After booting into single-user mode and immediately running

    pvcreate -M2 /dev/sdb

I see the same "Mounted filesystem?" failure and results from strace which
do not substantially differ from what was quoted above.  BTW - there is no
/dev/sdb1 partition at this moment.


Comment 6 Michal Jaegermann 2006-03-14 18:59:34 UTC
'nash' on initrd does the following (in this particular setup, and among
other things):
....
insmod /lib/dm-snapshot.ko
mkblkdevs
rmparts sdb
dm create pdc_cjfeejidea 0 488397056 linear 8:16 0
dm partadd pdc_cjfeejidea
rmparts sdc
dm create pdc_cjhbfdhhaa 0 488397056 linear 8:32 0
dm partadd pdc_cjhbfdhhaa
....

Is this what makes /dev/sdb and /dev/sdc "busy"?  In that case,
how is one supposed to add new disks to lvm, short of manually hacking a
special initrd that does not touch specific devices?  It seems like
a chicken-and-egg problem.
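
One way to test this hypothesis on a running system is to look for device-mapper tables that reference the disk by major:minor (8:16 for /dev/sdb, per the strace output earlier). The Python sketch below scans text in the "name: start length target args" form that `dmsetup table` prints; `maps_using_device` is a hypothetical helper, and it assumes the underlying device appears as a major:minor token rather than as a path.

```python
def maps_using_device(dmsetup_table_text: str, major: int, minor: int) -> list:
    """Return names of dm maps whose table lines mention major:minor."""
    wanted = f"{major}:{minor}"
    users = []
    for line in dmsetup_table_text.splitlines():
        # Each table line is expected to look like "name: start len target args".
        name, sep, table = line.partition(": ")
        if not sep:
            continue  # e.g. a "No devices found" message
        if wanted in table.split() and name not in users:
            users.append(name)
    return users
```

Fed with the output of `dmsetup table`, a non-empty result for (8, 16) would mean a map such as pdc_cjfeejidea is sitting on top of /dev/sdb.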


Comment 7 Michal Jaegermann 2006-05-24 19:22:07 UTC
As a matter of fact I was forced to hack the initrd because of bug #192157:
I commented out, in the init script used by the initrd, the whole block
doing 'rmparts ...', 'dm create ...' and 'dm partadd', as otherwise I cannot
boot.  The bug is still there: 'pvcreate' refuses to work because
it fails to open /dev/sdb exclusively, although the disk is not used by
anything I can tell, and the error message suggests a "mounted filesystem",
which is clear nonsense.

So the bug is still alive and kicking with lvm2-2.02.06-1.1, and how
to create new lvm2 volumes on a running system remains a mystery.

Comment 8 Michal Jaegermann 2006-10-01 19:15:03 UTC
I know, at last, why this problem shows up.  The disk in question
was "recycled" and dmraid finds something on it that passes
for a dmraid signature.  The initrd then adds that dm map, apparently
even when the dm modules are missing, and the disk becomes "busy",
or partitions on it "do not exist", without any hint anywhere
as to why this may be the case.

Only after that signature was removed with 'dmraid -r -E ...',
_and_ the initrd was rebuilt, _and_ the machine was rebooted with the new
initrd, did the disk become accessible to pvcreate (and mkswap, and
mkfs, and ...).  So the problem is really elsewhere.
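
On kernels that export a holders directory in sysfs, a silent claim of this kind can be made visible by listing /sys/block/<disk>/holders. The Python sketch below does that; `block_holders` is a hypothetical helper, and the sysfs layout is an assumption that may not hold on kernels as old as the ones in this report.

```python
import os


def block_holders(disk: str, sysfs_root: str = "/sys") -> list:
    """List devices (e.g. dm-0) that sysfs reports as holding `disk`.

    Returns an empty list if the holders directory is missing, so an
    empty result does not prove that the disk is free.
    """
    holders_dir = os.path.join(sysfs_root, "block", disk, "holders")
    try:
        return sorted(os.listdir(holders_dir))
    except FileNotFoundError:
        return []
```

A result such as ["dm-0"] from `block_holders("sdb")` would point at a device-mapper map, like the one set up from the stale dmraid signature, as the reason the exclusive open fails.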