Bug 509908

Summary: lvchange seems to return before having actually finished
Product: Fedora
Version: 11
Component: lvm2
Assignee: Peter Rajnoha <prajnoha>
Reporter: Piergiorgio Sartor <piergiorgio.sartor>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Hardware: All
OS: Linux
CC: agk, bmarzins, bmr, dwysocha, heinzm, kzak, lvm-team, mbroz, msnitzer, prajnoha, prockai
Doc Type: Bug Fix
Last Closed: 2010-04-27 16:15:54 UTC
Attachments:
output during "cryptsetup luksClose"
output during "lvchange -an"
output during "mdadm --stop"

Description Piergiorgio Sartor 2009-07-06 19:49:45 UTC
Description of problem:
I've some RAID arrays, grouped together with LVM.
This setup is started/stopped with scripts.
The start script activates the RAIDs, enables the LVM and mounts the partition.
The stop script does the reverse: umount partition, disable LVM, stop RAIDs.
It is done this way because the physical disks are USB ones and the "device" is connected and used only when a backup is needed.

The "stop" script is:

#!/bin/bash
umount /mnt/raid &&
lvchange -an /dev/RAID_A/lvmStorage &&
sleep 1 &&
mdadm --stop /dev/md/vol*

Without the "sleep 1" (and maybe even "sleep 2"), I get errors from "mdadm --stop ..." about not being possible to stop some arrays, since they're in use.
With the "sleep 1", everything seems to run fine, even if I got some other errors later, but these maybe unrelated.

Under F-10 this was happening sometimes, now, with F-11, it seems to happen always.

Version-Release number of selected component (if applicable):
lvm2-2.02.45-4.fc11.x86_64

How reproducible:
Always, since F-11.

Steps to Reproduce:
1. Start the "device" with the proper script.
2. Stop the "device" with the above script.

Actual results:
Without "sleep 1" errors are reported about some RAID arrays being still in use.
With "sleep 1" everything seems to be fine.

Expected results:
The script should run through without "sleep 1".

Additional info:
"start" script:

#!/bin/bash
mdadm --assemble --scan -c mdadm.conf &&
lvchange -ay /dev/RAID_A/lvmStorage &&
mount /dev/mapper/luksStorage /mnt/raid

Used mdadm.conf:

ARRAY /dev/md/vol00 auto=md level=raid6 metadata=1.1 num-devices=10 UUID=5fd435f0:c665de15:14785720:9f8814cd name=vol00
ARRAY /dev/md/vol01 auto=md level=raid6 metadata=1.1 num-devices=9 UUID=467b8169:6d2838ca:2f3b5189:80e2f92d name=vol01
ARRAY /dev/md/vol02 auto=md level=raid6 metadata=1.1 num-devices=8 UUID=ea4a3ef2:b01c6f31:f331b125:39e6809f name=vol02
ARRAY /dev/md/vol03 auto=md level=raid6 metadata=1.1 num-devices=7 UUID=68663a34:7f3defd7:14145d67:34fd7ea8 name=vol03
ARRAY /dev/md/vol04 auto=md level=raid6 metadata=1.1 num-devices=6 UUID=e8c720e1:b0211066:57abd0e7:e5c301ce name=vol04
ARRAY /dev/md/vol05 auto=md level=raid6 metadata=1.1 num-devices=4 UUID=7343ddd9:0fd6910d:36561688:f990e92b name=vol05
ARRAY /dev/md/vol06 auto=md level=raid6 metadata=1.1 num-devices=4 UUID=6f2ebde4:d8bd3a0d:dad7d7ff:3357ddb1 name=vol06

A couple of questions pop up in my mind:

1) Is "lvchange -an /dev/RAID_A/lvmStorage" the correct way to disable an LVM volume?
2) Is there any way to wait until "lvchange -an /dev/RAID_A/lvmStorage" is really finished after it returns?

Thanks,

piergiorgio

Comment 1 Milan Broz 2009-07-06 20:02:35 UTC
(In reply to comment #0)
> 1) Is "lvchange -an /dev/RAID_A/lvmStorage" the correct way to disable an LVM
> volume?

yes (assuming that the volume is not mounted)

> 2) Is there any way to wait until "lvchange -an /dev/RAID_A/lvmStorage" is
> really finished after it returns?

the mapping should be removed after lvchange -an is finished, no sleep should be needed...

I think this could be a problem with the udev rules... To understand it better, could you please run "lsof" and "ps ax" (save the output to a file and paste it here) immediately after lvchange (before the sleep)? Is there anything running that could cause this problem?
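A minimal sketch of what the modified stop script could look like (the /tmp output paths are just placeholders; the diagnostics are captured right after lvchange and before the sleep):

#!/bin/bash
# stop script with diagnostics captured right after lvchange, before the sleep
umount /mnt/raid &&
lvchange -an /dev/RAID_A/lvmStorage &&
lsof > /tmp/lsof.txt &&
ps ax > /tmp/ps.txt &&
sleep 1 &&
mdadm --stop /dev/md/vol*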

Comment 2 Piergiorgio Sartor 2009-07-06 21:26:24 UTC
A couple of items.

The "lsof.txt" file is quite big, around 550KiB, is anyway OK to attach here?

Since I do not know how sensible is the information in "lsof.txt" and "ps.txt", I would like to restrict the access to this bug.
Or, you tell me which information to look for in the two files (which may also solve the 550KiB issue).

About the files: I generated them separately, in two runs.

In both cases I added "lsof > lsof.txt" or "ps ax > ps.txt" after "lvchange" and before the "sleep", but still with "&&" as the separator.

piergiorgio

Comment 3 Vincent Danen 2009-07-06 22:40:57 UTC
This is not a security bug, so please don't tag it as such unless there is a valid concern regarding the security of the package (not the security of the attachments).  I'm sure you can sanitize the output if you think it necessary, or you can email it to Milan directly if you do not wish it attached to bugzilla.

Thanks.

Comment 4 Milan Broz 2009-07-07 06:22:02 UTC
Not sure if you can flag the attachment as private (there is an option for that), but you can mail it to me directly.

Or just look for anything in the output that could have an md device or LV open.

Comment 5 Piergiorgio Sartor 2009-07-07 06:41:33 UTC
Well, grep I can use... :-)

grep md lsof.txt shows the list of md devices/threads, mdadm with its stuff, hald and devkit accessing /proc/mdstat and, as the last line:

vol_id    7103    nobody    3r      BLK                9,4 0x9b6960200      30823 /dev/md4

which seems to me suspicious.

grep lv lsof.txt catches only "libresolv" and "libcalvcal".

grep vg lsof.txt catches "librsvg" and "svg_loader.so".

grep pv lsof.txt doesn't get anything.

grep md ps.txt shows the usual stuff, i.e. mdadm --monitor and the kernel threads, plus:

... /sbin/mdadm --examine --export /dev/block/8:85

egrep 'lv|vg|pv' ps.txt does not report anything.

Hope this helps,

piergiorgio

Comment 6 Milan Broz 2009-07-07 08:13:57 UTC
(In reply to comment #5)
> vol_id    7103    nobody    3r      BLK                9,4 0x9b6960200     
> 30823 /dev/md4

yes, that's probably the problem.

so someone (a udev rule, probably) runs vol_id on all device change events...

Comment 7 Piergiorgio Sartor 2009-07-07 09:36:11 UTC
Well, looking into /etc/udev did not turn up anything, but /lib/udev/rules.d returned:

$> grep vol_id *
60-persistent-storage.rules:KERNEL=="sr*", ENV{ID_CDROM_MEDIA}=="?*", IMPORT{program}="vol_id --export --skip-raid --offset=$env{ID_CDROM_MEDIA_SESSION_LAST_OFFSET} $tempnode"
60-persistent-storage.rules:KERNEL!="sr*", IMPORT{program}="vol_id --export $tempnode"
64-md-raid.rules:IMPORT{program}="vol_id --export $tempnode"
95-devkit-disks.rules:IMPORT{program}="vol_id --export $tempnode"

$> rpm -qf 60-persistent-storage.rules
udev-141-3.fc11.i586

$> rpm -qf 64-md-raid.rules
udev-141-3.fc11.i586

$> rpm -qf 95-devkit-disks.rules
DeviceKit-disks-004-3.fc11.i586

What could be a convenient way to check which of those is guilty?

Any suggestions or ideas?

Thanks,

piergiorgio

Comment 8 Peter Rajnoha 2009-07-07 10:05:49 UTC
Yes, the problem here is the call to "vol_id", as Milan pointed out...

All the rules (60-persistent-storage.rules, 64-md-raid.rules, 95-devkit-disks.rules) react to "change/add" events only, so when we call "lvchange -an",
which causes a "remove" event only, the problem should not occur. Unfortunately
it still occurs, which means that there is a previous "change" event still being
processed (this also happens sometimes with our tests included in the source
code when running "make check" -- some tests fail because devices cannot
be cleaned up at the end of a test, as they are still held open by "vol_id").

This situation is solved by waiting for udev rules in the lvm code, and this is
a new feature that will be added to upstream LVM soon. However, the problem
still exists without the "wait-for-udev" feature, and I'm not quite sure
whether this could be solved easily without waiting when dealing with
change/add udev events.

But I'm still curious, because the problem described in this report is about
"lvchange -an" (so only "remove" event considered within udev, which should be
OK). Do you run any other lvm commands JUST before this one? (...which could
possibly cause change/add udev events being generated)

Comment 9 Piergiorgio Sartor 2009-07-07 11:04:32 UTC
(In reply to comment #8)
[...]
> But I'm still curious, because the problem described in this report is about
> "lvchange -an" (so only "remove" event considered within udev, which should be
> OK). Do you run any other lvm commands JUST before this one? (...which could
> possibly cause change/add udev events being generated)  

Maybe quite a good call.

Not an LVM command, but a DM related.

After "umount", and before "lvhange -an", there is another command (I removed it to avoid confusion, but maybe I caused more, instead), which is "cryptsetup luksClose ..."

Could it be the one causing the udev events even after "lvchange"?

piergiorgio

Comment 10 Peter Rajnoha 2009-07-07 11:44:06 UTC
Well, luksClose shouldn't be a problem -- it does not generate change/add events.

Isn't there anything else being run before "lvchange -an"? Or, if possible, could you please run udevd with debug tracing (--debug --debug-trace) and try to send the log? There must be a change/add event here that causes that "vol_id" udev rule to fire off...

Comment 11 Peter Rajnoha 2009-07-07 11:50:25 UTC
Or maybe also try to send the log of "udevadm monitor --kernel --env" captured while running the script that causes the problem, so we can see directly whether there's a change/add event beforehand.
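A rough sketch of how that could be captured (storage_stop.sh stands for the stop script from the description, and the log file name is just a placeholder):

#!/bin/bash
# record kernel uevents with their environment while the stop script runs
udevadm monitor --kernel --env > /tmp/udev-monitor.log 2>&1 &
MONITOR_PID=$!
./storage_stop.sh
# give udevd a moment to deliver any remaining events, then stop the monitor
sleep 5
kill $MONITOR_PID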

Comment 12 Piergiorgio Sartor 2009-07-07 19:17:08 UTC
Here we go.

I run "udevadm monitor --kernel --env" in three rounds, one for each "stop" script command (excluding "umount", which did not produced any udev related events).

So, basically I run by hand "luksClose", "lvchange" and "mdadm" and recorded separately the results of "udevadm monitor".

I'll attach the files here after.

Anyway, it seems "lvchange" is causing quite a mess, if you ask me... :-)

piergiorgio

Comment 13 Piergiorgio Sartor 2009-07-07 19:18:03 UTC
Created attachment 350854 [details]
output during "cryptsetup luksClose"

Comment 14 Piergiorgio Sartor 2009-07-07 19:18:30 UTC
Created attachment 350855 [details]
output during "lvchange -an"

Comment 15 Piergiorgio Sartor 2009-07-07 19:18:52 UTC
Created attachment 350856 [details]
output during "mdadm --stop"

Comment 16 Peter Rajnoha 2009-07-08 09:40:00 UTC
Hmm, looking at the logs -- luks and lvchange should be OK, there are "remove" events only. If it is the vol_id call in the udev rule that causes the problems, that call runs on change/add events only.

I've looked at the udev rules again and we can rule out 60-persistent-storage.rules from the list, because of this (dm-* and md* devices are skipped):

# skip rules for inappropriate block devices
KERNEL=="fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md*", GOTO="persistent_storage_end"

Also, looking at the udev log for mdadm, there's a "change" event just before the "remove" event for the md devices:

KERNEL[1246993983.472736] change   /devices/virtual/block/md127 (block)
KERNEL[1246993983.479704] remove   /devices/virtual/block/md127 (block)

Timestamps are quite close, so it's possible that vol_id still has the device open. The problem is that this does not answer the question of why using "sleep" in the script helps, because it is used *before* the mdadm command...

OK, just one little experiment -- could you please try commenting out this line in 64-md-raid.rules (temporarily) and see if the problem still occurs:

#IMPORT{program}="vol_id --export $tempnode"

If that doesn't help, please try commenting out the same line in 95-devkit-disks.rules temporarily. This way we can see which one is really causing the problems. Thanks.
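One way to do that from a shell, as a sketch (editing the files by hand works just as well; keep a backup so the change can be reverted):

# keep a copy of the original rules file
cp /lib/udev/rules.d/64-md-raid.rules /root/64-md-raid.rules.orig
# comment out the vol_id import line
sed -i 's|^IMPORT{program}="vol_id|#&|' /lib/udev/rules.d/64-md-raid.rules
# make udevd re-read the rules
udevadm control --reload-rules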

Comment 17 Piergiorgio Sartor 2009-07-08 10:00:36 UTC
One note, in "lvchange.log" I see things like:

KERNEL[1246993955.903619] change   /devices/pci0000:00/0000:00:0b.1/usb1/1-7/1-7.1/1-7.1:1.0/host8/target8:0:0/8:0:0:0/block/sdd/sdd6 (block)

/dev/sdd6, like the others, is one of the components of the RAID arrays belonging to the LVM volume in question.

Specifically, everything from /dev/sddX on belongs to the RAID arrays grouped with LVM.

I'll try commenting the udev rules later, I'll let you know.

piergiorgio

Comment 18 Piergiorgio Sartor 2009-07-08 17:28:56 UTC
Uhm, it seems to me that vol_id might not be the cause.

I commented out the line in both rules (actually in all three), reloaded the rules with "udevadm control --reload-rules" (which, by the way, does not seem to provide any feedback), and removed the "sleep 1" from the stop script. Initially it seemed to work, but after a couple of trials I again got:

mdadm: stopped /dev/md/vol00
mdadm: failed to stop array /dev/md/vol01: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
mdadm: failed to stop array /dev/md/vol02: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
mdadm: failed to stop array /dev/md/vol03: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
mdadm: failed to stop array /dev/md/vol04: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
mdadm: failed to stop array /dev/md/vol05: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
mdadm: failed to stop array /dev/md/vol06: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?

I was able to capture, with "ps axww", the following:

...
21032 pts/2    S+     0:00 /bin/bash ./storage_stop.sh
21037 pts/2    D+     0:00 lvchange -an /dev/RAID_A/lvmStorage
21446 ?        S<     0:00 /sbin/udevd -d
21449 ?        S<     0:00 /sbin/udevd -d
21451 ?        S<     0:00 /sbin/udevd -d
21507 ?        D<     0:00 /sbin/mdadm --examine --export /dev/block/8:165
21508 ?        D<     0:00 /sbin/mdadm --examine --export /dev/block/8:164
21509 ?        S<     0:00 /sbin/udevd -d
21510 ?        S<     0:00 /sbin/udevd -d
21513 ?        S<     0:00 /sbin/udevd -d
21516 ?        D<     0:00 /sbin/mdadm --examine --export /dev/block/8:193
21518 ?        D<     0:00 /sbin/mdadm --examine --export /dev/block/8:163
21520 ?        D<     0:00 /sbin/mdadm --examine --export /dev/block/8:195
21522 ?        R<     0:00 /sbin/udevd -d

As you can see, while lvchange is still running, a lot of things involving udevd and mdadm are happening.

Are those normal?

piergiorgio

Comment 19 Peter Rajnoha 2009-07-09 16:14:45 UTC
No, they are not normal :) These change events shouldn't be there...

I've tried to remove all the udev rules and those change events were still there! It's udevd itself that causes the problems. F10 uses an older version of udev ("udev-127"), which behaves correctly. F11 uses "udev-141", which generates a lot of change events when it should not.

I searched back and the problem appears since "udev-140". I guess this has something to do with the inotify watches used internally in udevd. The debug traces differ exactly in the messages about setting and unsetting these watches; also, the changelog for 140 mentions:

"udevd: switch watch symlinks to devpath"

...but this is probably a question for udev guys.

The outcome of this new udev "feature" is that a lot of change events are generated. Since they are change events, some rules are applied to those devices, which keeps them open while they are being processed. And this in turn does not allow us to remove those devices...

You mentioned that something similar happened in F10, but not too often (even though it has an older version of udev). I observed this too, but in that case it was "vol_id" that caused the problems (when the commands are run one after another, the udev events they generate are just too close in time, so the device is still open in "vol_id" when we want to remove it with the next command in the sequence).

But generally, both situations come down to the same thing...
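As a stopgap, one possible workaround (just a sketch, not a fix for the udev behaviour) is to have the stop script wait until udev's event queue is empty instead of relying on a fixed sleep; udevadm settle should be available with udev-141, and the timeout value below is arbitrary:

#!/bin/bash
umount /mnt/raid &&
lvchange -an /dev/RAID_A/lvmStorage &&
# wait for udevd to finish processing all queued events before touching md
udevadm settle --timeout=30 &&
mdadm --stop /dev/md/vol*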

Comment 20 Piergiorgio Sartor 2009-07-09 17:55:57 UTC
(In reply to comment #19)
> No, they are not normal :) These change events shouldn't be there...
[...]

Thanks for the investigation and the explanation.
Now it is clear what is going on and what was going on.

What should we do now?

Should we change component to "udev"?

Will you do? Should I do?

Thanks again,

piergiorgio

Comment 21 Karel Zak 2009-07-09 20:15:26 UTC
(In reply to comment #19)

> The outcome of this udev's new "feature" is that there are a lot of change
> events generated.

 Note that the "udev watch" is based on the IN_CLOSE_WRITE inotify event; it means the device has to be modified.

Comment 22 Karel Zak 2009-07-09 21:08:38 UTC
Ah.. nonsense, inotify man page:

   IN_CLOSE_WRITE    File opened for writing was closed 

so open(O_RDWR) is enough to generate the event on the device. The write(2) is not required. Sorry.
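This is easy to see with inotify-tools, assuming the inotifywait utility is installed and the commands are run as root; /dev/md4 here stands for any existing block device, and nothing is ever written to it:

# watch the device for close-after-write events (exits after the first one)
inotifywait -e close_write /dev/md4 &
sleep 1
# open the device read-write from the shell and close it again without writing
exec 3<> /dev/md4
exec 3>&-
# inotifywait reports CLOSE_WRITE,CLOSE for /dev/md4 even though no write happened
wait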

Comment 23 Peter Rajnoha 2009-07-10 03:30:51 UTC
I was thinking about whether there's a way to work around the "vol_id" situation, but I'm afraid we can't do much here -- it's just the way it works. But that situation is solved by the udev-waiting code, which should be included in F12, I hope...

(In reply to comment #22)
> so open(O_RDWR) is enough to generate the event on the device. The write(2) is
> not required. Sorry.

..well, I don't know if we can prevent those reads. Is there any chance that this could be tweaked in udevd somehow, so that no confusing change events are generated?

Comment 24 Peter Rajnoha 2009-07-13 08:25:53 UTC
OK, first we'll try to change the way the devices are opened (to read-only) where possible...

Comment 25 Peter Rajnoha 2009-07-17 09:37:47 UTC
Just for the record and completeness -- I think the exact path leading to the bug reported here is:

- creating md array (runs 64-md-raid.rules and sets inotify watch on md device)
- lvchange, scanning, opening the device RW, fires the inotify watch generating change event
- running "mdadm --examine" on md device (in 95-devkit-disks.rules)
- while still opened by "mdadm --examine", calling "mdadm --stop"

Nice chain, isn't it? :)

The patch for opening devices read-only while scanning has been uploaded to upstream CVS -- ver. 2.02.49 (but there is still a problem with lvcreate, lvremove...).

Comment 26 Piergiorgio Sartor 2009-11-20 15:21:53 UTC
Hi all,

is there any news regarding this issue and the new F12?

Is there any update there, compared to F11?

Thanks,

bye,

pg

Comment 27 Peter Rajnoha 2009-11-23 10:19:24 UTC
As for the patch that opens devices read-only while scanning, this one is already included in F12, so the exact problem reported here should not occur anymore.

The udev-waiting feature (or udev_sync, as we renamed it later on) will be included in F13 only (it's in the current F13 rawhide). We did not manage to include it in F12 because of some problems we had with other applications using the new /dev layout for device-mapper devices (and those had to be fixed first). This feature prevents races when a device is opened from within a udev rule.

But still, there's the inotify watch feature in udev, which we find quite problematic from our point of view and which is a source of hidden and quite complex races. We can only synchronise with udev events that we know about, that is, udev events generated while creating/resuming, renaming and removing LVM/DM devices. We can't apply the synchronisation with udev in situations where the events are generated artificially (like with the inotify watch). And we can't open devices read-only in all situations either (which would prevent this from happening).

(..so I'll keep this report open until we have a proper solution for the udev inotify watch feature as well, but that will require cooperation with the udev team, of course..)
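For reference, once the udev_sync feature lands it is expected to be switchable in lvm.conf; a hedged sketch of what that setting looks like in later LVM releases (option name as mentioned above, so this is only indicative for F13):

activation {
    # synchronise LVM commands with udev rule processing
    udev_sync = 1
}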

Comment 28 Piergiorgio Sartor 2010-04-07 08:59:01 UTC
Hi,

(In reply to comment #27)
> As for the patch that opens devices read-only while scanning, this one is
> included in F12 already so the exact problem reported here should not occur
> anymore.

I can confirm I was not able to reproduce the issue in F12, so it seems this specific problem is fixed.

[...]
> (..so I'll keep this report opened until we have a proper solution for the udev
> inotify watch feature as well, but that will require cooperation with udev
> team, of course..)    

OK, thanks for the support.

bye,

pg

Comment 29 Bug Zapper 2010-04-27 15:31:00 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 30 Peter Rajnoha 2010-04-27 16:15:54 UTC
(In reply to comment #27)
> But still, there's a problem with inotify watch feature in udev that we find
> quite problematic from our point of view and is a source of hidden and quite
> complex situations/races.

As for the problem with the "watch" rule, we have it mentioned in other bug reports as well (such as bug 570359 or bug 577798).

(In reply to comment #28)
> I can confirm I was not able to reproduce the issue in F12, so it seems this
> specific problem is fixed.

...so I'll close this one then.