| Summary: | lvchange seems to return before having actually finished | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Piergiorgio Sartor <piergiorgio.sartor> |
| Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 11 | CC: | agk, bmarzins, bmr, dwysocha, heinzm, kzak, lvm-team, mbroz, msnitzer, prajnoha, prockai |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-04-27 16:15:54 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Description
Piergiorgio Sartor
2009-07-06 19:49:45 UTC
(In reply to comment #0)

> 1) Is "lvchange -an /dev/RAID_A/lvmStorage" the correct way to disable an LVM volume?

Yes (assuming that the volume is not mounted).

> 2) Is there any way to wait until "lvchange -an /dev/RAID_A/lvmStorage" is really finished after it returns?

The mapping should be removed once lvchange -an has finished; no sleep should be needed... I think this could be a problem with udev rules. To understand this problem better, please can you try to run "lsof" and "ps ax" (save the output to a file and paste it here, please) immediately after lvchange (before the sleep)? Is there anything running that could cause this problem?

A couple of items. The "lsof.txt" file is quite big, around 550 KiB; is it OK to attach it here anyway? Since I do not know how sensitive the information in "lsof.txt" and "ps.txt" is, I would like to restrict access to this bug. Or you could tell me which information to look for in the two files (which may also solve the 550 KiB issue).

About the files: I generated them separately, in two runs. In both cases I added "lsof > lsof.txt" or "ps ax > ps.txt" after "lvchange" and before the "sleep", but still with "&&" as the separator.

piergiorgio
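For orientation, a hedged reconstruction of what the instrumented stop script looks like at this point in the thread. The exact script is never shown in the report; the mount point and the LUKS mapping name below are placeholders, and only the lvchange target, the /dev/md/vol* array names and the script name appear elsewhere in the thread:

```sh
#!/bin/bash
# storage_stop.sh -- hypothetical reconstruction, not the reporter's actual script.
umount /mnt/storage &&                        # placeholder mount point
cryptsetup luksClose storage_crypt &&         # placeholder mapping name
lvchange -an /dev/RAID_A/lvmStorage &&
lsof > /tmp/lsof.txt &&                       # state captured right after lvchange ...
ps ax > /tmp/ps.txt &&                        # ... and before the sleep workaround
sleep 1 &&
mdadm --stop /dev/md/vol00 /dev/md/vol01      # remaining vol02..vol06 arrays omitted here
```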
This is not a security bug, so please don't tag it as such unless there is a valid concern regarding the security of the package (not the security of the attachments). I'm sure you can sanitize the output if you think it necessary, or you can email it to Milan directly if you do not wish it attached to bugzilla. Thanks.

Not sure if you can flag the attachment private (there is an option for that), but you can mail it to me directly. Or just look for anything in the output that could have an md device or LV open.

Well, grep I can use... :-)

"grep md lsof.txt" shows the list of md devices/threads, mdadm with its stuff, hald and devkit accessing /proc/mdstat and, as the last line:

    vol_id 7103 nobody 3r BLK 9,4 0x9b6960200 30823 /dev/md4

which seems suspicious to me.

"grep lv lsof.txt" catches only "libresolv" and "libcalvcal". "grep vg lsof.txt" catches "librsvg" and "svg_loader.so". "grep pv lsof.txt" doesn't get anything.

"grep md ps.txt" shows the usual stuff, i.e. mdadm --monitor and the kernel threads, plus:

    ... /sbin/mdadm --examine --export /dev/block/8:85

"egrep 'lv|vg|pv' ps.txt" does not report anything.

Hope this helps,

piergiorgio

(In reply to comment #5)
> vol_id 7103 nobody 3r BLK 9,4 0x9b6960200 30823 /dev/md4

Yes, that's probably the problem. So someone (a udev rule, probably) runs vol_id on all change device events...

Well, looking into /etc/udev did not bring anything, but /lib/udev/rules.d returned:

    $> grep vol_id *
    60-persistent-storage.rules:KERNEL=="sr*", ENV{ID_CDROM_MEDIA}=="?*", IMPORT{program}="vol_id --export --skip-raid --offset=$env{ID_CDROM_MEDIA_SESSION_LAST_OFFSET} $tempnode"
    60-persistent-storage.rules:KERNEL!="sr*", IMPORT{program}="vol_id --export $tempnode"
    64-md-raid.rules:IMPORT{program}="vol_id --export $tempnode"
    95-devkit-disks.rules:IMPORT{program}="vol_id --export $tempnode"
    $> rpm -qf 60-persistent-storage.rules
    udev-141-3.fc11.i586
    $> rpm -qf 64-md-raid.rules
    udev-141-3.fc11.i586
    $> rpm -qf 95-devkit-disks.rules
    DeviceKit-disks-004-3.fc11.i586

What could be a convenient way to check which of those is guilty? Any suggestions or ideas?

Thanks,

piergiorgio

Yes, the problem here is the call to "vol_id", as Milan pointed out. All the rules (60-persistent-storage.rules, 64-md-raid.rules, 95-devkit-disks.rules) react to "change/add" events only, so when we call "lvchange -an", which causes a "remove" event only, the problem should not occur. Unfortunately it still occurs, which means that there is a previous "change" event still being processed (this also happens sometimes with our tests included with the source code when running "make check" -- some tests fail because devices could not be cleaned up at the end of a test; they are still opened by "vol_id"). This situation is solved by waiting for udev rules in the lvm code, and this is a new feature that will be added to upstream LVM soon. However, the problem still exists without using the "wait-for-udev" feature, and I'm not quite sure if this could be solved easily without waiting when dealing with change/add udev events.

But I'm still curious, because the problem described in this report is about "lvchange -an" (so only a "remove" event is considered within udev, which should be OK). Do you run any other lvm commands JUST before this one? (...which could possibly cause change/add udev events to be generated)

(In reply to comment #8)
[...]
> Do you run any other lvm commands JUST before this one? (...which could
> possibly cause change/add udev events being generated)

Maybe quite a good call. Not an LVM command, but a DM-related one. After "umount", and before "lvchange -an", there is another command (I removed it to avoid confusion, but maybe I caused more instead), which is "cryptsetup luksClose ...". Could it be the one causing the udev events even after "lvchange"?

piergiorgio

Well, luksClose shouldn't be a problem -- it does not generate change/add events. Isn't there anything else being run before "lvchange -an"? Or, if possible, could you please run udevd with debug trace (--debug --debug-trace) and try to send the log? There must be a change/add event here that causes that "vol_id" udev rule to fire off... Or maybe also try to send the log of "udevadm monitor --kernel --env" while running the script that causes the problem, so we can see directly if there's a change/add event before.

Here we go. I ran "udevadm monitor --kernel --env" in three rounds, one for each "stop" script command (excluding "umount", which did not produce any udev-related events). So, basically, I ran "luksClose", "lvchange" and "mdadm" by hand and recorded the results of "udevadm monitor" separately. I'll attach the files here afterwards. Anyway, it seems "lvchange" is causing quite a mess, if you ask me... :-)

piergiorgio

Created attachment 350854 [details]
output during "cryptsetup luksClose"
Created attachment 350855 [details]
output during "lvchange -an"
Created attachment 350856 [details]
output during "mdadm --stop"
Hmm, looking at the logs -- luks and lvchange should be OK, there are "remove" events only. If it is the vol_id call in the udev rule that causes the problems, that one is run after change/add events only. I've looked at the udev rules again and we can rule out 60-persistent-storage.rules from the list, because of this (dm-* and md* devices are skipped):

    # skip rules for inappropriate block devices
    KERNEL=="fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md*", GOTO="persistent_storage_end"

Also, looking at the udev log for mdadm, there's a "change" event just before the "remove" event for md devices:

    KERNEL[1246993983.472736] change /devices/virtual/block/md127 (block)
    KERNEL[1246993983.479704] remove /devices/virtual/block/md127 (block)

The timestamps are quite close, so it's possible that vol_id still has the device open. The problem is that this does not answer the question of why using "sleep" in the script helps, because it is used *before* the mdadm command...

OK, just one little experiment -- could you please try commenting out this line in 64-md-raid.rules (temporarily) and see if the problem still occurs:

    #IMPORT{program}="vol_id --export $tempnode"

If it doesn't help, please try commenting out the same line in 95-devkit-disks.rules temporarily. This way we can see which one is really causing problems. Thanks.

One note: in "lvchange.log" I see things like:

    KERNEL[1246993955.903619] change /devices/pci0000:00/0000:00:0b.1/usb1/1-7/1-7.1/1-7.1:1.0/host8/target8:0:0/8:0:0:0/block/sdd/sdd6 (block)

/dev/sdd6, like the others, is one of the RAID array components belonging to the LVM volume in question. Specifically, everything from /dev/sddX on belongs to the RAID arrays grouped with LVM. I'll try commenting out the udev rules later, I'll let you know.

piergiorgio
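One possible way to carry out the experiment suggested above -- not taken from the thread; it assumes root and that the rule line begins exactly as shown in the earlier grep output:

```sh
# Temporarily neutralise the vol_id import in the md rule, keeping a backup,
# then tell udevd to re-read its rules (the reload prints nothing on success).
cp /lib/udev/rules.d/64-md-raid.rules /root/64-md-raid.rules.bak
sed -i 's|^IMPORT{program}="vol_id |#&|' /lib/udev/rules.d/64-md-raid.rules
udevadm control --reload-rules
```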
Uhm, it seems to me that vol_id might not be the cause. I commented out the line in both rules (actually even in all three), reloaded the rules with "udevadm control --reload-rules" (which, by the way, does not seem to provide any feedback), removed the "sleep 1" from the stop script and, initially, it seemed to work, but after a couple of trials I got again:

    mdadm: stopped /dev/md/vol00
    mdadm: failed to stop array /dev/md/vol01: Device or resource busy
    Perhaps a running process, mounted filesystem or active volume group?
    mdadm: failed to stop array /dev/md/vol02: Device or resource busy
    Perhaps a running process, mounted filesystem or active volume group?
    mdadm: failed to stop array /dev/md/vol03: Device or resource busy
    Perhaps a running process, mounted filesystem or active volume group?
    mdadm: failed to stop array /dev/md/vol04: Device or resource busy
    Perhaps a running process, mounted filesystem or active volume group?
    mdadm: failed to stop array /dev/md/vol05: Device or resource busy
    Perhaps a running process, mounted filesystem or active volume group?
    mdadm: failed to stop array /dev/md/vol06: Device or resource busy
    Perhaps a running process, mounted filesystem or active volume group?

I was able to capture, with "ps axww", the following:

    ...
    21032 pts/2 S+ 0:00 /bin/bash ./storage_stop.sh
    21037 pts/2 D+ 0:00 lvchange -an /dev/RAID_A/lvmStorage
    21446 ?     S< 0:00 /sbin/udevd -d
    21449 ?     S< 0:00 /sbin/udevd -d
    21451 ?     S< 0:00 /sbin/udevd -d
    21507 ?     D< 0:00 /sbin/mdadm --examine --export /dev/block/8:165
    21508 ?     D< 0:00 /sbin/mdadm --examine --export /dev/block/8:164
    21509 ?     S< 0:00 /sbin/udevd -d
    21510 ?     S< 0:00 /sbin/udevd -d
    21513 ?     S< 0:00 /sbin/udevd -d
    21516 ?     D< 0:00 /sbin/mdadm --examine --export /dev/block/8:193
    21518 ?     D< 0:00 /sbin/mdadm --examine --export /dev/block/8:163
    21520 ?     D< 0:00 /sbin/mdadm --examine --export /dev/block/8:195
    21522 ?     R< 0:00 /sbin/udevd -d

As you can see, while lvchange is still running, a lot of things involving udevd and mdadm are happening. Are those normal?

piergiorgio

No, they are not normal :) These change events shouldn't be there... I've tried to remove all the udev rules and those change events were still there! It's udevd itself that causes the problems. F10 uses an older version of udevd which is OK ("udev-127") and runs fine. F11 uses "udev-141", which generates a lot of change events when it should not. I searched back and the problem appears since "udev-140". I guess this has something to do with the inotify watches used internally in udevd. The debug traces differ right in the messages about setting and unsetting these watches; also, the changelog for 140 mentions: "udevd: switch watch symlinks to devpath" ...but this is probably a question for the udev guys.

The outcome of this new udev "feature" is that a lot of change events are generated. Since they are change events, some rules are applied for those devices, which keeps them open while being processed. And this in turn does not allow us to remove those devices...

You mentioned that something similar happened in F10, but not too often (even though there's an older version of udev). I observed this, too, but in that case it was "vol_id" that caused the problems (while running the commands one after another, generating udev events that are just too close in time, so the device is still opened by "vol_id" when we want to remove it with the next command in the sequence). But generally, both situations are about the same...

(In reply to comment #19)
> No, they are not normal :) These change events shouldn't be there...
[...]

Thanks for the investigation and the explanation. Now it is clear what is going on and what was going on. What should we do now? Should we change the component to "udev"? Will you do it? Should I do it?

Thanks again,

piergiorgio

(In reply to comment #19)
> The outcome of this udev's new "feature" is that there are a lot of change
> events generated.

Note that the "udev watch" is based on the IN_CLOSE_WRITE inotify event, which means the device has to be modified.

Ah... nonsense, inotify man page:

    IN_CLOSE_WRITE    File opened for writing was closed

so open(O_RDWR) is enough to generate the event on the device. The write(2) is not required. Sorry.

I was thinking about the "vol_id" situation and whether there's a way to make a workaround for this, but I'm afraid we can't do much here -- it's the way it works. But the situation is solved by using the udev-waiting code, which should be included in F12, I hope...

(In reply to comment #22)
> so open(O_RDWR) is enough to generate the event on the device. The write(2) is
> not required. Sorry.

...well, I don't know if we can prevent those reads. Is there any chance that this could be tweaked in udevd somehow, so that no confusing change events are generated? OK, first we'll try to change the way the devices are opened (to read-only) where possible...

Just for the record and completeness -- I think the exact path leading to the bug reported here is:

- creating the md array (runs 64-md-raid.rules and sets an inotify watch on the md device)
- lvchange, scanning, opening the device RW, fires the inotify watch, generating a change event
- running "mdadm --examine" on the md device (in 95-devkit-disks.rules)
- while the device is still opened by "mdadm --examine", calling "mdadm --stop"

Nice chain, isn't it? :)
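The second step of that chain relies on the inotify behaviour quoted above: merely opening a file read-write and closing it fires IN_CLOSE_WRITE, with no write(2) needed. A small demonstration with a plain file standing in for the block device (inotifywait is from the inotify-tools package and is not part of the thread):

```sh
# Show that open(O_RDWR) + close, without writing a byte, triggers IN_CLOSE_WRITE.
touch /tmp/watch-demo
inotifywait -e close_write /tmp/watch-demo &   # waits for one close_write event
sleep 1                    # give inotifywait time to establish the watch
exec 3<>/tmp/watch-demo    # open the file read-write on fd 3 ...
exec 3>&-                  # ... and close it without writing anything
wait                       # inotifywait reports: /tmp/watch-demo CLOSE_WRITE,CLOSE
```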
The patch for opening devices read-only while scanning has been uploaded to upstream CVS -- ver. 2.02.49 (but there is still a problem with lvcreate, lvremove...).

Hi all, is there any news regarding this issue and the new F12? Is there any update there, different from F11?

Thanks, bye,

pg

As for the patch that opens devices read-only while scanning, this one is already included in F12, so the exact problem reported here should not occur anymore. The udev-waiting feature (or udev_sync, as we renamed it later on) will be included in F13 only (it's in the current F13 rawhide). We did not manage to include it in F12 because of some problems we had with other applications using the new layout in /dev for device-mapper devices (and those had to be fixed first). This one prevents races when a device is opened from within a udev rule.

But still, there's a problem with the inotify watch feature in udev that we find quite problematic from our point of view and that is a source of hidden and quite complex situations/races. We can only synchronise with udev events that we know about, that is, udev events generated while creating/resuming, renaming and removing LVM/DM devices. We can't apply the synchronisation with udev in situations where the events are generated artificially (like with the inotify watch). And we can't open devices read-only in all situations either (which would prevent this from happening).

(...so I'll keep this report open until we have a proper solution for the udev inotify watch feature as well, but that will require cooperation with the udev team, of course...)
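At the script level, a rough approximation of the udev_sync idea is to drain the udev event queue before the next command instead of sleeping a fixed amount of time. This is a sketch, not something proposed in the thread, and it narrows rather than fully closes the race described above; the device names follow the report:

```sh
# Block until udevd's event queue is empty before stopping the arrays,
# instead of relying on "sleep 1".
lvchange -an /dev/RAID_A/lvmStorage &&
udevadm settle --timeout=10 &&
mdadm --stop /dev/md/vol00          # and the remaining /dev/md/vol* arrays
```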
Hi,

(In reply to comment #27)
> As for the patch that opens devices read-only while scanning, this one is
> included in F12 already so the exact problem reported here should not occur
> anymore.

I can confirm I was not able to reproduce the issue in F12, so it seems this specific problem is fixed.

[...]

> (..so I'll keep this report opened until we have a proper solution for the udev
> inotify watch feature as well, but that will require cooperation with udev
> team, of course..)

OK, thanks for the support.

bye,

pg

This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

(In reply to comment #27)
> But still, there's a problem with inotify watch feature in udev that we find
> quite problematic from our point of view and is a source of hidden and quite
> complex situations/races.

As for the problem with the "watch" rule, we have it mentioned in other bug reports as well (such as bug 570359 or bug 577798).

(In reply to comment #28)
> I can confirm I was not able to reproduce the issue in F12, so it seems this
> specific problem is fixed.

...so I'll close this one then.