Testing F20 Beta TC5, I did a UEFI install to an Intel firmware RAID-0 set, using automatic partitioning. This creates two partitions, md126p1 and md126p2, for /boot/efi and /boot, then LVs for /, swap and /home on md126p3.

After the install completes successfully, boot times out trying to mount /home and falls into rescue mode. /dev/mapper contains fedora_adam-root and fedora_adam-swap, but no fedora_adam-home. The cmdline contains the parameters "rd.lvm.lv=fedora_adam/root" and "rd.lvm.lv=fedora_adam/swap", but no "rd.lvm.lv=fedora_adam/home". Editing /boot/efi/EFI/fedora/grub.cfg and adding "rd.lvm.lv=fedora_adam/home" to the kernel parameters resolves the issue; the next boot completes successfully, /dev/mapper/fedora_adam-home exists, and /home is mounted.

/var/log/anaconda/anaconda.log seems to indicate that anaconda set the cmdline, so at first glance this appears to be an anaconda bug, but dracut may also be involved: CCing harald.

Proposing as a Beta blocker per criterion "The installer must be able to detect and install to hardware or firmware RAID storage devices." - https://fedoraproject.org/wiki/Fedora_20_Beta_Release_Criteria#Hardware_and_firmware_RAID - though UEFI may be involved here too; I'll have to see if it happens with a BIOS install as well. Will attach anaconda logs.
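For anyone reproducing this: the quickest check is to list which LVs the kernel command line tells dracut to activate. A minimal sketch using a sample cmdline based on this report (on a live system you would read /proc/cmdline instead):

```shell
# Extract the rd.lvm.lv= entries from a kernel command line; these are the
# only LVs dracut will activate directly from the initramfs.
# Sample cmdline based on this report (on a real system: cmdline=$(cat /proc/cmdline))
cmdline="root=/dev/mapper/fedora_adam-root rd.lvm.lv=fedora_adam/root rd.lvm.lv=fedora_adam/swap rhgb quiet"

# Split on whitespace, keep only the rd.lvm.lv= values.
lvs_activated=$(printf '%s\n' $cmdline | sed -n 's/^rd\.lvm\.lv=//p')
echo "$lvs_activated"
# Note that fedora_adam/home is absent - matching the failure described above.
```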
Created attachment 815983 [details] anaconda.log
Created attachment 815984 [details] program.log
Created attachment 815985 [details] storage.log
Created attachment 815986 [details] journal.log
Same with a BIOS install; UEFI is not a factor here.
A non-RAID install also only has rd.lvm.lv kernel parameters for root and swap - not home - but it boots successfully; somehow, the home LV is found in this case. I don't know what differs between the non-RAID and RAID cases that allows the LV to be found in the former but not the latter.
Using rd.lvm.lv just causes dracut to activate the home volume directly. I think what's failing here is LVM autoactivation (via udev + lvmetad). There were some recent changes in this area in lvm2 which, I believe, should also resolve this issue. I'll try to prepare an update for F20 with the proper patches...
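For context, autoactivation is driven by udev rules that trigger "pvscan --cache -aay" when a PV appears, with lvmetad caching the metadata; the mechanism is switched on in lvm.conf. A sketch of checking the relevant knob against an illustrative config fragment (the fragment stands in for a full lvm.conf):

```shell
# Autoactivation via udev + lvmetad depends on use_lvmetad in lvm.conf.
# Write an illustrative fragment and extract the setting from it.
conf=$(mktemp)
cat > "$conf" <<'EOF'
global {
    # 1 = udev-triggered "pvscan --cache -aay" with lvmetad metadata caching
    use_lvmetad = 1
}
EOF

lvmetad_setting=$(sed -n 's/^[[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*//p' "$conf")
echo "use_lvmetad=$lvmetad_setting"
rm -f "$conf"
```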
lvm2-2.02.103-2.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/lvm2-2.02.103-2.fc20
Package lvm2-2.02.103-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing lvm2-2.02.103-2.fc20'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-19943/lvm2-2.02.103-2.fc20
then log in and leave karma (feedback).
+1 blocker
+1 blocker.
Marking as accepted blocker, proposed fix will be in TC6.
Sorry, lvm2-2.02.103-2.fc20 does not fix this. Tested Beta TC6, which has this lvm2 included: the bug still occurs. Did something else also have to be updated for the fix to be complete?
*** Bug 989607 has been marked as a duplicate of this bug. ***
lvm2-2.02.103-2.fc20 does not fix this for me either. Tested on F20 Beta TC6.
Could you please send the output of:
- lsblk
- udevadm info --export-db
- systemctl -a | grep lvm2-pvscan
(and possibly, for each line found, systemctl status lvm2-pvscan for more detailed info)

Thanks.
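A small sketch of collecting those outputs into one place for attaching (the commands are exactly the ones requested above; the output directory is illustrative, and "|| true" keeps the collection going even if one tool is unavailable):

```shell
# Gather the requested diagnostics into a temp directory for attachment.
outdir=$(mktemp -d)
lsblk                    > "$outdir/lsblk.out"        2>&1 || true
udevadm info --export-db > "$outdir/udevadm.out"      2>&1 || true
{ systemctl -a | grep lvm2-pvscan; } > "$outdir/pvscan-units.out" 2>&1 || true
ls "$outdir"
```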
Created attachment 817377 [details] lsblk.out
Created attachment 817378 [details] udevadm.out
I attached outputs of lsblk and udevadm. lvm2-pvscan service is not found.
(In reply to Martin Krizek from comment #20)
> I attached outputs of lsblk and udevadm. lvm2-pvscan service is not found.

That's exactly the missing part! I don't even see this in the udev database:

SYSTEMD_WANTS="lvm2-pvscan@<major>:<minor>.service"

where major and minor are the exact device numbers of the underlying PVs. For some reason, it's not assigned. Then, of course, the service that activates the volume is not run (the exact lvm2-pvscan@<major>:<minor>.service is missing). Investigating further...
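To illustrate what a working setup must produce: the PV's udev db entry carries SYSTEMD_WANTS pointing at the instantiated pvscan unit. A sketch that extracts it from a sample record (the record is condensed from comment 40 of this bug; on a live system you would pipe "udevadm info --name=<dev>" instead):

```shell
# A healthy PV's udev record must include SYSTEMD_WANTS so that systemd
# starts the matching lvm2-pvscan@<major>:<minor>.service.
# Sample record condensed from this bug report:
record='E: MAJOR=9
E: MINOR=0
E: SYSTEMD_ALIAS=/dev/block/9:0
E: SYSTEMD_WANTS=lvm2-pvscan@9:0.service
E: TAGS=:systemd:'

wants=$(printf '%s\n' "$record" | sed -n 's/^E: SYSTEMD_WANTS=//p')
echo "$wants"
# Empty output here is exactly the failure mode described above: no
# SYSTEMD_WANTS means the activation service is never started.
```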
OK, I've checked on Martin's machine directly and the problem is in processing the partitions on MD devices - they need to be handled in a special way, not exactly the same as bare MD devices. I'll prepare a patch and I'll update lvm2 then...
The patch to fix LVM autoactivation (as well as lvmetad update) for MD partitions: https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f070e3543a411d483b7c34b6ea8e6e8e0cc35edf
lvm2-2.02.103-3.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/lvm2-2.02.103-3.fc20
Package lvm2-2.02.103-3.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing lvm2-2.02.103-3.fc20'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20330/lvm2-2.02.103-3.fc20
then log in and leave karma (feedback).
Martin, please re-test again, thanks.
After updating to lvm2-2.02.103-3.fc20, the issue disappears.
lvm2-2.02.103-3.fc20 does not find the PV on a raid 10 md device on my machine. When the shell appears, mdadm --detail shows a properly activated raid10 array and then running pvscan --cache --activate ay /dev/md127 activates the VG and LVs. Downgrading to lvm2-2.02.102-1.fc20.x86_64 fixes the problem on my machine. This was on an installation from Fedora-20-Nightly-x86_64-Live-kde-20131029.09-1.iso
(In reply to Clyde E. Kunkel from comment #28)
> lvm2-2.02.103-3.fc20 does not find the PV on a raid 10 md device on my
> machine.
>
> When the shell appears, mdadm --detail shows a properly activated raid10
> array and then running pvscan --cache --activate ay /dev/md127 activates the
> VG and LVs.

Please provide the same debug info as described in comment #17. Thanks.
Created attachment 818314 [details] output from lsblk

Attaching: output from lsblk; output from udevadm; systemctl -a | grep lvm2-pvscan, then systemctl status for each pvscan.service found; and finally mdadm --detail /dev/md127.

The above were gathered after the boot sequence dropped me into a shell. After taking diagnostics, pvscan --cache --activate ay /dev/md127 allowed me to continue.
Created attachment 818315 [details] output from udevadm
Created attachment 818316 [details] output from systemctl -a | grep lvm2-pvscan
Created attachment 818317 [details] output from systemctl status of the pvscan.services found
Created attachment 818318 [details] output from mdadm --detail /dev/md127
(In reply to Clyde E. Kunkel from comment #32)
> Created attachment 818316 [details]
> output from systemctl -a | grep lvm2-pvscan

lvm2-pvscan@9:127.service loaded inactive dead LVM2 PV scan on device 9:127

This is the problem - the service's state is "inactive", so the "pvscan --cache -aay" was not run for the md device for some reason...

Does it proceed when you call "systemctl restart lvm2-pvscan@9:127.service" just after the problem appears and you're dropped to the shell?

Also, you can try switching systemd to debug mode by adding these to the kernel command line:

systemd.log_level=debug systemd.log_target=journal-or-kmsg

Then get the exact log with journalctl -b after boot.
(In reply to Peter Rajnoha from comment #35)
> <snip>
> Does it proceed when you call "systemctl restart lvm2-pvscan@9:127.service"
> just after the problem appears and you're dropped to the shell?
>
> Also, you can try switching to systemd debug mode with adding these to
> kernel command line:
>
> systemd.log_level=debug systemd.log_target=journal-or-kmsg
>
> Then getting the exact log with journalctl -b after boot.

It does proceed after restart.

Checking the journal after the desktop is up with journalctl | grep -A 40 'Emergency Mode' shows:

Nov 01 09:53:23 new-host.home systemd[1]: Starting Emergency Mode.
Nov 01 09:53:23 new-host.home systemd[1]: Reached target Emergency Mode.
Nov 01 09:53:23 new-host.home systemd[1]: Job dev-mapper-VolGroup00\x2drhel6x.device/start timed out.
Nov 01 09:53:23 new-host.home systemd[1]: Timed out waiting for device dev-mapper-VolGroup00\x2drhel6x.device.
Nov 01 09:53:23 new-host.home systemd[1]: Dependency failed for /mnt/rhel6x.
Nov 01 09:53:23 new-host.home systemd[1]: Dependency failed for /mnt/rhel6x/boot.
Nov 01 09:53:23 new-host.home systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Nov 01 09:53:23 new-host.home systemd-journal[266]: Permanent journal is using 19.3M (max 4.0G, leaving 4.0G of free 40.5G, current limit 4.0G).
Nov 01 09:53:23 new-host.home systemd[1]: Started Recreate Volatile Files and Directories.
Nov 01 09:53:23 new-host.home systemd[1]: Started Security Auditing Service.
Nov 01 09:53:23 new-host.home systemd[1]: Starting Update UTMP about System Reboot/Shutdown...
Nov 01 09:53:23 new-host.home systemd[1]: Started Update UTMP about System Reboot/Shutdown.
Nov 01 09:53:23 new-host.home systemd[1]: Starting Update UTMP about System Runlevel Changes...
Nov 01 09:53:23 new-host.home systemd[1]: Started Update UTMP about System Runlevel Changes.
Nov 01 09:53:23 new-host.home systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
Nov 01 09:53:23 new-host.home systemd[1]: Startup finished in 3.320s (kernel) + 4.111s (initrd) + 1min 31.667s (userspace) = 1min 39.098s.
Nov 01 09:53:27 new-host.home auditctl[746]: No rules
Nov 01 09:53:27 new-host.home auditctl[746]: AUDIT_STATUS: enabled=0 flag=1 pid=0 rate_limit=0 backlog_limit=320 lost=0 backlog=0
Nov 01 09:53:23 new-host.home auditd[745]: Started dispatcher: /sbin/audispd pid: 760
Nov 01 09:53:23 new-host.home audispd[760]: priority_boost_parser called with: 4
Nov 01 09:53:23 new-host.home audispd[760]: max_restarts_parser called with: 10
Nov 01 09:53:23 new-host.home audispd[760]: No plugins found, exiting
Nov 01 09:53:23 new-host.home auditd[745]: Init complete, auditd 2.3.2 listening for events (startup state enable)
Nov 01 09:53:27 new-host.home kernel: type=1305 audit(1383314003.068:2): audit_pid=745 old=0 auid=4294967295 ses=4294967295 res=1
Nov 01 09:53:52 new-host.home systemd[1]: Starting Stop Read-Ahead Data Collection...
Nov 01 09:53:52 new-host.home systemd[1]: Started Stop Read-Ahead Data Collection.
Nov 01 09:56:21 new-host.home systemd[1]: Starting LVM2 PV scan on device 9:127...
Nov 01 09:56:21 new-host.home pvscan[814]: 4 logical volume(s) in volume group "VolGroup00" now active
Nov 01 09:56:21 new-host.home systemd[1]: Found device /dev/dm-15.
Nov 01 09:56:21 new-host.home systemd[1]: Found device /sys/devices/virtual/block/dm-15.
Nov 01 09:56:21 new-host.home systemd[1]: Mounting /mnt/rawhide...
Nov 01 09:56:21 new-host.home kernel: EXT4-fs (dm-15): mounted filesystem with ordered data mode. Opts: (null)
Nov 01 09:56:21 new-host.home systemd[1]: Mounted /mnt/rawhide.
Nov 01 09:56:21 new-host.home systemd[1]: Found device /dev/disk/by-uuid/e8caed50-16ad-4cc3-91ab-9a80dbbfb24d.
Nov 01 09:56:21 new-host.home systemd[1]: Found device /dev/disk/by-id/dm-uuid-LVM-AibQIxauj3qmJqrkfR3iQyNvzeWv2XYJTeSbzECtD58ubDfEnhF3g4IY2pC0fHBA.
Nov 01 09:56:21 new-host.home systemd[1]: Found device /dev/disk/by-id/dm-name-VolGroup00-rhel6x.
Nov 01 09:56:21 new-host.home systemd[1]: Found device /dev/VolGroup00/rhel6x.
Nov 01 09:56:21 new-host.home systemd[1]: Found device /dev/dm-13.
Nov 01 09:56:21 new-host.home systemd[1]: Found device /sys/devices/virtual/block/dm-13.
Nov 01 09:56:21 new-host.home systemd[1]: Mounting /mnt/rhel6x...
etc.

And journalctl | grep 9:127 shows:

Nov 01 09:56:21 new-host.home systemd[1]: Starting LVM2 PV scan on device 9:127...
Nov 01 09:56:21 new-host.home systemd[1]: Started LVM2 PV scan on device 9:127.
Nov 01 10:04:48 new-host.home systemd[1]: Stopping LVM2 PV scan on device 9:127...
Nov 01 10:04:49 new-host.home systemd[1]: Stopped LVM2 PV scan on device 9:127.

suggesting that the scan never took place BEFORE systemd attempted to mount the LVs on the PV.
(In reply to Clyde E. Kunkel from comment #36)
> suggesting that the scan never took place BEFORE systemd attempted to mount
> the LVs on the PV.

It doesn't need to happen in that order - if there's a "mount" unit, it automatically waits for the device to appear, and once the device is in the system, the mount unit is started as well (provided the device appears within a timeout). The actual problem here is that the lvm scan on that device never happens.

I've finally managed to reproduce it. I tried restarting several times and it's not 100% reproducible - it seems to be some race. I have two md arrays in my test system, and sometimes one is not processed correctly, sometimes both, and sometimes it's just OK. For example, in one test run I hit a situation where one MD device was processed correctly (9:127) and the other one was not (9:0):

[0] f20/~ # systemctl -a | grep lvm2-pvscan
lvm2-pvscan@9:0.service loaded inactive dead LVM2 PV scan on device 9:0
lvm2-pvscan@9:127.service loaded active exited LVM2 PV scan on device 9:127

I've tried to enable systemd debug logging, and I can see:

Nov 04 12:55:33 f20.virt systemd[1]: dev-block-9:127.device changed dead -> plugged
Nov 04 12:55:33 f20.virt systemd[1]: sys-devices-virtual-block-md127.device changed dead -> plugged
Nov 04 12:55:33 f20.virt systemd[1]: Trying to enqueue job lvm2-pvscan@9:127.service/start/fail
Nov 04 12:55:33 f20.virt systemd[1]: Installed new job lvm2-pvscan@9:127.service/start as 351
Nov 04 12:55:33 f20.virt systemd[1]: Enqueued job lvm2-pvscan@9:127.service/start as 351
Nov 04 12:55:33 f20.virt systemd[1]: Starting LVM2 PV scan on device 9:127...
Nov 04 12:55:33 f20.virt systemd[1]: About to execute: /usr/sbin/pvscan --cache --activate ay /dev/block/9:127

This is the 9:127 MD device, for which the pvscan was triggered and everything worked fine.

However, for the other MD device (the 9:0 one), there's only this info in the systemd log:

Nov 04 12:55:33 f20.virt systemd[1]: dev-block-9:0.device changed dead -> plugged

And nothing else! Now, the udev information for that device is (selected relevant items only):

[0] f20/~ # udevadm info --name=md0
E: MAJOR=9
E: MINOR=0
E: SYSTEMD_ALIAS=/dev/block/9:0
E: SYSTEMD_WANTS=lvm2-pvscan@9:0.service
E: TAGS=:systemd:

So from this side, everything seems to be correct. The only problem is that lvm2-pvscan@9:0.service was not started for some reason. CC-ing Michal and Lennart - my question is: why is the service not started and kept inactive? Any idea what could be wrong here?

The lvm2-pvscan@.service is defined as:

[0] f20//lib/systemd/system # cat lvm2-pvscan@.service
[Unit]
Description=LVM2 PV scan on device %i
Documentation=man:pvscan(8)
DefaultDependencies=no
BindsTo=dev-block-%i.device
After=lvm2-lvmetad.socket
Before=shutdown.target
Conflicts=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/pvscan --cache --activate ay /dev/block/%i
ExecStop=/usr/sbin/pvscan --cache %i
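For readers unfamiliar with template units: systemd instantiates one lvm2-pvscan@.service per PV, substituting the device's major:minor numbers for %i, and BindsTo ties the instance's lifetime to the corresponding .device unit. A sketch of how the names are composed (pure string manipulation, no systemd calls; the 9:0 numbers are from this report):

```shell
# How %i in lvm2-pvscan@.service maps to a concrete device instance.
major=9; minor=0
instance="${major}:${minor}"
unit="lvm2-pvscan@${instance}.service"      # the service systemd should start
device_unit="dev-block-${instance}.device"  # what BindsTo=dev-block-%i.device resolves to
exec_start="/usr/sbin/pvscan --cache --activate ay /dev/block/${instance}"  # what ExecStart runs
printf '%s\n%s\n%s\n' "$unit" "$device_unit" "$exec_start"
```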
Re-tested with F20 beta RC2, the initial issue is gone.
The issue reported in comment 28 with a live kde 20131029.09 install is also present in an RC2 DVD kde install. Should this be a separate bz?
(In reply to Clyde E. Kunkel from comment #39)
> The issue reported in comment 28 with a live kde 20131029.09 install also is
> present in an RC2 DVD kde install.
>
> Should this be a separate bz?

Yes, please.
(In reply to Kamil Páral from comment #40)
> (In reply to Clyde E. Kunkel from comment #39)
> > The issue reported in comment 28 with a live kde 20131029.09 install also is
> > present in an RC2 DVD kde install.
> >
> > Should this be a separate bz?
>
> Yes, please.

Reported as systemd bug #1026860 (at least for now, until we have a more detailed view of the problem from the systemd side).
lvm2-2.02.103-3.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
lvm2-2.02.103-2.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
Issues with this release of LVM2 persist--please see bug #1026860.
This bug is back in Final TC1, because lvm2-2.02.103-2.fc20 was pushed stable later than lvm2-2.02.103-3.fc20 and 'superseded' it. We really need bodhi to stop doing that. As long as releng cleans up the mess ahead of TC2, the bug should go away again.
This issue also affects Fedora 19. Any chance we could get a fix in there as well?
(In reply to Philippe Troin from comment #46)
> This issue also affects Fedora 19.
> Any chance we could get a fix in there as well?

For F19, we'd also need an update for dracut + probably a fix for the issue from bug #1026860 (if the systemd code is the same in F19 in this area). Then a few lvm2 patches... There's already an open bug for F19 to include fixes with respect to LVM on MD handling (985638).

As a workaround until the update is delivered, please set global/use_lvmetad=0 in /etc/lvm/lvm.conf.
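A sketch of applying that workaround non-interactively; this edits a scratch copy for demonstration - on a real system the target is /etc/lvm/lvm.conf (and the early-boot side may additionally need an initramfs rebuild with dracut -f):

```shell
# Demonstrate flipping use_lvmetad off, operating on a stand-in for
# /etc/lvm/lvm.conf rather than the real file.
conf=$(mktemp)
printf 'global {\n    use_lvmetad = 1\n}\n' > "$conf"

# The actual workaround: set use_lvmetad = 0 (GNU sed in-place edit).
sed -i 's/^\([[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*\)1/\10/' "$conf"

result=$(grep 'use_lvmetad' "$conf")
echo "$result"
rm -f "$conf"
```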
(In reply to Peter Rajnoha from comment #47)
> (In reply to Philippe Troin from comment #46)
> > This issue also affects Fedora 19.
> > Any chance we could get a fix in there as well?
>
> For F19, we'd also need an update for dracut + probably a fix for the issue
> from bug #1026860 (if the systemd code is the same in F19 in this area).
> Then a few lvm2 patches...

Well, I just recompiled lvm2 on F19, installed, and my issue was solved.

My issue looked similar to this one: I have 2 VGs, both backed by MD volumes, containing LVs which are all marked as mount-on-boot in /etc/fstab. Upon boot, only the VG containing the root filesystem was activated, and I was getting dropped into a systemd emergency shell after a lengthy timeout.

I haven't regenerated the initrd images; I'm going to try that.

> There's already an open bug for F19 to include fixes with respect to LVM on
> MD handling (985638).

I wasn't aware of that bug, thanks.

> As a workround till the update is delivered, please, set
> global/use_lvmetad=0 in /etc/lvm/lvm.conf.

I already had global/use_lvmetad=0, as it was broken earlier in the F19 lifecycle and I had to disable it to make VGs on MD work a couple of months ago. Having global/use_lvmetad=0 is unfortunately not sufficient to make a 2-VG system, both backed by MD, boot.
Sorry to reply to myself, but I just wanted to add an extra point:

(In reply to Philippe Troin from comment #48)
> Well, I just recompiled lvm2 on F19, installed, and my issue was solved.

I took and recompiled lvm2-2.02.103-3.fc20 on F19, and my problems are solved.
Bug still exists.

Linux 3.14.5-200.fc20.x86_64 #1 SMP Mon Jun 2 14:26:34 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa | grep lvm2
lvm2-2.02.106-1.fc20.x86_64
lvm2-libs-2.02.106-1.fc20.x86_64

Setting global/use_lvmetad=0 in /etc/lvm/lvm.conf works around the bug, but...
rsa: well, multiple other reporters explicitly report that it is fixed, and you provide absolutely no details about your configuration. it would be easier (or, well, at all possible) to see what's going on in your case if you provided...you know...any details at all. What hardware? What layout? Deployed when and how? thanks.
All symptoms are fully described in the first messages of this bug. If needed, I can give full details of my configuration, but that seems unnecessary - the bug also appears in a virtual environment (I did a clean install in VMware to test), created by making an md0 device from two iSCSI volumes.

PS: the bug did not appear before the last (3.14.5-200.fc20.x86_64) update, only after it.
still doesn't sound like the same bug. it'd make things simpler if you could just file a new one, I think.