Bug 1772956 - lvm2-lvmpolld.service hangs boot if RAID array connected
Summary: lvm2-lvmpolld.service hangs boot if RAID array connected
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Heinz Mauelshagen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-15 17:20 UTC by bryanhoop@gmail.com
Modified: 2020-04-10 18:20 UTC
CC: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-18 18:34:51 UTC
Type: Bug



Description bryanhoop@gmail.com 2019-11-15 17:20:19 UTC
Description of problem:
The Fedora 31 live boot media hangs on boot when my sSATA controller is enabled in the BIOS. Fedora 30 live boot media boots OK with sSATA enabled. I have three drives connected via sSATA: two in RAID 1, and one holding a backup of an F30 system (backed up via rsync).

Startup hangs on "A start job is running for Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling"


Version-Release number of selected component (if applicable):


How reproducible:
100% on this machine: Supermicro X10DRi dual-Xeon system

Steps to Reproduce:
1. Enable the sSATA controller and boot Fedora 31 live media (I've tried Workstation, netinstall, and the MATE-Compiz spin): boot hangs.
2. Disable the sSATA controller and boot Fedora 31: boots fine.


Additional info: I cannot switch to another tty. I've waited as long as 15 minutes for the start job to complete. I have also error-checked all of my disks using the Fedora 30 live installer. Two of the disks in this system hold two other Fedora installations: an F31 install (whose buggy and broken gnome-keyring-pam prompted this reinstall) and a working backup F30 image.

Comment 1 bryanhoop@gmail.com 2019-11-15 19:32:35 UTC
I just did some more testing and found that I CAN enable my controller, as long as I disconnect all disks that belong to an Intel RAID array; then I can boot. Other disks connected to sSATA that aren't in an array can stay connected and Fedora will boot.

Comment 2 bryanhoop@gmail.com 2019-11-16 14:43:42 UTC
I've done some additional testing: I can boot successfully with the array connected if I add systemd.mask=lvm2-lvmpolld.service to the kernel command line. So lvm2-lvmpolld appears to be the culprit here.
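For anyone hitting the same hang: editing the kernel line at the GRUB prompt only lasts for one boot. A sketch of making the workaround persistent on Fedora with grubby (assuming a standard GRUB setup; remove it again once the bug is fixed):

```shell
# Append the mask to the kernel command line of all installed kernels.
sudo grubby --update-kernel=ALL --args="systemd.mask=lvm2-lvmpolld.service"

# To undo the workaround later:
sudo grubby --update-kernel=ALL --remove-args="systemd.mask=lvm2-lvmpolld.service"
```

Masking only disables the polling daemon; it does not remove the lvm2 scan of block devices at boot.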

Comment 3 romainguinot 2019-11-26 17:30:25 UTC
Just upgraded to F31 today on 2 machines. 
My work laptop didn't have any issues except for some strange Nvidia behaviour which is now fixed. 

My personal desktop did experience the same hanging boot. I also have a 2-drive BIOS RAID array in this desktop that is used as a backup for some portions of my home NAS.
Adding systemd.mask=lvm2-lvmpolld.service lets me work around the issue, so your bug report was very timely. Thank you very much.

I rarely reboot the desktop, so I will not add the mask permanently to the boot config, and will see if this gets resolved in the next few weeks.

Comment 4 Heinz Mauelshagen 2019-12-11 15:00:15 UTC
Bryan,

any log messages from lvmpolld in syslog (journalctl -t lvm)?

Also, could you add '-l all' to the command line of lvmpolld in /usr/lib/systemd/system/lvm2-lvmpolld.service, reboot and provide the resulting log?
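Editing the packaged unit under /usr/lib directly will be overwritten by the next lvm2 update; a drop-in override survives updates. A minimal sketch, assuming the stock unit's ExecStart line (the `debug.conf` filename is arbitrary):

```shell
# Create a systemd drop-in that replaces ExecStart with the debug variant.
sudo mkdir -p /etc/systemd/system/lvm2-lvmpolld.service.d
sudo tee /etc/systemd/system/lvm2-lvmpolld.service.d/debug.conf <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before redefining it.
ExecStart=
ExecStart=/usr/sbin/lvmpolld -t 60 -f -l all
EOF
sudo systemctl daemon-reload
```

`systemctl edit lvm2-lvmpolld.service` creates the same kind of override interactively.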

Comment 5 bryanhoop@gmail.com 2019-12-18 18:34:51 UTC
I just had a chance to test this. I ran a dnf update, removed the systemd.mask for lvmpolld, and my machine booted without issue. So it appears that this was fixed by some update in the meantime.

Comment 6 Reaper 2019-12-28 15:42:16 UTC
(In reply to Heinz Mauelshagen from comment #4)
> Bryan,
> 
> any log messages from lvmpolld in syslog (journalctl -t lvm)?
> 
> Also, could you add '-l all' to the command line of lvmpolld in
> /usr/lib/systemd/system/lvm2-lvmpolld.service, reboot and provide the
> resulting log?

Not sure if it is OK to respond on a closed bug, but I get the same problem even with the latest updates.
I have two Intel fake RAID mirrors. Same story: nothing Fedora 31 related boots.

I added -l all to /usr/lib/systemd/system/lvm2-lvmpolld.service (ExecStart=/usr/sbin/lvmpolld -t 60 -f -l all), but compared to previous boots it shows the same things:
-- Reboot --
Dec 28 16:25:36 lvm[874]:   WARNING: Scan ignoring device 8:33 with no paths.
Dec 28 16:25:36 lvm[874]:   Device open /dev/sdc1 8:33 failed errno 2
Dec 28 16:25:36 lvm[874]:   Device open /dev/sdc1 8:33 failed errno 2
Dec 28 16:25:36 lvm[874]:   Path /dev/sdd2 no longer valid for device(8,50)
Dec 28 16:25:36 lvm[874]:   /dev/sdd2: stat failed: No such file or directory
Dec 28 16:25:36 lvm[874]:   Path /dev/sdd1 no longer valid for device(8,49)
Dec 28 16:25:36 lvm[874]:   /dev/sdd1: stat failed: No such file or directory
Dec 28 16:25:36 lvm[874]:   Path /dev/sdc2 no longer valid for device(8,34)
Dec 28 16:25:36 lvm[874]:   /dev/sdc2: stat failed: No such file or directory
Dec 28 16:25:36 lvm[874]:   WARNING: Failed to get udev device handler for device /dev/sdc1.
-- Reboot --
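For readers decoding the log above: "failed errno 2" is ENOENT, and the original German "Datei oder Verzeichnis nicht gefunden" is simply the localized form of the same strerror text, i.e. the device nodes were missing when lvm looked. The same error class can be reproduced against any nonexistent path (the path below is a deliberately made-up example):

```shell
# stat(2) on a path with no directory entry fails with errno 2 (ENOENT);
# coreutils stat prints the corresponding strerror message.
stat /dev/does-not-exist 2>&1 || true
```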


Adding "systemd.mask=lvm2-lvmpolld.service" to the kernel line in GRUB "solved" the issue for me.
If you need anything else from me I am happy to help as best I can.

Comment 7 bryanhoop@gmail.com 2019-12-29 13:26:48 UTC
I wonder what would happen if you now removed the lvm2-lvmpolld.service mask and tried to reboot? Perhaps after a successful boot (with the service masked) the array is somehow configured or flagged correctly, and subsequent boots with the service enabled will work (as was my case).

Comment 8 Reaper 2019-12-29 13:32:59 UTC
(In reply to bryanhoop@gmail.com from comment #7)
> I wonder what would happen if you now remove the lvm2-lvmpolld.service mask
> and try to reboot? Perhaps after a successful boot (when the service is
> masked) the array is somehow configured or flagged correctly and subsequent
> boots with the service enabled will work (as was my case).

I previously removed the mask in GRUB to get the log from a failing boot.
As soon as it is gone, the boot fails again.
In my case it seems it does not matter whether the previous boot was successful.

Comment 9 romainguinot 2019-12-29 17:25:45 UTC
Recently rebooted the desktop.
Fully up to date F31.

The issue is still there for me as well.
The machine doesn't boot unless I add that systemd mask by editing the kernel arguments.

Comment 10 romainguinot 2020-02-16 09:49:10 UTC
Had to reboot the desktop today, and the issue is still present, with F31 fully up to date as of today, Feb 16th.

Not really a big deal given how rarely I reboot the desktop, but if there is any way to diagnose, provide appropriate log files, or perhaps attempt to start/stop the service with debug arguments while the machine is up (without necessarily having to reboot), let me know.

If it's useful, the motherboard in this desktop is rather old: an ASUS Rampage III Extreme. I am using a pair of Samsung SSD 850s (MZ-7KE512BW) in RAID 1 as one of the backup destinations for some parts of my NAS box.

Comment 11 Heinz Mauelshagen 2020-04-06 15:06:00 UTC
This looks like an ordering or udev-settlement issue: the -ENOENT failures in comment #6 show that the /dev/sdc* entries don't exist (yet) at the time of lvm's device scan.
If so, this is a systemd sequencing/dependency flaw, which you mitigated using the mask.

You may get more insight by analysing the journal relative to systemd udev activity with "journalctl -t systemd", looking for "Started udev Wait for Complete Device Initialization" relative to your /dev/sdc* devices, which may be out of order with lvm's scanning.
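A sketch of that comparison for the current boot (the grep patterns are approximate and may need adjusting to the exact wording in your journal):

```shell
# When did udev report device initialization as complete?
journalctl -b -t systemd | grep -i "Wait for Complete Device Initialization"

# When did lvm first touch the affected devices?
journalctl -b -t lvm | grep -E "sd[cd]"
```

If the lvm timestamps precede the udev settlement message, that would support the sequencing theory.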

Comment 12 Ondrej Kozina 2020-04-06 15:40:27 UTC
(In reply to Reaper from comment #6)
> [log output and systemd.mask workaround, quoted in full in comment #6 above]

Hi,

when you start lvmpolld with all debug outputs (-l all), what's the actual output of "journalctl -t lvmpolld -t lvm2"?

I don't see any lvmpolld-related messages in your original reply. It seems lvmpolld does not even start (no lvm2 command has contacted the daemon).

Comment 13 Reaper 2020-04-10 18:20:03 UTC
(In reply to Ondrej Kozina from comment #12)
> when you start lvmpolld with all debug outputs (-l all), what's the actual
> output of "journalctl -t lvmpolld -t lvm2"?
> 
> I don't see any lvmpolld related messages in your original reply. Seems like
> lvmpolld does not even start (no lvm2 command contacted the daemon).

Hi and sorry for the long wait!

I enabled as much debug log as I could for lvm and tried it again.
"journalctl -t lvmpolld -t lvm2" shows nothing, so you may be correct that it does not start at all.

I dumped the complete boot process for a good and bad boot into a file: https://marschos.ddns.net:9990/nextcloud/s/CGb7pzQiiaqDDRs
I can't figure out what's going wrong in there, but maybe you can see more in it.

If you need anything else please just ask.

