Bug 1772956

Summary: lvm2-lvmpolld.service hangs boot if RAID array connected
Product: Fedora
Version: 31
Component: lvm2
Reporter: bryanhoop
Assignee: Heinz Mauelshagen <heinzm>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Status: CLOSED EOL
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: Reopened
CC: agk, anprice, bmarzins, bmr, cfeist, heinzm, jbrassow, jonathan, kzak, lvm-team, mcsontos, msnitzer, okozina, prajnoha, prockai, Reaper, romainguinot, zkabelac
Last Closed: 2020-11-24 20:06:28 UTC
Type: Bug

Description bryanhoop 2019-11-15 17:20:19 UTC
Description of problem:
The Fedora 31 live boot media hangs on boot when my sSATA controller is enabled in the BIOS. Fedora 30 live boot media boots fine with the sSATA controller enabled. I have three drives connected via sSATA: two in RAID 1 and one holding a backup of an F30 system (backed up via rsync).

Startup hangs on "A start job is running for Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling"


Version-Release number of selected component (if applicable):


How reproducible:
100% on this machine: Supermicro X10DRi dual-Xeon system

Steps to Reproduce:
1. Enable the sSATA controller and boot Fedora 31 live media (I've tried Workstation, netinstall, and the MATE-Compiz spin): the boot hangs.
2. Disable the sSATA controller and boot Fedora 31: it boots fine.


Additional info: I cannot change tty. I've waited as long as 15 minutes for the start job to complete. I have also error-checked all of my disks using the Fedora 30 live installer. Two of the disks in this system hold two other Fedora installations: an F31 install (its buggy, broken gnome-keyring-pam prompted this reinstall) and a working backup F30 image.

Comment 1 bryanhoop 2019-11-15 19:32:35 UTC
I just did some more testing and found that I CAN enable the controller as long as I disconnect all disks in the Intel RAID array; then I can boot. Other disks connected to sSATA that aren't in an array can stay connected and Fedora will still boot.

Comment 2 bryanhoop 2019-11-16 14:43:42 UTC
I've done some additional testing, and I can boot successfully with the array connected if I add systemd.mask=lvm2-lvmpolld.service to the kernel command line. So lvm2-lvmpolld appears to be the culprit here.
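
For reference, a minimal sketch of applying this workaround on a standard Fedora GRUB setup (grubby and the syntax below are the usual Fedora tooling, but untested on the affected hardware):

  # One-off: at the GRUB menu press 'e', append
  #   systemd.mask=lvm2-lvmpolld.service
  # to the line starting with 'linux', then boot with Ctrl-x.

  # Persistent, for all installed kernels:
  sudo grubby --update-kernel=ALL --args="systemd.mask=lvm2-lvmpolld.service"
  # Undo later with:
  sudo grubby --update-kernel=ALL --remove-args="systemd.mask=lvm2-lvmpolld.service"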

Comment 3 romainguinot 2019-11-26 17:30:25 UTC
Just upgraded to F31 today on two machines.
My work laptop didn't have any issues, apart from some strange Nvidia behaviour which is now fixed.

My personal desktop did experience the same hanging boot. I also have a 2-drive BIOS RAID array in this desktop that is used as a backup for some portions of my home NAS.
Adding systemd.mask=lvm2-lvmpolld.service works around the issue, so your bug report was very timely. Thank you very much.

I rarely reboot the desktop, so I will not add the parameter permanently to the boot config; I'll see whether this gets resolved in the next few weeks.

Comment 4 Heinz Mauelshagen 2019-12-11 15:00:15 UTC
Bryan,

any log messages from lvmpolld in syslog (journalctl -t lvm)?

Also, could you add '-l all' to the command line of lvmpolld in /usr/lib/systemd/system/lvm2-lvmpolld.service, reboot and provide the resulting log?
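
A sketch of one way to do this without editing the packaged unit file directly (a drop-in override survives package updates; the flags below assume the stock unit runs "/usr/sbin/lvmpolld -t 60 -f", as shown later in comment #6):

  sudo systemctl edit lvm2-lvmpolld.service
  # In the editor that opens, add:
  #   [Service]
  #   ExecStart=
  #   ExecStart=/usr/sbin/lvmpolld -t 60 -f -l all
  # (the empty ExecStart= clears the packaged value before replacing it)

  # After rebooting, collect the log:
  journalctl -b -t lvm -t lvmpolld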

Comment 5 bryanhoop 2019-12-18 18:34:51 UTC
I just had a chance to test this. I ran a dnf update, removed the systemd.mask for lvmpolld, and my machine booted without issue. So it appears that this was fixed by some update in the meantime.

Comment 6 Markus 2019-12-28 15:42:16 UTC
(In reply to Heinz Mauelshagen from comment #4)
> Bryan,
> 
> any log messages from lvmpolld in syslog (journalctl -t lvm)?
> 
> Also, could you add '-l all' to the command line of lvmpolld in
> /usr/lib/systemd/system/lvm2-lvmpolld.service, reboot and provide the
> resulting log?

Not sure if it is OK to respond on a closed bug, but I get the same problem even with the latest updates.
I have two Intel fake-RAID mirrors. Same story: anything Fedora 31 related does not boot.

I added -l all to /usr/lib/systemd/system/lvm2-lvmpolld.service (ExecStart=/usr/sbin/lvmpolld -t 60 -f -l all), but compared to previous boots it shows the same things (logs are in a German locale; "Datei oder Verzeichnis nicht gefunden" means "No such file or directory"):
-- Reboot --
Dez 28 16:25:36 lvm[874]:   WARNING: Scan ignoring device 8:33 with no paths.
Dez 28 16:25:36 lvm[874]:   Device open /dev/sdc1 8:33 failed errno 2
Dez 28 16:25:36 lvm[874]:   Device open /dev/sdc1 8:33 failed errno 2
Dez 28 16:25:36 lvm[874]:   Path /dev/sdd2 no longer valid for device(8,50)
Dez 28 16:25:36 lvm[874]:   /dev/sdd2: stat failed: Datei oder Verzeichnis nicht gefunden
Dez 28 16:25:36 lvm[874]:   Path /dev/sdd1 no longer valid for device(8,49)
Dez 28 16:25:36 lvm[874]:   /dev/sdd1: stat failed: Datei oder Verzeichnis nicht gefunden
Dez 28 16:25:36 lvm[874]:   Path /dev/sdc2 no longer valid for device(8,34)
Dez 28 16:25:36 lvm[874]:   /dev/sdc2: stat failed: Datei oder Verzeichnis nicht gefunden
Dez 28 16:25:36 lvm[874]:   WARNING: Failed to get udev device handler for device /dev/sdc1.
-- Reboot --


Adding systemd.mask=lvm2-lvmpolld.service to the kernel line in grub "solved" the issue for me.
If you need anything else from me, I am happy to help as best I can.
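
The same workaround can also be made persistent without kernel-argument edits; "systemctl mask" points the unit at /dev/null so it can never start (a sketch, untested here; the socket unit name assumes Fedora's socket-activated lvmpolld):

  sudo systemctl mask lvm2-lvmpolld.service lvm2-lvmpolld.socket
  # Undo with:
  sudo systemctl unmask lvm2-lvmpolld.service lvm2-lvmpolld.socket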

Comment 7 bryanhoop 2019-12-29 13:26:48 UTC
I wonder what would happen if you now remove the lvm2-lvmpolld.service mask and try to reboot? Perhaps after a successful boot (when the service is masked) the array is somehow configured or flagged correctly and subsequent boots with the service enabled will work (as was my case).

Comment 8 Markus 2019-12-29 13:32:59 UTC
(In reply to bryanhoop from comment #7)
> I wonder what would happen if you now remove the lvm2-lvmpolld.service mask
> and try to reboot? Perhaps after a successful boot (when the service is
> masked) the array is somehow configured or flagged correctly and subsequent
> boots with the service enabled will work (as was my case).

I previously removed the mask in grub to get the log when the boot fails.
As soon as it is gone, the boot fails again.
In my case it seems it does not matter whether the previous boot was successful.

Comment 9 romainguinot 2019-12-29 17:25:45 UTC
Recently rebooted the desktop.
Fully up to date F31.

Issue still there as well for me.
Doesn't boot unless I add that systemd mask by editing the kernel arguments.

Comment 10 romainguinot 2020-02-16 09:49:10 UTC
Had to reboot the desktop today (Feb 16th), and the issue is still present with F31 fully up to date.

Not really a big deal given how rarely I reboot the desktop, but if there is any way to diagnose, provide appropriate log files, or perhaps attempt to start/stop the service with debug arguments whilst the machine is up (without necessarily having to reboot), let me know.

If that's useful: the motherboard in this desktop is rather old, an ASUS Rampage III Extreme. I am using a pair of Samsung SSD 850s (MZ-7KE512BW) in RAID 1 as one of the backup destinations for some parts of my NAS box.

Comment 11 Heinz Mauelshagen 2020-04-06 15:06:00 UTC
Looks like an ordering or udev-settlement issue: the -ENOENT (errno 2) failures in comment #6 show that the /dev/sdc* entries don't exist (yet) at the time of LVM's device scan.
If so, this is a systemd sequencing/dependency flaw, which you mitigated with the mask.

You may get more insight by analysing the journal for systemd/udev activity with "journalctl -t systemd", looking for "Started udev Wait for Complete Device Initialization" relative to your /dev/sdc* devices, whose initialization may be out of order with LVM's scanning.
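
A sketch of that comparison, assuming a persistent journal (the unit behind that message is systemd-udev-settle.service):

  # systemd's own messages for the current boot; check whether
  # "Started udev Wait for Complete Device Initialization" appears
  # before or after the LVM units start:
  journalctl -b -t systemd | grep -iE "udev wait|lvm"
  # When the RAID member devices actually appeared (kernel messages):
  journalctl -b -k | grep -E "sd[cd]"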

Comment 12 Ondrej Kozina 2020-04-06 15:40:27 UTC
(In reply to Reaper from comment #6)
> I added -l all to /usr/lib/systemd/system/lvm2-lvmpolld.service
> (ExecStart=/usr/sbin/lvmpolld -t 60 -f -l all), but compared to previous
> boots it shows the same things:
> [...]
> Adding systemd.mask=lvm2-lvmpolld.service to the kernel line in grub
> "solved" the issue for me.

Hi,

when you start lvmpolld with all debug outputs (-l all), what's the actual output of "journalctl -t lvmpolld -t lvm2"?

I don't see any lvmpolld related messages in your original reply. Seems like lvmpolld does not even start (no lvm2 command contacted the daemon).
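
One quick way to confirm that from a booted system (a sketch; lvmpolld is socket-activated on Fedora, and the socket path below is the LVM2 default but may vary by build):

  systemctl status lvm2-lvmpolld.service lvm2-lvmpolld.socket
  pgrep -a lvmpolld                 # is the daemon process running at all?
  ls -l /run/lvm/lvmpolld.socket    # does its activation socket exist?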

Comment 13 Markus 2020-04-10 18:20:03 UTC
(In reply to Ondrej Kozina from comment #12)
> [...]
> when you start lvmpolld with all debug outputs (-l all), what's the actual
> output of "journalctl -t lvmpolld -t lvm2"?
> 
> I don't see any lvmpolld related messages in your original reply. Seems like
> lvmpolld does not even start (no lvm2 command contacted the daemon).

Hi and sorry for the long wait!

I enabled as much debug log as I could for lvm and tried it again.
"journalctl -t lvmpolld -t lvm2" shows nothing, so you may be correct and it does not start at all.

I dumped the complete boot process for a good and bad boot into a file: https://marschos.ddns.net:9990/nextcloud/s/CGb7pzQiiaqDDRs
I can't figure out what's going wrong in there, but maybe you can see more in it.
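
For anyone wanting to capture the same kind of per-boot dumps, a sketch (requires persistent journald storage, e.g. Storage=persistent in /etc/systemd/journald.conf):

  journalctl -b  0 > good-boot.log   # the current (successful) boot
  journalctl -b -1 > bad-boot.log    # the previous (failed) boot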

If you need anything else please just ask.

Comment 14 Ben Cotton 2020-11-03 16:54:46 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 15 Ben Cotton 2020-11-24 20:06:28 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.