Description of problem:

The boot of the host takes a very long time - it can take about 40 minutes.
Sometimes the hosts end up in the rescue shell.

Version-Release number of selected component (if applicable):
RHVH-4.0-20161018.0-RHVH-x86_64-dvd1.iso

How reproducible:
100%

Steps to Reproduce:
This issue is caused by a high number of LUNs connected to the environment
for direct pass-through to the VMs. It should be reproducible with the
following configuration:

- 290 physical volumes (LUNs)
- 170 VGs on top of them
- each LUN has four paths to the storage

As soon as a setup like this is established, restart the host.

Actual results:
The boot takes ages, and systemd-analyze blame reports the following services
consuming the time:

46min 33.479s lvm2-pvscan@132:1728.service
46min 33.361s lvm2-pvscan@132:1696.service
46min 33.300s lvm2-pvscan@134:1632.service
46min 33.172s lvm2-pvscan@134:1648.service
46min 33.082s lvm2-pvscan@134:1600.service
46min 33.008s lvm2-pvscan@133:1552.service

Expected results:
The boot is fast. I suppose the best option is to make the system ignore the
devices.

Additional info:

We tried the following:

- implemented
  https://gerrit.ovirt.org/#/c/66893/1/static/etc/lvm/lvmlocal.conf from
  Bug 1374545
- added an autoactivation list to lvm.conf:
  auto_activation_volume_list = [ "rhvh" ]
- regenerated the initramfs

This may or may not overlap with Bug 1374545. This is to make sure that this
configuration is tested from the installation stage to the activation stage
in the manager.
Sounds related to bug 1374545 indeed. Nir, can you confirm?
I wonder if we can do something for these kinds of bugs (like this and bug 1374545) by leveraging the lvm.* kernel arguments?
(In reply to Roman Hodain from comment #0)
> 46min 33.479s lvm2-pvscan@132:1728.service
> 46min 33.361s lvm2-pvscan@132:1696.service
> 46min 33.300s lvm2-pvscan@134:1632.service
> 46min 33.172s lvm2-pvscan@134:1648.service
> 46min 33.082s lvm2-pvscan@134:1600.service
> 46min 33.008s lvm2-pvscan@133:1552.service

I think these are related to lvmetad. Did you try to disable lvmetad? It
should be possible in the same lvmlocal.conf file. See bug 1374545 for
instructions.

> Expected results:
> The boot is fast. I suppose the best option is to make the system ignore
> the devices.
>
> Additional info:
>
> We tried the following:

The boot is slow with these changes?

> implemented
> https://gerrit.ovirt.org/#/c/66893/1/static/etc/lvm/lvmlocal.conf from Bug
> 1374545

This helps by not scanning ovirt lvs, but the system is still scanning all
the luns.

I think the only solution is to have an lvm filter (not global_filter)
whitelisting only the devices used by rhvh. Any other device should not be
scanned by lvm on the host. Vdsm runs lvm commands with a special command
line filter, so lvm commands run by vdsm will scan the required luns.

> added an autoactivation list to lvm.conf:
> auto_activation_volume_list = [ "rhvh" ]
>
> regenerated the initramfs
>
> This may or may not overlap with Bug 1374545.

Bug 1374545 is about guest lvs inside ovirt lvs; the issue here is guest lvs
on luns. We don't manage luns; we don't create them on the storage server or
map them for a specific host, we only discover them when they are added, so
we cannot set up filters for luns with the current system design.
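For illustration, the kind of whitelist filter described above would look
roughly like this in lvm.conf (the device path is a hypothetical placeholder;
each host must whitelist whatever actually backs its own VGs):

devices {
    # Accept only the PV backing the host's own (rhvh) VG, reject everything
    # else. /dev/sda2 is an example; use the real PV device on each host.
    filter = [ "a|^/dev/sda2$|", "r|.*|" ]
}

After changing the filter, the initramfs typically needs to be regenerated
(e.g. with dracut -f) so the same filter also applies during early boot.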
Zdenek, it seems that the solution you suggested in bug 1374545 is also what is needed for this bug: having a filter whitelisting the devices needed by the host. Can you confirm, or suggest a better configuration?
(In reply to Roman Hodain from comment #0)
> Actual results:
> The boot takes ages, and systemd-analyze blame reports the following
> services consuming the time:
>
> 46min 33.479s lvm2-pvscan@132:1728.service
> 46min 33.361s lvm2-pvscan@132:1696.service
> 46min 33.300s lvm2-pvscan@134:1632.service
> 46min 33.172s lvm2-pvscan@134:1648.service
> 46min 33.082s lvm2-pvscan@134:1600.service
> 46min 33.008s lvm2-pvscan@133:1552.service

Can you add -vvvv to /lib/systemd/system/lvm2-pvscan@.service like this:

ExecStart=/usr/sbin/lvm pvscan -vvvv --cache --activate ay %i

Then reboot, collect debug logs with

lvmdump -u -l -s

when systemd is finally up and running (after that long delay), and attach
the logs here (the tgz file).
I'd probably recommend disabling the use of 'lvmetad' and avoiding
autoactivation.

It looks like the system is full of duplicate devices (multipath paths), but
it also looks like these devices are scanned 'before' multipath is activated -
so it's quite tricky to recognize this state. There would be a way to switch
into udevDB mode for recognizing 'multipath' component devices - but I'd not
recommend using this setting ATM.

If it's feasible, setting 'white-list' filtering to accept only a very
limited set of devices as possible PVs is the best approach to minimizing
delays here.
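As a sketch of this recommendation, assuming the RHEL 7 lvm2 option names
(use_lvmetad and auto_activation_volume_list), the settings could go into
/etc/lvm/lvmlocal.conf roughly like this:

global {
    # Do not use the lvmetad caching daemon; lvm commands scan devices
    # directly instead of being driven by pvscan --cache at boot.
    use_lvmetad = 0
}

activation {
    # Autoactivate only the host's own VG (the value tried in comment #0).
    auto_activation_volume_list = [ "rhvh" ]
}

If lvmetad is disabled this way, the lvm2-lvmetad socket and service would
typically be disabled as well, and the initramfs regenerated so early boot
sees the same settings.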
Is this resolved with BZ #1342786?
(In reply to Yaniv Dary from comment #13)
> Is this resolved with BZ #1342786?

The two bugs are on different branches. The Target Milestone of bug 1342786
is set to 3.6.11, and no 3.6.11 build is available for QE testing, so I think
the original bug still exists.
Moving out all non-blockers/exceptions.
This should be solved by an LVM filter accepting only the devices needed by the hypervisor, and rejecting any other device (e.g. direct LUNs, LUNs used for oVirt storage domains).
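To identify which devices the hypervisor itself needs (i.e. what the
whitelist should accept), standard lvm2/util-linux commands are enough; a
rough check:

# List every PV and the VG it belongs to; the whitelist should accept only
# the PVs backing the host's own VGs (e.g. the rhvh/root VG).
pvs -o pv_name,vg_name

# Cross-check which block devices back the mounted filesystems.
lsblk -o NAME,TYPE,MOUNTPOINT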
The first step is reproducing this.

- We need a host with FC connection to FC storage
- Best have ports on the host, and 4 paths to storage
- Create and map 300 (small) LUNs to this host
- On each LUN, create a vg and 4 lvs. This can be created using a script
  (the real lvs are created within a guest)

This should be tested on both RHEL and RHEV-H.

When we have a system reproducing this issue we can continue.

Gil, can we schedule this?
(In reply to Nir Soffer from comment #19)
> The first step is reproducing this.
>
> - We need a host with FC connection to FC storage
> - Best have ports on the host, and 4 paths to storage
> - Create and map 300 (small) LUNs to this host
> - On each LUN, create a vg and 4 lvs. This can be created using a script
>   (the real lvs are created within a guest)
>
> This should be tested on both RHEL and RHEV-H.
>
> When we have a system reproducing this issue we can continue.
>
> Gil, can we schedule this?

Raz, will you be able to handle this request?
Sure, Elad will take care of Nir's request
Provided offline
I tested this setup but could not reproduce the issue.

The setup tested:

# grep 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)

# rpm -q vdsm
vdsm-4.19.11-1.el7ev.x86_64

Number of LUNs: 300
Number of VGs: 300
Number of LVs: 1200

Each LUN has 4 paths to storage:

3514f0c5a516007fb dm-87 XtremIO ,XtremApp
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 0:0:1:177 sdaif 129:816 active ready running
  |- 5:0:1:177 sdake 132:864 active ready running
  |- 5:0:0:177 sdmo  70:256  active ready running
  `- 0:0:0:177 sdlr  68:400  active ready running

The test included:
1. Wait until the host is up in engine
2. Reboot
3. Wait until the host is up again

I measured the time from starting the boot until vdsm was started, and then
the time until vdsm initialization was completed (getCapabilities succeeded -
vdsm should be up shortly after that).

Tested 2 configurations:
1. default lvm.conf
2. lvm.conf with devices/filter:

devices {
    filter = [ "a|^/dev/sdate2$|", "r|.*|"]
}

This filter prevents lvm from scanning the lvs on the luns.

# Default configuration

Events:
13:57:15 boot started
14:02:15 vdsm started
14:10:03 vdsm init completed

Active lvs: 0 (deactivated by vdsm)

$ systemd-analyze time
Startup finished in 1.228s (kernel) + 29.110s (initrd) + 4min 50.648s (userspace) = 5min 20.986s

$ head -20 boot-05/systemd-analyze-blame.out
1min 32.944s vdsm-network.service
 1min 8.678s network.service
     18.967s kdump.service
     15.337s lvm2-activation-early.service
     13.840s systemd-udev-settle.service
     11.763s dev-mapper-VolGroup01\x2droot.device
      7.281s dmraid-activation.service
      6.360s lvm2-activation-net.service
      5.778s lvm2-activation.service
      5.171s lvm2-monitor.service
      4.731s libvirtd.service
      3.402s vdsmd.service
      1.886s postfix.service
      1.720s tuned.service
      1.261s systemd-udevd.service
      1.037s vdsm-network-init.service
       778ms ovirt-imageio-daemon.service
       677ms systemd-tmpfiles-setup-dev.service
       655ms proc-fs-nfsd.mount
       648ms rhel-readonly.service

$ grep lvm2-pvscan boot-05/systemd-analyze-blame.out | head -10
543ms lvm2-pvscan@253:105.service
540ms lvm2-pvscan@253:109.service
539ms lvm2-pvscan@253:108.service
539ms lvm2-pvscan@253:104.service
538ms lvm2-pvscan@253:106.service
537ms lvm2-pvscan@253:98.service
535ms lvm2-pvscan@253:103.service
524ms lvm2-pvscan@253:102.service
523ms lvm2-pvscan@253:100.service
521ms lvm2-pvscan@253:101.service

$ grep multipathd boot-05/journal.log | wc -l
8307
$ grep 'add path (uevent)' boot-05/journal.log | wc -l
2301
$ grep "devmap not registered, can't remove" boot-05/journal.log | wc -l
1200
$ grep 'remove map (uevent)' boot-05/journal.log | wc -l
2400

# With lvm filter

Events:
14:30:53 boot started
14:36:17 vdsm started
14:36:38 vdsm init completed

Active lvs: 0 (not activated by lvm because of the filter)

$ systemd-analyze time
Startup finished in 1.227s (kernel) + 29.645s (initrd) + 4min 57.321s (userspace) = 5min 28.194s

$ head -20 boot-06-filter/systemd-analyze-blame.out
1min 25.276s vdsm-network.service
 1min 7.960s network.service
     13.340s systemd-udev-settle.service
     11.186s dev-mapper-VolGroup01\x2droot.device
     11.178s kdump.service
      7.402s dmraid-activation.service
      3.043s vdsmd.service
      2.640s libvirtd.service
      1.990s tuned.service
      1.907s postfix.service
      1.551s systemd-udevd.service
       982ms vdsm-network-init.service
       744ms rhel-readonly.service
       700ms ovirt-imageio-daemon.service
       662ms proc-fs-nfsd.mount
       600ms systemd-tmpfiles-setup-dev.service
       546ms systemd-modules-load.service
       486ms lvm2-activation-net.service
       448ms rc-local.service
       430ms lvm2-activation-early.service

$ grep lvm2-pvscan boot-06-filter/systemd-analyze-blame.out | head -10
297ms lvm2-pvscan@253:69.service
297ms lvm2-pvscan@253:67.service
294ms lvm2-pvscan@253:68.service
294ms lvm2-pvscan@253:70.service
292ms lvm2-pvscan@253:72.service
291ms lvm2-pvscan@253:75.service
291ms lvm2-pvscan@253:73.service
291ms lvm2-pvscan@253:74.service
289ms lvm2-pvscan@253:77.service
289ms lvm2-pvscan@253:76.service

$ grep multipathd boot-06-filter/journal.log | wc -l
4707
$ grep 'add path (uevent)' boot-06-filter/journal.log | wc -l
2301
$ grep "devmap not registered, can't remove" boot-06-filter/journal.log | wc -l
0
$ grep 'remove map (uevent)' boot-06-filter/journal.log | wc -l
0

Summary:

1. Cannot reproduce; boot took about 13 minutes, nothing in the range of
   40 minutes.
2. The lvm filter does not change system boot time significantly, but has a
   huge effect on vdsm initialization time:
   without filter: 9m48s
   with filter: 16s (36X faster)
3. multipath has some issues without the lvm filter: lots of "devmap not
   registered, can't remove" and 'remove map (uevent)' events that we don't
   see when using the lvm filter.

Maybe the problem is related to RHEV-H; I'm waiting for a RHEV-H setup with
the same LUNs for testing. I also want to test a setup with more LUNs (600?)
to see how the system scales up.

Roman, do you have an idea what is missing in my setup to reproduce the issue?
Created attachment 1276414 [details] Test results on RHEL
Created attachment 1281763 [details] Test results on RHVH
Test results on RHVH - same setup as the RHEL tests (see comment 23).

One difference compared to the RHEL tests: the LUNs are 1G, while in the RHEL
test we used 5G LUNs. Same number of LVs per LUN (smaller).

# Without filter (boot 3)

Events:
17:24:50 boot started
17:27:59 vdsm started
17:34:13 vdsm ready

$ cat systemd-analyze-time.out
Startup finished in 3.733s (kernel) + 17.437s (initrd) + 16min 13.033s (userspace) = 16min 34.204s

$ head systemd-analyze-blame.out
13min 59.616s kdump.service
      51.696s iscsi.service
      32.182s vdsm-network.service
      30.503s NetworkManager-wait-online.service
      28.862s lvm2-activation-net.service
      10.017s network.service
       9.343s glusterd.service
       3.909s vdsmd.service
       3.664s postfix.service
       3.587s systemd-udev-settle.service

$ grep lvm2-pvscan systemd-analyze-blame.out | head
167ms lvm2-pvscan@253:36.service
164ms lvm2-pvscan@253:22.service
143ms lvm2-pvscan@253:171.service
141ms lvm2-pvscan@253:90.service
141ms lvm2-pvscan@253:176.service
135ms lvm2-pvscan@253:43.service
133ms lvm2-pvscan@253:159.service
131ms lvm2-pvscan@253:54.service
130ms lvm2-pvscan@253:161.service
129ms lvm2-pvscan@253:141.service

# With filter

Events:
20:31:46 boot started
20:34:20 vdsm started
20:34:40 vdsm ready

$ cat systemd-analyze-time.out
Startup finished in 3.735s (kernel) + 17.336s (initrd) + 7min 56.826s (userspace) = 8min 17.898s

$ head systemd-analyze-blame.out
6min 13.408s kdump.service
     51.821s iscsi.service
     30.389s NetworkManager-wait-online.service
     26.926s vdsm-network.service
      9.517s network.service
      8.955s glusterd.service
      3.927s vdsmd.service
      3.600s systemd-udev-settle.service
      3.479s postfix.service
      2.707s dev-mapper-rhvh_purple\x2d\x2dvds2\x2drhvh\x2d\x2d4.1\x2d\x2d0.20170506.0\x2b1.device

$ grep lvm2-pvscan systemd-analyze-blame.out | head
166ms lvm2-pvscan@253:113.service
160ms lvm2-pvscan@253:111.service
151ms lvm2-pvscan@253:162.service
146ms lvm2-pvscan@253:182.service
146ms lvm2-pvscan@253:112.service
145ms lvm2-pvscan@253:67.service
143ms lvm2-pvscan@253:72.service
141ms lvm2-pvscan@253:23.service
138ms lvm2-pvscan@253:22.service
132ms lvm2-pvscan@253:31.service

Similar results to RHEL, slightly better. So we cannot reproduce the issue
reported in this bug.

Adding the lvm filter shortened vdsm ready time from 6:14 to 0:20 (19X
faster), and overall boot time from 9:37 to 2:56 (3X faster).
Created attachment 1283163 [details] Test results on RHV-H with 600 luns and 4800 lvs
To test how the system scales, I performed another test on the RHV-H setup.
This time I had 600 LUNs, and to make the no-filter case more interesting,
8 LVs per LUN, totalling 4800 active LVs when booting without a filter.

# Without filter

$ cat 01-no-filter/events
19:53:38 boot started
19:59:27 vdsm started
20:31:50 vdsm ready

$ cat 01-no-filter/systemd-analyze-time.out
Startup finished in 3.729s (kernel) + 17.430s (initrd) + 48min 7.574s (userspace) = 48min 28.734s

$ head 01-no-filter/systemd-analyze-blame.out
44min 17.550s kdump.service
 1min 43.876s iscsi.service
 1min 30.193s vdsm-network.service
 1min 15.326s lvm2-activation-net.service
      30.396s NetworkManager-wait-online.service
       8.976s network.service
       8.428s glusterd.service
       4.585s vdsmd.service
       3.433s postfix.service
       3.054s dev-mapper-rhvh_purple\x2d\x2dvds2\x2drhvh\x2d\x2d4.1\x2d\x2d0.20170506.0\x2b1.device

$ grep lvm2-pvscan 01-no-filter/systemd-analyze-blame.out | head
232ms lvm2-pvscan@253:12.service
167ms lvm2-pvscan@253:147.service
155ms lvm2-pvscan@253:236.service
154ms lvm2-pvscan@253:310.service
153ms lvm2-pvscan@253:309.service
149ms lvm2-pvscan@253:249.service
145ms lvm2-pvscan@253:318.service
141ms lvm2-pvscan@253:134.service
140ms lvm2-pvscan@253:265.service
139ms lvm2-pvscan@253:17.service

# With filter

$ cat 03-with-filter/events
21:22:43 boot started
21:26:32 vdsm started
21:27:19 vdsm ready

$ cat 03-with-filter/systemd-analyze-time.out
Startup finished in 3.734s (kernel) + 17.553s (initrd) + 14min 28.278s (userspace) = 14min 49.566s

$ head 03-with-filter/systemd-analyze-blame.out
11min 53.197s kdump.service
 1min 43.900s iscsi.service
      49.409s vdsm-network.service
      30.412s NetworkManager-wait-online.service
       8.990s network.service
       8.310s glusterd.service
       4.068s vdsmd.service
       3.587s systemd-udev-settle.service
       3.567s dev-mapper-rhvh_purple\x2d\x2dvds2\x2drhvh\x2d\x2d4.1\x2d\x2d0.20170506.0\x2b1.device
       3.476s postfix.service

$ grep lvm2-pvscan 03-with-filter/systemd-analyze-blame.out | head
154ms lvm2-pvscan@253:271.service
151ms lvm2-pvscan@253:103.service
149ms lvm2-pvscan@253:19.service
148ms lvm2-pvscan@253:197.service
148ms lvm2-pvscan@253:229.service
145ms lvm2-pvscan@253:76.service
144ms lvm2-pvscan@253:192.service
141ms lvm2-pvscan@253:63.service
141ms lvm2-pvscan@253:129.service
140ms lvm2-pvscan@253:207.service

We see the same behaviour as in the previous tests, but this time the system
boot time shows a big improvement with the lvm filter (3:52 with filter, 5:49
without filter). Vdsm startup without the lvm filter is 30 minutes; with the
lvm filter, 47 seconds (39X faster).

Note that vdsm startup with the 600 LUNs/4800 LVs setup is about 2X slower
compared with the 300 LUNs/1200 LVs setup (see comment 26). I did not check
why, but I assume this is the time needed to read the lvm metadata into
vdsm's lvm cache.
Nir, do we need a bug on kdump for the slow boot? (Note that in a real-life
case we might even fence the host because it starts so slowly!)

Why is iscsi so slow?
Ryan, can you explain why kdump.service is taking so much time during boot? (see comment 26 and comment 28). Maybe we have misconfiguration issue? I think we need to investigate this with platform folks.
Ryan, iscsi.service is on the top of systemd-analyze-blame on rhev-h, but not in top 20 on rhel 7.3 (see comment 23). Do you have any idea why?
Roman, do you have an idea what is missing in my setup to reproduce the issue? See comment 23, comment 26, and comment 28.
No idea on either, Nir. We use the same configuration files as RHEL, and there should be no difference here. Can you provide a test environment?
Raz, can you provide a test environment to Ryan to investigate the kdump and iscsi issues?
What are the requirements?
Ideally, the same setup as Nir used -- 600 LUNs/4800 LVs. Unfortunately, I don't have an environment capable of creating that kind of load right now.
(In reply to Ryan Barry from comment #36)
> Ideally, the same setup as Nir used -- 600 LUNs/4800 LVs.
>
> Unfortunately, I don't have an environment capable of creating that kind of
> load right now.

The 300 FC LUNs setup should be good enough; we see exactly the same behavior
compared with RHEL.
Created attachment 1284218 [details] Script for creating lvs on all multipath devices This is the script used for creating the LVs on all the 300/600 LUNs.
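The attachment itself is not inlined here; a minimal sketch of what such a
script could look like (the device glob, VG names, and LV sizes are
illustrative assumptions, not necessarily what the attached script uses):

#!/bin/bash
# Create a VG and 4 LVs on every WWID-named multipath device, simulating
# guest-created LVs on direct LUNs.
for dev in /dev/mapper/3*; do
    name=$(basename "$dev")
    pvcreate -y "$dev"
    vgcreate "vg-$name" "$dev"
    for i in 1 2 3 4; do
        lvcreate -n "lv$i" -L 128m "vg-$name"
    done
done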
Adding back needinfo removed in comment 33. Roman, see comment 32.
Nir / Allon,

Please review the two solutions we have:

1) https://access.redhat.com/solutions/2777831
   It has a better issue definition to address this bug.

2) https://access.redhat.com/solutions/2662261
   The issue does not exactly match this bug's title; however, the solution
   is closer to the one we discuss here.

Let me know which approach to take and I can modify the solutions.
(In reply to Nir Soffer from comment #34)
> Raz, can you provide a test environment to Ryan to investigate the kdump
> and iscsi issues?

Hi,

the test seems to be valid. The only difference is that the environment
experiencing this issue is in production, so the storage may be under some
load from other hypervisors.

I will check whether this is only happening while other hypervisors are
running, or all the time.
(In reply to Roman Hodain from comment #42)
> (In reply to Nir Soffer from comment #34)
> > Raz, can you provide a test environment to Ryan to investigate the kdump
> > and iscsi issues?
>
> Hi,
>
> the test seems to be valid. The only difference is that the environment
> experiencing this issue is in production, so the storage may be under some
> load from other hypervisors.
>
> I will check whether this is only happening while other hypervisors are
> running, or all the time.

Do you still need a testing hypervisor?
As we can see in comment 28, there is a huge improvement from using a proper
lvm filter. Most of the time is spent in vdsm, trying to deactivate logical
volumes activated during boot. With a proper lvm filter, vdsm does not have
to deactivate anything.

We introduced a new vdsm-tool command, "config-lvm-filter", automating the
lvm configuration. If you use block storage, you should configure the lvm
filter properly on all hosts. See
https://ovirt.org/blog/2017/12/lvm-configuration-the-easy-way/
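For reference, the tool is run on the host itself; the interaction looks
roughly like this (abridged and paraphrased from the linked blog post - exact
wording may differ between vdsm versions, and /dev/sda2 is an example device):

# vdsm-tool config-lvm-filter
Analyzing host...
Found these mounted logical volumes on this host:
  ...
This is the recommended LVM filter for this host:
  filter = [ "a|^/dev/sda2$|", "r|.*|" ]
Configure LVM filter? [yes,NO]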
Nir, this is targeted to 4.2.1 but modified in 4.2.0. Can you please check / fix target milestone?
(In reply to Sandro Bonazzola from comment #56)
> Nir, this is targeted to 4.2.1 but modified in 4.2.0.
> Can you please check / fix target milestone?

Same as https://bugzilla.redhat.com/show_bug.cgi?id=1130527#c26
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following
reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops
Verified. config-lvm-filter puts the correct filter in place, and with the correct filter, as specified in comment 28, the durations are improved.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1489