| Summary: | RHV-H starts very slowly when too many LUNs are connected to the host (lvm filter?) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Roman Hodain <rhodain> |
| Component: | vdsm | Assignee: | Nir Soffer <nsoffer> |
| Status: | CLOSED ERRATA | QA Contact: | guy chen <guchen> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.0.4 | CC: | cshao, dfediuck, ebenahar, gveitmic, huzhao, lsurette, mjankula, mkalinin, nsoffer, prajnoha, qiyuan, ratamir, rbarry, rgolan, rhodain, srevivo, tnisan, trichard, virt-bugs, ycui, ykaul, ylavi, zkabelac |
| Target Milestone: | ovirt-4.2.1 | Keywords: | Performance |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | vdsm-4.20.9 | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-05-15 17:49:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 1374545 | | |
| Bug Blocks: | 1450114 | | |

Doc Text:
Currently, LVM scans and activates raw volumes during boot. It then scans and activates guest logical volumes created inside a guest on top of the raw volumes, as well as guest logical volumes inside LUNs that are not part of a Red Hat Virtualization storage domain. As a result, there may be thousands of active logical volumes on a host which should not be active. This leads to very slow boot times and may later lead to data corruption if a logical volume active on the host is extended on another host.
To avoid this, you can configure an LVM filter using the "vdsm-tool config-lvm-filter" command. The LVM filter prevents scanning and activation of logical volumes not required by the host, which improves boot time.
Description (Roman Hodain, 2016-12-01 09:30:26 UTC)
Sounds related to bug 1374545 indeed, Nir can you confirm? I wonder if we can do something for these kinds of bugs (like this one and bug 1374545) by leveraging the lvm.* kernel arguments?

(In reply to Roman Hodain from comment #0)
> 46min 33.479s lvm2-pvscan@132:1728.service
> 46min 33.361s lvm2-pvscan@132:1696.service
> 46min 33.300s lvm2-pvscan@134:1632.service
> 46min 33.172s lvm2-pvscan@134:1648.service
> 46min 33.082s lvm2-pvscan@134:1600.service
> 46min 33.008s lvm2-pvscan@133:1552.service

I think these are related to lvmetad. Did you try to disable lvmetad? It should be possible in the same lvmlocal.conf file. See bug 1374545 for instructions.

> Expected results:
> The boot is fast. I suppose that the best is if we make the system ignore
> the devices.
>
> Additional info:
>
> We tried to do the following:

The boot is slow with these changes?

> implemented
> https://gerrit.ovirt.org/#/c/66893/1/static/etc/lvm/lvmlocal.conf from Bug
> 1374545

This helps by not scanning ovirt lvs, but the system is still scanning all the luns. I think the only solution is to have an lvm filter (not global_filter) whitelisting only the devices used by rhvh. Any other device should not be scanned by lvm on the host. Vdsm runs lvm commands with a special command line filter, so lvm commands run by vdsm will still scan the required luns.

> add autoactivation list to the lvm.conf
> auto_activation_volume_list = [ "rhvh" ]
>
> regenerated initramfs
>
> This may or may not overlap with Bug 1374545.

Bug 1374545 is about guest lvs inside ovirt lvs; the issue here is guest lvs on luns. We don't manage luns; we don't create them on the storage server or map them for a specific host, we only discover them when they are added, so we cannot set up filters for luns with the current system design.

Zdenek, it seems that the solution you suggested in bug 1374545 is also what is needed for this bug: having a filter whitelisting the devices needed by the host. Can you confirm or suggest a better configuration?

(In reply to Roman Hodain from comment #0)
> Actual results:
> The boot takes ages and the systemd-blame is reporting the following
> services consuming the time:
>
> 46min 33.479s lvm2-pvscan@132:1728.service
> 46min 33.361s lvm2-pvscan@132:1696.service
> 46min 33.300s lvm2-pvscan@134:1632.service
> 46min 33.172s lvm2-pvscan@134:1648.service
> 46min 33.082s lvm2-pvscan@134:1600.service
> 46min 33.008s lvm2-pvscan@133:1552.service

Can you add -vvvv to /lib/systemd/system/lvm2-pvscan@.service like this:

ExecStart=/usr/sbin/lvm pvscan -vvvv --cache --activate ay %i

Then reboot, collect debug logs with lvmdump -u -l -s when systemd is finally up and running (after that long delay), and attach the logs here (the tgz file).

I'd probably recommend disabling the use of 'lvmetad' and avoiding autoactivation. It looks like the system is full of duplicate devices (multipath paths), but it also looks like these devices are scanned 'before' multipath is activated - so it's quite tricky to recognize this state. There would be a way to switch into udevDB mode for recognizing 'multipath' component devices - but I'd not recommend using this setting ATM. If it's feasible, setting up 'white-list' filtering to accept only a very limited set of devices as possible PV devices is the best approach for minimizing delays here.

Is this resolved with BZ #1342786?

(In reply to Yaniv Dary from comment #13)
> Is this resolved with BZ #1342786?

The two bugs are in different branches. The Target Milestone of bug 1342786 is set to 3.6.11, and no 3.6.11 build is available for QE testing, so I think the original bug still exists.

Moving out all non blocker\exceptions.

This should be solved by an LVM filter accepting only the devices needed by the hypervisor, and rejecting any other device (e.g. direct luns, luns used for ovirt storage domains).

The first step is reproducing this.
- We need a host with an FC connection to FC storage
- Best to have ports on the host, and 4 paths to storage
- Create and map 300 (small) LUNs to this host.
- On each LUN, create a vg and 4 lvs. This can be created using a script (the real lvs are created within a guest).

This should be tested on both RHEL and RHEV-H.

When we have a system reproducing this issue we can continue.

Gil, can we schedule this?

(In reply to Nir Soffer from comment #19)
> The first step is reproducing this.
>
> - We need a host with an FC connection to FC storage
> - Best to have ports on the host, and 4 paths to storage
> - Create and map 300 (small) LUNs to this host.
> - On each LUN, create a vg and 4 lvs. This can be created using a script
>   (the real lvs are created within a guest).
>
> This should be tested on both RHEL and RHEV-H.
>
> When we have a system reproducing this issue we can continue.
>
> Gil, can we schedule this?

Raz, will you be able to handle this request?

Sure, Elad will take care of Nir's request

Provided offline

I tested this setup but could not reproduce the issue.
The setup tested:
# grep 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
# rpm -q vdsm
vdsm-4.19.11-1.el7ev.x86_64
Number of LUNs: 300
Number of VGs: 300
Number of LVs: 1200
Each LUN has 4 paths to storage:
3514f0c5a516007fb dm-87 XtremIO ,XtremApp
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 0:0:1:177 sdaif 129:816 active ready running
|- 5:0:1:177 sdake 132:864 active ready running
|- 5:0:0:177 sdmo 70:256 active ready running
`- 0:0:0:177 sdlr 68:400 active ready running
The test included:
1. Wait until host is up in engine
2. reboot
3. wait until host is up again
I measured the time from starting the boot until vdsm was started,
and then the time until vdsm initialization was completed.
(getCapabilities succeeded - vdsm should be up shortly after that)
Tested 2 configurations:
1. default lvm.conf
2. lvm.conf with devices/filter:
devices {
filter = [ "a|^/dev/sdate2$|", "r|.*|"]
}
This filter prevents lvm from scanning the lvs on the luns.
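For reference, applying such a whitelist filter in practice means editing the devices section of /etc/lvm/lvm.conf (or lvmlocal.conf) and regenerating the initramfs so early boot uses the same filter, similar to what the original reporter did. A rough sketch, where the accepted device path is just the example used above and not a recommendation:

# /etc/lvm/lvm.conf - accept only the devices the host itself needs, reject the rest
devices {
    filter = [ "a|^/dev/sdate2$|", "r|.*|" ]
}

# rebuild the initramfs so the filter also applies during early boot, then reboot
dracut -f
reboot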
# Default configuration
Events:
13:57:15 boot started
14:02:15 vdsm started
14:10:03 vdsm init completed
Active lvs:
0 (deactivated by vdsm)
$ systemd-analyze time
Startup finished in 1.228s (kernel) + 29.110s (initrd) + 4min 50.648s (userspace) = 5min 20.986s
$ head -20 boot-05/systemd-analyze-blame.out
1min 32.944s vdsm-network.service
1min 8.678s network.service
18.967s kdump.service
15.337s lvm2-activation-early.service
13.840s systemd-udev-settle.service
11.763s dev-mapper-VolGroup01\x2droot.device
7.281s dmraid-activation.service
6.360s lvm2-activation-net.service
5.778s lvm2-activation.service
5.171s lvm2-monitor.service
4.731s libvirtd.service
3.402s vdsmd.service
1.886s postfix.service
1.720s tuned.service
1.261s systemd-udevd.service
1.037s vdsm-network-init.service
778ms ovirt-imageio-daemon.service
677ms systemd-tmpfiles-setup-dev.service
655ms proc-fs-nfsd.mount
648ms rhel-readonly.service
$ grep lvm2-pvscan boot-05/systemd-analyze-blame.out | head -10
543ms lvm2-pvscan@253:105.service
540ms lvm2-pvscan@253:109.service
539ms lvm2-pvscan@253:108.service
539ms lvm2-pvscan@253:104.service
538ms lvm2-pvscan@253:106.service
537ms lvm2-pvscan@253:98.service
535ms lvm2-pvscan@253:103.service
524ms lvm2-pvscan@253:102.service
523ms lvm2-pvscan@253:100.service
521ms lvm2-pvscan@253:101.service
$ grep multipathd boot-05/journal.log | wc -l
8307
$ grep 'add path (uevent)' boot-05/journal.log | wc -l
2301
$ grep "devmap not registered, can't remove" boot-05/journal.log | wc -l
1200
$ grep 'remove map (uevent)' boot-05/journal.log | wc -l
2400
# With lvm filter
Events:
14:30:53 boot started
14:36:17 vdsm started
14:36:38 vdsm init completed
Active lvs:
0 (not activated by lvm because of the filter)
$ systemd-analyze time
Startup finished in 1.227s (kernel) + 29.645s (initrd) + 4min 57.321s (userspace) = 5min 28.194s
$ head -20 boot-06-filter/systemd-analyze-blame.out
1min 25.276s vdsm-network.service
1min 7.960s network.service
13.340s systemd-udev-settle.service
11.186s dev-mapper-VolGroup01\x2droot.device
11.178s kdump.service
7.402s dmraid-activation.service
3.043s vdsmd.service
2.640s libvirtd.service
1.990s tuned.service
1.907s postfix.service
1.551s systemd-udevd.service
982ms vdsm-network-init.service
744ms rhel-readonly.service
700ms ovirt-imageio-daemon.service
662ms proc-fs-nfsd.mount
600ms systemd-tmpfiles-setup-dev.service
546ms systemd-modules-load.service
486ms lvm2-activation-net.service
448ms rc-local.service
430ms lvm2-activation-early.service
$ grep lvm2-pvscan boot-06-filter/systemd-analyze-blame.out | head -10
297ms lvm2-pvscan@253:69.service
297ms lvm2-pvscan@253:67.service
294ms lvm2-pvscan@253:68.service
294ms lvm2-pvscan@253:70.service
292ms lvm2-pvscan@253:72.service
291ms lvm2-pvscan@253:75.service
291ms lvm2-pvscan@253:73.service
291ms lvm2-pvscan@253:74.service
289ms lvm2-pvscan@253:77.service
289ms lvm2-pvscan@253:76.service
$ grep multipathd boot-06-filter/journal.log | wc -l
4707
$ grep 'add path (uevent)' boot-06-filter/journal.log | wc -l
2301
$ grep "devmap not registered, can't remove" boot-06-filter/journal.log | wc -l
0
$ grep 'remove map (uevent)' boot-06-filter/journal.log | wc -l
0
Summary:
1. Cannot reproduce, boot took about 13 minutes, nothing in the
range of 40 minutes.
2. lvm filter does not change system boot time significantly, but
has a huge effect on vdsm initialization time:
without filter: 9m48s
with filter: 16s (36X faster)
3. multipath has some issues without the lvm filter: a lot of
"devmap not registered, can't remove" and 'remove map (uevent)'
events that we don't see when using the lvm filter.
Maybe the problem is related to RHEV-H; I'm waiting for a RHEV-H setup
with the same LUNs for testing.
I also want to test a setup with more LUNs (600?), to see how the
system scales up.
Roman, do you have an idea what is missing in my setup to reproduce
the issue?
Created attachment 1276414 [details]
Test results on RHEL
Created attachment 1281763 [details]
Test results on RHVH
Test results on RHVH - same setup as the RHEL tests (see comment 23). One difference compared to the RHEL tests: the LUNs are 1G, while in the RHEL test we used 5G LUNs. Same number of lvs per LUN (smaller).

# Without filter (boot 3)

Events:
17:24:50 boot started
17:27:59 vdsm started
17:34:13 vdsm ready

$ cat systemd-analyze-time.out
Startup finished in 3.733s (kernel) + 17.437s (initrd) + 16min 13.033s (userspace) = 16min 34.204s

$ head systemd-analyze-blame.out
13min 59.616s kdump.service
51.696s iscsi.service
32.182s vdsm-network.service
30.503s NetworkManager-wait-online.service
28.862s lvm2-activation-net.service
10.017s network.service
9.343s glusterd.service
3.909s vdsmd.service
3.664s postfix.service
3.587s systemd-udev-settle.service

$ grep lvm2-pvscan systemd-analyze-blame.out | head
167ms lvm2-pvscan@253:36.service
164ms lvm2-pvscan@253:22.service
143ms lvm2-pvscan@253:171.service
141ms lvm2-pvscan@253:90.service
141ms lvm2-pvscan@253:176.service
135ms lvm2-pvscan@253:43.service
133ms lvm2-pvscan@253:159.service
131ms lvm2-pvscan@253:54.service
130ms lvm2-pvscan@253:161.service
129ms lvm2-pvscan@253:141.service

# With filter

Events:
20:31:46 boot started
20:34:20 vdsm started
20:34:40 vdsm ready

$ cat systemd-analyze-time.out
Startup finished in 3.735s (kernel) + 17.336s (initrd) + 7min 56.826s (userspace) = 8min 17.898s

$ head systemd-analyze-blame.out
6min 13.408s kdump.service
51.821s iscsi.service
30.389s NetworkManager-wait-online.service
26.926s vdsm-network.service
9.517s network.service
8.955s glusterd.service
3.927s vdsmd.service
3.600s systemd-udev-settle.service
3.479s postfix.service
2.707s dev-mapper-rhvh_purple\x2d\x2dvds2\x2drhvh\x2d\x2d4.1\x2d\x2d0.20170506.0\x2b1.device

$ grep lvm2-pvscan systemd-analyze-blame.out | head
166ms lvm2-pvscan@253:113.service
160ms lvm2-pvscan@253:111.service
151ms lvm2-pvscan@253:162.service
146ms lvm2-pvscan@253:182.service
146ms lvm2-pvscan@253:112.service
145ms lvm2-pvscan@253:67.service
143ms lvm2-pvscan@253:72.service
141ms lvm2-pvscan@253:23.service
138ms lvm2-pvscan@253:22.service
132ms lvm2-pvscan@253:31.service

Similar results to RHEL, slightly better. So we cannot reproduce the issue reported in this bug.

Adding the lvm filter shortens vdsm ready time from 6:14 to 0:20 (19X faster), and overall boot time from 9:37 to 2:56 (3X faster).

Created attachment 1283163 [details]
Test results on RHV-H with 600 luns and 4800 lvs
To test how the system scales, I performed another test on a RHV-H setup. This time
I had 600 LUNs, and to make the no-filter case more interesting, 8 LVs per LUN,
totalling 4800 active LVs when booting without a filter.
# Without filter
$ cat 01-no-filter/events
19:53:38 boot started
19:59:27 vdsm started
20:31:50 vdsm ready
$ cat 01-no-filter/systemd-analyze-time.out
Startup finished in 3.729s (kernel) + 17.430s (initrd) + 48min 7.574s (userspace) = 48min 28.734s
$ head 01-no-filter/systemd-analyze-blame.out
44min 17.550s kdump.service
1min 43.876s iscsi.service
1min 30.193s vdsm-network.service
1min 15.326s lvm2-activation-net.service
30.396s NetworkManager-wait-online.service
8.976s network.service
8.428s glusterd.service
4.585s vdsmd.service
3.433s postfix.service
3.054s dev-mapper-rhvh_purple\x2d\x2dvds2\x2drhvh\x2d\x2d4.1\x2d\x2d0.20170506.0\x2b1.device
$ grep lvm2-pvscan 01-no-filter/systemd-analyze-blame.out | head
232ms lvm2-pvscan@253:12.service
167ms lvm2-pvscan@253:147.service
155ms lvm2-pvscan@253:236.service
154ms lvm2-pvscan@253:310.service
153ms lvm2-pvscan@253:309.service
149ms lvm2-pvscan@253:249.service
145ms lvm2-pvscan@253:318.service
141ms lvm2-pvscan@253:134.service
140ms lvm2-pvscan@253:265.service
139ms lvm2-pvscan@253:17.service
# With filter
$ cat 03-with-filter/events
21:22:43 boot started
21:26:32 vdsm started
21:27:19 vdsm ready
$ cat 03-with-filter/systemd-analyze-time.out
Startup finished in 3.734s (kernel) + 17.553s (initrd) + 14min 28.278s (userspace) = 14min 49.566s
$ head 03-with-filter/systemd-analyze-blame.out
11min 53.197s kdump.service
1min 43.900s iscsi.service
49.409s vdsm-network.service
30.412s NetworkManager-wait-online.service
8.990s network.service
8.310s glusterd.service
4.068s vdsmd.service
3.587s systemd-udev-settle.service
3.567s dev-mapper-rhvh_purple\x2d\x2dvds2\x2drhvh\x2d\x2d4.1\x2d\x2d0.20170506.0\x2b1.device
3.476s postfix.service
$ grep lvm2-pvscan 03-with-filter/systemd-analyze-blame.out | head
154ms lvm2-pvscan@253:271.service
151ms lvm2-pvscan@253:103.service
149ms lvm2-pvscan@253:19.service
148ms lvm2-pvscan@253:197.service
148ms lvm2-pvscan@253:229.service
145ms lvm2-pvscan@253:76.service
144ms lvm2-pvscan@253:192.service
141ms lvm2-pvscan@253:63.service
141ms lvm2-pvscan@253:129.service
140ms lvm2-pvscan@253:207.service
We see the same behaviour as in the previous tests, but this time system boot time shows
a big improvement with the lvm filter (3:52 with filter, 5:49 without filter).
Vdsm startup without the lvm filter is 30 minutes, with the lvm filter 47 seconds
(39X faster).
Note that vdsm startup with 600 LUNs/4800 LVs is about 2X slower compared
with the 300 LUNs/1200 LVs setup (see comment 26). I did not check why, but I assume
this is the time needed to read lvm metadata into the vdsm lvm cache.
Nir, do we need a bug on kdump for the slow boot? (Note that in a real-life case we might even fence the host because it starts so slowly!) Why is iscsi so slow?

Ryan, can you explain why kdump.service is taking so much time during boot? (See comment 26 and comment 28.) Maybe we have a misconfiguration issue? I think we need to investigate this with platform folks.

Ryan, iscsi.service is at the top of systemd-analyze-blame on rhev-h, but not in the top 20 on rhel 7.3 (see comment 23). Do you have any idea why?

Roman, do you have an idea what is missing in my setup to reproduce the issue? See comment 23, comment 26 and comment 28.

No idea on either, Nir. We use the same configuration files as RHEL, and there should be no difference here. Can you provide a test environment?

Raz, can you provide a test environment to Ryan to investigate the kdump and iscsi issues?

What are the requirements?

Ideally, the same setup as Nir used -- 600 LUNs/4800 LVs. Unfortunately, I don't have an environment capable of creating that kind of load right now.

(In reply to Ryan Barry from comment #36)
> Ideally, the same setup as Nir used -- 600 LUNs/4800 LVs.
>
> Unfortunately, I don't have an environment capable of creating that kind of
> load right now.

The 300 FC LUNs setup should be good enough; we see exactly the same behavior compared with RHEL.

Created attachment 1284218 [details]
Script for creating lvs on all multipath devices
This is the script used for creating the LVs on all the 300/600 LUNs.
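The attached script itself is not reproduced here; a minimal sketch of what such a script might look like follows. The device glob (based on the WWID prefix seen in this setup), VG/LV names, LV count, and sizes are assumptions for illustration only:

#!/bin/bash
# Hypothetical sketch: create one VG and a few LVs on every multipath device.
# The real attachment may differ in names, sizes, and LV count (4 or 8 per LUN).
for dev in /dev/mapper/3514f0c5a*; do
    name=$(basename "$dev")
    pvcreate -y "$dev"
    vgcreate "vg-$name" "$dev"
    for i in 1 2 3 4; do
        lvcreate --name "lv-$i" --size 128m "vg-$name"
    done
done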
Adding back needinfo removed in comment 33.

Roman, see comment 32.

Nir / Allon, please review the two solutions we have:
1) https://access.redhat.com/solutions/2777831 - it has a better issue definition addressing this bug.
2) https://access.redhat.com/solutions/2662261 - the issue does not exactly match this bug's title, but the solution is closer to the one we discuss here.
Let me know which approach to take and I can modify the solutions.

(In reply to Nir Soffer from comment #34)
> Raz, can you provide a test environment to Ryan to investigate the kdump
> and iscsi issues?

Hi,

the test seems to be valid. The only difference is that the environment experiencing this issue is in production, so the storage may be under some load from other hypervisors.

I will check whether this only happens while other hypervisors are running, or all the time.

(In reply to Roman Hodain from comment #42)
> (In reply to Nir Soffer from comment #34)
> > Raz, can you provide a test environment to Ryan to investigate the kdump
> > and iscsi issues?
>
> Hi,
>
> the test seems to be valid. The only difference is that the environment
> experiencing this issue is in production, so the storage may be under some
> load from other hypervisors.
>
> I will check whether this only happens while other hypervisors are running,
> or all the time.

Do you still need a testing hypervisor?

As we can see in comment 28, there is a huge improvement from using a proper lvm filter. Most of the time is spent in vdsm, trying to deactivate logical volumes activated during boot. With a proper lvm filter, vdsm does not have to deactivate anything.

We introduced a new vdsm-tool command, "config-lvm-filter", automating lvm configuration. If you use block storage you should configure the lvm filter properly on all hosts. See https://ovirt.org/blog/2017/12/lvm-configuration-the-easy-way/

Nir, this is targeted to 4.2.1 but modified in 4.2.0. Can you please check / fix the target milestone?

(In reply to Sandro Bonazzola from comment #56)
> Nir, this is targeted to 4.2.1 but modified in 4.2.0.
> Can you please check / fix the target milestone?

Same as https://bugzilla.redhat.com/show_bug.cgi?id=1130527#c26

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
[No relevant external trackers attached]
For more info please contact: rhv-devops

Verified: config-lvm-filter put the correct filter in place, and with the correct filter the durations improve as specified in comment 28.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489
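For reference, the "config-lvm-filter" flow mentioned in the comments above is run on each host. The filter line below is only an illustration of the kind of whitelist entry the tool adds to /etc/lvm/lvm.conf; the actual accepted devices depend on the host's local PVs:

# run on each host; the tool analyzes the host and suggests an LVM filter
# (see the oVirt blog post linked above for details)
vdsm-tool config-lvm-filter

# illustrative example of the resulting entry in /etc/lvm/lvm.conf
# (assumed device path, not taken from this environment)
filter = [ "a|^/dev/sda2$|", "r|.*|" ]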