Description of problem:

As the number of Gluster volumes grows, so does the number of underlying devices (4 per brick). In this case, with ~220 volumes, there are 880 devices per server. On a server reboot, the LVM scan took over 20 minutes to complete. Since we support up to 1000 volumes, the current 20-minute scan time is likely to grow to 90+ minutes on a fully loaded system.

After rebooting a server with two 4 TB drives, it took a long time for all the lvm2-pvscan services to complete. Those two devices are used by Gluster in Container-Native Storage.

[root@server ~]# systemctl list-units lvm2-pvscan*
UNIT                      LOAD   ACTIVE SUB    DESCRIPTION
lvm2-pvscan@253:3.service loaded active exited LVM2 PV scan on device 253:3
lvm2-pvscan@253:4.service loaded active exited LVM2 PV scan on device 253:4
lvm2-pvscan@253:5.service loaded active exited LVM2 PV scan on device 253:5
lvm2-pvscan@8:16.service  loaded active exited LVM2 PV scan on device 8:16
lvm2-pvscan@8:2.service   loaded active exited LVM2 PV scan on device 8:2
lvm2-pvscan@8:32.service  loaded active exited LVM2 PV scan on device 8:32

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

6 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

[root@server ~]# systemctl status -l lvm2-pvscan@8:32.service
* lvm2-pvscan@8:32.service - LVM2 PV scan on device 8:32
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: active (exited) since Thu 2018-05-17 10:48:11 PDT; 22h ago
     Docs: man:pvscan(8)
 Main PID: 1288 (code=exited, status=0/SUCCESS)
   Memory: 0B
   CGroup: /system.slice/system-lvm2\x2dpvscan.slice/lvm2-pvscan@8:32.service

May 17 10:36:27 server.dmz systemd[1]: Starting LVM2 PV scan on device 8:32...
May 17 10:36:27 server.dmz lvm[1288]: WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
May 17 10:48:11 server.dmz lvm[1288]: Internal error: Reserved memory (21868544) not enough: used 45850624. Increase activation/reserved_memory?
May 17 10:48:11 server.dmz lvm[1288]: 202 logical volume(s) in volume group "vg_0b025213f4146415f5151250bf206d03" now active
May 17 10:48:11 server.dmz systemd[1]: Started LVM2 PV scan on device 8:32.

[root@server ~]# systemctl status -l lvm2-pvscan@8:16.service
* lvm2-pvscan@8:16.service - LVM2 PV scan on device 8:16
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: active (exited) since Thu 2018-05-17 10:56:34 PDT; 22h ago
     Docs: man:pvscan(8)
 Main PID: 1304 (code=exited, status=0/SUCCESS)
   Memory: 0B
   CGroup: /system.slice/system-lvm2\x2dpvscan.slice/lvm2-pvscan@8:16.service

May 17 10:36:27 server.dmz systemd[1]: Starting LVM2 PV scan on device 8:16...
May 17 10:36:27 server.dmz lvm[1304]: WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
May 17 10:56:34 server.dmz lvm[1304]: Internal error: Reserved memory (22237184) not enough: used 68055040. Increase activation/reserved_memory?
May 17 10:56:34 server.dmz lvm[1304]: 246 logical volume(s) in volume group "vg_fdef9e44dc1d58ce1759162de3d7b4d9" now active
May 17 10:56:34 server.dmz systemd[1]: Started LVM2 PV scan on device 8:16.

So between the reboot at 10:36 and 10:56 (20 minutes), other LVM operations would hang on the lock file:

[root@server ~]# pvs
^C  Interrupted...
  Giving up waiting for lock.
  /run/lock/lvm/V_vg_fdef9e44dc1d58ce1759162de3d7b4d9:aux: flock failed: Interrupted system call
  Can't get lock for vg_fdef9e44dc1d58ce1759162de3d7b4d9
  Cannot process volume group vg_fdef9e44dc1d58ce1759162de3d7b4d9
  Interrupted...
  Interrupted...
  PV                     VG                                  Fmt  Attr PSize   PFree
  /dev/mapper/mpatha_t2  docker-vg                           lvm2 a--  499.98g 299.49g
  /dev/mapper/mpathb_t2  vgvarlog01                          lvm2 a--   99.98g      0
  /dev/mapper/mpathc_t2  vgetcd                              lvm2 a--   49.98g      0
  /dev/sda2              rootvg                              lvm2 a--  278.36g 202.36g
  /dev/sdc               vg_0b025213f4146415f5151250bf206d03 lvm2 a--   <4.37t   3.84t

Version-Release number of selected component (if applicable):
lvm2-2.02.171-8.el7.x86_64
glusterfs-3.8.4-54.el7rhgs.x86_64
kernel-3.10.0-693.11.6.el7.x86_64

How reproducible: Always

Steps to Reproduce:
1. Create a large number of CNS volumes
2. Bring down the Gluster pod cleanly
3. Restart the server

Actual results:
LVM scanning takes a long time, preventing Gluster from being brought back up.

Expected results:
Need a way to perform restarts and maintenance in an efficient manner.

Additional info:
Are all VGs active? (Should they be?)
Do there happen to be LVs within LVs?
Is lvmetad up and running? (I don't think there's a need for it.)
Can you try just filtering out the brick LVs in lvm.conf:

global_filter = [ "r|brick|" ]
Filtering them out means that lvm will not waste time scanning those brick LVs for other PVs. You do not appear to be "stacking" PVs on top of LVs, in which case there is no reason to scan the LVs. I said the same up in comment 13. If you filter them out, then the pvscan commands that are run to scan each brick LV will do nothing. Nothing needs to be added to the lvmetad cache from these LVs.
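For concreteness, a sketch of that change in /etc/lvm/lvm.conf. This assumes the brick LVs' device-mapper names all contain the string "brick" (as heketi-created bricks typically do); adjust the regex to match the actual naming on the node:

```conf
devices {
    # Reject any device whose path matches "brick", so lvm never scans
    # brick LVs looking for nested (stacked) PVs. Devices that match no
    # pattern are still accepted by default, so everything else behaves
    # as before. global_filter also applies to lvmetad/pvscan.
    global_filter = [ "r|brick|" ]
}
```

After editing, running `pvscan --cache` repopulates the lvmetad cache under the new filter. The rejected LVs remain fully usable as bricks; they are simply no longer treated as candidate PVs.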
Moving the component to CNS Ansible, mainly to point out that we need to make this an install step that configures the LVM global setting while setting up the node.
Probably, though I'm not the right person to determine that nor make that change. I also don't know who would be.
David, isn't this a deja-vu with VDSM (see https://bugzilla.redhat.com/show_bug.cgi?id=1374545)? We need to disable LVs we do not need/use, we need to disable lvmetad, and we need a correct lvm.conf, no? (I don't remember if we need to re-run dracut?)
This bug may be the same as the recently fixed bug 1613141. I'd suggest trying the fix from that bug.

(In reply to Yaniv Kaul from comment #39)
> David, isn't this a deja-vu with VDSM (see
> https://bugzilla.redhat.com/show_bug.cgi?id=1374545 ) ? We need to disable
> LVs we do not need/usr, we need to disable lvmetad and we need a correct
> lvm.conf, no?
> (I don't remember if we need to re-run dracut?)

In the vdsm case there is shared storage, but here I don't think there is, so lvmetad can still legitimately be used. Also, I don't think there are any PVs layered on the LVs (from guests or otherwise), which means there should be no foreign LVs (e.g. from guests) that need to be excluded.

Adding the LVs to the filter will not cause them to disappear; it will just prevent lvm from scanning them for layered (guest) PVs. When there are hundreds or thousands of LVs, scanning them can waste a lot of time and cause contention with lvmetad.

Updating the initramfs to include the lvm.conf filter change would probably be best, although it's probably not necessary (there should be no pvscans or autoactivation happening in the initramfs).
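A sketch of applying this on a node after editing lvm.conf (all commands require root; the dracut step is only needed if you want early boot to honor the same filter, which per the above is probably optional):

```sh
# Print the filter lvm will actually use, confirming the lvm.conf edit
# took effect (lvmconfig is available in the lvm2 version in use here)
lvmconfig devices/global_filter

# Repopulate the lvmetad cache under the new filter
pvscan --cache

# Optional: rebuild the initramfs for the running kernel so the embedded
# lvm.conf carries the same global_filter during early boot
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
```

This is a sketch, not a tested procedure; on a production node the filter regex should be verified against the actual brick device names (e.g. with `lvs -o lv_name,lv_dm_path`) before rebooting.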