Bug 1435198

Summary: Udev/haldaemon consume 100 percent of the CPU for more than an hour when booting with lots of LVM objects
Product: Red Hat Enterprise Linux 6 Reporter: Greg Scott <gscott>
Component: hal Assignee: Richard Hughes <rhughes>
Status: CLOSED WONTFIX QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.7 CC: cww, udev-maint-list
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-15 21:19:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Greg Scott 2017-03-23 11:32:21 UTC
Description of problem:

RHEVH-6.7 and other RHEVH-6.n systems with lots of LVM objects take more than an hour to boot while haldaemon consumes 100 percent of the CPU and grinds through several thousand LVM objects.


Version-Release number of selected component (if applicable):

RHEVH-6.7-20160104 and others in the RHEVH-6 series.

How reproducible:
Always

Steps to Reproduce:
1. Set up a RHEV environment with 2000 active, stateless VMs and 2000 inactive VMs in a Fibre Channel environment with 4 paths to the SAN.  This adds up to ((2000 x 2) + 2000) x 4 = 24000 LVM objects, a mixture of LVs, VGs, and PVs.
2. Boot one of the RHEV-H hosts
3. Wait a long time
4. The host never activates by itself into the RHEV environment.
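
As a rough check of the arithmetic in step 1, the count can be sketched in Python. The function and parameter names are illustrative only, and the factor of 2 per active VM is taken directly from the formula in step 1 rather than from any documented RHEV behavior.

```python
def estimate_lvm_objects(active_vms, inactive_vms, paths,
                         objects_per_active_vm=2,
                         objects_per_inactive_vm=1):
    """Estimate the number of LVM objects visible to one host.

    Assumption: each active stateless VM contributes 2 objects and each
    inactive VM contributes 1, as implied by the formula in step 1, and
    every object is seen once per SAN path.
    """
    per_path = (active_vms * objects_per_active_vm
                + inactive_vms * objects_per_inactive_vm)
    return per_path * paths


# Numbers from step 1: 2000 active VMs, 2000 inactive VMs, 4 paths.
print(estimate_lvm_objects(2000, 2000, 4))  # prints 24000
```

The per-VM factors are exposed as parameters so the same estimate can be rerun for a differently sized environment.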

Actual results:

Once we can finally log in to the console or ssh into the host, top shows haldaemon consuming 100 percent of the CPU for at least an hour while it grinds through all those /dev/mapper objects.  After haldaemon finishes grinding, we can restart the vdsmd service by hand to activate the host into a RHEV environment.
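
For reference, a minimal sketch of how one might count the nodes under /dev/mapper on an affected host. It assumes every entry other than the device-mapper "control" node corresponds to one object that haldaemon has to process; the helper name is hypothetical and not part of any RHEL tool.

```python
import os


def count_mapper_nodes(mapper_dir="/dev/mapper"):
    """Count entries in mapper_dir, skipping the 'control' node.

    Returns 0 if the directory does not exist or cannot be read.
    """
    try:
        names = os.listdir(mapper_dir)
    except OSError:
        return 0
    return sum(1 for name in names if name != "control")


if __name__ == "__main__":
    print(count_mapper_nodes())
```

Comparing this count before and after VDSM filtering would give a rough idea of how many objects are enumerated needlessly at boot.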

Expected results:

The system should boot in a reasonable amount of time and activate on its own.

Additional info:

Note that RHEVH-6 and RHVH-7 are special cases of RHEL.  RHEV / RHV uses a process named VDSM on each host to filter out LVM objects it doesn't need to activate the VMs on this host. So, at boot time, haldaemon enumerates every single LVM object, and then VDSM filters out the ones it doesn't need.  Not only does this waste a huge amount of time, but haldaemon also breaks VDSM by taking so long.

We worked around the problem by disabling haldaemon at boot for large customers.  With haldaemon disabled, boot times drop from one hour plus to around 15 minutes, and the VDSM process works as expected to automatically activate these hosts into their RHEV datacenter.

Haldaemon starts by default with RHEL 6. RHEV engineering needs to know what is lost by disabling haldaemon in RHEVH-6, and the best way to boot RHEL7 / RHVH-7 while carrying a similar load of tens of thousands of LVM objects in /dev/mapper.

Comment 2 Michal Sekletar 2017-03-23 12:48:36 UTC
I don't know all the details about HAL, but in a nutshell, HAL exposes devices on D-Bus and provides a library (libhal) that applications can use to enumerate devices, get and set device properties, and so on.

By removing HAL you lose some capabilities, but the question is whether you need them. Anyway, HAL is a separate component in RHEL6. Reassigning...