Bug 1574580

Summary: On a host with 4 or more large VMs, all executor threads may get stuck in NumaInfoMonitor, causing all VMs to be marked as "not responding"
Product: Red Hat Enterprise Virtualization Manager
Reporter: Sergio Lopez <slopezpa>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED CURRENTRELEASE
QA Contact: Pavel Stehlik <pstehlik>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.1.11
CC: dagur, gveitmic, lsurette, slopezpa, srevivo, ycui, ykaul
Target Milestone: ---
Keywords: Performance
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-24 07:41:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Sergio Lopez 2018-05-03 15:14:43 UTC
Description of problem:

Reading "/proc/$QEMU_PID/task/$VCPU_PID/numa_maps" for a VM with several hundred GB of RAM allocated to it may take a significant amount of time. As this is done once per vCPU, collecting the NUMA stats for a VM with a large number of vCPUs can take several minutes.

On a VM with 40 vCPUs and ~800GB of RAM, we measured 227 seconds to read all the "numa_maps" files, with each read taking 4 to 6 seconds.
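The per-task cost can be measured directly with a small standalone script like the following (not part of vdsm; assumes a Linux host and a PID whose /proc/<pid>/task entries are readable):

```python
import os
import time


def time_numa_maps_reads(pid):
    """Return {tid: seconds} for reading each task's numa_maps file."""
    timings = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        path = os.path.join(task_dir, tid, "numa_maps")
        start = time.time()
        try:
            with open(path) as f:
                f.read()  # the kernel walks the task's VMAs here
        except IOError:
            continue  # numa_maps may be absent (e.g. no CONFIG_NUMA)
        timings[tid] = time.time() - start
    return timings


if __name__ == "__main__":
    # On a large VM, point this at the QEMU PID instead of our own.
    for tid, secs in sorted(time_numa_maps_reads(os.getpid()).items()):
        print("task %s: %.3fs" % (tid, secs))
```

On the affected hosts, pointing this at the QEMU PID of a large VM shows each read in the 4-6 second range described above, since the kernel has to walk every VMA of an ~800GB address space per read.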

As a consequence, in the core dump from vdsm we can see that all executor threads are busy in NumaInfoMonitor:

#50 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f9b880e7130, for file /usr/lib/python2.7/site-packages/vdsm/executor.py, line 379, in __call__ (self=<Task(discard=True, _callable=<NumaInfoMonitor(_vm=<Vm(_migration_downtime=None, _ioTuneValues={'vda': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdc': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdb': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdd': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}}, _vcpuLimit=None, _monitorable=True, _pathsPreparedEvent=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<Condition(release=<instancemethod at remote 0x7f9c284aa730>, _Condition__lock=<Lock at remote 0x7f9c181d3290>, acquire=<instanc...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040
#50 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f9bd800ffe0, for file /usr/lib/python2.7/site-packages/vdsm/executor.py, line 379, in __call__ (self=<Task(discard=True, _callable=<NumaInfoMonitor(_vm=<Vm(_migration_downtime=None, _ioTuneValues={'vda': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdc': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdb': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}}, _vcpuLimit=None, _monitorable=True, _pathsPreparedEvent=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<Condition(release=<instancemethod at remote 0x239ddc0>, _Condition__lock=<Lock at remote 0x7f9c080ddb90>, acquire=<instancemethod at remote 0x1214fa0>, _Condition__cond=<Cond at remote 0x2a65d90>) at remote 0x2a65fd0>) at remote 0x2a65190>, _statusLock=<Lock at remote ...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040
#50 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f9c2410f2d0, for file /usr/lib/python2.7/site-packages/vdsm/executor.py, line 379, in __call__ (self=<Task(discard=True, _callable=<NumaInfoMonitor(_vm=<Vm(_migration_downtime=None, _ioTuneValues={'vda': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdc': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdb': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}}, _vcpuLimit=None, _monitorable=True, _pathsPreparedEvent=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<Condition(release=<instancemethod at remote 0x2b5eb40>, _Condition__lock=<Lock at remote 0x2afef80>, acquire=<instancemethod at remote 0x2b5e190>, _Condition__cond=<Cond at remote 0x7f9c3c24a250>) at remote 0x7f9c3c24ad50>) at remote 0x7f9c3c24ae90>, _statusLock=<Lock ...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040
#50 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f9bc4011c80, for file /usr/lib/python2.7/site-packages/vdsm/executor.py, line 379, in __call__ (self=<Task(discard=True, _callable=<NumaInfoMonitor(_vm=<Vm(_migration_downtime=None, _ioTuneValues={'vda': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdc': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdb': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdd': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}}, _vcpuLimit=None, _monitorable=True, _pathsPreparedEvent=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<Condition(release=<instancemethod at remote 0x2b6aa50>, _Condition__lock=<Lock at remote 0x2a856c8>, acquire=<instancemethod at...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040
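The starvation pattern behind these backtraces can be reproduced in isolation. The sketch below is illustrative, not vdsm's actual executor: a fixed pool of workers shares one queue, every worker gets stuck in a slow NUMA-stats task, and a quick monitor task queued behind them is delayed far beyond its own cost, which is exactly what trips the "monitor became unresponsive" check:

```python
import queue
import threading
import time

WORKERS = 2          # illustrative pool size
tasks = queue.Queue()
finished = {}        # task name -> completion timestamp


def worker():
    while True:
        name, fn = tasks.get()
        fn()
        finished[name] = time.time()
        tasks.task_done()


for _ in range(WORKERS):
    threading.Thread(target=worker, daemon=True).start()

start = time.time()
# Fill every worker with a slow "numa_maps collection"...
for i in range(WORKERS):
    tasks.put(("numa%d" % i, lambda: time.sleep(1.0)))
# ...then enqueue a cheap monitor check behind them.
tasks.put(("monitor", lambda: None))
tasks.join()

monitor_delay = finished["monitor"] - start
print("monitor task delayed %.1fs despite doing no work" % monitor_delay)
```

With 4+ large VMs, each contributing a multi-minute NumaInfoMonitor task, the real executor stays in this state long enough for every VM's monitor age to exceed the 60-second timeout.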


This causes vdsm to log a batch of messages like this:

2018-05-03 10:37:15,507-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='85d19230-4aaf-489f-b091-4cdfd3721ff9') monitor became unresponsive (command timeout, age=60.54) (vm:5137)
2018-05-03 10:37:15,507-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='09a260e9-2172-44db-aeaa-66e643167eee') monitor became unresponsive (command timeout, age=60.54) (vm:5137)
2018-05-03 10:37:15,508-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='09004259-7648-431a-b9b0-a7f6d64a82c8') monitor became unresponsive (command timeout, age=60.54) (vm:5137)
2018-05-03 10:37:15,509-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='9156bc72-9e75-4d5c-b183-389040403ec2') monitor became unresponsive (command timeout, age=60.54) (vm:5137)
2018-05-03 10:37:15,509-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='75e47b1b-1cff-4e01-9098-4689caadeb86') monitor became unresponsive (command timeout, age=60.5499999998) (vm:5137)
2018-05-03 10:37:17,521-0400 WARN  (jsonrpc/6) [virt.vm] (vmId='75e47b1b-1cff-4e01-9098-4689caadeb86') monitor became unresponsive (command timeout, age=62.5600000005) (vm:5137)
2018-05-03 10:37:18,804-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='85d19230-4aaf-489f-b091-4cdfd3721ff9') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)
2018-05-03 10:37:18,804-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='09a260e9-2172-44db-aeaa-66e643167eee') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)
2018-05-03 10:37:18,805-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='09004259-7648-431a-b9b0-a7f6d64a82c8') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)
2018-05-03 10:37:18,806-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='9156bc72-9e75-4d5c-b183-389040403ec2') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)
2018-05-03 10:37:18,806-0400 WARN  (jsonrpc/0) [virt.vm] (vmId='75e47b1b-1cff-4e01-9098-4689caadeb86') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)

As a result, all VMs running on the host are repeatedly marked as "not responding" for short periods of time.


Version-Release number of selected component (if applicable):

vdsm-4.19.50-1.el7ev.x86_64


How reproducible:

Always


Steps to Reproduce:

I don't have a host large enough to reproduce this behavior directly, but I managed to simulate it by introducing an artificial delay in supervdsm's getVcpuNumaMemoryMapping:


    @expose
    def getVcpuNumaMemoryMapping(vmName):
        vmPid = getVmPid(vmName)
        vCpuPids = numa.getVcpuPid(vmName)
        vCpuIdxToNode = {}
        for vCpuIndex, vCpuPid in vCpuPids.iteritems():
            numaMapsFile = "/proc/%s/task/%s/numa_maps" % (vmPid, vCpuPid)
            try:
                with open(numaMapsFile, 'r') as f:
                    mappingNodes = map(
                        int, re.findall(r'N(\d+)=\d+', f.read()))
                    vCpuIdxToNode[vCpuIndex] = list(set(mappingNodes))
                    # Ask this thread to take a quick nap
                    time.sleep(30)
            except IOError:
                continue
        return vCpuIdxToNode

Then, with at least 4 VMs (not counting HostedEngine) running on the host, and after restarting supervdsm (to synchronize the NUMA stats collections), the symptoms of the issue appeared.
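For reference, the regex in getVcpuNumaMemoryMapping extracts NUMA node IDs from numa_maps lines, which report pages per node as "N<node>=<pages>". A minimal illustration (the sample line is made up for the example):

```python
import re

# A representative numa_maps line for one VMA spanning two NUMA nodes.
sample = ("7f2c00000000 default anon=262144 dirty=262144 "
          "N0=131072 N1=131072 kernelpagesize_kB=4")

# Same pattern as the reproducer: capture the node number of each
# "N<node>=<pages>" token, then deduplicate.
nodes = sorted(set(map(int, re.findall(r'N(\d+)=\d+', sample))))
print(nodes)  # [0, 1]
```

The slow part is not this parsing but the read itself: producing the numa_maps content forces the kernel to walk the task's entire address space, which is why the cost scales with guest RAM.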

Comment 5 Franta Kust 2019-05-16 13:06:01 UTC
BZ<2>Jira Resync

Comment 6 Daniel Gur 2019-08-28 13:13:13 UTC
sync2jira

Comment 7 Daniel Gur 2019-08-28 13:17:26 UTC
sync2jira