Description of problem:

Reading "/proc/$QEMU_PID/task/$VCPU_PID/numa_maps" for a VM with several hundred GB of RAM allocated to it may take a significant amount of time. Because this read is done once per vCPU, collecting the NUMA stats for a VM with a large number of vCPUs can take several minutes. On a VM with 40 vCPUs and ~800GB of RAM, we measured 227 seconds to read all the "numa_maps" files, at 4 to 6 seconds each.

As a consequence, in the core dump from vdsm we can observe that all executor threads are busy in NumaInfoMonitor:

#50 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f9b880e7130, for file /usr/lib/python2.7/site-packages/vdsm/executor.py, line 379, in __call__ (self=<Task(discard=True, _callable=<NumaInfoMonitor(_vm=<Vm(_migration_downtime=None, _ioTuneValues={'vda': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdc': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdb': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdd': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}}, _vcpuLimit=None, _monitorable=True, _pathsPreparedEvent=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<Condition(release=<instancemethod at remote 0x7f9c284aa730>, _Condition__lock=<Lock at remote 0x7f9c181d3290>, acquire=<instanc...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040

#50 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f9bd800ffe0, for file /usr/lib/python2.7/site-packages/vdsm/executor.py, line 379, in __call__ (self=<Task(discard=True, _callable=<NumaInfoMonitor(_vm=<Vm(_migration_downtime=None, _ioTuneValues={'vda': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdc': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdb': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}}, _vcpuLimit=None, _monitorable=True, _pathsPreparedEvent=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<Condition(release=<instancemethod at remote 0x239ddc0>, _Condition__lock=<Lock at remote 0x7f9c080ddb90>, acquire=<instancemethod at remote 0x1214fa0>, _Condition__cond=<Cond at remote 0x2a65d90>) at remote 0x2a65fd0>) at remote 0x2a65190>, _statusLock=<Lock at remote ...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040

#50 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f9c2410f2d0, for file /usr/lib/python2.7/site-packages/vdsm/executor.py, line 379, in __call__ (self=<Task(discard=True, _callable=<NumaInfoMonitor(_vm=<Vm(_migration_downtime=None, _ioTuneValues={'vda': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdc': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdb': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}}, _vcpuLimit=None, _monitorable=True, _pathsPreparedEvent=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<Condition(release=<instancemethod at remote 0x2b5eb40>, _Condition__lock=<Lock at remote 0x2afef80>, acquire=<instancemethod at remote 0x2b5e190>, _Condition__cond=<Cond at remote 0x7f9c3c24a250>) at remote 0x7f9c3c24ad50>) at remote 0x7f9c3c24ae90>, _statusLock=<Lock ...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040

#50 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f9bc4011c80, for file /usr/lib/python2.7/site-packages/vdsm/executor.py, line 379, in __call__ (self=<Task(discard=True, _callable=<NumaInfoMonitor(_vm=<Vm(_migration_downtime=None, _ioTuneValues={'vda': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdc': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdb': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'vdd': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}}, _vcpuLimit=None, _monitorable=True, _pathsPreparedEvent=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<Condition(release=<instancemethod at remote 0x2b6aa50>, _Condition__lock=<Lock at remote 0x2a856c8>, acquire=<instancemethod at...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040

This causes vdsm to log a batch of messages like this:

2018-05-03 10:37:15,507-0400 WARN (jsonrpc/0) [virt.vm] (vmId='85d19230-4aaf-489f-b091-4cdfd3721ff9') monitor became unresponsive (command timeout, age=60.54) (vm:5137)
2018-05-03 10:37:15,507-0400 WARN (jsonrpc/0) [virt.vm] (vmId='09a260e9-2172-44db-aeaa-66e643167eee') monitor became unresponsive (command timeout, age=60.54) (vm:5137)
2018-05-03 10:37:15,508-0400 WARN (jsonrpc/0) [virt.vm] (vmId='09004259-7648-431a-b9b0-a7f6d64a82c8') monitor became unresponsive (command timeout, age=60.54) (vm:5137)
2018-05-03 10:37:15,509-0400 WARN (jsonrpc/0) [virt.vm] (vmId='9156bc72-9e75-4d5c-b183-389040403ec2') monitor became unresponsive (command timeout, age=60.54) (vm:5137)
2018-05-03 10:37:15,509-0400 WARN (jsonrpc/0) [virt.vm] (vmId='75e47b1b-1cff-4e01-9098-4689caadeb86') monitor became unresponsive (command timeout, age=60.5499999998) (vm:5137)
2018-05-03 10:37:17,521-0400 WARN (jsonrpc/6) [virt.vm] (vmId='75e47b1b-1cff-4e01-9098-4689caadeb86') monitor became unresponsive (command timeout, age=62.5600000005) (vm:5137)
2018-05-03 10:37:18,804-0400 WARN (jsonrpc/0) [virt.vm] (vmId='85d19230-4aaf-489f-b091-4cdfd3721ff9') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)
2018-05-03 10:37:18,804-0400 WARN (jsonrpc/0) [virt.vm] (vmId='09a260e9-2172-44db-aeaa-66e643167eee') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)
2018-05-03 10:37:18,805-0400 WARN (jsonrpc/0) [virt.vm] (vmId='09004259-7648-431a-b9b0-a7f6d64a82c8') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)
2018-05-03 10:37:18,806-0400 WARN (jsonrpc/0) [virt.vm] (vmId='9156bc72-9e75-4d5c-b183-389040403ec2') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)
2018-05-03 10:37:18,806-0400 WARN (jsonrpc/0) [virt.vm] (vmId='75e47b1b-1cff-4e01-9098-4689caadeb86') monitor became unresponsive (command timeout, age=63.8399999999) (vm:5137)

As a result, all VMs running on the Host are repeatedly being marked as "not responding" for short periods of time.
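The per-vCPU cost described above can be observed with a small standalone sketch (not part of vdsm; the function names and the per-task iteration over /proc are illustrative assumptions) that reads numa_maps for every task of a given QEMU process and times each read:

```python
import os
import re
import time


def read_numa_nodes(numa_maps_text):
    # Collect the node ids from every "N<node>=<pages>" field in a
    # numa_maps dump, deduplicated and sorted.
    return sorted(set(int(n) for n in re.findall(r'N(\d+)=\d+', numa_maps_text)))


def time_numa_maps_reads(qemu_pid):
    # Read numa_maps once per task (thread) of the QEMU process,
    # mimicking the once-per-vCPU reads done by NumaInfoMonitor.
    # Total cost grows with the number of vCPUs times the time needed
    # to dump the (shared) address space of a huge guest.
    mapping = {}
    for tid in os.listdir('/proc/%d/task' % qemu_pid):
        path = '/proc/%d/task/%s/numa_maps' % (qemu_pid, tid)
        start = time.time()
        try:
            with open(path) as f:
                mapping[tid] = read_numa_nodes(f.read())
        except IOError:
            continue
        print('%s read in %.2fs' % (path, time.time() - start))
    return mapping
```

Running `time_numa_maps_reads()` against a large guest's QEMU PID should show each read taking seconds, matching the 4-6 seconds per file measured above.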
Version-Release number of selected component (if applicable):
vdsm-4.19.50-1.el7ev.x86_64

How reproducible:
Always

Steps to Reproduce:
I don't have a Host large enough to reproduce this behavior, but I managed to simulate it by introducing an artificial delay in supervdsm's getVcpuNumaMemoryMapping:

    @expose
    def getVcpuNumaMemoryMapping(vmName):
        vmPid = getVmPid(vmName)
        vCpuPids = numa.getVcpuPid(vmName)
        vCpuIdxToNode = {}
        for vCpuIndex, vCpuPid in vCpuPids.iteritems():
            numaMapsFile = "/proc/%s/task/%s/numa_maps" % (vmPid, vCpuPid)
            try:
                with open(numaMapsFile, 'r') as f:
                    mappingNodes = map(
                        int, re.findall('N(\d+)=\d+', f.read()))
                vCpuIdxToNode[vCpuIndex] = list(set(mappingNodes))
                # Ask this thread to take a quick nap
                time.sleep(30)
            except IOError:
                continue
        return vCpuIdxToNode

Then, running at least 4 VMs (not counting HostedEngine) on the Host, and after restarting supervdsm (to synchronize the NUMA stats collections), the symptoms of the issue started appearing.
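For reference, the regex in the snippet above extracts the node ids from the "N<node>=<pages>" fields that the kernel prints for each mapping in numa_maps. A minimal demonstration on a made-up (hypothetical) numa_maps line:

```python
import re

# Hypothetical numa_maps line in the format documented in numa(7):
# address policy anon=<pages> dirty=<pages> N<node>=<pages> ...
line = "7f4ab4000000 default anon=16384 dirty=16384 N0=8192 N2=8192 kernelpagesize_kB=4"

# Capture the node id of every N<node>=<pages> field, as the
# getVcpuNumaMemoryMapping snippet does, then deduplicate.
nodes = sorted(set(map(int, re.findall(r'N(\d+)=\d+', line))))
print(nodes)  # -> [0, 2]
```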