Bug 1357416
Summary: | [NPIV] memory leak when frequently nodedev-create/destroy virtual fc cards. | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | yisun | ||||||
Component: | libvirt | Assignee: | John Ferlan <jferlan> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Han Han <hhan> | ||||||
Severity: | low | Docs Contact: | |||||||
Priority: | low | ||||||||
Version: | 7.3 | CC: | dyuan, hhan, jdenemar, rbalakri, xuzhang, yisun | ||||||
Target Milestone: | rc | ||||||||
Target Release: | --- | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | libvirt-2.5.0-1.el7 | Doc Type: | No Doc Update | ||||||
Doc Text: |
undefined
|
Story Points: | --- | ||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2017-08-01 17:11:42 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
I fear there's a bit of bz 1319544 at play here too, but also a libvirt bug for which I have sent a patch upstream for: http://www.redhat.com/archives/libvir-list/2016-October/msg00509.html There's three leaks in nodeDeviceSysfsGetSCSIHostCaps - mainly because it just writes what it gets back into the virNodeDevCapDataPtr fields for wwnn, wwpn, and fabric_name) without first checking/freeing what could have been there before. So it's just overwriting on each call. Just looking at the valgrind data shows essentially three code paths: 1. virNodeDeviceCreateXML -> nodeDeviceCreateXML -> find_new_device -> nodeDeviceLookupSCSIHostByWWN -> nodeDeviceSysfsGetSCSIHostCaps (for lines 78, 70, and 62 - or fabric_name, node_name, port_name). 2. udevEventHandleCallback -> udevAddOneDevice -> udevGetDeviceDetails -> udevProcessSCSIHost -> nodeDeviceSysfsGetSCSIHostCaps (for same 3 lines) 3. virStateInitialize -> nodeStateInitialize -> udevEnumerateDevices -> udevProcessDeviceListEntry -> udevAddOneDevice -> udevGetDeviceDetails -> udevProcessSCSIHost -> nodeDeviceSysfsGetSCSIHostCaps The first 2 are essentially caused by the loop, while the 3rd is libvirtd startup reading things (and since you ^C'd out, the virStateCleanup never occurs). Let's focus on 1 & 2. #1 occurs because nodeDeviceLookupSCSIHostByWWN and find_new_device will continually be called "until" the "wwnn/wwpn" from the input XML (from virNodeDeviceCreateXML) match what's found in the file system. NB: The relation ship with bz 1319544 exists because the 'scsi_hostN' device exists, but it may have -1 written into the wwnn/wwpn which won't match the passed/expected wwnn/wwpn - so the code "waits" for the next trip. #2 is interesting - this is more or less the innocent bystander or original allocation path it seems. When a device is "add"'d or "change"'d, the udevAddOneDevice will make the "original" entry/object in the node_device driver. For any "change" events, the object->def would replace the initially created object->def (and free things properly). Still this would create the scsi_hostN object from which path #1 would do it's magic. Unfortunately the test environment is down right now; otherwise, I'd check out this hypothesis. I've pushed the patch: commit f29b13f8307985cffbd8a1a8c25bc4421db6f8a5 Author: John Ferlan <jferlan> Date: Tue Oct 11 11:21:19 2016 -0400 util: Alter return value of virReadFCHost and fix mem leak ... Rather than return a 0 or -1 and the *result string, return just the result string to the caller. Alter all the callers to handle the different return. As a side effect or result of this, it's much clearer that we cannot just assign the returned string into the scsi_host wwnn, wwpn, and fabric_wwn fields - rather we should fetch a temporary string, then as long as our fetch was good, VIR_FREE what may have been there, and STEAL what we just got. This fixes a memory leak in the virNodeDeviceCreateXML code path through find_new_device and nodeDeviceLookupSCSIHostByWWN which will continually call nodeDeviceSysfsGetSCSIHostCaps until the expected wwnn/wwpn is found in the device object capabilities. $ git describe f29b13f8307985cffbd8a1a8c25bc4421db6f8a5 v2.3.0-133-gf29b13f $ NB: I chose not to have a bugfix doc note on this since (a) the issue has been present a long time and (b) creating/destroying NPIV devices in a loop is not a "standard" operation Verified on libvirt-3.1.0-2.el7.x86_64: 1. Start libvirtd via valgrind # valgrind --leak-check=full --trace-children=no --child-silent-after-fork=yes --log-file=/tmp/valgrind.log libvirtd -d 2. Try to nodedev-create and nodedev-destroy vHBA looply # cat /tmp/hba.xml <device> <parent>scsi_host4</parent> <capability type='scsi_host'> <capability type='fc_host'> <wwnn>20000024ff370144</wwnn> <wwpn>2101001b32a90000</wwpn> </capability> </capability> </device> Run test: #!/bin/bash -x hba_xml=/tmp/hba.xml for i in {1..1000} do virsh nodedev-create $hba_xml node_name=scsi_$(ls --time=ctime /sys/class/fc_host/|head -1) virsh nodedev-destroy $node_name done 3. Check valgrind.log: ==24752== 358 (296 direct, 62 indirect) bytes in 1 blocks are definitely lost in loss record 160 of 178 ==24752== at 0x4C29975: calloc (vg_replace_malloc.c:711) ==24752== by 0x556598B: virAlloc (viralloc.c:144) ==24752== by 0x17E90D: daemonConfigNew (libvirtd-config.c:110) ==24752== by 0x13DFD1: main (libvirtd.c:1206) ==24752== ==24752== LEAK SUMMARY: ==24752== definitely lost: 296 bytes in 1 blocks ==24752== indirectly lost: 62 bytes in 6 blocks ==24752== possibly lost: 388 bytes in 20 blocks ==24752== still reachable: 113,496 bytes in 1,065 blocks ==24752== suppressed: 0 bytes in 0 blocks ==24752== Reachable blocks (those to which a pointer was found) are not shown. ==24752== To see them, rerun with: --leak-check=full --show-leak-kinds=all ==24752== ==24752== For counts of detected and suppressed errors, rerun with: -v ==24752== ERROR SUMMARY: 19 errors from 19 contexts (suppressed: 0 from 0) No definitely lost caused by virReadFCHost. Bug fixed. Created attachment 1261493 [details]
The log of valgrind
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1846 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1846 |
Created attachment 1180941 [details] script to frequently create/destroy vhba. Description of problem: memory leak when nodedev-create/destroy virtual fc cards. Version-Release number of selected component (if applicable): libvirt-2.0.0-2.el7.x86_64 How reproducible: 100% Steps to Reproduce: Terminal 1: # service libvirtd stop # valgrind --leak-check=full --trace-children=no --child-silent-after-fork=yes libvirtd Terminal 2: 1. # cat /root/vhba/vhba1on0.xml <device> <parent>scsi_host0</parent> <capability type='scsi_host'> <capability type='fc_host'> <wwnn>20000000c99e2b80</wwnn> <wwpn>1000000000000001</wwpn> </capability> </capability> </device> 2. # ./test.sh /root/vhba/vhba1on0.xml // test.sh is attached as a file ... Node device scsi_host8 created from /root/vhba/vhba1on0.xml Destroyed node device 'scsi_host8' Node device scsi_host9 created from /root/vhba/vhba1on0.xml Destroyed node device 'scsi_host9' Node device scsi_host10 created from /root/vhba/vhba1on0.xml Destroyed node device 'scsi_host10' Node device scsi_host11 created from /root/vhba/vhba1on0.xml Destroyed node device 'scsi_host11' Node device scsi_host12 created from /root/vhba/vhba1on0.xml Destroyed node device 'scsi_host12' Terminal 1: ctrl+c and valgrind will have following log: [root@bootp-73-75-161 ~]# valgrind --leak-check=full --trace-children=no --child-silent-after-fork=yes libvirtd ==1468== Memcheck, a memory error detector ==1468== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==1468== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info ==1468== Command: libvirtd ==1468== ^C==1468== Conditional jump or move depends on uninitialised value(s) ==1468== at 0x7A4D8C6: ??? (in /usr/lib64/libselinux.so.1) ==1468== by 0x7A49ECB: selabel_close (in /usr/lib64/libselinux.so.1) ==1468== by 0x56A0DEB: virSecuritySELinuxSecurityDriverClose (security_selinux.c:817) ==1468== by 0x569D232: virSecurityManagerDispose (security_manager.c:54) ==1468== by 0x54FF85A: virObjectUnref (virobject.c:265) ==1468== by 0x569AC9E: virSecurityStackClose (security_stack.c:94) ==1468== by 0x569D232: virSecurityManagerDispose (security_manager.c:54) ==1468== by 0x54FF85A: virObjectUnref (virobject.c:265) ==1468== by 0x1E68982F: qemuStateCleanup (qemu_driver.c:1092) ==1468== by 0x55AB3EA: virStateCleanup (libvirt.c:807) ==1468== by 0x1206F7: main (libvirtd.c:1659) ==1468== ==1468== ==1468== HEAP SUMMARY: ==1468== in use at exit: 904,224 bytes in 17,317 blocks ==1468== total heap usage: 4,900,186 allocs, 4,882,869 frees, 4,050,715,788 bytes allocated ==1468== ==1468== 34 bytes in 2 blocks are definitely lost in loss record 281 of 635 ==1468== at 0x4C27BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==1468== by 0x830D839: strdup (strdup.c:42) ==1468== by 0x5515544: virStrdup (virstring.c:713) ==1468== by 0x5524157: virReadFCHost (virutil.c:2046) ==1468== by 0x1DB3177D: nodeDeviceSysfsGetSCSIHostCaps (node_device_linux_sysfs.c:62) ==1468== by 0x1DB329E9: udevProcessSCSIHost (node_device_udev.c:537) ==1468== by 0x1DB329E9: udevGetDeviceDetails (node_device_udev.c:999) ==1468== by 0x1DB329E9: udevAddOneDevice (node_device_udev.c:1116) ==1468== by 0x1DB34215: udevProcessDeviceListEntry (node_device_udev.c:1155) ==1468== by 0x1DB34215: udevEnumerateDevices (node_device_udev.c:1212) ==1468== by 0x1DB34215: nodeStateInitialize (node_device_udev.c:1504) ==1468== by 0x55AB28E: virStateInitialize (libvirt.c:770) ==1468== by 0x12254A: daemonRunStateInit (libvirtd.c:959) ==1468== by 0x551A671: virThreadHelper (virthread.c:206) ==1468== by 0x8072DC4: start_thread (pthread_create.c:308) ==1468== by 0x837E6EC: clone (clone.S:113) ==1468== ==1468== 34 bytes in 2 blocks are definitely lost in loss record 282 of 635 ==1468== at 0x4C27BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==1468== by 0x830D839: strdup (strdup.c:42) ==1468== by 0x5515544: virStrdup (virstring.c:713) ==1468== by 0x5524157: virReadFCHost (virutil.c:2046) ==1468== by 0x1DB3179D: nodeDeviceSysfsGetSCSIHostCaps (node_device_linux_sysfs.c:70) ==1468== by 0x1DB329E9: udevProcessSCSIHost (node_device_udev.c:537) ==1468== by 0x1DB329E9: udevGetDeviceDetails (node_device_udev.c:999) ==1468== by 0x1DB329E9: udevAddOneDevice (node_device_udev.c:1116) ==1468== by 0x1DB34215: udevProcessDeviceListEntry (node_device_udev.c:1155) ==1468== by 0x1DB34215: udevEnumerateDevices (node_device_udev.c:1212) ==1468== by 0x1DB34215: nodeStateInitialize (node_device_udev.c:1504) ==1468== by 0x55AB28E: virStateInitialize (libvirt.c:770) ==1468== by 0x12254A: daemonRunStateInit (libvirtd.c:959) ==1468== by 0x551A671: virThreadHelper (virthread.c:206) ==1468== by 0x8072DC4: start_thread (pthread_create.c:308) ==1468== by 0x837E6EC: clone (clone.S:113) ==1468== ==1468== 34 bytes in 2 blocks are definitely lost in loss record 283 of 635 ==1468== at 0x4C27BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==1468== by 0x830D839: strdup (strdup.c:42) ==1468== by 0x5515544: virStrdup (virstring.c:713) ==1468== by 0x5524157: virReadFCHost (virutil.c:2046) ==1468== by 0x1DB317BD: nodeDeviceSysfsGetSCSIHostCaps (node_device_linux_sysfs.c:78) ==1468== by 0x1DB329E9: udevProcessSCSIHost (node_device_udev.c:537) ==1468== by 0x1DB329E9: udevGetDeviceDetails (node_device_udev.c:999) ==1468== by 0x1DB329E9: udevAddOneDevice (node_device_udev.c:1116) ==1468== by 0x1DB34215: udevProcessDeviceListEntry (node_device_udev.c:1155) ==1468== by 0x1DB34215: udevEnumerateDevices (node_device_udev.c:1212) ==1468== by 0x1DB34215: nodeStateInitialize (node_device_udev.c:1504) ==1468== by 0x55AB28E: virStateInitialize (libvirt.c:770) ==1468== by 0x12254A: daemonRunStateInit (libvirtd.c:959) ==1468== by 0x551A671: virThreadHelper (virthread.c:206) ==1468== by 0x8072DC4: start_thread (pthread_create.c:308) ==1468== by 0x837E6EC: clone (clone.S:113) ==1468== ==1468== 1,752 bytes in 876 blocks are definitely lost in loss record 604 of 635 ==1468== at 0x4C27BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==1468== by 0x830D839: strdup (strdup.c:42) ==1468== by 0x5515544: virStrdup (virstring.c:713) ==1468== by 0x5524157: virReadFCHost (virutil.c:2046) ==1468== by 0x1DB317BD: nodeDeviceSysfsGetSCSIHostCaps (node_device_linux_sysfs.c:78) ==1468== by 0x1DB329E9: udevProcessSCSIHost (node_device_udev.c:537) ==1468== by 0x1DB329E9: udevGetDeviceDetails (node_device_udev.c:999) ==1468== by 0x1DB329E9: udevAddOneDevice (node_device_udev.c:1116) ==1468== by 0x1DB33C5F: udevEventHandleCallback (node_device_udev.c:1303) ==1468== by 0x54D02F6: virEventPollDispatchHandles (vireventpoll.c:508) ==1468== by 0x54D02F6: virEventPollRunOnce (vireventpoll.c:657) ==1468== by 0x54CE9C1: virEventRunDefaultImpl (virevent.c:314) ==1468== by 0x5625C9C: virNetDaemonRun (virnetdaemon.c:818) ==1468== by 0x121AE0: main (libvirtd.c:1617) ==1468== ==1468== 14,892 bytes in 876 blocks are definitely lost in loss record 625 of 635 ==1468== at 0x4C27BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==1468== by 0x830D839: strdup (strdup.c:42) ==1468== by 0x5515544: virStrdup (virstring.c:713) ==1468== by 0x5524157: virReadFCHost (virutil.c:2046) ==1468== by 0x1DB3177D: nodeDeviceSysfsGetSCSIHostCaps (node_device_linux_sysfs.c:62) ==1468== by 0x1DB329E9: udevProcessSCSIHost (node_device_udev.c:537) ==1468== by 0x1DB329E9: udevGetDeviceDetails (node_device_udev.c:999) ==1468== by 0x1DB329E9: udevAddOneDevice (node_device_udev.c:1116) ==1468== by 0x1DB33C5F: udevEventHandleCallback (node_device_udev.c:1303) ==1468== by 0x54D02F6: virEventPollDispatchHandles (vireventpoll.c:508) ==1468== by 0x54D02F6: virEventPollRunOnce (vireventpoll.c:657) ==1468== by 0x54CE9C1: virEventRunDefaultImpl (virevent.c:314) ==1468== by 0x5625C9C: virNetDaemonRun (virnetdaemon.c:818) ==1468== by 0x121AE0: main (libvirtd.c:1617) ==1468== ==1468== 14,892 bytes in 876 blocks are definitely lost in loss record 626 of 635 ==1468== at 0x4C27BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==1468== by 0x830D839: strdup (strdup.c:42) ==1468== by 0x5515544: virStrdup (virstring.c:713) ==1468== by 0x5524157: virReadFCHost (virutil.c:2046) ==1468== by 0x1DB3179D: nodeDeviceSysfsGetSCSIHostCaps (node_device_linux_sysfs.c:70) ==1468== by 0x1DB329E9: udevProcessSCSIHost (node_device_udev.c:537) ==1468== by 0x1DB329E9: udevGetDeviceDetails (node_device_udev.c:999) ==1468== by 0x1DB329E9: udevAddOneDevice (node_device_udev.c:1116) ==1468== by 0x1DB33C5F: udevEventHandleCallback (node_device_udev.c:1303) ==1468== by 0x54D02F6: virEventPollDispatchHandles (vireventpoll.c:508) ==1468== by 0x54D02F6: virEventPollRunOnce (vireventpoll.c:657) ==1468== by 0x54CE9C1: virEventRunDefaultImpl (virevent.c:314) ==1468== by 0x5625C9C: virNetDaemonRun (virnetdaemon.c:818) ==1468== by 0x121AE0: main (libvirtd.c:1617) ==1468== ==1468== 29,750 bytes in 1,750 blocks are definitely lost in loss record 629 of 635 ==1468== at 0x4C27BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==1468== by 0x830D839: strdup (strdup.c:42) ==1468== by 0x5515544: virStrdup (virstring.c:713) ==1468== by 0x5524157: virReadFCHost (virutil.c:2046) ==1468== by 0x1DB3177D: nodeDeviceSysfsGetSCSIHostCaps (node_device_linux_sysfs.c:62) ==1468== by 0x1DB30859: nodeDeviceLookupSCSIHostByWWN (node_device_driver.c:295) ==1468== by 0x1DB3132E: find_new_device (node_device_driver.c:545) ==1468== by 0x1DB3132E: nodeDeviceCreateXML (node_device_driver.c:600) ==1468== by 0x55DB1AF: virNodeDeviceCreateXML (libvirt-nodedev.c:707) ==1468== by 0x13F819: remoteDispatchNodeDeviceCreateXML (remote_dispatch.h:13403) ==1468== by 0x13F819: remoteDispatchNodeDeviceCreateXMLHelper (remote_dispatch.h:13381) ==1468== by 0x562BBD1: virNetServerProgramDispatchCall (virnetserverprogram.c:437) ==1468== by 0x562BBD1: virNetServerProgramDispatch (virnetserverprogram.c:307) ==1468== by 0x15BBCC: virNetServerProcessMsg (virnetserver.c:148) ==1468== by 0x15BBCC: virNetServerHandleJob (virnetserver.c:169) ==1468== by 0x551B2C0: virThreadPoolWorker (virthreadpool.c:167) ==1468== ==1468== 29,750 bytes in 1,750 blocks are definitely lost in loss record 630 of 635 ==1468== at 0x4C27BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==1468== by 0x830D839: strdup (strdup.c:42) ==1468== by 0x5515544: virStrdup (virstring.c:713) ==1468== by 0x5524157: virReadFCHost (virutil.c:2046) ==1468== by 0x1DB3179D: nodeDeviceSysfsGetSCSIHostCaps (node_device_linux_sysfs.c:70) ==1468== by 0x1DB30859: nodeDeviceLookupSCSIHostByWWN (node_device_driver.c:295) ==1468== by 0x1DB3132E: find_new_device (node_device_driver.c:545) ==1468== by 0x1DB3132E: nodeDeviceCreateXML (node_device_driver.c:600) ==1468== by 0x55DB1AF: virNodeDeviceCreateXML (libvirt-nodedev.c:707) ==1468== by 0x13F819: remoteDispatchNodeDeviceCreateXML (remote_dispatch.h:13403) ==1468== by 0x13F819: remoteDispatchNodeDeviceCreateXMLHelper (remote_dispatch.h:13381) ==1468== by 0x562BBD1: virNetServerProgramDispatchCall (virnetserverprogram.c:437) ==1468== by 0x562BBD1: virNetServerProgramDispatch (virnetserverprogram.c:307) ==1468== by 0x15BBCC: virNetServerProcessMsg (virnetserver.c:148) ==1468== by 0x15BBCC: virNetServerHandleJob (virnetserver.c:169) ==1468== by 0x551B2C0: virThreadPoolWorker (virthreadpool.c:167) ==1468== ==1468== 29,750 bytes in 1,750 blocks are definitely lost in loss record 631 of 635 ==1468== at 0x4C27BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==1468== by 0x830D839: strdup (strdup.c:42) ==1468== by 0x5515544: virStrdup (virstring.c:713) ==1468== by 0x5524157: virReadFCHost (virutil.c:2046) ==1468== by 0x1DB317BD: nodeDeviceSysfsGetSCSIHostCaps (node_device_linux_sysfs.c:78) ==1468== by 0x1DB30859: nodeDeviceLookupSCSIHostByWWN (node_device_driver.c:295) ==1468== by 0x1DB3132E: find_new_device (node_device_driver.c:545) ==1468== by 0x1DB3132E: nodeDeviceCreateXML (node_device_driver.c:600) ==1468== by 0x55DB1AF: virNodeDeviceCreateXML (libvirt-nodedev.c:707) ==1468== by 0x13F819: remoteDispatchNodeDeviceCreateXML (remote_dispatch.h:13403) ==1468== by 0x13F819: remoteDispatchNodeDeviceCreateXMLHelper (remote_dispatch.h:13381) ==1468== by 0x562BBD1: virNetServerProgramDispatchCall (virnetserverprogram.c:437) ==1468== by 0x562BBD1: virNetServerProgramDispatch (virnetserverprogram.c:307) ==1468== by 0x15BBCC: virNetServerProcessMsg (virnetserver.c:148) ==1468== by 0x15BBCC: virNetServerHandleJob (virnetserver.c:169) ==1468== by 0x551B2C0: virThreadPoolWorker (virthreadpool.c:167) ==1468== ==1468== LEAK SUMMARY: ==1468== definitely lost: 120,888 bytes in 7,884 blocks ==1468== indirectly lost: 0 bytes in 0 blocks ==1468== possibly lost: 0 bytes in 0 blocks ==1468== still reachable: 783,336 bytes in 9,433 blocks ==1468== suppressed: 0 bytes in 0 blocks ==1468== Reachable blocks (those to which a pointer was found) are not shown. ==1468== To see them, rerun with: --leak-check=full --show-leak-kinds=all ==1468== ==1468== For counts of detected and suppressed errors, rerun with: -v ==1468== Use --track-origins=yes to see where uninitialised values come from ==1468== ERROR SUMMARY: 34 errors from 10 contexts (suppressed: 0 from 0) Actual results: There are some mem leak errors with libvirtd. Expected results: Mem leak should be avoided.