Created attachment 462542 [details]
leak memory check script

Description of problem:
Starting and shutting down a domain leads to a memory leak.

Version-Release number of selected component (if applicable):
kernel-2.6.32-71.el6.x86_64
libvirt-0.8.1-27.el6.x86_64
qemu-kvm-0.12.1.2-2.113.el6.x86_64

How reproducible:
every time

Steps to Reproduce:
1. install a domain named "kvm1"
2. disable selinux: # setenforce 0
3. run the attached "libvirtd_memory_check.sh" script

Actual results:
While the script from step 3 is running, the attached valgrind log "libvirtd_memory_check.sh.log" shows a memory leak, and the amount of leaked memory grows after every start/shutdown cycle.

Expected results:
No memory leak caused by starting and shutting down a domain.

Additional info:
The leak statistics from the attached valgrind log "libvirtd_memory_check.sh.log" are as follows:

LEAK SUMMARY:
==10198==    definitely lost: 124,172 bytes in 148 blocks
==10198==    indirectly lost: 3,527,409 bytes in 30,656 blocks
==10198==      possibly lost: 26,229 bytes in 138 blocks
==10198==    still reachable: 2,373,241 bytes in 17,808 blocks
==10198==         suppressed: 0 bytes in 0 blocks
==10198== Rerun with --leak-check=full to see details of leaked memory
Created attachment 462543 [details]
valgrind log for libvirtd
Upstream patch posted for the worst offender (at least 1024 bytes on every qemu monitor connection, which is one per start/stop sequence):
https://www.redhat.com/archives/libvir-list/2010-November/msg01100.html

There appear to be other leaks as well (148 blocks at 1024 bytes each would exceed the 124,172 lost bytes, so some of those blocks must come from other leaks), but they are smaller in size and might not be as frequent; it will take more analysis to decide whether anything else is worth plugging.
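For illustration only, the general shape of this class of leak is a fixed-size buffer allocated on every monitor connection and never released when the connection is torn down. The sketch below is purely schematic C with hypothetical names (monitor_open/monitor_close are not libvirt functions); the real fix is in the patch linked above.

#include <stdlib.h>

/* Hypothetical stand-in for a per-connection monitor context. */
struct monitor {
    char *buffer;          /* 1024-byte scratch buffer, one per connection */
    size_t buffer_size;
};

struct monitor *monitor_open(void)
{
    struct monitor *mon = calloc(1, sizeof(*mon));
    if (!mon)
        return NULL;
    mon->buffer_size = 1024;
    mon->buffer = malloc(mon->buffer_size);  /* allocated on every connection */
    if (!mon->buffer) {
        free(mon);
        return NULL;
    }
    return mon;
}

void monitor_close(struct monitor *mon)
{
    if (!mon)
        return;
    free(mon->buffer);  /* omitting this free loses 1024 bytes per start/stop cycle */
    free(mon);
}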
Mike Strosaker on our team set up a simple cron job to run every 15 minutes and capture libvirtd memory usage. As far as we can tell, the same four VMs have been running on the system for the entire monitoring period, so there has been no provisioning activity.

DATE                 %MEM   RSS
2010-11-23-16:45:01  10.4   6731.32
2010-11-23-17:00:01  11.3   7341.36
2010-11-23-17:15:01  12.3   7948.18
2010-11-23-17:30:01  13.2   8555.17
2010-11-23-17:45:01  14.2   9168.53
2010-11-23-18:00:01  15.2   9799.07
2010-11-23-18:15:01  16.2   10436.68
2010-11-23-18:30:01  17.1   11057.58
2010-11-23-18:45:01  18.1   11666.37
2010-11-23-19:00:01  19.1   12307.74
2010-11-23-19:15:01  20.0   12945.53
2010-11-23-19:30:01  21.0   13572.84
2010-11-23-19:45:01  22.0   14213.44
2010-11-23-20:00:01  23.0   14863.93
2010-11-23-20:15:01  24.0   15502.12
2010-11-23-20:30:01  25.0   16155.37
2010-11-23-20:45:01  26.1   16817.77
2010-11-23-21:00:01  27.1   17472.32
2010-11-23-21:15:01  28.1   18116.05
2010-11-23-21:30:01  29.1   18758.14
2010-11-23-21:45:01  30.0   19380.17
2010-11-23-22:00:01  31.0   20029.62
2010-11-23-22:15:01  32.0   20654.29
2010-11-23-22:30:01  33.0   21280.00
2010-11-23-22:45:01  33.9   21890.36
2010-11-23-23:00:01  34.9   22500.23
2010-11-23-23:15:01  35.8   23103.05
2010-11-23-23:30:01  36.8   23714.33

Additional data is being collected.
The nasty leak has something to do with disk information. Based on a core extracted from a leaking libvirtd process, there's a repeating pattern of:

002ca150: 2f73 746f 7261 6765 2f70 726f 642f 6570  /storage/prod/ep
002ca160: 6865 6d65 7261 6c2f 2f76 686f 7374 3037  hemeral//vhost07
002ca170: 3239 2f76 686f 7374 3037 3239 2e69 6d67  29/vhost0729.img
002ca180: 0000 0058 767f 0000 2500 0000 0000 0000  ...Xv...%.......
002ca190: 6964 6530 2d30 2d30 0000 0058 767f 0000  ide0-0-0...Xv...
002ca1a0: 2000 0000 0000 0000 3500 0000 0000 0000   .......5.......
002ca1b0: 656f 7468 6572 0000 9800 0058 767f 0000  eother.....Xv...
002ca1c0: bf89 cacb 1b4d 2a7b 3000 0030 767f 0000  .....M*{0..0v...
002ca1d0: 3000 0000 0000 0000 4500 0000 0000 0000  0.......E.......

That is repeated over 2 million times, which clearly looks like a high-frequency leak. Still trying to find the right data structure that would contain this information.
There are three other guests running. vhost0728 has 200k hits in the core file, but the other two guests only have 3 hits. It looks like the leak is specific to particular guests. It's possible we're running some sort of API call frequently, but only for certain guests.
I've identified further leaks in libnl and libselinux that impact libvirt, and I'm still in the process of tracking down root causes of other valgrind leak reports. I'm definitely making progress on plugging leaks via upstream patches, and will be working on backporting them to RHEL as fast as I can.
*** Bug 620334 has been marked as a duplicate of this bug. ***
(In reply to comment #4)
> The nasty leak has something to do with disk information. Based on a core
> extracted from a leaking libvirtd process, there's a repeating pattern of:
>
> 002ca150: 2f73 746f 7261 6765 2f70 726f 642f 6570  /storage/prod/ep
> 002ca160: 6865 6d65 7261 6c2f 2f76 686f 7374 3037  hemeral//vhost07
> 002ca170: 3239 2f76 686f 7374 3037 3239 2e69 6d67  29/vhost0729.img
> 002ca180: 0000 0058 767f 0000 2500 0000 0000 0000  ...Xv...%.......
> 002ca190: 6964 6530 2d30 2d30 0000 0058 767f 0000  ide0-0-0...Xv...
> 002ca1a0: 2000 0000 0000 0000 3500 0000 0000 0000   .......5.......
> 002ca1b0: 656f 7468 6572 0000 9800 0058 767f 0000  eother.....Xv...
> 002ca1c0: bf89 cacb 1b4d 2a7b 3000 0030 767f 0000  .....M*{0..0v...
> 002ca1d0: 3000 0000 0000 0000 4500 0000 0000 0000  0.......E.......
>
> That is repeated over 2 million times, which clearly looks like a
> high-frequency leak. Still trying to find the right data structure that
> would contain this information.

That pattern shows 'path', 'devAlias', <some integer>, 'reason'. In other words, it is an instance of a virDomainEvent for an I/O error, likely from

  ioErrorEvent2 = virDomainEventIOErrorReasonNewFromObj(vm, srcPath, devAlias, action, reason);

in qemuHandleDomainIOError. This allocated object is put on the event queue with

  qemuDomainEventQueue(driver, ioErrorEvent2);

A short while later qemuDomainEventFlush runs and invokes

  virDomainEventQueueDispatch(&tempQueue, driver->domainEventCallbacks, qemuDomainEventDispatchFunc, driver);

which should iterate over all queued events, dispatch them, and then call virDomainEventFree(). The only way I can see this leaking is if the qemuDomainEventFlush method never gets run.
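To make the lifecycle described above concrete, here is a minimal sketch of the queue-then-flush pattern, assuming simplified stand-in types (struct event, struct event_queue) rather than libvirt's real virDomainEvent machinery; none of the names below are the actual libvirt symbols. The point it illustrates is that freeing only happens inside the flush, so if the flush never runs, every queued I/O-error event stays allocated.

#include <stdlib.h>

/* Simplified stand-ins for the libvirt structures; hypothetical, not the real code. */
struct event {
    char *path;        /* e.g. ".../vhost0729.img" */
    char *dev_alias;   /* e.g. "ide0-0-0" */
    int action;
    char *reason;      /* e.g. "eother" */
    struct event *next;
};

struct event_queue {
    struct event *head;
    struct event *tail;
};

/* Analogue of qemuDomainEventQueue(): hand the allocated event to the queue. */
static void queue_event(struct event_queue *q, struct event *ev)
{
    ev->next = NULL;
    if (q->tail)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
}

/* Analogue of virDomainEventFree(): release the strings and the event itself. */
static void free_event(struct event *ev)
{
    free(ev->path);
    free(ev->dev_alias);
    free(ev->reason);
    free(ev);
}

/*
 * Analogue of qemuDomainEventFlush() + virDomainEventQueueDispatch():
 * dispatch every queued event, then free it.  If this flush is never
 * scheduled, every queued I/O-error event leaks, which would produce
 * exactly the repeating path/devAlias/reason pattern seen in the core.
 */
static void flush_events(struct event_queue *q, void (*dispatch)(struct event *))
{
    struct event *ev = q->head;
    q->head = q->tail = NULL;
    while (ev) {
        struct event *next = ev->next;
        dispatch(ev);
        free_event(ev);
        ev = next;
    }
}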
Created attachment 463999 [details]
Patch to fix memory leak

This is untested and against upstream, but I think this is the source of the problem.
Proposed patch series for z-stream: http://post-office.corp.redhat.com/archives/rhvirt-patches/2010-December/msg00305.html
------- Comment From bnpoorni@in.ibm.com 2010-12-21 04:47 EDT -------
*** Bug 68847 has been marked as a duplicate of this bug. ***
Built into libvirt-0.8.7-1.el6
Verified. Please confirm whether the "LEAK SUMMARY" below is acceptable. I will continue to run the script and try to finish 36,000 cycles tonight.

-----------------
Test environment:
libvirt-0.8.7-1.el6
qemu-kvm-0.12.1.2-2.128.el6
kernel-2.6.32-94.el6

Steps:
1. install a domain named "rhel6-clone"
2. disable selinux: # setenforce 0
3. run the attached "libvirtd_memory_check.sh" script
4. check "libvirtd_memory_check.sh.log" after running 400 cycles; the leak is no longer found.

==21443== LEAK SUMMARY:
==21443==    definitely lost: 0 bytes in 0 blocks
==21443==    indirectly lost: 0 bytes in 0 blocks
==21443==      possibly lost: 349 bytes in 18 blocks
==21443==    still reachable: 2,540 bytes in 47 blocks
==21443==         suppressed: 0 bytes in 0 blocks
==21443== Rerun with --leak-check=full to see details of leaked memory
Created attachment 472740 [details]
libvirtd_memory_check.sh.log for libvirt-0.8.7-1.el6
*** Bug 583083 has been marked as a duplicate of this bug. ***
Created attachment 478866 [details]
leak test log for libvirt-0.8.7-6.el6

Retested with libvirt-0.8.7-6.el6.x86_64: PASS. Setting bug status to VERIFIED.

1. install a domain named "rhel6-clone"
2. disable selinux: # setenforce 0
3. run the attached "libvirtd_memory_check.sh" script
4. check "libvirtd_memory_check.sh.log" after running 36,000 cycles; the leak is no longer found.

==5593== LEAK SUMMARY:
==5593==    definitely lost: 0 bytes in 0 blocks
==5593==    indirectly lost: 0 bytes in 0 blocks
==5593==      possibly lost: 349 bytes in 18 blocks
==5593==    still reachable: 1,840 bytes in 39 blocks
==5593==         suppressed: 0 bytes in 0 blocks
==5593== Rerun with --leak-check=full to see details of leaked memory
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Starting and shutting down a domain led to a memory leak due to the memory buffer not being freed properly. With this update, starting and shutting down a domain no longer leads to a memory leak.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-Starting and shutting down a domain led to a memory leak due to the memory buffer not being freed properly. With this update, starting and shutting down a domain no longer leads to a memory leak.
+A memory buffer was not freed properly on domain startup and shutdown, which led to a memory leak that increased each time the domain was started or shut down. This update removes this memory leak.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html