Bug 866524 - use-after-free on virsh node-memory-tune
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
6.5
x86_64 Linux
Priority: high  Severity: high
: rc
: ---
Assigned To: Osier Yang
Virtualization Bugs
:
Depends On:
Blocks: 888457
Reported: 2012-10-15 10:44 EDT by Alex Jia
Modified: 2013-02-21 02:10 EST (History)
9 users

See Also:
Fixed In Version: libvirt-0.10.2-12.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 02:10:07 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
valgrind report (10.03 KB, text/plain)
2012-10-15 10:45 EDT, Alex Jia
no flags
valgrind report (37.25 KB, text/plain)
2012-11-01 05:47 EDT, Alex Jia
no flags

Description Alex Jia 2012-10-15 10:44:52 EDT
Description of problem:
Valgrind reports a use-after-free memory error when an invalid number is passed to virsh node-memory-tune.

Version-Release number of selected component (if applicable):
# rpm -q libvirt
libvirt-0.10.2-3.el6.x86_64

How reproducible:
always

Steps to Reproduce:
# valgrind -v virsh node-memory-tune --shm-sleep-millisecs -1

  
Actual results:
# valgrind -v virsh node-memory-tune --shm-sleep-millisecs -1
                                                                   
error: invalid shm-sleep-millisecs number



Expected results:
Please see the attachment.

Additional info:
Comment 1 Alex Jia 2012-10-15 10:45:27 EDT
Created attachment 627492 [details]
valgrind report
Comment 3 Osier Yang 2012-10-16 06:43:13 EDT
(In reply to comment #0)
> Description of problem:
> Valgrind complains use-after-free memory error when passed a invalid number
> to virsh node-memory-tune.
> 
> Version-Release number of selected component (if applicable):
> # rpm -q libvirt
> libvirt-0.10.2-3.el6.x86_64
> 
> How reproducible:
> always
> 
> Steps to Reproduce:
> # valgrind -v virsh node-memory-tune --shm-sleep-millisecs -1
> 
>   
> Actual results:
> # valgrind -v virsh node-memory-tune --shm-sleep-millisecs -1
>                                                                    

==8062==  Address 0x52e78a8 is 280 bytes inside a block of size 312 free'd
==8062==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==8062==    by 0x4C83508: virFree (memory.c:309)
==8062==    by 0x4C94E7C: virObjectUnref (virobject.c:145)
==8062==    by 0x4D13FE7: virConnectClose (libvirt.c:1458)
==8062==    by 0x40C868: vshDeinit (virsh.c:2571)
==8062==    by 0x41069E: main (virsh.c:3009)

Hm, I don't see these errors with both libvirt and libvirt-client at 0.10.2-3.
Comment 4 Alex Jia 2012-10-16 07:32:41 EDT
(In reply to comment #3)
> 
> Hm, I don't see these errors with both libvirt and libvirt-client are
> 0.10.2-3

I just confirmed this; it's not always reproducible. If you run valgrind several times, I believe you will see the previous memory errors.
Comment 5 Osier Yang 2012-10-24 06:57:55 EDT
Even after trying 50 times, I can't see the error.
Comment 6 Alex Jia 2012-10-24 08:56:25 EDT
(In reply to comment #5)
> Even tried 50 times, I can't see the error.

What's the libvirt version you are trying? libvirt-0.10.2-3.el6.x86_64?

BTW, libvirt-0.10.2-5.el6.x86_64 is fine for me.
Comment 7 Osier Yang 2012-10-24 09:20:57 EDT
(In reply to comment #6)
> (In reply to comment #5)
> > Even tried 50 times, I can't see the error.
> 
> what's libvirt version you are trying? libvirt-0.10.2-3.el6.x86_64?
> 
> BTW, the libvirt-0.10.2-5.el6.x86_64 is fine for me.

I tried with the top of the RHEL git tree. And to be sure, I tried again after installing
0.10.2-3 (both libvirt and libvirt-client), but I can't reproduce it either. Am I too
unlucky, or are you too lucky?
Comment 8 Alex Jia 2012-10-24 10:34:34 EDT
I reproduced it on libvirt-0.10.2-3.el6.x86_64 again, but it isn't always reproducible; I restarted the libvirtd service and then tried 4 times to reproduce it:


==27860== 1 errors in context 1 of 9:
==27860== Invalid write of size 4
==27860==    at 0x38BAA0A5E5: pthread_mutex_unlock (pthread_mutex_unlock.c:46)
==27860==    by 0x4D604B2: virNetClientCloseLocked (virnetclient.c:688)
==27860==    by 0x4D62998: virNetClientIncomingEvent (virnetclient.c:1859)
==27860==    by 0x4C7C6EE: virEventPollRunOnce (event_poll.c:485)
==27860==    by 0x4C7B486: virEventRunDefaultImpl (event.c:247)
==27860==    by 0x40CD11: vshEventLoop (virsh.c:2115)
==27860==    by 0x4C8CD48: virThreadHelper (threads-pthread.c:161)
==27860==    by 0x38BAA077F0: start_thread (pthread_create.c:301)
==27860==    by 0x33F68E570C: clone (clone.S:115)
==27860==  Address 0x52e5bb8 is 152 bytes inside a block of size 312 free'd
==27860==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==27860==    by 0x4C83508: virFree (memory.c:309)
==27860==    by 0x4C94E7C: virObjectUnref (virobject.c:145)
==27860==    by 0x4D13FE7: virConnectClose (libvirt.c:1458)
==27860==    by 0x40C868: vshDeinit (virsh.c:2571)
==27860==    by 0x41069E: main (virsh.c:3009)
Comment 9 Alex Jia 2012-10-24 10:41:00 EDT
(In reply to comment #8)
> I reproduced it on libvirt-0.10.2-3.el6.x86_64 again, but it isn't always
> reproducible, I restart libvirtd service then try 4 times to reproduce it:
> 

I also reproduced it on libvirt-0.10.2-5.el6.x86_64. An easy way to reproduce it is to restart the libvirtd service and then run valgrind several times.
Comment 10 Alex Jia 2012-11-01 05:47:18 EDT
Created attachment 636527 [details]
valgrind report

Reproduced on libvirt-0.10.2-6.el6.x86_64.
Comment 13 Daniel Berrange 2012-12-04 05:42:51 EST
(In reply to comment #8)
> I reproduced it on libvirt-0.10.2-3.el6.x86_64 again, but it isn't always
> reproducible, I restart libvirtd service then try 4 times to reproduce it:
> 
> 
> ==27860== 1 errors in context 1 of 9:
> ==27860== Invalid write of size 4
> ==27860==    at 0x38BAA0A5E5: pthread_mutex_unlock
> (pthread_mutex_unlock.c:46)
> ==27860==    by 0x4D604B2: virNetClientCloseLocked (virnetclient.c:688)
> ==27860==    by 0x4D62998: virNetClientIncomingEvent (virnetclient.c:1859)
> ==27860==    by 0x4C7C6EE: virEventPollRunOnce (event_poll.c:485)
> ==27860==    by 0x4C7B486: virEventRunDefaultImpl (event.c:247)
> ==27860==    by 0x40CD11: vshEventLoop (virsh.c:2115)
> ==27860==    by 0x4C8CD48: virThreadHelper (threads-pthread.c:161)
> ==27860==    by 0x38BAA077F0: start_thread (pthread_create.c:301)
> ==27860==    by 0x33F68E570C: clone (clone.S:115)
> ==27860==  Address 0x52e5bb8 is 152 bytes inside a block of size 312 free'd
> ==27860==    at 0x4A0595D: free (vg_replace_malloc.c:366)
> ==27860==    by 0x4C83508: virFree (memory.c:309)
> ==27860==    by 0x4C94E7C: virObjectUnref (virobject.c:145)
> ==27860==    by 0x4D13FE7: virConnectClose (libvirt.c:1458)
> ==27860==    by 0x40C868: vshDeinit (virsh.c:2571)
> ==27860==    by 0x41069E: main (virsh.c:3009)

Thanks to compiler optimization, this stack trace is somewhat incomplete and misleading.

The pthread_mutex_unlock call in the first stack trace is actually being invoked by 'remoteClientCloseFunc', which is in turn called from 'virNetClientCloseLocked'. The mutex is the virConnectPtr mutex.

The problem is that this is happening after the last reference to virConnectPtr has been released.

The following ought to solve it

diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c
index 072d347..96bf0d8 100644
--- a/src/remote/remote_driver.c
+++ b/src/remote/remote_driver.c
@@ -1001,6 +1001,10 @@ doRemoteClose(virConnectPtr conn, struct private_data *priv)
 
     virObjectUnref(priv->tls);
     priv->tls = NULL;
+    virNetClientSetCloseCallback(priv->client,
+                                 NULL,
+                                 NULL,
+                                 NULL);
     virNetClientClose(priv->client);
     virObjectUnref(priv->client);
     priv->client = NULL;


Though I have not verified it, since I'm unable to reproduce the race condition. I don't believe this is at all specific to the 'node-memory-tune' command in virsh; it ought to happen on pretty much any virsh command.
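The failure mode described here, a close callback whose opaque pointer is the connection object firing after that object has been freed, and why nulling the callback before teardown helps, can be sketched in miniature. All names below are illustrative stand-ins, not libvirt's actual API:

```c
#include <assert.h>
#include <stdlib.h>

/* A client holds a close callback plus an opaque pointer at the
 * connection. If the connection is freed first and a late close event
 * arrives, the callback would touch freed memory (the use-after-free
 * Valgrind reported). Clearing the callback first breaks that path. */

typedef void (*close_cb)(void *opaque);

struct client {
    close_cb cb;
    void *opaque;           /* points at the connection */
};

struct conn {
    int closed;
};

/* Would be called from the event loop when the socket closes. */
static void client_emit_close(struct client *c)
{
    if (c->cb)
        c->cb(c->opaque);
}

static void conn_close_func(void *opaque)
{
    struct conn *conn = opaque;
    conn->closed = 1;       /* touches the connection: unsafe once freed */
}

/* Mirrors the shape of the fix above: drop the callback before the
 * connection can go away, the moral equivalent of calling
 * virNetClientSetCloseCallback(client, NULL, NULL, NULL). */
static void safe_teardown(struct client *c)
{
    c->cb = NULL;
    c->opaque = NULL;
    /* ... then close and unref the client; a straggling close event
     * now finds no callback and never dereferences the dead conn. */
}
```

With teardown done in this order, a close event delivered after the connection is freed is simply a no-op instead of a write into freed memory.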
Comment 14 Osier Yang 2012-12-04 11:30:59 EST
(In reply to comment #13)
> (In reply to comment #8)
> > I reproduced it on libvirt-0.10.2-3.el6.x86_64 again, but it isn't always
> > reproducible, I restart libvirtd service then try 4 times to reproduce it:
> > 
> > 
> > ==27860== 1 errors in context 1 of 9:
> > ==27860== Invalid write of size 4
> > ==27860==    at 0x38BAA0A5E5: pthread_mutex_unlock
> > (pthread_mutex_unlock.c:46)
> > ==27860==    by 0x4D604B2: virNetClientCloseLocked (virnetclient.c:688)
> > ==27860==    by 0x4D62998: virNetClientIncomingEvent (virnetclient.c:1859)
> > ==27860==    by 0x4C7C6EE: virEventPollRunOnce (event_poll.c:485)
> > ==27860==    by 0x4C7B486: virEventRunDefaultImpl (event.c:247)
> > ==27860==    by 0x40CD11: vshEventLoop (virsh.c:2115)
> > ==27860==    by 0x4C8CD48: virThreadHelper (threads-pthread.c:161)
> > ==27860==    by 0x38BAA077F0: start_thread (pthread_create.c:301)
> > ==27860==    by 0x33F68E570C: clone (clone.S:115)
> > ==27860==  Address 0x52e5bb8 is 152 bytes inside a block of size 312 free'd
> > ==27860==    at 0x4A0595D: free (vg_replace_malloc.c:366)
> > ==27860==    by 0x4C83508: virFree (memory.c:309)
> > ==27860==    by 0x4C94E7C: virObjectUnref (virobject.c:145)
> > ==27860==    by 0x4D13FE7: virConnectClose (libvirt.c:1458)
> > ==27860==    by 0x40C868: vshDeinit (virsh.c:2571)
> > ==27860==    by 0x41069E: main (virsh.c:3009)
> 
> Thanks to compiler optization, this stack trace is somewhat incomplete and
> misleading.
> 
> The pthread_mutex_unlock call in the first stack trace is actually being
> invoked by 'remoteClientCloseFunc', which is in turn called from
> 'virNetClientCloseLocked'. The mutex is the virConnectPtr mutex.

Right, I tracked to here.

> 
> The problem is that this is happening after the last reference to
> virConnectPtr has been released.
> 
> The following ought to solve it
> 
> diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c
> index 072d347..96bf0d8 100644
> --- a/src/remote/remote_driver.c
> +++ b/src/remote/remote_driver.c
> @@ -1001,6 +1001,10 @@ doRemoteClose(virConnectPtr conn, struct private_data
> *priv)
>  
>      virObjectUnref(priv->tls);
>      priv->tls = NULL;
> +    virNetClientSetCloseCallback(priv->client,
> +                                 NULL,
> +                                 NULL,
> +                                 NULL);

Thanks, Daniel, I believe this fixes the problem, but I don't think it
fixes the root cause. I.e., should we lock the whole connection object
when disposing of it? If the event thread gets the lock before/during
doRemoteClose and does some work, doRemoteClose might run into a
similar situation too.


>      virNetClientClose(priv->client);
>      virObjectUnref(priv->client);
>      priv->client = NULL;
> 
> 
> Though I have not verified, since I'm unable to reproduce the race
> condition. I don't believe this is at all specific to the 'node-memory-tune'
> command in virsh - it ought to happen on pretty much any virsh command.

Right, the bug title is misleading.
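The alternative raised in this comment, serializing dispose and event delivery on the object's own mutex so a close event can never act on a connection that is mid-teardown, might look roughly like the following. This is a hypothetical sketch under invented names, not libvirt code, and it only illustrates the locking discipline (the object's memory must still outlive any thread that can take the lock):

```c
#include <pthread.h>

/* Illustrative stand-in for a connection guarded by its own mutex. */
struct guarded_conn {
    pthread_mutex_t lock;
    int disposed;               /* set once teardown has begun */
    int close_events_seen;      /* stands in for "callback ran" */
};

/* Event-thread path: acts only under the lock, and only while the
 * object is still live. */
static void deliver_close_event(struct guarded_conn *c)
{
    pthread_mutex_lock(&c->lock);
    if (!c->disposed)
        c->close_events_seen++;
    pthread_mutex_unlock(&c->lock);
}

/* Dispose path: marks the object dead under the same lock, so any
 * event arriving afterwards is a no-op rather than a use-after-free. */
static void dispose_conn(struct guarded_conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->disposed = 1;
    pthread_mutex_unlock(&c->lock);
}
```

The design point is that both paths contend for the same lock, so an event that loses the race with dispose observes `disposed` and backs off, which is the stronger guarantee being asked for here beyond just clearing the callback.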
Comment 15 Alex Jia 2012-12-04 21:31:10 EST
(In reply to comment #13)
> Though I have not verified, since I'm unable to reproduce the race
> condition. I don't believe this is at all specific to the 'node-memory-tune'
> command in virsh - it ought to happen on pretty much any virsh command.

Yes, I think so; however, as it happens, I hit the issue when running the virsh node-memory-tune command.
Comment 16 Osier Yang 2012-12-05 10:11:51 EST
Patches posted upstream:

https://www.redhat.com/archives/libvir-list/2012-December/msg00165.html

One is Daniel's method; the other is to lock the whole virConnect object.
Comment 17 Osier Yang 2012-12-05 11:16:54 EST
(In reply to comment #16)
> Patches posted upstream:
> 
> https://www.redhat.com/archives/libvir-list/2012-December/msg00165.html

Please ignore that wrong link; the right one is:

https://www.redhat.com/archives/libvir-list/2012-December/msg00219.html

> 
> One is Dainel's method, one is to lock the whole virConnect object.
Comment 18 Osier Yang 2012-12-05 11:48:33 EST
(In reply to comment #17)
> (In reply to comment #16)
> > Patches posted upstream:
> > 
> > https://www.redhat.com/archives/libvir-list/2012-December/msg00165.html
> 
> Please ignore this wrong link, the right one is:
> 
> https://www.redhat.com/archives/libvir-list/2012-December/msg00219.html
> 
> > 
> > One is Dainel's method, one is to lock the whole virConnect object.

Committed into upstream. Commit ID: ce5a1f6f9
Comment 22 EricLee 2012-12-14 03:26:42 EST
Verified it with libvirt-0.10.2-12.el6:

# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

Manually ran the following command about 10 times:
# valgrind -v virsh node-memory-tune --shm-sleep-millisecs -1

Cannot see errors like:
==27860== 1 errors in context 1 of 9:
==27860== Invalid write of size 4
==27860==    at 0x38BAA0A5E5: pthread_mutex_unlock (pthread_mutex_unlock.c:46)
==27860==    by 0x4D604B2: virNetClientCloseLocked (virnetclient.c:688)
==27860==    by 0x4D62998: virNetClientIncomingEvent (virnetclient.c:1859)
==27860==    by 0x4C7C6EE: virEventPollRunOnce (event_poll.c:485)
==27860==    by 0x4C7B486: virEventRunDefaultImpl (event.c:247)
==27860==    by 0x40CD11: vshEventLoop (virsh.c:2115)
==27860==    by 0x4C8CD48: virThreadHelper (threads-pthread.c:161)
==27860==    by 0x38BAA077F0: start_thread (pthread_create.c:301)
==27860==    by 0x33F68E570C: clone (clone.S:115)
==27860==  Address 0x52e5bb8 is 152 bytes inside a block of size 312 free'd
==27860==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==27860==    by 0x4C83508: virFree (memory.c:309)
==27860==    by 0x4C94E7C: virObjectUnref (virobject.c:145)
==27860==    by 0x4D13FE7: virConnectClose (libvirt.c:1458)
==27860==    by 0x40C868: vshDeinit (virsh.c:2571)
==27860==    by 0x41069E: main (virsh.c:3009)

So moving to VERIFIED.
Comment 23 errata-xmlrpc 2013-02-21 02:10:07 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html
