Bug 866524
Summary: | use-after-free on virsh node-memory-tune | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Alex Jia <ajia> | ||||||
Component: | libvirt | Assignee: | Osier Yang <jyang> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 6.5 | CC: | acathrow, berrange, bili, dallan, dyasny, dyuan, eblake, mzhan, pkrempa | ||||||
Target Milestone: | rc | ||||||||
Target Release: | --- | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | libvirt-0.10.2-12.el6 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2013-02-21 07:10:07 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 888457 | ||||||||
Attachments: |
|
Description
Alex Jia
2012-10-15 14:44:52 UTC
Created attachment 627492 [details]
valgrind report
(In reply to comment #0) > Description of problem: > Valgrind complains use-after-free memory error when passed a invalid number > to virsh node-memory-tune. > > Version-Release number of selected component (if applicable): > # rpm -q libvirt > libvirt-0.10.2-3.el6.x86_64 > > How reproducible: > always > > Steps to Reproduce: > # valgrind -v virsh node-memory-tune --shm-sleep-millisecs -1 > > > Actual results: > # valgrind -v virsh node-memory-tune --shm-sleep-millisecs -1 > ==8062== Address 0x52e78a8 is 280 bytes inside a block of size 312 free'd ==8062== at 0x4A0595D: free (vg_replace_malloc.c:366) ==8062== by 0x4C83508: virFree (memory.c:309) ==8062== by 0x4C94E7C: virObjectUnref (virobject.c:145) ==8062== by 0x4D13FE7: virConnectClose (libvirt.c:1458) ==8062== by 0x40C868: vshDeinit (virsh.c:2571) ==8062== by 0x41069E: main (virsh.c:3009) Hm, I don't see these errors with both libvirt and libvirt-client are 0.10.2-3 (In reply to comment #3) > > Hm, I don't see these errors with both libvirt and libvirt-client are > 0.10.2-3 I just confirmed this, it's not always reproducible, you may ran valgrind serveral times and then I beleive you will see previous memory errors. Even tried 50 times, I can't see the error. (In reply to comment #5) > Even tried 50 times, I can't see the error. what's libvirt version you are trying? libvirt-0.10.2-3.el6.x86_64? BTW, the libvirt-0.10.2-5.el6.x86_64 is fine for me. (In reply to comment #6) > (In reply to comment #5) > > Even tried 50 times, I can't see the error. > > what's libvirt version you are trying? libvirt-0.10.2-3.el6.x86_64? > > BTW, the libvirt-0.10.2-5.el6.x86_64 is fine for me. I tried with the top of rhel git. And to be sure, I tried again with installing 0.10.2-3 (both libvirt and libvirt-client), can't reproduce either, am I too unlucky, or you are too lucky? I reproduced it on libvirt-0.10.2-3.el6.x86_64 again, but it isn't always reproducible, I restart libvirtd service then try 4 times to reproduce it: ==27860== 1 errors in context 1 of 9: ==27860== Invalid write of size 4 ==27860== at 0x38BAA0A5E5: pthread_mutex_unlock (pthread_mutex_unlock.c:46) ==27860== by 0x4D604B2: virNetClientCloseLocked (virnetclient.c:688) ==27860== by 0x4D62998: virNetClientIncomingEvent (virnetclient.c:1859) ==27860== by 0x4C7C6EE: virEventPollRunOnce (event_poll.c:485) ==27860== by 0x4C7B486: virEventRunDefaultImpl (event.c:247) ==27860== by 0x40CD11: vshEventLoop (virsh.c:2115) ==27860== by 0x4C8CD48: virThreadHelper (threads-pthread.c:161) ==27860== by 0x38BAA077F0: start_thread (pthread_create.c:301) ==27860== by 0x33F68E570C: clone (clone.S:115) ==27860== Address 0x52e5bb8 is 152 bytes inside a block of size 312 free'd ==27860== at 0x4A0595D: free (vg_replace_malloc.c:366) ==27860== by 0x4C83508: virFree (memory.c:309) ==27860== by 0x4C94E7C: virObjectUnref (virobject.c:145) ==27860== by 0x4D13FE7: virConnectClose (libvirt.c:1458) ==27860== by 0x40C868: vshDeinit (virsh.c:2571) ==27860== by 0x41069E: main (virsh.c:3009) (In reply to comment #8) > I reproduced it on libvirt-0.10.2-3.el6.x86_64 again, but it isn't always > reproducible, I restart libvirtd service then try 4 times to reproduce it: > I also reproduced on libvirt-0.10.2-5.el6.x86_64, a easy to reproduce method is to restart libvirtd service then run several times valgrind. Created attachment 636527 [details]
valgrind report
To reproduce it on libvirt-0.10.2-6.el6.x86_64.
(In reply to comment #8) > I reproduced it on libvirt-0.10.2-3.el6.x86_64 again, but it isn't always > reproducible, I restart libvirtd service then try 4 times to reproduce it: > > > ==27860== 1 errors in context 1 of 9: > ==27860== Invalid write of size 4 > ==27860== at 0x38BAA0A5E5: pthread_mutex_unlock > (pthread_mutex_unlock.c:46) > ==27860== by 0x4D604B2: virNetClientCloseLocked (virnetclient.c:688) > ==27860== by 0x4D62998: virNetClientIncomingEvent (virnetclient.c:1859) > ==27860== by 0x4C7C6EE: virEventPollRunOnce (event_poll.c:485) > ==27860== by 0x4C7B486: virEventRunDefaultImpl (event.c:247) > ==27860== by 0x40CD11: vshEventLoop (virsh.c:2115) > ==27860== by 0x4C8CD48: virThreadHelper (threads-pthread.c:161) > ==27860== by 0x38BAA077F0: start_thread (pthread_create.c:301) > ==27860== by 0x33F68E570C: clone (clone.S:115) > ==27860== Address 0x52e5bb8 is 152 bytes inside a block of size 312 free'd > ==27860== at 0x4A0595D: free (vg_replace_malloc.c:366) > ==27860== by 0x4C83508: virFree (memory.c:309) > ==27860== by 0x4C94E7C: virObjectUnref (virobject.c:145) > ==27860== by 0x4D13FE7: virConnectClose (libvirt.c:1458) > ==27860== by 0x40C868: vshDeinit (virsh.c:2571) > ==27860== by 0x41069E: main (virsh.c:3009) Thanks to compiler optization, this stack trace is somewhat incomplete and misleading. The pthread_mutex_unlock call in the first stack trace is actually being invoked by 'remoteClientCloseFunc', which is in turn called from 'virNetClientCloseLocked'. The mutex is the virConnectPtr mutex. The problem is that this is happening after the last reference to virConnectPtr has been released. The following ought to solve it diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 072d347..96bf0d8 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -1001,6 +1001,10 @@ doRemoteClose(virConnectPtr conn, struct private_data *priv) virObjectUnref(priv->tls); priv->tls = NULL; + virNetClientSetCloseCallback(priv->client, + NULL, + NULL, + NULL); virNetClientClose(priv->client); virObjectUnref(priv->client); priv->client = NULL; Though I have not verified, since I'm unable to reproduce the race condition. I don't believe this is at all specific to the 'node-memory-tune' command in virsh - it ought to happen on pretty much any virsh command. (In reply to comment #13) > (In reply to comment #8) > > I reproduced it on libvirt-0.10.2-3.el6.x86_64 again, but it isn't always > > reproducible, I restart libvirtd service then try 4 times to reproduce it: > > > > > > ==27860== 1 errors in context 1 of 9: > > ==27860== Invalid write of size 4 > > ==27860== at 0x38BAA0A5E5: pthread_mutex_unlock > > (pthread_mutex_unlock.c:46) > > ==27860== by 0x4D604B2: virNetClientCloseLocked (virnetclient.c:688) > > ==27860== by 0x4D62998: virNetClientIncomingEvent (virnetclient.c:1859) > > ==27860== by 0x4C7C6EE: virEventPollRunOnce (event_poll.c:485) > > ==27860== by 0x4C7B486: virEventRunDefaultImpl (event.c:247) > > ==27860== by 0x40CD11: vshEventLoop (virsh.c:2115) > > ==27860== by 0x4C8CD48: virThreadHelper (threads-pthread.c:161) > > ==27860== by 0x38BAA077F0: start_thread (pthread_create.c:301) > > ==27860== by 0x33F68E570C: clone (clone.S:115) > > ==27860== Address 0x52e5bb8 is 152 bytes inside a block of size 312 free'd > > ==27860== at 0x4A0595D: free (vg_replace_malloc.c:366) > > ==27860== by 0x4C83508: virFree (memory.c:309) > > ==27860== by 0x4C94E7C: virObjectUnref (virobject.c:145) > > ==27860== by 0x4D13FE7: virConnectClose (libvirt.c:1458) > > ==27860== by 0x40C868: vshDeinit (virsh.c:2571) > > ==27860== by 0x41069E: main (virsh.c:3009) > > Thanks to compiler optization, this stack trace is somewhat incomplete and > misleading. > > The pthread_mutex_unlock call in the first stack trace is actually being > invoked by 'remoteClientCloseFunc', which is in turn called from > 'virNetClientCloseLocked'. The mutex is the virConnectPtr mutex. Right, I tracked to here. > > The problem is that this is happening after the last reference to > virConnectPtr has been released. > > The following ought to solve it > > diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c > index 072d347..96bf0d8 100644 > --- a/src/remote/remote_driver.c > +++ b/src/remote/remote_driver.c > @@ -1001,6 +1001,10 @@ doRemoteClose(virConnectPtr conn, struct private_data > *priv) > > virObjectUnref(priv->tls); > priv->tls = NULL; > + virNetClientSetCloseCallback(priv->client, > + NULL, > + NULL, > + NULL); Thanks, Daniel, I believe this fixes the problem, but I think it doesn't fixes the root cause. I.E. Should we mutex the whole connection object when disposing it? Assuming the event thread get the lock before/during doRemoteClose, and do some tricks, doRemoteClose might drop into similar situation then too. > virNetClientClose(priv->client); > virObjectUnref(priv->client); > priv->client = NULL; > > > Though I have not verified, since I'm unable to reproduce the race > condition. I don't believe this is at all specific to the 'node-memory-tune' > command in virsh - it ought to happen on pretty much any virsh command. Right, the bug title is misleading. (In reply to comment #13) > Though I have not verified, since I'm unable to reproduce the race > condition. I don't believe this is at all specific to the 'node-memory-tune' > command in virsh - it ought to happen on pretty much any virsh command. Yes, I think so, however, as it happens, I met the issue when I ran virsh node-memory-tune command. Patches posted upstream: https://www.redhat.com/archives/libvir-list/2012-December/msg00165.html One is Dainel's method, one is to lock the whole virConnect object. (In reply to comment #16) > Patches posted upstream: > > https://www.redhat.com/archives/libvir-list/2012-December/msg00165.html Please ignore this wrong link, the right one is: https://www.redhat.com/archives/libvir-list/2012-December/msg00219.html > > One is Dainel's method, one is to lock the whole virConnect object. (In reply to comment #17) > (In reply to comment #16) > > Patches posted upstream: > > > > https://www.redhat.com/archives/libvir-list/2012-December/msg00165.html > > Please ignore this wrong link, the right one is: > > https://www.redhat.com/archives/libvir-list/2012-December/msg00219.html > > > > > One is Dainel's method, one is to lock the whole virConnect object. Committed into upstream. Commit ID: ce5a1f6f9 Verified it with libvirt-0.10.2-12.el6: # service libvirtd restart Stopping libvirtd daemon: [ OK ] Starting libvirtd daemon: [ OK ] Mnually run following command about 10 times: # valgrind -v virsh node-memory-tune --shm-sleep-millisecs -1 Can not see ERROR like: ==27860== 1 errors in context 1 of 9: ==27860== Invalid write of size 4 ==27860== at 0x38BAA0A5E5: pthread_mutex_unlock (pthread_mutex_unlock.c:46) ==27860== by 0x4D604B2: virNetClientCloseLocked (virnetclient.c:688) ==27860== by 0x4D62998: virNetClientIncomingEvent (virnetclient.c:1859) ==27860== by 0x4C7C6EE: virEventPollRunOnce (event_poll.c:485) ==27860== by 0x4C7B486: virEventRunDefaultImpl (event.c:247) ==27860== by 0x40CD11: vshEventLoop (virsh.c:2115) ==27860== by 0x4C8CD48: virThreadHelper (threads-pthread.c:161) ==27860== by 0x38BAA077F0: start_thread (pthread_create.c:301) ==27860== by 0x33F68E570C: clone (clone.S:115) ==27860== Address 0x52e5bb8 is 152 bytes inside a block of size 312 free'd ==27860== at 0x4A0595D: free (vg_replace_malloc.c:366) ==27860== by 0x4C83508: virFree (memory.c:309) ==27860== by 0x4C94E7C: virObjectUnref (virobject.c:145) ==27860== by 0x4D13FE7: virConnectClose (libvirt.c:1458) ==27860== by 0x40C868: vshDeinit (virsh.c:2571) ==27860== by 0x41069E: main (virsh.c:3009) So moving to VERIFIED. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0276.html |