| Summary: | virsh crashes at remoteStreamEventTimer when libvirtd restarts | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | motohiro.kanda.nx |
| Component: | libvirt | Assignee: | Laine Stump <laine> |
| Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.2 | CC: | crobinso, dallan, dyuan, mzhan, rwu, xen-maint |
| Target Milestone: | rc | ||
| Target Release: | 6.2 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-09-16 15:08:18 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
On remote/remote_driver.c remoteStreamEventTimer line 4145, the callback remoteStreamSend got send error and called remoteStreamRelease, which deallocated st->privateData. After that, remoteStreamEventTimer touched there. See Bug 716393 of fdstream.c. It looks that re-evaluating st->privateData after callback invocation will help in both cases. st will not be gone since you have the refcount. Much simpler than introducing closed flag to privateData and delay freeing, isnt'it? Another bug is, remoteStreamSend and others call remoteStreamRelease in case of I/O errors, which clears st->driver. Afterward, when the caller such as virConsoleShutdown calls virStreamEventRemoveCallback, it fails because it cannot find the driver. error : virStreamEventRemoveCallback:12988 : this function is not supported by the connection driver: So, remoteStreamEventRemoveCallback and virEventRemoveTimeout are not called for the stream. We have a stale entry in eventLoop.timeouts. One more. console.c virConsoleShutdown must set con->st to null, otherwise it is free'ed twice at line 364, end of vshRunConsole. There has been a lot of changes to this area of the code since this bug was found (See Bug 716393 and Bug 731673 for example). When I attempted to reproduce this problem with libvirt-0.9.4-11.el6, I couldn't cause any crash. The current behavior if libvirtd is restarted while a console is open in virsh - typing any key in the console window will just produce a virsh# prompt. At that point no matter what command you type at the prompt, you will receive this type of message: error: failed to get domain 'mydom' error: Cannot write data: Broken pipe error: Reconnected to the hypervisor After this, you can type another command (eg "console mydom") and it will execute correctly. I did this repeatedly, with the same results every time, and no crashes. Since this problem was never present in an official released build, but was only in interim test builds, I'm directly closing it. |
Description of problem: virsh crashes at remoteStreamEventTimer when libvirtd restarts while it has open console to a qemu guest. It looks eventLoop.timeouts[0] in virEventPollRunOnce has virStreamPtr, which has been deallocated due to stream I/O error. Version-Release number of selected component (if applicable): 0.9.2 How reproducible: Always Steps to Reproduce: 1. virsh # start domain1 --console and leave it open. 2. kill -TERM libvirtd and restart. 3. Type any key in virsh. You will get error: Reconnected to the hypervisor 4. Retry connecting to the console. virsh # console domain1 Actual results: virsh dumps a core. Expected results: Additional info: some mudflap messages and gdb session: ******* mudflap violation 1249 (check/read): time=1309140788.997410 ptr=0xc29260 size=96 pc=0x7f389d5dc7d9 location=`remote/remote_driver.c:4149:13 (remoteStreamEventTimer)' /usr/lib/libmudflapth.so.0(__mf_check+0x59) [0x7f389d5dc7d9] /home/kanda/lib/libvirt.so.0 [0x7f389f38de28] /home/kanda/lib/libvirt.so.0 [0x7f389f1ac1ff] Nearby object 1: checked region begins 1120B before and ends 1025B before mudflap object 0xc9b3c0: name=`malloc region' Nearby object 2: checked region begins 1065B after and ends 1160B after mudflap object 0xc5d830: name=`calloc region' bounds=[0xc28e10,0xc28e37] size=40 area=heap-init check=26r/1w liveness=27 alloc time=1309140750.591263 pc=0x7f389d5dbe69 thread=2680108992 /usr/lib/libmudflapth.so.0(__mf_register+0x59) [0x7f389d5dbe69] /usr/lib/libmudflapth.so.0(__wrap_calloc+0x11c) [0x7f389d5ddc6c] /home/kanda/lib/libvirt.so.0(virAlloc+0x23) [0x7f389f1c2ea3] /home/kanda/lib/libvirt.so.0(virGetStream+0x72) [0x7f389f302292] Nearby object 3: checked region begins 0B into and ends 95B into mudflap dead object 0xc25a90: name=`calloc region' number of nearby objects: 3 ... ******* mudflap violation 1298 (check/read): time=1309140797.630967 ptr=0xc28e10 size=16 pc=0x7f389d5dc7d9 location=`remote/remote_driver.c:4124:12 (remoteStreamEventTimer)' /usr/lib/libmudflapth.so.0(__mf_check+0x59) [0x7f389d5dc7d9] /home/kanda/lib/libvirt.so.0 [0x7f389f38de0d] /home/kanda/lib/libvirt.so.0 [0x7f389f1ac1ff] Nearby object 1: checked region begins 1925B after and ends 1940B after mudflap object 0xc42a60: name=`malloc region' bounds=[0xc28660,0xc2868b] size=44 area=heap check=0r/0w liveness=0 alloc time=1309140788.989845 pc=0x7f389d5dbe69 thread=2680108992 /usr/lib/libmudflapth.so.0(__mf_register+0x59) [0x7f389d5dbe69] /usr/lib/libmudflapth.so.0(__real_malloc+0xfe) [0x7f389d5dd68e] /lib/libc.so.6 [0x7f389c8f230a] /lib/libc.so.6 [0x7f389c8f25bd] Nearby object 2: checked region begins 2224B before and ends 2209B before mudflap object 0xc9b3c0: name=`malloc region' Nearby object 3: checked region begins 20B after and ends 35B after mudflap dead object 0xd69ae0: name=`malloc region' bounds=[0xc28de0,0xc28dfc] size=29 area=heap check=0r/0w liveness=0 alloc time=1309140797.627510 pc=0x7f389d5dbe69 thread=2680108992 /usr/lib/libmudflapth.so.0(__mf_register+0x59) [0x7f389d5dbe69] /usr/lib/libmudflapth.so.0(__real_malloc+0xfe) [0x7f389d5dd68e] /lib/libc.so.6(vasprintf+0x150) [0x7f389c92ff80] /home/kanda/lib/libvirt.so.0(virVasprintf+0x1e) [0x7f389f1e976e] dealloc time=1309140797.627755 pc=0x7f389d5db92c thread=2680108992 /usr/lib/libmudflapth.so.0(__mf_unregister+0x4c) [0x7f389d5db92c] /usr/lib/libmudflapth.so.0(free+0xc6) [0x7f389d5dde46] /home/kanda/lib/libvirt.so.0(virFree+0xc2) [0x7f389f1c2b82] /home/kanda/lib/libvirt.so.0(virLogMessage+0x34d) [0x7f389f1c074d] number of nearby objects: 3 ******* mudflap violation 1299 (check/read): time=1309140797.631177 ptr=0x4 size=80 pc=0x7f389d5dc7d9 location=`remote/remote_driver.c:4124:12 (remoteStreamEventTimer)' /usr/lib/libmudflapth.so.0(__mf_check+0x59) [0x7f389d5dc7d9] /home/kanda/lib/libvirt.so.0 [0x7f389f38ddf1] /home/kanda/lib/libvirt.so.0 [0x7f389f1ac1ff] Nearby object 1: checked region begins 4B after and ends 83B after mudflap object 0x9832b0: name=`NULL' bounds=[(nil),(nil)] size=1 area=no-access check=0r/0w liveness=0 alloc time=1309140722.365692 pc=0x7f389d5dbe69 thread=2680108992 number of nearby objects: 1 Segmentation fault (core dumped) ... Core was generated by `bin/virsh -c qemu:///session'. Program terminated with signal 11, Segmentation fault. [New process 4971] #0 remoteStreamEventTimer (timer=1, opaque=0xc28e10) at remote/remote_driver.c:4124 4124 struct private_data *priv = st->conn->privateData; (gdb) bt #0 remoteStreamEventTimer (timer=1, opaque=0xc28e10) at remote/remote_driver.c:4124 #1 0x00007f389f1ac1ff in virEventPollRunOnce () at util/event_poll.c:433 #2 0x00007f389f1a9bb5 in virEventRunDefaultImpl () at util/event.c:191 #3 0x0000000000409ce5 in vshRunConsole (dom=0xc6c4a0, devname=0x0) at console.c:354 #4 0x000000000042c713 in cmdRunConsole (ctl=0x7fff6b4ecd80, dom=0xc6c4a0, name=0x0) at virsh.c:789 #5 0x000000000042caae in cmdConsole (ctl=0x7fff6b4ecd80, cmd=0xc88fc0) at virsh.c:815 #6 0x000000000042725f in vshCommandRun (ctl=0x7fff6b4ecd80, cmd=0xc88fc0) at virsh.c:12140 #7 0x000000000043a3aa in main (argc=<value optimized out>, argv=<value optimized out>) at virsh.c:13497 (gdb) l 4119 4120 static void 4121 remoteStreamEventTimer(int timer ATTRIBUTE_UNUSED, void *opaque) 4122 { 4123 virStreamPtr st = opaque; 4124 struct private_data *priv = st->conn->privateData; 4125 struct private_stream_data *privst = st->privateData; 4126 int events = 0; 4127 4128 remoteDriverLock(priv); (gdb) p *(virStreamPtr)0xc28e10 $2 = {magic = 13215456, conn = 0x4, refs = 112, flags = 0, driver = 0x30, privateData = 0xc67e00} (gdb) x/8gx 0xc28e10 0xc28e10: 0x0000000000c9a6e0 0x0000000000000004 0xc28e20: 0x0000000000000070 0x0000000000000030 0xc28e30: 0x0000000000c67e00 0x0000000000c8bdb0 0xc28e40: 0x0000000000000000 0x0000000000c4f190 (gdb) frame 1 #1 0x00007f389f1ac1ff in virEventPollRunOnce () at util/event_poll.c:433 433 (cb)(timer, opaque); (gdb) info locals fds = (struct pollfd *) 0xc256d0 ret = 0 timeout = 0 nfds = 3 __func__ = "virEventPollRunOnce" __FUNCTION__ = "virEventPollRunOnce" (gdb) p eventLoop.timeouts[0] $1 = {timer = 2, frequency = 0, expiresAt = 1309140797630, cb = 0x7f389f38d820 <remoteStreamEventTimer>, ff = 0x7f389f38f510 <remoteStreamEventTimerFree>, opaque = 0xc28e10, deleted = 0} (gdb) p eventLoop.timeouts[1] $2 = {timer = 3, frequency = -1, expiresAt = 0, cb = 0x7f389f38f730 <remoteDomainEventQueueFlush>, ff = 0, opaque = 0xc7f070, deleted = 0} part of libvirt debug log free'ing this stream: 11:13:08.991: 4971: error : remoteIOWriteBuffer:4920 : cannot send data: Broken pipe 11:13:08.992: 4971: debug : remoteIOEventLoop:5745 : Giving up the buck due to I/O error 201 0xd952a0 (nil) 11:13:08.992: 4971: debug : virStreamEventRemoveCallback:12969 : stream=0xc28e10 11:13:08.993: 4971: error : virStreamEventRemoveCallback:12988 : this function is not supported by the connection driver: virStreamEventRemoveCallback 11:13:08.994: 4971: debug : virStreamFree:13095 : stream=0xc28e10 11:13:08.994: 4971: debug : virUnrefStream:1124 : unref stream 0xc28e10 2 11:13:08.994: 4971: debug : virEventPollRemoveHandle:171 : Remove handle w=3 11:13:08.995: 4971: debug : virEventPollRemoveHandle:184 : mark delete 1 0 11:13:08.995: 4971: debug : virEventPollInterruptLocked:686 : Skip interrupt, 1 -1614858304 11:13:08.995: 4971: debug : virEventPollRemoveHandle:171 : Remove handle w=4 11:13:08.996: 4971: debug : virEventPollRemoveHandle:184 : mark delete 2 1 11:13:08.996: 4971: debug : virEventPollInterruptLocked:686 : Skip interrupt, 1 -1614858304 11:13:08.997: 4971: debug : virEventPollCleanupTimeouts:498 : Cleanup 2 11:13:08.998: 4971: debug : virEventPollCleanupHandles:545 : Cleanup 3 11:13:08.998: 4971: debug : virStreamFree:13095 : stream=0xc28e10 11:13:08.998: 4971: debug : virUnrefStream:1124 : unref stream 0xc28e10 1 11:13:08.999: 4971: debug : virReleaseStream:1104 : release dev 0xc28e10 11:13:08.999: 4971: debug : virReleaseStream:1109 : unref connection 0xc7f070 3 11:13:08.999: 4971: debug : virDomainFree:2117 : dom=0xc6c4a0, (VM: name=kanda2, uuid=4b597033-4e6e-a5f6-e4a6-69f1121ac754), 11:13:09.000: 4971: debug : virUnrefDomain:276 : unref domain 0xc6c4a0 kanda2 1 11:13:09.000: 4971: debug : virReleaseDomain:238 : release domain 0xc6c4a0 kanda2 4b597033-4e6e-a5f6-e4a6-69f1121ac754 11:13:09.001: 4971: debug : virReleaseDomain:246 : unref connection 0xc7f070 2 11:13:14.401: 4971: debug : virDomainLookupByName:2029 : conn=0xc7f070, name=kanda2 11:13:14.403: 4971: debug : remoteIO:5798 : Do proc=23 serial=8 length=40 wait=(nil) 11:13:14.403: 4971: debug : remoteIO:5870 : We have the buck 23 0xdd6ad0 0xdd6ad0 11:13:14.404: 4971: error : remoteIOWriteBuffer:4920 : cannot send data: Broken pipe 11:13:14.404: 4971: debug : remoteIOEventLoop:5745 : Giving up the buck due to I/O error 23 0xdd6ad0 (nil) 11:13:14.405: 4971: debug : virConnectClose:1353 : conn=0xc7f070 11:13:14.406: 4971: debug : virUnrefConnect:145 : unref connection 0xc7f070 1 11:13:14.406: 4971: debug : virReleaseConnect:94 : release connection 0xc7f070 11:13:14.406: 4971: debug : virEventPollRemoveTimeout:278 : Remove timer 1 11:13:14.407: 4971: debug : virEventPollInterruptLocked:686 : Skip interrupt, 0 -1614858304 11:13:14.407: 4971: debug : virEventPollRemoveHandle:171 : Remove handle w=-1 11:13:14.407: 4971: warning : virEventPollRemoveHandle:174 : Ignoring invalid remove watch -1 11:13:14.408: 4971: debug : remoteIO:5798 : Do proc=2 serial=9 length=28 wait=(nil) 11:13:14.409: 4971: debug : remoteIO:5870 : We have the buck 2 0xdd6520 0xdd6520 11:13:14.409: 4971: error : remoteIOWriteBuffer:4920 : cannot send data: Broken pipe 11:13:14.410: 4971: debug : remoteIOEventLoop:5745 : Giving up the buck due to I/O error 2 0xdd6520 (nil) 11:13:14.410: 4971: debug : virConnectOpenAuth:1327 : name=qemu:///session, auth=0x7f389faf7a20, flags=0 11:13:14.411: 4971: debug : do_open:1062 : name "qemu:///session" to URI components: