Bug 836913
Summary: Assertion `c->entries[i].ref == 0' failed, also random segfaults, when qemu calls bdrv_close_all

Product: Fedora
Component: qemu
Version: 17
Status: CLOSED WONTFIX
Type: Bug
Keywords: Reopened
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Richard W.M. Jones <rjones>
Assignee: Fedora Virtualization Maintainers <virt-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: amit.shah, berrange, cfergeau, crobinso, dwmw2, itamar, knoel, kwolf, pbonzini, scottt.tw, virt-maint
Doc Type: Bug Fix
Clones: 837943 (view as bug list)
Bug Blocks: 837943
Last Closed: 2013-04-02 13:56:56 UTC
Description
Richard W.M. Jones 2012-07-02 07:19:55 UTC
Happened again with a slightly different stack trace:

```
*** glibc detected *** /bin/qemu-kvm: corrupted double-linked list: 0x00007fb6ac933c00 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x30e127d6ad)[0x7fb6a4ddb6ad]
/lib64/libc.so.6(+0x30e127e078)[0x7fb6a4ddc078]
/bin/qemu-kvm(+0x98a95)[0x7fb6aa65fa95]
/bin/qemu-kvm(+0x9e2ba)[0x7fb6aa6652ba]
/bin/qemu-kvm(+0x8d568)[0x7fb6aa654568]
/bin/qemu-kvm(+0x8d630)[0x7fb6aa654630]
/bin/qemu-kvm(main+0x1429)[0x7fb6aa63c9b9]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb6a4d7fba5]
/bin/qemu-kvm(+0x79299)[0x7fb6aa640299]
======= Memory map: ========
```

Please decode those addresses using addr2line or gdb.

I was trying all morning to capture a core file, and now I have finally reproduced the bug and got one:

```
Program terminated with signal 6, Aborted.
#0  0x00007fe8cd51bc85 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:63
63        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fe8cd51bc85 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:63
#1  0x00007fe8cd51d438 in __GI_abort () at abort.c:90
#2  0x00007fe8cd55b92b in __libc_message (do_abort=2,
    fmt=fmt@entry=0x7fe8cd662fd0 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:197
#3  0x00007fe8cd5626ad in malloc_printerr (ptr=0x7fe8d5b04a80,
    str=0x7fe8cd65fee7 "corrupted double-linked list", action=<optimized out>)
    at malloc.c:4969
#4  malloc_consolidate (av=av@entry=0x7fe8cd89d720) at malloc.c:4143
#5  0x00007fe8cd563078 in _int_free (av=0x7fe8cd89d720, p=0x7fe8d53d09f0,
    have_lock=0) at malloc.c:4043
#6  0x00007fe8d2de6a95 in qcow2_cache_destroy (bs=bs@entry=0x7fe8d53cea90,
    c=0x7fe8d53d0790) at block/qcow2-cache.c:70
#7  0x00007fe8d2dec2ba in qcow2_close (bs=0x7fe8d53cea90) at block/qcow2.c:628
#8  0x00007fe8d2ddb568 in bdrv_close (bs=0x7fe8d53cea90) at block.c:709
#9  0x00007fe8d2ddb630 in bdrv_close_all () at block.c:732
#10 0x00007fe8d2dc39b9 in main (argc=<optimized out>, argv=<optimized out>,
    envp=<optimized out>) at /usr/src/debug/qemu-kvm-1.0/vl.c:3529
(gdb) info threads
  Id   Target Id         Frame
  15   Thread 0x7fe8c525c700 (LWP 11782) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  14   Thread 0x7fe8951f7700 (LWP 11791) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  13   Thread 0x7fe8981fd700 (LWP 11785) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  12   Thread 0x7fe8979fc700 (LWP 11786) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  11   Thread 0x7fe8949f6700 (LWP 11792) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  10   Thread 0x7fe8971fb700 (LWP 11787) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  9    Thread 0x7fe8c6942700 (LWP 11780) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  8    Thread 0x7fe8c4a5b700 (LWP 11783) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  7    Thread 0x7fe8969fa700 (LWP 11788) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  6    Thread 0x7fe8961f9700 (LWP 11789) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  5    Thread 0x7fe88ffff700 (LWP 11793) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  4    Thread 0x7fe8989fe700 (LWP 11784) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  3    Thread 0x7fe8959f8700 (LWP 11790) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  2    Thread 0x7fe8c5f40700 (LWP 11781) 0x00007fe8cd5d2aa7 in ioctl () at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x7fe8d2d0fa00 (LWP 11777) 0x00007fe8cd51bc85 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:63
```

Interestingly, another core dump file shows a slightly different stack trace, although it's all in the same area of the code ...

```
Program terminated with signal 6, Aborted.
#0  0x00007fb95005dc85 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:63
63        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fb95005dc85 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:63
#1  0x00007fb95005f438 in __GI_abort () at abort.c:90
#2  0x00007fb950056a52 in __assert_fail_base (
    fmt=0x7fb9501a4468 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7fb955b373af "c->entries[i].ref == 0",
    file=file@entry=0x7fb955b3739b "block/qcow2-cache.c",
    line=line@entry=69,
    function=function@entry=0x7fb955b373e0 "qcow2_cache_destroy")
    at assert.c:92
#3  0x00007fb950056b02 in __GI___assert_fail (
    assertion=assertion@entry=0x7fb955b373af "c->entries[i].ref == 0",
    file=file@entry=0x7fb955b3739b "block/qcow2-cache.c",
    line=line@entry=69,
    function=function@entry=0x7fb955b373e0 "qcow2_cache_destroy")
    at assert.c:101
#4  0x00007fb955928ae7 in qcow2_cache_destroy (bs=bs@entry=0x7fb957cf7a90,
    c=0x7fb957cf9790) at block/qcow2-cache.c:69
#5  0x00007fb95592e2ba in qcow2_close (bs=0x7fb957cf7a90) at block/qcow2.c:628
#6  0x00007fb95591d568 in bdrv_close (bs=0x7fb957cf7a90) at block.c:709
#7  0x00007fb95591d630 in bdrv_close_all () at block.c:732
#8  0x00007fb9559059b9 in main (argc=<optimized out>, argv=<optimized out>,
    envp=<optimized out>) at /usr/src/debug/qemu-kvm-1.0/vl.c:3529
(gdb) info threads
  Id   Target Id         Frame
  18   Thread 0x7fb941317700 (LWP 5751) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  17   Thread 0x7fb942319700 (LWP 5749) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  16   Thread 0x7fb942b1a700 (LWP 5748) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  15   Thread 0x7fb94331b700 (LWP 5747) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  14   Thread 0x7fb91a1f9700 (LWP 5758) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  13   Thread 0x7fb9191f7700 (LWP 5760) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  12   Thread 0x7fb9199f8700 (LWP 5759) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  11   Thread 0x7fb91b9fc700 (LWP 5755) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  10   Thread 0x7fb91c1fd700 (LWP 5754) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  9    Thread 0x7fb940b16700 (LWP 5752) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  8    Thread 0x7fb949484700 (LWP 5744) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  7    Thread 0x7fb91b1fb700 (LWP 5756) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  6    Thread 0x7fb91a9fa700 (LWP 5757) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  5    Thread 0x7fb943b1c700 (LWP 5746) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  4    Thread 0x7fb948a82700 (LWP 5745) 0x00007fb950114aa7 in ioctl () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7fb941b18700 (LWP 5750) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
  2    Thread 0x7fb91c9fe700 (LWP 5753) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
* 1    Thread 0x7fb955851a00 (LWP 5741) 0x00007fb95005dc85 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:63
```

The assertion in comment 5 is in this code:

```c
int qcow2_cache_destroy(BlockDriverState *bs, Qcow2Cache *c)
{
    int i;

    for (i = 0; i < c->size; i++) {
        assert(c->entries[i].ref == 0);
        qemu_vfree(c->entries[i].table);
    }

    g_free(c->entries);
    g_free(c);

    return 0;
}
```

I don't have a value for 'i', but two of the 'entries' have non-zero 'ref':

```
(gdb) print *c
$3 = {entries = 0x7f22fd8c07b0, depends = 0x0, size = 16,
      depends_on_flush = false, writethrough = false}
(gdb) print c->entries[0]
$4 = {table = 0x7f22fd8c0a00, offset = 262144, dirty = false,
      cache_hits = 2964, ref = 1}
[...]
(gdb) print c->entries[2]
$6 = {table = 0x7f22fd8e0e00, offset = 351469568, dirty = false,
      cache_hits = 33, ref = 1}
```

The vcpu is still running, so it's unclear why qemu thinks it's safe to start destroying the qcow2 cache.

I don't know if this is definitively the problem, but these two lines in vl.c look the wrong way round to me:

```
    main_loop();
    bdrv_close_all();    <<
    pause_all_vcpus();   <<
    net_cleanup();
    res_free();
```

Possible duplicates: bug 798499, bug 798857, bug 805893.

I am now able to reproduce this bug almost at will:

```
$ cd /tmp
$ ulimit -Hc unlimited
$ ulimit -Sc unlimited
$ rm -f test.img
$ qemu-img create -f qcow2 test.img 100M
$ guestfish -a test.img run : part-disk /dev/sda gpt : copy-device-to-device /dev/zero /dev/sda1 size:98M
$ ls
core.23671  test.img
```

The core file isn't produced every time, but it's produced pretty often, maybe 1 in 3 runs. Simply repeat the qemu-img + guestfish commands until it happens. To get full debug information about what libguestfs is doing, add the -v and/or -x options to guestfish. But basically it appears to happen when we send SIGTERM to qemu.

Workaround posted for virt-resize:
https://www.redhat.com/archives/libguestfs/2012-July/msg00015.html

The underlying issue is fixed in qemu-kvm >= 1.1.0, which is being rolled out to Fedora Rawhide today. The workaround mentioned in comment 9 is not required and will not go upstream. The solution is to update qemu to >= 1.1.0.

*** Bug 837021 has been marked as a duplicate of this bug. ***

Reopening and moving this bug to F17. If this is a possible data loss issue, we should try and isolate something to backport (rebasing to 1.1 should be a last resort option IMO).

This should be the fix:

```
commit 7094f12f868fc1b75b105bcc676e15964dab77af
Author: Kevin Wolf <kwolf>
Date:   Wed Apr 11 11:06:37 2012 +0200

    block: Drain requests in bdrv_close
```

Hmm, that wasn't trivial to backport, particularly because some of the plumbing needed for bdrv_drain_all also needs to be backported, and anything non-trivial in this area scares me. Since F17 isn't long for this world, and we haven't seen reports from end users, just closing this wontfix.