Bug 1253375 - crash in main_channel_client_is_network_info_initialized when playing youtube video spanning more monitors
crash in main_channel_client_is_network_info_initialized when playing youtube...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: spice (Show other bugs)
7.2
Unspecified Unspecified
high Severity high
: rc
: ---
Assigned To: Frediano Ziglio
SPICE QE bug list
: ZStream
: 1253276 (view as bug list)
Depends On:
Blocks: 1361946 1364356
  Show dependency treegraph
 
Reported: 2015-08-13 11:04 EDT by David Jaša
Modified: 2016-11-03 23:42 EDT (History)
13 users (show)

See Also:
Fixed In Version: spice-0.12.4-17.el7
Doc Type: Bug Fix
Doc Text:
Due to a race condition, using the SPICE console to view video from certain sources, such as YouTube pages, and stretching the video over multiple monitors caused SPICE to terminate unexpectedly. This update ensures that thread-safe reference counting is used in the described scenario, which avoids the race condition and thus prevents the crashes from occurring.
Story Points: ---
Clone Of:
: 1361946 1364356 (view as bug list)
Environment:
Last Closed: 2016-11-03 23:42:53 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
full backtrace (3.88 KB, text/plain)
2015-08-13 11:04 EDT, David Jaša
no flags Details
Fix thread safety for red_client_unref (678 bytes, patch)
2016-04-12 08:31 EDT, Frediano Ziglio
no flags Details | Diff

  None (edit)
Description David Jaša 2015-08-13 11:04:30 EDT
Created attachment 1062681 [details]
full backtrace

Description of problem:
crash in main_channel_client_is_network_info_initialized when playing youtube video spanning more monitors
related bugs:
  * likely Fedora duplicate: bug 1014034
  * also happens with the same steps: bug 1253276
  * was trying to reproduce: bug 1235442 (-3)

Version-Release number of selected component (if applicable):
spice-server-0.12.4-13.el7.x86_64
qemu-kvm-1.5.3-100.el7.x86_64

How reproducible:
frequent

Steps to Reproduce:
1. connect to a multi-monitor Windows VM (in my case: 4 large monitors in a matrix)
2. play a youtube video in Firefox
3. position Firefox window so that the video is played on > 1 monitors (all 4 when placed to the "crosshair" in my case)
4. if nothing happens, try to resize the Firefox window

Actual results:
during 3. or 4., I'm getting this crash or bug 1253276.

Expected results:
video keeps playing

Additional info:
  * the video is quite sluggish, sound is choppy
  * backtrace is:
Thread 17 (Thread 0x7f198c5ff700 (LWP 19518)):
#0  0x00007f1a010ef652 in main_channel_client_is_network_info_initialized (mcc=mcc@entry=0x1000) at main_channel.c:1165
#1  0x00007f1a01102b59 in red_stream_get_initial_bit_rate (stream=0x7f1a0a29cb70, stream=0x7f1a0a29cb70, dcc=0x7f1a0e8f0000) at red_worker.c:2975
#2  red_display_create_stream (dcc=dcc@entry=0x7f1a0e8f0000, stream=stream@entry=0x7f1a0a29cb70) at red_worker.c:3102
#3  0x00007f1a0111213d in red_create_stream (drawable=<optimized out>, worker=0x7f1a0a0c4000) at red_worker.c:3179
#4  red_stream_add_frame (worker=0x7f1a0a0c4000, frame_drawable=<optimized out>, frames_count=<optimized out>, gradual_frames_count=<optimized out>, 
    last_gradual_frame=<optimized out>) at red_worker.c:3460
#5  0x00007f1a01115ecc in red_current_add_equal (other=0x7f1a0a2740d0, item=0x7f1a0a275b60, worker=0x7f1a0a0c4000) at red_worker.c:3535
#6  red_current_add (drawable=0x7f1a0a275b38, ring=<optimized out>, worker=0x7f1a0a0c4000) at red_worker.c:3716
#7  red_current_add_qxl (red_drawable=<optimized out>, drawable=0x7f1a0a275b38, ring=<optimized out>, worker=0x7f1a0a0c4000) at red_worker.c:3936
#8  red_process_drawable (group_id=<optimized out>, drawable=<optimized out>, worker=0x7f1a0a0c4000) at red_worker.c:4286
---Type <return> to continue, or q <return> to quit---
#9  red_process_commands (worker=worker@entry=0x7f1a0a0c4000, ring_is_empty=ring_is_empty@entry=0x7f198c5feaf0, max_pipe_size=50)
    at red_worker.c:5221
#10 0x00007f1a0111e33a in red_worker_main (arg=<optimized out>) at red_worker.c:12272
#11 0x00007f1a04394dc5 in start_thread (arg=0x7f198c5ff700) at pthread_create.c:308
#12 0x00007f1a004421bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Comment 1 Frediano Ziglio 2016-02-17 11:14:57 EST
Looking at backtrace

#0  0x00007f1a010ef652 in main_channel_client_is_network_info_initialized (mcc=mcc@entry=0x1000) at main_channel.c:1165
No locals.
#1  0x00007f1a01102b59 in red_stream_get_initial_bit_rate (stream=0x7f1a0a29cb70, stream=0x7f1a0a29cb70, dcc=0x7f1a0e8f0000) at red_worker.c:2975
        mcc = 0x1000

looks like mcc (MainChannelClient pointer) is corrupted and set to 0x1000. This pointer is readed from dcc->client->mcc. dcc seems a valid pointer so looks like the dcc->client (RedClient* pointer) structure is corrupted. Possibly a use-after-free issue. Unfortunately from the backtrace is not possible to understand if dcc->client was corrupted but was possible to read mcc field from it.

Bug 1014034 is a good duplicate (crash in the same path).
Comment 2 Jonathon Jongsma 2016-02-23 15:38:01 EST
I did manage to reproduce this, but it didn't happen right away. I had a window spanning 4 monitors ("in the crosshairs" as described by djasa), but had to watch the video for several minutes and resize the window quite a few times before it crashed.
Comment 3 Frediano Ziglio 2016-02-24 01:53:23 EST
I would try using address sanitizer compiled in.
Comment 4 Frediano Ziglio 2016-02-24 07:16:58 EST
Have you tried disabling streaming from Qemu or turning it to all instead of filter?
Comment 5 Jonathon Jongsma 2016-03-03 14:45:33 EST
By the way, here's a trace I encountered. mcc is NULL in this one. Still need to investigate more.


#0  0x00007f0161abf622 in main_channel_client_is_network_info_initialized (mcc=mcc@entry=0x0)
    at main_channel.c:1165
No locals.
#1  0x00007f0161ad2d59 in red_stream_get_initial_bit_rate (stream=0x7f016adacb70, 
    stream=0x7f016adacb70, dcc=0x7f016b378000) at red_worker.c:2974
        mcc = 0x0
        net_test_bit_rate = <optimized out>
        env_bit_rate_str = <optimized out>
        bit_rate = <optimized out>
#2  red_display_create_stream (dcc=dcc@entry=0x7f016b378000, stream=stream@entry=0x7f016adacb70)
    at red_worker.c:3101
        mjpeg_cbs = {get_roundtrip_ms = 0x7f0161ad1190 <red_stream_mjpeg_encoder_get_roundtrip>, 
          get_source_fps = 0x7f0161ad0a00 <red_stream_mjpeg_encoder_get_source_fps>, 
          update_client_playback_delay = 
    0x7f0161ad1f00 <red_stream_update_client_playback_latency>}
        agent = 0x7f016b3bc720
        __FUNCTION__ = "red_display_create_stream"
#3  0x00007f0161ae184d in red_create_stream (drawable=<optimized out>, worker=0x7f016abd4000)
    at red_worker.c:3178
        dcc_ring_item = 0x7f016b378000
        stream = 0x7f016adacb70
        dcc = 0x7f016b378000
        next = 0x0
        src_rect = <optimized out>
        bitmap = <optimized out>
#4  red_stream_add_frame (worker=0x7f016abd4000, frame_drawable=<optimized out>, 
    frames_count=<optimized out>, gradual_frames_count=<optimized out>, 
    last_gradual_frame=<optimized out>) at red_worker.c:3459
No locals.
#5  0x00007f0161ae63ef in red_current_add_equal (other=0x7f016ada4c40, item=0x7f016ada6b80, 
    worker=0x7f016abd4000) at red_worker.c:3534
        add_after = 0
        other_draw_item = 0x7f016ada4c40
        drawable = 0x7f016ada6b58
        other_drawable = 0x7f016ada4c18
#6  red_current_add (drawable=0x7f016ada6b58, ring=<optimized out>, worker=0x7f016abd4000)
    at red_worker.c:3715
        sibling = 0x7f016ada4c40
        test_res = 4
        exclude_base = <optimized out>
        item = 0x7f016ada6b80
        now = 0x7f016ada4c40
        exclude_rgn = {extents = {x1 = 0, y1 = 0, x2 = 0, y2 = 0}, 
          data = 0x7f0161475880 <pixman_region32_empty_data_>}
#7  red_current_add_qxl (red_drawable=<optimized out>, drawable=0x7f016ada6b58, 
    ring=<optimized out>, worker=0x7f016abd4000) at red_worker.c:3935
No locals.
#8  red_process_drawable (group_id=<optimized out>, drawable=<optimized out>, 
    worker=0x7f016abd4000) at red_worker.c:4285
        surface_id = <optimized out>
#9  red_process_commands (worker=worker@entry=0x7f016abd4000, 
    ring_is_empty=ring_is_empty@entry=0x7f0040ffeb30, max_pipe_size=50) at red_worker.c:5230
        red_drawable = <optimized out>
        ext_cmd = {cmd = {data = 72057594056609344, type = 1, padding = 0}, group_id = 1, 
          flags = 0}
        n = 0
        start = <optimized out>
#10 0x00007f0161aee61a in red_worker_main (arg=<optimized out>) at red_worker.c:12282
        ring_is_empty = 0
        i = <optimized out>
        num_events = 0
        timers_queue_timeout = <optimized out>
        worker = <optimized out>
        __FUNCTION__ = "red_worker_main"
#11 0x00007f0164d64dc5 in start_thread (arg=0x7f0040fff700) at pthread_create.c:308
        __res = <optimized out>
        pd = 0x7f0040fff700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139639067244288, -5296964069404309032, 0, 
                139639067244992, 139639067244288, 139644061251560, 5223584861979544024, 
                5224086038422227416}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#12 0x00007f0160e1328d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
No locals.
Comment 6 Jonathon Jongsma 2016-03-28 11:53:20 EDT
By the way, I had very little luck running qemu/spice-server after building with address sanitizer enabled. It ran very very slowly (see bug 1281849) and if I remember correctly, it aborted at some point for no easily-identifiable reason. 

In addition, I've been trying for several days to reproduce this in various ways (including running normally, running under valgrind, etc) and have had no luck at all. 

One thing that I've noticed is that the thread ID of the thread that processes the qxl commands is different than the thread ID stored in RedClient::thread_id. Since red_client_get_main() reads RedClient::mcc without locking the client mutex, this looked suspicious. But RedClient::mcc is assigned when the main channel is connected and is never modified after that, so it seems unlikely to be corrupted due to thread-safety issues.
Comment 7 Jonathon Jongsma 2016-03-30 09:56:19 EDT
For reference, I've been getting the following crash frequently when trying to debug this issue (which is probably bug 1253276 as mentioned in the initial bug report above). This may have also been the "abort" I referred to in comment #6 when running with address sanitizer enabled.

(gdb) bt
#0  0x00007ffff6867e8d in tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**) () from /lib64/libtcmalloc.so.4
#1  0x00007ffff686813a in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /lib64/libtcmalloc.so.4
#2  0x00007ffff68681ed in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#3  0x00007ffff686b245 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /lib64/libtcmalloc.so.4
#4  0x00007ffff6879d6b in tc_malloc () from /lib64/libtcmalloc.so.4
#5  0x0000555555711c19 in malloc_and_trace (n_bytes=128) at vl.c:2743
#6  0x00007ffff635a25f in g_malloc (n_bytes=n_bytes@entry=128) at gmem.c:97
#7  0x00005555558209c7 in qemu_iovec_init (qiov=qiov@entry=0x555556f59dc0, alloc_hint=8) at util/iov.c:265
#8  0x00005555555f7eab in qcow2_co_readv (bs=0x555556e4a000, sector_num=2338908, remaining_sectors=64, qiov=0x555557514f98) at block/qcow2.c:974
#9  0x00005555555de663 in bdrv_aligned_preadv (bs=bs@entry=0x555556e4a000, req=req@entry=0x555556f59f00, offset=offset@entry=1197520896, bytes=bytes@entry=32768, align=align@entry=512, 
    qiov=qiov@entry=0x555557514f98, flags=flags@entry=0) at block.c:2860
#10 0x00005555555debee in bdrv_co_do_preadv (bs=0x555556e4a000, offset=1197520896, bytes=32768, qiov=0x555557514f98, flags=(unknown: 0)) at block.c:2955
#11 0x00005555555e086c in bdrv_co_do_readv (flags=<optimized out>, qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, bs=<optimized out>) at block.c:2977
#12 bdrv_co_do_rw (opaque=0x555556cf3900) at block.c:4687
#13 0x000055555561a20a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:118
#14 0x00007ffff10f4110 in ?? () from /lib64/libc.so.6
#15 0x00007fffffffb400 in ?? ()
#16 0x0000000000000000 in ?? ()
Comment 8 Frediano Ziglio 2016-03-30 11:31:19 EDT
Perhaps would be worth trying to compile Qemu with the --disable-tcmalloc flags which disable tcmalloc usage. Maybe some tools like sanitizer/valgrind are not that comfortable with a no standard (glibc) memory allocator.
Comment 9 Jonathon Jongsma 2016-04-05 17:30:45 EDT
Nope, I think that crash is unrelated. I believe this is the one I encountered before when trying to run with the address sanitizer enabled:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe2da6700 (LWP 30985)]
__memset_sse2 () at ../sysdeps/x86_64/memset.S:907
907		movntdq %xmm0,(%rdi)
(gdb) bt
#0  __memset_sse2 () at ../sysdeps/x86_64/memset.S:907
#1  0x00007ffff4e5e58e in __asan_handle_no_return () from /lib64/libasan.so.0
#2  0x00007ffff4e55ee0 in siglongjmp () from /lib64/libasan.so.0
#3  0x000055555561a1b5 in qemu_coroutine_switch (from_=0x60240001ea40, to_=to_@entry=0x60260001fe68, 
    action=action@entry=COROUTINE_YIELD) at coroutine-ucontext.c:208
#4  0x00005555556da62e in coroutine_swap (from=<optimized out>, to=0x60260001fe68) at qemu-coroutine.c:72
#5  0x00005555555daac9 in bdrv_co_io_em (bs=0x607200009100, sector_num=640, nb_sectors=1, iov=0x7fffe409c5c0, 
    is_write=<optimized out>) at block.c:4847
#6  0x00005555555de6f8 in bdrv_aligned_preadv (bs=bs@entry=0x607200009100, req=req@entry=0x7fffe409c4b0, 
    offset=offset@entry=327680, bytes=bytes@entry=512, align=align@entry=512, qiov=qiov@entry=0x7fffe409c5c0, 
    flags=flags@entry=0) at block.c:2875
#7  0x00005555555debee in bdrv_co_do_preadv (bs=0x607200009100, offset=327680, bytes=512, qiov=qiov@entry=0x7fffe409c5c0, 
    flags=flags@entry=(unknown: 0)) at block.c:2955
#8  0x00005555555dec58 in bdrv_co_do_readv (flags=(unknown: 0), qiov=0x7fffe409c5c0, nb_sectors=<optimized out>, 
    sector_num=<optimized out>, bs=<optimized out>) at block.c:2977
#9  bdrv_co_readv (bs=<optimized out>, sector_num=<optimized out>, nb_sectors=<optimized out>, qiov=qiov@entry=0x7fffe409c5c0)
    at block.c:2986
#10 0x00005555555f813c in qcow2_co_readv (bs=0x60720000b500, sector_num=0, remaining_sectors=1, qiov=0x60700000eb18)
    at block/qcow2.c:1063
#11 0x00005555555de663 in bdrv_aligned_preadv (bs=bs@entry=0x60720000b500, req=req@entry=0x7fffe409c700, offset=offset@entry=0, 
    bytes=bytes@entry=512, align=align@entry=512, qiov=qiov@entry=0x60700000eb18, flags=flags@entry=0) at block.c:2860
#12 0x00005555555debee in bdrv_co_do_preadv (bs=0x60720000b500, offset=0, bytes=512, qiov=0x60700000eb18, flags=(unknown: 0))
    at block.c:2955
#13 0x00005555555e086c in bdrv_co_do_readv (flags=<optimized out>, qiov=<optimized out>, nb_sectors=<optimized out>, 
    sector_num=<optimized out>, bs=<optimized out>) at block.c:2977
#14 bdrv_co_do_rw (opaque=0x60520005b410) at block.c:4687
#15 0x000055555561a20a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:118
#16 0x00007fffee00e110 in ?? () from /lib64/libc.so.6
#17 0x00007fffffffb3d0 in ?? ()
#18 0x0000000000000000 in ?? ()


Important to note that ASan complains at startup which may explain the above crash:

"WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!"
Comment 10 Jonathon Jongsma 2016-04-05 17:36:00 EDT
By the way, the crash mentioned in comment #9 does occur under the same circumstances as the original report (multiple displays open, video spanning several displays being resized, etc)

When I run the same command under valgrind (but with ASan disabled), I have a different problem. Valgrind reports a variety of minor issues, but then valgrind itself seems to crash while the guest is still booting:

.... [bunch of stuff snipped] ...

red_dispatcher_set_cursor_peer: 
red_dispatcher_set_cursor_peer: 
red_dispatcher_set_cursor_peer: 
==30868== Thread 3:
==30868== Syscall param write(buf) points to uninitialised byte(s)
==30868==    at 0x78A41CD: ??? (syscall-template.S:81)
==30868==    by 0xAA65406: write_safe (dispatcher.c:103)
==30868==    by 0xAA656A8: dispatcher_send_message (dispatcher.c:182)
==30868==    by 0xAA65A9B: red_dispatcher_add_memslot (red_dispatcher.c:386)
==30868==    by 0x2EB529: qxl_add_memslot (qxl.c:1289)
==30868==    by 0x2ED434: ioport_write (qxl.c:1632)
==30868==    by 0x319882: access_with_adjusted_size (memory.c:365)
==30868==    by 0x31AABE: memory_region_iorange_write (memory.c:440)
==30868==    by 0x318941: kvm_handle_io (kvm-all.c:1517)
==30868==    by 0x318941: kvm_cpu_exec (kvm-all.c:1674)
==30868==    by 0x2CB484: qemu_kvm_cpu_thread_fn (cpus.c:802)
==30868==    by 0x789DDC4: start_thread (pthread_create.c:308)
==30868==    by 0xB82728C: clone (clone.S:113)
==30868==  Address 0x1761c95c is on thread 3's stack
==30868==  in frame #3, created by red_dispatcher_add_memslot (red_dispatcher.c:382)
==30868== 
==30868== Syscall param write(buf) points to uninitialised byte(s)
==30868==    at 0x78A41CD: ??? (syscall-template.S:81)
==30868==    by 0xAA65406: write_safe (dispatcher.c:103)
==30868==    by 0xAA656A8: dispatcher_send_message (dispatcher.c:182)
==30868==    by 0xAA673AA: red_dispatcher_add_memslot_async (red_dispatcher.c:403)
==30868==    by 0xAA673AA: spice_qxl_add_memslot_async (red_dispatcher.c:965)
==30868==    by 0x2EB529: qxl_add_memslot (qxl.c:1289)
==30868==    by 0x2ED434: ioport_write (qxl.c:1632)
==30868==    by 0x319882: access_with_adjusted_size (memory.c:365)
==30868==    by 0x31AABE: memory_region_iorange_write (memory.c:440)
==30868==    by 0x318941: kvm_handle_io (kvm-all.c:1517)
==30868==    by 0x318941: kvm_cpu_exec (kvm-all.c:1674)
==30868==    by 0x2CB484: qemu_kvm_cpu_thread_fn (cpus.c:802)
==30868==    by 0x789DDC4: start_thread (pthread_create.c:308)
==30868==    by 0xB82728C: clone (clone.S:113)
==30868==  Address 0x1761c954 is on thread 3's stack
==30868==  in frame #3, created by spice_qxl_add_memslot_async (red_dispatcher.c:964)
==30868== 
==30868== Syscall param write(buf) points to uninitialised byte(s)
==30868==    at 0x78A41CD: ??? (syscall-template.S:81)
==30868==    by 0xAA65406: write_safe (dispatcher.c:103)
==30868==    by 0xAA656A8: dispatcher_send_message (dispatcher.c:182)
==30868==    by 0xAA65EC0: red_dispatcher_create_primary_surface_async (red_dispatcher.c:518)
==30868==    by 0x2EC618: qxl_create_guest_primary (qxl.c:1409)
==30868==    by 0x2ED493: ioport_write (qxl.c:1648)
==30868==    by 0x319882: access_with_adjusted_size (memory.c:365)
==30868==    by 0x31AABE: memory_region_iorange_write (memory.c:440)
==30868==    by 0x318941: kvm_handle_io (kvm-all.c:1517)
==30868==    by 0x318941: kvm_cpu_exec (kvm-all.c:1674)
==30868==    by 0x2CB484: qemu_kvm_cpu_thread_fn (cpus.c:802)
==30868==    by 0x789DDC4: start_thread (pthread_create.c:308)
==30868==    by 0xB82728C: clone (clone.S:113)
==30868==  Address 0x1761c96c is on thread 3's stack
==30868==  in frame #3, created by red_dispatcher_create_primary_surface_async (red_dispatcher.c:510)
==30868== 
main_channel_handle_parsed: agent start
main_channel_handle_parsed: agent start
--30868-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--30868-- si_code=1;  Faulting address: 0xBDB60C45;  sp: 0x802d9dce0

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==30868==    at 0x380741EB: ??? (in /usr/lib64/valgrind/memcheck-amd64-linux)
==30868==    by 0x3807439C: ??? (in /usr/lib64/valgrind/memcheck-amd64-linux)
==30868==    by 0x3805A777: ??? (in /usr/lib64/valgrind/memcheck-amd64-linux)
==30868==    by 0x3802C89F: ??? (in /usr/lib64/valgrind/memcheck-amd64-linux)
==30868==    by 0x3802C957: ??? (in /usr/lib64/valgrind/memcheck-amd64-linux)
==30868==    by 0x380AF2D5: ??? (in /usr/lib64/valgrind/memcheck-amd64-linux)
==30868==    by 0x380BDD9C: ??? (in /usr/lib64/valgrind/memcheck-amd64-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
Comment 11 Frediano Ziglio 2016-04-12 08:31 EDT
Created attachment 1146441 [details]
Fix thread safety for red_client_unref

The bug is caused by a no thread safe function causing occasionally double free so corruptions.
See attached patch.
Previously I cannot reach 20 seconds of video with this patch is playing a video since 10 minutes.
Comment 12 Frediano Ziglio 2016-04-12 08:51:53 EDT
Patch posted to public mailing list https://lists.freedesktop.org/archives/spice-devel/2016-April/028173.html
Comment 13 Frediano Ziglio 2016-04-12 09:19:48 EDT
Steps I used to reproduce (in case may be helpful)
- transferred a RHEL 7 machine from virtual to real
  - qemu-nbd -P 2 rhel-7.qcow2 -c /dev/nbd0
  - pvscan --cache /dev/nbd0
  - vgchange -ay rhel
  - lvcreate a new LV
  - dd from nbd to physical
  - qemu-nbd -d /dev/nbd0
- launched a windows virtual machine in order to get old qemu command line
- stopped virtual machine
- mounted
- copied windows .qcow file from libvirt to new LV
- prepare directory copied for chroot
  - cd /mnt/rhel
  - mount -t proc proc proc
  - mount --rbind /dev dev
- chroot
  - chroot .
  - exec su -
- set core
  - echo core > /proc/sys/kernel/core_pattern
  - ulimit -c unlimited
- launch vm with parameters similar to previous VM (was in libvirt)
  /usr/libexec/qemu-kvm -machine accel=kvm -name win7 -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -cpu Haswell,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 34ef3aa3-6bda-4b85-b75c-f0f0c3f3fe4a -no-user-config -nodefaults -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/win7.qcow2,if=none,id=drive-ide0-0-0,format=qcow2 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive if=none,id=drive-ide0-0-1,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device qxl,id=video1,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x8 -device qxl,id=video2,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x9 -device qxl,id=video3,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0xa -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on -netdev user,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:f5:01:43,bus=pci.0,addr=0x3
  I had to change network to user, remove monitor and some CPU/machine which was not supported (used similar values)
- run Firefox with youtube, got reproduction quite easy
- created a fake tcmalloc
  - touch none.c
  - gcc none.c -shared -o /lib64/libTCmalloc.so.4 -s
- changed qemu-kvm binary
  - open with hexedit command
  - search libtcmalloc
  - replace with libTCmalloc
  - exit and save
- run again the virtual machine

Core obtained had a better stack trace:

#0  0x00007f8a0704c5f7 in raise () from /lib64/libc.so.6
#1  0x00007f8a0704dce8 in abort () from /lib64/libc.so.6
#2  0x00007f8a0708c317 in __libc_message () from /lib64/libc.so.6
#3  0x00007f8a07093fe1 in _int_free () from /lib64/libc.so.6
#4  0x00007f8a07dc2e1e in red_client_unref (client=<optimized out>) at red_channel.c:2072
#5  0x00007f8a07dc4523 in dispatcher_handle_single_read (dispatcher=0x7f8a080c58e0 <main_dispatcher>) at dispatcher.c:139
#6  dispatcher_handle_recv_read (dispatcher=0x7f8a080c58e0 <main_dispatcher>) at dispatcher.c:162
#7  0x0000562192f49ca6 in qemu_iohandler_poll (pollfds=0x562195750a00, ret=24638, ret@entry=1) at iohandler.c:143
#8  0x0000562192f4e208 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:465
#9  0x0000562192e6da60 in main_loop () at vl.c:1989
#10 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4381

from #5 looking at msg content I could see the main_dispatcher_handle_mm_time_latency call.
Comment 14 Frediano Ziglio 2016-04-12 11:01:02 EDT
*** Bug 1253276 has been marked as a duplicate of this bug. ***
Comment 24 errata-xmlrpc 2016-11-03 23:42:53 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2324.html

Note You need to log in before you can comment on or make changes to this bug.