Bug 1746239 - remote-viewer segmentation fault when connecting to a VM that is being migrated
Summary: remote-viewer segmentation fault when connecting to a VM that is being migrated
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: virt-viewer
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Default Assignee for SPICE Bugs
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-28 04:22 UTC by Han Han
Modified: 2020-08-11 15:03 UTC (History)
13 users

Fixed In Version: virt-viewer-9.0-3.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1867564 (view as bug list)
Environment:
Last Closed: 2020-08-10 08:47:45 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
vm xml, full backtrace, verbose log of step4 (8.70 KB, application/gzip)
2019-08-28 04:22 UTC, Han Han
no flags Details
The script for reproducing, the logs (25.68 KB, application/gzip)
2019-08-29 02:34 UTC, Han Han
no flags Details
Remote-view debug msg on SIGSEGV & SIGABRT issues with G_MESSAGES_DEBUG=all (6.97 KB, application/gzip)
2019-08-29 06:56 UTC, Han Han
no flags Details
rr SIGABRT and SIGSEGV trace (896.69 KB, application/gzip)
2019-09-11 05:35 UTC, Han Han
no flags Details
Detailed backtrace of the segment fault (27.57 KB, text/plain)
2020-05-28 10:19 UTC, Han Han
no flags Details

Description Han Han 2019-08-28 04:22:54 UTC
Created attachment 1608817 [details]
vm xml, full backtrace, verbose log of step4

Description of problem:
As subject

Version-Release number of selected component (if applicable):
Migration host:
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
libvirt-5.6.0-2.module+el8.1.0+4015+63576633.x86_64
spice-server-0.14.2-1.el8.x86_64

Spice client:
virt-viewer-7.0-7.el8.x86_64

How reproducible:
80%

Steps to Reproduce:
1. Prepare two hosts for migration. Make sure their hostnames can be resolved by each other and by the spice client.
2. Start a VM with SPICE configured to accept connections from any client, and open the SPICE and libvirtd ports on the firewall.
VM xml:
...
    <graphics type='spice' port='5901' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
...

# firewall-cmd --list-ports
5900-6000/tcp 49150-50000/tcp
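For reference, the port ranges listed above can be opened roughly like this (a minimal sketch; the zone and exact ranges depend on your setup):
# firewall-cmd --permanent --add-port=5900-6000/tcp
# firewall-cmd --permanent --add-port=49150-50000/tcp
# firewall-cmd --reload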

3. Migrate the VM back and forth between the two hosts with the following script:
#!/bin/bash
# Migrate the VM back and forth between the two hosts forever.
VM=nfs
SRC=10.66.70.111
SRC_SSH="ssh root@$SRC"
DST=10.66.5.225
DST_SSH="ssh root@$DST"
while true; do
    # Migrate from the source host to the destination host
    $SRC_SSH virsh migrate $VM qemu+ssh://root@$DST/system --verbose
    if [ $? -eq 0 ]; then
        echo "Now vm is on $DST"
    fi
    # Migrate back from the destination host to the source host
    $DST_SSH virsh migrate $VM qemu+ssh://root@$SRC/system --verbose
    if [ $? -eq 0 ]; then
        echo "Now vm is on $SRC"
    fi
done

4. While the VM is being migrated from 10.66.5.225, connect to it via remote-viewer:
# remote-viewer -v --debug spice://10.66.5.225:5900

Then hit ENTER several times. You will hit a segmentation fault:
[1]    12132 segmentation fault (core dumped)  remote-viewer -v --debug spice://10.66.5.225:5900

Actual results:
As above

Expected results:
No segmentation fault

Additional info:
backtrace:
(gdb) bt
#0  0x00007f2e5a896d3d in _int_malloc (av=av@entry=0x7f2e5abd0c60 <main_arena>, bytes=bytes@entry=32) at malloc.c:3620
#1  0x00007f2e5a898aba in __GI___libc_malloc (bytes=bytes@entry=32) at malloc.c:3073
#2  0x00007f2e5b1b11d6 in g_malloc (n_bytes=n_bytes@entry=32) at gmem.c:99
#3  0x00007f2e5b18dd7a in g_data_set_internal
    (dataset=0x0, new_destroy_func=0x7f2e5b48b8d0 <g_object_notify_queue_free>, new_data=0x55f5813fdc90, key_id=61, datalist=0x55f58156ec70) at gdataset.c:464
#4  0x00007f2e5b18dd7a in g_datalist_id_set_data_full
    (datalist=datalist@entry=0x55f58156ec70, key_id=61, data=data@entry=0x55f5813fdc90, destroy_func=destroy_func@entry=0x7f2e5b48b8d0 <g_object_notify_queue_free>)
    at gdataset.c:670
#5  0x00007f2e5b48b892 in g_object_notify_queue_freeze (object=0x55f58156ec60 [GObject], conditional=conditional@entry=0) at gobject.c:242
#6  0x00007f2e5b48c487 in g_object_init (object=0x55f58156ec60 [GObject], class=0x55f5815bc740) at gobject.c:999
#7  0x00007f2e5b4aa8ad in g_type_create_instance (type=94512925560688) at gtype.c:1860
#8  0x00007f2e5b48cfc8 in g_object_new_internal (class=class@entry=0x55f5815bc740, params=params@entry=0x7ffe4ebac7c0, n_params=n_params@entry=2) at gobject.c:1799
#9  0x00007f2e5b48ef4e in g_object_new_valist
    (object_type=<optimized out>, first_property_name=first_property_name@entry=0x7f2e5d364d7b "window", var_args=var_args@entry=0x7ffe4ebac910) at gobject.c:2122
#10 0x00007f2e5b48f2ad in g_object_new (object_type=<optimized out>, first_property_name=first_property_name@entry=0x7f2e5d364d7b "window") at gobject.c:1642
#11 0x00007f2e5d2f04b6 in gdk_window_begin_draw_frame (window=window@entry=0x55f5810b94c0 [GdkX11Window], region=region@entry=0x55f58157bfd0) at gdkwindow.c:3244
#12 0x00007f2e5d9366ab in gtk_widget_render (widget=widget@entry=0x55f58133e2b0 [GtkApplicationWindow], window=0x55f5810b94c0 [GdkX11Window], region=0x55f58157bfd0)
    at gtkwidget.c:17523
#13 0x00007f2e5d7d3d61 in gtk_main_do_event (event=0x7ffe4ebacaf0) at gtkmain.c:1838
#14 0x00007f2e5d7d3d61 in gtk_main_do_event (event=<optimized out>) at gtkmain.c:1685
#15 0x00007f2e5d2d3589 in _gdk_event_emit (event=event@entry=0x7ffe4ebacaf0) at gdkevents.c:73
#16 0x00007f2e5d2e41ee in _gdk_window_process_updates_recurse_helper (window=0x55f5810b94c0 [GdkX11Window], expose_region=<optimized out>) at gdkwindow.c:3852
#17 0x00007f2e5d2e53c6 in gdk_window_process_updates_internal (window=0x55f5810b94c0 [GdkX11Window]) at gdkwindow.c:3998
#18 0x00007f2e5d2e557c in gdk_window_process_updates_with_mode (window=<optimized out>, recurse_mode=<optimized out>) at gdkwindow.c:4192
#22 0x00007f2e5b4a4043 in <emit signal ??? on instance 0x55f5810c2200 [GdkFrameClockIdle]>
    (instance=instance@entry=0x55f5810c2200, signal_id=<optimized out>, detail=detail@entry=0) at gsignal.c:3447
    #19 0x00007f2e5b4873bd in g_closure_invoke (closure=0x55f5813d7c20, return_value=0x0, n_param_values=1, param_values=0x7ffe4ebacdc0, invocation_hint=0x7ffe4ebacd40)
    at gclosure.c:804
    #20 0x00007f2e5b49a945 in signal_emit_unlocked_R
    (node=node@entry=0x55f5810c0e10, detail=detail@entry=0, instance=instance@entry=0x55f5810c2200, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7ffe4ebacdc0) at gsignal.c:3635
    #21 0x00007f2e5b4a3a06 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7ffe4ebacf80)
    at gsignal.c:3391
#23 0x00007f2e5d2dc8b3 in _gdk_frame_clock_emit_paint (frame_clock=frame_clock@entry=0x55f5810c2200 [GdkFrameClockIdle]) at gdkframeclock.c:640
#24 0x00007f2e5d2dd0ed in gdk_frame_clock_paint_idle (data=0x55f5810c2200) at gdkframeclockidle.c:459
#25 0x00007f2e5d2c77fc in gdk_threads_dispatch (data=data@entry=0x55f58122b3a0) at gdk.c:743
#26 0x00007f2e5b1ac141 in g_timeout_dispatch (source=0x55f5815b3790, callback=0x7f2e5d2c77d0 <gdk_threads_dispatch>, user_data=0x55f58122b3a0) at gmain.c:4649
#27 0x00007f2e5b1ab67d in g_main_dispatch (context=0x55f58106cb20) at gmain.c:3176
#28 0x00007f2e5b1ab67d in g_main_context_dispatch (context=context@entry=0x55f58106cb20) at gmain.c:3829
#29 0x00007f2e5b1aba48 in g_main_context_iterate (context=context@entry=0x55f58106cb20, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at gmain.c:3902
#30 0x00007f2e5b1abae0 in g_main_context_iteration (context=context@entry=0x55f58106cb20, may_block=may_block@entry=1) at gmain.c:3963
#31 0x00007f2e5b77402d in g_application_run (application=0x55f581068300 [RemoteViewer], argc=<optimized out>, argv=0x7ffe4ebad378) at gapplication.c:2470
#32 0x000055f5808042b0 in main (argc=2, argv=0x7ffe4ebad378) at remote-viewer-main.c:42


For the vm xml, full backtrace, verbose log of step4, see the attachment.

Comment 1 Victor Toso 2019-08-28 08:26:41 UTC
> # remote-viewer -v --debug spice://10.66.5.225:5900

1) Running as root is not recommended
2) The interesting debug output for spice comes with the --spice-debug option; if you could reproduce with that and attach the output, it would be great!

> Then hit ENTER several times. You will hit a segmentation fault:

3) Is hitting ENTER a must? Can't you reproduce this bug without keyboard input?

4) It would also be interesting to get the spice and qemu logs from both hosts.

As per my understanding, a client crash during host migration that requires user input would not be a blocker for 8.1.0, so for now I'm setting this to 8.2.0.
If it proves important after better understanding, we can request z-stream later.
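For reference, a debug run along the lines requested above might look like this (a sketch; the log file name is arbitrary):
$ G_MESSAGES_DEBUG=all remote-viewer -v --debug --spice-debug spice://10.66.5.225:5900 2>&1 | tee viewer.log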

Comment 2 zhoujunqin 2019-08-28 09:28:54 UTC
Hi hhan,
Please help answer the questions in Comment 1, as I cannot reproduce the "segmentation fault (core dumped)" in my test environment.

SRC:
Package version:

libvirt-4.5.0-33.module+el8.1.0+4066+0f1aadab.x86_64
qemu-kvm-2.12.0-85.module+el8.1.0+4066+0f1aadab.x86_64
virt-viewer-7.0-7.el8.x86_64

Run mig.sh as a non-root user on the source host:
$ cat mig.sh 
#!/bin/bash
# Migrate the VM back and forth between the two hosts forever.
VM=mig
SRC=10.73.196.83
SRC_SSH="ssh root@$SRC"
DST=10.66.4.130
DST_SSH="ssh root@$DST"
while true; do
    $SRC_SSH virsh migrate $VM qemu+ssh://root@$DST/system --verbose
    if [ $? -eq 0 ]; then
        echo "Now vm is on $DST"
    fi
    $DST_SSH virsh migrate $VM qemu+ssh://root@$SRC/system --verbose
    if [ $? -eq 0 ]; then
        echo "Now vm is on $SRC"
    fi
done

DST:
Package version:

libvirt-4.5.0-33.module+el8.1.0+4066+0f1aadab.x86_64
qemu-kvm-2.12.0-85.module+el8.1.0+4066+0f1aadab.x86_64
virt-viewer-7.0-7.el8.x86_64

Run remote-viewer as a non-root user on the target host:
$remote-viewer -v --debug spice://10.66.4.130:5900

Result:
Migration completes several rounds between the SRC and DST hosts, and remote-viewer keeps its connection.

BR,
juzhou.

Comment 3 Han Han 2019-08-29 02:14:30 UTC
(In reply to Victor Toso from comment #1)
> > # remote-viewer -v --debug spice://10.66.5.225:5900
> 
> 1) Running as root is not recommended
A regular (non-root) user can reproduce it, too.
> 2) The interesting debug output for spice comes with the --spice-debug option;
> if you could reproduce with that and attach the output, it would be great!
I will upload that later
> 
> > Then hit ENTER several times. You will hit a segmentation fault:
> 
> 3) Is hitting ENTER a must? Can't you reproduce this bug without keyboard
> input?
Not required. It can be reproduced without any keyboard input.
> 
> 4) It would also be interesting to get the spice and qemu logs from both
> hosts.
What is the spice log? How can I get that?
For the qemu logs, do you mean the log of /var/log/libvirt/qemu/nfs.log ?
> 
> As per my understanding, a client crash during host migration that requires
> user input would not be a blocker for 8.1.0, so for now I'm setting this to
> 8.2.0.
> If it proves important after better understanding, we can request z-stream
> later.

Comment 4 Han Han 2019-08-29 02:34:32 UTC
Created attachment 1609178 [details]
The script for reproducing, the logs

1. Another bug:
First of all, another SIGABRT issue can be reproduced with the steps of this bug:
#0  0x00007f0362f708df in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f0362f5acf5 in __GI_abort () at abort.c:79
#2  0x00007f0362fb3c17 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f03630c070c "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007f0362fba53c in malloc_printerr (str=str@entry=0x7f03630c27b0 "malloc(): smallbin double linked list corrupted") at malloc.c:5356
#4  0x00007f0362fbd664 in _int_malloc (av=av@entry=0x7f03632f6c60 <main_arena>, bytes=bytes@entry=33) at malloc.c:3656
#5  0x00007f0362fbe1be in _int_realloc (av=av@entry=0x7f03632f6c60 <main_arena>, oldp=oldp@entry=0x563e249d8d70, oldsize=oldsize@entry=32, nb=nb@entry=48) at malloc.c:4612
#6  0x00007f0362fbf4ab in __GI___libc_realloc (oldmem=0x563e249d8d80, bytes=bytes@entry=32) at malloc.c:3238
#7  0x00007f03638d728e in g_realloc (mem=0x563e249d8d80, n_bytes=32) at gmem.c:164
#8  0x00007f03638f3aa7 in g_string_maybe_expand (string=string@entry=0x563e2467ac00, len=len@entry=22) at gstring.c:102
#9  0x00007f03638f4c4e in g_string_append_vprintf (string=0x563e2467ac00, format=<optimized out>, args=args@entry=0x7fff812d5e80) at gstring.c:1146
#10 0x00007f03638f4e58 in g_string_append_printf (string=string@entry=0x563e2467ac00, format=format@entry=0x7f036392531e "(%s:%lu): ") at gstring.c:1246
#11 0x00007f03638d92a9 in g_log_writer_format_fields (log_level=log_level@entry=G_LOG_LEVEL_DEBUG, fields=fields@entry=0x7fff812d6150, n_fields=n_fields@entry=4, use_color=0) at gmessages.c:2313
#12 0x00007f03638d9e56 in g_log_writer_standard_streams (log_level=log_level@entry=G_LOG_LEVEL_DEBUG, fields=fields@entry=0x7fff812d6150, n_fields=n_fields@entry=4, user_data=user_data@entry=0x0)
    at gmessages.c:2609
#13 0x00007f03638d9f56 in g_log_writer_default (log_level=log_level@entry=G_LOG_LEVEL_DEBUG, fields=fields@entry=0x7fff812d6150, n_fields=n_fields@entry=4, user_data=user_data@entry=0x0) at gmessages.c:2713
#14 0x00007f03638d8277 in g_log_structured_array (n_fields=4, fields=0x7fff812d6150, log_level=G_LOG_LEVEL_DEBUG) at gmessages.c:1970
#15 0x00007f03638d8277 in g_log_structured_array (log_level=G_LOG_LEVEL_DEBUG, fields=0x7fff812d6150, n_fields=4) at gmessages.c:1943
#16 0x00007f03638d86a1 in g_log_default_handler
    (log_domain=log_domain@entry=0x563e229d2b19 "virt-viewer", log_level=log_level@entry=G_LOG_LEVEL_DEBUG, message=message@entry=0x563e24a0c440 "app is not in full screen", unused_data=unused_data@entry=0x0)
    at gmessages.c:3158
#17 0x00007f03638d88ef in g_logv (log_domain=0x563e229d2b19 "virt-viewer", log_level=G_LOG_LEVEL_DEBUG, format=<optimized out>, args=args@entry=0x7fff812d6290) at gmessages.c:1370
#18 0x00007f03638d8ae3 in g_log (log_domain=log_domain@entry=0x563e229d2b19 "virt-viewer", log_level=log_level@entry=G_LOG_LEVEL_DEBUG, format=format@entry=0x563e229d7070 "app is not in full screen")
    at gmessages.c:1432
#19 0x0000563e229ccdaa in virt_viewer_session_spice_fullscreen_auto_conf (self=self@entry=0x563e247b9890 [VirtViewerSessionSpice]) at virt-viewer-session-spice.c:1047
#20 0x0000563e229cce46 in uuid_changed (gobject=<optimized out>, pspec=<optimized out>, self=0x563e247b9890 [VirtViewerSessionSpice]) at virt-viewer-session-spice.c:368
#24 0x00007f0363bca043 in <emit signal notify:uuid on instance 0x563e247a9320 [SpiceSession]> (instance=instance@entry=0x563e247a9320, signal_id=<optimized out>, detail=<optimized out>) at gsignal.c:3447
    #21 0x00007f0363bad3bd in g_closure_invoke (closure=0x563e246c4110, return_value=0x0, n_param_values=2, param_values=0x7fff812d6610, invocation_hint=0x7fff812d6590) at gclosure.c:804
    #22 0x00007f0363bc0945 in signal_emit_unlocked_R
    (node=node@entry=0x563e244b5e20, detail=detail@entry=118, instance=instance@entry=0x563e247a9320, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7fff812d6610)
    at gsignal.c:3635
    #23 0x00007f0363bc9a06 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fff812d67f0) at gsignal.c:3391
#25 0x00007f0363bb1df4 in g_object_dispatch_properties_changed (object=0x563e247a9320 [SpiceSession], n_pspecs=<optimized out>, pspecs=<optimized out>) at gobject.c:1082
#26 0x00007f0363bb42d1 in g_object_notify_by_spec_internal (pspec=<optimized out>, object=0x563e247a9320 [SpiceSession]) at gobject.c:1175
#27 0x00007f0363bb42d1 in g_object_notify (object=0x563e247a9320 [SpiceSession], property_name=<optimized out>) at gobject.c:1223
#28 0x00007f03648fc284 in notify_main_context (opaque=opaque@entry=0x7f033effd9c0) at gio-coroutine.c:238
#29 0x00007f03638cdfbb in g_idle_dispatch (source=0x563e248a0760, callback=0x7f03648fc270 <notify_main_context>, user_data=0x7f033effd9c0) at gmain.c:5534
#30 0x00007f03638d167d in g_main_dispatch (context=0x563e244c3b40) at gmain.c:3176
#31 0x00007f03638d167d in g_main_context_dispatch (context=context@entry=0x563e244c3b40) at gmain.c:3829
#32 0x00007f03638d1a48 in g_main_context_iterate (context=context@entry=0x563e244c3b40, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3902
#33 0x00007f03638d1ae0 in g_main_context_iteration (context=context@entry=0x563e244c3b40, may_block=may_block@entry=1) at gmain.c:3963
#34 0x00007f0363e9a02d in g_application_run (application=0x563e244c0300 [RemoteViewer], argc=<optimized out>, argv=0x7fff812d6bd8) at gapplication.c:2470
#35 0x0000563e229b52b0 in main (argc=5, argv=0x7fff812d6bd8) at remote-viewer-main.c:42

2. Use the bug-reproducing script
2.1 Set up SSH public keys for your SRC and DST hosts
2.2 Make sure all SRC and DST hostnames can be resolved
2.3 Set the SRC and DST variables in migration.sh. Set the EXIT_CODE variable in run.sh: to catch the SIGABRT issue, set it to $SIGABRT; to catch the SIGSEGV issue, set it to $SIGABRT

2.4 Run ./run.sh and wait for it to exit.
Then you can get the gdb backtrace from coredumpctl, the remote-viewer debug messages from viewer.log, and the VM logs from the SRC and DST hosts.

See the gdb backtrace, remote-viewer, and VM logs in the abrt or segv directory.
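For reference, pulling the backtrace out of coredumpctl might look roughly like this (a sketch; <PID> is whatever coredumpctl list reports for the crashed remote-viewer process):
# coredumpctl list remote-viewer
# coredumpctl gdb <PID>
(gdb) bt full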

BTW, the gtk and spice versions on my remote-viewer desktop:
gtk2-2.24.32-4.el8.x86_64
gtk3-3.22.30-4.el8.x86_64
spice-gtk3-0.37-1.el8.x86_64
spice-glib-0.37-1.el8.x86_64
spice-protocol-0.14.0-1.el8.noarch

Comment 5 Han Han 2019-08-29 02:37:33 UTC
Correction for 2.3 of comment 4:
to catch SIGSEGV issue, set it to $SIGSEGV

Comment 6 Han Han 2019-08-29 06:56:16 UTC
Created attachment 1609277 [details]
Remote-view debug msg on SIGSEGV & SIGABRT issues with G_MESSAGES_DEBUG=all

Comment 7 Victor Toso 2019-08-30 12:46:01 UTC
> -------------------------------COUNT 22---------------------------
> -------------------------------COUNT 10---------------------------

At least in the test cases it is not 100% reproducible. I'm looking at it and will try to reproduce it as well. Valgrind might give better hints here.

Comment 8 Victor Toso 2019-09-04 12:53:39 UTC
Hi Han, as mentioned in comment #7, would it be possible to run this test set with valgrind and attach the output to the bug?
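For reference, such a run might look roughly like this (a sketch; the options are illustrative and the run will be much slower than normal):
$ valgrind --tool=memcheck --track-origins=yes --log-file=valgrind-remote-viewer.log remote-viewer -v --debug spice://10.66.5.225:5900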

Comment 9 Han Han 2019-09-09 09:45:38 UTC
(In reply to Victor Toso from comment #8)
> Hi Han, as mentioned in comment #7 would be possible to run this test set
> with valgrind and attach it to the bug?

Hello, it is hard to reproduce the bug with valgrind, because remote-viewer run by
valgrind is too **slow**.
How about debugging it with rr (https://rr-project.org/)? I can reproduce it and record the
session. Then you can replay it and debug it.
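For reference, recording such a session might look roughly like this (a sketch; rr names the trace directory itself, and rr replay with no arguments replays the most recent trace):
$ rr record remote-viewer -v --debug spice://10.66.5.225:5900
$ rr replay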

Comment 10 Victor Toso 2019-09-09 09:52:19 UTC
> Hello, it is hard to reproduce the bug with valgrind, because remote-viewer
> run by valgrind is too **slow**.

This doesn't seem like a leak; maybe using valgrind's massif tool would be quicker, with some sane output as well.

> How about debugging it with rr(https://rr-project.org/). I can reproduce it
> and record the procedure. Then you can replay it and debug it.

We can try, thanks
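For reference, a massif run might look roughly like this (a sketch; note that massif profiles heap usage rather than catching the corruption directly):
$ valgrind --tool=massif remote-viewer -v --debug spice://10.66.5.225:5900
$ ms_print massif.out.<pid>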

Comment 12 Han Han 2019-09-11 05:35:30 UTC
Created attachment 1613919 [details]
rr SIGABRT and SIGSEGV trace

I also uploaded the rr traces.
If you have a RHEL 8 host with the virt-viewer version from this bug report and the debuginfo packages installed, you can extract the trace files and replay them.
Replay the SIGABRT issue:
# rr replay ~/.local/share/rr/remote-viewer-328
Replay the SIGSEGV issue:
# rr replay ~/.local/share/rr/remote-viewer-357

Comment 13 Victor Toso 2019-09-30 09:41:45 UTC
Just to update that I got a reproducer; trying to come up with a fix now.

Comment 14 Victor Toso 2019-12-16 14:43:58 UTC
Not a reliable reproducer + fix yet. Moving.

Comment 17 Victor Toso 2020-05-25 05:31:11 UTC
Many thanks for testing, Junqin.
Moving to MODIFIED so we can add this to the errata.
Feel free to mark it failed-qe in case you can reproduce a similar backtrace.

Comment 22 Han Han 2020-05-27 01:55:16 UTC
I don't think the scratch build has fixed the bug:
Version:
The spice* and virt-viewer* packages from comment 15.
spice-protocol-0.14.2-1.el8.noarch
libgovirt-0.3.4-12.el8_2.x86_64

Steps: Follow bug report

Results:
[root@hp-dl380pg8-05 ~]# coredumpctl 
Tue 2020-05-26 16:24:05 CST   31165     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:24:31 CST   31237     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:24:58 CST   31288     0     0  11 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:25:13 CST   31326     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:25:34 CST   31388     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:25:48 CST   31426     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:26:08 CST   31471     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:26:29 CST   31539     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:27:07 CST   31642     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:27:38 CST   31742     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:28:10 CST   31797     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:32:20 CST   32546     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:42:04 CST   34463     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:42:41 CST   34574     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:43:16 CST   34713     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:44:40 CST   35068     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:45:01 CST   35139     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:45:41 CST   35300     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:46:06 CST   35399     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:47:09 CST   35638     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:47:29 CST   35680     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:49:57 CST   38940     0     0   6 missing   /usr/bin/remote-viewer
Tue 2020-05-26 16:50:29 CST   40263     0     0   6 present   /usr/bin/remote-viewer
Tue 2020-05-26 16:51:29 CST   43023     0     0   6 present   /usr/bin/remote-viewer

SIGSEGV is still reproduced. Unfortunately, the core dump is missing. I will reproduce it and upload the backtrace later.
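For reference, "missing" in the coredumpctl listing above means the core file itself was not kept (for example it was cleaned up or exceeded the configured limits); the relevant knobs are Storage=, ProcessSizeMax= and MaxUse= in /etc/systemd/coredump.conf (a sketch, assuming systemd-coredump is in use):
# grep -E 'Storage|ProcessSizeMax|MaxUse' /etc/systemd/coredump.conf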

Comment 25 Han Han 2020-05-28 10:19:21 UTC
Created attachment 1692994 [details]
Detailed backtrace of the segment fault

Comment 35 Frediano Ziglio 2020-08-07 11:57:44 UTC
I managed to reproduce this quite easily with a VM provided by Victor and the scripts attached to this bug. However, it's hard to get to the root cause. The issue seems to be a memory corruption, so the backtrace just reports that the memory is corrupted, but the corruption already happened before the point shown in the backtrace. I'm using remote-viewer from Fedora 31, but I think the error is the same. Unfortunately it seems to be a race condition, so the timing overhead of most instrumentation hides the issue. Tried:
- valgrind. Too slow, the problem disappears
- ElectricFence. It seems there are too many allocations; it runs out of memory
- modified ElectricFence (with no guard page). Works, but it is too slow and the problem disappears
- address sanitizer with spice-gtk master code. The problem disappears. Maybe worth trying with the installed version, adding sanitizer support to spice-gtk
- heaptrack. Having some issues; I have to try again using a version from upstream. At least I could try to understand old allocations if it is a use-after-free
- tcmalloc memory allocator and tools. Same issue, but it is not helping
- chap (from VMware). In some crashes I'm able to see the current allocations, but I didn't manage to get much more; still looking at some core dumps
- rr-project. Too slow, the problem disappears, and my disk easily gets full
- gdb, looking at core dumps. In some it seems that internal ptmalloc (glibc allocator) structures are corrupted. In one, the memory seems to contain some old log strings

I noted in one crash that it happens when there's a migration (as expected).
I also noted that latency is pretty high; the average ping (RTT) is about 320 ms. Maybe the old connection structures are taking longer to get released (Victor is not able to reproduce it as easily as I can).

Comment 36 Frediano Ziglio 2020-08-07 16:54:37 UTC
I tried to use the same code as installed as a base for spice-gtk with address sanitizer. The problem disappears. I also tried to just load the sanitizer library with unchanged code, and the problem seems to disappear as well. It's pretty weird. Most of the crashes happen in glibc memory-allocator code. It seems that these corruptions affect internal glibc memory-allocator data, probably something related to the layout of this specific allocator.
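For reference, loading the sanitizer runtime against the unchanged binary (as described above) might look roughly like this (a sketch; the libasan soname/path varies by distribution and gcc version, and newer ASan runtimes may additionally need ASAN_OPTIONS=verify_asan_link_order=0 when the main binary is not instrumented):
# Adjust the libasan path/soname for your system before running.
$ LD_PRELOAD=/usr/lib64/libasan.so.5 remote-viewer -v --debug spice://10.66.5.225:5900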

Comment 37 Frediano Ziglio 2020-08-10 08:41:20 UTC
Finally found it. This is due to a double free which is corrupting the memory. Specifically, this was already fixed upstream in https://gitlab.com/virt-viewer/virt-viewer/-/commit/a13173ae649412d06106a0d9c6d29e6a45d5bf57.

Comment 38 Frediano Ziglio 2020-08-10 08:47:45 UTC
The patch that fixes this issue is included in virt-viewer 9.0, which is included in RHEL 8.3, so I'll mark this as fixed in the next release.

Comment 39 David Blechter 2020-08-11 11:39:29 UTC
It was fixed in RHEL 8.3; closing as CURRENTRELEASE.

