Bug 1894045

Summary: Avoid crash due to race in glib event loop code
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: Martin Kletzander <mkletzan>
Component: libvirt Assignee: Martin Kletzander <mkletzan>
Status: CLOSED ERRATA QA Contact: Han Han <hhan>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 8.3 CC: amashah, andrew, berrange, elima, fjin, guillaume.pavese, gveitmic, hhan, jdenemar, kanderso, mkalinin, sbonazzo, virt-maint, yafu, yalzhang
Target Milestone: rc Keywords: Regression, ZStream
Target Release: 8.3   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-6.6.0-8.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1915601 Environment:
Last Closed: 2021-02-22 15:39:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1915601    

Description Martin Kletzander 2020-11-03 12:38:23 UTC
Description of problem:
glib has a possible leak/memory corruption bug in its event loop code. To guard against it, libvirt added commit 0db4743645b7a0611a3c0687f834205c9956f7fc, which should work around the issue.

How reproducible:
Very very rarely (mostly in eventtest)

Steps to Reproduce:
1. Run tests
2. See eventtest segfault
3. Have mixed feelings

Actual results:
segfault

Expected results:
test pass

Additional info:
This was never seen outside of eventtest, but there are two reasons for backporting commit 0db4743645b7a0611a3c0687f834205c9956f7fc:

 1) sometimes the build fails, so we need to run a build multiple times (which is particularly annoying when the error happens almost at the end)

 2) it is safer to have this patch than not to have it in the long run, even if this does not happen in libvirtd

Comment 5 Han Han 2020-12-15 06:23:53 UTC
Questions from QE:
1. Since the crash reproduces so rarely, is there any way to reproduce it other than eventtest?
2. This fix is a workaround for glib versions before glib-2.63.6 (merge request https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1353). Will it work with glib >= 2.63.6? Will it be reverted once the minimum required version is >= glib-2.63.6?

Comment 6 Martin Kletzander 2020-12-15 22:09:49 UTC
(In reply to Han Han from comment #5)
ad 1. I don't know of any.  The glib issue itself mentions it being reported only a couple of times, so I don't expect it to be very reproducible.

ad 2. It looks like it is scheduled for 2.64.0 and will be backported to 2.62.x as well, but since the fix is not a workaround, just a safer way of handling things, I do not see a reason for it to be reverted.

I am fine with this going in without much testing (basically unless anything else breaks it's fine), but I do not know if there is a process for it.  Some way to tag this BZ maybe?

Comment 7 Sandro Bonazzola 2021-01-08 07:49:06 UTC
Can we get this out as async as soon as it is verified?

Comment 19 Han Han 2021-01-13 02:38:06 UTC
Here is a reproducer using the unit test:
1. Install the src rpm of libvirt-6.6.0-7
# rpm -i LIBVIRT_6_6_SRCRPM_URL

2. Compile libvirt-6.6
# rpmbuild -bc -v rpmbuild/SPECS/libvirt.spec
(you may need to disable rbd storage in libvirt.spec and recompile if an rbd-related error occurs)

3. Run eventtest
# cd /root/rpmbuild/BUILD/libvirt-6.6.0/x86_64-redhat-linux-gnu

Run eventtest until it hits an error:
# while true;do ./tests/eventtest; if [ $? -ne 0 ];then break;fi;done


(process:2738769): GLib-CRITICAL **: 20:57:16.306: source_remove_from_context: assertion 'source_list != NULL' failed

# coredumpctl -1                                                     
TIME                            PID   UID   GID SIG COREFILE  EXE
Tue 2021-01-12 20:57:16 EST  2738769     0     0  11 present   /root/rpmbuild/BUILD/libvirt-6.6.0/x86_64-redhat-

Backtrace:
(gdb) bt fu
#0  0x00007f9ca01348b8 in g_source_unref_internal (source=0x55881d6a5090, context=0x55881d6a7af0, have_lock=1)
    at gmain.c:2127
        old_cb_data = 0x55881d6a4910
        old_cb_funcs = 0x55881d664010
        __func__ = "g_source_unref_internal"
#1  0x00007f9ca0134a0e in g_source_iter_next
    (iter=iter@entry=0x7f9c9932a930, source=source@entry=0x7f9c9932a928) at gmain.c:980
        next_source = <optimized out>
#2  0x00007f9ca01373df in g_main_context_check
    (context=context@entry=0x55881d6a7af0, max_priority=200, fds=fds@entry=0x7f9c940024a0, n_fds=n_fds@entry=26)
    at gmain.c:3715
        source = 0x55881d6a5090
        iter = 
          {context = 0x55881d6a7af0, may_modify = 1, current_list = 0x55881d6a4660 = {0x55881d6a4640}, source = 0x55881d6a5090}
        pollrec = <optimized out>
        n_ready = 0
        i = <optimized out>
#3  0x00007f9ca0137a60 in g_main_context_iterate
    (context=context@entry=0x55881d6a7af0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at gmain.c:3899
        max_priority = 200
        timeout = 0
        some_ready = <optimized out>
        nfds = 26
        allocated_nfds = 32
        fds = 0x7f9c940024a0
#4  0x00007f9ca0137be0 in g_main_context_iteration (context=0x55881d6a7af0, 
    context@entry=0x0, may_block=may_block@entry=1) at gmain.c:3963
        retval = <optimized out>
#5  0x00007f9ca370e3a4 in virEventGLibRunOnce () at ../../src/util/vireventglib.c:496
#6  0x000055881c416b65 in eventThreadLoop (data=<optimized out>) at ../../tests/eventtest.c:176
#7  0x00007f9c9fcce14a in start_thread (arg=<optimized out>) at pthread_create.c:479
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = 
              {cancel_jmp_buf = {{jmp_buf = {140310561863424, 8770522698823025210, 140727661230654, 140727661230655, 0, 140310561860352, -8751033350172917190, -8751030135613111750}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#8  0x00007f9c9f5e4763 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment 20 Han Han 2021-01-13 04:44:22 UTC
Tested as in comment 19 for thousands of loops on libvirt-6.6.0-8.module+el8.3.1+8648+130818f2. Not reproduced.
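
For a verification run like this one, the endless loop from comment 19 can be given an upper bound so that it stops on its own when no failure shows up. The following is only a minimal sketch: the helper name run_until_fail and the run count are hypothetical and not part of libvirt or its test suite.

```shell
# Hypothetical helper (not part of libvirt): run a command repeatedly
# until it fails or until max_runs runs have completed.
# Prints a one-line summary; returns 1 if a failure was seen, 0 otherwise.
run_until_fail() {
    max_runs=$1
    shift
    i=0
    while [ "$i" -lt "$max_runs" ]; do
        i=$((i + 1))
        # "$@" is the command under test, e.g. ./tests/eventtest
        if ! "$@"; then
            echo "failed on run $i"
            return 1
        fi
    done
    echo "no failure in $i runs"
    return 0
}
```

From the build directory it could be called as "run_until_fail 5000 ./tests/eventtest", where 5000 is an arbitrary choice. A failing run leaves the core dump available through coredumpctl as shown in comment 19.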

Comment 22 Han Han 2021-02-04 08:29:46 UTC
Covered by unit test

Comment 24 errata-xmlrpc 2021-02-22 15:39:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0639