Bug 885445

Summary: f18/f19/f20 crash in virDBusWatchCallback on i686
Product: [Fedora] Fedora Reporter: Mikhail <mikhail.v.gavrilov>
Component: libvirtAssignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 20CC: berrange, clalancette, collura, crobinso, f.alexander.wilms, guillaumepoiriermorency, itamar, jforbes, jyang, laine, libvirt-maint, marco, maurizio.antillon, moez.roy, veillard, virt-maint
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: i686   
OS: Unspecified   
Whiteboard: abrt_hash:e6d59761193acafdb8f7a857112c61d42c5ceb33
Fixed In Version: libvirt- Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-21 00:55:54 EST Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Description Flags
File: backtrace
File: cgroup
File: core_backtrace
File: dso_list
File: environ
File: limits
File: maps
File: open_fds
File: proc_pid_status
File: var_log_messages
fc20 still broken. abrt redirected here. attached is var log messages. none

Description Mikhail 2012-12-09 09:39:29 EST
Version-Release number of selected component:

Additional info:
backtrace_rating: 4
cmdline:        /usr/sbin/libvirtd
crash_function: virDBusWatchCallback
executable:     /usr/sbin/libvirtd
kernel:         3.6.6-3.fc18.i686
remote_result:  NOTFOUND
uid:            0

Truncated backtrace:
Thread no. 1 (5 frames)
 #0 virDBusWatchCallback at util/virdbus.c:144
 #1 virEventPollDispatchHandles at util/event_poll.c:485
 #2 virEventPollRunOnce at util/event_poll.c:632
 #3 virEventRunDefaultImpl at util/event.c:247
 #4 virNetServerRun at rpc/virnetserver.c:748
Comment 1 Mikhail 2012-12-09 09:39:34 EST
Created attachment 660278 [details]
File: backtrace
Comment 2 Mikhail 2012-12-09 09:39:36 EST
Created attachment 660279 [details]
File: cgroup
Comment 3 Mikhail 2012-12-09 09:39:38 EST
Created attachment 660280 [details]
File: core_backtrace
Comment 4 Mikhail 2012-12-09 09:39:41 EST
Created attachment 660281 [details]
File: dso_list
Comment 5 Mikhail 2012-12-09 09:39:43 EST
Created attachment 660282 [details]
File: environ
Comment 6 Mikhail 2012-12-09 09:39:45 EST
Created attachment 660283 [details]
File: limits
Comment 7 Mikhail 2012-12-09 09:39:48 EST
Created attachment 660284 [details]
File: maps
Comment 8 Mikhail 2012-12-09 09:39:50 EST
Created attachment 660285 [details]
File: open_fds
Comment 9 Mikhail 2012-12-09 09:39:53 EST
Created attachment 660286 [details]
File: proc_pid_status
Comment 10 Mikhail 2012-12-09 09:39:55 EST
Created attachment 660287 [details]
File: var_log_messages
Comment 11 Cole Robinson 2013-06-11 18:24:46 EDT
No more reports of this in 5 months, not clear the cause either, so closing.
Comment 12 f.alexander.wilms 2013-10-15 12:38:05 EDT
It crashed yesterday for me

--- Running report_uReport ---
A bug was already filed about this problem:
ABRT Server: URL=https://retrace.fedoraproject.org/faf/reports/28424/
ABRT Server: URL=https://retrace.fedoraproject.org/faf/reports/bthash/cc974fcd14b0ce5eec2453925117c30dfe380f84
Bugzilla: URL=https://bugzilla.redhat.com/show_bug.cgi?id=885445
Comment 13 collura 2013-12-17 06:02:42 EST
Created attachment 837631 [details]
fc20 still broken. abrt redirected here.  attached is var log messages.
Comment 14 Cole Robinson 2013-12-17 09:46:34 EST
Reopening, seems the important bit here is i686 as per the abrt retraces.

(In reply to collura from comment #13)
> Created attachment 837631 [details]
> fc20 still broken. abrt redirected here.  attached is var log messages.

collura, can you confirm your f20 package version? what were you doing when this crashed? is it reproducible?

Also, if you could get a fresh backtrace using gnome-abrt, that would help, since the line numbers have changed with f20 libvirt.
Comment 15 Daniel Berrange 2013-12-18 07:19:13 EST
What is actually interesting here is that the first report was against which included a very relevant dbus patch

commit cc9cfaa90dea7d34b5ed1512dc216d295df45566
Author: Cole Robinson <crobinso@redhat.com>
Date:   Tue Nov 13 08:53:57 2012 -0500

    Cleanly save session VMs on logout/shutdown (bz #872254)

From the stack trace we have one thread doing

#8  0x48f384d2 in dbus_connection_set_watch_functions (connection=connection@entry=0xb1300f48, add_function=add_function@entry=0x424cd440 <virDBusAddWatch>, remove_function=remove_function@entry=0x424cd3e0 <virDBusRemoveWatch>, toggled_function=toggled_function@entry=0x424cd370 <virDBusToggleWatch>, data=data@entry=0xb1300f48, free_data_function=free_data_function@entry=0x0) at dbus-connection.c:4898

to initialize the dbus connection. This method will immediately call the 'add_function' function in this case virDBusAddWatch. The event loop is in turn immediately dispatching a watch which triggered causing it to run virDBusWatchCallback

THe stack trace shows that virDBusWatchCallback is crashing on 

   while (dbus_connection_dispatch(info->bus) == DBUS_DISPATCH_DATA_REMAINS)^M

because 'info' is NULL.

Looking back at the virDBusAddWatch function we have a clear race condition
because we have not yet called     dbus_watch_set_data(watch, info, virDBusWatchFree);
Comment 16 Daniel Berrange 2013-12-18 07:23:47 EST
Posted a fix for this race


The only unanswered question is why Fedora users are only seeing this on i686. There's nothing really about this code which I can see would be i686 specific - ought to hit x86_64 users too. Perhaps there is just something about the speed (or lack thereof) on i686 which makes the race condition much more likely to hit on i686 ?
Comment 17 collura 2013-12-19 05:14:31 EST
regarding comment#14:

is running 


  ( regarding comment#16 about i686 only, here is your x86_64 example ;')
    running on a toshiba satellite L75D-A7283 which is chock full of lag :'| )

no hasnt happened again and dont know how to cause it yet.

i think i was using web browser. 
every few minutes gnome-shell seemed to die for about 30seconds of blackscreen and then came back.  one of the times i checked abrt and found this crash listed but when i reported it there wasnt much info and it sent me on my way.  it referenced this then-closed bug (bz#885445)

   '--- Running report_uReport ---
    A bug was already filed about this problem:
    Bugzilla: URL=https://bugzilla.redhat.com/show_bug.cgi?id=885445

and gave an abrt server entry (which i didnt include in comment#13 because i thought that it did it already doh):  

Comment 18 Fedora Update System 2014-01-17 09:04:43 EST
libvirt- has been submitted as an update for Fedora 20.
Comment 19 Fedora Update System 2014-01-17 23:22:14 EST
Package libvirt-
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
Comment 20 Fedora Update System 2014-01-21 00:55:54 EST
libvirt- has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.