Bug 885445

Summary: f18/f19/f20 crash in virDBusWatchCallback on i686
Product: [Fedora] Fedora Reporter: Mikhail <mikhail.v.gavrilov>
Component: libvirtAssignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 20CC: berrange, clalancette, collura, crobinso, f.alexander.wilms, guillaumepoiriermorency, itamar, jforbes, jyang, laine, libvirt-maint, marco, maurizio.antillon, moez.roy, veillard, virt-maint
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: i686   
OS: Unspecified   
Whiteboard: abrt_hash:e6d59761193acafdb8f7a857112c61d42c5ceb33
Fixed In Version: libvirt-1.1.3.3-1.fc20 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-21 05:55:54 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
File: backtrace
none
File: cgroup
none
File: core_backtrace
none
File: dso_list
none
File: environ
none
File: limits
none
File: maps
none
File: open_fds
none
File: proc_pid_status
none
File: var_log_messages
none
fc20 still broken. abrt redirected here. attached is var log messages. none

Description Mikhail 2012-12-09 14:39:29 UTC
Version-Release number of selected component:
libvirt-daemon-0.10.2.1-3.fc18

Additional info:
backtrace_rating: 4
cmdline:        /usr/sbin/libvirtd
crash_function: virDBusWatchCallback
executable:     /usr/sbin/libvirtd
kernel:         3.6.6-3.fc18.i686
remote_result:  NOTFOUND
uid:            0

Truncated backtrace:
Thread no. 1 (5 frames)
 #0 virDBusWatchCallback at util/virdbus.c:144
 #1 virEventPollDispatchHandles at util/event_poll.c:485
 #2 virEventPollRunOnce at util/event_poll.c:632
 #3 virEventRunDefaultImpl at util/event.c:247
 #4 virNetServerRun at rpc/virnetserver.c:748

Comment 1 Mikhail 2012-12-09 14:39:34 UTC
Created attachment 660278 [details]
File: backtrace

Comment 2 Mikhail 2012-12-09 14:39:36 UTC
Created attachment 660279 [details]
File: cgroup

Comment 3 Mikhail 2012-12-09 14:39:38 UTC
Created attachment 660280 [details]
File: core_backtrace

Comment 4 Mikhail 2012-12-09 14:39:41 UTC
Created attachment 660281 [details]
File: dso_list

Comment 5 Mikhail 2012-12-09 14:39:43 UTC
Created attachment 660282 [details]
File: environ

Comment 6 Mikhail 2012-12-09 14:39:45 UTC
Created attachment 660283 [details]
File: limits

Comment 7 Mikhail 2012-12-09 14:39:48 UTC
Created attachment 660284 [details]
File: maps

Comment 8 Mikhail 2012-12-09 14:39:50 UTC
Created attachment 660285 [details]
File: open_fds

Comment 9 Mikhail 2012-12-09 14:39:53 UTC
Created attachment 660286 [details]
File: proc_pid_status

Comment 10 Mikhail 2012-12-09 14:39:55 UTC
Created attachment 660287 [details]
File: var_log_messages

Comment 11 Cole Robinson 2013-06-11 22:24:46 UTC
No more reports of this in 5 months, not clear the cause either, so closing.

Comment 12 f.alexander.wilms 2013-10-15 16:38:05 UTC
It crashed yesterday for me

--- Running report_uReport ---
A bug was already filed about this problem:
ABRT Server: URL=https://retrace.fedoraproject.org/faf/reports/28424/
ABRT Server: URL=https://retrace.fedoraproject.org/faf/reports/bthash/cc974fcd14b0ce5eec2453925117c30dfe380f84
Bugzilla: URL=https://bugzilla.redhat.com/show_bug.cgi?id=885445

Comment 13 collura 2013-12-17 11:02:42 UTC
Created attachment 837631 [details]
fc20 still broken. abrt redirected here.  attached is var log messages.

Comment 14 Cole Robinson 2013-12-17 14:46:34 UTC
Reopening, seems the important bit here is i686 as per the abrt retraces.


(In reply to collura from comment #13)
> Created attachment 837631 [details]
> fc20 still broken. abrt redirected here.  attached is var log messages.

collura, can you confirm your f20 package version? what were you doing when this crashed? is it reproducible?

Also, if you could get a fresh backtrace using gnome-abrt, that would help, since the line numbers have changed with f20 libvirt.

Comment 15 Daniel Berrangé 2013-12-18 12:19:13 UTC
What is actually interesting here is that the first report was against 0.10.2.1-3 which included a very relevant dbus patch

commit cc9cfaa90dea7d34b5ed1512dc216d295df45566
Author: Cole Robinson <crobinso>
Date:   Tue Nov 13 08:53:57 2012 -0500

    Cleanly save session VMs on logout/shutdown (bz #872254)



From the stack trace we have one thread doing

#8  0x48f384d2 in dbus_connection_set_watch_functions (connection=connection@entry=0xb1300f48, add_function=add_function@entry=0x424cd440 <virDBusAddWatch>, remove_function=remove_function@entry=0x424cd3e0 <virDBusRemoveWatch>, toggled_function=toggled_function@entry=0x424cd370 <virDBusToggleWatch>, data=data@entry=0xb1300f48, free_data_function=free_data_function@entry=0x0) at dbus-connection.c:4898


to initialize the dbus connection. This method will immediately call the 'add_function' function in this case virDBusAddWatch. The event loop is in turn immediately dispatching a watch which triggered causing it to run virDBusWatchCallback

THe stack trace shows that virDBusWatchCallback is crashing on 

   while (dbus_connection_dispatch(info->bus) == DBUS_DISPATCH_DATA_REMAINS)^M

because 'info' is NULL.

Looking back at the virDBusAddWatch function we have a clear race condition
because we have not yet called     dbus_watch_set_data(watch, info, virDBusWatchFree);

Comment 16 Daniel Berrangé 2013-12-18 12:23:47 UTC
Posted a fix for this race

https://www.redhat.com/archives/libvir-list/2013-December/msg00989.html

The only unanswered question is why Fedora users are only seeing this on i686. There's nothing really about this code which I can see would be i686 specific - ought to hit x86_64 users too. Perhaps there is just something about the speed (or lack thereof) on i686 which makes the race condition much more likely to hit on i686 ?

Comment 17 collura 2013-12-19 10:14:31 UTC
regarding comment#14:

is running 

  libvirt-daemon-1.1.3.2-1.fc20.x86_64

  ( regarding comment#16 about i686 only, here is your x86_64 example ;')
    running on a toshiba satellite L75D-A7283 which is chock full of lag :'| )

no hasnt happened again and dont know how to cause it yet.

i think i was using web browser. 
every few minutes gnome-shell seemed to die for about 30seconds of blackscreen and then came back.  one of the times i checked abrt and found this crash listed but when i reported it there wasnt much info and it sent me on my way.  it referenced this then-closed bug (bz#885445)

   '--- Running report_uReport ---
    A bug was already filed about this problem:
    Bugzilla: URL=https://bugzilla.redhat.com/show_bug.cgi?id=885445
   '

and gave an abrt server entry (which i didnt include in comment#13 because i thought that it did it already doh):  

    https://retrace.fedoraproject.org/faf/reports/28292/

Comment 18 Fedora Update System 2014-01-17 14:04:43 UTC
libvirt-1.1.3.3-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/libvirt-1.1.3.3-1.fc20

Comment 19 Fedora Update System 2014-01-18 04:22:14 UTC
Package libvirt-1.1.3.3-1.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-1.1.3.3-1.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-1042/libvirt-1.1.3.3-1.fc20
then log in and leave karma (feedback).

Comment 20 Fedora Update System 2014-01-21 05:55:54 UTC
libvirt-1.1.3.3-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.