This service will be undergoing maintenance at 00:00 UTC, 2017-10-23 It is expected to last about 30 minutes
Bug 786170 - gam_server deadlocks, leading to all KDE applications hanging at start
gam_server deadlocks, leading to all KDE applications hanging at start
Product: Fedora
Classification: Fedora
Component: gamin (Show other bugs)
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Tomáš Bžatek
Fedora Extras Quality Assurance
Depends On:
  Show dependency treegraph
Reported: 2012-01-31 11:06 EST by Sandro Mani
Modified: 2015-03-03 18:04 EST (History)
4 users (show)

See Also:
Fixed In Version: gamin-0.1.10-12.fc18
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-02-08 11:28:32 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
Backtrace (1.73 KB, text/plain)
2012-01-31 11:06 EST, Sandro Mani
no flags Details
Patch (586 bytes, patch)
2012-02-02 17:14 EST, Sandro Mani
no flags Details | Diff

External Trackers
Tracker ID Priority Status Summary Last Updated
GNOME Desktop 667230 None None None 2012-06-29 16:00:18 EDT
GNOME Desktop 669292 None None None Never
KDE Software Compilation 281312 None None None Never
Debian BTS 542361 None None None 2012-06-29 16:37:49 EDT

  None (edit)
Description Sandro Mani 2012-01-31 11:06:18 EST
Created attachment 558641 [details]

Description of problem:
Occasionally, gam_server deadlocks (see attached backtrace). When this happens, all KDE applications hang at start. Killing gam_server allows the user to regain an operational desktop.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Hard, it appears to be more likely to happen when attempting to launch a KDE application when the machine is out of RAM and swapping.
Actual results:
gam_server hangs, an so do all applications using it (i.e. attempt to connect to /tmp/fam-$username-)

Additional info:
Backtrace attached
Comment 1 Sandro Mani 2012-02-02 17:14:12 EST
Looking at gamin-0.1.10/server/inotify-helper.c, there is an obvious double lock:
ih_sub_foreach_worker locks inotify_lock on line 153, and calls ih_sub_cancel on line 162, which also attempts to lock inotify_lock on line 130. The backtrace indeed shows that gam_server is stuck ad inotify-helper.c::130.

I would suggest removing the G_LOCK and G_UNLOCK in ih_sub_cancel, since it is only called by ih_sub_foreach_worker anyway. I have made a corresponding patch.
Comment 2 Sandro Mani 2012-02-02 17:14:56 EST
Created attachment 559148 [details]
Comment 4 Tomáš Bžatek 2012-02-03 08:17:12 EST
FYI, this piece of code is embedded in glib, should be fixed there too.
/me checks...
Comment 5 Tomáš Bžatek 2012-02-08 11:28:32 EST
Confirming this doesn't affect glib, ih_sub_cancel() is not used in nested calls and there should not be any lock held. In fact, glib uses slightly newer sources from John McCutchan.

Posted my comments in upstream bug

Fixed in gamin-0.1.10-12.fc18
Comment 6 Rex Dieter 2012-06-29 16:37:49 EDT
marked a few more upstream references

Had someone in #fedora-kde irc today seemingly get hit by this already running gamin-0.1.10-12.fc17 , will keep an eye out and re-open if we can get more debugging done next time (instead of simply killing gam_server)
Comment 7 Gerd v. Egidy 2013-03-03 15:57:31 EST
I hit this every few weeks, currently running gamin-0.1.10-12.fc17.x86_64. Today I had enough time to strace it, find that gam_server is the culprit and found this bugzilla entry.

What do you want me to do to further analyze this when I hit it next time?
Comment 8 Sandro Mani 2013-03-03 16:03:12 EST
Do (as root)

debuginfo-install gamin

then run (as normal user)

gdb -p$(pidof gam_server)

and then type


and attach the backtrace.
Comment 9 Sandro Mani 2013-03-03 16:04:12 EST
Ups, should be

gdb -p $(pidof gam_server)

(note the space between p and $)
Comment 10 Gerd v. Egidy 2013-03-03 17:39:39 EST
Thanks. So just a regular backtrace.

Now I remembered how to trigger this: open dolphin and navigate to a directory with many subdirs (about 600 in my case). Scroll a bit around in the view or enter a subdir. And it hits you within a minute or so. BTW, this dir is mounted via NFS4, if that matters.

Here is the backtrace:

#0  0x00007f506864c950 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000408106 in gam_client_conn_write (source=<optimized out>, fd=9, data=data@entry=0x7fff62545660, len=len@entry=59) at gam_channel.c:826
#2  0x0000000000408794 in gam_send_event (conn=conn@entry=0x14dfb60, reqno=90, event=<optimized out>, path=0x15105b0 "###DIRNAME###", len=49)
    at gam_connection.c:609
#3  0x000000000040a36d in gam_eq_flush_callback (conn=0x14dfb60, event=0x1510810, eq=<optimized out>) at gam_eq.c:118
#4  gam_eq_flush (eq=0x14dd130, conn=conn@entry=0x14dfb60) at gam_eq.c:137
#5  0x00000000004081ea in gam_connection_eq_flush (data=0x14dfb60, data@entry=<error reading variable: value has been optimized out>) at gam_connection.c:174
#6  0x00007f50689673bb in g_timeout_dispatch (source=source@entry=0x1511a40, callback=<optimized out>, user_data=<optimized out>) at gmain.c:3882
#7  0x00007f5068966825 in g_main_dispatch (context=0x14dd4d0) at gmain.c:2539
#8  g_main_context_dispatch (context=context@entry=0x14dd4d0) at gmain.c:3075
#9  0x00007f5068966b58 in g_main_context_iterate (context=0x14dd4d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3146
#10 0x00007f5068966f52 in g_main_loop_run (loop=0x14df160) at gmain.c:3340
#11 0x00000000004035e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647

I edited the entry ###DIRNAME###, there was the name of one of my directories in it. ###DIRNAME### was not the current dir I was in, but one up in hierarchy and then two below. Don't know what dolphin was telling gam to do there, I never entered that ###DIRNAME### or the one above in this whole KDE session.

Version is gamin-0.1.10-12.fc17.x86_64,
Signature   : RSA/SHA256, Wed 08 Feb 2012 05:46:09 PM CET, Key ID 50e94c991aca3465
Build Date  : Wed 08 Feb 2012 05:27:13 PM CET
Build Host  :
Comment 11 Gerd v. Egidy 2013-03-03 17:50:01 EST
Next one, tried it in another big dir hierarchy. Can reproduce it within minutes now.

#0  0x00007fb88cfd6950 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000408106 in gam_client_conn_write (source=<optimized out>, fd=20, data=data@entry=0x7fff0bf90350, len=len@entry=51) at gam_channel.c:826
#2  0x0000000000408b6d in gam_send_ack (conn=conn@entry=0xf3e9d0, reqno=1009, path=path@entry=0xff1230 "##OTHERDIR##", len=len@entry=41) at gam_connection.c:686
#3  0x0000000000409020 in gam_connection_request (req=0xf3ee02, conn=0xf3e9d0) at gam_connection.c:411
#4  gam_connection_data (conn=conn@entry=0xf3e9d0, len=<optimized out>) at gam_connection.c:509
#5  0x0000000000407adf in gam_client_conn_read (source=0xf8f980, condition=<optimized out>, info=0xf3e9d0) at gam_channel.c:283
#6  0x00007fb88d2f0825 in g_main_dispatch (context=0xf2f4d0) at gmain.c:2539
#7  g_main_context_dispatch (context=context@entry=0xf2f4d0) at gmain.c:3075
#8  0x00007fb88d2f0b58 in g_main_context_iterate (context=0xf2f4d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3146
#9  0x00007fb88d2f0f52 in g_main_loop_run (loop=0xf31160) at gmain.c:3340
#10 0x00000000004035e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647

##OTHERDIR## again is a dir that I did not open. This time it is below the current one.
Comment 12 Sandro Mani 2013-03-03 17:54:34 EST
This is not the deadlock reported here, so it's better to file a new bug. While I'm at it, when gam_server is hanging, who is the file descriptor fd=20? Look at 

ls -l /proc/$(pidof gam_server)/fd
Comment 13 Gerd v. Egidy 2013-03-04 17:29:30 EST
Ok, I filed a new bug for this: #917848

Note You need to log in before you can comment on or make changes to this bug.