Bug 917848
| Field | Value |
|---|---|
| Summary | gam_server deadlocks, leading to all KDE applications hanging |
| Product | Fedora |
| Component | gamin |
| Status | CLOSED EOL |
| Severity | high |
| Priority | unspecified |
| Version | 28 |
| Hardware | x86_64 |
| OS | Linux |
| Keywords | Reopened |
| Reporter | Gerd v. Egidy <gerd> |
| Assignee | Rex Dieter <rdieter> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | dvcroft, elbin.p, germano.massullo, lukast.dev, manisandro, rdieter, redhat, register, rs, tuju, twshield, zeekec |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2019-05-28 19:42:13 UTC |
Description (Gerd v. Egidy, 2013-03-04 22:27:15 UTC)
Here is one backtrace I made of gam_server in the deadlocked state:

```
#0  0x00007f506864c950 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000408106 in gam_client_conn_write (source=<optimized out>, fd=9, data=data@entry=0x7fff62545660, len=len@entry=59) at gam_channel.c:826
#2  0x0000000000408794 in gam_send_event (conn=conn@entry=0x14dfb60, reqno=90, event=<optimized out>, path=0x15105b0 "###DIRNAME###", len=49) at gam_connection.c:609
#3  0x000000000040a36d in gam_eq_flush_callback (conn=0x14dfb60, event=0x1510810, eq=<optimized out>) at gam_eq.c:118
#4  gam_eq_flush (eq=0x14dd130, conn=conn@entry=0x14dfb60) at gam_eq.c:137
#5  0x00000000004081ea in gam_connection_eq_flush (data=0x14dfb60, data@entry=<error reading variable: value has been optimized out>) at gam_connection.c:174
#6  0x00007f50689673bb in g_timeout_dispatch (source=source@entry=0x1511a40, callback=<optimized out>, user_data=<optimized out>) at gmain.c:3882
#7  0x00007f5068966825 in g_main_dispatch (context=0x14dd4d0) at gmain.c:2539
#8  g_main_context_dispatch (context=context@entry=0x14dd4d0) at gmain.c:3075
#9  0x00007f5068966b58 in g_main_context_iterate (context=0x14dd4d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3146
#10 0x00007f5068966f52 in g_main_loop_run (loop=0x14df160) at gmain.c:3340
#11 0x00000000004035e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647
```

I redacted the entry ###DIRNAME###; it contained the name of one of my directories. ###DIRNAME### was not the directory I was in, but one up in the hierarchy and then two below. I don't know what dolphin was telling gamin to do there; I never entered that ###DIRNAME### or the one above it during this whole KDE session.
Another backtrace:

```
#0  0x00007fb88cfd6950 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000408106 in gam_client_conn_write (source=<optimized out>, fd=20, data=data@entry=0x7fff0bf90350, len=len@entry=51) at gam_channel.c:826
#2  0x0000000000408b6d in gam_send_ack (conn=conn@entry=0xf3e9d0, reqno=1009, path=path@entry=0xff1230 "##OTHERDIR##", len=len@entry=41) at gam_connection.c:686
#3  0x0000000000409020 in gam_connection_request (req=0xf3ee02, conn=0xf3e9d0) at gam_connection.c:411
#4  gam_connection_data (conn=conn@entry=0xf3e9d0, len=<optimized out>) at gam_connection.c:509
#5  0x0000000000407adf in gam_client_conn_read (source=0xf8f980, condition=<optimized out>, info=0xf3e9d0) at gam_channel.c:283
#6  0x00007fb88d2f0825 in g_main_dispatch (context=0xf2f4d0) at gmain.c:2539
#7  g_main_context_dispatch (context=context@entry=0xf2f4d0) at gmain.c:3075
#8  0x00007fb88d2f0b58 in g_main_context_iterate (context=0xf2f4d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3146
#9  0x00007fb88d2f0f52 in g_main_loop_run (loop=0xf31160) at gmain.c:3340
#10 0x00000000004035e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647
```

##OTHERDIR## is again a directory that I did not open; this time it is below the current one.

I just tried it again, and this time it behaved a bit differently: I could close all dolphin windows, start new ones, and work with them (and other apps). But as soon as I went to the directory it had deadlocked in before, the new dolphin windows deadlocked too. Only killing gam_server resolved it. After that I had to do some navigating and scrolling for a minute or so before it deadlocked again. It seems gam_server kept a bad lock for this directory.
Here is the backtrace after the deadlock:

```
#0  0x00007fb570788950 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000408106 in gam_client_conn_write (source=<optimized out>, fd=18, data=data@entry=0x7fffbd84b950, len=len@entry=36) at gam_channel.c:826
#2  0x0000000000408794 in gam_send_event (conn=conn@entry=0x1ebb200, reqno=8947, event=<optimized out>, path=0x226efd0 "##FILENAME##", len=26) at gam_connection.c:609
#3  0x000000000040a36d in gam_eq_flush_callback (conn=0x1ebb200, event=0x2210ee0, eq=<optimized out>) at gam_eq.c:118
#4  gam_eq_flush (eq=0x1e1d4e0, conn=conn@entry=0x1ebb200) at gam_eq.c:137
#5  0x00000000004081ea in gam_connection_eq_flush (data=0x1ebb200, data@entry=<error reading variable: value has been optimized out>) at gam_connection.c:174
#6  0x00007fb570aa33bb in g_timeout_dispatch (source=source@entry=0x224fec0, callback=<optimized out>, user_data=<optimized out>) at gmain.c:3882
#7  0x00007fb570aa2825 in g_main_dispatch (context=0x1e1e610) at gmain.c:2539
#8  g_main_context_dispatch (context=context@entry=0x1e1e610) at gmain.c:3075
#9  0x00007fb570aa2b58 in g_main_context_iterate (context=0x1e1e610, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3146
#10 0x00007fb570aa2f52 in g_main_loop_run (loop=0x1e20020) at gmain.c:3340
#11 0x00000000004035e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647
```

```
# ls -l /proc/$(pidof gam_server)/fd
lrwx------ 1 gerd gerd 64 Mar  4 23:39 18 -> socket:[160104]
# ss -p | grep 160104
u_str ESTAB 9222  0 *               159338 * 160104 users:(("dolphin",8376,9))
u_str ESTAB 10990 0 @/tmp/fam-gerd- 160104 * 159338 users:(("gam_server",1696,18))
```

8376 is the dolphin session I used to deadlock gamin.
Same gam_server process, after I killed the dolphin with PID 8376:

```
#0  0x00007fb57078cb94 in __GI___poll (fds=0x1e62910, nfds=18, timeout=879) at ../sysdeps/unix/sysv/linux/poll.c:83
#1  0x00007fb570aa2af4 in g_main_context_poll (n_fds=18, fds=0x1e62910, timeout=879, context=0x1e1e610, priority=<optimized out>) at gmain.c:3440
#2  g_main_context_iterate (context=0x1e1e610, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3141
#3  0x00007fb570aa2f52 in g_main_loop_run (loop=0x1e20020) at gmain.c:3340
#4  0x00000000004035e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647
```

Then I used a new dolphin to enter the directory it had deadlocked in before:

```
#0  0x00007fb570788950 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000408106 in gam_client_conn_write (source=<optimized out>, fd=18, data=data@entry=0x7fffbd84b950, len=len@entry=59) at gam_channel.c:826
#2  0x0000000000408794 in gam_send_event (conn=conn@entry=0x21a5400, reqno=90, event=<optimized out>, path=0x2384490 "##DIRNAME##", len=49) at gam_connection.c:609
#3  0x000000000040a36d in gam_eq_flush_callback (conn=0x21a5400, event=0x20a99e0, eq=<optimized out>) at gam_eq.c:118
#4  gam_eq_flush (eq=0x1e3f180, conn=conn@entry=0x21a5400) at gam_eq.c:137
#5  0x00000000004081ea in gam_connection_eq_flush (data=0x21a5400, data@entry=<error reading variable: value has been optimized out>) at gam_connection.c:174
#6  0x00007fb570aa33bb in g_timeout_dispatch (source=source@entry=0x214bef0, callback=<optimized out>, user_data=<optimized out>) at gmain.c:3882
#7  0x00007fb570aa2825 in g_main_dispatch (context=0x1e1e610) at gmain.c:2539
#8  g_main_context_dispatch (context=context@entry=0x1e1e610) at gmain.c:3075
#9  0x00007fb570aa2b58 in g_main_context_iterate (context=0x1e1e610, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3146
#10 0x00007fb570aa2f52 in g_main_loop_run (loop=0x1e20020) at gmain.c:3340
#11 0x00000000004035e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647
```

fd 18 is the socket to this new dolphin session.

---

And what does a backtrace of dolphin show when it is deadlocked (you might want to do a `thread apply all backtrace` in gdb)? And I take it that the system is responsive again after killing dolphin? If so, that would rather be a dolphin bug, not a gamin one.

---

```
(gdb) bt
#0  0x0000003e370e99ad in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000003e39847d24 in g_main_context_poll (priority=2147483647, n_fds=6, fds=0x1639480, timeout=1999, context=0x1187a00) at gmain.c:3584
#2  g_main_context_iterate (context=context@entry=0x1187a00, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3285
#3  0x0000003e39847e44 in g_main_context_iteration (context=0x1187a00, may_block=1) at gmain.c:3351
#4  0x0000003e41da6106 in QEventDispatcherGlib::processEvents (this=0x115df30, flags=...) at kernel/qeventdispatcher_glib.cpp:424
#5  0x0000003e4466a73e in QGuiEventDispatcherGlib::processEvents (this=<optimized out>, flags=...) at kernel/qguieventdispatcher_glib.cpp:207
#6  0x0000003e41d7680f in QEventLoop::processEvents (this=this@entry=0x7fff761e94d0, flags=...) at kernel/qeventloop.cpp:149
#7  0x0000003e41d76a98 in QEventLoop::exec (this=0x7fff761e94d0, flags=...) at kernel/qeventloop.cpp:204
#8  0x0000003e41d7b888 in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1218
#9  0x0000000000407fee in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/okular-4.10.1/shell/main.cpp:94
(gdb)
```

---

Similar symptoms here too; I typically notice this when I cannot open PDF attachments with okular from emails, or cannot open a new dolphin window from the plasma panel. I'm going to start killing gam_server first and see if that is what is causing this.

---

Apply the backtrace to all threads (`thread apply all bt`); the backtrace you posted is of a thread polling for events, not the one hanging.

---

This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

---

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.

---

Still exists in Fedora 20. Is there a new bug for this?

---

Re-opening, for F20. And please post a full backtrace per comment #6 if possible.

---

Looks like I already did, in comment #5.

---

And comment #6 says that backtrace was the wrong thread. :)

---

Crap. Need to work now; let's try that later. This came up when I tried to open a bank-statement PDF and okular refused to start. I can also reproduce this problem with krusader, dolphin, gwenview, or okular.
I'm using gamin-0.1.10-15.fc20.x86_64.

---

Can you please post a backtrace? Please make sure that the backtrace is of the hanging thread (or, to be sure, just collect the backtrace of all threads via `thread apply all bt`).

---

@Sandro Mani: Should I attach gdb to okular or to gam_server?

---

Could you do both, just to get the full picture?

---

gam_server:

```
(gdb) thread apply all bt
Thread 1 (Thread 0x7f434b10f740 (LWP 1480)):
#0  0x00000035d16e66b0 in __write_nocancel () from /lib64/libc.so.6
#1  0x0000000000408086 in gam_client_conn_write ()
#2  0x000000000040873b in gam_send_event ()
#3  0x000000000040a2cd in gam_eq_flush ()
#4  0x000000000040813a in gam_connection_eq_flush ()
#5  0x00000035d3249e43 in g_timeout_dispatch () from /lib64/libglib-2.0.so.0
#6  0x00000035d32492a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#7  0x00000035d3249628 in g_main_context_iterate.isra () from /lib64/libglib-2.0.so.0
#8  0x00000035d3249a3a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#9  0x0000000000403706 in main ()
```

Okular:

```
(gdb) thread apply all bt
Thread 1 (Thread 0x7ffff7fa38c0 (LWP 4305)):
#0  0x00000035d1a0eac0 in __connect_nocancel () from /lib64/libpthread.so.0
#1  0x00000035df402195 in gamin_connect_unix_socket () from /lib64/libfam.so.0
#2  0x00000035df402a67 in FAMOpen () from /lib64/libfam.so.0
#3  0x00000030eab18fc5 in KDirWatchPrivate::KDirWatchPrivate() () from /lib64/libkdecore.so.5
#4  0x00000030eab19745 in KDirWatch::KDirWatch(QObject*) () from /lib64/libkdecore.so.5
#5  0x00007fffed9a1355 in Okular::Part::Part(QWidget*, QObject*, QList<QVariant> const&, KComponentData) () from /usr/lib64/kde4/okularpart.so
#6  0x00007fffed9a1896 in Okular::PartFactory::create(char const*, QWidget*, QObject*, QList<QVariant> const&, QString const&) () from /usr/lib64/kde4/okularpart.so
#7  0x000000000040f0ee in Shell::Shell(QString const&) ()
#8  0x000000000040a2e6 in Okular::main(QStringList const&, QString const&) ()
#9  0x0000000000409cc2 in main ()
```

---

I hit the bug again, with KDevelop:

```
access("/usr/share/locale/en_US/LC_SCRIPTS/kdevelop/kdevelop.js", R_OK) = -1 ENOENT (No such file or directory)
rt_sigaction(SIGUSR2, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR2, {0x35df404c10, [USR2], SA_RESTORER|SA_RESTART, 0x35d16358f0}, {SIG_DFL, [], 0}, 8) = 0
getuid()                                = 1000
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 12
fstat(12, {st_mode=S_IFREG|0644, st_size=2171, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc245dbf000
read(12, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2171
close(12)                               = 0
munmap(0x7fc245dbf000, 4096)            = 0
socket(PF_LOCAL, SOCK_STREAM, 0)        = 12
connect(12, {sa_family=AF_LOCAL, sun_path=@"/tmp/fam-lukas-"}, 110^CProcess 4120 detached
 <detached ...>
```

---

Are you able to consistently reproduce this somehow? And concerning comment 18 (sorry for the late reply, I missed it): for next time you hit this, can you make sure you have the gamin debug symbols installed? I.e. `yum install gamin-debuginfo`

---

This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 is end of life.
If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

---

Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.

---

This bug still happens in Fedora 21. Occasionally.

---

I hit the bug again; time to install the gamin debug symbols.

== Krusader ==

```
#0  0x000000332bc0f47d in connect () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000003339a01ff5 in gamin_connect_unix_socket (path=path@entry=0x2136660 "/tmp/fam-lukas-") at gam_api.c:383
#2  0x0000003339a02a47 in FAMOpen (fc=0x2182c20) at gam_api.c:977
#3  0x00000033c011b689 in KDirWatchPrivate::KDirWatchPrivate() () at /lib64/libkdecore.so.5
#4  0x00000033c011c205 in KDirWatch::KDirWatch(QObject*) () at /lib64/libkdecore.so.5
#5  0x000000000046d126 in KrTrashWatcher::KrTrashWatcher() ()
#6  0x000000000046d1d6 in KrTrashHandler::startWatcher() ()
#7  0x0000000000463a37 in Krusader::Krusader() ()
#8  0x000000000044c51f in main ()
```

== gam_server ==

```
Program received signal SIGINT, Interrupt.
0x000000332b8f0940 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
81      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x000000332b8f0940 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000407ed6 in gam_client_conn_write (source=0x13, fd=19, data=0x7ffe1bb52f80, len=219774126400, len@entry=56) at gam_channel.c:826
#2  0x00000000004085ad in gam_send_event (conn=conn@entry=0x2520160, reqno=59, event=<optimized out>, path=0x257c900 "/home/user/obfuscated-here", len=46) at gam_connection.c:609
#3  0x000000000040a165 in gam_eq_flush_callback (eq=0x2519be0, conn=0x2520160, event=0x257caa0) at gam_eq.c:118
#4  gam_eq_flush (eq=0x2519be0, conn=0x2520160) at gam_eq.c:137
#5  0x0000000000407f8a in gam_connection_eq_flush (data=0x13, data@entry=<error reading variable: value has been optimized out>) at gam_connection.c:174
#6  0x000000332b44a263 in g_timeout_dispatch (source=0x251ad50, callback=<optimized out>, user_data=<optimized out>) at gmain.c:4520
#7  0x000000332b4497fb in g_main_dispatch (context=0x250f500) at gmain.c:3111
#8  g_main_context_dispatch (context=context@entry=0x250f500) at gmain.c:3710
#9  0x000000332b449b98 in g_main_context_iterate (context=0x250f500, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3781
#10 0x000000332b449ec2 in g_main_loop_run (loop=0x250fc30) at gmain.c:3975
#11 0x00000000004036e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647
```

---

This message is a reminder that Fedora 21 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '21'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 21 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

---

Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.

---

I can still trigger it in Fedora 22 with e.g. dolphin.

---

Could you describe how you are able to trigger the issue?

---

Still the same way as in the original bug report.

Steps to Reproduce:
1. Open dolphin.
2. Navigate into a large directory structure with several hundred or thousand subdirectories (mounted via NFS in my case).
3. Scroll around a bit or enter a subdirectory.

Actual results: dolphin deadlocks, and new KDE apps lock up at start too. Killing gam_server resolves the problem instantly.

The "trick" is the large directory structure, and I think having it mounted via NFS adds additional latency and makes it more likely that you trigger it.
But when I wrote the original bug report, it also happened with large directory structures on a local disk. Now I almost exclusively use SSDs, and it is harder to trigger with them.

I haven't investigated further whether it only affects programs based on KDE 4 or also programs based on KDE 5. I just checked the detailed behavior on F22:

- The deadlock now only affects the program that triggered it (usually dolphin in my case).
- Other KDE programs, be they KDE 4 or KDE 5 based, are not affected anymore.

---

Upstream just committed a slightly different fix than what our packaging had been using; I'll try pulling that in and see if it helps.

---

gamin-0.1.10-22.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-59c9fbaf94

---

gamin-0.1.10-22.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-636c7a6056

---

Thank you for your effort. I tried gamin-0.1.10-22.fc22 and unfortunately the bug is not fixed yet. I can still easily trigger it within about a minute with the method described in #c29. If more backtraces or trying a special debug version would help, just tell me what to do.

---

gamin-0.1.10-22.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-636c7a6056

---

Yes, a fresh backtrace would likely be helpful.

---

Here you go:

```
(gdb) thread apply all bt
Thread 1 (Thread 0x7f319fcaf700 (LWP 3358)):
#0  0x00007f319f4d3280 in __write_nocancel () from /lib64/libc.so.6
#1  0x0000000000407ea6 in gam_client_conn_write (source=<optimized out>, fd=fd@entry=10, data=data@entry=0x7ffd313bf960, len=len@entry=75) at gam_channel.c:826
#2  0x0000000000408897 in gam_send_ack (conn=conn@entry=0x24ccd20, reqno=1633, path=path@entry=0x29918e0 "##BIGDIR##/##OTHERDIR##", len=len@entry=65) at gam_connection.c:686
#3  0x0000000000408d8a in gam_connection_request (req=0x24cd5f8, conn=0x24ccd20) at gam_connection.c:411
#4  gam_connection_data (conn=conn@entry=0x24ccd20, len=<optimized out>) at gam_connection.c:509
#5  0x000000000040783b in gam_client_conn_read (source=0x2a04a20, condition=<optimized out>, info=0x24ccd20) at gam_channel.c:283
#6  0x00007f319f7eaa8a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#7  0x00007f319f7eae20 in g_main_context_iterate.isra () from /lib64/libglib-2.0.so.0
#8  0x00007f319f7eb142 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#9  0x00000000004036d5 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647
```

That is with gamin-0.1.10-22.fc22 on Fedora 22 x86_64. ##BIGDIR## contains about 20000 subdirectories and is mounted via NFS. I opened it in dolphin, scrolled around a bit, and then tried to open one of those subdirectories. That is the moment it deadlocked. ##OTHERDIR## is some other directory, not the one I tried to open.

---

gamin-0.1.10-22.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-59c9fbaf94

---

gamin-0.1.10-22.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

---

gamin-0.1.10-22.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

---

Fedora 23: Myriam from KDE told me to inform you about an Amarok backtrace (https://bugs.kde.org/attachment.cgi?id=98587) at comment https://bugs.kde.org/show_bug.cgi?id=353949#c3. I retrieved a gam_server GDB backtrace while Amarok was frozen:

```
0x00007fd0436affc0 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:84
84      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) thread apply all backtrace
Thread 1 (Thread 0x7fd043e90700 (LWP 2053)):
#0  0x00007fd0436affc0 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fd0439c416c in g_main_context_poll (priority=2147483647, n_fds=15, fds=0x55b338ed4180, timeout=<optimized out>, context=0x55b338c127d0) at gmain.c:4135
#2  g_main_context_iterate (context=0x55b338c127d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3835
#3  0x00007fd0439c44f2 in g_main_loop_run (loop=0x55b338c140c0) at gmain.c:4034
#4  0x000055b336aa6855 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647
(gdb)
```

---

Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.
If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.

---

I'm having a lot of KDE component hangups in F24:

- rekonq hangs, resulting in a completely gray window
- konversation hangs and doesn't redraw its window area
- plasma hangs: the K-menu doesn't open, dolphin doesn't open

Killing plasmashell, krunner, and kwin_x11 and trying to restart them in konsole doesn't work; the processes don't seem to output anything into konsole either. Killing gam_server once (it gets respawned) and trying again helps.

---

*** Bug 1404808 has been marked as a duplicate of this bug. ***

---

Just had this happen on F25 with all updates applied. Kate and KWrite would not open. I killed gam_server, it restarted automatically, and then they worked. I had been logged in for more than 12 hours. I have my home directory on NFS, as well as other NFS mounts. gamin-0.1.10-23.fc25.x86_64 is what is installed.

---

Looks like [1] the bug has been fixed, but the fix is not yet backported into Fedora, because I am still experiencing this kind of trouble [2]. Can the maintainers check if they can backport the patch from the unstable branch?

[1]: https://bugzilla.gnome.org/show_bug.cgi?id=667230#c7
[2]: https://bugzilla.gnome.org/show_bug.cgi?id=667230#c8

---

Unfortunately that patch is already applied in the Fedora package: http://pkgs.fedoraproject.org/cgit/rpms/gamin.git/tree/0004-fix-possible-server-deadlock-in-ih_sub_cancel.patch

---

(In reply to Sandro Mani from comment #48)
> Unfortunately that patch is already applied in the Fedora package:
> http://pkgs.fedoraproject.org/cgit/rpms/gamin.git/tree/0004-fix-possible-server-deadlock-in-ih_sub_cancel.patch

I think that was another patch, because I cannot see the "inotify_lock" and "ih_sub_cancel()" stuff in the backtraces I posted in comment 47.

---

There are no other relevant commits upstream: https://git.gnome.org/browse/gamin/log/

---

I noticed that many commenters mention having home on NFS. I've suspected that to be a can of worms, for example socket-wise in bug #957786, which got closed as WORKSFORME by a person who doesn't use an NFS home (sounds logical). Another problem I've noticed related to home on NFS is that the current Fedora desktop occasionally loses its DHCP IP address. It sometimes happens while I'm using the desktop, and it certainly happens when I'm not. Having the disk ripped away from running processes probably doesn't do any good. Right after booting, this desktop feels fine: no swap and everything responds. Leaving it on and coming back later, the next day or so, there is a lot of swap and some functions don't work anymore.

---

To add details to my use case: my system does *not* run an NFS server and/or client, so the bug is not NFS related.

```
# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
```

---

Can you reproduce it inside strace? Something like

```
$ strace -f -o strace.log <program>
```

---

(In reply to Germano Massullo from comment #52)
> To add details to my use case, my system does *not* run NFS server and/or client, so the bug is not NFS related.

If you have a problematic program, have you looked at its file descriptor statuses?
```
$ lsof | grep <program name>
```

There were some good switches for lsof that I already forgot. Anyway, I noticed that some hanging programs had a constant 'cwd', for example. Not sure whether that was the hang point or not. http://unix.stackexchange.com/questions/60422/how-to-interpret-this-output-of-lsof-command

---

Related discussion links at upstream:

- https://bugzilla.gnome.org/777997 gamin server deadlock (PATCH)
- https://bugzilla.gnome.org/667230 gamin server deadlock (PATCH)
- https://bugs.kde.org/375301 amarok stuck: lll_lock_wait ()

---

*** Bug 1418326 has been marked as a duplicate of this bug. ***

---

Another 'no NFS here'. The hang I keep seeing is with ktimetracker after a long idle time. Stack trace in #1418326. I'm getting this almost every day. Will add open file descriptors next time it happens.

---

Well, that didn't take long:

```
[rstory ~]$ ps -ef|grep time
rstory   22675 29034  0 21:20 pts/7    00:00:00 grep --color=auto time
rstory   32266     1  0 09:51 ?        00:00:13 /usr/bin/ktimetracker
[rstory ~]$ strace -p 32266
strace: Process 32266 attached
write(9, "\n\0\1\0\213\0\3\0\0\0", 10^Cstrace: Process 32266 detached
 <detached ...>
[rstory ~]$ lsof -p 32266
COMMAND     PID   USER   FD      TYPE             DEVICE SIZE/OFF     NODE NAME
ktimetrac 32266 rstory    0r     FIFO               0,10      0t0 11265228 pipe
ktimetrac 32266 rstory    1w     FIFO               0,10      0t0    52852 pipe
ktimetrac 32266 rstory    2w     FIFO               0,10      0t0    52853 pipe
ktimetrac 32266 rstory    3u  a_inode               0,11        0     8405 [eventfd]
ktimetrac 32266 rstory    4r     FIFO               0,10      0t0 11265255 pipe
ktimetrac 32266 rstory    5u     unix 0xffff9fb1cdace400      0t0 11265242 type=STREAM
ktimetrac 32266 rstory    6w     FIFO               0,10      0t0 11265255 pipe
ktimetrac 32266 rstory    7u     unix 0xffff9fb1f3174000      0t0 11265256 type=STREAM
ktimetrac 32266 rstory    8u     unix 0xffff9fb2770c7400      0t0 11265258 type=STREAM
ktimetrac 32266 rstory    9u     unix 0xffff9fb191893c00      0t0 11265446 type=STREAM
ktimetrac 32266 rstory   10u     unix 0xffff9fb245d59000      0t0    55480 type=STREAM
ktimetrac 32266 rstory   11r  a_inode               0,11        0     8405 inotify
ktimetrac 32266 rstory   12u     unix 0xffff9fb1c8f69800      0t0 11265536 type=STREAM
ktimetrac 32266 rstory   18r     FIFO               0,10      0t0    71224 pipe
ktimetrac 32266 rstory   19w     FIFO               0,10      0t0    71224 pipe
ktimetrac 32266 rstory   20r     FIFO               0,10      0t0    71225 pipe
ktimetrac 32266 rstory   21w     FIFO               0,10      0t0    71225 pipe
ktimetrac 32266 rstory   39r      REG               0,43        0   262264 /home/rstory/.local/share/baloo/index
```

```
[root ~]# ps -ef|grep gam
rstory    2632     1  0 Jan30 ?        00:00:00 /usr/libexec/gam_server
root     26394 27290  0 21:34 pts/1    00:00:00 grep --color=auto gam
[root ~]# strace -p 2632
strace: Process 2632 attached
write(11, "6\0\1\0\1\0\2\0,\0plasma-org.kde.plasma."..., 54
^Cstrace: Process 2632 detached
 <detached ...>
[root ~]# lsof -p 2632 |grep 11
gam_serve 2632 rstory    3r  a_inode               0,11        0     8405 inotify
gam_serve 2632 rstory    4u  a_inode               0,11        0     8405 [eventfd]
gam_serve 2632 rstory   11u     unix 0xffff9fb245d58c00      0t0    55481 @/tmp/fam-rstory-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ type=STREAM
gam_serve 2632 rstory   13u     unix 0xffff9fb2539b0c00      0t0    64117 @/tmp/fam-rstory-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ type=STREAM
```

```
(gdb) bt
#0  0x00007fd7f33a3c30 in __write_nocancel () from /lib64/libc.so.6
#1  0x0000557c39615f96 in gam_client_conn_write ()
#2  0x0000557c396166e8 in gam_send_event ()
#3  0x0000557c3961854d in gam_eq_flush ()
#4  0x0000557c3961605a in gam_connection_eq_flush ()
#5  0x00007fd7f36bc88d in g_timeout_dispatch () from /lib64/libglib-2.0.so.0
#6  0x00007fd7f36bbe42 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#7  0x00007fd7f36bc1c0 in g_main_context_iterate.isra () from /lib64/libglib-2.0.so.0
#8  0x00007fd7f36bc4e2 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#9  0x0000557c39611365 in main ()
```

---

Created attachment 1247044 [details]
strace amarok

(In reply to Sandro Mani from comment #53)
> Can you reproduce it inside strace? Something like
> $ strace -f -o strace.log <program>

See attachment.
I have attached only the bottom part of

$ strace -f -o strace_amarok.log amarok

because cleaning personal information out of a 50 MB log file requires too much time.

(In reply to Juha Tuomala from comment #54)
> (In reply to Germano Massullo from comment #52)
> > To add details to my use case, my system does *not* run NFS server and/or
> > client, so the bug is not NFS related.
>
> If you have a problematic program, have you looked its file descriptor
> statuses?
>
> $ lsof | grep <program name>
>
> there were some good switches for lsof that I already forgot. Anyway, I
> noticed that some hanging programs had constant 'cwd' for example. Not sure
> was that the hangpoint or not.
>
> http://unix.stackexchange.com/questions/60422/how-to-interpret-this-output-
> of-lsof-command

I will do it as soon as possible.

Afraid that strace log is truncated too much, since one cannot see where the fds originate from.

(In reply to Sandro Mani from comment #61)
> Afraid that strace log is truncated too much since one cannot see where the
> fds originate from.

Could you please provide as many details as possible so that I can better search the logs?

Created attachment 1247573 [details]
Screenshot 1
Look at dbus-daemon RAM usage, and the CPU usage of other processes
Created attachment 1247574 [details]
Screenshot 2
The interesting bit is to trace which socket/file/pipe the fd belongs to. In the full strace log you can track the fd number back to the syscall which opened it. Or doing an

$ ls -l /proc/$pid/fd

also shows the open fds.

(In reply to Sandro Mani from comment #65)
> The interesting bit is to trace to which socket/file/pipe the fd belongs to.
> In the full strace log you can track back the fd number to see the syscall
> which opened it. Or doing an
>
> $ ls -l /proc/$pid/fd
>
> also shows the open fds.

Funny, I ran for example:

ls -l /proc/$(pidof kded5)/fd/

and that lists 21 files (symbolic links), which all - but two dri-related ones - are broken. That can't mean anything good? On kded4 more or less the same, but not that many are broken.

If those are sockets/pipes etc., as far as my knowledge goes that is normal; those need to be interpreted in a different way, e.g. for pipes:

http://superuser.com/questions/401066/find-what-process-is-on-the-other-end-of-a-pipe

(In reply to Sandro Mani from comment #67)
> If those are sockets / pipes etc as far as my knowledge goes that is normal,
> those need to be interpreted in a different way. I.e. for pipes
>
> http://superuser.com/questions/401066/find-what-process-is-on-the-other-end-
> of-a-pipe

Thanks, I'm trying to collect these useful notes in the wiki at https://fedoraproject.org/wiki/KDE/Debugging - feel free to help.

We have seen the exact same issue, but in our case it was in gam_api.c -> gamin_write_byte, coming from FAMCancelMonitor or similar. We're on CentOS 7 with gamin-0.1.10-16, but even 0.1.10-29 from rawhide did not fix our issue. I think it has the same symptoms as the issue described here, so the solution might be the same.

In our case, adding a 'select' to check for blocking on the fd in front of the write does fix all the freezes (in gamin_write_byte). There is a TODO in gam_client_conn_write that says it should check if it is blocking or use non-blocking IO.
This might be a hint that both issues require the same fix. Currently I've only patched the area that causes our failure, but I could apply the change also to gam_client_conn_write if required.

Blazej, mind sharing a patch that implements what you suggest?

Sorry, I currently don't have access to the code. But in our case it turned out that the issues were due to a misconfigured firewall. Turning it off fixed the hangs we saw. Not sure if the 'select' at the mentioned calls is a good idea at this point.

This message is a reminder that Fedora 25 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 25. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '25'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 25 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle. Changing version to '28'.

This message is a reminder that Fedora 28 is nearing its end of life. On 2019-May-28 Fedora will stop maintaining and issuing updates for Fedora 28.
It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 28 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release.

If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.

FEDORA-2019-a746ac9c89 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-a746ac9c89

FEDORA-2019-39d23c7a94 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-39d23c7a94

kde-settings-30.3-1.fc30, kdelibs-4.14.38-15.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-a746ac9c89

kde-settings-29.1-1.fc29, kdelibs-4.14.38-15.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-39d23c7a94

kde-settings-30.3-1.fc30, kdelibs-4.14.38-15.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

kde-settings-29.1-1.fc29, kdelibs-4.14.38-15.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.