Bug 985484 - Deadlock in linux_udev_event_thread_main at os/linux_udev.c:153
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: libusbx
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Hans de Goede
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-07-17 11:17 EDT by Sandro Mani
Modified: 2013-08-09 13:14 EDT (History)
CC List: 7 users

Fixed In Version: libusbx-1.0.16-3.fc19
Doc Type: Bug Fix
Last Closed: 2013-08-02 12:42:13 EDT
Type: Bug


Attachments
Backtrace (6.64 KB, text/plain)
2013-07-17 11:17 EDT, Sandro Mani
Patch fixing the deadlock (4.99 KB, patch)
2013-07-19 05:10 EDT, Hans de Goede
backtrace (5.16 KB, text/plain)
2013-07-25 06:50 EDT, Sandro Mani

Description Sandro Mani 2013-07-17 11:17:12 EDT
Created attachment 774819
Backtrace

Description of problem:
Since upgrading to libusbx-1.0.16-1.fc20.x86_64, upowerd occasionally hangs. This leads, for instance, to most KDE applications suffering a 25-second startup delay until their D-Bus call to upowerd times out. A backtrace of upowerd points to libusbx as the culprit; it is attached.

Version-Release number of selected component (if applicable):
libusbx-1.0.16-1.fc20.x86_64

How reproducible:
Occasionally

Steps to Reproduce:
1. Difficult to reproduce; it happens roughly daily, possibly triggered by events such as suspend/resume.
Comment 1 Hans de Goede 2013-07-17 14:02:52 EDT
Hi,

Thanks for the bug report! I'm going on vacation for a week starting tomorrow, so I don't have time to look into this myself at the moment. I've forwarded this bug to the upstream libusbx-devel mailing list. If upstream does not figure things out while I'm away, I'll look into it myself when I'm back.

Regards,

Hans
Comment 2 Sandro Mani 2013-07-17 14:11:47 EDT
Ok, thanks! Have a nice vacation.
Comment 3 Hans de Goede 2013-07-19 05:10:56 EDT
Created attachment 775695
Patch fixing the deadlock

Hi,

Sending this bug report upstream helped: someone pointed out the cause, and this morning I had an idea for how to fix it.

The attached patch is the result and should fix your issue. I wanted to do a new build for you with the fix included, but I cannot because of build system maintenance: https://fedorahosted.org/fedora-infrastructure/ticket/3882

If you have some experience building packages, it would be great if you could build it yourself. The changes are in the master branch of the libusbx module on pkgs.fedoraproject.org, so you just need to run:

As root:
yum install fedpkg

As user:
fedpkg clone --anonymous libusbx
cd libusbx
fedpkg local

You should then find the new packages in an x86_64 directory under the libusbx directory.

And now I'm really really really leaving for vacation (we depart in about an hour), see you in a week!

Regards,

Hans
Comment 4 Sandro Mani 2013-07-19 06:15:58 EDT
Hi Hans,
Thanks a lot! I'm familiar with mock, fedpkg and co., so I'll test this right away and report back in a few days.

Enjoy!
Comment 5 Sandro Mani 2013-07-23 07:01:41 EDT
Looks good, I haven't encountered the issue again since applying the patch. Thanks again!
Comment 6 Sandro Mani 2013-07-25 06:50:54 EDT
Created attachment 778202
backtrace

Unfortunately, there is now a new deadlock; I've attached a new backtrace. (Maybe I'll also find a minute to look at the code myself.)
Comment 7 Sandro Mani 2013-07-27 17:43:54 EDT
Without knowing the udev/libusbx internals, it appears that udev_monitor_receive_device may end up calling linux_udev_scan_devices.

linux_hotplug_lock is locked before udev_monitor_receive_device is called, and linux_udev_scan_devices then tries to lock the same mutex from the same thread, hence the deadlock.
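The self-deadlock hypothesized above can be reproduced in isolation. The following is a minimal C sketch, not libusbx code: hotplug_lock, scan_devices() and event_thread_iteration() are hypothetical stand-ins for linux_hotplug_lock, linux_udev_scan_devices() and the event thread loop, and it assumes the nested call happens on the same thread. It uses a PTHREAD_MUTEX_ERRORCHECK mutex so that the second lock attempt returns EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for linux_hotplug_lock. */
static pthread_mutex_t hotplug_lock;

/* Hypothetical stand-in for linux_udev_scan_devices(): it takes the lock. */
static int scan_devices(void)
{
    int rc = pthread_mutex_lock(&hotplug_lock);
    if (rc != 0)
        return rc;              /* EDEADLK: this thread already owns it */
    /* ... enumerate devices ... */
    pthread_mutex_unlock(&hotplug_lock);
    return 0;
}

/* Hypothetical stand-in for one pass of the event thread: the lock is held
 * while the event is received, and that path ends up in scan_devices(). */
static int event_thread_iteration(void)
{
    int rc = pthread_mutex_lock(&hotplug_lock);
    if (rc != 0)
        return rc;
    rc = scan_devices();        /* nested attempt to take the same mutex */
    pthread_mutex_unlock(&hotplug_lock);
    return rc;
}

/* Runs one iteration against an error-checking mutex and returns the error
 * reported by the nested lock (EDEADLK), or 0 if there was none. */
static int demo_nested_lock(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    /* ERRORCHECK: relocking a mutex the calling thread already owns fails
     * with EDEADLK. On glibc, a default (normal) mutex would instead block
     * forever, consistent with the hangs in the attached backtraces. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    rc = pthread_mutex_init(&hotplug_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = event_thread_iteration();

    pthread_mutex_destroy(&hotplug_lock);
    return rc;
}
```

Calling demo_nested_lock() should return EDEADLK.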
Comment 8 Sandro Mani 2013-07-29 14:42:41 EDT
Just for info: for the past two days I've been running a patched version with the

usbi_mutex_static_lock(&linux_hotplug_lock);

and

usbi_mutex_static_unlock(&linux_hotplug_lock);

calls commented out in linux_udev_event_thread_main() (linux_udev.c), and things seem to work. Whether that is a proper fix, however, is another question.
Comment 9 Hans de Goede 2013-07-30 11:17:44 EDT
Hi,

Thanks for all the testing and the new backtrace. There was indeed another deadlock. I've done a new build that should fix both; you can find it here:
http://koji.fedoraproject.org/koji/buildinfo?buildID=439434

Please give this version a try, it should solve all the issues you are seeing.

Thanks,

Hans
Comment 10 Sandro Mani 2013-07-30 11:22:07 EDT
Thanks! Will report back in a few days.
Comment 11 Sandro Mani 2013-08-02 12:42:13 EDT
Looking good! I'll close this. Thanks a lot!
Comment 12 Fedora Update System 2013-08-03 09:09:37 EDT
libusbx-1.0.16-3.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/libusbx-1.0.16-3.fc19
Comment 13 Karoly Negyesi 2013-08-04 16:08:25 EDT
I can reliably reproduce this by undocking and re-docking my Lenovo T420 laptop. The patch does not help (I am on Arch Linux but that probably doesn't matter). When upower is deadlocked:

#0  0x00007fa24fd3c00c in __lll_lock_wait () from /usr/lib/libpthread.so.0
#1  0x00007fa24fd37e86 in _L_lock_507 () from /usr/lib/libpthread.so.0
#2  0x00007fa24fd37cda in pthread_mutex_lock () from /usr/lib/libpthread.so.0
#3  0x00007fa2517f1816 in linux_udev_event_thread_main () from /usr/lib/libusb-1.0.so.0
#4  0x00007fa24fd35dd2 in start_thread () from /usr/lib/libpthread.so.0
#5  0x00007fa25079acdd in clone () from /usr/lib/libc.so.6

Interestingly, attaching an strace to upower before it's deadlocked makes the deadlock go away.
Comment 14 Karoly Negyesi 2013-08-04 16:31:19 EDT
Never mind: I realized that the patch attached to this bug is only one of the two patches needed. I have extracted both

0001-linux-Use-a-separate-lock-to-serialize-start-stop-vs.patch
0002-hotplug-Remove-use-of-pthread_cancel-from-linux_udev.patch

and can confirm that together they fix the problem!
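The second patch's name says it removes pthread_cancel() from the udev hotplug code. A common replacement is a control pipe: the event thread polls both its event fd and the read end of a pipe, and whoever stops it writes one byte and then joins the thread. The C sketch below illustrates that general technique only; it is an assumption, not the actual libusbx patch. A plain pipe stands in for the udev monitor fd, and struct event_thread, event_thread_start() and event_thread_stop() are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical event-thread state. */
struct event_thread {
    int event_fd;        /* stand-in for the udev monitor fd */
    int control_pipe[2]; /* [0] polled by the thread, [1] written by stop() */
    int events_seen;     /* written by the thread; read only after join */
    pthread_t thread;
};

static void *event_thread_main(void *arg)
{
    struct event_thread *t = arg;
    struct pollfd fds[2] = {
        { .fd = t->control_pipe[0], .events = POLLIN },
        { .fd = t->event_fd,        .events = POLLIN },
    };

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            break;                    /* unexpected error: give up */
        }
        if (fds[0].revents != 0)      /* stop requested (or pipe closed) */
            break;
        if (fds[1].revents & POLLIN) {
            char c;
            if (read(t->event_fd, &c, 1) == 1)
                t->events_seen++;     /* "handle" one event */
        } else if (fds[1].revents & (POLLHUP | POLLERR | POLLNVAL)) {
            fds[1].fd = -1;           /* event source gone: stop watching */
        }
    }
    return NULL;
}

static int event_thread_start(struct event_thread *t, int event_fd)
{
    t->event_fd = event_fd;
    t->events_seen = 0;
    if (pipe(t->control_pipe) != 0)
        return -1;
    if (pthread_create(&t->thread, NULL, event_thread_main, t) != 0) {
        close(t->control_pipe[0]);
        close(t->control_pipe[1]);
        return -1;
    }
    return 0;
}

/* Wakes the thread through the control pipe and waits for it to leave its
 * loop on its own, instead of cancelling it. */
static int event_thread_stop(struct event_thread *t)
{
    char c = 1;
    if (write(t->control_pipe[1], &c, 1) != 1)
        return -1;
    int rc = pthread_join(t->thread, NULL);
    close(t->control_pipe[0]);
    close(t->control_pipe[1]);
    return rc;
}
```

Because the thread exits its loop by itself, it never terminates while holding a mutex. A thread stopped with pthread_cancel() can be cancelled at any cancellation point, including while a lock it took is still held, unless cleanup handlers release it.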
Comment 15 Fedora Update System 2013-08-09 13:14:30 EDT
libusbx-1.0.16-3.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.
