Bug 1104400 - Backport a fix for the bug in glibc 2.18/2.19's pthread_cond_broadcast()
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 20
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-03 22:43 UTC by hkimura
Modified: 2016-11-24 12:19 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-06-30 01:29:05 UTC
Type: Bug
Embargoed:


Attachments
syslog excerpt (53.84 KB, text/plain)
2014-10-02 08:56 UTC, Alexander Ploumistos


Links
System ID Private Priority Status Summary Last Updated
Sourceware 17013 0 P2 RESOLVED pthread_cond_broadcast could call lll_unlock() twice, breaking the shared data 2020-12-23 18:54:05 UTC

Description hkimura 2014-06-03 22:43:26 UTC
Description of problem:
pthread_cond_broadcast could call lll_unlock() twice, breaking the shared data.
For example, it can cause an infinite wait in pthread_cond_broadcast.

Version-Release number of selected component (if applicable):
glibc 2.18 and 2.19 are affected.
glibc 2.17 is fine, so Fedora 19 is not affected.
Fedora 20 has this issue and most likely Fedora 21 will, too.
Upcoming glibc 2.20 contains a fix for this issue.


How reproducible:
Occasionally. It happens as a result of a race condition between the waiters and the signaler.

Steps to Reproduce:
  - Several waiters block in pthread_cond_wait, each having locked the mutex before calling it.
  - One signaler calls pthread_cond_broadcast while holding the mutex.

Actual results:
The thread that invoked pthread_cond_broadcast occasionally hangs.
When I attach gdb, the hanging thread is waiting at lll_lock_wait() even though there are no concurrent threads.

Expected results:
pthread_cond_broadcast should exit immediately.

Additional info:
The bug was already fixed in glibc's source code in April 2014:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=8f630cca5c36941db1cb48726016bbed80ec1041

glibc's bugzilla ticket:
https://sourceware.org/bugzilla/show_bug.cgi?id=17013

This is a backport request because Fedora will not adopt glibc 2.20 until much later.

Comment 1 Siddhesh Poyarekar 2014-10-01 09:35:58 UTC
I've pushed the patch, so the next update will have this fix.

Comment 2 Alexander Ploumistos 2014-10-02 08:56:05 UTC
Created attachment 943335
syslog excerpt

Could you tell me if that is what I am seeing, or do I need to open a new ticket?

After a system update on 9/30, conky started crashing occasionally; at the same time, gnome-shell and gnome-terminal behave erratically, and firefox takes too long to close tabs and refuses to quit.

These are the packages that were updated:
Sep 30 09:45:28 Updated: glibc-common-2.18-16.fc20.x86_64
Sep 30 09:45:32 Updated: glibc-2.18-16.fc20.x86_64
Sep 30 09:45:33 Updated: 2:libwbclient-4.1.12-4.fc20.x86_64
Sep 30 09:45:34 Updated: 2:samba-libs-4.1.12-4.fc20.x86_64
Sep 30 09:45:35 Updated: 2:samba-common-4.1.12-4.fc20.x86_64
Sep 30 09:45:38 Updated: 1:xscreensaver-base-5.30-4.fc20.x86_64
Sep 30 09:45:39 Updated: gnutls-3.1.26-2.fc20.x86_64
Sep 30 09:45:39 Updated: 1:xscreensaver-extras-base-5.30-4.fc20.x86_64
Sep 30 09:45:40 Updated: gnutls-dane-3.1.26-2.fc20.x86_64
Sep 30 09:45:40 Updated: 1:xscreensaver-gl-base-5.30-4.fc20.x86_64
Sep 30 09:45:41 Updated: 2:libsmbclient-4.1.12-4.fc20.x86_64
Sep 30 09:45:42 Updated: 2:samba-winbind-modules-4.1.12-4.fc20.x86_64
Sep 30 09:45:43 Updated: 2:samba-winbind-4.1.12-4.fc20.x86_64
Sep 30 09:45:43 Updated: ibus-anthy-python-1.5.6-1.fc20.noarch
Sep 30 09:45:57 Updated: ibus-anthy-1.5.6-1.fc20.x86_64
Sep 30 09:45:59 Updated: glibc-headers-2.18-16.fc20.x86_64
Sep 30 09:46:00 Updated: glibc-devel-2.18-16.fc20.x86_64
Sep 30 09:46:00 Updated: 2:samba-winbind-clients-4.1.12-4.fc20.x86_64
Sep 30 09:46:02 Updated: 2:samba-client-4.1.12-4.fc20.x86_64
Sep 30 09:46:04 Updated: 1:xscreensaver-gl-extras-5.30-4.fc20.x86_64
Sep 30 09:46:04 Updated: gnutls-utils-3.1.26-2.fc20.x86_64
Sep 30 09:46:06 Updated: 1:xscreensaver-extras-5.30-4.fc20.x86_64
Sep 30 09:46:06 Updated: libcmis-0.4.1-5.fc20.x86_64
Sep 30 09:46:07 Updated: libseccomp-2.1.1-0.fc20.x86_64
Sep 30 09:46:08 Updated: liblangtag-0.5.4-4.fc20.x86_64
Sep 30 09:46:08 Updated: perl-Data-Dumper-2.154-1.fc20.x86_64
Sep 30 09:46:09 Updated: automake-1.13.4-6.fc20.noarch
Sep 30 09:46:11 Updated: glibc-2.18-16.fc20.i686
Sep 30 09:46:12 Updated: gnutls-3.1.26-2.fc20.i686
Sep 30 09:49:36 Updated: gnome-chemistry-utils-0.14.9-2.fc20.x86_64
Sep 30 10:17:21 Updated: goffice-0.10.18-1.fc20.x86_64
Sep 30 10:17:23 Updated: ca-certificates-2014.2.1-1.1.fc20.noarch

Comment 3 hkimura 2014-10-03 22:24:23 UTC
Thanks for backporting it, Siddhesh.

Alex, I'm not the expert on this, so this is just my guess from your syslog.

The first seemingly related logs look like fork failure with "Resource temporarily unavailable". If I were you, I'd suspect the value of ulimit -u and -e as well as out-of-stack-memory in this case (isn't "ulimit -s" too big, aren't there too many processes/threads, etc).

I skimmed the rest of the syslog. I'd say there is too little information to figure out whether this is caused by this particular bug in glibc. Usually you have to attach a debugger unless the application happens to write out super-detailed info in syslog about the cause of errors (which doesn't seem to be the case this time).

Comment 4 Alexander Ploumistos 2014-10-04 09:09:18 UTC
I wouldn't mind launching these programs with strace or gdb, but I can't figure out a way to reproduce the problem. The first time it happened I was reading a long article in firefox and didn't notice things had gone haywire until I was done; the second time I was waiting for an emerge to complete on a gentoo system via ssh, so I detached and rebooted. Some other times I wasn't even sitting in front of my monitor.

These are the limits for my user:
$ ulimit -s
8192
$ ulimit -u
1024
$ ulimit -e
0

I have run the system for hours, sometimes with many processes running concurrently (virtual machines, LAMP, ncurses based stuff, etc.), with no hitch. On a couple of other systems, which are 32-bit, I haven't noticed similar crashes after the glibc update.

I guess I'll have to wait for Siddhesh's patch to get packaged and see how that goes.

Thank you very much for the input.

Comment 5 Siddhesh Poyarekar 2014-10-06 07:00:13 UTC
(In reply to Alexander Ploumistos from comment #2)
> After a system update on 9/30 conky started crashing occasionally and at the
> same time gnome-shell and gnome-terminal behave erratically and firefox
> takes too long to close tabs and refuses to quit.

If this is the update that started causing problems then it is not this bug.  This bug has been present since the F-20 release.

Comment 6 Alexander Ploumistos 2014-10-06 08:13:55 UTC
After posting here, I have been working on this computer for 12+ hours every day and I have not had this issue again. I've been monitoring my logs and there was no message like those I posted. Any update that I have installed since then doesn't seem to be even remotely relevant. I haven't been work with virt-manager much, but I can't see how virt-manager could cause conky to crash or mess with my terminal.

Anyway, I'll keep paying attention and if I manage to gather something meaningful, I'll file a fresh bug report.

Thank you both for your time.

Comment 7 Alexander Ploumistos 2014-10-06 08:19:41 UTC
Oops, "I haven't been *working*[..]".

Comment 8 Fedora End Of Life 2015-05-29 12:01:17 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Fedora End Of Life 2015-06-30 01:29:05 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

