Bug 742586 - System occasionally hangs/livelocks on many platforms for us
Summary: System occasionally hangs/livelocks on many platforms for us
Keywords:
Status: CLOSED DUPLICATE of bug 728315
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Larry Woodman
QA Contact: Caspar Zhang
URL:
Whiteboard:
Depends On:
Blocks: 782183 840683
Reported: 2011-09-30 16:11 UTC by Cort Dougan
Modified: 2018-11-30 23:06 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-20 19:16:48 UTC
Target Upstream Version:


Attachments
program to demonstrate problem (1.51 KB, text/x-csrc)
2011-09-30 17:11 UTC, Cort Dougan

Description Cort Dougan 2011-09-30 16:11:51 UTC
Description of problem:
System livelocks occasionally


Version-Release number of selected component (if applicable):
RHEL 6.1

How reproducible:
Once or more per day

Steps to Reproduce:
Run the attached program in a loop to detect and demonstrate the problem:
while true; do ./futex; if [ $? -ne 0 ]; then date; break; fi; done

The program sleeps for about 1 second at a time; if it detects that it has slept longer than that, it returns an error.  Occasionally it reports 30, 60 or even 90 seconds because the system has become non-responsive for a period of time.  I can see the same behavior interactively: the console is non-responsive, X is non-responsive or slow, and polkitd/rtkitd report scheduling problems in /var/log/messages at the same time.

Cannot reproduce with RHEL 6.0, 5.x, 4.x or others.
  
Actual results:


Expected results:


Additional info:

Comment 2 Cort Dougan 2011-09-30 17:11:41 UTC
Created attachment 525801
program to demonstrate problem

Comment 3 Cort Dougan 2011-09-30 17:36:29 UTC
Contents of /var/log/messages when this happens:
Sep 30 12:30:04 localhost rtkit-daemon[2615]: The canary thread is apparently starving. Taking action.
Sep 30 12:30:04 localhost rtkit-daemon[2615]: Demoting known real-time threads.
Sep 30 12:30:04 localhost rtkit-daemon[2615]: Successfully demoted thread 2613 of process 2613 (/usr/bin/pulseaudio (deleted)).
Sep 30 12:30:04 localhost rtkit-daemon[2615]: Demoted 1 threads.
Sep 30 12:32:14 localhost rtkit-daemon[2615]: The canary thread is apparently starving. Taking action.
Sep 30 12:32:14 localhost rtkit-daemon[2615]: Demoting known real-time threads.
Sep 30 12:32:14 localhost rtkit-daemon[2615]: Successfully demoted thread 2613 of process 2613 (/usr/bin/pulseaudio (deleted)).
Sep 30 12:32:14 localhost rtkit-daemon[2615]: Demoted 1 threads.
Sep 30 12:32:29 localhost rtkit-daemon[2615]: The canary thread is apparently starving. Taking action.
Sep 30 12:32:29 localhost rtkit-daemon[2615]: Demoting known real-time threads.
Sep 30 12:32:29 localhost rtkit-daemon[2615]: Successfully demoted thread 2613 of process 2613 (/usr/bin/pulseaudio (deleted)).
Sep 30 12:32:29 localhost rtkit-daemon[2615]: Demoted 1 threads.

Comment 4 RHEL Program Management 2011-10-07 15:52:24 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 6 Larry Woodman 2012-02-14 20:37:45 UTC
Is there any specific hardware I need to run this on, or any other workloads I need to run at the same time, to reproduce this problem?  I have been trying for several minutes without hitting the failure and the associated hang/livelock yet.

Larry Woodman

Comment 7 Cort Dougan 2012-02-14 20:51:44 UTC
This seems to be a duplicate of:

https://bugzilla.redhat.com/show_bug.cgi?id=710265

We used the same technique listed there as a work-around and it worked for us and our customer.  The later versions of RHEL (6.1/6.2) seem to not have the same problem.

It only showed on Intel hardware and would sometimes take 30 minutes to appear.  No specific workload was necessary; in fact, idle time tended to bring it on.

Comment 10 Suzanne Logcher 2012-05-18 20:50:55 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 11 linferna 2012-08-16 01:04:18 UTC
I just did a fresh installation of RHEL 6.3 (2.6.32-279.2.1.el6.x86_64) and I see a 'niced' pulseaudio process:

Aug 15 20:18:16 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3069 of process 3069 (/usr/bin/pulseaudio) owned by '500' high priority at nice level -11.
Aug 15 20:18:16 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3072 of process 3069 (/usr/bin/pulseaudio) owned by '500' RT at priority 5.
Aug 15 20:18:16 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3073 of process 3069 (/usr/bin/pulseaudio) owned by '500' RT at priority 5.
Aug 15 20:18:17 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3124 of process 3124 (/usr/bin/pulseaudio) owned by '500' high priority at nice level -11.

1. What does rtkit-daemon have to do with pulseaudio?

2. Is this a bug or a feature?

3. Could this cause my server to perform poorly after a while?

Comment 12 Larry Woodman 2012-09-20 19:16:48 UTC

*** This bug has been marked as a duplicate of bug 728315 ***

