Bug 742586

Summary: System occasionally hangs/livelocks on many platforms for us
Product: Red Hat Enterprise Linux 6 Reporter: Cort Dougan <cort>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED DUPLICATE QA Contact: Caspar Zhang <czhang>
Severity: high Docs Contact:
Priority: high    
Version: 6.1 CC: arfernan, czhang, jhunt, qcai
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-09-20 19:16:48 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 782183, 840683    
Attachments:
Description: program to demonstrate problem
Flags: none

Description Cort Dougan 2011-09-30 16:11:51 UTC
Description of problem:
System livelocks occasionally


Version-Release number of selected component (if applicable):
rhel 6.1

How reproducible:
Once or more per day

Steps to Reproduce:
Run the attached program in a loop to detect/demonstrate the problem:
while true; do ./futex; if [ $? -ne 0 ]; then date; break; fi; done

The program sleeps for about 1 second at a time; if it detects that it has slept longer than that, it returns an error.  Occasionally it reports 30... 60... 90 seconds because the system has become non-responsive for a period of time.  I can see the same behavior interactively: the console is non-responsive, X is non-responsive or slow, and polkitd/rtkitd report scheduling problems in /var/log/messages at the same time.
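
For readers without access to the attachment, the detection idea described above can be sketched roughly as follows. This is not the attached program, whose source is not shown in this report. It is a minimal Python illustration that assumes a plain sleep of about one second measured against a monotonic clock. The names oversleep and main, and the 0.5 second tolerance, are hypothetical choices made for this sketch.

```python
import sys
import time


def oversleep(interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Sleep for `interval` seconds and return how many extra seconds elapsed.

    `sleep` and `clock` are injectable so the logic can be checked
    without actually stalling the machine.
    """
    start = clock()
    sleep(interval)
    return (clock() - start) - interval


def main(interval=1.0, tolerance=0.5):
    # `tolerance` is an assumed threshold, not a value taken from the attachment.
    extra = oversleep(interval)
    if extra > tolerance:
        # Report the observed sleep time and signal failure to the calling loop.
        print(f"slept {interval + extra:.1f}s, expected {interval:.1f}s",
              file=sys.stderr)
        return 1
    return 0
```

Saved as a script that ends with sys.exit(main()), something like this could stand in for ./futex in the shell loop: a non-zero exit status stops the loop and prints the date of the stall.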

Cannot reproduce with rhel 6.0, 5.x, 4.x or others.
  
Actual results:


Expected results:


Additional info:

Comment 2 Cort Dougan 2011-09-30 17:11:41 UTC
Created attachment 525801 [details]
program to demonstrate problem

Comment 3 Cort Dougan 2011-09-30 17:36:29 UTC
Contents of /var/log/messages when this happens:
Sep 30 12:30:04 localhost rtkit-daemon[2615]: The canary thread is apparently starving. Taking action.
Sep 30 12:30:04 localhost rtkit-daemon[2615]: Demoting known real-time threads.
Sep 30 12:30:04 localhost rtkit-daemon[2615]: Successfully demoted thread 2613 of process 2613 (/usr/bin/pulseaudio (deleted)).
Sep 30 12:30:04 localhost rtkit-daemon[2615]: Demoted 1 threads.
Sep 30 12:32:14 localhost rtkit-daemon[2615]: The canary thread is apparently starving. Taking action.
Sep 30 12:32:14 localhost rtkit-daemon[2615]: Demoting known real-time threads.
Sep 30 12:32:14 localhost rtkit-daemon[2615]: Successfully demoted thread 2613 of process 2613 (/usr/bin/pulseaudio (deleted)).
Sep 30 12:32:14 localhost rtkit-daemon[2615]: Demoted 1 threads.
Sep 30 12:32:29 localhost rtkit-daemon[2615]: The canary thread is apparently starving. Taking action.
Sep 30 12:32:29 localhost rtkit-daemon[2615]: Demoting known real-time threads.
Sep 30 12:32:29 localhost rtkit-daemon[2615]: Successfully demoted thread 2613 of process 2613 (/usr/bin/pulseaudio (deleted)).
Sep 30 12:32:29 localhost rtkit-daemon[2615]: Demoted 1 threads.

Comment 4 RHEL Program Management 2011-10-07 15:52:24 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 6 Larry Woodman 2012-02-14 20:37:45 UTC
Is there any specific hardware I need to run this on, or any other workloads I need to run at the same time, to reproduce this problem?  I have been trying for several minutes without hitting the failure and the associated hang/livelock.

Larry Woodman

Comment 7 Cort Dougan 2012-02-14 20:51:44 UTC
This seems to be a duplicate of:

https://bugzilla.redhat.com/show_bug.cgi?id=710265

We used the same technique listed there as a workaround, and it worked for us and our customer.  The later versions of RHEL (6.1/6.2) do not seem to have the same problem.

It only showed up on Intel hardware and would sometimes take 30 minutes to appear.  No specific workload was necessary; in fact, idle time tended to bring it on.

Comment 10 Suzanne Logcher 2012-05-18 20:50:55 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 11 linferna 2012-08-16 01:04:18 UTC
I just did a fresh installation of RHEL 6.3 (2.6.32-279.2.1.el6.x86_64), and I see a 'niced' pulseaudio process:

Aug 15 20:18:16 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3069 of process 3069 (/usr/bin/pulseaudio) owned by '500' high priority at nice level -11.
Aug 15 20:18:16 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3072 of process 3069 (/usr/bin/pulseaudio) owned by '500' RT at priority 5.
Aug 15 20:18:16 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3073 of process 3069 (/usr/bin/pulseaudio) owned by '500' RT at priority 5.
Aug 15 20:18:17 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3124 of process 3124 (/usr/bin/pulseaudio) owned by '500' high priority at nice level -11.

1. What does rtkit-daemon have to do with pulseaudio?

2. Is this a bug or a feature?

3. Could this cause my server to perform poorly after a while?

Comment 12 Larry Woodman 2012-09-20 19:16:48 UTC

*** This bug has been marked as a duplicate of bug 728315 ***