Description of problem:
System livelocks occasionally

Version-Release number of selected component (if applicable):
rhel 6.1

How reproducible:
Once or more per day

Steps to Reproduce:
Run attached program in a loop to detect/demonstrate:

    while true; do
        ./futex
        if [ $? != 0 ]; then date; break; fi
    done

The program will sleep for about 1 second at a time; if it detects that it has slept more than that, it will return an error. Occasionally it reports 30... 60... 90 seconds because the system has become non-responsive for a period of time. I can see the same behavior interactively: console is non-responsive, X is non-responsive/slow, and polkitd/rtkitd report scheduling problems in /var/log/messages at the same time.

Cannot reproduce with rhel 6.0, 5.x, 4.x or others.

Actual results:

Expected results:

Additional info:
Created attachment 525801 [details] program to demonstrate problem
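The attachment itself is not reproduced in this thread. As a rough illustration of the detection idea described above (sleep for about one second, then check how long the sleep really took), here is a minimal Python sketch. It is not the attached program: it uses a plain `time.sleep` rather than a futex wait, and the names `timed_sleep`, `is_stall`, and the 0.5 second tolerance are assumptions made for the example.

```python
import time

EXPECTED_SLEEP_S = 1.0  # the report says the program sleeps about 1 second
TOLERANCE_S = 0.5       # assumed slack; the attachment's real threshold is not shown


def timed_sleep(duration_s):
    """Sleep for duration_s and return the elapsed time on a monotonic clock."""
    start = time.monotonic()
    time.sleep(duration_s)
    return time.monotonic() - start


def is_stall(elapsed_s, expected_s=EXPECTED_SLEEP_S, tolerance_s=TOLERANCE_S):
    """Return True if the sleep overran by more than the tolerance."""
    return elapsed_s > expected_s + tolerance_s


def main():
    """Do one sleep and return a shell-style exit code (non-zero on a stall)."""
    elapsed = timed_sleep(EXPECTED_SLEEP_S)
    if is_stall(elapsed):
        print("slept %.1f seconds, expected about %.1f" % (elapsed, EXPECTED_SLEEP_S))
        return 1
    return 0
```

To use it from a shell loop like the one in the description, a script would end with `raise SystemExit(main())` so that a detected stall produces a non-zero exit status. A monotonic clock is used so that wall-clock adjustments are not mistaken for a stall.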
Contents of /var/log/messages when this happens:

Sep 30 12:30:04 localhost rtkit-daemon[2615]: The canary thread is apparently starving. Taking action.
Sep 30 12:30:04 localhost rtkit-daemon[2615]: Demoting known real-time threads.
Sep 30 12:30:04 localhost rtkit-daemon[2615]: Successfully demoted thread 2613 of process 2613 (/usr/bin/pulseaudio (deleted)).
Sep 30 12:30:04 localhost rtkit-daemon[2615]: Demoted 1 threads.
Sep 30 12:32:14 localhost rtkit-daemon[2615]: The canary thread is apparently starving. Taking action.
Sep 30 12:32:14 localhost rtkit-daemon[2615]: Demoting known real-time threads.
Sep 30 12:32:14 localhost rtkit-daemon[2615]: Successfully demoted thread 2613 of process 2613 (/usr/bin/pulseaudio (deleted)).
Sep 30 12:32:14 localhost rtkit-daemon[2615]: Demoted 1 threads.
Sep 30 12:32:29 localhost rtkit-daemon[2615]: The canary thread is apparently starving. Taking action.
Sep 30 12:32:29 localhost rtkit-daemon[2615]: Demoting known real-time threads.
Sep 30 12:32:29 localhost rtkit-daemon[2615]: Successfully demoted thread 2613 of process 2613 (/usr/bin/pulseaudio (deleted)).
Sep 30 12:32:29 localhost rtkit-daemon[2615]: Demoted 1 threads.
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Is there any specific hardware I need to run this on, or any other workload I need to run at the same time, to reproduce this problem? I have been trying for several minutes without hitting the failure and associated hang/livelock yet.

Larry Woodman
This seems to be a duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=710265

We used the same technique listed there as a work-around and it worked for us and our customer. The later versions of RHEL (6.1/6.2) do not seem to have the same problem. It only showed up on Intel hardware and would sometimes take 30 minutes to appear. No specific workload was necessary; in fact, idle time tended to bring it on.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
I just did a fresh installation of RHEL 6.3 (2.6.32-279.2.1.el6.x86_64). I see a 'niced' pulseaudio process:

Aug 15 20:18:16 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3069 of process 3069 (/usr/bin/pulseaudio) owned by '500' high priority at nice level -11.
Aug 15 20:18:16 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3072 of process 3069 (/usr/bin/pulseaudio) owned by '500' RT at priority 5.
Aug 15 20:18:16 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3073 of process 3069 (/usr/bin/pulseaudio) owned by '500' RT at priority 5.
Aug 15 20:18:17 redhatsys2 rtkit-daemon[2887]: Sucessfully made thread 3124 of process 3124 (/usr/bin/pulseaudio) owned by '500' high priority at nice level -11.

1. What does rtkit-daemon have to do with pulseaudio?
2. Is this a bug or a feature?
3. Could this cause my server to perform poorly after a while?
*** This bug has been marked as a duplicate of bug 728315 ***