Bug 836803 - RHEL6: Potential fix for leapsecond caused futex related load spikes
Summary: RHEL6: Potential fix for leapsecond caused futex related load spikes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Prarit Bhargava
QA Contact: Dong Zhu
URL:
Whiteboard:
Depends On:
Blocks: 782183 840683 847364 847365 847366 1300182
Reported: 2012-07-01 14:43 UTC by Prarit Bhargava
Modified: 2018-12-03 17:40 UTC (History)
CC List: 25 users

Fixed In Version: kernel-2.6.32-298.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-21 06:29:54 UTC
Target Upstream Version:
Embargoed:


Attachments
Current upstream test for hrtimer expiration (5.11 KB, text/plain), 2012-07-11 20:13 UTC, Prarit Bhargava
RHEL PATCH 1/7 (3.76 KB, patch), 2012-08-10 13:56 UTC, Prarit Bhargava
RHEL PATCH 2/7 (2.56 KB, patch), 2012-08-10 13:56 UTC, Prarit Bhargava
RHEL PATCH 3/7 (3.61 KB, patch), 2012-08-10 13:56 UTC, Prarit Bhargava
RHEL PATCH 4/7 (2.99 KB, patch), 2012-08-10 13:57 UTC, Prarit Bhargava
RHEL PATCH 5/7 (3.40 KB, patch), 2012-08-10 13:57 UTC, Prarit Bhargava
RHEL PATCH 6/7 (4.49 KB, patch), 2012-08-10 13:57 UTC, Prarit Bhargava
RHEL PATCH 7/7 (2.08 KB, patch), 2012-08-10 13:57 UTC, Prarit Bhargava


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 154793 0 None None None 2012-07-02 08:12:16 UTC
Red Hat Product Errata RHSA-2013:0496 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 6 kernel update 2013-02-20 21:40:54 UTC

Description Prarit Bhargava 2012-07-01 14:43:58 UTC
Description of problem: After the leap second on June 30, 2012, load spikes were noticed in userspace.  After some debugging it was noticed that futexes were timing out, which was causing CPU load to increase dramatically.

[FWIW: I noticed this myself last night.  My firefox suddenly consumed 98.9% of the CPU shortly after the leap second.  Resetting the date and restarting firefox resolved the problem.]


Version-Release number of selected component (if applicable): 
2.6.32-279

How reproducible:  Unknown at this time.  Probably fairly high.


Steps to Reproduce:  A reproducer is available here, 

http://marc.info/?l=linux-kernel&m=134113615122011&w=2
  
Actual results:  userspace programs consume ~100% of CPU time because of futex timeouts.


Expected results:  Futexes should not time out.


Additional info:  http://marc.info/?l=linux-kernel&m=134113577921904&w=2 has an RFC patch and reproducer attached.
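
The mechanism is not spelled out in this report. As I understand the upstream discussion, the leap second stepped the wall clock back by one second without refreshing the timer subsystem's cached view of wall time, so short wall-clock-based waits expired immediately and callers spun. The following is only a toy model of that idea: the class and method names are hypothetical and nothing here is kernel code.

```python
class ToyClock:
    """Toy model (hypothetical names): wall time plus the timer code's cached copy."""

    def __init__(self):
        self.monotonic = 0.0        # seconds since boot
        self.wall_offset = 1000.0   # wall time = monotonic + wall_offset
        self.timer_offset = 1000.0  # timer subsystem's cached copy of wall_offset

    def wall_now(self):
        return self.monotonic + self.wall_offset

    def timer_now(self):
        # What the timer code believes the wall time is.
        return self.monotonic + self.timer_offset

    def insert_leap_second(self):
        # Wall clock steps back one second; the cached offset is NOT refreshed.
        # This stale copy is the assumed bug.
        self.wall_offset -= 1.0

    def clock_was_set(self):
        # Resynchronise the cached offset, as resetting the date would.
        self.timer_offset = self.wall_offset

    def remaining_wait(self, relative_timeout):
        """Seconds a wall-clock-based timed wait would actually sleep."""
        expires_at = self.wall_now() + relative_timeout
        return max(0.0, expires_at - self.timer_now())


clock = ToyClock()
before = clock.remaining_wait(0.5)    # normal: sleeps the full timeout
clock.insert_leap_second()
after_leap = clock.remaining_wait(0.5)  # stale offset: timer already "expired"
clock.clock_was_set()
after_reset = clock.remaining_wait(0.5)
print(before, after_leap, after_reset)
```

In this model any timeout shorter than one second returns at once after the leap second, which would match a program looping on timed futex waits and consuming ~100% CPU until the date is reset.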

Comment 1 Prarit Bhargava 2012-07-01 14:44:39 UTC
Working on a backport now.  Backport is likely to depend on fix for bug 836748.

P.

Comment 2 RHEL Program Management 2012-07-01 14:50:47 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 3 Prarit Bhargava 2012-07-06 15:25:53 UTC
The current situation is as follows:  A patchset has been posted upstream,

http://marc.info/?l=linux-kernel&m=134138316402296&w=2

which has an Acked-by: me.  I've tested this patchset on an upstream kernel using the following tests:

1.  A leap second test I wrote (which is VERY similar to test 2)

2. http://marc.info/?l=linux-kernel&m=134116789230177&w=2, and

3.  http://marc.info/?l=linux-kernel&m=134116789230177&w=2 with and without the "-s" option.

So far all tests have been successful.  I am now doing a wider test for RHEL6 with a significant backport of patches to the kernel/time/timekeeping.c code + the patches from upstream.  This testing is ongoing, but thus far boot testing has not picked up any issues.

A smaller patchset has been identified as well that will resolve only the leapsecond issue, but still leaves the clock code susceptible to some other smaller races and issues.  I've decided to go with the larger set for the sake of completion -- besides, we're changing core code already and I don't see any reason not to broaden the changes at this point.

Watch here for further BZ updates.

P.

Comment 6 Prarit Bhargava 2012-07-11 20:11:49 UTC
upstream kernel testing ....
[root@intel-canoepass-05 tmp]# uname -a
Linux intel-canoepass-05.lab.bos.redhat.com 3.4.4-5.fc17.x86_64 #1 SMP Thu Jul 5 20:20:59 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@intel-canoepass-05 tmp]# ./leap-a-day -s

Setting time to Sun Jul 22 20:00:00 2012
Scheduling leap second for Sun Jul 22 20:00:00 2012
Sun Jul 22 19:59:57 2012 + 500283 us    TIME_INS
Sun Jul 22 19:59:58 2012 +    521 us    TIME_INS
Sun Jul 22 19:59:58 2012 + 500770 us    TIME_INS
Sun Jul 22 19:59:59 2012 +   1011 us    TIME_INS
Sun Jul 22 19:59:59 2012 + 501289 us    TIME_INS
Sun Jul 22 19:59:59 2012 +   6806 us    TIME_OOP
Sun Jul 22 19:59:59 2012 + 506946 us    TIME_OOP
Sun Jul 22 20:00:00 2012 +   7143 us    TIME_WAIT
Sun Jul 22 20:00:00 2012 + 507274 us    TIME_WAIT
Sun Jul 22 20:00:01 2012 +   7516 us    TIME_WAIT
Sun Jul 22 20:00:01 2012 + 507652 us    TIME_WAIT
Sun Jul 22 20:00:02 2012 +   7898 us    TIME_WAIT
Note: hrtimer early expiration failure observed.
Leap complete


.............................................................................

Modified kernel with upstream patches:

[root@intel-canoepass-05 tmp]# uname -a
Linux intel-canoepass-05.lab.bos.redhat.com 3.5.0-rc6+ #2 SMP Wed Jul 11 14:51:10 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-canoepass-05 tmp]# 

[root@intel-canoepass-05 tmp]# ./leap-a-day -s
This runs continuously. Press ctrl-c to stop
Setting time to speed up testing

Setting time to Wed Jul 11 20:00:00 2012
Scheduling leap second for Wed Jul 11 20:00:00 2012
Something woke us up, returning to sleep
Wed Jul 11 19:59:50 2012 + 746240 us    TIME_OK
Wed Jul 11 19:59:51 2012 + 246487 us    TIME_INS
Wed Jul 11 19:59:51 2012 + 746702 us    TIME_INS
Wed Jul 11 19:59:52 2012 + 246965 us    TIME_INS
Wed Jul 11 19:59:52 2012 + 747181 us    TIME_INS
Wed Jul 11 19:59:53 2012 + 247454 us    TIME_INS
Wed Jul 11 19:59:53 2012 + 747677 us    TIME_INS
Wed Jul 11 19:59:54 2012 + 247885 us    TIME_INS
Wed Jul 11 19:59:54 2012 + 748096 us    TIME_INS
Wed Jul 11 19:59:55 2012 + 248371 us    TIME_INS
Wed Jul 11 19:59:55 2012 + 748623 us    TIME_INS
Wed Jul 11 19:59:56 2012 + 248886 us    TIME_INS
Wed Jul 11 19:59:56 2012 + 749087 us    TIME_INS
Wed Jul 11 19:59:57 2012 + 249357 us    TIME_INS
Wed Jul 11 19:59:57 2012 + 749597 us    TIME_INS
Wed Jul 11 19:59:58 2012 + 249793 us    TIME_INS
Wed Jul 11 19:59:58 2012 + 750003 us    TIME_INS
Wed Jul 11 19:59:59 2012 + 250274 us    TIME_INS
Wed Jul 11 19:59:59 2012 + 750494 us    TIME_INS
Wed Jul 11 19:59:59 2012 + 250728 us    TIME_OOP
Wed Jul 11 19:59:59 2012 + 750938 us    TIME_OOP
Wed Jul 11 20:00:00 2012 + 251203 us    TIME_WAIT
Wed Jul 11 20:00:00 2012 + 751426 us    TIME_WAIT
Wed Jul 11 20:00:01 2012 + 251631 us    TIME_WAIT
Wed Jul 11 20:00:01 2012 + 751845 us    TIME_WAIT
Wed Jul 11 20:00:02 2012 + 252118 us    TIME_WAIT
Leap complete
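
For comparing runs like the two above, a small script can summarise a captured log. This is a minimal sketch, assuming the output has been saved as text and that every status line has the "date + N us STATE" layout shown here; the function name and the returned keys are my own and not part of the leap-a-day test.

```python
import re

# One status line, e.g. "Sun Jul 22 19:59:59 2012 +   6806 us    TIME_OOP"
LINE_RE = re.compile(
    r"^\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4} \+ +(\d+) us\s+(TIME_\w+)$"
)

# Order of clock states expected around an inserted leap second.
EXPECTED_ORDER = ["TIME_OK", "TIME_INS", "TIME_OOP", "TIME_WAIT"]


def summarize(log):
    """Return the state sequence and failure flags found in a captured log."""
    states = []
    early_expiry = False
    complete = False
    for raw in log.splitlines():
        line = raw.strip()
        if "hrtimer early expiration failure" in line:
            early_expiry = True
        elif line == "Leap complete":
            complete = True
        else:
            match = LINE_RE.match(line)
            # Record each state once per consecutive run of identical states.
            if match and (not states or states[-1] != match.group(2)):
                states.append(match.group(2))
    known = all(s in EXPECTED_ORDER for s in states)
    ranks = [EXPECTED_ORDER.index(s) for s in states if s in EXPECTED_ORDER]
    return {
        "states": states,
        "in_order": known and ranks == sorted(ranks),
        "early_expiry": early_expiry,
        "complete": complete,
    }


sample = """\
Sun Jul 22 19:59:59 2012 + 501289 us    TIME_INS
Sun Jul 22 19:59:59 2012 +   6806 us    TIME_OOP
Sun Jul 22 20:00:00 2012 +   7143 us    TIME_WAIT
Note: hrtimer early expiration failure observed.
Leap complete
"""
print(summarize(sample))
```

On the unpatched 3.4.4 run the summary would flag early_expiry, while the patched 3.5.0-rc6+ run prints no such note.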

Comment 7 Prarit Bhargava 2012-07-11 20:13:12 UTC
Created attachment 597663 [details]
Current upstream test for hrtimer expiration

Comment 8 Prarit Bhargava 2012-07-12 13:07:19 UTC
I've run several systems with a modified upstream kernel and the futex patchset and haven't seen any failures in 18+ hours.

P.

Comment 9 Prarit Bhargava 2012-07-16 12:54:30 UTC
Upstream patches are currently in tip.

P.

Comment 12 Prarit Bhargava 2012-07-28 12:12:36 UTC
I've put together a set of patches (that depend on BZ 836748) and have started testing across a large set of systems using the test case previously provided in this BZ.  I will update the BZ with testing results, and the patches after my initial testing is complete.

P.

Comment 13 Scott McCarty 2012-07-30 13:21:16 UTC
    (In reply to comment #2)
> This request was evaluated by Red Hat Product Management for
> inclusion in a Red Hat Enterprise Linux release.  Product
> Management has requested further review of this request by
> Red Hat Engineering, for potential inclusion in a Red Hat
> Enterprise Linux release for currently deployed products.
> This request is not yet committed for inclusion in a release.

Is there an estimated time for inclusion of this as an official errata?  There are several RHEL customers wanting this errata to come out.

Scott McCarty
Solutions Architect

Comment 14 Prarit Bhargava 2012-07-30 13:25:24 UTC
Scott,

There should not be any urgency surrounding this errata as no other leap seconds are currently scheduled and they are typically announced _years_ in advance.

http://en.wikipedia.org/wiki/Leap_second

Please inform your customers that Engineering is working on a stable and well-tested solution, and that a fix will be in RHEL6.4.

P.

Comment 15 Scott McCarty 2012-08-01 02:53:14 UTC
Prarit,
    I appreciate the response. The leap second insertion is released in Bulletin C on the IERS website, which is typically 5 months before the June or December of a leap second.

Our customers all have tickets open which cannot be closed until they have guaranteed that a fix is in place for their systems. 

These two facts combined create risk for operations folks, and their managers are particularly uncomfortable closing the ticket without closing the loop. This is why a definite release for this patch is so critical.

I hope that helps clarify. I have been educating our customers' operations teams on the RHEL6.4 release. I have explained to them that it would not be released in z-stream because the time system is such a critical piece of the kernel.

Best Regards
Scott M

Comment 19 Prarit Bhargava 2012-08-10 13:56:50 UTC
Created attachment 603549 [details]
RHEL PATCH 1/7

Comment 20 Prarit Bhargava 2012-08-10 13:56:54 UTC
Created attachment 603550 [details]
RHEL PATCH 2/7

Comment 21 Prarit Bhargava 2012-08-10 13:56:57 UTC
Created attachment 603551 [details]
RHEL PATCH 3/7

Comment 22 Prarit Bhargava 2012-08-10 13:57:01 UTC
Created attachment 603552 [details]
RHEL PATCH 4/7

Comment 23 Prarit Bhargava 2012-08-10 13:57:07 UTC
Created attachment 603553 [details]
RHEL PATCH 5/7

Comment 24 Prarit Bhargava 2012-08-10 13:57:10 UTC
Created attachment 603554 [details]
RHEL PATCH 6/7

Comment 25 Prarit Bhargava 2012-08-10 13:57:13 UTC
Created attachment 603555 [details]
RHEL PATCH 7/7

Comment 32 Jarod Wilson 2012-08-16 15:01:39 UTC
Patch(es) available on kernel-2.6.32-298.el6
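
To check whether a given RHEL 6 kernel is at or past that build, something like the following could be used. It is a rough sketch: the function name is hypothetical, it assumes release strings of the form "2.6.32-<build>[.z.z].el6...", and it only compares the build number, so it reports "unknown" (None) for z-stream kernels below 298 rather than guessing whether they carry a backport.

```python
import re

FIXED_BUILD = 298  # from "kernel-2.6.32-298.el6" in this bug

# Matches e.g. "2.6.32-279.el6.x86_64" or "kernel-2.6.32-279.5.2.el6"
RELEASE_RE = re.compile(r"^(?:kernel-)?2\.6\.32-(\d+)((?:\.\d+)*)\.el6")


def carries_fix(release):
    """True/False when the build number decides it, None when it cannot."""
    match = RELEASE_RE.match(release)
    if not match:
        return None  # not a RHEL 6 2.6.32 kernel string
    build = int(match.group(1))
    zstream = match.group(2)  # e.g. ".5.2" for a z-stream build, else ""
    if build >= FIXED_BUILD:
        return True
    # An older z-stream build might or might not include a backport.
    return None if zstream else False


for rel in ("2.6.32-279.el6.x86_64", "2.6.32-298.el6.x86_64"):
    print(rel, carries_fix(rel))
```

On a live system the argument could come from `platform.release()` in the Python standard library.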

Comment 37 errata-xmlrpc 2013-02-21 06:29:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html

