Bug 472277

Summary: CRM 1871016 adjtimex causing instability on GPS clock daemon
Product: Red Hat Enterprise MRG Reporter: Issue Tracker <tao>
Component: realtime-kernel Assignee: Clark Williams <williams>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 1.2 CC: bhu, davids, lgoncalv, tao, tglx, williams
Target Milestone: 1.3 Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2010-09-14 20:24:58 UTC Type: ---
Attachments:
Description Flags
test program to demonstrate ADJ_FREQUENCY failure none

Description Issue Tracker 2008-11-19 19:08:40 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Issue Tracker 2008-11-19 19:08:41 UTC
Description of problem:

The customer is evaluating MRG 1.0 on RHEL 5 Update 2 and reports that adjtimex is causing instability in his GPS clock daemon.

As per the previous support process, Production Support would pass the customer information and Service Request details on to the MRG team (mrg-help).

RT raised for this issue 

https://engineering.redhat.com/rt3/SelfService/Display.html?id=30354 [Need to login with kerb username]

Clark Williams[<williams>] from Engineering has corresponded with the customer and has all the information required to take this case further.

Raising this issue through SEG so that we can create a Bugzilla to track this issue.

Please assign the bugzilla to Clark Williams
This event sent from IssueTracker by streeter  [SEG - Kernel]
 issue 240953

Comment 2 Luis Claudio R. Goncalves 2008-11-19 19:36:37 UTC
We have identified these commits from upstream as possibly related to this issue:

  - ee9851b218b8bafa22942b5404505ff3d2d34324
  - eea83d896e318bda54be2d2770d2c5d6668d11db
  - 916c7a855174e3b53d182b97a26b2e27a29726a1
  - d40e944c25fb4642adb2a4c580a48218a9f3f824

We are backporting them to our new test kernel (-94).

Comment 5 Clark Williams 2008-12-03 23:01:45 UTC
I built a kernel with the above commits and made it available to the customer through my people.redhat.com page. Unfortunately, he's now seeing problems using the ADJ_FREQUENCY mode parameter. I'm going to ask for a reproducer to see if we can trigger this on our development machines.

Comment 6 Clark Williams 2008-12-04 20:02:16 UTC
Created attachment 325739 [details]
test program to demonstrate ADJ_FREQUENCY failure

Test program from customer demonstrating failure to set frequency with ADJ_FREQUENCY on MRG RT kernel (works on Fedora and RHEL kernels)

Comment 7 Luis Claudio R. Goncalves 2008-12-30 13:26:30 UTC
I have backported one more upstream commit and fixed two issues that somehow came along with the backports. I built a test kernel with these fixes and the result is:

   [root@rhel5 ~]# /tmp/test_it
   adjtimex: Initial Frequency = 0
   Adjtimex: Success returned from setting Frequency to 1.
   adjtimex: New Frequency = 1

   [root@rhel5 ~]# /tmp/test_it
   adjtimex: Initial Frequency = 1
   Adjtimex: Success returned from setting Frequency to 2.
   adjtimex: New Frequency = 2

   [root@rhel5 ~]# /tmp/test_it
   adjtimex: Initial Frequency = 2
   Adjtimex: Success returned from setting Frequency to 3.
   adjtimex: New Frequency = 3

I have already added these fixes to our 2.6.24.7-96.el5rt kernel and started to build it for testing.

Comment 14 errata-xmlrpc 2009-02-04 15:05:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0053.html

Comment 16 Luis Claudio R. Goncalves 2009-02-06 14:12:07 UTC
I have replaced the 8 patches we had to fix this issue with a cleaner and smaller
2-patch solution from John Stultz @ IBM. His patches were added to kernel -102.
Test results:

[root@rhel5 tests]# uname -r
2.6.24.7-102.el5rt


[root@rhel5 tests]# ./test_it 
adjtimex: Initial Frequency = 0
Adjtimex: Success returned from setting Frequency to 1.
adjtimex: New Frequency = 1

[root@rhel5 tests]# ./test_it 
adjtimex: Initial Frequency = 1
Adjtimex: Success returned from setting Frequency to 2.
adjtimex: New Frequency = 2

[root@rhel5 tests]# ./test_it 
adjtimex: Initial Frequency = 2
Adjtimex: Success returned from setting Frequency to 3.
adjtimex: New Frequency = 3


[root@rhel5 tests]# ./adjtimex02
adjtimex02    1  PASS  :  Test Passed, adjtimex() returned -1 with errno: 14
adjtimex02    2  PASS  :  Test Passed, adjtimex() returned -1 with errno: 22
adjtimex02    3  PASS  :  Test Passed, adjtimex() returned -1 with errno: 22
adjtimex02    4  PASS  :  Test Passed, adjtimex() returned -1 with errno: 22
adjtimex02    5  PASS  :  Test Passed, adjtimex() returned -1 with errno: 22
adjtimex02    6  PASS  :  Test Passed, adjtimex() returned -1 with errno: 1

Comment 18 Clark Williams 2009-02-23 18:40:49 UTC
Further examination of the new clocksource code showed that adjtimex tick adjustments are only applied at the half-second boundary. A new MRG RT kernel (-104) has been built that contains an experimental patch from John Stultz where tick adjustments are applied immediately rather than at the sync boundary. When this kernel finishes initial testing I'll make it available to the customer to see if it resolves the issue (or at least attenuates the problem).