Bug 1072583

Summary: hwclock --systohc can hang on busy or virtual machine
Product: Red Hat Enterprise Linux 6
Reporter: Chris MacGregor <chrismacgregor>
Component: util-linux-ng
Assignee: Karel Zak <kzak>
Status: CLOSED ERRATA
QA Contact: Branislav Blaškovič <bblaskov>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.4
CC: bblaskov, fkrska, jkurik, kzak, msaxena, psklenar, rrajaram
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: util-linux-ng-2.17.2-12.15.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1072930
Environment:
Last Closed: 2014-10-14 07:35:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 994246, 1072930, 1085818
Attachments:
Description Flags
patch as submitted to util-linux@vger.kernel.org on 2014-02-27 none

Description Chris MacGregor 2014-03-04 20:34:26 UTC
Created attachment 870620
patch as submitted to util-linux@vger.kernel.org on 2014-02-27

If the hwclock command with the --systohc option is never able to run continuously (without being interrupted for more than 100 ms) for at least 500 ms, then it will never finish running.  This can occur on a machine that is busy, or on a virtual machine where the physical CPUs are shared across a larger number of virtual CPUs.
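The condition described above can be modelled with a short sketch. This is a simplified model of the reported behaviour, not the hwclock source: the function name and the list of sample gaps are hypothetical, and only the 100 ms and 500 ms figures come from the description.

```python
def loop_finishes(gaps_ms, required_run_ms=500, max_gap_ms=100):
    """Return True if some uninterrupted run of clock samples lasts long enough.

    gaps_ms is a hypothetical trace: the elapsed milliseconds between
    consecutive clock reads. A real loop would read the system clock.
    """
    run_ms = 0
    for gap in gaps_ms:
        if gap > max_gap_ms:
            run_ms = 0  # interrupted for too long: start over
            continue
        run_ms += gap
        if run_ms >= required_run_ms:
            return True
    # Trace exhausted without a long enough run; a real loop would keep spinning.
    return False


idle_machine = [10] * 60                   # sampled every 10 ms, never preempted
busy_machine = ([10] * 40 + [150]) * 100   # preempted for 150 ms every 400 ms

print(loop_finishes(idle_machine))  # True
print(loop_finishes(busy_machine))  # False
```

In the second trace the run never exceeds 400 ms before it is reset, so the model never completes no matter how long the trace is. That matches the hang reported here.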


Version-Release number of selected component (if applicable): util-linux-ng 2.17.2, util-linux 2.20.1


How reproducible: Approximately half the times tried, using the steps below.


Steps to Reproduce:
1. Create a Google Compute Engine instance, machine type g1-small, image centos-6-v20131120, and log in to it.
2. Run "sudo /sbin/hwclock --systohc -D"

Actual results:
Lots of:
...
Time elapsed since reference time has been 5.364920 seconds.
Delaying further to reach the new time.
...
and it never terminates.


Expected results:
Command completes successfully within 2-3 seconds.
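For anyone scripting the reproduction, a small wrapper can tell a hang apart from a normal run. This is a sketch only: the helper name and the 10-second deadline are assumptions (the deadline is chosen to be comfortably above the expected 2-3 seconds), and it relies on the standard `subprocess.run` timeout handling.

```python
import subprocess


def finishes_within(cmd, seconds):
    """Run cmd and return True if it exits before the deadline.

    On timeout, subprocess.run kills the child and raises TimeoutExpired,
    which is reported here as False (a likely hang).
    """
    try:
        subprocess.run(
            cmd,
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


# Example call (run as root; note that this really sets the hardware clock):
#   finishes_within(["/sbin/hwclock", "--systohc", "-D"], 10)
```

Running the script itself as root, rather than passing `sudo` as part of the command, avoids the case where an unprivileged parent is not allowed to kill the timed-out child.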

Additional info:
The design flaw in 2.17.2's hwclock that causes this problem was masked (not fixed!) by a bug introduced by a change made circa 2011-07-25.  Don't be fooled - this ensures that it always terminates, but not correctly nor in the manner intended (nor was the intent of that change to fix the bug described above).

The attached patch corrects both issues.

Comment 1 Chris MacGregor 2014-03-04 20:46:41 UTC
Forgot to mention: this causes a reboot or shutdown of a machine to effectively hang.

Comment 3 Karel Zak 2014-03-05 12:03:19 UTC
Fixed in upstream tree by commit 4a44a54b3caf77923f0e3f1d5bdf5eda6ef07f62.

Comment 10 Branislav Blaškovič 2014-07-03 12:23:24 UTC
The test case is failing on the ppc64 architecture only.
See this log file:
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2014/06/6786/678665/1410834/22201968/TESTOUT.log

Testing command: 'hwclock -D -D --systohc --test'
Output of this command contains "Timed out"

Comment 11 Karel Zak 2014-07-17 08:56:00 UTC
(In reply to Branislav Blaškovič from comment #10)
> The test case is failing on the ppc64 architecture only.
> See this log file:
> http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2014/06/6786/678665/1410834/22201968/TESTOUT.log
> 
> Testing command: 'hwclock -D -D --systohc --test'
> Output of this command contains "Timed out"

Do you mean 

   "Timed out waiting for time change."

right? This is a problem with synchronization to the HW clock, and it can happen
only on archs where there is no usable RTC_UIE_ON ioctl and where we have to use busy-wait.

This is not related to the patch. Anyway, I guess you can try to avoid 
this problem by:

  hwclock -D -D --systohc --test --noadjfile --utc
                                 ^^^^^^^^^^^^^^^^^

It would probably be better to update the test and add --noadjfile --utc; otherwise you're also testing other hwclock functionality and not only the problem with --systohc.

Comment 12 Karel Zak 2014-07-17 08:57:51 UTC
Note that the timeout for the busy-wait is 1.5 s; it seems that on ppc64 that's not enough. IMHO we can ignore the problem for now.
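For readers unfamiliar with the busy-wait fallback, the idea can be sketched as follows. This is an illustration under stated assumptions, not the hwclock implementation: `read_rtc` and `now` are hypothetical callables standing in for reading the RTC and a monotonic clock, and only the 1.5 s timeout comes from the comment above.

```python
import time


def wait_for_tick(read_rtc, timeout=1.5, now=time.monotonic):
    """Poll read_rtc() until its value changes; return False on timeout.

    read_rtc: hypothetical callable returning the current RTC seconds value.
    now: clock used to measure the timeout (injectable for testing).
    """
    start = now()
    first = read_rtc()
    while read_rtc() == first:
        if now() - start > timeout:
            # Corresponds to the "Timed out waiting for time change." case.
            return False
    return True
```

If the process is slow to poll or the RTC read itself is slow, the timeout can expire before a change is seen, which would be consistent with what the ppc64 test run reported.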

Comment 13 Branislav Blaškovič 2014-07-22 10:01:38 UTC
Thank you for the explanation. The test is passing now.

Comment 14 errata-xmlrpc 2014-10-14 07:35:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1545.html