Bug 1072583
Summary: | hwclock --systohc can hang on busy or virtual machine | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Chris MacGregor <chrismacgregor> | ||||
Component: | util-linux-ng | Assignee: | Karel Zak <kzak> | ||||
Status: | CLOSED ERRATA | QA Contact: | Branislav Blaškovič <bblaskov> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 6.4 | CC: | bblaskov, fkrska, jkurik, kzak, msaxena, psklenar, rrajaram | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | util-linux-ng-2.17.2-12.15.el6 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1072930 | Environment: | ||||
Last Closed: | 2014-10-14 07:35:18 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 994246, 1072930, 1085818 | ||||||
Attachments: |
Forgot to mention: this causes a reboot or shutdown of a machine to effectively hang.

Fixed in upstream tree by commit 4a44a54b3caf77923f0e3f1d5bdf5eda6ef07f62.

Testcase is failing on ppc64 (only this) architecture.
See this log file:
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2014/06/6786/678665/1410834/22201968/TESTOUT.log

Testing command: 'hwclock -D -D --systohc --test'
Output of this command contains "Timed out"

(In reply to Branislav Blaškovič from comment #10)
> Testcase is failing on ppc64 (only this) architecture.
> See this log file:
> http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2014/06/6786/678665/1410834/22201968/TESTOUT.log
>
> Testing command: 'hwclock -D -D --systohc --test'
> Output of this command contains "Timed out"

You mean "Timed out waiting for time change.", right?

This is a problem with synchronization to the HW clock, and it can happen only on archs where there is no usable RTC_UIE_ON ioctl and where we have to use busy-wait. This is not related to the patch.

Anyway, I guess you can try to avoid this problem by:

    hwclock -D -D --systohc --test --noadjfile --utc
                                   ^^^^^^^^^^^^^^^^^

It would probably be better to update the test and add --noadjfile --utc; otherwise you're also testing other hwclock functionality and not only the problem with --systohc.

Note that the timeout for the busy-wait is 1.5 s; it seems that on ppc64 that's not enough. IMHO we can ignore the problem for now.

Thank you for the explanation. Test is passing now.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1545.html
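For readers unfamiliar with the busy-wait mentioned in the comments above, here is a rough sketch of the idea in Python: poll the clock until its seconds value changes, and give up after a timeout. This is not hwclock's code. The names wait_for_tick and FakeRtc are hypothetical, the fake clock exists only so the example runs without hardware, and the only value taken from this thread is the 1.5 s timeout.

```python
def wait_for_tick(read_rtc_seconds, monotonic, timeout=1.5):
    """Poll until read_rtc_seconds() returns a new value.

    Returns True if a change was seen, False if `timeout` seconds passed first.
    (Hypothetical helper for illustration; not taken from hwclock.)
    """
    start_value = read_rtc_seconds()
    deadline = monotonic() + timeout
    while monotonic() < deadline:
        if read_rtc_seconds() != start_value:
            return True
    return False


class FakeRtc:
    """Assumed stand-in for a hardware clock; every read costs `read_cost` seconds."""

    def __init__(self, read_cost, stalled=False):
        self.now = 0.25          # simulated time, partway through a second
        self.read_cost = read_cost
        self.stalled = stalled   # a stalled clock never shows a new second

    def monotonic(self):
        return self.now

    def read_seconds(self):
        self.now += self.read_cost
        return 0 if self.stalled else int(self.now)


ticking = FakeRtc(read_cost=0.01)
print(wait_for_tick(ticking.read_seconds, ticking.monotonic))  # True

stuck = FakeRtc(read_cost=0.01, stalled=True)
print(wait_for_tick(stuck.read_seconds, stuck.monotonic))      # False
```

In this model, a False result corresponds to the "Timed out waiting for time change." case: no new second was observed before the deadline.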
Created attachment 870620
patch as submitted to util-linux.org on 2014-02-27

If the hwclock command with the --systohc option is never able to run continuously (without being interrupted for more than 100 ms) for at least 500 ms, then it will never finish running. This can occur on a machine that is busy, or on a virtual machine where the physical CPUs are shared across a larger number of virtual CPUs.

Version-Release number of selected component (if applicable):
util-linux-ng 2.17.2, util-linux 2.20.1

How reproducible:
Approximately half the times tried, using the steps below.

Steps to Reproduce:
1. Create a Google Compute Engine instance (machine type g1-small, image centos-6-v20131120) and log in to it.
2. Run "sudo /sbin/hwclock --systohc -D"

Actual results:
Lots of:
...
Time elapsed since reference time has been 5.364920 seconds.
Delaying further to reach the new time.
...
and it never terminates.

Expected results:
Command completes successfully within 2-3 seconds.

Additional info:
The design flaw in 2.17.2's hwclock that causes this problem was masked (not fixed!) by a bug introduced by a change made circa 2011-07-25. Don't be fooled: that later bug ensures that hwclock always terminates, but neither correctly nor in the manner intended (nor was the intent of that change to fix the bug described above). The attached patch corrects both issues.
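To illustrate the failure mode described above, here is a simplified model in Python. It is a sketch of the behaviour as stated in this report, not the hwclock source: the loop needs 500 ms of running with no gap longer than 100 ms, and any longer gap makes it start over. The function name simulate_wait and the two gap patterns are made up for the example, and times are integer milliseconds to keep the arithmetic exact.

```python
from itertools import cycle, islice


def simulate_wait(gaps_ms, window_ms=500, max_gap_ms=100):
    """Model a loop that restarts whenever it is interrupted for too long.

    `gaps_ms` holds the elapsed time between consecutive loop iterations.
    Returns the simulated time (ms) at which the loop finishes, or None if
    it never finishes within the supplied iterations.
    (Hypothetical model for illustration; not taken from hwclock.)
    """
    now = 0
    run_start = 0  # start of the current uninterrupted run
    for gap in gaps_ms:
        now += gap
        if gap > max_gap_ms:
            run_start = now  # interrupted too long: resynchronize, start over
            continue
        if now - run_start >= window_ms:
            return now
    return None


# Quiet machine: the loop is scheduled every 10 ms.
quiet = [10] * 1000

# Busy or oversubscribed machine (assumed pattern): 300 ms of running,
# then a 150 ms stall, repeated.
busy = list(islice(cycle([10] * 30 + [150]), 10_000))

print(simulate_wait(quiet))  # 500
print(simulate_wait(busy))   # None
```

In the busy pattern the loop gets plenty of CPU time overall, but never 500 ms in one stretch, so the model never finishes no matter how many iterations it is given.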