Bug 461640 - [RHEL5.3][Kernel gettimeofday test failing when running in a fullyvirt guest
Summary: [RHEL5.3][Kernel gettimeofday test failing when running in a fullyvirt guest
Keywords:
Status: CLOSED DUPLICATE of bug 449346
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.3
Hardware: All
OS: Linux
medium
medium
Target Milestone: rc
: ---
Assignee: Rik van Riel
QA Contact: Martin Jenner
URL: http://rhts.redhat.com/cgi-bin/rhts/t...
Whiteboard:
Depends On:
Blocks: 492568
TreeView+ depends on / blocked
 
Reported: 2008-09-09 17:00 UTC by Jeff Burke
Modified: 2009-07-13 20:44 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-13 20:44:23 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
gettimeofday test log on bare-metal (4.23 KB, text/plain)
2008-09-12 13:05 UTC, Prarit Bhargava
no flags Details

Description Jeff Burke 2008-09-09 17:00:52 UTC
Description of problem:
 When running a RHEL5.3 kernel-xen DOM0 and a RHEL5.3 fullyvirt DOMU kernel. The gettimeofdaytest fails.

Version-Release number of selected component (if applicable):
kernel-2.6.18-105.el5

How reproducible:
 Unknown

Steps to Reproduce:
1. Install RHEL5.3 distro RHEL5.3-Server-20080814.nightly 
2. Install the 2.6.18-105.el5 kernel
3. Run the following test 
http://rhts.redhat.com/rpms/development/noarch/noarch/rh-tests-kernel-syscalls-gettimeofday-1.8-17.noarch.rpm
  
Actual results:
***** Starting the runtest.sh script *****
***** Current Running Kernel Package = kernel-2.6.18-105.el5.bz461530.x86_64 *****
***** Current Running Distro = Red Hat Enterprise Linux Server release 5.2 (Tikanga) *****
grep /var/log/messages for which time source kernel is using
------------------------------------------------------------
Sep  9 10:04:00 guest69-97 kernel: Calibrating delay using timer specific routine.. 4451.28 BogoMIPS (lpj=2225641)
Sep  9 10:04:00 guest69-97 kernel: Using local APIC timer interrupts.
Sep  9 10:04:00 guest69-97 kernel: Detected 6.250 MHz APIC timer.
Sep  9 10:04:00 guest69-97 kernel: Calibrating delay using timer specific routine.. 4416.47 BogoMIPS (lpj=2208238)
Sep  9 10:04:00 guest69-97 kernel: time.c: Using 69.069722 MHz WALL HPET GTOD HPET timer.
Sep  9 10:04:00 guest69-97 kernel: time.c: Detected 2210.296 MHz processor.
Sep  9 10:04:00 guest69-97 kernel: hpet0: 3 64-bit timers, 69069722 Hz
Sep  9 10:04:01 guest69-97 kernel: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.lo.base_reachable_time; Use net.ipv6.neigh.lo.base_reachable_time_ms instead.
Sep  9 10:07:21 guest69-97 kernel: Calibrating delay using timer specific routine.. 4459.70 BogoMIPS (lpj=2229850)
Sep  9 10:07:21 guest69-97 kernel: Using local APIC timer interrupts.
Sep  9 10:07:21 guest69-97 kernel: Detected 6.250 MHz APIC timer.
Sep  9 10:07:21 guest69-97 kernel: Calibrating delay using timer specific routine.. 4421.06 BogoMIPS (lpj=2210534)
Sep  9 10:07:21 guest69-97 kernel: time.c: Using 69.069722 MHz WALL HPET GTOD HPET timer.
Sep  9 10:07:21 guest69-97 kernel: time.c: Detected 2210.301 MHz processor.
Sep  9 10:07:21 guest69-97 kernel: hpet0: 3 64-bit timers, 69069722 Hz
Sep  9 10:07:22 guest69-97 kernel: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.lo.base_reachable_time; Use net.ipv6.neigh.lo.base_reachable_time_ms instead.
Sep  9 10:11:46 guest69-97 kernel: Calibrating delay using timer specific routine.. 4462.44 BogoMIPS (lpj=2231222)
Sep  9 10:11:46 guest69-97 kernel: Using local APIC timer interrupts.
Sep  9 10:11:46 guest69-97 kernel: Detected 6.250 MHz APIC timer.
Sep  9 10:11:46 guest69-97 kernel: Calibrating delay using timer specific routine.. 4421.15 BogoMIPS (lpj=2210576)
Sep  9 10:11:46 guest69-97 kernel: time.c: Using 69.069722 MHz WALL HPET GTOD HPET timer.
Sep  9 10:11:46 guest69-97 kernel: time.c: Detected 2210.303 MHz processor.
Sep  9 10:11:47 guest69-97 kernel: hpet0: 3 64-bit timers, 69069722 Hz
Sep  9 10:11:47 guest69-97 kernel: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.lo.base_reachable_time; Use net.ipv6.neigh.lo.base_reachable_time_ms instead.
Sep  9 10:14:54 guest69-97 kernel: Calibrating delay using timer specific routine.. 4219.82 BogoMIPS (lpj=2109912)
Sep  9 10:14:54 guest69-97 kernel: Using local APIC timer interrupts.
Sep  9 10:14:54 guest69-97 kernel: Detected 6.250 MHz APIC timer.
Sep  9 10:14:54 guest69-97 kernel: Calibrating delay using timer specific routine.. 4425.96 BogoMIPS (lpj=2212984)
Sep  9 10:14:54 guest69-97 kernel: time.c: Using 69.069722 MHz WALL HPET GTOD HPET timer.
Sep  9 10:14:54 guest69-97 kernel: time.c: Detected 2210.305 MHz processor.
Sep  9 10:14:55 guest69-97 kernel: hpet0: 3 64-bit timers, 69069722 Hz
Sep  9 10:14:55 guest69-97 kernel: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.lo.base_reachable_time; Use net.ipv6.neigh.lo.base_reachable_time_ms instead.
------------------------------------------------------------
lspci output to show what chipset
------------------------------------------------------------
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:01.3 USB Controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Class ff80: XenSource, Inc. Xen Platform Device (rev 01)
00:04.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
------------------------------------------------------------
* List of Available Clock Sources 
        jiffies 
* Current Clock Source 
        jiffies 
------------------------------------------------------------
***** Start timedifftest-qa *****
***** Start loop number 1 *****
curtime 1220972534 time_before 1220972597 time_after 1220972534
Failed:
Start of time test.
***** Done loop number 1 *****
***** Start loop number 2 *****
curtime 1220972546 time_before 1220972546 time_after 1220972545
Failed:
Start of time test.
***** Done loop number 2 *****
***** Start loop number 3 *****
curtime 1220972781 time_before 1220972843 time_after 1220972781
Failed:
Start of time test.
Count=10000000
Count=20000000
***** Done loop number 3 *****
***** Start loop number 4 *****
curtime 1220972845 time_before 1220972845 time_after 1220972783
Failed:
Start of time test.
***** Done loop number 4 *****
***** Start loop number 5 *****
curtime 1220973031 time_before 1220973094 time_after 1220973031
Failed:
Start of time test.
Count=10000000
Count=20000000
***** Done loop number 5 *****
***** Start loop number 6 *****
curtime 1220973036 time_before 1220973037 time_after 1220973036
Failed:
Start of time test.
***** Done loop number 6 *****
***** Start loop number 7 *****
curtime 1220973102 time_before 1220973102 time_after 1220973040
Failed:
Start of time test.
***** Done loop number 7 *****
***** Start loop number 8 *****
curtime 1220973304 time_before 1220973366 time_after 1220973304
Failed:
Start of time test.
Count=10000000
Count=20000000
***** Done loop number 8 *****
***** Start loop number 9 *****
curtime 1220973366 time_before 1220973366 time_after 1220973304
Failed:
Start of time test.
***** Done loop number 9 *****
***** Start loop number 10 *****
curtime 1220973304 time_before 1220973366 time_after 1220973304
Failed:
Start of time test.
***** Done loop number 10 *****
timedrift: Success
***** Start gtod_backwards *****
***** Start loop number 1 *****
Test start time = 1220973304.667498s
start time = 1220973304.671893 
end time = 1220973304.671854 
FAIL: time went backwards -39000 nsec (-1.999961 )
***** Done loop number 1 *****
***** Start loop number 2 *****
Test start time = 1220973304.804972s
start time = 1220973304.805022 
end time = 1220973304.804982 
FAIL: time went backwards -40000 nsec (-1.999960 )
***** Done loop number 2 *****
***** Start loop number 3 *****
Test start time = 1220973304.829893s
start time = 1220973304.829950 
end time = 1220973304.829916 
FAIL: time went backwards -34000 nsec (-1.999966 )
***** Done loop number 3 *****
***** Start loop number 4 *****
Test start time = 1220973304.840977s
start time = 1220973304.841021 
end time = 1220973304.840982 
FAIL: time went backwards -39000 nsec (-1.999961 )
***** Done loop number 4 *****
***** Start loop number 5 *****
Test start time = 1220973304.851896s
start time = 1220973304.851983 
end time = 1220973304.851915 
FAIL: time went backwards -68000 nsec (-1.999932 )
***** Done loop number 5 *****
***** Start loop number 6 *****
Test start time = 1220973304.869495s
start time = 1220973304.870237 
end time = 1220973304.869817 
FAIL: time went backwards -420000 nsec (-1.999580 )
***** Done loop number 6 *****
***** Start loop number 7 *****
Test start time = 1220973304.882003s
start time = 1220973304.881909 
end time = 1220973304.881885 
FAIL: time went backwards -24000 nsec (-1.999976 )
***** Done loop number 7 *****
***** Start loop number 8 *****
Test start time = 1220973304.970057s
start time = 1220973304.986517 
end time = 1220973304.986446 
FAIL: time went backwards -71000 nsec (-1.999929 )
***** Done loop number 8 *****
***** Start loop number 9 *****
Test start time = 1220973305.002961s
start time = 1220973305.002984 
end time = 1220973305.002946 
FAIL: time went backwards -38000 nsec (-1.999962 )
***** Done loop number 9 *****
***** Start loop number 10 *****
Test start time = 1220973305.024943s
start time = 1220973305.026924 
end time = 1220973305.026861 
FAIL: time went backwards -63000 nsec (-1.999937 )
***** Done loop number 10 *****
Time test Failed:
***** End of runtest.sh *****

Expected results:
This test should pass

Additional info:

Comment 1 Prarit Bhargava 2008-09-10 11:58:31 UTC
Man ... this looks really familiar to the +/- 1 nsec error I fixed on bare-metal.

P.

Comment 2 Prarit Bhargava 2008-09-10 12:04:56 UTC
Correcting myself -- this doesn't have the same footprint at the +/-1 nsec error issue I resolved in bare-metal a while back.  This does appear to be something entirely new.

P.

Comment 3 Chris Lalancette 2008-09-10 12:55:43 UTC
Prarit,
    Hard to say exactly, but for the time being, I think this more properly belongs in kernel.  After all, in the fully virt case, we just use the bog standard bare-metal kernel inside the guest.  Now, it *is* possible that this is a hypervisor bug as well, but we won't know that until we track it down.  I swapped it back to kernel.

Chris Lalancette

Comment 4 Prarit Bhargava 2008-09-10 13:02:55 UTC
jburke -- what machine was this?  I'd like to see if this is reproducible on bare-metal.

P.

Comment 5 Prarit Bhargava 2008-09-12 11:54:38 UTC
jburke ping?

Comment 6 Jeff Burke 2008-09-12 12:26:19 UTC
Prarit,
    The Dom0 was hp-dl585g5-01.rhts.bos.redhat.com.

Comment 7 Prarit Bhargava 2008-09-12 13:02:04 UTC
dzickus happened to be running his nightly testsuite on hp-dl585g5-01 which happens to include a bare-metal test of the gettimeofday testsuite.

As I expected, the gettimeofday tests successfully complete on bare-metal.

http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=28942&type=Single 

(search for hp-dl585-g5-01, open up menu, search for gettimeofday)

IMO, this BZ is a virt issue.

P.

Comment 8 Prarit Bhargava 2008-09-12 13:05:15 UTC
Created attachment 316569 [details]
gettimeofday test log on bare-metal

Comment 10 Rik van Riel 2009-04-09 17:04:06 UTC
Jeff, did this happen with a multi-cpu xen guest?

Comment 11 Jeff Burke 2009-04-09 18:20:10 UTC
Rik, Yes this HVM had 2 cpus

Comment 12 Rik van Riel 2009-04-10 17:02:55 UTC
This bug can be caused by a combination of two main factors:
- while doing disk IO, one VCPU of an HVM guest can miss timer ticks
- Xen did not re-deliver those missed timer ticks later on, causing clock skew between VCPUs inside an HVM guest

Both of these issues should be resolved with the backport of the AIO disk handling code and upstream Xen 'no missed-tick accounting' timer code. Please test the test RPMs from http://people.redhat.com/riel/.xenaiotime/ and let us know if those (experimental!) test packages resolve the issue.

Comment 13 Jeff Burke 2009-07-13 20:44:23 UTC

*** This bug has been marked as a duplicate of bug 449346 ***


Note You need to log in before you can comment on or make changes to this bug.