Bug 1098602
| Field | Value |
|---|---|
| Summary: | kvmclock: Ensure time in migration never goes backward (backport) |
| Product: | Red Hat Enterprise Linux 7 |
| Component: | qemu-kvm |
| Version: | 7.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | high |
| Reporter: | Marcelo Tosatti <mtosatti> |
| Assignee: | Marcelo Tosatti <mtosatti> |
| QA Contact: | Virtualization Bugs <virt-bugs> |
| CC: | amit.shah, chayang, coli, imammedo, jkurik, jreznik, juzhang, knoel, mtosatti, rbalakri, rkrcmar, scui, tdosek, virt-maint, xfu, zhanghm.zhm |
| Target Milestone: | rc |
| Target Release: | --- |
| Keywords: | ZStream |
| Fixed In Version: | qemu-kvm-1.5.3-77.el7 |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| : | 1121550 1143054 (view as bug list) |
| Bug Blocks: | 1121550 |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
| Last Closed: | 2015-03-05 08:09:32 UTC |
Description
Marcelo Tosatti
2014-05-16 16:04:31 UTC
Tested this bug with qemu-kvm-1.5.3-62.el7.bz1076326.x86_64 and a RHEL 7.0 guest. These are the test steps and results.

Steps:

1. Sync time on host A:
#ntpdate clock.redhat.com

2. Sync time on host B:
#ntpdate clock.redhat.com

3. Boot the RHEL 7.0 guest on host A:
/usr/libexec/qemu-kvm -M pc -cpu Opteron_G5 -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -no-kvm-pit-reinjection -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x7,num_queues=4 -drive file=/mnt/rhel7.0.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -vnc :1 -monitor stdio -net none -rtc base=utc,clock=host,driftfix=slew -serial unix:/tmp/monitor2,server,nowait

4. Boot the RHEL 7.0 guest on host B with -incoming:
/usr/libexec/qemu-kvm -M pc -cpu Opteron_G5 -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -no-kvm-pit-reinjection -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x7,num_queues=4 -drive file=/mnt/rhel7.0.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -vnc :1 -monitor stdio -net none -rtc base=utc,clock=host,driftfix=slew -serial unix:/tmp/monitor2,server,nowait -incoming tcp:0:5555

5. Check the guest system time and the current clocksource inside the guest:
# date
Wed May 28 06:57:10 EDT 2014
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock

6. Check the host system time:
# date
Wed May 28 06:57:05 EDT 2014
So guest system time = host system time.

7. Load the guest CPUs with the stress tool inside the guest:
#stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 10000s

8. Print the system time once per second with a script inside the guest:
while true;do date;sleep 1;done >time-result

9. Do ping-pong migration host A <--> host B.

10. Compare guest and host system time after 10 rounds.

Result:

Guest system time:
# date
Wed May 28 07:27:45 EDT 2014

Host system time:
# date
Wed May 28 07:28:52 EDT 2014

So the guest system time goes backward by about 1 minute after 10 rounds of ping-pong migration.

Marcelo, if my steps are wrong, please correct me. According to the result above, it seems the build qemu-kvm-1.5.3-62.el7.bz1076326.x86_64 didn't fix this issue. Please confirm.

(In reply to FuXiangChun from comment #6)
> If my steps are wrong, please correct me. According to the result above, it
> seems the build qemu-kvm-1.5.3-62.el7.bz1076326.x86_64 didn't fix this
> issue. Please confirm.

Fu,

The difference between guest/host time, after ping-pong migration with a loaded guest, should be the same with qemu-kvm-1.5.3-62.el7.x86_64, yes?

The patch fixes a different problem, which is, when executing from within the guest:

- R1 = read kvmclock
- R2 = read kvmclock
- R2 < R1 (smaller than)

(In reply to Marcelo Tosatti from comment #7)
> The difference between guest/host time, after ping-pong migration with a
> loaded guest, should be the same with qemu-kvm-1.5.3-62.el7.x86_64, yes?

According to the test steps in comment 6, QE retested qemu-kvm-1.5.3-60.el7.x86_64 and qemu-kvm-1.5.3-60.el7_0.2.x86_64 and got the same result as comment 6 (the guest system time goes backward about 1 minute after 10 rounds of ping-pong migration).
> The patch fixes a different problem, which is, when executing from within
> the guest:
>
> - R1 = read kvmclock
> - R2 = read kvmclock
> - R2 < R1 (smaller than)

Marcelo, QE needs to confirm a few questions with you.

Q1. How should we understand "R1 = read kvmclock", "R2 = read kvmclock" and "R2 < R1 (smaller than)"? It is not clear what R1, R2 and "read kvmclock" are. Are R1 and R2 the hardware clock (if so, QE can get it via the hwclock command)?

Q2. How should we understand "read kvmclock"? My understanding is that it is the system time, obtained with the date command, right?

Q3. QE did not find qemu-kvm-1.5.3-62.el7.x86_64 in brewweb. The latest qemu version there is qemu-kvm-1.5.3-60.el7_0.2.x86_64. How can we get it?

Q4. If this bug is fixed, the expected result is guest system time = host system time after ping-pong migration with a loaded guest, right?

Test summary: according to the steps in comment 6, qemu-kvm-1.5.3-62.el7.bz1076326.x86_64, qemu-kvm-1.5.3-60.el7.x86_64 and qemu-kvm-1.5.3-60.el7_0.2.x86_64 all gave the same result: the guest system time goes backward by about 1.5 minutes after 10 rounds of ping-pong migration.

(In reply to FuXiangChun from comment #8)
> Q1. How should we understand "R1 = read kvmclock", "R2 = read kvmclock" and
> "R2 < R1 (smaller than)"? Are R1 and R2 the hardware clock?

R1 and R2 are kvmclock reads (see the pvclock_clocksource_read function in arch/x86/kernel/pvclock.c).

> Q2. How should we understand "read kvmclock"? My understanding is that it is
> the system time, obtained with the date command, right?

No. You can verify it by using clock_gettime(CLOCK_MONOTONIC) from userspace, on a host with the TSC clocksource.

> Q3. QE did not find qemu-kvm-1.5.3-62.el7.x86_64 in brewweb. How can we get
> it?

I compiled it via GIT, so the source code is at http://git.app.eng.bos.redhat.com/virt/rhel7/qemu-kvm.git/

> Q4. If this bug is fixed, the expected result is guest system time = host
> system time after ping-pong migration with a loaded guest, right?

No, that is a different problem.

> Test summary: ... the guest system time goes backward by about 1.5 minutes
> after 10 rounds of ping-pong migration.

OK, thanks. I'll attach a testcase for this bug later.

Fix included in qemu-kvm-1.5.3-66.el7
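The following is an editorial sketch (not part of the original bug comments) of the backwards-read check Marcelo describes above: read the clock twice from inside the guest and flag R2 < R1. It uses clock_gettime(CLOCK_MONOTONIC), which is backed by kvm-clock when that is the guest's clocksource; the program name and build command are assumptions, not taken from this report.

```c
/* Sketch: detect backwards jumps between successive clock reads (R2 < R1).
 * Assumed build command: gcc -O2 -o clock-backwards clock-backwards.c
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

static uint64_t read_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
    uint64_t r1 = read_ns();          /* R1 = read kvmclock */
    unsigned long fails = 0;

    for (;;) {
        uint64_t r2 = read_ns();      /* R2 = read kvmclock */
        if (r2 < r1) {                /* time went backwards */
            fails++;
            printf("backwards: r1=%llu r2=%llu (delta=%llu ns), fails=%lu\n",
                   (unsigned long long)r1, (unsigned long long)r2,
                   (unsigned long long)(r1 - r2), fails);
        }
        r1 = r2;
    }
}
```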
TESTCASE
-------------

Find a host machine with the following characteristics:

1) Using the TSC clocksource.

2) With a RHEL-7 guest running time-warp-test.c.

Check /var/lib/chrony/drift on the host; the first element must be negative, or alternatively the "chronyc tracking" command must report:

Frequency : xyz ppm slow

Then the necessary guest uptime can be calculated with:

h * 3600 * (ppm * 1/1000000) = 2*60

- where ppm is the parts-per-million frequency adjustment of the host as noted, without the negative sign;
- h is the hours of guest uptime necessary to achieve 2 minutes of drift.

2 minutes of drift should be sufficient for a time-backwards event (or guest hang) to be noticed across savevm/loadvm in a guest running time-warp-test.c (*).

(*) http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
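As an editorial aid (not part of the testcase text), the uptime formula above can be turned into a small calculation; the 19.007 ppm figure used below is the value reported later in this bug, and everything else follows directly from the formula.

```c
/* Sketch: solve h * 3600 * (ppm / 1e6) = 2 * 60 for h (hours of guest uptime
 * needed for 2 minutes of drift), and also show the drift after 24 hours. */
#include <stdio.h>

int main(void)
{
    double ppm = 19.007;           /* host frequency adjustment, ppm slow (sign dropped) */
    double target_drift = 2 * 60;  /* 2 minutes, in seconds */

    double hours = target_drift / (3600.0 * (ppm / 1000000.0));
    printf("%.3f ppm -> %.0f hours (~%.0f days) of uptime for %.0f s of drift\n",
           ppm, hours, hours / 24.0, target_drift);

    double drift_24h = 24.0 * 3600.0 * (ppm / 1000000.0);
    printf("after 24 h of uptime: %.2f s of drift\n", drift_24h);
    return 0;
}
```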
Hi Marcelo,

I have hit a qemu core dump when testing s3 (savevm & loadvm) related operations using autotest.
version:
qemu-kvm-1.5.3-66.el7.x86_64
core dump info:
Core was generated by `/bin/qemu-kvm -S -name virt-tests-vm1 -sandbox off -M pc -nodefaults -vga cirru'.
Program terminated with signal 6, Aborted.
#0 0x00007fc86582b989 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x00007fc86582b989 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fc86582d098 in __GI_abort () at abort.c:90
#2 0x00007fc8658248f6 in __assert_fail_base (
fmt=0x7fc8659743e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7fc86aedbfc0 "time.tsc_timestamp <= migration_tsc",
file=file@entry=0x7fc86aedbf88 "/builddir/build/BUILD/qemu-1.5.3/hw/i386/kvm/clock.c",
line=line@entry=64,
function=function@entry=0x7fc86aedc040 <__PRETTY_FUNCTION__.23497> "kvmclock_current_nsec") at assert.c:92
#3 0x00007fc8658249a2 in __GI___assert_fail (
assertion=assertion@entry=0x7fc86aedbfc0 "time.tsc_timestamp <= migration_tsc",
file=file@entry=0x7fc86aedbf88 "/builddir/build/BUILD/qemu-1.5.3/hw/i386/kvm/clock.c",
line=line@entry=64,
function=function@entry=0x7fc86aedc040 <__PRETTY_FUNCTION__.23497> "kvmclock_current_nsec") at assert.c:101
#4 0x00007fc86adaf930 in kvmclock_current_nsec (s=0x7fc86c1ef6b0)
at /usr/src/debug/qemu-1.5.3/hw/i386/kvm/clock.c:64
#5 kvmclock_vm_state_change (opaque=0x7fc86c1ef6b0, running=<optimized out>,
state=<optimized out>) at /usr/src/debug/qemu-1.5.3/hw/i386/kvm/clock.c:87
#6 0x00007fc86ad8216b in vm_state_notify (running=running@entry=1,
state=state@entry=RUN_STATE_RUNNING) at vl.c:1662
#7 0x00007fc86ad821ab in vm_start () at vl.c:1671
#8 0x00007fc86ad52485 in qmp_cont (errp=errp@entry=0x7fff550e4530) at qmp.c:179
#9 0x00007fc86ad4d518 in qmp_marshal_input_cont (mon=<optimized out>,
qdict=<optimized out>, ret=<optimized out>) at qmp-marshal.c:1318
#10 0x00007fc86add96c7 in qmp_call_cmd (cmd=<optimized out>, params=0x7fc86c9ecff0,
mon=0x7fc86c059950) at /usr/src/debug/qemu-1.5.3/monitor.c:4509
#11 handle_qmp_command (parser=<optimized out>, tokens=<optimized out>)
at /usr/src/debug/qemu-1.5.3/monitor.c:4575
#12 0x00007fc86ae86222 in json_message_process_token (lexer=0x7fc86c059de0,
token=0x7fc86c25b400, type=JSON_OPERATOR, x=37, y=158) at qobject/json-streamer.c:87
#13 0x00007fc86ae957af in json_lexer_feed_char (lexer=lexer@entry=0x7fc86c059de0,
ch=<optimized out>, flush=flush@entry=false) at qobject/json-lexer.c:303
#14 0x00007fc86ae9587e in json_lexer_feed (lexer=0x7fc86c059de0, buffer=<optimized out>,
size=<optimized out>) at qobject/json-lexer.c:356
#15 0x00007fc86ae863b9 in json_message_parser_feed (parser=<optimized out>,
buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:110
#16 0x00007fc86add8413 in monitor_control_read (opaque=<optimized out>,
buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-1.5.3/monitor.c:4596
#17 0x00007fc86ad471c1 in qemu_chr_be_write (len=<optimized out>,
buf=0x7fff550e4720 "}I\016U\377\177", s=0x7fc86c043600) at qemu-char.c:167
#18 tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x7fc86c043600)
at qemu-char.c:2492
#19 0x00007fc86a080ac6 in g_main_dispatch (context=0x7fc86c043400) at gmain.c:3058
#20 g_main_context_dispatch (context=context@entry=0x7fc86c043400) at gmain.c:3634
#21 0x00007fc86ad19e9a in glib_pollfds_poll () at main-loop.c:187
#22 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:232
#23 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:464
#24 0x00007fc86ac3ff70 in main_loop () at vl.c:1988
#25 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4359
According to the core dump info and the qemu-kvm git log, I think the core dump was introduced by the patch for this bug.
I have also tested qemu-kvm-1.5.3-65.el7.x86_64, and that test passes.
Btw, I have not triggered this problem manually, only with autotest (100% reproducible). I will try more times to trigger it manually.
Based on the above, I think we should set this bug back to 'ASSIGNED'. Is that OK with you?
If there is anything wrong, feel free to correct me.
Thanks,
Cong
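For context on the assertion in the backtrace above: kvmclock_current_nsec in hw/i386/kvm/clock.c reads the guest's pvclock time structure from guest memory and converts the current TSC into nanoseconds, and the assertion fires when the structure's tsc_timestamp is newer than the TSC value QEMU uses for the migration. The code below is a simplified, editorial illustration of the pvclock arithmetic involved; it is not the RHEL/QEMU source, and the struct is reduced to the fields used here.

```c
/* Simplified illustration of how a pvclock/kvmclock reading becomes nanoseconds. */
#include <stdio.h>
#include <stdint.h>
#include <assert.h>

struct pvclock_sketch {
    uint64_t tsc_timestamp;     /* TSC value when this structure was last updated */
    uint64_t system_time;       /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier: TSC ticks -> ns */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

static uint64_t pvclock_to_nsec(const struct pvclock_sketch *t, uint64_t current_tsc)
{
    /* Mirrors the check that fails in the backtrace: the snapshot must not be
     * newer than the TSC value used at migration time. */
    assert(t->tsc_timestamp <= current_tsc);

    uint64_t delta = current_tsc - t->tsc_timestamp;
    if (t->tsc_shift >= 0)
        delta <<= t->tsc_shift;
    else
        delta >>= -t->tsc_shift;

    /* 128-bit intermediate so the fixed-point multiply cannot overflow. */
    uint64_t nsec = (uint64_t)(((__uint128_t)delta * t->tsc_to_system_mul) >> 32);
    return nsec + t->system_time;
}

int main(void)
{
    struct pvclock_sketch t = {
        .tsc_timestamp = 1000000,
        .system_time = 5000,
        .tsc_to_system_mul = 1u << 31,  /* 0.5 ns per TSC tick in this example */
        .tsc_shift = 0,
    };
    /* 2000000 ticks after the snapshot -> 1000000 ns on top of system_time. */
    printf("%llu ns\n", (unsigned long long)pvclock_to_nsec(&t, 3000000));
    return 0;
}
```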
(In reply to CongLi from comment #15)
> Hi Marcelo,
>
> I have hit a qemu core dump when testing s3 (savevm & loadvm) related
> operations using autotest.

The qemu core dump is a kvmclock-related error.

Marcelo,
QE still cannot reproduce this bug with qemu-kvm-1.5.3-64.el7.x86_64. The following are the detailed steps. If there is any mistake, please correct me. Thanks.

1. Host A and host B sync their clocks from clock.redhat.com:
#ntpdate clock.redhat.com

2. Ensure both hosts are using the tsc clocksource:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

3. Boot a RHEL 7.0 guest with this command line:
/usr/libexec/qemu-kvm -M pc -cpu SandyBridge -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 -no-kvm-pit-reinjection -name rhel7.0 -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 -rtc base=localtime,clock=host,driftfix=slew -drive file=/mnt/rhel7-64-ga.qcow2,if=none,id=drive-virtio-disk,format=qcow2,cache=none,aio=native,media=disk,aio=native,werror=stop,rerror=stop,serial=1234 -device virtio-blk-pci,drive=drive-virtio-disk,id=virtio-disk,bootindex=1 -monitor stdio -qmp tcp:0:5555,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -vnc :1

4. Run the test program inside the RHEL 7.0 guest:
./time-warp-test

5. Check "cat /var/lib/chrony/drift" inside the guest:
# cat /var/lib/chrony/drift
-21.219299 57.059106

6. Check "chronyc tracking":
Reference ID : 0.0.0.0 ()
Stratum : 0
Ref time (UTC) : Thu Jan 1 00:00:00 1970
System time : 0.000000000 seconds fast of NTP time
Last offset : 0.000000000 seconds
RMS offset : 0.000000000 seconds
Frequency : 21.219 ppm slow
Residual freq : 0.000 ppm
Skew : 0.000 ppm
Root delay : 0.000000 seconds
Root dispersion : 0.000000 seconds
Update interval : 0.0 seconds
Leap status : Not synchronised

7. Do migration to the destination host.

8. Check "cat /var/lib/chrony/drift" and "chronyc tracking" inside the guest.

Got the same result as in step 5 and step 6.

Also, correct a small mistake in the test program (http://people.redhat.com/mingo/time-warp-test/time-warp-test.c): change line 169
__asm__ __volatile__("movl $0,%0; rep; nop" : "=g"(*flag) :: "memory");
to
__asm__ __volatile__("mov $0,%0; rep; nop" : "=g"(*flag) :: "memory");
otherwise it fails to compile.

(In reply to FuXiangChun from comment #18)
> 5. Check "cat /var/lib/chrony/drift" inside the guest:
> # cat /var/lib/chrony/drift
> -21.219299 57.059106

The host must have a negative entry for the first element of /var/lib/chrony/drift, not the guest.
The guest must be left running for some time, at least 1 hour of uptime. Try 2 hours of uptime.

> 7. Do migration to the destination host.

No need to migrate to a destination host; savevm/loadvm on a single host is sufficient.

Fix included in qemu-kvm-1.5.3-77.el7

> > 5. Check "cat /var/lib/chrony/drift" inside the guest:
> > # cat /var/lib/chrony/drift
> > -21.219299 57.059106
>
> The host must have a negative entry for the first element of
> /var/lib/chrony/drift, not the guest.

I have a host using tsc whose drift is a negative value.

> The guest must be left running for some time, at least 1 hour of uptime.
>
> Try 2 hours of uptime.

Here do you mean that after the guest starts up, I should run time-warp-test.c in the guest for more than 2 hours, and at the same time keep doing savevm then loadvm?

Another question: according to your formula

h * 3600 * (ppm * 1/1000000) = 2*60

in my case I have "Frequency: 19.007 ppm slow", so the 'h' should be 33333 hours. Is there any way we can speed up the time needed to reproduce it?

(In reply to Chao Yang from comment #24)
> Another question: according to your formula
>
> h * 3600 * (ppm * 1/1000000) = 2*60
>
> in my case I have "Frequency: 19.007 ppm slow", so the 'h' should be 33333
> hours. Is there any way we can speed up the time needed to reproduce it?

h * 3600 * (ppm * 1/1000000) = 2*60
24 * 3600 * (0.0000190000) = x*60
1.6416000000 = x*60

So 1.6 seconds of drift in 24 hours. Given that time-warp-test is reading time values continuously, and that savevm stops the VM and immediately saves the KVM_GET_CLOCK value, 1.6 seconds should be enough.

I don't know of any method to speed that up off the top of my head; I will look it up and let you know.

Hi Marcelo,
I have been trying to reproduce this issue with qemu-kvm-1.5.3-60.el7.x86_64 with the following steps:
1. start a rhel 7 guest
2. run time-warp-test.c in guest
3. keep it running for more than 3 days
4. savevm then loadvm through monitor
Actual Result:
I stopped ntpd as well as chronyd both on the host and in the guest. After step 4, no hang happened in the guest.
Host info:
# cat /var/lib/chrony/drift
-18.903366 0.044623
# dmesg | grep -i clocksource
[ 0.163878] Switching to clocksource hpet
[ 1.407523] tsc: Refined TSC clocksource calibration: 3392.304 MHz
[ 1.407538] Switching to clocksource tsc
# uname -r
3.10.0-194.el7.x86_64
Questions:
1. When should I run the savevm/loadvm pair?
2. The compile instructions in time-warp-test.c do not work; it leads to a core dump when running. Is this normal?
3. I used "gcc -o time-warp-test.c time-warp-test.c" to compile and run. Is this OK?
4. time-warp-test reports "TSC: 2.30us, fail:0". What should it report when this issue is reproduced?
I'll attach the CLI used in my test.
(In reply to Chao Yang from comment #26)
> Actual Result:
> I stopped ntpd as well as chronyd both on the host and in the guest. After
> step 4, no hang happened in the guest.

You should not stop chronyd on the host. In the guest, you can stop it.

I have kept the VM running time-warp-test for 2 days, and this TSC host has -18 as the first element of /var/lib/chrony/drift. After savevm/loadvm I saw a 2-minute warp (the guest was 2 minutes slower than the host), but I didn't see any hang in the guest. Can I say I have reproduced the original issue? If not, what further operations should I do?

(In reply to Chao Yang from comment #29)
> After savevm/loadvm I saw a 2-minute warp (the guest was 2 minutes slower
> than the host), but I didn't see any hang in the guest. Can I say I have
> reproduced the original issue?

Where did you see the warp exactly? In the output of time-warp-test?

(In reply to Marcelo Tosatti from comment #30)
> Where did you see the warp exactly? In the output of time-warp-test?

No, it was from date, and the time in the guest caught up soon.
The warp test in the guest reported:

TSC: 2.36us, fail:0
TOD: 2.27us, fail:0
CLK: 2.25us, fail:0

Unless the warp test reports a failure in either TSC, TOD or CLK, I haven't reproduced it, right?

savevm then loadvm cannot reproduce this bug on a host with a guest that has been up for 13 days, with the warp test running in the guest.

Any suggestions for QE to reproduce and verify this bug?

Do you recall what kernel version was being used on the host? It should be easier to hit the bug with kernels < kernel-3.10.0-105.el7.

(In reply to Marcelo Tosatti from comment #32)
> Do you recall what kernel version was being used on the host?
>
> It should be easier to hit the bug with kernels < kernel-3.10.0-105.el7.

I was using kernel-3.10.0-194.el7.x86_64. Retrying with kernels < kernel-3.10.0-105.el7.

(In reply to Chao Yang from comment #33)
> I was using kernel-3.10.0-194.el7.x86_64. Retrying with kernels <
> kernel-3.10.0-105.el7.

I tested with kernel-3.10.0-104.el7 (both host and guest); after 24h of uptime the drift should be 1.27s, then savevm/loadvm; time-warp-test runs well and no hang happens.
(In reply to Chao Yang from comment #34)
> I tested with kernel-3.10.0-104.el7 (both host and guest); after 24h of
> uptime the drift should be 1.27s, then savevm/loadvm; time-warp-test runs
> well and no hang happens.

OK then, please mark the bug as verified (the standard kvmclock tests should be sufficient).

According to comment 35, setting this bz as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0349.html

The patch has been reverted upstream. I want to know what will happen in the RHEL downstream. Thanks.

https://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02811.html