Bug 1524412 - KVM clocks must not count time while host is suspended [NEEDINFO]
Summary: KVM clocks must not count time while host is suspended
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 27
Hardware: x86_64
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-12-11 12:52 UTC by Alan Jenkins
Modified: 2018-08-29 15:23 UTC (History)
21 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-08-29 15:23:43 UTC
Type: Bug
Embargoed:
jforbes: needinfo?


Attachments (Terms of Use)

Description Alan Jenkins 2017-12-11 12:52:43 UTC
Description of problem:

When resuming a KVM host, guests observe a jump in the clock which triggers timeouts.


Version-Release number of selected component (if applicable):
4.13.16-302.fc27.x86_64


How reproducible:
I can see easily weirdness after suspending a few minutes.  I might be missing something though, suspending for just 7 minutes did not trigger the systemd watchdog timer which claims to be triggered at 3 minutes.

---

Reduced reproducer:

1. Boot guest using init=/bin/bash
2. `date; echo test > /dev/kmsg`
3. Suspend host for 30 seconds+
4. Resume host
5. `date; echo test > /dev/kmsg`


Actual results:

30 seconds+ gap between timestamps, and clock is in sync with the host both before and after

Mon 11 Dec 12:41:08 GMT 2017
[  205.123381] test

Mon 11 Dec 12:41:49 GMT 2017
[  245.888211] test


Expected results:

Timestamps should be separated by only a few seconds, accounting for the time to manually suspend the host & rerun the command.  Also, I do not expect the clock to be in sync with the host afterwards.

E.g. see https://bugzilla.redhat.com/show_bug.cgi?id=1264258#c12

I have read KVM developers talk about at least one of the VM pause states, and say that emulated hardware clocks must not count time spent in this state.[1]  I believe the reasoning for this should apply to all of "pause", VM snapshots, and host suspend.  Unfortunately this does not seem to be the case with host suspend.  (If anyone likes to send ^Z / SIGSTOP to qemu, I leave it to them to suggest what should be done in that case :).

Then, software which sets timeouts, especially using CLOCK_MONOTONIC, would not be see them expiring in the time lost during host suspend.

Adjustments to guest wall clock time and refreshing DHCP leases, for example, can still be handled by VM-aware userspace if desired.


Additional info:

The guest uses kvm-clock:

$ journalctl -k|grep 'Switched to clocksource'
Dec 11 11:04:46 fedora27-vm kernel: clocksource: Switched to clocksource kvm-clock

---

Steps to reproduce how this causes problems:

1. Run Linux VM in virt-manager (Fedora is fine)
2. Suspend host for a while - 7 minutes, or several hours.
3. Resume host
4. Check guest logs for madness


Actual results:

Suspending for 7 minutes:

Dec 11 11:37:08 fedora27-vm rtkit-daemon[594]: The canary thread is apparently starving. Taking action.
Dec 11 11:37:08 fedora27-vm rtkit-daemon[594]: Demoting known real-time threads.
Dec 11 11:37:08 fedora27-vm rtkit-daemon[594]: Successfully demoted thread 1086 of process 1053 (/usr/bin/pulseaudio).
Dec 11 11:37:08 fedora27-vm rtkit-daemon[594]: Successfully demoted thread 1077 of process 1053 (/usr/bin/pulseaudio).
Dec 11 11:37:08 fedora27-vm rtkit-daemon[594]: Successfully demoted thread 1053 of process 1053 (/usr/bin/pulseaudio).
Dec 11 11:37:08 fedora27-vm rtkit-daemon[594]: Demoted 3 threads.

Dec 11 11:37:08 fedora27-vm chronyd[631]: Forward time jump detected!


Suspending for ten hours:

Dec 11 09:38:19 fedora27-vm dmeventd[445]: dmeventd was idle for 35527 second(s), exiting.
Dec 11 09:38:19 fedora27-vm dmeventd[445]: dmeventd shutting down.

Dec 11 09:38:19 fedora27-vm systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Dec 11 09:38:19 fedora27-vm systemd[1]: systemd-journald.service: Killing process 436 (systemd-journal) with signal SIGABRT.
Dec 11 09:38:26 fedora27-vm systemd[1]: Starting Flush Journal to Persistent Storage...
Dec 11 09:38:26 fedora27-vm audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-jour
Dec 11 09:38:26 fedora27-vm sedispatch[555]: Connection Error (Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory): AV
Dec 11 09:38:26 fedora27-vm systemd[1]: Started Flush Journal to Persistent Storage.
Dec 11 09:38:26 fedora27-vm audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-jour
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: MESSAGE=Process 436 (systemd-journal) of user 0 dumped core.
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.9e8537fe5b1142a78a730a9f07171
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: Stack trace of thread 436:
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: #0  0x00007f1f4183ad5c fsync (libpthread.so.0)
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: #1  0x00007f1f41d7f6e6 journal_file_set_online (libsystemd-shared-235.so)
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: #2  0x00007f1f41d8219a journal_file_append_object (libsystemd-shared-235.so)
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: #3  0x00007f1f41d83a18 journal_file_append_data (libsystemd-shared-235.so)
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: #4  0x00007f1f41d851d0 journal_file_append_entry (libsystemd-shared-235.so)
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: #5  0x00005560454f92f0 write_to_journal (systemd-journald)
Dec 11 09:38:26 fedora27-vm systemd-coredump[623]: #6  0x00005560454fc5ab dispatch_message_real (systemd-journald)
Dec 11 09:38:26 fedora27-vm kernel: systemd-coredum: 10 output lines suppressed due to ratelimiting

Udev also dies with SIGABRT (I checked with `coredumpctl gdb`), I guess the message about udevs watchdog is just lost:

Dec 11 09:38:26 fedora27-vm systemd-coredump[624]: Process 462 (systemd-udevd) of user 0 dumped core.                                                   
                                                   Stack trace of thread 462:
                                                   #0  0x00007fc228120f80 epoll_pwait (libc.so.6)
                                                   #1  0x00007fc228e0b76c sd_event_wait (libsystemd-shared-235.so)
                                                   #2  0x00007fc228e0c0f1 sd_event_run (libsystemd-shared-235.so)
                                                   #3  0x00007fc228e0c26c sd_event_loop (libsystemd-shared-235.so)
                                                   #4  0x0000564d35ea7d37 run (systemd-udevd)
                                                   #5  0x0000564d35ea8682 main (systemd-udevd)
                                                   #6  0x00007fc22802803a __libc_start_main (libc.so.6)
                                                   #7  0x0000564d35ea139a _start (systemd-udevd)


The tests also cause the TSC to be detected as "unstable"; it's possible fixing this bug might also help with that.

Dec 11 09:38:26 fedora27-vm kernel: clocksource:                       'kvm-clock' wd_now: 27425e94b572 wd_last: 996d30c40d3 mask: ffffffffffffffff
Dec 11 09:38:26 fedora27-vm kernel: clocksource:                       'tsc' cs_now: 1601607929c3 cs_last: 160095293aae mask: ffffffffffffffff
Dec 11 09:38:26 fedora27-vm kernel: tsc: Marking TSC unstable due to clocksource watchdog

Comment 1 Alan Jenkins 2017-12-11 13:36:38 UTC
Sorry, wrong version.

4.13.16-302.fc27.x86_64 was the kernel used on the guest

4.14.3-300.fc27.x86_64 is the kernel used on the host, which I would say is more culpable.

qemu-kvm-2.10.1-1.fc27.x86_64 is also used on the host.


It is also possible to test "sleep 30".  In my test this returns immediately after resume (assuming the sleep lasted >30s), mirroring the timeouts shown in the logs.  strace shows that sleep is using nanosleep(), which refers to CLOCK_MONOTONIC on linux.

Comment 2 Laura Abbott 2017-12-11 16:54:46 UTC
Fedora isn't really doing anything special here so this is best tracked as a feature request/bug with the upstream KVM project.

Comment 3 Alan Jenkins 2017-12-11 18:02:32 UTC
That makes sense :).  Thanks for triage.

I wanted to write this up, but I don't think I have time right now, to reduce this report to the qemu level and install upstream qemu etc.

(It's annoying even to set "init=/bin/bash".  For some reason, running qemu-kvm specifying `-m` and `-drive`, I can boot the system but the GRUB menu doesn't accept keyboard input.  Linux is fine once it's booted.  qemu-kvm is version 2.10.1-1.fc27.x86_64).

CPU is Intel i5-5300U.

Comment 4 Laura Abbott 2018-02-20 20:08:22 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  As kernel maintainers, we try to keep up with bugzilla but due the rate at which the upstream kernel project moves, bugs may be fixed without any indication to us. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.
 
Fedora 27 has now been rebased to 4.15.3-300.f27.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.
 
If you experience different issues, please open a new bug report for those.

Comment 5 Alan Jenkins 2018-03-19 14:53:14 UTC
Bugzilla really wants me to confirm.  So yes, as late as yesterday I'm still getting expired watchdog timers when suspending the host, and my kernel is:

4.15.9-300.fc27.x86_64

Comment 6 Mikhail Zabaluev 2018-05-24 05:11:28 UTC
I observe this in a Fedora 28 Atomic guest VM with ptp_kvm configured as a clock source for chronyd:

refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0

The host is also Fedora 28 running chronyd.

Comment 7 Justin M. Forbes 2018-07-23 15:33:50 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 8 Justin M. Forbes 2018-08-29 15:23:43 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.


Note You need to log in before you can comment on or make changes to this bug.