Bug 2064160 - "failed to sync guest time" log spam in destination virt-launcher pod during VM live migration
Summary: "failed to sync guest time" log spam in destination virt-launcher pod during VM live migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.9.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.13.3
Assignee: lpivarc
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-15 07:42 UTC by nijin ashok
Modified: 2023-09-20 16:22 UTC
CC: 5 users

Fixed In Version: hco-bundle-registry-container-v4.13.1.rhel9-79
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 14:09:56 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 9237 0 None Merged Dont flood logs on sync guest time 2023-05-30 11:04:41 UTC
Github kubevirt kubevirt pull 9381 0 None Merged [release-0.59] Dont flood logs on sync guest time 2023-07-31 12:15:29 UTC
Red Hat Issue Tracker CNV-16922 0 None None None 2022-12-15 08:41:27 UTC
Red Hat Product Errata RHSA-2023:4664 0 None None None 2023-08-16 14:10:13 UTC

Description nijin ashok 2022-03-15 07:42:56 UTC
Description of problem:

During a virtual machine live migration, the destination domain is in a paused state. However, virt-launcher continuously tries to set the guest time while the VM is paused, which results in the events below every second.

~~~
{"component":"virt-launcher","kind":"","level":"warning","msg":"failed to sync guest time","name":"rhel8-organisational-pony","namespace":"nijin-cnv","pos":"manager.go:261","reason":"virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-03-15T07:16:14.786101Z","uid":"31323a86-8f9d-425b-aec9-5ac514e8279d"}
{"component":"virt-launcher","kind":"","level":"warning","msg":"failed to sync guest time","name":"rhel8-organisational-pony","namespace":"nijin-cnv","pos":"manager.go:261","reason":"virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-03-15T07:16:15.785591Z","uid":"31323a86-8f9d-425b-aec9-5ac514e8279d"}
{"component":"virt-launcher","kind":"","level":"warning","msg":"failed to sync guest time","name":"rhel8-organisational-pony","namespace":"nijin-cnv","pos":"manager.go:261","reason":"virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-03-15T07:16:16.794454Z","uid":"31323a86-8f9d-425b-aec9-5ac514e8279d"}
~~~

The error is expected from libvirtd while the VM is paused:

~~~
virsh -r list |grep -i nijin_test_2
 5    nijin_test_2           paused

virsh domtime nijin_test_2 --now
error: Requested operation is not valid: domain is not running
~~~
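
This libvirt error carries a distinct code (Code=55, VIR_ERR_OPERATION_INVALID), so a caller can tell "domain is paused / not running" apart from an unexpected failure and avoid warning about the expected case. A minimal Go sketch of such a check, assuming the libvirt-go bindings' Error type and SetTime signature (the import path, package name, and setGuestTimeOnce helper are illustrative, not the actual virt-launcher code):

~~~
package guesttime

import (
	"log"

	libvirt "libvirt.org/go/libvirt"
)

// setGuestTimeOnce is an illustrative wrapper around virDomainSetTime,
// asking the guest agent to sync the guest clock to the host clock.
func setGuestTimeOnce(dom *libvirt.Domain) error {
	return dom.SetTime(0, 0, libvirt.DOMAIN_TIME_SYNC)
}

// syncGuestTime warns only on unexpected errors; the "domain is not
// running" case (a paused migration target) is treated as expected.
func syncGuestTime(dom *libvirt.Domain) {
	err := setGuestTimeOnce(dom)
	if err == nil {
		return
	}
	if lvErr, ok := err.(libvirt.Error); ok && lvErr.Code == libvirt.ERR_OPERATION_INVALID {
		// Code=55: the domain is paused/not running; expected during migration.
		return
	}
	log.Printf("failed to sync guest time: %v", err)
}
~~~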


Version-Release number of selected component (if applicable):

kubevirt-hyperconverged-operator.v4.9.3

How reproducible:

100%

Steps to Reproduce:

1. Live migrate a virtual machine. 
2. Watch for the logs in the destination virt-launcher pod.

Actual results:

"failed to sync guest time" log spam in destination virt-launcher pod during VM live migration

Expected results:

These messages spam the destination virt-launcher pod, so I think they shouldn't be logged during the migration.

Additional info:

Comment 2 Itamar Holder 2022-05-16 09:21:24 UTC
Hey @nashok,

The log message comes from here: https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/manager.go#L237.

Note that, following the code, I see that the "setGuestTime()" function is called either when a VMI un-pauses or when a migration finishes, so I don't think the problem is that we log it during migration. Maybe it comes from the fact that right after migration this operation can fail and is therefore retried a few times?

It's also important to ask: how spammy is that message? I mean, it's common to have such "error-retry" messages in our logs a few times. Does it appear 3 times or 30 times?

Thanks,
Itamar Holder.

Comment 3 nijin ashok 2022-05-31 12:41:30 UTC
(In reply to Itamar Holder from comment #2)
> Hey @nashok,
> 
> The log message comes from here:
> https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/
> manager.go#L237.
> 

I think this is called from here: https://github.com/kubevirt/kubevirt/blob/5bf9394089f6790279aad9ac2001a7a230d34ec3/pkg/virt-handler/vm.go#L681, and this happens at an early stage of the migration (the domain is detected in the target virt-launcher). The actual VM migration (migration of memory) starts after this, and for a big VM this is going to take time. During this time, the target virt-launcher logs this warning message every second and stops after a minute (the ticker timeout is 60 seconds).
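
To make the timing concrete, the pattern being described is roughly a one-second ticker bounded by a 60-second timeout, with a warning logged on every failed attempt. A simplified sketch of that pattern (not the actual kubevirt code; trySetGuestTime is a hypothetical stand-in for the libvirt virDomainSetTime call):

~~~
package guesttime

import (
	"errors"
	"log"
	"time"
)

// trySetGuestTime is a hypothetical stand-in for the libvirt call that keeps
// failing with "domain is not running" while the migration target is paused.
func trySetGuestTime() error {
	return errors.New("Requested operation is not valid: domain is not running")
}

// syncGuestTimeWithRetry mimics the retry loop described above: during a long
// migration the call fails on every tick, so roughly 60 warning lines are
// emitted before the final error is logged.
func syncGuestTimeWithRetry() {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	timeout := time.After(60 * time.Second)

	for {
		select {
		case <-timeout:
			log.Print("failed to sync guest time") // the final error (manager.go:267)
			return
		case <-ticker.C:
			if err := trySetGuestTime(); err != nil {
				// One warning per second for the whole retry window is what
				// produces the log spam in the target virt-launcher.
				log.Printf("failed to sync guest time: %v", err)
				continue
			}
			return
		}
	}
}
~~~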

Output from target virt-launcher pod.
~~~
cluster-up/kubectl.sh logs  virt-launcher-vm-cirros-mdqn4|grep -i "failed to sync guest time" |wc -l
selecting docker as container runtime
60

cluster-up/kubectl.sh logs  virt-launcher-vm-cirros-mdqn4|grep -i "failed to sync guest time" |sed -n '1p;$p'
selecting docker as container runtime
{"component":"virt-launcher","kind":"","level":"warning","msg":"failed to sync guest time","name":"vm-cirros","namespace":"default","pos":"manager.go:295","reason":"virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-05-31T12:13:29.300151Z","uid":"6fd68580-a322-470b-a1ce-938b6f556cf1"}
{"component":"virt-launcher","kind":"","level":"error","msg":"failed to sync guest time","name":"vm-cirros","namespace":"default","pos":"manager.go:267","timestamp":"2022-05-31T12:14:28.299731Z","uid":"6fd68580-a322-470b-a1ce-938b6f556cf1"}
[root@rhel_83 kubevirt]#
~~~

> It's also important to ask - how spammy is that message? I mean, it's common
> to have such "error-retry" messages in our logs for a few times. Does it
> appear for 3 times or 30 times?
> 

It's very spammy; it is logged 60 times for a long-running migration.

It's easy to reproduce. Just reduce bandwidthPerMigration to a low value and watch the destination virt-launcher pod logs.

Comment 4 Kedar Bidarkar 2022-09-14 12:27:52 UTC
Deferring this to 4.13 due to the medium severity and capacity constraints.

Comment 7 Kedar Bidarkar 2023-06-21 12:21:15 UTC
Moving this to 4.13.3, as this needs time to investigate per the virt devs.

Comment 8 sgott 2023-06-21 12:21:43 UTC
Lubo, any progress on this?

Comment 10 lpivarc 2023-07-31 13:46:00 UTC
This needs to be retested with the latest release.

Comment 13 zhe peng 2023-08-03 08:03:06 UTC
Verified with build CNV-v4.13.3.rhel9-160.

Steps are the same as in comment 6.

Scenario 1: default settings
$ oc get pod
NAME                            READY   STATUS      RESTARTS   AGE
virt-launcher-vm-fedora-9zvg5   1/1     Running     0          26s
virt-launcher-vm-fedora-hrcmz   0/1     Completed   0          3m9s
$ oc  logs virt-launcher-vm-fedora-9zvg5 | grep "failed to sync guest time" | wc -l
0

Scenario 2: low bandwidth (migration takes more than 20 minutes)
$ oc logs virt-launcher-vm-fedora-hh5h5 | grep "failed to sync guest time"| wc -l
2


Moving to VERIFIED.

Comment 23 errata-xmlrpc 2023-08-16 14:09:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.13.3 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:4664

