Description of problem:

The destination domain is in paused status during virtual machine live migration. However, virt-launcher continuously tries to set the guest time while the VM is paused, which results in the events below every second.

~~~
{"component":"virt-launcher","kind":"","level":"warning","msg":"failed to sync guest time","name":"rhel8-organisational-pony","namespace":"nijin-cnv","pos":"manager.go:261","reason":"virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-03-15T07:16:14.786101Z","uid":"31323a86-8f9d-425b-aec9-5ac514e8279d"}
{"component":"virt-launcher","kind":"","level":"warning","msg":"failed to sync guest time","name":"rhel8-organisational-pony","namespace":"nijin-cnv","pos":"manager.go:261","reason":"virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-03-15T07:16:15.785591Z","uid":"31323a86-8f9d-425b-aec9-5ac514e8279d"}
{"component":"virt-launcher","kind":"","level":"warning","msg":"failed to sync guest time","name":"rhel8-organisational-pony","namespace":"nijin-cnv","pos":"manager.go:261","reason":"virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-03-15T07:16:16.794454Z","uid":"31323a86-8f9d-425b-aec9-5ac514e8279d"}
~~~

The error is expected from libvirtd while the VM is paused:

~~~
virsh -r list | grep -i nijin_test_2
 5    nijin_test_2    paused

virsh domtime nijin_test_2 --now
error: Requested operation is not valid: domain is not running
~~~

Version-Release number of selected component (if applicable):
kubevirt-hyperconverged-operator.v4.9.3

How reproducible:
100%

Steps to Reproduce:
1. Live migrate a virtual machine.
2. Watch the logs in the destination virt-launcher pod.

Actual results:
"failed to sync guest time" warnings spam the destination virt-launcher pod during VM live migration.

Expected results:
The guest-time sync failure should not be logged repeatedly while the domain is paused during migration.

Additional info:
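For illustration only, a minimal sketch (not KubeVirt's actual code) of how a caller using the libvirt-go bindings could check the domain state and skip the guest-time sync while the domain is paused, instead of calling SetTime and logging the failure. The connection URI and domain name are placeholders.

~~~
// Illustrative sketch, not KubeVirt code: guard a guest-time sync with a
// domain state check so nothing is attempted (or logged) while the domain
// is paused on the migration target.
package main

import (
	"log"
	"time"

	"libvirt.org/go/libvirt"
)

func syncGuestTimeIfRunning(dom *libvirt.Domain) error {
	state, _, err := dom.GetState()
	if err != nil {
		return err
	}
	// Skip the sync entirely while the domain is not running.
	if state != libvirt.DOMAIN_RUNNING {
		return nil
	}
	// Set the guest clock to the host's current time, as `virsh domtime --now` does.
	now := time.Now()
	return dom.SetTime(now.Unix(), uint(now.Nanosecond()), 0)
}

func main() {
	conn, err := libvirt.NewConnect("qemu:///system") // placeholder URI
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	dom, err := conn.LookupDomainByName("nijin_test_2") // placeholder domain name
	if err != nil {
		log.Fatal(err)
	}
	defer dom.Free()

	if err := syncGuestTimeIfRunning(dom); err != nil {
		log.Printf("failed to sync guest time: %v", err)
	}
}
~~~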
Hey @nashok,

The log message comes from here: https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/manager.go#L237.

Following the code, I see that the setGuestTime() function is called either when a VMI un-pauses or when a migration finishes, so I don't think the problem is that we log it during migration. Maybe it comes from the fact that right after migration this operation can fail and is therefore retried a few times?

It's also important to ask: how spammy is that message? It's common to have such "error-retry" messages in our logs a few times. Does it appear 3 times or 30 times?

Thanks,
Itamar Holder.
(In reply to Itamar Holder from comment #2)
> Hey @nashok,
>
> The log message comes from here:
> https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/
> manager.go#L237.

I think this is called from here: https://github.com/kubevirt/kubevirt/blob/5bf9394089f6790279aad9ac2001a7a230d34ec3/pkg/virt-handler/vm.go#L681, and this is an early stage of migration (the domain has been detected in the target virt-launcher). The actual VM migration (migration of memory) starts after this, and for a big VM it is going to take time. During this time, the target virt-launcher logs this warning message every 1 second and stops after a minute (the ticker timeout is 60 seconds).

Output from the target virt-launcher pod:

~~~
cluster-up/kubectl.sh logs virt-launcher-vm-cirros-mdqn4 | grep -i "failed to sync guest time" | wc -l
selecting docker as container runtime
60

cluster-up/kubectl.sh logs virt-launcher-vm-cirros-mdqn4 | grep -i "failed to sync guest time" | sed -n '1p;$p'
selecting docker as container runtime
{"component":"virt-launcher","kind":"","level":"warning","msg":"failed to sync guest time","name":"vm-cirros","namespace":"default","pos":"manager.go:295","reason":"virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-05-31T12:13:29.300151Z","uid":"6fd68580-a322-470b-a1ce-938b6f556cf1"}
{"component":"virt-launcher","kind":"","level":"error","msg":"failed to sync guest time","name":"vm-cirros","namespace":"default","pos":"manager.go:267","timestamp":"2022-05-31T12:14:28.299731Z","uid":"6fd68580-a322-470b-a1ce-938b6f556cf1"}
[root@rhel_83 kubevirt]#
~~~

> It's also important to ask - how spammy is that message? I mean, it's common
> to have such "error-retry" messages in our logs for a few times. Does it
> appear for 3 times or 30 times?

It's very spammy; it logs 60 times for a long-running migration. It's easy to reproduce: just reduce bandwidthPerMigration to a low value and watch the destination virt-launcher pod logs.
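To make the timing concrete, here is a minimal, hypothetical sketch of the retry pattern described above (a 1-second ticker bounded by a 60-second timeout). It is not the actual KubeVirt code; trySetGuestTime is a placeholder for the real libvirt call, which keeps failing while the target domain stays paused, so one warning is logged per tick, 60 in total, followed by a final error.

~~~
// Hypothetical sketch of the retry pattern: retry every second, give up
// after 60 seconds. While the domain stays paused every attempt fails,
// producing one warning per tick and a final error after one minute.
package main

import (
	"errors"
	"log"
	"time"
)

func trySetGuestTime() error {
	// Placeholder for the libvirt SetTime call, which fails with
	// "domain is not running" while the target domain is paused.
	return errors.New("Requested operation is not valid: domain is not running")
}

func syncGuestTimeWithRetry() {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	timeout := time.After(60 * time.Second)

	for {
		select {
		case <-timeout:
			log.Println("error: failed to sync guest time") // final error after one minute
			return
		case <-ticker.C:
			if err := trySetGuestTime(); err != nil {
				log.Printf("warning: failed to sync guest time: %v", err) // logged every second
				continue
			}
			return
		}
	}
}

func main() {
	syncGuestTimeWithRetry()
}
~~~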
Deferring this to 4.13 due to medium severity and limited capacity.
Moving this to 4.13.3, as it needs time to investigate per the Virt devs.
Lubo, any progress on this?
This needs to be retested with the latest release.
Verified with build CNV-v4.13.3.rhel9-160, using the same steps as comment 6.

s1: default settings

~~~
$ oc get pod
NAME                            READY   STATUS      RESTARTS   AGE
virt-launcher-vm-fedora-9zvg5   1/1     Running     0          26s
virt-launcher-vm-fedora-hrcmz   0/1     Completed   0          3m9s

$ oc logs virt-launcher-vm-fedora-9zvg5 | grep "failed to sync guest time" | wc -l
0
~~~

s2: low bandwidth, migration takes more than 20 minutes

~~~
$ oc logs virt-launcher-vm-fedora-hh5h5 | grep "failed to sync guest time" | wc -l
2
~~~

Moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.13.3 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:4664