Bug 2064160
Summary: | "failed to sync guest time" log spam in destination virt-launcher pod during VM live migration | ||
---|---|---|---|
Product: | Container Native Virtualization (CNV) | Reporter: | nijin ashok <nashok> |
Component: | Virtualization | Assignee: | lpivarc |
Status: | CLOSED ERRATA | QA Contact: | zhe peng <zpeng> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.9.3 | CC: | acardace, iholder, kbidarka, lpivarc, sgott |
Target Milestone: | --- | ||
Target Release: | 4.13.3 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | hco-bundle-registry-container-v4.13.1.rhel9-79 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-08-16 14:09:56 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
nijin ashok
2022-03-15 07:42:56 UTC
Hey @nashok,

The log message comes from here: https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/manager.go#L237.

Following the code, the "setGuestTime()" function is called either when a VMI un-pauses or when a migration finishes, so I don't think the problem is that we log it during migration. Maybe it comes from the fact that right after migration this operation can fail and is therefore retried a few times?

It's also important to ask: how spammy is that message? It's common to have such "error-retry" messages in our logs a few times. Does it appear 3 times or 30 times?

Thanks,
Itamar Holder.

(In reply to Itamar Holder from comment #2)
> Hey @nashok,
>
> The log message comes from here:
> https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/manager.go#L237.

I think this is called from here: https://github.com/kubevirt/kubevirt/blob/5bf9394089f6790279aad9ac2001a7a230d34ec3/pkg/virt-handler/vm.go#L681, and this happens at an early stage of migration (the domain has been detected in the target virt-launcher). The actual VM migration (migration of memory) starts after this, and for a big VM it is going to take time. During this time, the target virt-launcher logs this warning message every 1 second and stops after a minute (the ticker timeout is 60 seconds); see the illustrative sketch at the end of this report.

Output from the target virt-launcher pod:

~~~
cluster-up/kubectl.sh logs virt-launcher-vm-cirros-mdqn4 | grep -i "failed to sync guest time" | wc -l
selecting docker as container runtime
60

cluster-up/kubectl.sh logs virt-launcher-vm-cirros-mdqn4 | grep -i "failed to sync guest time" | sed -n '1p;$p'
selecting docker as container runtime
{"component":"virt-launcher","kind":"","level":"warning","msg":"failed to sync guest time","name":"vm-cirros","namespace":"default","pos":"manager.go:295","reason":"virError(Code=55, Domain=10, Message='Requested operation is not valid: domain is not running')","timestamp":"2022-05-31T12:13:29.300151Z","uid":"6fd68580-a322-470b-a1ce-938b6f556cf1"}
{"component":"virt-launcher","kind":"","level":"error","msg":"failed to sync guest time","name":"vm-cirros","namespace":"default","pos":"manager.go:267","timestamp":"2022-05-31T12:14:28.299731Z","uid":"6fd68580-a322-470b-a1ce-938b6f556cf1"}
[root@rhel_83 kubevirt]#
~~~

> It's also important to ask - how spammy is that message? I mean, it's common
> to have such "error-retry" messages in our logs for a few times. Does it
> appear for 3 times or 30 times?

It's very spammy and it logs 60 times for a long-running migration. It's easy to reproduce: just reduce bandwidthPerMigration to a low value and watch the destination virt-launcher pod logs.

Deferring this to 4.13 due to the medium severity and capacity.

Moving this to 4.13.3 as it needs time to investigate, per the Virt devs.

Lubo, any progress on this?

This needs to be retested with the latest release.

Verified with build CNV-v4.13.3.rhel9-160; steps are the same as in comment 6.

s1: default settings

~~~
$ oc get pod
NAME                            READY   STATUS      RESTARTS   AGE
virt-launcher-vm-fedora-9zvg5   1/1     Running     0          26s
virt-launcher-vm-fedora-hrcmz   0/1     Completed   0          3m9s

$ oc logs virt-launcher-vm-fedora-9zvg5 | grep "failed to sync guest time" | wc -l
0
~~~

s2: low bandwidth, migration takes more than 20 minutes

~~~
$ oc logs virt-launcher-vm-fedora-hh5h5 | grep "failed to sync guest time" | wc -l
2
~~~

Moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.13.3 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:4664
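The retry behaviour described in the thread above (one warning per second until a 60-second ticker expires, giving roughly 60 log lines per slow migration) can be illustrated with the minimal Go sketch below. This is not the actual KubeVirt code from manager.go or vm.go; the names syncGuestTime and setGuestTimeWithRetry and the error text are hypothetical stand-ins, and only the 1-second interval and 60-second timeout are taken from the report.

~~~go
// Illustrative sketch only: a ticker-based retry loop that warns on every
// failed attempt, which is the kind of pattern that produces the
// "failed to sync guest time" log spam described in this bug.
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// errDomainNotRunning stands in for the libvirt error seen in the logs
// ("Requested operation is not valid: domain is not running").
var errDomainNotRunning = errors.New("domain is not running")

// syncGuestTime is a hypothetical stand-in for the guest-time sync call that
// keeps failing while the migrated domain is not yet running on the target.
func syncGuestTime() error {
	return errDomainNotRunning
}

// setGuestTimeWithRetry retries once per second for up to one minute and
// logs a warning on every failed attempt -- up to ~60 warnings per migration.
func setGuestTimeWithRetry() error {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	timeout := time.After(60 * time.Second)

	for {
		select {
		case <-timeout:
			return fmt.Errorf("failed to sync guest time: timed out after 60s")
		case <-ticker.C:
			if err := syncGuestTime(); err != nil {
				// One warning per tick: this is the source of the spam.
				log.Printf("warning: failed to sync guest time: %v", err)
				continue
			}
			return nil
		}
	}
}

func main() {
	// Runs for about a minute when the sync never succeeds.
	if err := setGuestTimeWithRetry(); err != nil {
		log.Printf("error: %v", err)
	}
}
~~~

Any fix that reduces the spam has to break this pattern in some way, for example by logging only once when the retries are exhausted or by not attempting the sync while the target domain is not yet running; the change actually shipped in 4.13.3 is the one referenced by the errata above.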