Bug 1552092
| Summary: | request to qemu-guest-agent prevent other calls to qemu monitor | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Roman Hodain <rhodain> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED ERRATA | QA Contact: | Yanqiu Zhang <yanqzhan> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.4 | CC: | fjin, gveitmic, michal.skrivanek, mkalinin, mprivozn, pkrempa, pstehlik, rhodain, xuzhang, yafu, yanqzhan |
| Target Milestone: | rc | Keywords: | Upstream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-4.5.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-30 09:53:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1613514 | | |
Description
Roman Hodain
2018-03-06 13:45:14 UTC
Not sure we can do much about this. The 'modify' type job while the guest agent is executing is correct, since the API is modifying the state of the VM. On the other hand, it should not be a problem to query the actual monitor in this case.

It feels a bit strange to me that we need to prevent changes to the entire virDomainObjPtr while a guest agent command is taking place. No matter whether the fsfreeze command passes or fails, it is not modifying any aspect of the QEMU guest state that we care about in virDomainObjPtr; all we do is check whether the guest is running or not. At least for the virDomainFSFreeze/Thaw APIs I don't see a strong reason why we need to hold the job lock for the duration of the agent command in these cases. The virDomainSnapshotCreate API is harder, though, as it uses the job lock to protect a whole lot of work related to the snapshot, not merely the agent command execution. For virConnectGetAllDomainStats() perhaps we need to think about some concept like a flag added to the API, "VIR_DOMAIN_GET_STATS_BEST_EFFORT", which would allow the app to tell libvirt to silently not report stats that aren't quickly available because the job lock is held.

(In reply to Daniel Berrange from comment #3)
> For virConnectGetAllDomainStats() perhaps we need to think about some
> concept like a flag added to the API "VIR_DOMAIN_GET_STATS_BEST_EFFORT"
> which would allow the app to tell libvirt to just silently not report stats
> that aren't quickly available due to job lock being held ?

Well, it's not as easy, since we don't annotate most of the places as allowing concurrent jobs (unless it's an async job). In this particular case it would be possible, since we are talking on a different monitor, but generally it would need some auditing.
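To make the contention concrete, here is a minimal libvirt-python sketch. It is illustrative only: the guest name "foo" is an assumption, and it simply demonstrates the pattern discussed above (a guest-agent call in flight while a stats query needs the qemu monitor), not the libvirt internals themselves.

```python
# Sketch only: assumes libvirt-python, a running guest named "foo"
# (hypothetical), and qemu-guest-agent installed inside it.
import threading
import time

import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("foo")

def freeze_then_thaw():
    # Guest-agent command; on a loaded guest this can take a while, and on
    # the affected libvirt versions it holds the job lock the whole time.
    dom.fsFreeze()
    dom.fsThaw()

worker = threading.Thread(target=freeze_then_thaw)
worker.start()
time.sleep(0.5)            # give the freeze a chance to start first

t0 = time.time()
# Balloon and block stats need the qemu monitor; where the agent and the
# monitor share a single job lock this call waits for the freeze to finish.
conn.getAllDomainStats(libvirt.VIR_DOMAIN_STATS_BALLOON |
                       libvirt.VIR_DOMAIN_STATS_BLOCK)
print("stats returned after %.1f s" % (time.time() - t0))
worker.join()
```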
Do snapshot and "virsh domstats": # virsh snapshot-create-as foo --memspec file=/tmp/sn12.mem --diskspec vda --name sn12 & date;virsh domstats foo;date [1] 11353 Wed Mar 28 06:48:52 EDT 2018 Domain snapshot sn12 created Domain: 'foo' state.state=1 state.reason=5 cpu.time=182688791670 cpu.user=6480000000 cpu.system=21650000000 balloon.current=824000 balloon.maximum=1024000 balloon.swap_in=0 balloon.swap_out=0 balloon.major_fault=188 balloon.minor_fault=161033 balloon.unused=871608 balloon.available=990712 balloon.usable=819384 balloon.last-update=1522230762 balloon.rss=1055388 vcpu.current=2 vcpu.maximum=4 vcpu.0.state=1 vcpu.0.time=97670000000 vcpu.0.wait=0 vcpu.1.state=1 vcpu.1.time=55470000000 vcpu.1.wait=0 net.count=1 net.0.name=vnet0 net.0.rx.bytes=2927519 net.0.rx.pkts=4004 net.0.rx.errs=0 net.0.rx.drop=0 net.0.tx.bytes=73963 net.0.tx.pkts=1099 net.0.tx.errs=0 net.0.tx.drop=0 block.count=1 block.0.name=vda block.0.path=/nfs/RHEL-7.5-x86_64-latest.sn12 block.0.rd.reqs=121401 block.0.rd.bytes=3790876160 block.0.rd.times=2379883463253 block.0.wr.reqs=25336 block.0.wr.bytes=5261135872 block.0.wr.times=48958875859986 block.0.fl.reqs=2818 block.0.fl.times=22346022905 block.0.allocation=0 block.0.capacity=10737418240 block.0.physical=200704 [1]+ Done virsh snapshot-create-as foo --memspec file=/tmp/sn12.mem --diskspec vda --name sn12 Patches proposed upstream: https://www.redhat.com/archives/libvir-list/2018-June/msg00546.html To POST: 9486ed9071 (HEAD -> master, origin/master, origin/HEAD, domStats_v2) virsh: Introduce --nowait to domstats 0afbeb3740 Introduce VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT f38ea75b99 qemu_domain: Introduce qemuDomainObjBeginJobNowait af0715bb61 qemuDomainObjBeginJobInternal: Remove spurious @ret assignment dd92df93f4 qemu_domain: Document qemuDomainObjBeginJob v4.4.0-275-g9486ed9071 Verify with: libvirt-4.5.0-3.el7.x86_64 qemu-kvm-rhev-2.12.0-7.el7.x86_64 Steps to verify: 1.# virsh domstats --help|grep nowait domstats [--state]... [--nowait]... --nowait report only stats that are accessible instantly 2. # man virsh domstats [--raw] [--enforce] [--backing] [--nowait] ... ... When collecting stats libvirtd may wait for some time if there's already another job running on given domain for it to finish. This may cause unnecessary delay in delivering stats. Using --nowait suppresses this behaviour. On the other hand some statistics might be missing for such domain. 3.If --nowait not used, as comment 7, domstats will return info after a time interval, here it is about 17s, and domstats prints fully. 
# virsh snapshot-create-as V-7.6 --memspec file=/tmp/sp4.mem --diskspec vda --name sp4 & date;virsh domstats V-7.6;date
[1] 19063
Tue Jul 17 02:51:00 EDT 2018
Domain snapshot sp4 created
Domain: 'V-7.6'
state.state=1 state.reason=5
cpu.time=233607095188 cpu.user=7820000000 cpu.system=40160000000
balloon.current=1048576 balloon.maximum=1048576 balloon.swap_in=0 balloon.swap_out=0 balloon.major_fault=192 balloon.minor_fault=153997 balloon.unused=897976 balloon.available=1014976 balloon.usable=848624 balloon.last-update=1531809592 balloon.rss=1195312
vcpu.current=1 vcpu.maximum=1 vcpu.0.state=1 vcpu.0.time=187580000000 vcpu.0.wait=0
net.count=1 net.0.name=vnet0 net.0.rx.bytes=15076943 net.0.rx.pkts=4489 net.0.rx.errs=0 net.0.rx.drop=0 net.0.tx.bytes=207189 net.0.tx.pkts=3007 net.0.tx.errs=0 net.0.tx.drop=0
block.count=1 block.0.name=vda block.0.path=/var/lib/libvirt/images/V-7.6.sp4 block.0.rd.reqs=171597 block.0.rd.bytes=5586383872 block.0.rd.times=29725084055 block.0.wr.reqs=24804 block.0.wr.bytes=7596663808 block.0.wr.times=1540675454432 block.0.fl.reqs=201 block.0.fl.times=13847416259 block.0.allocation=327680 block.0.capacity=10737418240 block.0.physical=266240
[1]+ Done virsh snapshot-create-as V-7.6 --memspec file=/tmp/sp4.mem --diskspec vda --name sp4
Tue Jul 17 02:51:17 EDT 2018
4. With --nowait, the interval is about 0 seconds and only the stats that are accessible instantly are reported (see the API-level sketch after these steps):

# virsh snapshot-create-as V-7.6 --memspec file=/tmp/sp7.mem --diskspec vda --name sp7 & date;virsh domstats V-7.6 --nowait;date
[1] 22690
Tue Jul 17 04:36:59 EDT 2018
Domain: 'V-7.6'
state.state=1 state.reason=5
cpu.time=1654444213784 cpu.user=19350000000 cpu.system=98160000000
balloon.current=1048576 balloon.maximum=1048576
vcpu.current=1 vcpu.maximum=1 vcpu.0.state=1 vcpu.0.time=1543480000000 vcpu.0.wait=0
net.count=1 net.0.name=vnet0 net.0.rx.bytes=15220637 net.0.rx.pkts=7190 net.0.rx.errs=0 net.0.rx.drop=0 net.0.tx.bytes=214497 net.0.tx.pkts=3092 net.0.tx.errs=0 net.0.tx.drop=0
block.count=1 block.0.name=vda block.0.path=/var/lib/libvirt/images/V-7.6.sp6
Tue Jul 17 04:36:59 EDT 2018
[root@hp-*** qemu]# Domain snapshot sp7 created
[1]+ Done virsh snapshot-create-as V-7.6 --memspec file=/tmp/sp7.mem --diskspec vda --name sp7
5. Diff the qemuDomainObj messages in the two libvirtd.log files from step 3 and step 4; the scenario without --nowait has the following extra logs:
...
debug : qemuDomainObjBeginJobInternal:6479 : Waiting for async job (vm=0x7fc8c0000b10 name=V-7.6)
...
debug : qemuDomainObjBeginJobInternal:6507 : Started job: query (async=none vm=0x7fc8c0000b10 name=V-7.6)
debug : qemuDomainObjEnterMonitorInternal:6899 : Entering monitor (mon=0x7fc8e0016f40 vm=0x7fc8c0000b10 name=V-7.6)
debug : qemuDomainObjExitMonitorInternal:6922 : Exited monitor (mon=0x7fc8e0016f40 vm=0x7fc8c0000b10 name=V-7.6)
debug : qemuDomainObjEnterMonitorInternal:6899 : Entering monitor (mon=0x7fc8e0016f40 vm=0x7fc8c0000b10 name=V-7.6)
debug : qemuDomainObjExitMonitorInternal:6922 : Exited monitor (mon=0x7fc8e0016f40 vm=0x7fc8c0000b10 name=V-7.6)
debug : qemuDomainObjEndJob:6783 : Stopping job: query (async=none vm=0x7fc8c0000b10 name=V-7.6)
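For applications that use the API directly rather than virsh, the "--nowait" behaviour verified above maps to the VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT flag of the bulk stats API. A minimal libvirt-python sketch (the guest name "V-7.6" is reused from the verification; everything else is assumed, not taken from vdsm):

```python
# Sketch only: requires libvirt >= 4.5.0, where the NOWAIT constant exists.
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("V-7.6")

stats = (libvirt.VIR_DOMAIN_STATS_STATE |
         libvirt.VIR_DOMAIN_STATS_BALLOON |
         libvirt.VIR_DOMAIN_STATS_BLOCK)

# Equivalent of "virsh domstats V-7.6 --nowait": return immediately and
# leave out stat groups whose collection would have to wait for the job
# lock (e.g. block.* while a snapshot is still running).
for domain, record in conn.domainListGetStats(
        [dom], stats, libvirt.VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT):
    print(domain.name(), record)
```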
Hi Roman,

The issue fixed by the patches is verified in comment 12. Do you think it's acceptable, and is an RHV bug needed to apply this "--nowait" option as the default in RHV?

Thx.

(In reply to yanqzhan from comment #15)
> Hi Roman,
>
> The issue fixed by patches is verified in comment 12. Do you think it's
> acceptable and if a rhv bug is needed to apply this "--no-wait" option as
> default in rhv?
>
> Thx.

From libvirt's perspective it makes sense to use it, especially if vdsm is querying for stats every 2 seconds AND is brittle when it comes to hiccups. The reason why it isn't the default in libvirt is that it changes the API semantics. If your application was okay with slightly delayed (but complete) stats, then after an update you would all of a sudden get incomplete (but immediate) stats. That is the reason why we cannot switch the default behaviour.

Thanks Michal.

Hi Pavel,

Please refer to comment 16: if the RHV product needs complete stats and can accept a slight delay, keeping the default call ("virsh domstats") is OK; if RHV only needs an immediate return to make sure the guest is running well, and not the complete stats, an RHV bug should be opened to make 'virsh domstats --nowait' the default call in RHV. Which one do you think RHV needs? Thx.

So, I filed the RFE for RHV to consume this change: bz#1613514
And this is basically based on another urgent RHV bug: https://bugzilla.redhat.com/show_bug.cgi?id=1572801#c28
But if you are asking, I am not sure, as of today, that we really need this fix in RHV once we have the fsfreeze problem resolved.

Michal / Pavel / Roman / everyone else, for your consideration: https://bugzilla.redhat.com/show_bug.cgi?id=1613514

(In reply to Michal Privoznik from comment #16)
> From libvirt's perspective it makes sense to use it. Esp. if vdsm is
> querying for stats every 2 seconds AND is brittle when it comes to hiccups.
> The reason why it isn't the default in libvirt is that it changes API
> semantic. If your application was okay with slightly delayed (but complete)
> stats, with an update all of a sudden you'd get an incomplete (but
> immediate) stats. That is the reason why we can not switch the default
> behaviour.

We cannot change core behavior based on such a vague description. How incomplete could those stats be? Which of them are optional? Will they be missing or contain stale values? What is a "slight delay"?

(In reply to Michal Skrivanek from comment #19)
> (In reply to Michal Privoznik from comment #16)
> > From libvirt's perspective it makes sense to use it. Esp. if vdsm is
> > querying for stats every 2 seconds AND is brittle when it comes to hiccups.
> > The reason why it isn't the default in libvirt is that it changes API
> > semantic. If your application was okay with slightly delayed (but complete)
> > stats, with an update all of a sudden you'd get an incomplete (but
> > immediate) stats. That is the reason why we can not switch the default
> > behaviour.
>
> We cannot change core behavior based on such a vague description. How
> incomplete those stats could be? Which of them are optional? Will they be
> missing or contain stale values? What is a "slight delay"?

So whenever the stats API is issued, libvirt gathers info from various places, e.g. cgroups, /proc, OVS, etc. But some stats require talking to the qemu monitor (e.g. balloon info), and for those we have to mutually exclude other threads that are trying to talk on the monitor too. The flag I added merely does this: if we are fetching stats and, as part of that, we want to talk to qemu but there's already somebody else on the monitor (possibly stalled), skip to the next family of stats. Therefore, you might be missing stats that require talking to qemu.

Slight delay means (in light of what I described above) the time it takes for the stats-querying thread to get to talk on the monitor to qemu.

BTW: as part of the fix for this bug I've broken down the qemu monitor lock into two smaller locks to enhance parallelism. For instance, talking on the guest agent monitor can now run in parallel with talking on the qemu monitor.
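The "skip to the next family" behaviour described above can be pictured with a schematic, non-libvirt sketch; the names here are made up for illustration and the real implementation lives in the qemuDomainObjBeginJobNowait commit listed earlier, not in this code.

```python
import threading

# Stand-in for the per-domain job lock; purely illustrative.
job_lock = threading.Lock()

def collect_balloon_family(record, nowait):
    """Gather one stats family that needs the qemu monitor."""
    if nowait:
        if not job_lock.acquire(blocking=False):
            return              # monitor busy: skip this family, keep the rest
    else:
        job_lock.acquire()      # default behaviour: wait for the current job
    try:
        record["balloon.rss"] = 1055388   # placeholder for real monitor queries
    finally:
        job_lock.release()
```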
(In reply to Michal Privoznik from comment #20)
> So whenever stats api is issued libvirt gathers info from various places,
> e.g. cgroups, /proc, ovs, etc. But some require talking to qemu monitor
> (e.g. balloon info). And for those we have to mutually exclude other threads
> which are trying to talk on monitor too. The flag I added merely does this:
> if we are fetching stats and as part of that we want to talk to qemu but
> there's already somebody else (possibly stalled) skip to next family of stats.

So it is omitted then, correct? If that concurrent monitor access gets stuck (as it does in the original bug) it's going to skip over indefinitely, I assume?

> Therefore, you might be missing stats that require talking to qemu.

These are the ones we use in oVirt:
libvirt.VIR_DOMAIN_STATS_STATE |
libvirt.VIR_DOMAIN_STATS_CPU_TOTAL |
libvirt.VIR_DOMAIN_STATS_BALLOON |
libvirt.VIR_DOMAIN_STATS_VCPU |
libvirt.VIR_DOMAIN_STATS_INTERFACE |
libvirt.VIR_DOMAIN_STATS_BLOCK

Can you qualify them? Or refer to the right place where we can see what could potentially be missing and what we can depend on?

> Slight delay means (in light of what I described above) the time it takes
> for the stats-querying thread to get to talk on the monitor to qemu.

Right, but that's the same delay as today (when there's no concurrent monitor call), right? I assume we do not skip the monitor call when there's no conflict, so it's going to take the same time with or without the new flag.

> BTW: as part of the fix for this bug I've broken down the qemu monitor lock
> into two smaller locks to enhance parallelism. For instance, talking on
> guest agent monitor can now run in parallel with talking on qemu monitor.

Great, that could be quite a significant improvement. Does it apply to all the calls (with or without the new flag)?

(In reply to Michal Skrivanek from comment #21)
> So it is omitted then, correct? If that concurrent monitor access gets stuck
> (as it does in the original bug) it's going to skip over indefinitely, I
> assume?

Well, this was the reasoning for splitting the mutex. In the original report the problem was that one mutex was shared between agent and monitor accesses. Therefore, when guest-fs-freeze was executed (which can take a long time to finish), all monitor accesses had to wait. Now that the mutex is split, this is no longer the case. In addition to that, a new flag was introduced for the stats APIs so that even when a thread is stuck in the monitor (this hasn't been observed yet, though), the management app can work around it by issuing the stats API with the flag set.

> These are the ones we use in oVirt:
> libvirt.VIR_DOMAIN_STATS_STATE |

Always in.

> libvirt.VIR_DOMAIN_STATS_CPU_TOTAL |

Always in.

> libvirt.VIR_DOMAIN_STATS_BALLOON |

Might miss some fields that require monitor access (swap_in, swap_out, major_fault, minor_fault, unused, available, rss, last-update, usable, disk_caches).

> libvirt.VIR_DOMAIN_STATS_VCPU |

Always in.

> libvirt.VIR_DOMAIN_STATS_INTERFACE |

Always in.

> libvirt.VIR_DOMAIN_STATS_BLOCK

Might be left out completely.

> Can you qualify them? Or refer to the right place where we can see what
> could potentially be missing and what we can depend on?

You want to look at qemuDomainGetStatsWorkers[] (in qemu_driver.c) and inspect one by one whether the callbacks listed there require HAVE_JOB() in order to fetch stats. For instance, qemuDomainGetStatsState() doesn't, but qemuDomainGetStatsBlock() does.

> Right, but that's the same delay as today (when there's no concurrent
> monitor call), right? I assume we do not skip the monitor call when there's
> no conflict, so it's going to take the same time with or without the new
> flag.

Correct.

> Great, that could be quite a significant improvement. Does it apply to all
> the calls (with or without the new flag)?

Yes, it does.
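Given that qualification, a consumer polling with NOWAIT has to treat missing keys as "not sampled this round" rather than as zeros. A hedged sketch of that pattern (a hypothetical helper, not vdsm code; the getattr fallback makes it degrade to the default blocking behaviour on libvirt older than 4.5.0):

```python
import libvirt

WANTED = (libvirt.VIR_DOMAIN_STATS_STATE |
          libvirt.VIR_DOMAIN_STATS_CPU_TOTAL |
          libvirt.VIR_DOMAIN_STATS_BALLOON |
          libvirt.VIR_DOMAIN_STATS_VCPU |
          libvirt.VIR_DOMAIN_STATS_INTERFACE |
          libvirt.VIR_DOMAIN_STATS_BLOCK)

# Fall back to 0 (plain blocking call) where the constant does not exist.
NOWAIT = getattr(libvirt, "VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT", 0)

def sample(conn):
    for dom, record in conn.getAllDomainStats(WANTED, NOWAIT):
        # balloon.rss needs the monitor and may be absent; block.* may be
        # left out as a whole group, so default the count to 0.
        yield (dom.name(),
               record.get("balloon.rss"),
               record.get("block.count", 0))
```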
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113