Bug 1040507 - Inconsistent blockInfo values when the VM is finalizing live migration
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.5
Priority: high  Severity: high
Target Milestone: rc
Assigned To: John Ferlan
QA Contact: Virtualization Bugs
Keywords: ZStream
Depends On:
Blocks: 1002699 1038069 1059904 1065531
Reported: 2013-12-11 09:31 EST by Federico Simoncelli
Modified: 2016-04-26 11:09 EDT

Fixed In Version: libvirt-0.10.2-30.el6
Doc Type: Bug Fix
Clone Of: 1038069
Clones: 1065531
Last Closed: 2014-10-14 00:18:58 EDT
Type: Bug
Comment 1 Federico Simoncelli 2013-12-11 12:47:10 EST
Description of problem:
It appears that blockInfo is returning inconsistent information when a VM is about to be destroyed at the end of live migration. The issue was found on a RHEV deployment running continuous migrations between two hosts.

Almost every time, as the migration was about to finish, blockInfo reported allocation == physical:

Thread-547529::INFO::2013-12-04 00:01:43,671::vm::521::vm.Vm::(_highWrite) vmId=`05563fbc-6b2c-4f4e-9096-8daeb9754aa2`::d5a781d6-a319-4d9f-9d1a-1f776f709911/8cc85f7a-a7df-4566-9817-0f61b227660a apparent: 42949672960 capacity: 16106127360, alloc: 42949672960, phys: 42949672960
Thread-547614::INFO::2013-12-04 00:05:19,606::vm::521::vm.Vm::(_highWrite) vmId=`05563fbc-6b2c-4f4e-9096-8daeb9754aa2`::d5a781d6-a319-4d9f-9d1a-1f776f709911/8cc85f7a-a7df-4566-9817-0f61b227660a apparent: 44023414784 capacity: 16106127360, alloc: 44023414784, phys: 44023414784
Thread-547802::INFO::2013-12-04 00:08:52,475::vm::521::vm.Vm::(_highWrite) vmId=`05563fbc-6b2c-4f4e-9096-8daeb9754aa2`::d5a781d6-a319-4d9f-9d1a-1f776f709911/8cc85f7a-a7df-4566-9817-0f61b227660a apparent: 45097156608 capacity: 16106127360, alloc: 45097156608, phys: 45097156608
Thread-548505::INFO::2013-12-04 00:27:03,603::vm::521::vm.Vm::(_highWrite) vmId=`05563fbc-6b2c-4f4e-9096-8daeb9754aa2`::d5a781d6-a319-4d9f-9d1a-1f776f709911/8cc85f7a-a7df-4566-9817-0f61b227660a apparent: 46170898432 capacity: 16106127360, alloc: 46170898432, phys: 46170898432
Thread-548705::INFO::2013-12-04 00:29:04,005::vm::521::vm.Vm::(_highWrite) vmId=`05563fbc-6b2c-4f4e-9096-8daeb9754aa2`::d5a781d6-a319-4d9f-9d1a-1f776f709911/8cc85f7a-a7df-4566-9817-0f61b227660a apparent: 47244640256 capacity: 16106127360, alloc: 47244640256, phys: 47244640256
Thread-549010::INFO::2013-12-04 00:37:16,096::vm::521::vm.Vm::(_highWrite) vmId=`05563fbc-6b2c-4f4e-9096-8daeb9754aa2`::d5a781d6-a319-4d9f-9d1a-1f776f709911/8cc85f7a-a7df-4566-9817-0f61b227660a apparent: 48318382080 capacity: 16106127360, alloc: 48318382080, phys: 48318382080
Thread-549184::INFO::2013-12-04 00:41:13,268::vm::521::vm.Vm::(_highWrite) vmId=`05563fbc-6b2c-4f4e-9096-8daeb9754aa2`::d5a781d6-a319-4d9f-9d1a-1f776f709911/8cc85f7a-a7df-4566-9817-0f61b227660a apparent: 49392123904 capacity: 16106127360, alloc: 49392123904, phys: 49392123904
Thread-549272::INFO::2013-12-04 00:43:22,801::vm::521::vm.Vm::(_highWrite) vmId=`05563fbc-6b2c-4f4e-9096-8daeb9754aa2`::d5a781d6-a319-4d9f-9d1a-1f776f709911/8cc85f7a-a7df-4566-9817-0f61b227660a apparent: 50465865728 capacity: 16106127360, alloc: 50465865728, phys: 50465865728
Thread-549360::INFO::2013-12-04 00:47:26,494::vm::521::vm.Vm::(_highWrite) vmId=`05563fbc-6b2c-4f4e-9096-8daeb9754aa2`::d5a781d6-a319-4d9f-9d1a-1f776f709911/8cc85f7a-a7df-4566-9817-0f61b227660a apparent: 51539607552 capacity: 16106127360, alloc: 51539607552, phys: 51539607552

I haven't investigated yet whether this is a libvirt or QEMU problem. Feel free to reassign the bug if necessary.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-29.el6.x86_64
qemu-img-rhev-0.12.1.2-2.414.el6.x86_64

How reproducible:
90%

Steps to Reproduce (untested):
1. migrate a virtual machine from one host to another
2. get blockInfo on a (qcow) disk when the VM is about to be destroyed (end of live migration)

Actual results:
blockInfo reports allocation == physical (virDomainBlockInfo).

Expected results:
blockInfo should either report the correct information or fail.
Comment 2 Eric Blake 2013-12-11 18:37:48 EST
Hmm.  I think the blockInfo code in the qemu driver always attempts a qemu monitor query, then gracefully falls back to hardcoding capacity = allocation if qemu didn't answer.  This works around older qemu that can't answer the query, but differs from most commands, which fail if the domain has quit in the meantime.  So maybe we need to update our qemu_capabilities matrix to record whether qemu is new enough to support the capacity query, then fix the code to only attempt the query when qemu is capable.  That way we can treat qemu failure as command failure rather than as the fallback path, without losing the fallback for qemu that is not capable.
Comment 3 Eric Blake 2013-12-11 18:40:50 EST
For that matter, I seem to recall that we didn't like querying qemu on every blockInfo request anyways, because it could block the query command; and so in newer qemu we hooked up an event that fired any time the block info changed.  In that situation, the query never goes out to qemu, but instead uses the data cached from the last time the event fired.  I wonder if this is a case of not having the newer event code backported to the qemu/libvirt version combo being tested; but it may also be a case of libvirt losing the cached data when the qemu monitor goes away.  This is all off the top of my head, without actually looking at the code, but just from my guess, it sounds like this is libvirt's problem to solve.
Comment 4 John Ferlan 2013-12-12 16:24:39 EST
Since you have an environment ready/working and I don't - is it possible to enable some libvirtd debugging? As described here:

http://libvirt.org/logging.html

I think setting:

log_level = 1
log_filters="3:remote 4:event 3:json 3:rpc"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

Should do the trick.  I don't need too much data - just a single pass through, and the libvirtd.log output when you're getting back the "bad values".

What I'm looking to determine is the order of things. It's not perfectly clear from the description, but from the libvirt side of things it seems there's a "migration" followed by a "get block info" on the domain.  When that get is done, allocation == physical, which is unexpected from your point of view.  Because of that, "some code" is triggered to extend the LV for the storage of the source domain.

From a scan of the libvirt code, what I see is the following

qemuDomainGetBlockInfo() will set allocation == physical, then...

    /* ..but if guest is running & not using raw
       disk format and on a block device, then query
       highest allocated extent from QEMU */

If we get past that test, then we begin a job to query the monitor for data:

qemuMonitorGetBlockExtent(... &allocation)

That will (hopefully) call the JSON parsing code:

qemuMonitorJSONGetBlockExtent()

which does a "query-blockstats" query in order to attempt to get the "wr_highest_offset" from the response and return it.

If the libvirtd.log doesn't show a path into this code when you get a return of allocation==physical, then perhaps that has to do with (as I think someone alluded to either in this case or the one blocked by this) an ordering and synchronization issue of "when" the data can/should be fetched. 

Based on what I read here, and on a discussion I had with Dave Allan, there's a 'thread' that continually gets the blockInfo for running guests. That thread isn't synchronized with the actual migration start/end - it's just getting data for running guests, with the goal of making sure to do an LVM extend when/if allocation gets too close to physical.  Suffice to say libvirt isn't doing the extend, right?
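That extend decision can be sketched as a simple watermark check (a hypothetical illustration only; vdsm's actual _highWrite logic and its threshold are not shown in this bug):

```python
def needs_extend(allocation, physical, watermark=0.9):
    """Return True when the qcow allocation has grown close enough to
    the LV's physical size that the management layer should extend it.

    The 0.9 watermark is an arbitrary value for illustration, not
    vdsm's real setting.
    """
    if physical == 0:
        return False
    return allocation / physical >= watermark
```

With the inconsistent values from the log above (allocation == physical), a check like this fires on every poll, which would trigger a spurious LV extend each time.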

Without getting too technical - when a guest is migrating from source to target, it goes through a sequence of steps to copy memory, storage, etc. from source to target. Eventually it reaches a point in time when things need to "freeze", because control needs to change from source to target. During this time, "last minute" dirty blocks and/or storage changes can be copied. Once the "final bits" are copied over, the target "accepts" responsibility for the running guest, and thus the source can/will destroy the guest. Technically it's told to, but that's a minor detail.  The guest on the source still needs to be "on", or at least available to be restarted, because if something goes wrong on the target, the source will reassert control.  However, any query to the source in this period has "limitations". I also think that expecting to somehow capture (cache) the "last known state" of every data point prior to that cutoff, so that some (unknown) query afterwards gets the last set of data from that point in time, could be (to say the least) a never-ending task...

Anyway, it seems you are indicating that within this window of time, when the blockInfo thread runs, it determines the guest is still running (since it's not destroyed); however, the data it returns has allocation == physical, which triggers some other piece of code to extend the LV.

Does this seem to be a "fair assessment"?

Note that the data you are requesting can also be fetched via a 'virsh domblkinfo' to the guest and storage.  Whether a guest is running or not, both allocation and physical are returned:

# virsh domstate rh65 --reason
shut off (destroyed)
# virsh domblkinfo <domain> <storage>
Capacity:       10737418240
Allocation:     913641472
Physical:       913641472

So it's not as though Allocation could simply "not be" returned, or as though "some error" would indicate the timing window between migration completion and destroy.  To a degree, libvirt is returning perfectly valid data.

What seems to be desired is some indication that a migration is taking place, so that the caller can special-case what to do based on that.  That would require a change not only in libvirt, but also in the caller (unless there's some other event option I'm overlooking/missing). Even if a flag were added, both sides would still need to change.

Alternatively, the blockInfo thread has to get some knowledge that a migration is either "in process" or at the point in time where things are frozen before the eventual domain destroy. I'm not sure there's a libvirt mechanism in place for that (I don't know all the bells and whistles available in/for a libvirt/qemu migration).

So perhaps it would be a useful exercise for the blockInfo thread to call virDomainGetState(), making sure to also get the returned reason. Then, if the state from virDomainState is VIR_DOMAIN_PAUSED, check the virDomainPausedReason value to see whether it is VIR_DOMAIN_PAUSED_MIGRATION.  If so, then although the domain is "not stopped", it's also not really in any condition to return the "exact" blockInfo data.
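In the libvirt Python bindings' terms, the suggested check might look like the sketch below. The constants mirror libvirt's virDomainState and virDomainPausedReason enum values; the function itself is hypothetical.

```python
# Enum values mirroring libvirt's virDomainState / virDomainPausedReason.
VIR_DOMAIN_RUNNING = 1
VIR_DOMAIN_PAUSED = 3
VIR_DOMAIN_PAUSED_MIGRATION = 2

def blockinfo_is_trustworthy(state, reason):
    """Per the suggestion above: a domain paused for migration is not
    really in any condition to return exact blockInfo data.

    In a real polling thread the (state, reason) pair would come from
    virDomainGetState() -- dom.state() in the Python bindings.
    """
    return not (state == VIR_DOMAIN_PAUSED and
                reason == VIR_DOMAIN_PAUSED_MIGRATION)
```

As noted elsewhere in this bug, a timing window between this check and the actual state change still remains, so the check narrows the race rather than eliminating it.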
Comment 5 John Ferlan 2013-12-16 13:48:40 EST
cut-n-paste from some email conversation that has taken place and shouldn't be lost:


On 12/16/2013 08:48 AM, Federico Simoncelli wrote:
> ----- Original Message -----
>> From: "John Ferlan" <jferlan@redhat.com>
>> To: "Federico Simoncelli" <fsimonce@redhat.com>, "Ayal Baron" <abaron@redhat.com>
>> Cc: "Dave Allan" <dallan@redhat.com>, "Eric Blake" <eblake@redhat.com>
>> Sent: Sunday, December 15, 2013 11:43:50 PM
>> Subject: Re: [Bug 1040507] New: Inconsistent blockInfo values when the VM is finalizing live migration
>>
>>
>>
>> On 12/15/2013 03:37 PM, Federico Simoncelli wrote:
>>> ----- Original Message -----
>>>> From: "Ayal Baron" <abaron@redhat.com>
>>>> To: "John Ferlan" <jferlan@redhat.com>
>>>> Cc: "Dave Allan" <dallan@redhat.com>, "Eric Blake" <eblake@redhat.com>,
>>>> "Federico Simoncelli" <fsimonce@redhat.com>
>>>> Sent: Sunday, December 15, 2013 5:25:39 PM
>>>> Subject: Re: [Bug 1040507] New: Inconsistent blockInfo values when the VM
>>>> is finalizing live migration
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> Did you check the bug report for my analysis/response?
>>>>
>>>> Yes.
>>>> Please get access from Fede directly to a machine exhibiting this.
>>>> It appears it is pretty easy to reproduce.
>>>>
>>>
>>> Yes, feel free to ping me whenever you want.
>>> Anyway as far as I understood the cause is pretty clear and now it's
>>> just a matter of when/how it will be fixed.
>>>
>>
>> So is there any chance of attempting to try a change to the blockInfo
>> fetching thread on your end like I suggested in the case?  I need some
>> feedback on what I put in the BZ to have a chance at determining a 'root
>> cause'.
>
> Ah sorry I assumed that last time me and Eric actually pinpointed the
> root cause. If there's some analysis to be completed then please get in
> contact with David Jasa (added to this thread) as I have never accessed
> the machines.
>

News to me.  Eric's notes in the case are correct to the degree of what
happens under "normal circumstances"... Since it seems at one time the
thread was getting "real data", I don't believe this is a capabilities
issue (although I could be wrong as Eric certainly knows the code better
than I do).

My belief is that the less-than-100% reproducibility is because the
timing of the blockInfo thread's fetch and of the destroy of the guest
on the source after the migration aren't "synchronized".  When the
blockInfo thread goes to get the data, it perhaps finds the guest "not
down" and thus will inquire. Because the guest is frozen (in transit of
a migration), the qemu/json calls to get the "real" blockInfo data won't
occur, leaving allocation == physical, because 'virDomainObjIsActive()'
finds the domain id == -1 on the source host.

So the question becomes - how does the blockInfo thread ascertain when
to attempt to fetch the blockInfo from the guest?  Even if you're
checking as I noted at the end of my BZ response for more detailed
domain state - there's still the possibility of there being a timing
window between the check and the actual change in libvirt to set the
guest in a "frozen" state and/or down...

In the long run, libvirt isn't extending the LV. Yes, the choice to do
that is made because of a return value from libvirt; however, if my
assumption regarding timing is correct, it's not as though the code
could simply change to return an error - that could trigger other
issues... The return is just as if the guest weren't active, which,
while not expected by the blockInfo thread, is to a degree correct.



> VDSM already runs with a quite verbose configuration for libvirt so you
> might be able to find all you need in /var/log/libvirt/libvirtd.log*.
>

If it's not already there, considering I don't know the VDSM code - I
would think it would be easier for someone that knows/understands the
code to at least check and see how the blockInfo thread goes about its
business.

> Eventually you can change the config and I am sure David will assist you
> with the tests he was running.
>

I would think changing libvirtd.conf is fairly simple, and rerunning the
test with that change to get the requested output would only take
someone who understands the environment a few extra cycles.

John




-=-=-=-=-=-=-=-


On 12/16/2013 11:01 AM, Ayal Baron wrote:

<...snip...>

>
> John, cmiiw, but it is my understanding that this behaviour (returning allocation == physical when domain id == -1) is the wrong behaviour (regardless of migration), since this is inconsistent with the behaviour of the running VM.
> It is not ok to fallback to this behaviour when the vm is not running and giving the new behaviour when it is, otherwise you can have a situation where:
> 1. get allocation when VM down <- returns physical
> 2. run VM and get again <- returns high write
> 3. stop VM and get again <- returns physical
>
> If you know that qemu supports high write then you should either report high write or report failure.  reporting inconsistent values doesn't seem ok to me.
> We'd be fine with explicitly requesting the new behaviour as long as it's consistent and is exposed in the python bindings if you're concerned about backward compatibility (although I can't see where inconsistent values would be ok for any client)
>  
>

Whether changing a "historical" default is the "right thing" to do
requires a bit of thought. There are numerous checks in the existing code
to "fail" if the domain isn't running.

However, there are also instances where the check is made and if not
true a default value is provided. For example,

qemuDomainGetInfo:
    -> cpuTime is 0 if not running or gets a "real value" if running
    -> memory varies on the "memballoon"

qemuDomainGetSchedulerType:
    -> Different/default cgroup values are returned if not running (see
virsh schedinfo).  Those values are 0; 0 isn't necessarily correct, but
it's what gets returned...

So if as you assert, the right thing to do is to return an error if the
guest is not running, then we need to consider the instances where an
error return is not expected and how do we handle those?

If one goes by the theory that qemuDomainGetInfo() returns a non-zero
value for 'Used Memory' (see virsh dominfo <domain> on a non-running
domain), then the same theory could be applied to why allocation ==
physical is returned for a non-running domain.  Theoretically, the
non-running domain isn't using the memory, but it's displayed as if it
were. If a volume exists, we can get some basic information about it.
The more specific information is achievable through a qemu call "if" the
guest is running.

Currently, if a VM is off, the allocation can still be displayed; see
the BZ for an example:

# virsh domstate rh65 --reason
shut off (destroyed)
# virsh domblkinfo rh65 <path-to-storage>
Capacity:       10737418240
Allocation:     913641472
Physical:       913641472

So, with the "requested" change in functionality, we'd get an error
here, which I would think would be unexpected.

Now, I'm not against changing the default, but understanding the
ramifications of that type of change is important. There has to be
"some reason" that a default was decided upon. There's nothing in the
code/documentation that indicates the domain must be running in order to
get values - just that defaults are supplied when the code cannot hit
the very particular case where we can pull the data from qemu via a
qcow/LVM query.

Looking at the history of the code... The code was originally added in
commit id 'db57a7be':

http://libvirt.org/git/?p=libvirt.git;a=commit;h=db57a7be

The code was changed to attempt to fetch the extent value as part of
commit id 'ebb0c19c':

http://libvirt.org/git/?p=libvirt.git;a=commit;h=ebb0c19c

There's nothing in either change that indicates to me that it's expected
to return an error if the domain is not running, since you're getting
information about a storage volume for a domain - a volume that does
exist...

An alternative that could be more easily done is to add a flag that
indicates to return an error if the domain isn't running. At least that
way we'll know that the caller is expecting the error.  Of course that's
akin to having the blockInfo thread check if the domain is running
before asking. Changes would have to be made to libvirt, then vdsm would
have to rely on those changes being applied.  Then vdsm would have to
change the way it calls the function to add the flag.


John

IMO: As much as it's a pain to do, conversation should be in the BZ not
via email - that way history is better kept.
Comment 6 Ayal Baron 2013-12-17 07:39:05 EST
> An alternative that could be more easily done is to add a flag that
> indicates to return an error if the domain isn't running. At least that
> way we'll know that the caller is expecting the error.  Of course that's
> akin to having the blockInfo thread check if the domain is running
> before asking. Changes would have to be made to libvirt, then vdsm would
> have to rely on those changes being applied.  Then vdsm would have to
> change the way it calls the function to add the flag.
> 

This sounds like the best solution to me.
We have no problem with specifically sending such a flag from vdsm and it would both retain backward compatibility and make it explicit what behaviour we expect.
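A toy model of the flag-gated behaviour being discussed (the flag name CHECK_ACTIVE and all of the code below are made up for illustration; this is not the actual libvirt API):

```python
CHECK_ACTIVE = 1  # hypothetical flag, not a real libvirt constant

class DomainNotRunningError(Exception):
    pass

def get_block_info(running, capacity, allocation, physical, flags=0):
    """Toy model of the server-side decision: with the flag set, a
    non-running domain is an error; without it, keep the historical
    fallback of reporting allocation == physical."""
    if not running:
        if flags & CHECK_ACTIVE:
            raise DomainNotRunningError("domain is not running")
        return (capacity, physical, physical)  # historical fallback
    return (capacity, allocation, physical)
```

This keeps backward compatibility for callers that don't pass the flag, while a caller like vdsm that does pass it gets an explicit error instead of a value that mimics a stopped domain.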
Comment 7 John Ferlan 2013-12-17 07:58:10 EST
And what type of information are you looking to get? I see no question asked, unless the implied question is how long it would take... for which the answer is: not sure yet.  Although it would be good to have some of the libvirtd.log data requested in comment 4, so that I can feel better about the assumption I'm working under.
Comment 8 John Ferlan 2013-12-17 09:58:29 EST
Digging deeper into the history of the code shows a few more commits that are quite telling in this area...


Changes as a result of issues during migration:
http://libvirt.org/git/?p=libvirt.git;a=commit;h=054d43f5

Attempt to add the ability to fetch during a migration:
http://libvirt.org/git/?p=libvirt.git;a=commit;h=18c2a592

Fixes as a result of regressions caused by previous commit:
http://libvirt.org/git/?p=libvirt.git;a=commit;h=2027e184

Removal of the special case code in blockinfo fetch:
http://libvirt.org/git/?p=libvirt.git;a=commit;h=fb3cada0

Another adjustment based on the state of monitor:
http://libvirt.org/git/?p=libvirt.git;a=commit;h=193cd0f3


Suffice to say it's a tricky timing area, and over the years the various algorithms used to get a specific answer in this specific corner case haven't always been successful. Resolving an edge condition is never a straightforward exercise.

It's really too bad your blockInfo thread isn't a bit more stateful with respect to knowing that a migration is taking place on the domain, in which case it might be best not to query at all. Once the domain reaches its target (or perhaps not) and (re)starts - it should be able to return the data (on whichever host it lands).
Comment 11 John Ferlan 2013-12-18 10:26:15 EST
I did post a possible patch upstream regarding this case. However, it met with the same concern that I've voiced in this case: why can't the app (in this case the blockInfo thread) have a bit more state knowledge and not make the request on a domain that's not up? See:

http://www.redhat.com/archives/libvir-list/2013-December/msg00984.html
Comment 14 David Jaša 2013-12-18 11:37:31 EST
(In reply to John Ferlan from comment #4)
> Since you have an environment ready/working and I don't - is it possible to
> enable some libvirtd debugging? As described here:
> 
> http://libvirt.org/logging.html
> 
> I think setting:
> 
> log_level = 1
> log_filters="3:remote 4:event 3:json 3:rpc"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
> 
> Should do the trick.  I don't need too much data just a single pass through
> and the libvirtd.log output when you're getting back the "bad values".
> 

I did several back-and-forth migrations between 16:17 and 16:22 (until the Actual Size in the RHEV portal grew by 1 GB). VM name and uuid: w7x32-33-2-dj, de193671-ae6b-42bd-883e-8705b66770d0.
Comment 19 John Ferlan 2014-01-07 13:18:09 EST
The fix wasn't accepted.  I started looking at the debug log information before the holiday break, but haven't been able to get back to it yet.  It's too bad the data has multiple domains intermixed, as that makes it difficult and time-consuming to pore through it looking for the data specific to the call.  So far I haven't seen anything that indicates more than what I've already surmised regarding the data returned and the state of the migrated-from domain.
Comment 20 Federico Simoncelli 2014-01-08 10:48:09 EST
(In reply to John Ferlan from comment #19)
> The fix wasn't accepted.   I started looking at the debug log information
> before the holiday break, but haven't been able to get back to it yet.  It's
> too bad the data has multiple domains intermixed as it makes it difficult &
> time consuming to pore through the data looking for the data specific to the
> call.  So far I haven't seen anything indicates anything more than what I've
> already surmised regarding data returned and the state of the migrated from
> domain.

As I suggested upstream:

https://www.redhat.com/archives/libvir-list/2013-December/msg01145.html

it seems that part of the problem is a race condition. Libvirt treats a transient domain as non-transient when qemu dies before returning the virDomainGetBlockInfo values.

Getting rid of the race would make libvirt return either a valid value for virDomainGetBlockInfo or VIR_ERR_NO_DOMAIN (Domain not found).

In short, virDomainGetBlockInfo for a transient domain is valid only when the data comes from qemu.
Comment 22 Ayal Baron 2014-01-09 02:46:32 EST
John, I think the distinction Federico was trying to make is that the behaviour for transient domains should be different than the behaviour for non-transient domains.
In the transient case, it doesn't make sense to report any value when there is no domain since it is, well, transient.
So you can basically keep current behaviour for persistent domains while report only info from qemu for transient domains.
Comment 25 John Ferlan 2014-01-09 09:04:57 EST
Correct on why allocation = capacity

Not sure what you mean by "it doesn't exist".  There's a qemu process associated with it. I'm missing what you're driving at, so please be more explicit.

An on-disk configuration doesn't exist, but something had to create the domain.  The XML for a transient domain doesn't get saved, but in order to create it, the caller still has to provide XML to define things.

I think http://wiki.libvirt.org/page/VM_lifecycle defines things pretty well.
Comment 26 Ayal Baron 2014-01-09 11:23:21 EST
When you call virDomainGetBlockInfo on a *persistent* VM which is *not* running, you get allocation=physical.

When you call virDomainGetBlockInfo on a *persistent* VM which *is* running, you get high write.

If you happen to catch the domain when qemu has stopped running but libvirt still has a reference to it, then it will return allocation=physical to mimic the state where the VM is not running.

So far, the behaviour is totally reasonable.

Now, if you call virDomainGetBlockInfo on a *transient* VM which is *not* running, you will get an error that libvirt does not know this VM.

So it stands to reason that when you hit the race of the VM being migrated, right before it is removed from libvirt's references, the behaviour should mimic the not-running case and be treated the same as when the VM is down: return an error.

Does this make sense?
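The four cases above can be restated as a small lookup (purely illustrative; the strings just summarize the behaviour described in this comment):

```python
def expected_blockinfo_behaviour(persistent, running):
    """Summarize this comment's matrix of virDomainGetBlockInfo behaviour
    for persistent/transient x running/not-running domains."""
    if running:
        return "high write (queried from qemu)"
    if persistent:
        return "allocation == physical (mimics stopped VM)"
    return "error (libvirt does not know this VM)"
```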
Comment 27 John Ferlan 2014-01-09 14:25:15 EST
Makes sense for a specific corner case where you have all this knowledge built up. Makes sense for what you're trying to accomplish.  I'm not disagreeing - it takes time to investigate and figure out the conditions/timing.

Unfortunately I keep coming back to taking the code standing alone without any of this knowledge. All it knows is that it can fetch the high water for a running domain that's using a specific storage type. If that's not possible - return something because that's what the API dictates and has done for multiple releases. I thought the best scenario was the specific error for the specific case if the calling application desires that knowledge.  That wasn't accepted initially - perhaps now with more information it can be acceptable. That conversation still needs to take place, but other things are interfering with that.  Hopefully early next week that conversation will be held.

The failure you mention - calling on a transient VM which is not running and receiving an error - occurs much earlier in the stack. Once the higher-level code checks are made, the process is set in motion.  Adding redundant checks in lower-level code is a never-ending battle.
Comment 28 John Ferlan 2014-01-23 12:33:43 EST
Considering Comment 26 - would it be viable to only return a failure for a transient domain?

For a persistent domain you'd need something like:

    if (virDomainIsActive()) {
        /* Perhaps other collections but the one we want: */
        virDomainGetBlockInfo();
        if (allocation == capacity) {
            if (!virDomainIsActive()) {
                /* don't trust results */
            }
        }
        /* Perhaps other collections */
    }

NOTE: The first virDomainIsActive() could also be the result of a
'virConnectListAllDomains()' to get a list of all active/transient domains.

But for a transient domain, if virDomainGetBlockInfo() runs into this condition, return a failure.  I'm guessing it "works" for RHEV - but the question becomes a more generic what are the downsides?

Consider the code mentioned in Comment 11 and remove the concept of a new flag VIR_DOMAIN_BLOCK_CHECK_ACTIVE and replace the check:

+    if (activeFail && (flags & VIR_DOMAIN_BLOCK_CHECK_ACTIVE)) {

with

+    if (activeFail && !dom->persistent) {

using a common error message "domain is not running".

Thus for a transient domain, if you're asking for block info on this
special type of device and qemu cannot provide it because the domain is
no longer active, then there's an error. Similarly, if you asked right
after the domain was removed, there would be an error, even if an
initial "virDomainGetBlockInfo()" were made on a transient domain that
appeared active in a just-executed virDomainIsActive().

For a persistent domain, I'm not sure what we can do other than document
the code snippet and say it's up to the application that cares to make
the extra check since the default is to provide allocation == capacity
when the domain isn't running.
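The caller-side pattern for persistent domains could be factored roughly like this (a sketch; is_active and get_info are callables standing in for virDomainIsActive() and virDomainGetBlockInfo(), not real libvirt entry points):

```python
def trusted_block_info(is_active, get_info):
    """Fetch block info, and when allocation == capacity re-check that
    the domain is still active before trusting a fallback-looking result.

    Returns (capacity, allocation, physical), or None when the result
    should not be trusted.
    """
    if not is_active():
        return None
    capacity, allocation, physical = get_info()
    if allocation == capacity and not is_active():
        return None  # don't trust results
    return (capacity, allocation, physical)
```

For example, if the domain dies between the fetch and the re-check, a suspicious allocation == capacity result is discarded rather than acted on.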

I put the needsinfo on Ayal, but moving it to Federico also works.  What I'm looking to make certain is there's no downside to returning an error and that doing so just for transient domains is 'good enough'.
Comment 29 Eric Blake 2014-01-23 13:03:18 EST
(In reply to John Ferlan from comment #28)
> Considering Comment 26 - would it be viable to only return a failure for a
> transient domain?

Yes, I think that works.  Existing behavior for persistent domains is unchanged (the race is still there for whether the domain went offline, but you already have to deal with that race); while VDSM only uses transient domains (when the domain is no longer running, we can't report stats because the stats we care about only exist when the domain is running).

> But for a transient domain, if virDomainGetBlockInfo() runs into this
> condition, return a failure.  I'm guessing it "works" for RHEV - but the
> question becomes a more generic what are the downsides?

I'm not seeing any downsides from libvirt's perspective - let's go with the approach.
Comment 30 Ayal Baron 2014-01-23 14:31:15 EST
(In reply to Eric Blake from comment #29)
> (In reply to John Ferlan from comment #28)
> > Considering Comment 26 - would it be viable to only return a failure for a
> > transient domain?
> 
> Yes, I think that works.  Existing behavior for persistent domains is
> unchanged (the race is still there for whether the domain went offline, but
> you already have to deal with that race); while VDSM only uses transient
> domains (when the domain is no longer running, we can't report stats because
> the stats we care about only exist when the domain is running).
> 
> > But for a transient domain, if virDomainGetBlockInfo() runs into this
> > condition, return a failure.  I'm guessing it "works" for RHEV - but the
> > question becomes a more generic what are the downsides?
> 
> I'm not seeing any downsides from libvirt's perspective - let's go with the
> approach.

perfect, thanks.
Comment 36 zhe peng 2014-04-14 23:34:44 EDT
This bug is hard to reproduce in a pure libvirt environment (according to the descriptions in comment 9 and comment 11 of bug 1059904), so we verify it by checking the libvirt code.
Checked libvirt-0.10.2-32.el6.src.rpm: the code includes the patch, so marking this bug verified.
Comment 38 errata-xmlrpc 2014-10-14 00:18:58 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html
