Description of problem:
Calling status() on an instance of class Vm(object) raises the following exception:
2016-01-05 23:41:17,725 - mom.GuestManager - ERROR - Guest Manager crashed
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/mom/GuestManager.py", line 114, in run
File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 75, in getVmList
File "/usr/share/vdsm/API.py", line 1380, in getVMList
File "/usr/share/vdsm/API.py", line 1370, in reportedStatus
File "/usr/share/vdsm/virt/vm.py", line 2817, in status
File "/usr/share/vdsm/virt/vm.py", line 2817, in <genexpr>
RuntimeError: dictionary changed size during iteration
2016-01-05 23:41:19,843 - mom - ERROR - Thread 'GuestManager' has exited
Version-Release number of selected component (if applicable):
How reproducible:
Randomly (race condition)
Steps to Reproduce:
Most probably by forcing a massive live migration
Actual results:
The exception shown above is raised
Expected results:
The exception is not raised
Additional info:
In this case the exception causes MoM to malfunction, because the GuestManager thread crashes.
The issue has been partially fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1298190
but the exception still occurs even when the fix is applied.
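For reference, the error in the traceback can be reproduced deterministically without threads. The sketch below is not the Vdsm code: `conf` is a plain dict standing in for Vm.conf, and `unsafe_status`, `safe_status`, `add_key`, `conf_lock` and the key `extra_key` are hypothetical names used only for illustration. It shows what happens when a dict grows while it is being iterated, and a copy-under-lock variant that avoids the error only if every writer takes the same lock.

```python
import threading

# Hypothetical lock; it only helps if all writers of conf take it too.
conf_lock = threading.Lock()


def add_key(conf):
    """Stand-in for a concurrent writer that adds a key to conf."""
    conf["extra_key"] = "value"


def unsafe_status(conf, on_each_key=None):
    """Build a status dict by iterating the live dict (the failing pattern)."""
    status = {}
    for key in conf:
        if on_each_key is not None:
            on_each_key(conf)  # simulates another thread writing to conf
        status[key] = conf[key]
    return status


def safe_status(conf, on_each_key=None):
    """Copy the dict under the lock, then iterate the private copy."""
    with conf_lock:
        snapshot = dict(conf)
    status = {}
    for key in snapshot:
        if on_each_key is not None:
            on_each_key(conf)  # mutates the original, not the snapshot
        status[key] = snapshot[key]
    return status
```

Calling `unsafe_status(conf, add_key)` on a dict with two or more keys raises the same RuntimeError as in the traceback, while `safe_status(conf, add_key)` returns the keys that were present when the snapshot was taken.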
Of course, crashes are not nice, but I suppose the impact is less significant in 3.6, as the mom functionality was separated out of vdsm and is now an independent process. There should be little to no impact on the actual stuff mom does.
Francesco, other thoughts?
(In reply to Michal Skrivanek from comment #2)
> Of course crashes are not nice, but I suppose impact is less significant in
> 3.6 as the mom functionality was separated out of vdsm and is now an
> independent process. There should be little to no impact on the actual stuff
> mom does.
> Francesco, other thoughts?
Yes, this is how it should work - MOM side should be able to recover from those API failures, and go ahead without crashing like it did in 3.5. However, to be sure I'll need to review the MOM code, specifically the xmlrpc interface.
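As a rough illustration of the recovery behaviour described here, the sketch below shows a polling loop that logs a failed API call and carries on with the next round instead of letting the thread die. It is not the MOM code: `poll_guests`, the `get_vm_list` callable and the logger name are assumptions made for this example.

```python
import logging

logger = logging.getLogger("guest-manager-sketch")  # hypothetical logger name


def poll_guests(get_vm_list, rounds):
    """Call get_vm_list() `rounds` times; log failures and keep going."""
    results = []
    for _ in range(rounds):
        try:
            results.append(get_vm_list())
        except Exception:
            # One failed poll is logged and recorded as None;
            # the loop continues, so the next round retries the call.
            logger.exception("getVmList failed, will retry on next round")
            results.append(None)
    return results
```

With this shape, a transient RuntimeError from the hypervisor interface costs one polling round rather than the whole GuestManager thread.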
Speaking of Vdsm, we can (and will) further improve the handling of Vm.conf.
The problem is that Vm.conf is indeed abused and misused. Too much data is stored here freely by many complex flows. I acknowledge the fix was partial; the problem is that a clean complete fix would require a large rewrite.
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see firstname.lastname@example.org with any questions
The remaining patches are either unhelpful for this issue or not directly related, so moving now to MODIFIED.
Not sure this needs doc_text. Added just in case.
Verified on:
oVirt Engine Version: 4.0.0-0.7.master.el7ev
Create a massive migration on the hosts:
1. Create a pool of 100 VMs and put all 100 VMs on the same host
(each VM has 8 GB of memory, loaded to 90%)
2. Migrate all 100 VMs
3. Check the MoM logs
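Step 3 can be scripted. The sketch below scans log lines for the messages quoted in the original report; `find_crash_lines` and `CRASH_MARKERS` are hypothetical names, and the location of the MoM log is left to the caller because it depends on the installation.

```python
# Messages taken from the traceback in the bug description.
CRASH_MARKERS = (
    "Guest Manager crashed",
    "dictionary changed size during iteration",
    "Thread 'GuestManager' has exited",
)


def find_crash_lines(lines):
    """Return the log lines that contain any of the crash markers."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in CRASH_MARKERS)
    ]
```

Pass it an open file object for the MoM log of the host under test; an empty result after the migration means none of the quoted messages appeared.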
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.