Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1064471

Summary: [vdsm] resuming VM from paused state fails with an AttributeError
Product: [Retired] oVirt
Reporter: Elad <ebenahar>
Component: vdsm
Assignee: Nir Soffer <nsoffer>
Status: CLOSED CURRENTRELEASE
QA Contact: Aharon Canan <acanan>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.4
CC: acathrow, amureini, bazulay, ebenahar, gklein, iheim, mgoldboi, michal.skrivanek, nsednev, nsoffer, yeylon
Target Milestone: ---
Target Release: 3.4.1
Hardware: x86_64
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-05-08 13:37:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1036358
Attachments:
- engine and vdsm logs (flags: none)
- vdsm.log (flags: none)

Description Elad 2014-02-12 16:36:44 UTC
Created attachment 862438 [details]
engine and vdsm logs

Description of problem:
Resuming from paused state fails with AttributeError on vdsm.

Version-Release number of selected component (if applicable):
vdsm-4.14.1-3.el6.x86_64


Steps to Reproduce:
On a shared storage data center with an iSCSI domain:
1. have a running VM
2. block connectivity between the host and the storage server using iptables; wait for the VM to become 'paused'
3. resume connectivity to the storage server and wait for the domain to become active
4. try to resume the VM from pause

Actual results:
Resuming the VM fails with the following message in vdsm.log:

Thread-1768::ERROR::2014-02-12 17:45:13,409::BindingXMLRPC::989::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 973, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 209, in vmCont
    return vm.cont()
  File "/usr/share/vdsm/API.py", line 154, in cont
    return v.cont()
  File "/usr/share/vdsm/vm.py", line 2576, in cont
    self._underlyingCont()
  File "/usr/share/vdsm/vm.py", line 3715, in _underlyingCont
    hooks.before_vm_cont(self._dom.XMLDesc(0), self.conf)
AttributeError: 'NoneType' object has no attribute 'XMLDesc'



Additional info: engine and vdsm logs

Comment 1 Nir Soffer 2014-02-16 20:42:47 UTC
Looks like a duplicate of bug 1063336. Vm._dom is None, when storage is not available when starting vdsm and a vm is running.
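The failure mode described above can be illustrated with a minimal, self-contained sketch. This is NOT vdsm's actual code; the `Dom` and `Vm` classes below are hypothetical stand-ins, kept only close enough to show why a `_dom` left as `None` produces the exact AttributeError from the traceback, and what a defensive guard would look like.

```python
# Minimal illustration of the failure mode: NOT vdsm code.
# Dom stands in for a libvirt virDomain handle; Vm for vdsm's Vm object.

class Dom:
    """Hypothetical stand-in for a libvirt virDomain handle."""
    def XMLDesc(self, flags):
        return "<domain/>"

class Vm:
    def __init__(self, dom):
        # If vm recovery failed at vdsm startup, _dom is left as None.
        self._dom = dom

    def cont(self):
        # Unguarded access: with _dom == None this raises
        # AttributeError: 'NoneType' object has no attribute 'XMLDesc'
        return self._dom.XMLDesc(0)

    def cont_guarded(self):
        # A defensive variant: fail with a clear error instead of an
        # unexpected AttributeError deep inside the call.
        if self._dom is None:
            raise RuntimeError("vm was not recovered; cannot resume")
        return self._dom.XMLDesc(0)

broken = Vm(dom=None)
try:
    broken.cont()
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'XMLDesc'
```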

Comment 2 Nir Soffer 2014-02-26 21:13:43 UTC
(In reply to Elad from comment #0)
> Steps to Reproduce:
> On shared storage data center with an iSCSI domain:
> 1. have a running VM
> 2. block connectivity between host to storage server using iptables, wait
> for VM to become 'paused'
> 2. resume connectivity to the storage server and wait for the domain to
> become active
> 3. try to start VM from pause

Why do you start the vm? It should start when the domain becomes valid.

> 
> Actual results:
> Resuming the VM fails with the following message in vdsm.log:
> ...
> AttributeError: 'NoneType' object has no attribute 'XMLDesc'

There is no such error in the attached log.

Please reproduce again and provide complete vdsm.log.

Note for reproduction:

This seems to be a duplicate of bug 1063336. In that bug, a vm was running when vdsm was started but storage was not available. So the vm recovery failed, leaving a vm object with _dom == None.

To make sure this is not a duplicate, ensure that storage is up when vdsm starts, and start the vm after vdsm starts.

1. start vdsm
2. ensure that the domain is accessible
3. start a vm and wait until it is up
4. block connectivity to the storage server
5. wait until vm is paused
6. unblock connectivity to storage server
7. wait until vm is unpaused

If you start the vm before storage becomes connected again, the vm is expected to fail - but not with the error in this bug. It takes time to get the connectivity back after blocking the connection.
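The expected flow in steps 1-7 above (vdsm pausing the vm on storage loss and resuming it on its own once the domain becomes valid again) can be sketched as follows. This is an illustrative sketch only, not vdsm's implementation; `DomainMonitor`, `Vm`, and every method name here are hypothetical.

```python
# Illustrative sketch of the expected auto-pause/auto-resume flow.
# NOT vdsm code; all classes and method names are hypothetical.

class Vm:
    def __init__(self, name):
        self.name = name
        self.state = "up"

    def pause(self):
        self.state = "paused"

    def cont(self):
        self.state = "up"

class DomainMonitor:
    """Pauses vms when the domain goes invalid; resumes them on recovery."""
    def __init__(self, vms):
        self.vms = vms
        self.valid = True

    def on_domain_state_change(self, valid):
        if valid and not self.valid:
            # Domain became valid again: resume every paused vm.
            for vm in self.vms:
                if vm.state == "paused":
                    vm.cont()
        elif not valid:
            # Domain lost: pause every running vm.
            for vm in self.vms:
                if vm.state == "up":
                    vm.pause()
        self.valid = valid

vm = Vm("testvm")
monitor = DomainMonitor([vm])
monitor.on_domain_state_change(False)  # storage blocked -> vm paused
monitor.on_domain_state_change(True)   # storage back -> vm resumed
print(vm.state)  # up
```

In this model, resuming the vm manually (as in the original report) should never be needed; the bug under discussion is what happens when the manual path is taken while `_dom` is `None`.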

Comment 3 Elad 2014-02-27 08:37:39 UTC
Created attachment 868350 [details]
vdsm.log

(In reply to Nir Soffer from comment #2)
> (In reply to Elad from comment #0)
> > Steps to Reproduce:
> > On shared storage data center with an iSCSI domain:
> > 1. have a running VM
> > 2. block connectivity between host to storage server using iptables, wait
> > for VM to become 'paused'
> > 2. resume connectivity to the storage server and wait for the domain to
> > become active
> > 3. try to start VM from pause
> 
> Why do you start the vm? it should start when the domain becomes valid.

The VM is not resumed automatically when the domain becomes active; I waited ~20 minutes and only then resumed it manually.
> 
> > 
> > Actual results:
> > Resuming the VM fails with the following message in vdsm.log:
> > ...
> > AttributeError: 'NoneType' object has no attribute 'XMLDesc'
> 
> There is no such error in the attached log.

Uploading the right log
> 
> 
> Note for reproduction:
> 
> This seems to be a duplicate of bug 1063336. In that bug, a vm was running
> when vdsm was started but storage was not available. So the vm recovery
> failed, leading to vm object with a _dom == None.
> 
> To make sure this is not a duplicate, make sure that storage is up when vdsm
> start, and start the vm after vdsm starts.
> 
> 1. start vdsm
> 2. ensure that the domain is accessible
> 3. start a vm and wait unitl it is up
> 4. block connectivity to the storage server
> 5. wait until vm is paused
> 6. unblock connectivity to storage server
> 7. wait until vm is unpaused
This is exactly the scenario, except that the VM didn't get unpaused; manual intervention was required.
 
> If you start the vm before storage becomes connected again, the vm is
> expected to fail - but not with the error in this bug. It takes time to get
> the connectivity back after blocking the connection.

Comment 4 Sandro Bonazzola 2014-03-04 09:27:33 UTC
This is an automated message.
Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.

Comment 5 Michal Skrivanek 2014-03-09 14:11:06 UTC
Indeed, this is probably bug 1063336. But do we know why it wasn't started automatically? Is it a consequence of bug 1063336?

Comment 6 Nir Soffer 2014-03-09 16:16:51 UTC
(In reply to Michal Skrivanek from comment #5)
> indeed this is probably bug 1063336. But do we know why it wasn't started
> automatically? Is it a consequence of bug 1063336?

Well, if the libvirt connection was not created (_dom is None), then how would this vm be started?

Comment 7 Michal Skrivanek 2014-03-09 16:19:39 UTC
If this is the only failure, this bug is going to be addressed by the fix for bug 1063336.

Comment 8 Allon Mureinik 2014-03-25 16:31:27 UTC
As per Michal's comment, moving this bug to MODIFIED.
Once a solution to bug 1063336 is delivered to QA, /this/ bug should be tested with the new build too.

Comment 9 Allon Mureinik 2014-04-13 14:35:38 UTC
(In reply to Allon Mureinik from comment #8)
> As per Michal's comment, moving this bug to MODIFIED.
> Once a solution to bug 1063336 is delivered to QA, /this/ bug should be
> tested with the new build too.
Moving to ON_QA, as bug 1063336 is CLOSED CURRENTRELEASE.

Comment 10 Sandro Bonazzola 2014-05-08 13:37:03 UTC
This is an automated message.

oVirt 3.4.1 has been released:
 * it should fix your issue
 * it should be available at your local mirror within two days.

If problems still persist, please make note of it in this bug report.