Bug 797526 - 3.1 - vm recovery fail after vdsm is restarted as we try to restore vm before pool is connected
Summary: 3.1 - vm recovery fail after vdsm is restarted as we try to restore vm before pool is connected
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Federico Simoncelli
QA Contact: Dafna Ron
URL:
Whiteboard: virt
Duplicates: 819460
Depends On:
Blocks:
 
Reported: 2012-02-26 14:13 UTC by Haim
Modified: 2022-07-09 05:34 UTC
CC List: 14 users

Fixed In Version: vdsm-4.9.6-12
Doc Type: Bug Fix
Doc Text:
Previously, when the VDSM service was restarted, recovery of running virtual machines could fail because VDSM attempted to restore them before it had reconnected to the storage pool. As a result, the virtual machines appeared as running in libvirt but as paused in VDSM. Now, VDSM reconnects to the storage pool before attempting the recovery, so running virtual machines are restored correctly after a VDSM restart.
Clone Of:
Environment:
Last Closed: 2012-12-04 18:53:59 UTC
Target Upstream Version:
Embargoed:


Attachments
vdsm log (947.45 KB, application/x-gzip), 2012-02-26 14:16 UTC, Haim
unexpected exception (Error code:16) on vdsm-13 (831.98 KB, application/x-gzip), 2012-06-05 06:05 UTC, Chao Yang


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2012:1508 0 normal SHIPPED_LIVE Important: rhev-3.1.0 vdsm security, bug fix, and enhancement update 2012-12-04 23:48:05 UTC

Description Haim 2012-02-26 14:13:17 UTC
Description of problem:

The VM recovery flow fails after the vdsm service is restarted, due to a lack of connection to the storage pool.
Searching the logs, I clearly see that recovery is called before the connection to the pool.

Thread-35::DEBUG::2012-02-26 15:25:02,384::vm::565::vm.Vm::(_startUnderlyingVm) vmId=`a194d529-2a5a-4ff0-aced-942ec0a41d8f`::_ongoingCreations acquired
Thread-35::INFO::2012-02-26 15:25:02,387::libvirtvm::1229::vm.Vm::(_run) vmId=`a194d529-2a5a-4ff0-aced-942ec0a41d8f`::VM wrapper has started
Thread-35::WARNING::2012-02-26 15:25:02,388::vm::396::vm.Vm::(getConfDevices) vmId=`a194d529-2a5a-4ff0-aced-942ec0a41d8f`::Unknown type found, device: '{'device': u'unix', 'alias': u'channel0', 'type': u'channel', 'address': {u'bus': u'
0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'1'}}' found
Thread-35::WARNING::2012-02-26 15:25:02,389::vm::396::vm.Vm::(getConfDevices) vmId=`a194d529-2a5a-4ff0-aced-942ec0a41d8f`::Unknown type found, device: '{'device': u'spicevmc', 'alias': u'channel1', 'type': u'channel', 'address': {u'bus'
: u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'2'}}' found
Thread-35::INFO::2012-02-26 15:25:02,391::vm::574::vm.Vm::(_startUnderlyingVm) vmId=`a194d529-2a5a-4ff0-aced-942ec0a41d8f`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 567, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1272, in _run
    self._devices[devType].append(devClass(self.conf, self.log, **dev))
  File "/usr/share/vdsm/libvirtvm.py", line 989, in __init__
    self.blockDev = utils.isBlockDevice(self.path)
  File "/usr/share/vdsm/utils.py", line 52, in isBlockDevice
    return stat.S_ISBLK(os.stat(path).st_mode)
OSError: [Errno 2] No such file or directory: '/rhev/data-center/711a080f-4702-450c-9f5f-bf54f1e99383/45290416-1dd4-494d-8ac9-29471752bd21/images/4d7d52fa-ff77-4089-a19c-9fb6a2f47e39/78c1e6de-ad7e-4f94-8ab9-3f748f75d8a0'

Repro:

- run vms 
- restart vdsm 
- recover vm flow starts
- recover vm flow fails 

Notes:


- vms are reported as running in libvirt 
- vms are reported as paused on vdsm 

git commit: b39d55059fc6cbc4fb3de522e45ea2381f68a3a7
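
A minimal standalone sketch (not VDSM code; the path is a placeholder) of why the check in the traceback fails while the pool is still disconnected:

import os
import stat

def is_block_device(path):
    # Same check as utils.isBlockDevice in the traceback above: os.stat()
    # raises OSError with errno 2 (ENOENT) if the path does not exist yet.
    return stat.S_ISBLK(os.stat(path).st_mode)

volume = "/rhev/data-center/<pool-uuid>/<domain-uuid>/images/<img-uuid>/<vol-uuid>"
try:
    is_block_device(volume)
except OSError as exc:
    # Before connectStoragePool() has run, nothing is mounted under
    # /rhev/data-center/<pool-uuid>, hence "[Errno 2] No such file or directory".
    print("recovery would fail here:", exc)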

Comment 1 Haim 2012-02-26 14:16:00 UTC
Created attachment 565897 [details]
vdsm log

Comment 6 Royce Lv 2012-05-09 10:31:35 UTC
1. HSM.__init__
  def storageRefresh():
    try:
      self.__cleanStorageRepository()  --> this cleans the spUUID directory
    ....
    for spUUID in dirList:
      self.log.info("lvroyce find storage pool file")
      poolPath = os.path.join(self.storage_repository, spUUID)
      try:
        if os.path.exists(poolPath):  --> the path no longer exists, so the pool will not be connected
          self.log.info("lvroyce find storage pool path")
          self._connectStoragePool(spUUID, None, None, None, None)

2. threading.Thread(target=storageRefresh).start() runs storageRefresh in its own thread, and VM recovery runs in another thread.
   Even if storageRefresh does connect the pool, if that happens after VM recovery starts, the recovery will fail. This can be seen in your log:

Thread-49::INFO::2012-02-26 14:39:13,470::vm::574::vm.Vm::(_startUnderlyingVm) vmId=`34c67b37-9ace-4a39-9040-71cdc0f1e397`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 567, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1272, in _run
    self._devices[devType].append(devClass(self.conf, self.log, **dev))
  File "/usr/share/vdsm/libvirtvm.py", line 989, in __init__
    self.blockDev = utils.isBlockDevice(self.path)
  File "/usr/share/vdsm/utils.py", line 52, in isBlockDevice
    return stat.S_ISBLK(os.stat(path).st_mode)

Thread-11::DEBUG::2012-02-26 14:39:13,534::hsm::353::Storage.HSM::(__cleanStorageRepository) Finished cleaning storage repository at '/rhev/data-center'
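
The race described above, reduced to a minimal illustration (not VDSM code):

import threading
import time

pool_connected = False

def storage_refresh():
    global pool_connected
    time.sleep(0.1)          # stands in for cleaning the repository and scanning pools
    pool_connected = True    # stands in for _connectStoragePool()

def recover_vm(name):
    # Runs concurrently with storage_refresh(); nothing orders the two threads,
    # so the pool may still be disconnected when recovery starts.
    if pool_connected:
        print(name, "recovered")
    else:
        print(name, "recovery failed: storage pool not connected yet")

threading.Thread(target=storage_refresh).start()
threading.Thread(target=recover_vm, args=("vm-1",)).start()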

Comment 7 Federico Simoncelli 2012-05-10 18:25:56 UTC
BZ#797526 Wait pool connection before recovery

In this patch:
* Serialize connectStoragePool and _recoverExistingVms
* Don't check if the pool directory is present to reconnect to the pool
  (_connectStoragePool); in fact the pool directory was removed a few
  lines earlier by __cleanStorageRepository
* Prepare the volume paths (preparePaths) also during the recovery

Signed-off-by: Federico Simoncelli <fsimonce>
Change-Id: I418ce9641d1cdb7f7202fd8c08e7c027d1f6d325

http://gerrit.ovirt.org/4283
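
A minimal sketch of the serialization idea in this patch; the names used here (pool_ready, connect_storage_pool, recover_existing_vms) are illustrative stand-ins, not the actual VDSM identifiers:

import threading
import time

pool_ready = threading.Event()

def connect_storage_pool():
    time.sleep(0.1)          # stands in for the real pool connection
    print("storage pool connected")

def storage_refresh():
    # Reconnect unconditionally: the pool directory was just removed by the
    # repository cleanup, so checking for it would always skip the connect.
    connect_storage_pool()
    pool_ready.set()

def recover_existing_vms():
    # Recovery now waits for the pool connection instead of racing with it,
    # and only then prepares the volume paths and restores the VMs.
    pool_ready.wait()
    print("preparing volume paths and recovering VMs")

threading.Thread(target=storage_refresh).start()
threading.Thread(target=recover_existing_vms).start()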

Comment 8 Federico Simoncelli 2012-05-16 11:17:37 UTC
Recently I proposed a better fix for the issue:

BZ#797526 Lazy evaluation for Drive.blockDev

Signed-off-by: Federico Simoncelli <fsimonce>
Change-Id: I93bc95650544372cec8883a8b7b568ecf419c129

http://gerrit.ovirt.org/4339
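
A simplified sketch of what "lazy evaluation for Drive.blockDev" amounts to, assuming a property-based deferral (the real change is in the gerrit link above):

import os
import stat

class Drive(object):
    # Illustrative stand-in for the VDSM drive device class.

    def __init__(self, path):
        self.path = path
        self._block_dev = None      # no longer evaluated in __init__

    @property
    def blockDev(self):
        # Deferred until first access, by which time the storage pool is
        # expected to be connected and self.path to exist.
        if self._block_dev is None:
            self._block_dev = stat.S_ISBLK(os.stat(self.path).st_mode)
        return self._block_dev

This way, creating the VM objects during recovery no longer requires the volume paths to be resolvable at construction time.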

Comment 9 Federico Simoncelli 2012-05-16 18:24:26 UTC
*** Bug 819460 has been marked as a duplicate of this bug. ***

Comment 11 Chao Yang 2012-06-05 06:03:48 UTC
I am changing this bug to ASSIGNED, since I hit this issue again on vdsm-4.9.6-13.el6.x86_64. If this is a mistake, please move it back.

A slice of the vdsm log; I will upload the whole vdsm log soon:
Thread-231::DEBUG::2012-06-05 13:25:14,627::BindingXMLRPC::864::vds::(wrapper) client [10.66.11.208]::call vmCont with ('fdabf547-909a-4242-b957-be6d60bf1ca7',) {}
Thread-231::ERROR::2012-06-05 13:25:14,629::BindingXMLRPC::879::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 869, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 210, in vmCont
    return vm.cont()
  File "/usr/share/vdsm/API.py", line 121, in cont
    return v.cont()
  File "/usr/share/vdsm/vm.py", line 823, in cont
    self._underlyingCont()
  File "/usr/share/vdsm/libvirtvm.py", line 1581, in _underlyingCont
    hooks.before_vm_cont(self._dom.XMLDesc(0), self.conf)
AttributeError: 'NoneType' object has no attribute 'XMLDesc'

# vdsClient -s 0 list table
5fd21790-5d3b-4be3-accd-d05cec96db86  15816  win7-16-6-1          Up                   10.66.108.78 10.66.109.239
404eacb0-ac9d-40a8-80ff-ebaecccc97bd   6919  win2k8r2-16-6-2      Up                   10.66.110.78 10.66.107.81
39a88c9e-35a8-4fde-b1fa-d5a0e6534319   7240  rhel6u3-16-6-1       Up                   10.66.106.15 10.66.109.250
bf8c3501-94e4-40b3-93a9-b7d4a23b7ccc   2331  rhel5u8-16-6-2       Up                   10.66.108.253 10.66.110.249
68b2308d-df40-475f-94d2-c396d0857233  13498  win2k8r2-16-6-1      Up                   10.66.107.227 10.66.105.3
fdabf547-909a-4242-b957-be6d60bf1ca7  12617  win7-16-6-2          Paused                                   
cb9c3a4c-f1e5-462a-ba4a-cf1973c00485   5957  rhel6u2-16-6-1       Up                   10.66.108.255 10.66.107.254
91192d02-5ecf-43ec-b5d3-6b616e6fc513   2835  rhel5u8-16-6-1       Up                   10.66.110.252       
9856096f-d7fc-4285-b7f8-a924ab04201e   3393  rhel6u2-16-6-2       Up                   10.66.110.250 10.66.110.248
10b0f8ec-24ad-45c5-b72e-5f549d202337   5415  rhel6u3-16-6-2       Up                   10.66.109.252 10.66.106.244
# vdsClient -s 0 continue fdabf547-909a-4242-b957-be6d60bf1ca7
Unexpected exception

Comment 12 Chao Yang 2012-06-05 06:05:27 UTC
Created attachment 589388 [details]
unexpected exception (Error code:16) on vdsm-13

# rpm -q vdsm libvirt qemu-kvm-rhev
vdsm-4.9.6-13.el6.x86_64
libvirt-0.9.10-21.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.294.el6.x86_64

Comment 13 Dan Kenigsberg 2012-06-05 09:12:03 UTC
Moving to rhel-6.3.z, as this is a rhev-3.1 blocker, not a rhel-6.3.0 one.

Comment 14 Federico Simoncelli 2012-06-05 11:15:43 UTC
The VM object initialization was stopped by a different exception; see the traceback at the end of this comment.
(In reply to comment #11)
> I am changing this bug to ASSIGNED, since I hit this issue again on
> vdsm-4.9.6-13.el6.x86_64. If any mistake, please move it back. 
> 
> A slice of vdsm log, will upload the whole vdsm log soon:
> Thread-231::DEBUG::2012-06-05
> 13:25:14,627::BindingXMLRPC::864::vds::(wrapper) client [10.66.11.208]::call
> vmCont with ('fdabf547-909a-4242-b957-be6d60bf1ca7',) {}
> Thread-231::ERROR::2012-06-05
> 13:25:14,629::BindingXMLRPC::879::vds::(wrapper) unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/BindingXMLRPC.py", line 869, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/BindingXMLRPC.py", line 210, in vmCont
>     return vm.cont()
>   File "/usr/share/vdsm/API.py", line 121, in cont
>     return v.cont()
>   File "/usr/share/vdsm/vm.py", line 823, in cont
>     self._underlyingCont()
>   File "/usr/share/vdsm/libvirtvm.py", line 1581, in _underlyingCont
>     hooks.before_vm_cont(self._dom.XMLDesc(0), self.conf)
> AttributeError: 'NoneType' object has no attribute 'XMLDesc'

The traceback and the logs you attached don't contain anything related to comment #0. In fact the issue here is generated by something different:

Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 275, in _recoverExistingVms
    vmObj.preparePaths(vmObj.getConfDevices()[vm.DISK_DEVICES])
  File "/usr/share/vdsm/vm.py", line 402, in getConfDevices
    self.normalizeDrivesIndices(devices[DISK_DEVICES])
  File "/usr/share/vdsm/vm.py", line 540, in normalizeDrivesIndices
    if not self._usedIndices.has_key(drv['iface']):
KeyError: 'iface'
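
A minimal illustration (made-up dictionaries, not the real VM configuration) of why this is a separate problem: the recovered drive configuration has no 'iface' key, so the lookup itself raises KeyError before any storage access happens:

used_indices = {'virtio': [], 'ide': []}

drive_ok = {'iface': 'virtio', 'index': 0}
drive_broken = {'index': 0}          # recovered config missing the 'iface' key

for drv in (drive_ok, drive_broken):
    try:
        iface = drv['iface']                  # this lookup is what raises KeyError
        used_indices.setdefault(iface, [])
        print("drive indexed under", iface)
    except KeyError as exc:
        print("normalizeDrivesIndices would fail here: missing key", exc)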

Comment 15 Chao Yang 2012-06-05 11:26:28 UTC
(In reply to comment #14)
> The VM object initialization was stopped by such 
> (In reply to comment #11)
> > I am changing this bug to ASSIGNED, since I hit this issue again on
> > vdsm-4.9.6-13.el6.x86_64. If any mistake, please move it back. 
> > 
> > A slice of vdsm log, will upload the whole vdsm log soon:
> > Thread-231::DEBUG::2012-06-05
> > 13:25:14,627::BindingXMLRPC::864::vds::(wrapper) client [10.66.11.208]::call
> > vmCont with ('fdabf547-909a-4242-b957-be6d60bf1ca7',) {}
> > Thread-231::ERROR::2012-06-05
> > 13:25:14,629::BindingXMLRPC::879::vds::(wrapper) unexpected error
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/BindingXMLRPC.py", line 869, in wrapper
> >     res = f(*args, **kwargs)
> >   File "/usr/share/vdsm/BindingXMLRPC.py", line 210, in vmCont
> >     return vm.cont()
> >   File "/usr/share/vdsm/API.py", line 121, in cont
> >     return v.cont()
> >   File "/usr/share/vdsm/vm.py", line 823, in cont
> >     self._underlyingCont()
> >   File "/usr/share/vdsm/libvirtvm.py", line 1581, in _underlyingCont
> >     hooks.before_vm_cont(self._dom.XMLDesc(0), self.conf)
> > AttributeError: 'NoneType' object has no attribute 'XMLDesc'
> 
> The traceback and the logs you attached don't contain anything related to
> comment #0. In fact the issue here is generated by something different:
> 
> Traceback (most recent call last):
>   File "/usr/share/vdsm/clientIF.py", line 275, in _recoverExistingVms
>     vmObj.preparePaths(vmObj.getConfDevices()[vm.DISK_DEVICES])
>   File "/usr/share/vdsm/vm.py", line 402, in getConfDevices
>     self.normalizeDrivesIndices(devices[DISK_DEVICES])
>   File "/usr/share/vdsm/vm.py", line 540, in normalizeDrivesIndices
>     if not self._usedIndices.has_key(drv['iface']):
> KeyError: 'iface'

So, I need to open another bz?

Comment 16 Federico Simoncelli 2012-06-06 13:15:56 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > The traceback and the logs you attached don't contain anything related to
> > comment #0. In fact the issue here is generated by something different:
> > 
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/clientIF.py", line 275, in _recoverExistingVms
> >     vmObj.preparePaths(vmObj.getConfDevices()[vm.DISK_DEVICES])
> >   File "/usr/share/vdsm/vm.py", line 402, in getConfDevices
> >     self.normalizeDrivesIndices(devices[DISK_DEVICES])
> >   File "/usr/share/vdsm/vm.py", line 540, in normalizeDrivesIndices
> >     if not self._usedIndices.has_key(drv['iface']):
> > KeyError: 'iface'
> 
> So, I need to open another bz?

Yes please.

Comment 17 Chao Yang 2012-06-07 03:30:18 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > The traceback and the logs you attached don't contain anything related to
> > > comment #0. In fact the issue here is generated by something different:
> > > 
> > > Traceback (most recent call last):
> > >   File "/usr/share/vdsm/clientIF.py", line 275, in _recoverExistingVms
> > >     vmObj.preparePaths(vmObj.getConfDevices()[vm.DISK_DEVICES])
> > >   File "/usr/share/vdsm/vm.py", line 402, in getConfDevices
> > >     self.normalizeDrivesIndices(devices[DISK_DEVICES])
> > >   File "/usr/share/vdsm/vm.py", line 540, in normalizeDrivesIndices
> > >     if not self._usedIndices.has_key(drv['iface']):
> > > KeyError: 'iface'
> > 
> > So, I need to open another bz?
> 
> Yes please.

New bz entry opened at: https://bugzilla.redhat.com/show_bug.cgi?id=829566

Comment 22 Dafna Ron 2012-11-27 12:34:16 UTC
Verified on si24.4 with vdsm-4.9.6-44.0.el6_3.x86_64.

Comment 24 errata-xmlrpc 2012-12-04 18:53:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html

