Bug 837003 - 3.1.z - VM suspend leads to paused state instead of suspended
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Michal Skrivanek
QA Contact: Oded Ramraz
Whiteboard: virt
Keywords: Regression, Reopened, TestBlocker
Depends On: 826575
Blocks:
Reported: 2012-07-02 09:39 EDT by Oded Ramraz
Modified: 2014-07-01 08:00 EDT
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 826575
Environment:
Last Closed: 2013-01-06 12:11:39 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
VDSM log (136.49 KB, text/plain)
2012-10-15 11:19 EDT, Pavel Novotny

Description Oded Ramraz 2012-07-02 09:39:01 EDT
+++ This bug was initially created as a clone of Bug #826575 +++

Created attachment 587745 [details]
full log & testcase

Description of problem:
When attempting to suspend a VM, this error appears (and the machine ends up in the Paused state):

Thread-1245::ERROR::2012-05-29 16:34:55,565::vm::176::vm.Vm::(_recover) vmId=`7d1605f4-7d0b-496c-af52-74e63934d7d5`::unable to close /rhev/data-center/3c3bb676-a5a7-4f9c-8e2a-3ff2cc505285/3e91a341-f773-44b3-89f2-ce2e97672b6f/images/f8ecb0c2-9218-40fb-a31d-27e8dc93a3f8/f4619812-e573-495f-a1f8-1ef05934ad2e: Input/output error
Thread-1245::ERROR::2012-05-29 16:34:55,565::vm::821::vm.Vm::(cont) vmId=`7d1605f4-7d0b-496c-af52-74e63934d7d5`::cannot cont while Saving State
Thread-1245::ERROR::2012-05-29 16:34:55,576::vm::240::vm.Vm::(run) vmId=`7d1605f4-7d0b-496c-af52-74e63934d7d5`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 223, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/libvirtvm.py", line 409, in _startUnderlyingMigration
    self._vm._dom.save(fname)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1215, in save
    if ret == -1: raise libvirtError ('virDomainSave() failed', dom=self)
libvirtError: unable to close /rhev/data-center/3c3bb676-a5a7-4f9c-8e2a-3ff2cc505285/3e91a341-f773-44b3-89f2-ce2e97672b6f/images/f8ecb0c2-9218-40fb-a31d-27e8dc93a3f8/f4619812-e573-495f-a1f8-1ef05934ad2e: Input/output error
Thread-1247::DEBUG::2012-05-29 16:34:55,662::BindingXMLRPC::864::vds::(wrapper) client [10.35.16.225]::call vmGetStats with ('7d1605f4-7d0b-496c-af52-74e63934d7d5',) {}


Additional info:
The full log and test case are attached.

--- Additional comment from danken@redhat.com on 2012-05-30 10:33:05 EDT ---

vdsm version? Is it an NFS or block domain?
Since you pinpoint a libvirtError, could you attach the relevant part of libvirtd.log and mention the libvirt version?

--- Additional comment from pdufek@redhat.com on 2012-06-13 08:19:13 EDT ---

- NFS storage was used

software versions:
------------------
vdsm-4.9.6-16.0.el6.x86_64
libvirt-0.9.10-21.el6.x86_64

from vdsm.log:
--------------
Thread-1319::ERROR::2012-06-13 14:52:36,483::vm::176::vm.Vm::(_recover) vmId=`73594152-cec5-4115-9357-a1b805945e04`::unable to close /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c: Input/output error
Thread-1319::ERROR::2012-06-13 14:52:36,483::vm::821::vm.Vm::(cont) vmId=`73594152-cec5-4115-9357-a1b805945e04`::cannot cont while Saving State
Thread-1319::ERROR::2012-06-13 14:52:36,589::vm::240::vm.Vm::(run) vmId=`73594152-cec5-4115-9357-a1b805945e04`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 223, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/libvirtvm.py", line 409, in _startUnderlyingMigration
    self._vm._dom.save(fname)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1215, in save
    if ret == -1: raise libvirtError ('virDomainSave() failed', dom=self)
libvirtError: unable to close /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c: Input/output error
Thread-1322::DEBUG::2012-06-13 14:52:37,530::BindingXMLRPC::864::vds::(wrapper) client [10.35.16.189]::call vmGetStats with ('73594152-cec5-4115-9357-a1b805945e04',) {}
Thread-1322::DEBUG::2012-06-13 14:52:37,530::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`73594152-cec5-4115-9357-a1b805945e04`::Disk hdc stats not available

from libvirtd.log:
------------------
2012-06-13 11:52:36.331+0000: 2215: debug : qemuMonitorUnref:210 : QEMU_MONITOR_UNREF: mon=0x7f821800b800 refs=3
2012-06-13 11:52:36.331+0000: 2225: debug : qemuMonitorUnref:210 : QEMU_MONITOR_UNREF: mon=0x7f821800b800 refs=2
2012-06-13 11:52:36.420+0000: 2225: debug : virStorageFileIsSharedFSType:1031 : Check if path /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c with FS magic 26985 is shared
2012-06-13 11:52:36.421+0000: 2225: error : qemuDomainSaveInternal:2722 : unable to close /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c: Input/output error
2012-06-13 11:52:36.421+0000: 2225: debug : qemuDomainObjEndAsyncJob:887 : Stopping async job: save
2012-06-13 11:52:36.465+0000: 2225: debug : virDomainFree:2313 : dom=0x7f821800b060, (VM: name=restvm_basic_operations, uuid=73594152-cec5-4115-9357-a1b805945e04)
2012-06-13 11:52:36.487+0000: 2224: debug : virDomainGetXMLDesc:4379 : dom=0x7f822400e880, (VM: name=restvm_basic_operations, uuid=73594152-cec5-4115-9357-a1b805945e04), flags=0


New vdsm.log and the corresponding libvirtd.log are attached.

--- Additional comment from pdufek@redhat.com on 2012-06-13 08:20:01 EDT ---

Created attachment 591466 [details]
vdsm & libvirtd logs
Comment 3 Barak 2012-07-19 14:12:24 EDT
The above happened due to an I/O error:
"""
libvirtError: unable to close /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c: Input/output error
"""

so QEMU pauses the VM.

Can it be reproduced often? On a different SD (storage domain)?
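
For reference, when QEMU pauses a guest on a disk I/O error, the state and reason can be confirmed with virsh (using the VM name from the logs above):

  # Query the domain state together with the reason it entered that state;
  # a guest paused by QEMU on a disk I/O error reports "paused (ioerror)".
  $ virsh domstate --reason restvm_basic_operations
  paused (ioerror)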
Comment 4 Pavel Novotny 2012-10-15 11:16:18 EDT
Reopening.
The issue still persists.


Version:

vdsm-4.9.6-37.0.el6_3.x86_64
libvirt-0.9.10-21.el6_3.4.x86_64


Reproducer:

1. In webadmin, create new VM in NFS data center.
2. Add a disk to the VM (1 GB, thin provisioned, bootable; no guest OS needed) and update the VM to boot from it.
3. Start the VM. 
4. When the VM is up, suspend it.
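
For completeness, the suspend in step 4 can also be triggered through the REST API instead of webadmin; a minimal sketch, where the engine URL, credentials, and VM id are all hypothetical placeholders, not values from this bug:

  # POST an empty <action/> body to the VM's suspend action.
  $ curl -k -u 'admin@internal:PASSWORD' \
      -H 'Content-Type: application/xml' \
      -d '<action/>' \
      'https://rhevm.example.com/api/vms/VM_ID/suspend'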


Results: 

The VM is Paused instead of Suspended and is stuck in the Paused state. Shutting down or stopping the VM does not work from this point.


Expected results:

VM should be suspended.


Additional info:

VDSM log excerpt (see full log in attachment):

Thread-689::DEBUG::2012-10-15 14:15:09,982::BindingXMLRPC::880::vds::(wrapper) return vmHibernate with {'status': {'message': 'Hibernation process starting', 'code': 0}}
...
...
Thread-690::ERROR::2012-10-15 14:15:11,410::task::853::TaskManager.Task::(_setError) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2814, in teardownImage
    img.teardown(sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/image.py", line 354, in teardown
    chain = self.getChain(sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/image.py", line 286, in getChain
    uuidlist = volclass.getImageVolumes(self.repoPath, sdUUID, imgUUID)
  File "/usr/share/vdsm/storage/fileVolume.py", line 387, in getImageVolumes
    produceVolume(imgUUID, volid).
  File "/usr/share/vdsm/storage/fileSD.py", line 174, in produceVolume
    return fileVolume.FileVolume(repoPath, self.sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/fileVolume.py", line 71, in __init__
    volume.Volume.__init__(self, repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 127, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/volume.py", line 140, in validate
    self.validateVolumePath()
  File "/usr/share/vdsm/storage/fileVolume.py", line 557, in validateVolumePath
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: ('f6ee3a1c-e0e3-470c-a175-63c17eafb0bd',)
Thread-690::DEBUG::2012-10-15 14:15:11,430::task::872::TaskManager.Task::(_run) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::Task._run: e4d8ba80-4109-4e5f-b553-99a197cfce75 ('3774dc69-8ba3-456e-b584-4c08e3ba3826', 'e42f70cc-802e-46be-b931-867450cc6282', 'ffc59ea3-64f7-4fbf-ae9a-889a6efbf331') {} failed - stopping task
Thread-690::DEBUG::2012-10-15 14:15:11,434::task::1199::TaskManager.Task::(stop) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::stopping in state preparing (force False)
Thread-690::DEBUG::2012-10-15 14:15:11,439::task::978::TaskManager.Task::(_decref) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::ref 1 aborting True
Thread-690::INFO::2012-10-15 14:15:11,443::task::1157::TaskManager.Task::(prepare) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::aborting: Task is aborted: 'Volume does not exist' - code 201
Thread-690::DEBUG::2012-10-15 14:15:11,448::task::1162::TaskManager.Task::(prepare) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::Prepare: aborted: Volume does not exist
Comment 5 Pavel Novotny 2012-10-15 11:19:32 EDT
Created attachment 627512 [details]
VDSM log

VDSM log starting at the moment of VM suspension.
Comment 6 Michal Skrivanek 2012-10-15 11:23:52 EDT
Currently debugging with pdufek. Looks like either a libvirtd bug or a misconfiguration (or missing configuration) by vdsm.
Comment 7 Michal Skrivanek 2012-10-22 10:08:36 EDT
Verified it works correctly when /etc/libvirt/qemu.conf is changed to use vdsm:kvm instead of root:root.
The NFS share is not writable by "root", only by the "vdsm" user.
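
For reference, the workaround above corresponds to the following settings in /etc/libvirt/qemu.conf:

  # /etc/libvirt/qemu.conf
  # Run QEMU processes as vdsm:kvm instead of the default root:root, so the
  # saved-state file on the root-squashed NFS export stays writable.
  user = "vdsm"
  group = "kvm"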
Comment 10 Michal Skrivanek 2012-12-11 13:23:21 EST
After investigation, the affected VMs were on storage provided by TLV engops, a non-Linux-based NFS server with root_squash enabled. I suspect a bug/incompatibility in their NFSv4 implementation.
Switching back to NFSv3 or disabling root_squash likely solves the issue.
Would you please retry with those settings and/or on a different storage server?
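
For a Linux-based NFS server, the two suggested workarounds would look roughly like this (export path and mount point are illustrative, not taken from this setup):

  # /etc/exports -- disable root squashing on the data-domain export:
  /export/data *(rw,sync,no_root_squash)

  # ...or keep root_squash and force the client back to NFSv3 instead
  # (the equivalent of the storage domain's NFS version setting):
  mount -t nfs -o nfsvers=3 nfsserver:/export/data /mnt/test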

I'd also suggest removing the Regression keyword.
Comment 11 Michal Skrivanek 2013-01-06 12:11:39 EST
No response and there is nothing to fix here, hence closing. Feel free to reopen.
