Bug 837003

Summary: 3.1.z - VM suspend leads to paused state instead of suspended
Product: Red Hat Enterprise Linux 6
Component: vdsm
Version: 6.3
Hardware: x86_64
OS: Linux
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Target Milestone: rc
Keywords: Regression, Reopened, TestBlocker
Whiteboard: virt
Reporter: Oded Ramraz <oramraz>
Assignee: Michal Skrivanek <michal.skrivanek>
QA Contact: Oded Ramraz <oramraz>
CC: abaron, acathrow, bazulay, bugzilla-qe-tlv, dyasny, eedri, iheim, mgoldboi, oramraz, pnovotny, sgrinber, ykaul
Doc Type: Bug Fix
Type: Bug
Clone Of: 826575
Bug Depends On: 826575
Last Closed: 2013-01-06 17:11:39 UTC

Attachments:
VDSM log

Description Oded Ramraz 2012-07-02 13:39:01 UTC
+++ This bug was initially created as a clone of Bug #826575 +++

Created attachment 587745 [details]
full log & testcase

Description of problem:
When attempting to suspend a VM, this error appears (and the machine ends up in paused state):

Thread-1245::ERROR::2012-05-29 16:34:55,565::vm::176::vm.Vm::(_recover) vmId=`7d1605f4-7d0b-496c-af52-74e63934d7d5`::unable to close /rhev/data-center/3c3bb676-a5a7-4f9c-8e2a-3ff2cc505285/3e91a341-f773-44b3-89f2-ce2e97672b6f/images/f8ecb0c2-9218-40fb-a31d-27e8dc93a3f8/f4619812-e573-495f-a1f8-1ef05934ad2e: Input/output error
Thread-1245::ERROR::2012-05-29 16:34:55,565::vm::821::vm.Vm::(cont) vmId=`7d1605f4-7d0b-496c-af52-74e63934d7d5`::cannot cont while Saving State
Thread-1245::ERROR::2012-05-29 16:34:55,576::vm::240::vm.Vm::(run) vmId=`7d1605f4-7d0b-496c-af52-74e63934d7d5`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 223, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/libvirtvm.py", line 409, in _startUnderlyingMigration
    self._vm._dom.save(fname)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1215, in save
    if ret == -1: raise libvirtError ('virDomainSave() failed', dom=self)
libvirtError: unable to close /rhev/data-center/3c3bb676-a5a7-4f9c-8e2a-3ff2cc505285/3e91a341-f773-44b3-89f2-ce2e97672b6f/images/f8ecb0c2-9218-40fb-a31d-27e8dc93a3f8/f4619812-e573-495f-a1f8-1ef05934ad2e: Input/output error
Thread-1247::DEBUG::2012-05-29 16:34:55,662::BindingXMLRPC::864::vds::(wrapper) client [10.35.16.225]::call vmGetStats with ('7d1605f4-7d0b-496c-af52-74e63934d7d5',) {}


Additional info:
The full log and test case are attached.

--- Additional comment from danken on 2012-05-30 10:33:05 EDT ---

Which vdsm version? Is this an NFS or a block domain?
Since you pinpoint a libvirtError, could you attach the relevant part of libvirtd.log and mention the libvirt version?

--- Additional comment from pdufek on 2012-06-13 08:19:13 EDT ---

- NFS storage was used

software versions:
------------------
vdsm-4.9.6-16.0.el6.x86_64
libvirt-0.9.10-21.el6.x86_64

from vdsm.log:
--------------
Thread-1319::ERROR::2012-06-13 14:52:36,483::vm::176::vm.Vm::(_recover) vmId=`73594152-cec5-4115-9357-a1b805945e04`::unable to close /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c: Input/output error
Thread-1319::ERROR::2012-06-13 14:52:36,483::vm::821::vm.Vm::(cont) vmId=`73594152-cec5-4115-9357-a1b805945e04`::cannot cont while Saving State
Thread-1319::ERROR::2012-06-13 14:52:36,589::vm::240::vm.Vm::(run) vmId=`73594152-cec5-4115-9357-a1b805945e04`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 223, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/libvirtvm.py", line 409, in _startUnderlyingMigration
    self._vm._dom.save(fname)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1215, in save
    if ret == -1: raise libvirtError ('virDomainSave() failed', dom=self)
libvirtError: unable to close /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c: Input/output error
Thread-1322::DEBUG::2012-06-13 14:52:37,530::BindingXMLRPC::864::vds::(wrapper) client [10.35.16.189]::call vmGetStats with ('73594152-cec5-4115-9357-a1b805945e04',) {}
Thread-1322::DEBUG::2012-06-13 14:52:37,530::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`73594152-cec5-4115-9357-a1b805945e04`::Disk hdc stats not available

from libvirtd.log:
------------------
2012-06-13 11:52:36.331+0000: 2215: debug : qemuMonitorUnref:210 : QEMU_MONITOR_UNREF: mon=0x7f821800b800 refs=3
2012-06-13 11:52:36.331+0000: 2225: debug : qemuMonitorUnref:210 : QEMU_MONITOR_UNREF: mon=0x7f821800b800 refs=2
2012-06-13 11:52:36.420+0000: 2225: debug : virStorageFileIsSharedFSType:1031 : Check if path /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c with FS magic 26985 is shared
2012-06-13 11:52:36.421+0000: 2225: error : qemuDomainSaveInternal:2722 : unable to close /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c: Input/output error
2012-06-13 11:52:36.421+0000: 2225: debug : qemuDomainObjEndAsyncJob:887 : Stopping async job: save
2012-06-13 11:52:36.465+0000: 2225: debug : virDomainFree:2313 : dom=0x7f821800b060, (VM: name=restvm_basic_operations, uuid=73594152-cec5-4115-9357-a1b805945e04)
2012-06-13 11:52:36.487+0000: 2224: debug : virDomainGetXMLDesc:4379 : dom=0x7f822400e880, (VM: name=restvm_basic_operations, uuid=73594152-cec5-4115-9357-a1b805945e04), flags=0


New vdsm.log and the corresponding libvirtd.log are attached.

--- Additional comment from pdufek on 2012-06-13 08:20:01 EDT ---

Created attachment 591466 [details]
vdsm & libvirtd logs

Comment 3 Barak 2012-07-19 18:12:24 UTC
The above happened due to an I/O error:
"""
libvirtError: unable to close /rhev/data-center/61285a30-7edd-467a-8596-ce62bb4f3d44/602a3739-d688-4471-ae17-47007e18154b/images/6a21b667-da7e-4bdc-a8fd-e35dfc6bc9bf/3909323e-ecd3-4bd2-97ee-7314f460d49c: Input/output error
"""

so qemu pauses the VM.
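
The paused-with-IOERROR state can be observed directly through the libvirt Python bindings (the same API vdsm uses). A minimal sketch, assuming a hypothetical save path on the NFS domain:

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByUUIDString('73594152-cec5-4115-9357-a1b805945e04')
try:
    # hypothetical target file on the NFS storage domain
    dom.save('/rhev/data-center/mnt/example/memory.save')
except libvirt.libvirtError, e:
    # after the failed virDomainSave(), qemu leaves the guest paused
    state, reason = dom.state()
    if (state == libvirt.VIR_DOMAIN_PAUSED and
            reason == libvirt.VIR_DOMAIN_PAUSED_IOERROR):
        print 'VM left paused after I/O error: %s' % e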

Can it be reproduced consistently? On a different storage domain (SD)?

Comment 4 Pavel Novotny 2012-10-15 15:16:18 UTC
Reopening.
The issue still persists.


Version:

vdsm-4.9.6-37.0.el6_3.x86_64
libvirt-0.9.10-21.el6_3.4.x86_64


Reproducer:

1. In webadmin, create a new VM in an NFS data center.
2. Add a disk to the VM (1 GB, thin provision, bootable; no guest OS needed) and update the VM to boot from it.
3. Start the VM.
4. When the VM is up, suspend it (a scripted equivalent is sketched below).
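
The same suspend can also be triggered outside webadmin through the REST API. A minimal sketch in Python 2, with hypothetical engine host, credentials and VM id:

import base64, urllib2

# POST an empty action to the (hypothetical) VM's suspend endpoint
url = 'https://engine.example.com/api/vms/VM_ID/suspend'
req = urllib2.Request(url, data='<action/>')
req.add_header('Content-Type', 'application/xml')
req.add_header('Authorization',
               'Basic ' + base64.b64encode('admin@internal:PASSWORD'))
urllib2.urlopen(req)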


Results: 

The VM is Paused instead of Suspended and is stuck in the paused state. Shutting down or stopping the VM does not work from this point.


Expected results:

The VM should be suspended.


Additional info:

VDSM log excerpt (see full log in attachment):

Thread-689::DEBUG::2012-10-15 14:15:09,982::BindingXMLRPC::880::vds::(wrapper) return vmHibernate with {'status': {'message': 'Hibernation process starting', 'code': 0}}
...
...
Thread-690::ERROR::2012-10-15 14:15:11,410::task::853::TaskManager.Task::(_setError) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2814, in teardownImage
    img.teardown(sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/image.py", line 354, in teardown
    chain = self.getChain(sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/image.py", line 286, in getChain
    uuidlist = volclass.getImageVolumes(self.repoPath, sdUUID, imgUUID)
  File "/usr/share/vdsm/storage/fileVolume.py", line 387, in getImageVolumes
    produceVolume(imgUUID, volid).
  File "/usr/share/vdsm/storage/fileSD.py", line 174, in produceVolume
    return fileVolume.FileVolume(repoPath, self.sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/fileVolume.py", line 71, in __init__
    volume.Volume.__init__(self, repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 127, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/volume.py", line 140, in validate
    self.validateVolumePath()
  File "/usr/share/vdsm/storage/fileVolume.py", line 557, in validateVolumePath
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: ('f6ee3a1c-e0e3-470c-a175-63c17eafb0bd',)
Thread-690::DEBUG::2012-10-15 14:15:11,430::task::872::TaskManager.Task::(_run) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::Task._run: e4d8ba80-4109-4e5f-b553-99a197cfce75 ('3774dc69-8ba3-456e-b584-4c08e3ba3826', 'e42f70cc-802e-46be-b931-867450cc6282', 'ffc59ea3-64f7-4fbf-ae9a-889a6efbf331') {} failed - stopping task
Thread-690::DEBUG::2012-10-15 14:15:11,434::task::1199::TaskManager.Task::(stop) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::stopping in state preparing (force False)
Thread-690::DEBUG::2012-10-15 14:15:11,439::task::978::TaskManager.Task::(_decref) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::ref 1 aborting True
Thread-690::INFO::2012-10-15 14:15:11,443::task::1157::TaskManager.Task::(prepare) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::aborting: Task is aborted: 'Volume does not exist' - code 201
Thread-690::DEBUG::2012-10-15 14:15:11,448::task::1162::TaskManager.Task::(prepare) Task=`e4d8ba80-4109-4e5f-b553-99a197cfce75`::Prepare: aborted: Volume does not exist

Comment 5 Pavel Novotny 2012-10-15 15:19:32 UTC
Created attachment 627512 [details]
VDSM log

VDSM log starting at the moment of VM suspension.

Comment 6 Michal Skrivanek 2012-10-15 15:23:52 UTC
Currently debugging with pdufek. Looks like either a libvirtd bug or missing configuration on the vdsm side.

Comment 7 Michal Skrivanek 2012-10-22 14:08:36 UTC
Verified that it works correctly when /etc/libvirt/qemu.conf is changed to use vdsm:kvm instead of root:root.
The NFS share is not writable by "root", only by the "vdsm" user.
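
For reference, the relevant settings are the user and group that qemu processes run as. A minimal sketch of the working change, assuming an otherwise default file:

# /etc/libvirt/qemu.conf (excerpt)
# run qemu as vdsm:kvm instead of the root:root default, so the saved
# memory image on the root-squashed NFS export remains writable
user = "vdsm"
group = "kvm"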

Comment 10 Michal Skrivanek 2012-12-11 18:23:21 UTC
After investigation, the affected VMs were on storage provided by TLV engops, a non-Linux NFS server with root_squash enabled. I suspect a bug or incompatibility in their NFSv4 implementation.
Switching back to NFSv3 or disabling root_squash likely resolves the issue.
Would you please retry with those settings and/or on a different storage server?
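
For comparison, on a Linux NFS server the suggested workaround would map to an export option like this (illustrative only; the affected server is not Linux-based):

# /etc/exports (illustrative)
# no_root_squash lets libvirtd, running as root, write the memory image
/export/rhev  *(rw,sync,no_root_squash)

Forcing NFSv3 instead would be done on the client side (a vers=3 mount option on the host).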

I'd also suggest removing the Regression keyword.

Comment 11 Michal Skrivanek 2013-01-06 17:11:39 UTC
No response; there's nothing to fix here, hence closing. Feel free to reopen.