
Bug 732031

Summary: VM moved to Paused while hibernating multiple VMs
Product: Red Hat Enterprise Linux 6
Reporter: Jaroslav Henner <jhenner>
Component: vdsm
Assignee: Eduardo Warszawski <ewarszaw>
Status: CLOSED WORKSFORME
QA Contact: yeylon <yeylon>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.1
CC: abaron, bazulay, danken, iheim, srevivo, ykaul
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-09-12 12:53:04 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 733920
Bug Blocks:
Attachments: logs (none), events log (none)

Description Jaroslav Henner 2011-08-19 14:28:05 UTC
Created attachment 519048 [details]
logs

Description of problem:
When suspending four VMs, I ran into a situation where the VM Migrations_pool-0 became Paused.

In the vdsm.log:
Thread-481::DEBUG::2011-08-19 15:10:30,224::vm::739::vm.Vm::(setDownStatus) vmId=`29ed44b1-02cd-4c14-bbbc-e32926855202`::Changed state to Down: SaveState succeeded
Thread-481::DEBUG::2011-08-19 15:10:30,224::utils::314::vm.Vm::(stop) vmId=`29ed44b1-02cd-4c14-bbbc-e32926855202`::Stop statistics collection
Thread-221::ERROR::2011-08-19 15:10:30,227::utils::369::vm.Vm::(collect) vmId=`29ed44b1-02cd-4c14-bbbc-e32926855202`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x26ac0a0>
Traceback (most recent call last):
  File "/usr/share/vdsm/utils.py", line 366, in collect
    statsFunction()
  File "/usr/share/vdsm/utils.py", line 250, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 65, in _highWrite
    self._vm._dom.blockInfo(vmDrive.path, 0)
  File "/usr/share/vdsm/libvirtvm.py", line 327, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 63, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1284, in blockInfo
    if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '29ed44b1-02cd-4c14-bbbc-e32926855202'
Thread-221::DEBUG::2011-08-19 15:10:30,228::utils::345::vm.Vm::(run) vmId=`29ed44b1-02cd-4c14-bbbc-e32926855202`::Stats thread finished
Thread-524::DEBUG::2011-08-19 15:10:30,364::clientIF::54::vds::(wrapper) [10.34.60.83]::call getVmStats with ('29ed44b1-02cd-4c14-bbbc-e32926855202',) {}
Thread-524::DEBUG::2011-08-19 15:10:30,364::clientIF::59::vds::(wrapper) return getVmStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'timeOffset': '0', 'vmId': '29ed44b1-02cd-4c14-bbbc-e32926855202', 'exitMessage': 'SaveState succeeded', 'exitCode': 0}]}


Version-Release number of selected component (if applicable):
vdsm-4.9-91.1.el6.x86_64
libvirt-0.9.4-0rc1.2.el6.x86_64
qemu-kvm-0.12.1.2-2.182.el6.x86_64
rhevm ic137

How reproducible:
Unknown

Steps to Reproduce:
Unknown
  
Actual results:
VM paused

Expected results:
VM suspends and then resumes.

Additional info:

Comment 1 Jaroslav Henner 2011-08-19 14:28:49 UTC
Created attachment 519049 [details]
events log

Comment 3 Jaroslav Henner 2011-08-19 14:34:53 UTC
I failed to mention that the following appears in the other host's vdsm.log.

Thread-99::DEBUG::2011-08-19 14:57:32,223::clientIF::225::Storage.Dispatcher.Protect::(wrapper) [10.34.60.83]
Thread-99::INFO::2011-08-19 14:57:32,223::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: getSpmStatus, args: ( spUUID=8791b66f-b117-4cb5-a5fd-3fdc81d6eb39)
Thread-99::DEBUG::2011-08-19 14:57:32,223::task::492::TaskManager.Task::(_debug) Task 440e7086-9e3b-4a37-92b7-50165eda691e: moving from state init -> state preparing
Thread-99::ERROR::2011-08-19 14:57:32,224::task::865::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/spm.py", line 708, in public_getSpmStatus
    pool = hsm.HSM.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 125, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('8791b66f-b117-4cb5-a5fd-3fdc81d6eb39',)
Thread-99::DEBUG::2011-08-19 14:57:32,225::task::492::TaskManager.Task::(_debug) Task 440e7086-9e3b-4a37-92b7-50165eda691e: Task._run: 440e7086-9e3b-4a37-92b7-50165eda691e ('8791b66f-b117-4cb5-a5fd-3fdc81d6eb39',) {} failed - stopping task
Thread-99::DEBUG::2011-08-19 14:57:32,225::task::492::TaskManager.Task::(_debug) Task 440e7086-9e3b-4a37-92b7-50165eda691e: stopping in state preparing (force False)

Comment 4 Dan Kenigsberg 2011-08-21 07:59:42 UTC
I guess this is nothing more than log noise, but Edu, please take a better look.

Comment 5 Eduardo Warszawski 2011-08-23 12:13:13 UTC
getSpmStatus on the recently created pool fails because the host is not connected to that pool yet.

createStoragePool does not imply that the host is connected to the pool.

It can be observed that the next getSpmStatus, issued after connectStoragePool, succeeds.



Thread-82::INFO::2011-08-19 14:57:08,528::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: createStoragePool, args: ( poolType=3 spUUID=8791b66f-b117-4cb5-a5fd-3fdc81d6eb39 poolName=Migrations_DC masterDom=678a45a4-b5f8-418b-b541-19a6df9c78eb domList=['678a45a4-b5f8-418b-b541-19a6df9c78eb'] masterVersion=1 lockPolicy=ON lockRenewalIntervalSec=5 leaseTimeSec=60 ioOpTimeoutSec=10 leaseRetries=3)

14:57:31,342::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: createStoragePool, Return response: {'status': {'message': 'OK', 'code': 0}}

14:57:32,223::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: getSpmStatus, args: ( spUUID=8791b66f-b117-4cb5-a5fd-3fdc81d6eb39)

<<< FAILED >>>

Thread-100::INFO::2011-08-19 14:57:32,247::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: connectStoragePool, args: ( spUUID=8791b66f-b117-4cb5-a5fd-3fdc81d6eb39 hostID=1 scsiKey=8791b66f-b117-4cb5-a5fd-3fdc81d6eb39 msdUUID=678a45a4-b5f8-418b-b541-19a6df9c78eb masterVersion=1)

Thread-100::INFO::2011-08-19 14:57:35,623::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: connectStoragePool, Return response: {'status': {'message': 'OK', 'code': 0}}

Thread-110::INFO::2011-08-19 14:57:35,629::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: getSpmStatus, args: ( spUUID=8791b66f-b117-4cb5-a5fd-3fdc81d6eb39)

14:57:35,631::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: getSpmStatus, Return response: {'status': {'message': 'OK', 'code': 0}, 'spm_st': {'spmId': -1, 'spmStatus': 'Free', 'spmLver': -1}}
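The ordering constraint comment 5 describes — createStoragePool alone does not attach the host, so a getSpmStatus issued before connectStoragePool fails with StoragePoolUnknown — can be sketched as follows. This is a minimal illustrative model, not vdsm's actual code; the class and method names are hypothetical stand-ins for the verbs in the log above.

```python
# Hypothetical sketch of the create -> connect -> getSpmStatus ordering
# from comment 5. Not vdsm's real implementation.

class StoragePoolUnknown(Exception):
    """Stand-in for vdsm's StoragePoolUnknown storage error."""

class Host:
    def __init__(self):
        self.known_pools = set()      # pools that exist on shared storage
        self.connected_pools = set()  # pools this host is attached to

    def create_storage_pool(self, sp_uuid):
        # Creates the pool on storage; does NOT connect this host to it.
        self.known_pools.add(sp_uuid)

    def connect_storage_pool(self, sp_uuid):
        if sp_uuid not in self.known_pools:
            raise StoragePoolUnknown(sp_uuid)
        self.connected_pools.add(sp_uuid)

    def get_spm_status(self, sp_uuid):
        # Only meaningful for pools the host has connected to; otherwise
        # it fails the same way as the traceback in comment 3.
        if sp_uuid not in self.connected_pools:
            raise StoragePoolUnknown(
                "Unknown pool id, pool not connected: (%r,)" % sp_uuid)
        return {'spmId': -1, 'spmStatus': 'Free', 'spmLver': -1}

host = Host()
pool = '8791b66f-b117-4cb5-a5fd-3fdc81d6eb39'
host.create_storage_pool(pool)
try:
    host.get_spm_status(pool)   # created but not yet connected: fails
except StoragePoolUnknown:
    pass
host.connect_storage_pool(pool)
assert host.get_spm_status(pool)['spmStatus'] == 'Free'
```

The failing getSpmStatus between createStoragePool and connectStoragePool is therefore expected behavior, matching Eduardo's reading of the log.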




Comment 6 Eduardo Warszawski 2011-08-23 12:46:39 UTC
In order to clarify the libvirt error the libvirt log is required.

Comment 7 Ayal Baron 2011-08-31 19:01:57 UTC
First of all, the title is totally incorrect and misleading.
Second, you suspended your VMs and they paused; what exactly is the problem?
There is indeed some noise in the log ("libvirtError: Domain not found: no domain with matching uuid") that should be handled, but I don't see it as important enough to surface as an Exception.

Suspend operation succeeded, as can be noted in:
Thread-524::DEBUG::2011-08-19 15:10:30,364::clientIF::59::vds::(wrapper) return getVmStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'timeOffset': '0', 'vmId': '29ed44b1-02cd-4c14-bbbc-e32926855202', 'exitMessage': 'SaveState succeeded', 'exitCode': 0}]}


Wrt the libvirt error, it looks like vdsm is calling virDomainGetBlockInfo after the VM status is set to down; is this supported at all? (Dan?)

Comment 8 Jaroslav Henner 2011-08-31 19:11:34 UTC
(In reply to comment #7)
> First of all title is totally incorrect and misleading.
Sorry about that. I could have written something like "VM got paused when suspending it", but that would be too broad, and I don't know how I could tell what went wrong.

> Second, you suspended your VMs and they paused, what exactly is the problem?
It should have got suspended, not paused. In rhevm terminology, the VM gets paused for example when the storage is inaccessible which was not the case.

Comment 9 Dan Kenigsberg 2011-09-01 23:09:32 UTC
(In reply to comment #7)
> Wrt the libvirt error, it looks like vdsm is calling virDomainGetBlockInfo
> after the VM status is set to down, is this supported at all? (Dan?)

We have a thread polling libvirt for info. When a VM dies (due to a successful migration, for example) we may have a race between the event notifying us about it and the polling failure. That's what I meant by log noise.
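The race Dan describes — a stats-polling thread calling into libvirt after the domain has already gone away — can be sketched as below. This is a minimal illustrative model, not vdsm's actual code: the class, the `_block_info` stand-in for virDomainGetBlockInfo, and the log lists are all hypothetical. The point is that "domain not found" from a VM already known to be Down is a benign race and should be downgraded to debug-level noise rather than logged as an error with a full traceback.

```python
# Minimal sketch (not vdsm's real code) of tolerating the polling race
# from comment 9: the poller may fire after the VM has exited.

class LibvirtError(Exception):
    """Stand-in for libvirt.libvirtError."""

class Vm:
    def __init__(self, vm_id):
        self.id = vm_id
        self.status = 'Up'
        self.errors = []      # would be error-level log records
        self.debug_msgs = []  # would be debug-level log records

    def _block_info(self):
        # Simulates virDomainGetBlockInfo(); fails once the domain is gone.
        if self.status == 'Down':
            raise LibvirtError(
                "Domain not found: no domain with matching uuid '%s'"
                % self.id)
        return {'capacity': 0}

    def collect_stats(self):
        try:
            return self._block_info()
        except LibvirtError as e:
            if self.status == 'Down':
                # VM already exited (e.g. "SaveState succeeded"): this is
                # the benign race, so record it quietly and stop polling.
                self.debug_msgs.append(str(e))
                return None
            self.errors.append(str(e))
            raise

vm = Vm('29ed44b1-02cd-4c14-bbbc-e32926855202')
vm.status = 'Down'           # the save-state event arrived first
assert vm.collect_stats() is None
assert not vm.errors and vm.debug_msgs
```

With handling like this, the traceback from the description would not appear at ERROR level when the VM is already Down.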

Comment 10 Dan Kenigsberg 2011-09-04 08:51:29 UTC
Ok, so the bug here is that a VM moved to Paused state before successfully being hibernated (=saved) to disk.

Would you please reproduce this and add the matching libvirt.log, as requested in comment 6.

Note that I do not consider this temporary state change as a high-severity bug; it may well not get into 3.0 at this point.

Comment 11 Jaroslav Henner 2011-09-05 14:04:30 UTC
I'm sorry, I couldn't reproduce it. I did about 10 suspends without hitting any problem like this.

Comment 12 Dan Kenigsberg 2011-09-05 15:12:47 UTC
Are you using the latest libvirt version now?

In any case, if it works for you now, there's a bug resolution for that.

Comment 13 Jaroslav Henner 2011-09-12 12:53:04 UTC
Yes, I worked with the latest libvirt.