Bug 755830 - oz fails to install an image with a backtrace generated
Summary: oz fails to install an image with a backtrace generated
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: oz
Version: 1.0.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Assignee: Ian McLeod
QA Contact: Martin Kočí
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-11-22 06:55 UTC by Steven Dake
Modified: 2016-04-26 13:58 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-30 17:15:32 UTC



Description Steven Dake 2011-11-22 06:55:35 UTC
Description of problem:
Occasionally, when running with oz 0.7.0, oz fails to generate an image and instead prints the following error:
Libvirt Block Stats Failed:
 code is 55
 domain is 10
 message is Requested operation is not valid: domain is not running
 level is 2
 str1 is Requested operation is not valid: %s
 str2 is domain is not running
 str3 is None
 int1 is -1
 int2 is -1
Cleaning up guest named U10-x86_64-jeos
Cleaning up after install
Traceback (most recent call last):
  File "/usr/bin/oz-install", line 152, in <module>
    libvirt_xml = guest.install(timeout, force_download)
  File "/usr/lib/python2.7/site-packages/oz/Ubuntu.py", line 144, in install
    return self._do_install(timeout, force, 0)
  File "/usr/lib/python2.7/site-packages/oz/Guest.py", line 1413, in _do_install
    self._wait_for_install_finish(dom, timeout)
  File "/usr/lib/python2.7/site-packages/oz/Guest.py", line 505, in _wait_for_install_finish
    rd_req, rd_bytes, wr_req, wr_bytes, errs = libvirt_dom.blockStats(dev)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1394, in blockStats
    if ret is None: raise libvirtError ('virDomainBlockStats() failed', dom=self)
libvirt.libvirtError: Requested operation is not valid: domain is not running
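
For context, the traceback above comes from the block statistics polling loop in oz's _wait_for_install_finish(). A minimal sketch of such a loop (the names, checks, and timing below are assumptions for illustration, not the actual oz source) shows where the race can come from: if the guest powers off between an is-running check and the blockStats() call, libvirt raises the "domain is not running" error shown above.

import time
import libvirt

def wait_for_install_finish(libvirt_dom, device, timeout):
    """Hypothetical sketch of the polling loop the traceback points at.

    The real code is oz.Guest._wait_for_install_finish(); the names,
    checks, and timing here are assumptions used only to show the race.
    """
    for _ in range(timeout):
        # Stop polling once the installer has powered the guest off.
        if libvirt_dom.info()[0] == libvirt.VIR_DOMAIN_SHUTOFF:
            return
        # If the guest shuts down between the check above and this call,
        # libvirt raises libvirtError with code VIR_ERR_OPERATION_INVALID
        # (55, "Requested operation is not valid: domain is not running"),
        # which matches the backtrace in this report.
        rd_req, rd_bytes, wr_req, wr_bytes, errs = libvirt_dom.blockStats(device)
        time.sleep(1)
    raise Exception("Timed out waiting for the install to finish")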


Version-Release number of selected component (if applicable):
I am using master although this problem has existed since oz 0.4 or so.

How reproducible:
10-15%

Steps to Reproduce:
1. oz install an image
2. sometimes it fails
  
Actual results:
The install fails with the backtrace above.

Expected results:
The install should complete without a backtrace.

Additional info:
Running f16 now. In f15, libvirt actually segfaulted. This may still be happening; it is hard to tell with systemd.

Comment 1 wes hayutin 2011-12-08 13:43:40 UTC
fixing version.. move to 1.0.0

Comment 2 wes hayutin 2012-01-10 17:11:04 UTC
adding to ce-sprint-next

Comment 3 wes hayutin 2012-01-10 17:14:06 UTC
adding to ce-sprint-next

Comment 4 wes hayutin 2012-01-12 16:35:37 UTC
adding to ce-sprint

Comment 5 wes hayutin 2012-01-12 16:41:58 UTC
removing ce-sprint-next tracker

Comment 6 wes hayutin 2012-01-12 16:44:13 UTC
taking off ce-sprint-next..

Comment 7 wes hayutin 2012-01-13 15:47:43 UTC
Please include the recreate steps..

Comment 8 Steven Dake 2012-01-13 17:18:31 UTC
Wes,

If I knew the recreate steps, I would have fixed the bug already.

Regards
-steve

Comment 9 Ian McLeod 2012-01-13 19:31:17 UTC
Steve,

So, we expect libvirt methods to fail as part of our JEOS install. What seems to be the issue here is that at some point libvirt started failing with a new error code that we are not catching.

I've added that code to the list of codes that are caught and interpreted as a sign that the install has finished (rather than being re-thrown as a fatal exception).

That code is available here:

https://github.com/aeolusproject/oz/tree/libvirt_exception

I've submitted pull requests to both our fork and Chris' upstream master.
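
For illustration, here is a minimal sketch of the kind of handling described above (catch the error code and treat it as end-of-install). The names and the exact set of codes are assumptions; this is not the actual patch on the linked branch.

import libvirt

# Hypothetical list of error codes interpreted as "the install has
# finished" instead of being re-thrown as fatal. The set used by oz may
# differ; VIR_ERR_OPERATION_INVALID (code 55) is the one from this report.
INSTALL_FINISHED_ERROR_CODES = (
    libvirt.VIR_ERR_NO_DOMAIN,
    libvirt.VIR_ERR_OPERATION_INVALID,
)

def safe_block_stats(libvirt_dom, device):
    """Return block stats, or None if the error means the guest is gone."""
    try:
        return libvirt_dom.blockStats(device)
    except libvirt.libvirtError as err:
        if err.get_error_code() in INSTALL_FINISHED_ERROR_CODES:
            return None  # treat as a normal end of the install
        raise            # anything else is still fatal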

Comment 10 Martin Kočí 2012-02-06 13:47:08 UTC
Successfully tested a few times with packages
# rpm -q oz
oz-0.8.0-4.el6.noarch
$ rpm -q oz
oz-0.8.0-4.fc15.noarch
and
# rpm -q oz
oz-0.9.0-0.20120206131816git043d582.el6.noarch

Moving bug to VERIFIED. I will keep testing, and if I recreate the issue I can reopen this bug.

Created a HUDSON test too.

