Bug 726107 - [RFE] More descriptive logging for aeolus-image failures
Summary: [RFE] More descriptive logging for aeolus-image failures
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: imagefactory
Version: 1.0.0
Hardware: All
OS: Linux
unspecified
medium
Target Milestone: rc
Assignee: Steve Linabery
QA Contact: Martin Kočí
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-07-27 14:52 UTC by james labocki
Modified: 2012-05-15 20:10 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-15 20:10:26 UTC


Attachments (Terms of Use)
[PATCH Oz] Add some text to the log pointing the user to the VM console screenshot when one is taken. (1.78 KB, patch)
2011-07-27 21:01 UTC, Steve Loranz
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2012:0588 0 normal SHIPPED_LIVE new packages: imagefactory 2012-05-15 22:31:27 UTC

Description james labocki 2011-07-27 14:52:57 UTC
Description of problem:
When building an image using aeolus-image, it would be helpful if errors in the build process were more descriptive as to the problem that was encountered (unable to write to NFS export domain in RHEV is example given below).

How reproducible:
Run aeolus-image command and have RHEV-M export domain not writable by user executing aeolus-image command.

Steps to Reproduce:
1. Make RHEV-M provider's export domain read-only
2. execute aeolus-image build --target rhevm --template template.tpl
3. tail -f /var/log/imagefactory.log
  
Actual results:
 161.
      2011-07-27 05:22:53,974 DEBUG oz.Guest.FedoraGuest pid(2650) Message: Waiting for tmpl1-a33d1b20-6500-422d-aa86-269f98ab984f to finish installing, 3210/3600
 162.
      2011-07-27 05:23:03,999 DEBUG oz.Guest.FedoraGuest pid(2650) Message: Waiting for tmpl1-a33d1b20-6500-422d-aa86-269f98ab984f to finish installing, 3200/3600
 163.
      2011-07-27 05:23:10,177 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Exception caught in ImageFactory
 164.
      2011-07-27 05:23:10,178 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Traceback (most recent call last):
 165.
        File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 172, in build_upload
 166.
          libvirt_xml = self.guest.install(self.app_config["timeout"])
 167.
        File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1154, in install
 168.
          return self.do_install(timeout, force, 0)
 169.
        File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1139, in do_install
 170.
          self.wait_for_install_finish(dom, timeout)
 171.
        File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 451, in wait_for_install_finish
 172.
          raise oz.OzException.OzException("No disk activity in %d seconds, failing" % (inactivity_timeout))
 173.
      OzException: No disk activity in 300 seconds, failing
 174.
       
 175.
      2011-07-27 05:23:10,179 INFO oz.Guest.FedoraGuest pid(2650) Message: Cleaning up guest named tmpl1-a33d1b20-6500-422d-aa86-269f98ab984f
 176.
      2011-07-27 05:23:10,655 DEBUG imagefactory.BuildJob.BuildAdaptor pid(2650) Message: Raising event with agent handler (<ImageFactoryAgent(Thread-1, initial)>), changed status from BUILDING to FAILED
 177.
      2011-07-27 05:23:10,656 INFO oz.Guest.FedoraGuest pid(2650) Message: Cleaning up after install
 178.
      2011-07-27 05:23:10,656 DEBUG oz.Guest.FedoraGuest pid(2650) Message: Removing modified ISO
 179.
      2011-07-27 05:23:10,693 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Exception caught in ImageFactory
 180.
      2011-07-27 05:23:10,694 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Traceback (most recent call last):
 181.
        File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 124, in build_image
 182.
          self.build_upload(build_id)
 183.
        File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 172, in build_upload
 184.
          libvirt_xml = self.guest.install(self.app_config["timeout"])
 185.
        File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1154, in install
 186.
          return self.do_install(timeout, force, 0)
 187.
        File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1139, in do_install
 188.
          self.wait_for_install_finish(dom, timeout)
 189.
        File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 451, in wait_for_install_finish
 190.
          raise oz.OzException.OzException("No disk activity in %d seconds, failing" % (inactivity_timeout))
 191.
      OzException: No disk activity in 300 seconds, failing
 192.
       
 193.
      2011-07-27 05:23:10,694 DEBUG imagefactory.BuildJob.BuildAdaptor pid(2650) Message: Raising event with agent handler (<ImageFactoryAgent(Thread-1, initial)>), changed status from FAILED to FAILED 


Expected results:
Inform user that the RHEV-M export domain was not writable and that is why the failure occurred.

Additional info:

Comment 1 Steve Loranz 2011-07-27 15:45:48 UTC
We have already been down this road several times and are limited by the amount of introspection is available to us for a VM.  Oz takes a snapshot of the VM console and, by default, puts that in /tmp.  For now this is as much as we can get.  The only other suggestion I can make is to include the path to the snapshot in the exception message so one knows it exists and to view it for more details.

Comment 2 Steve Loranz 2011-07-27 21:01:00 UTC
Created attachment 515595 [details]
[PATCH Oz] Add some text to the log pointing the user to the VM console screenshot when one is taken.

I sent this patch for Oz to Chris Lalancette.

Comment 3 Chris Lalancette 2011-07-28 08:44:59 UTC
Right, pointing out where the snapshot file was dropped is an obvious enhancement.

The problem in general as Steve points out is that we do this all in a VM.  In the general case, it is essentially impossible to get structured data out of the VM about why it failed.  That's why we went the snapshot route, to make sure we had some debugging tools.

It is possible to get some data out of the VMs by using certain features of (say) anaconda.  In that case, we could fetch out the data that anaconda is dumping to its logs, but the data is not consumable by a computer since it is not structured.  However, it could be useful for a human to diagnose.  That sort of functionality has been on the TODO list for some time; someone needs to spend some time making it happen.  It is not entirely trivial to do, but it should be doable in special cases.

Comment 4 Steve Loranz 2011-08-16 16:45:44 UTC
Just chatted with Chris and he says printing the path to the console snapshot is slated for 0.6.0 of Oz.  Not sure if that is the above patch or if he rolled his own.

Comment 5 wes hayutin 2011-09-28 16:37:57 UTC
making sure all the bugs are at the right version for future queries

Comment 7 Steve Linabery 2012-01-18 18:06:46 UTC
Spoke with Steve Loranz and we think this was resolved with release of oz 0.6.0

Comment 9 Martin Kočí 2012-02-02 16:24:43 UTC
Test passed (shown path to image) with package 
oz-0.9.0-0.20120127190146git043d582.el6.noarch
....
imagefactory-1.0.0rc3_5_g1041f75-1.el6.noarch
2012-02-01 12:04:08,590 DEBUG imgfac.builders.BaseBuilder.RHEL6_rhevm_Builder pid(4380) Message: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_rhevm_Builder.py", line 65, in build_image
    self.build_upload(build_id)
  File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_rhevm_Builder.py", line 112, in build_upload
    libvirt_xml = guest.install(self.app_config["timeout"])
  File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1456, in install
    return self._do_install(timeout, force, 0)
  File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1438, in _do_install
    self._wait_for_install_finish(dom, timeout)
  File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 546, in _wait_for_install_finish
    raise oz.OzException.OzException(exc_str)
OzException: No disk activity in 300 seconds, failing.  Check screenshot at /factory-build-06cec2a3-eaa4-4acc-bdcf-48c60af34e2f-1328115846.88.png for more detail
....
Moving bug to VERIFIED

Comment 10 errata-xmlrpc 2012-05-15 20:10:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0588.html


Note You need to log in before you can comment on or make changes to this bug.