| Summary: | [RFE] More descriptive logging for aeolus-image failures | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Retired] CloudForms Cloud Engine | Reporter: | james labocki <jlabocki> | ||||
| Component: | imagefactory | Assignee: | Steve Linabery <slinaber> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Martin Kočí <mkoci> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 1.0.0 | CC: | akarol, clalance, dajohnso, deltacloud-maint, mkoci, morazi, sloranz, ssachdev | ||||
| Target Milestone: | rc | Keywords: | FutureFeature | ||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Enhancement | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2012-05-15 20:10:26 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
We have already been down this road several times and are limited by the amount of introspection is available to us for a VM. Oz takes a snapshot of the VM console and, by default, puts that in /tmp. For now this is as much as we can get. The only other suggestion I can make is to include the path to the snapshot in the exception message so one knows it exists and to view it for more details. Created attachment 515595 [details]
[PATCH Oz] Add some text to the log pointing the user to the VM console screenshot when one is taken.
I sent this patch for Oz to Chris Lalancette.
Right, pointing out where the snapshot file was dropped is an obvious enhancement. The problem in general as Steve points out is that we do this all in a VM. In the general case, it is essentially impossible to get structured data out of the VM about why it failed. That's why we went the snapshot route, to make sure we had some debugging tools. It is possible to get some data out of the VMs by using certain features of (say) anaconda. In that case, we could fetch out the data that anaconda is dumping to its logs, but the data is not consumable by a computer since it is not structured. However, it could be useful for a human to diagnose. That sort of functionality has been on the TODO list for some time; someone needs to spend some time making it happen. It is not entirely trivial to do, but it should be doable in special cases. Just chatted with Chris and he says printing the path to the console snapshot is slated for 0.6.0 of Oz. Not sure if that is the above patch or if he rolled his own. making sure all the bugs are at the right version for future queries Spoke with Steve Loranz and we think this was resolved with release of oz 0.6.0 Test passed (shown path to image) with package
oz-0.9.0-0.20120127190146git043d582.el6.noarch
....
imagefactory-1.0.0rc3_5_g1041f75-1.el6.noarch
2012-02-01 12:04:08,590 DEBUG imgfac.builders.BaseBuilder.RHEL6_rhevm_Builder pid(4380) Message: Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_rhevm_Builder.py", line 65, in build_image
self.build_upload(build_id)
File "/usr/lib/python2.6/site-packages/imgfac/builders/Fedora_rhevm_Builder.py", line 112, in build_upload
libvirt_xml = guest.install(self.app_config["timeout"])
File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1456, in install
return self._do_install(timeout, force, 0)
File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1438, in _do_install
self._wait_for_install_finish(dom, timeout)
File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 546, in _wait_for_install_finish
raise oz.OzException.OzException(exc_str)
OzException: No disk activity in 300 seconds, failing. Check screenshot at /factory-build-06cec2a3-eaa4-4acc-bdcf-48c60af34e2f-1328115846.88.png for more detail
....
Moving bug to VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0588.html |
Description of problem: When building an image using aeolus-image, it would be helpful if errors in the build process were more descriptive as to the problem that was encountered (unable to write to NFS export domain in RHEV is example given below). How reproducible: Run aeolus-image command and have RHEV-M export domain not writable by user executing aeolus-image command. Steps to Reproduce: 1. Make RHEV-M provider's export domain read-only 2. execute aeolus-image build --target rhevm --template template.tpl 3. tail -f /var/log/imagefactory.log Actual results: 161. 2011-07-27 05:22:53,974 DEBUG oz.Guest.FedoraGuest pid(2650) Message: Waiting for tmpl1-a33d1b20-6500-422d-aa86-269f98ab984f to finish installing, 3210/3600 162. 2011-07-27 05:23:03,999 DEBUG oz.Guest.FedoraGuest pid(2650) Message: Waiting for tmpl1-a33d1b20-6500-422d-aa86-269f98ab984f to finish installing, 3200/3600 163. 2011-07-27 05:23:10,177 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Exception caught in ImageFactory 164. 2011-07-27 05:23:10,178 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Traceback (most recent call last): 165. File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 172, in build_upload 166. libvirt_xml = self.guest.install(self.app_config["timeout"]) 167. File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1154, in install 168. return self.do_install(timeout, force, 0) 169. File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1139, in do_install 170. self.wait_for_install_finish(dom, timeout) 171. File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 451, in wait_for_install_finish 172. raise oz.OzException.OzException("No disk activity in %d seconds, failing" % (inactivity_timeout)) 173. OzException: No disk activity in 300 seconds, failing 174. 175. 2011-07-27 05:23:10,179 INFO oz.Guest.FedoraGuest pid(2650) Message: Cleaning up guest named tmpl1-a33d1b20-6500-422d-aa86-269f98ab984f 176. 2011-07-27 05:23:10,655 DEBUG imagefactory.BuildJob.BuildAdaptor pid(2650) Message: Raising event with agent handler (<ImageFactoryAgent(Thread-1, initial)>), changed status from BUILDING to FAILED 177. 2011-07-27 05:23:10,656 INFO oz.Guest.FedoraGuest pid(2650) Message: Cleaning up after install 178. 2011-07-27 05:23:10,656 DEBUG oz.Guest.FedoraGuest pid(2650) Message: Removing modified ISO 179. 2011-07-27 05:23:10,693 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Exception caught in ImageFactory 180. 2011-07-27 05:23:10,694 DEBUG imagefactory.builders.BaseBuilder.FedoraBuilder pid(2650) Message: Traceback (most recent call last): 181. File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 124, in build_image 182. self.build_upload(build_id) 183. File "/usr/lib/python2.6/site-packages/imagefactory/builders/FedoraBuilder.py", line 172, in build_upload 184. libvirt_xml = self.guest.install(self.app_config["timeout"]) 185. File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1154, in install 186. return self.do_install(timeout, force, 0) 187. File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 1139, in do_install 188. self.wait_for_install_finish(dom, timeout) 189. File "/usr/lib/python2.6/site-packages/oz/Guest.py", line 451, in wait_for_install_finish 190. raise oz.OzException.OzException("No disk activity in %d seconds, failing" % (inactivity_timeout)) 191. OzException: No disk activity in 300 seconds, failing 192. 193. 2011-07-27 05:23:10,694 DEBUG imagefactory.BuildJob.BuildAdaptor pid(2650) Message: Raising event with agent handler (<ImageFactoryAgent(Thread-1, initial)>), changed status from FAILED to FAILED Expected results: Inform user that the RHEV-M export domain was not writable and that is why the failure occurred. Additional info: