Bug 805171
Summary: | various deltacloud exceptions while working w/ vsphere | |||
---|---|---|---|---|
Product: | [Retired] CloudForms Cloud Engine | Reporter: | wes hayutin <whayutin> | |
Component: | deltacloud-core | Assignee: | Michal Fojtik <mfojtik> | |
Status: | CLOSED ERRATA | QA Contact: | Ronelle Landy <rlandy> | |
Severity: | medium | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 1.0.0 | CC: | cpelland, hbrock, jprovazn, jrd, juwu, rananda, whayutin | |
Target Milestone: | rc | Keywords: | ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
An overloaded VSphere provider caused Conductor to constantly switch between pending and vanished states. This update adds error handling for VSphere internal errors.
|
Story Points: | --- | |
Clone Of: | ||||
: | 827454 | Environment: | |
Last Closed: | 2012-12-04 15:00:09 UTC | Type: | --- | |
Bug Depends On: | ||||
Bug Blocks: | 827454 |
Description
wes hayutin
2012-03-20 15:55:44 UTC
[root@qeblade31 ~]# rpm -qa | grep deltacloud
deltacloud-core-ec2-0.5.0-5.el6.noarch
deltacloud-core-0.5.0-5.el6.noarch
rubygem-deltacloud-client-0.5.0-2.el6.noarch
deltacloud-core-vsphere-0.5.0-5.el6.noarch
deltacloud-core-rhevm-0.5.0-5.el6.noarch

Hi Wes,

Those exceptions are normally returned when the VSphere is overloaded or misconfigured. I would recommend rebooting the VSphere machine. We can't do anything better than 'translating' this exception into something more human-readable, since this appears to be a backend error.

If this isn't fixable by dev, should it be on-qa?

Spoke with Wes regarding what is needed to resolve this issue. The problem is two-fold:

Firstly, as Michal points out in comment 2 above, we see this error sporadically when vsphere is overloaded. See JIRA https://issues.apache.org/jira/browse/DTACLOUD-150 for a report of a similar error when the vsphere provider is invalid or otherwise unavailable. Deltacloud needs better error handling and reporting. Moving this BZ back to ASSIGNED (dev) to address the error handling and reporting issue in Deltacloud. As both this BZ and the JIRA show, the error is 'Unhandled'. DTACLOUD-150 is in 'reopened' status.

Secondly, there is an issue of conductor retrying the request if the vsphere provider is busy/unavailable. Copying chat comment:

<weshay> rlandy, actually I was hoping for a conductor fix.. to limit the number of times the request is submitted

Wes will clone this BZ to request a conductor fix for this second part.

Not clear how this got blocker+, it isn't.
Moving to 1.0.z/1.1.0

Fix pushed to master repo and also to brew:

commit b922f47472fdb35db2cd36be58bbfd35412515fb
Author: Michal Fojtik <mfojtik>
Date: Fri Jun 1 08:48:29 2012 +0200

    VSphere: Added error handling for the Undefined namespace prefix error (BZ: #805171)

From above, it seems that the "Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //soapenv:Body/*" exception is raised both when the vsphere server is overloaded and when it is unreachable. This non-descriptive message is then displayed in the UI if an instance launch fails, so we might catch the error and display something more descriptive. Switching between vanished/pending states is not a bug, but if this overload is quite common we might extend dbomatic to switch an instance into the VANISHED state only after X failures.

Tested rpms:

>> rpm -qa |grep aeolus
aeolus-configure-2.8.7-1.el6cf.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
rubygem-aeolus-cli-0.7.2-1.el6cf.noarch
aeolus-conductor-0.13.14-1.el6cf.noarch
aeolus-conductor-daemons-0.13.14-1.el6cf.noarch
aeolus-conductor-doc-0.13.14-1.el6cf.noarch
aeolus-all-0.13.14-1.el6cf.noarch

>> rpm -qa |grep deltacloud
rubygem-deltacloud-client-0.5.0-2.el6.noarch
deltacloud-core-vsphere-0.5.0-10.el6_2.noarch
deltacloud-core-rhevm-0.5.0-10.el6_2.noarch
deltacloud-core-ec2-0.5.0-10.el6_2.noarch
deltacloud-core-0.5.0-10.el6_2.noarch

Launching an instance to a Vsphere provider that is currently unavailable does result in the instance status being shown as 'Vanished'.

***********
Alerts 1
ce-2h6kn/testVsphere Instance Failure vanished
***********

See the attached screenshot.

Current versions of Deltacloud (Deltacloud 1.0.3) return a 504 error if a user tries to access a Vsphere provider that is unavailable.
Copying from the logs:

10.11.9.168 - - [27/Sep/2012 16:02:17] "GET /api/images?format=xml HTTP/1.1" 504 46057 60.0481

What is interesting is that looking at the Application History, the output shows:

**************
27-Sep-2012 19:51:26: Instance ce-gje7a/AutoImageImport-DoNotDelete created
27-Sep-2012 19:51:26: State changed to pending
27-Sep-2012 19:51:27: Attempting to launch this deployment on provider account vsphere-default_Administrator
27-Sep-2012 19:51:32: Instance ce-gje7a/AutoImageImport-DoNotDelete ce-gje7a/AutoImageImport-DoNotDelete: 500 : Unhandled exception or status code (No route to host - connect(2))
**************

which picks up an unhandled exception, although with a correct error message. It is true that there is no route to the host.

Considering that the error message is now correct, that the unhandled exception no longer shows up in a newer version of Deltacloud, and that the application itself is shown as 'Vanished', going to mark this as verified and touch base with development wrt the exception in the History.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-1516.html
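
For anyone reading this report later, the kind of error translation discussed in the comments above (catching the low-level failure and reporting something more descriptive) could look roughly like the sketch below. This is not the code from commit b922f474. The `BackendUnavailable` class, the `with_backend_error_translation` helper, and the `BACKEND_FAILURE_PATTERNS` list are hypothetical names chosen for illustration. The sketch matches on the two message fragments quoted in this report instead of on the Nokogiri exception class, so it runs without the nokogiri gem.

```ruby
# Hypothetical error class for illustration; not part of Deltacloud's API.
class BackendUnavailable < StandardError; end

# Message fragments quoted in this report that point at an overloaded or
# unreachable vSphere backend rather than at a bug in the caller.
BACKEND_FAILURE_PATTERNS = [
  /Undefined namespace prefix/,
  /No route to host/
].freeze

# Runs the block and re-raises known low-level failures as
# BackendUnavailable with a readable message. Any other error is
# re-raised unchanged.
def with_backend_error_translation(provider)
  yield
rescue BackendUnavailable
  # Already translated by a nested call; do not wrap it a second time.
  raise
rescue StandardError => e
  raise unless BACKEND_FAILURE_PATTERNS.any? { |pattern| pattern =~ e.message }

  raise BackendUnavailable,
        "Provider #{provider} is overloaded or unreachable (#{e.message})"
end
```

A caller would wrap each backend request in the helper, for example `with_backend_error_translation("vsphere") { client.images }`, where `client` stands for whatever object talks to the provider. Limiting how often conductor resubmits the request is a separate concern and is tracked in the cloned bug.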