Red Hat Bugzilla – Full Text Bug Listing
|Summary:||Parameter mismatch in returned parameters caused deadlock|
|Product:||CloudForms Cloud Engine||Reporter:||Steve Reichard <sreichar>|
|Component:||aeolus-audrey-agent||Assignee:||Dan Radez <dradez>|
|Status:||CLOSED ERRATA||QA Contact:||Rehana <aeolus-qa-list>|
|Version:||1.0.0||CC:||akarol, asettle, cpelland, dajohnso, deltacloud-maint, dgao, dmacpher, dradez, gblomqui, hbrock, jliberma, scollier, whayutin|
|Target Milestone:||rc||Keywords:||Triaged, ZStream|
|Fixed In Version:||Doc Type:||Bug Fix|
A faulty Application Blueprint locked instances in a boot state. For example, an Application Blueprint with a mistyped parameter name failed because Audrey Configuration Server was unable to find the correct runtime configuration settings. This bug fix enables Audrey Configuration Server to use a ten-minute timeout to bypass any errors with Application Blueprints. This allows the instances to boot without runtime configuration. Verify all Application Blueprints before launch to avoid this issue.
|:||824904 (view as bug list)||Environment:|
|Last Closed:||2012-12-04 09:58:26 EST||Type:||---|
Description Steve Reichard 2012-03-09 11:51:49 EST
Description of problem:
While I understand that I caused this issue myself, it seems that some up-front checking would have saved a bit of aggravation. I had a multi-instance Blueprint in which one assembly returned two parameters to another instance. Access to the parameters uses a reference to the assembly name. I mistyped a "-" instead of a "_", so the expected parameters were never actually defined. The instance that was waiting in the boot sequence for config server to pass it the parameters kept waiting, since config server never had the 'correct' parameters. It seems this case could have been caught prior to launch by checking that the names and references exist in the Blueprint.

Version-Release number of selected component (if applicable):
Beta3 Cfg Server
[root@dhcp-105 aeolus-configserver]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
[root@dhcp-105 aeolus-configserver]# uname -a
Linux dhcp-105.cloud.lab.eng.bos.redhat.com 2.6.32-220.4.2.el6.x86_64 #1 SMP Mon Feb 6 16:39:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-105 aeolus-configserver]# rpm -qa aeolus-configserver
aeolus-configserver-0.4.6-0.el6.noarch
[root@dhcp-105 aeolus-configserver]# CE yum list | grep aeolus
aeolus-audrey-agent.noarch
aeolus-configserver.noarch
[root@dhcp-105 aeolus-configserver]#

How reproducible:
Was reproducible while debugging it.

Steps to Reproduce:
1.
2.
3.

Actual results:
Expected results:
Additional info:
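The pre-launch check suggested in the description could look roughly like the sketch below. It is only an illustration: the dictionary layout, the `find_unresolved_references` helper, and the assembly and parameter names are all hypothetical and do not reflect the actual Blueprint format or any Aeolus API.

```python
def find_unresolved_references(assemblies):
    """Return references that point at a missing assembly or parameter.

    `assemblies` is a hypothetical, simplified model of a Blueprint:
      assembly name -> {
          "returns":    set of parameter names the assembly returns,
          "references": local parameter -> (assembly name, returned parameter),
      }
    """
    unresolved = []
    for name, spec in assemblies.items():
        for param, (ref_assembly, ref_param) in spec.get("references", {}).items():
            target = assemblies.get(ref_assembly)
            # Flag the reference if the assembly is unknown or does not
            # return the requested parameter.
            if target is None or ref_param not in target.get("returns", set()):
                unresolved.append((name, param, ref_assembly, ref_param))
    return unresolved


# Hypothetical two-assembly Blueprint with the "-" vs "_" typo.
blueprint = {
    "db_server": {"returns": {"db_host", "db_port"}, "references": {}},
    "web_server": {
        "returns": set(),
        "references": {
            "host": ("db-server", "db_host"),  # typo: "-" instead of "_"
            "port": ("db_server", "db_port"),
        },
    },
}

problems = find_unresolved_references(blueprint)
for assembly, param, ref_assembly, ref_param in problems:
    print(f"{assembly}.{param}: no parameter {ref_param!r} on assembly {ref_assembly!r}")
```

Running a check of this kind before launch would report the mistyped assembly name instead of leaving the instance waiting at boot.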
Comment 1 wes hayutin 2012-03-12 16:02:28 EDT
multi-instance not supported in 1.0 kicking off
Comment 2 Hugh Brock 2012-05-08 13:02:15 EDT
Greg, we are considering this for Z -- how hard is it to fix?
Comment 3 Greg Blomquist 2012-05-11 10:37:55 EDT
The zstream-able fix here is to force audrey-agent to bail after X number of tries at resolving parameters. Dan Radez claims that this is a low LOE in the 1.0 codebase for Audrey. I'll trust his claim, since he has a much better grasp of that code than I do. Assigned to Dan to patch this for zstream.
Comment 4 Dan Radez 2012-05-23 22:21:37 EDT
Fixed in b2ff50ae2a4a1b123b96ce18dd7b370bb2fe78bd. This adds a counter that will bail after a count of approximately 10 minutes.
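A bounded retry counter of the kind described in this comment might be sketched as follows. This is not the code from the commit: the `wait_for_configs` function, its parameters, the 120 x 5 second budget (about 10 minutes), and the use of status 200 for "parameters ready" are assumptions made for illustration. Only the 202 "not yet" status appears in the agent log later in this bug.

```python
import time


def wait_for_configs(fetch, max_tries=120, delay_seconds=5, sleep=time.sleep):
    """Poll `fetch` until configs arrive or the try budget is spent.

    `fetch` is a hypothetical callable returning (status, body).
    Assumed convention: 200 means the parameters are ready, anything
    else (such as 202) means they are not available yet.
    Returns the body on success, or None once max_tries is exhausted.
    """
    for attempt in range(1, max_tries + 1):
        status, body = fetch()
        if status == 200:
            return body
        # Do not sleep after the final attempt; just give up.
        if attempt < max_tries:
            sleep(delay_seconds)
    return None


# Usage with a fake config server that answers 202 twice, then 200.
responses = iter([(202, ""), (202, ""), (200, "params")])
result = wait_for_configs(lambda: next(responses), max_tries=5, sleep=lambda s: None)
```

Counting attempts rather than tracking wall-clock time keeps the loop simple, at the cost of the total wait being only approximate.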
Comment 5 Dan Radez 2012-05-23 22:32:38 EDT
one more update in e3981fd5faf8f304e096d9620628b8ff73463b9b agent version 0.4.8
Comment 6 wes hayutin 2012-05-24 09:07:21 EDT
(In reply to comment #4)
> fixed in b2ff50ae2a4a1b123b96ce18dd7b370bb2fe78bd
> this add a counter that will bail after a count of apx 10 mins

Need some doc to reflect the 10 min timeout.
Comment 7 Greg Blomquist 2012-05-24 09:46:12 EDT
Wes, should the docs to reflect 10min timeout be in the form of release notes (i.e., tech notes in BZ)?
Comment 10 Dan Macpherson 2012-05-29 01:18:25 EDT
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
A faulty application blueprint can lock instances in a boot state. For example, an application blueprint with a mistyped parameter name fails because Audrey Configuration Server is unable to find the correct runtime configuration settings. The Audrey Configuration Server uses a ten-minute timeout to bypass any errors with application blueprints. This allows the instances to boot without runtime configuration. Verify all application blueprints before launch to avoid this issue.
Comment 11 dgao 2012-06-19 13:30:03 EDT
[root@dhcp77-213 ~]# rpm -qa | grep "aeolus-audrey"
aeolus-audrey-agent-0.4.9-1.el6_2.noarch

2012-06-19 07:38:12,886 - INFO : audrey:951 Invoked CSClient.get_cs_tooling()
2012-06-19 07:38:12,929 - INFO : audrey:683 Invoked unpack_tooling()
2012-06-19 07:38:12,936 - INFO : audrey:908 Invoked CSClient.get_cs_configs()
2012-06-19 07:38:12,976 - INFO : audrey:1369 No configuration parameters provided. status: 202
.
.
.
2012-06-19 07:48:18,513 - INFO : audrey:908 Invoked CSClient.get_cs_configs()
2012-06-19 07:48:18,560 - INFO : audrey:1369 No configuration parameters provided. status: 202

Audrey agent exits after 10 mins. Verified.
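As a quick cross-check of the timing, the gap between the first and last `get_cs_configs()` calls in the log above can be computed from their timestamps. The snippet assumes only that the log uses the `YYYY-MM-DD HH:MM:SS,mmm` layout shown; the variable names are illustrative.

```python
from datetime import datetime

# Timestamp layout as it appears in the agent log (comma before milliseconds).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

first_poll = datetime.strptime("2012-06-19 07:38:12,936", LOG_TIME_FORMAT)
last_poll = datetime.strptime("2012-06-19 07:48:18,513", LOG_TIME_FORMAT)

elapsed = last_poll - first_poll
print(f"Elapsed between first and last poll: {elapsed.total_seconds():.3f} s")
```

The result is a little over 600 seconds, consistent with the approximately 10 minute bail-out.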
Comment 13 errata-xmlrpc 2012-12-04 09:58:26 EST
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-1516.html