Description of problem:
While I understand that I caused this issue myself, it seems that some up-front checking would have saved a bit of aggravation. I had a multi-instance Blueprint in which one assembly returned two parameters to another instance. Access to the parameters uses a reference to the assembly name. I mistyped a "-" instead of a "_", so the expected parameters were never actually defined. The instance that was waiting in its boot sequence for the config server to pass it those parameters then kept waiting, since the config server never had the 'correct' parameters. It seems this case could have been caught prior to launch by checking that the names and references exist in the Blueprint.

Version-Release number of selected component (if applicable):
Beta3 Cfg Server

[root@dhcp-105 aeolus-configserver]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
[root@dhcp-105 aeolus-configserver]# uname -a
Linux dhcp-105.cloud.lab.eng.bos.redhat.com 2.6.32-220.4.2.el6.x86_64 #1 SMP Mon Feb 6 16:39:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-105 aeolus-configserver]# rpm -qa aeolus-configserver
aeolus-configserver-0.4.6-0.el6.noarch
[root@dhcp-105 aeolus-configserver]# CE yum list | grep aeolus
aeolus-audrey-agent.noarch
aeolus-configserver.noarch
[root@dhcp-105 aeolus-configserver]#

How reproducible:
Was reproducible while debugging it.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
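For illustration only, the pre-launch check suggested in the description could look roughly like the sketch below. It does not use the real Blueprint format or any Aeolus API: the `provided` mapping, the `references` list, and the function name are all hypothetical stand-ins for "parameters each assembly returns" and "parameters other instances ask for".

```python
# Hypothetical sketch: none of these names come from Aeolus or Audrey.
def find_dangling_references(provided, references):
    """Return (assembly, parameter) references that nothing in the blueprint defines.

    provided   -- dict mapping assembly name -> set of parameter names it returns
    references -- list of (assembly name, parameter name) pairs used by other instances
    """
    dangling = []
    for assembly, param in references:
        # An unknown assembly name yields an empty set, so the reference is flagged.
        if param not in provided.get(assembly, set()):
            dangling.append((assembly, param))
    return dangling


# Example data reproducing the "-" versus "_" typo from the description.
provided = {"db_server": {"hostname", "port"}}
references = [("db_server", "hostname"), ("db-server", "port")]

print(find_dangling_references(provided, references))
# prints [('db-server', 'port')]
```

A launch tool could refuse to start the deployment whenever this list is non-empty, which would surface the typo before any instance begins waiting on the config server.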
Multi-instance is not supported in 1.0; kicking off.
Greg, we are considering this for Z -- how hard is it to fix?
The zstream-able fix here is to force audrey-agent to bail after X number of tries at resolving parameters. Dan Radez claims that this is a low LOE in the 1.0 codebase for Audrey. I'll trust his claim, since he has a much better grasp of that code than I do. Assigned to Dan to patch this for zstream.
Fixed in b2ff50ae2a4a1b123b96ce18dd7b370bb2fe78bd. This adds a counter that will bail after a count of approximately 10 minutes.
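As a rough sketch of the "bail after a count" idea (not the code from the commit above), a bounded polling loop might look like this. The function name, the `fetch` callable, the 200-means-ready convention, and the 120 tries at 5 seconds each are all assumptions chosen so the total comes to about 10 minutes; the actual agent's try count and delay are not stated in this report.

```python
import time


# Hypothetical sketch: names, status convention, and timing values are assumptions.
def poll_for_configs(fetch, max_tries=120, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch() until it reports ready or max_tries is exhausted.

    fetch must return a (status, body) tuple; status 200 is treated as ready,
    anything else (for example 202) as "not ready yet".
    Returns the body on success, or None once the counter runs out.
    """
    for attempt in range(1, max_tries + 1):
        status, body = fetch()
        if status == 200:
            return body
        if attempt < max_tries:
            # Wait before the next try; skipped after the final attempt.
            sleep(delay_seconds)
    # Counter exhausted: give up so the caller can continue without the configs.
    return None
```

Counting attempts rather than comparing wall-clock times keeps the loop simple and makes the upper bound easy to test by passing a no-op `sleep`.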
One more update in e3981fd5faf8f304e096d9620628b8ff73463b9b, agent version 0.4.8.
(In reply to comment #4)
> fixed in b2ff50ae2a4a1b123b96ce18dd7b370bb2fe78bd
> this add a counter that will bail after a count of apx 10 mins

Need some doc to reflect the 10 min timeout.
Wes, should the docs reflecting the 10-minute timeout be in the form of release notes (i.e., tech notes in BZ)?
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
A faulty application blueprint can lock instances in a boot state. For example, an application blueprint with a mistyped parameter name fails because the Audrey Configuration Server is unable to find the correct runtime configuration settings. The Audrey Configuration Server uses a ten-minute timeout to bypass any errors with application blueprints. This allows the instances to boot without runtime configuration. Verify all application blueprints before launch to avoid this issue.
[root@dhcp77-213 ~]# rpm -qa | grep "aeolus-audrey"
aeolus-audrey-agent-0.4.9-1.el6_2.noarch

2012-06-19 07:38:12,886 - INFO : audrey:951 Invoked CSClient.get_cs_tooling()
2012-06-19 07:38:12,929 - INFO : audrey:683 Invoked unpack_tooling()
2012-06-19 07:38:12,936 - INFO : audrey:908 Invoked CSClient.get_cs_configs()
2012-06-19 07:38:12,976 - INFO : audrey:1369 No configuration parameters provided. status: 202
.
.
.
2012-06-19 07:48:18,513 - INFO : audrey:908 Invoked CSClient.get_cs_configs()
2012-06-19 07:48:18,560 - INFO : audrey:1369 No configuration parameters provided. status: 202

Audrey agent exits after 10 mins. Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-1516.html