Bug 801861 - Parameter mismatch in returned parameters caused deadlock
Status: CLOSED ERRATA
Product: CloudForms Cloud Engine
Classification: Red Hat
Component: aeolus-audrey-agent
Version: 1.0.0
Hardware: Unspecified Unspecified
Priority: unspecified Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Dan Radez
QA Contact: Rehana
Keywords: Triaged, ZStream
Depends On:
Blocks: 824904
Reported: 2012-03-09 11:51 EST by Steve Reichard
Modified: 2012-12-04 09:58 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
A faulty Application Blueprint locked instances in a boot state. For example, an Application Blueprint with a mistyped parameter name failed because Audrey Configuration Server could not find the correct runtime configuration settings. This bug fix enables Audrey Configuration Server to use a ten-minute timeout to bypass any errors with Application Blueprints, which allows the instances to boot without runtime configuration. Verify all Application Blueprints before launch to avoid this issue.
Story Points: ---
Clone Of:
Clones: 824904
Environment:
Last Closed: 2012-12-04 09:58:26 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
dradez: needinfo+


Attachments: None
Description Steve Reichard 2012-03-09 11:51:49 EST
Description of problem:

While I understand that I caused this issue myself, it seems that some up-front checking would have saved a bit of aggravation.

I had a multi-instance Blueprint in which one assembly returned two parameters to another instance. Access to the parameters uses a reference to the assembly name. I mistyped a "-" instead of a "_", so the expected parameters were never actually defined. The instance that was waiting in the boot sequence for config server to pass it the parameters then waited indefinitely, since config server never had the 'correct' parameters.

It seems this case could have been caught prior to launch by checking that the names and references exist in the Blueprint.
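
A minimal sketch of the kind of pre-launch check suggested above. It does not parse the real Blueprint format; the two inputs (a mapping of assembly names to the parameters they return, and a list of references) and the names "db_server", "hostname" and "port" are hypothetical stand-ins for whatever the Blueprint actually contains.

```python
def find_unresolved_references(provided, references):
    """Return the references that no assembly actually provides.

    provided:   dict mapping assembly name -> set of returned parameter names
    references: iterable of (assembly_name, parameter_name) pairs
    """
    unresolved = []
    for assembly, param in references:
        # An unknown assembly name behaves like one that returns nothing.
        if param not in provided.get(assembly, set()):
            unresolved.append((assembly, param))
    return unresolved


# Hypothetical Blueprint contents: the assembly is named "db_server",
# but one reference was typed with "-" instead of "_".
provided = {"db_server": {"hostname", "port"}}
references = [("db_server", "hostname"), ("db-server", "port")]

unresolved = find_unresolved_references(provided, references)
for assembly, param in unresolved:
    print(f"unresolved reference: {assembly}.{param}")
```

Run against the mistyped example, this reports the "db-server" reference before anything is launched, rather than leaving an instance waiting at boot.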


Version-Release number of selected component (if applicable):


Beta3


Cfg Server

[root@dhcp-105 aeolus-configserver]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.2 (Santiago)
[root@dhcp-105 aeolus-configserver]# uname -a
Linux dhcp-105.cloud.lab.eng.bos.redhat.com 2.6.32-220.4.2.el6.x86_64 #1 SMP Mon Feb 6 16:39:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-105 aeolus-configserver]# rpm -qa aeolus-configserver
aeolus-configserver-0.4.6-0.el6.noarch
[root@dhcp-105 aeolus-configserver]# 

CE

yum list | grep aeolus
aeolus-audrey-agent.noarch
aeolus-configserver.noarch
[root@dhcp-105 aeolus-configserver]# 


How reproducible:

Was reproducible while debugging it.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 1 wes hayutin 2012-03-12 16:02:28 EDT
Multi-instance is not supported in 1.0; kicking this off.
Comment 2 Hugh Brock 2012-05-08 13:02:15 EDT
Greg, we are considering this for Z -- how hard is it to fix?
Comment 3 Greg Blomquist 2012-05-11 10:37:55 EDT
The zstream-able fix here is to force audrey-agent to bail after X number of tries at resolving parameters.

Dan Radez claims that this is a low LOE in the 1.0 codebase for Audrey.  I'll trust his claim, since he has a much better grasp of that code than I do.

Assigned to Dan to patch this for zstream.
Comment 4 Dan Radez 2012-05-23 22:21:37 EDT
fixed in b2ff50ae2a4a1b123b96ce18dd7b370bb2fe78bd
This adds a counter that will bail after a count of approx. 10 mins.
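
For readers without access to the commit, the general shape of such a counter might look like the sketch below. This is not the code from the commit above: the function name, the `fetch` callable, and the 120 tries at 5-second intervals (which add up to roughly 10 minutes) are all assumptions made for illustration. Treating status 202 as "no parameters yet" is inferred from the agent log in this report.

```python
import time


def wait_for_configs(fetch, max_tries=120, interval=5.0, sleep=time.sleep):
    """Poll fetch() until configs arrive or the try budget runs out.

    fetch must return a (status, body) pair. Status 202 is treated as
    "nothing available yet" (assumption based on the agent log).
    Returns the body on any other status, or None after max_tries polls.
    """
    for attempt in range(max_tries):
        status, body = fetch()
        if status != 202:
            return body
        if attempt < max_tries - 1:
            sleep(interval)  # no point sleeping after the final try
    return None


# Stub that never has parameters, standing in for the mistyped-name case.
polls = []


def always_pending():
    polls.append(1)
    return 202, None


result = wait_for_configs(always_pending, max_tries=3, interval=0,
                          sleep=lambda seconds: None)
print(result, len(polls))  # prints: None 3
```

The point of the bound is that a caller receiving None can continue booting without runtime configuration instead of blocking forever.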
Comment 5 Dan Radez 2012-05-23 22:32:38 EDT
one more update in e3981fd5faf8f304e096d9620628b8ff73463b9b
agent version 0.4.8
Comment 6 wes hayutin 2012-05-24 09:07:21 EDT
(In reply to comment #4)
> fixed in b2ff50ae2a4a1b123b96ce18dd7b370bb2fe78bd
> this add a counter that will bail after a count of apx 10 mins

Need some doc to reflect the 10 min timeout.
Comment 7 Greg Blomquist 2012-05-24 09:46:12 EDT
Wes, should the docs to reflect 10min timeout be in the form of release notes (i.e., tech notes in BZ)?
Comment 10 Dan Macpherson 2012-05-29 01:18:25 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A faulty application blueprint can lock instances in a boot state. For example, an application blueprint with a mistyped parameter name fails because Audrey Configuration Server cannot find the correct runtime configuration settings. The Audrey Configuration Server uses a ten-minute timeout to bypass any errors with application blueprints, which allows the instances to boot without runtime configuration. Verify all application blueprints before launch to avoid this issue.
Comment 11 dgao 2012-06-19 13:30:03 EDT
[root@dhcp77-213 ~]# rpm -qa | grep "aeolus-audrey"
aeolus-audrey-agent-0.4.9-1.el6_2.noarch


2012-06-19 07:38:12,886 - INFO    : audrey:951 Invoked CSClient.get_cs_tooling()
2012-06-19 07:38:12,929 - INFO    : audrey:683 Invoked unpack_tooling()
2012-06-19 07:38:12,936 - INFO    : audrey:908 Invoked CSClient.get_cs_configs()
2012-06-19 07:38:12,976 - INFO    : audrey:1369 No configuration parameters provided. status: 202
.
.
.

2012-06-19 07:48:18,513 - INFO    : audrey:908 Invoked CSClient.get_cs_configs()
2012-06-19 07:48:18,560 - INFO    : audrey:1369 No configuration parameters provided. status: 202

Audrey agent exits after 10 mins. Verified.
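
As a quick cross-check of the 10-minute figure, the elapsed time between the first and last get_cs_configs() calls shown in the log can be computed from their timestamps. The format string below is an assumption about the log's timestamp layout, based only on the lines quoted here.

```python
from datetime import datetime

# Timestamp layout assumed from the quoted log lines (comma before millis).
FMT = "%Y-%m-%d %H:%M:%S,%f"

first = datetime.strptime("2012-06-19 07:38:12,936", FMT)
last = datetime.strptime("2012-06-19 07:48:18,513", FMT)

elapsed = last - first
print(f"{elapsed.total_seconds():.3f} seconds")  # a little over 600 seconds
```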
Comment 13 errata-xmlrpc 2012-12-04 09:58:26 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-1516.html
