Bug 733516 - support for proposed Aviary endpoint lookup feature
Summary: support for proposed Aviary endpoint lookup feature
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: cumin
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: 2.3
Target Release: ---
Assignee: Trevor McKay
QA Contact: Stanislav Graf
URL:
Whiteboard:
Depends On: 733515
Blocks: 865863
Reported: 2011-08-25 20:43 UTC by Pete MacKinnon
Modified: 2013-03-06 18:39 UTC (History)

Fixed In Version: cumin-0.1.5338-1
Doc Type: Enhancement
Doc Text:
Cause
    The release of CuminAviary as a Technology Preview feature required manual configuration of Aviary endpoints in the /etc/cumin/cumin.conf file.
Consequence
    In larger scale deployments manual configuration of Aviary endpoints in Cumin presents a maintenance burden.
Change
    Cumin has been extended to use the Aviary locator service. This service allows Cumin to discover all of the Aviary endpoints in the pool by querying a service at a single well-known URL. Use of the locator service is off by default but may be enabled in /etc/cumin/cumin.conf. See the Management Console Installation Guide for more information.
Result
    Enabling the locator service relieves the maintenance burden in deployments with multiple Aviary services.
Clone Of:
Environment:
Last Closed: 2013-03-06 18:39:00 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2013:0564 normal SHIPPED_LIVE Low: Red Hat Enterprise MRG Grid 2.3 security update 2013-03-06 23:37:09 UTC
Red Hat Bugzilla 733515 None None None Never

Internal Links: 733515

Description Pete MacKinnon 2011-08-25 20:43:21 UTC
Changes in cumin to obtain Aviary endpoints from either QMF or a new remote service.

Comment 3 Martin Kudlej 2012-04-04 15:58:52 UTC
How can we test this, please? Is there any change in cumin installation or interface?

Comment 4 Trevor McKay 2012-04-05 15:39:48 UTC
There will be changes in the cumin.conf file to inform Cumin of where the Aviary lookup service can be found.

Testing will involve

1) Set up an installation of cumin/qpid/condor, with condor-aviary

2) Modify the standard config file to point at the aviary locator (TBD how to do this)

3) Submit jobs (preferably to multiple different schedulers), drill into submissions for job lists, drill into jobs for detailed descriptions, edit job attributes, hold/release/remove jobs -- all Aviary operations.  If the locator integration is working correctly, this will all work.

4) Probably need to repeat for ssl and non-ssl Aviary configuration

(In reply to comment #3)
> How can we test this, please? Is there any change in cumin installation or
> interface?

Comment 5 Trevor McKay 2012-04-18 15:21:07 UTC
Fixed in revision 5311.

More information on testing to come shortly.

Comment 6 Trevor McKay 2012-04-26 19:48:17 UTC
Out of the box configuration for Cumin will match condor-aviary defaults for the query service, job service, and locator service -- that is, Aviary is active for both job and query services, using well-known ports, no locator service and no SSL.  Cumin assumes that condor-aviary WILL be installed with condor and that Cumin is installed on the CM with condor.

Turning the locator use on in Cumin requires setting the "aviary-locator" parameter in /etc/cumin/cumin.conf.  A portion of the standard cumin config file is shown below.  Use of the parameters is documented in the comments.  When Cumin is on the CM, just uncomment the #aviary-locator line below.

The corresponding parameter in Condor to turn on the locator service is AVIARY_PUBLISH_LOCATION in /etc/condor/config.d/61aviary.config

# ****************************************************
# Aviary interface to condor

# The value for this parameter is a comma separated list of URLs for Aviary
# job servers. If the Aviary locator is used, this value will be overridden
# but must still be non-empty to enable use of Aviary job servers.
# Default value is shown. Uncomment and leave the value blank to disable.
# aviary-job-servers: http://localhost:9090

# The value for this parameter is a comma separated list of URLs for Aviary
# query servers. If the Aviary locator is used, this value will be overridden
# but must still be non-empty to enable use of Aviary query servers.
# Default value is shown. Uncomment and leave the value blank to disable.
# aviary-query-servers: http://localhost:9091

# The locator allows Cumin to retrieve values for Aviary job servers and
# Aviary query servers automatically.  If the Aviary locator is enabled the
# values for aviary-job-servers and aviary-query-servers will be overridden
# (but those parameters must still be non-empty to be enabled).
# Default is empty string (aviary locator will not be used). Uncomment the 
# following line and edit as needed to enable.
# aviary-locator: http://localhost:9000
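
The rules described in those config comments can be summarized in a small sketch. This is an illustration only, not Cumin's actual code: the helper names parse_conf and resolve_aviary_settings and the shape of the returned dictionary are hypothetical. The parameter names and default values are the ones shown in the config excerpt above.

```python
# Illustrative only: models the rules described in the cumin.conf comments.
# parse_conf and resolve_aviary_settings are hypothetical helpers, not part of Cumin.

DEFAULTS = {
    "aviary-job-servers": "http://localhost:9090",
    "aviary-query-servers": "http://localhost:9091",
    "aviary-locator": "",
}


def parse_conf(text):
    """Collect 'name: value' pairs, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        settings[name.strip()] = value.strip()
    return settings


def resolve_aviary_settings(text):
    """Return where each Aviary interface gets its endpoints (None = disabled)."""
    conf = dict(DEFAULTS)
    conf.update(parse_conf(text))
    use_locator = bool(conf["aviary-locator"])
    result = {"locator": conf["aviary-locator"] or None}
    for key, name in (("job", "aviary-job-servers"),
                      ("query", "aviary-query-servers")):
        if not conf[name]:
            result[key] = None          # blank value disables the interface
        elif use_locator:
            result[key] = "locator"     # configured value is overridden
        else:
            result[key] = conf[name]    # use the configured URL list as-is
    return result
```

With an untouched config file (every line commented out) this yields the two well-known URLs and no locator. Uncommenting aviary-locator switches both interfaces to "locator", and blanking a server list disables that interface whether or not the locator is set.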

Comment 7 Trevor McKay 2012-04-26 20:31:49 UTC
Testing that out of the box configuration uses Aviary with well-known ports and that configuration parameters operate as expected:

1) Use a standard installation of condor, condor-qmf, condor-aviary, qpid-cpp-server, and cumin all on a single host.  Set up SASL cumin user, etc.

2) Start qpid, condor, and cumin

3) Examine /var/log/cumin/web.log for the following lines.  These log entries verify that the locator is off and the job and query services are both being used.

4073 2012-04-26 12:42:11,449 INFO AviaryOperations: no root certificate file specified, using client validation only for ssl connections.
4073 2012-04-26 12:42:11,450 INFO Disabled Aviary locator interface
4073 2012-04-26 12:42:11,450 INFO Enabled Aviary interface for job submission and control.
4073 2012-04-26 12:42:11,451 INFO Enabled Aviary interface for query operations.

4) Create submissions from Cumin and drill into submissions and jobs to verify that the service is working.

5) To verify that Aviary is running with the standard well known ports out of the box, you can visit these URLs on the CM:

http://localhost:9090/services
http://localhost:9091/services

6) To test error handling, stop the condor service and change the well-known ports for the job service and query service in /etc/condor/config.d/61aviary.config to something else:

SCHEDD.HTTP_PORT = 9093
QUERY_SERVER.HTTP_PORT = 9094

7) Restart condor, wait for objects to appear in Cumin.

8) Try to drill into a Submission.  Operation will fail:

There are no jobs
[hide details]
The call status is Trouble reaching host HOSTNAME, (111,'Connection refused')

9) Try to submit a job.  Operation will fail:
Submit job 'cat': Failed (Trouble reaching host HOSTNAME, (111,'Connection refused'))

10) Test that Cumin config parameters are working to specify well-known ports.  Edit /etc/cumin/cumin.conf and change aviary settings to match the modified condor config:

aviary-job-servers: http://localhost:9093
aviary-query-servers: http://localhost:9094

11) Restart Cumin

12) Drilling into Submissions and submitting jobs should work again

13) Test that well known ports are not used by Cumin when the locator parameter is turned on.  Edit /etc/cumin/cumin.conf and uncomment the aviary-locator setting.

aviary-locator: http://localhost:9000

14) Restart cumin.

15) Try to drill into a job.  Result will be:

Cannot locate query service on HOSTNAME via aviary locator.

16) Try to submit a job.  Result will be:

Submit job 'cat':  Failed (Cannot locate job service on HOSTNAME via aviary locator)

17) Test that use of Aviary in Cumin will be disabled if config parameters are set to blank values.  Edit /etc/cumin/cumin.conf and change parameters as follows:

aviary-job-servers:
aviary-query-servers:

18) Restart cumin

19) Examine /var/log/cumin/web.log:

9127 2012-04-26 16:29:18,497 INFO Disabled Aviary locator interface
9127 2012-04-26 16:29:18,497 INFO Disabled Aviary interface for job submission and control.
9127 2012-04-26 16:29:18,498 INFO Disabled Aviary interface for query operations.


20) Submit jobs and drill into submissions.  Operations will work (QMF is being used)
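
The web.log checks in steps 3 and 19 can be scripted. The following is a minimal sketch, not a Cumin tool: the function name aviary_state is hypothetical, and the patterns are copied from the log lines quoted above. One assumption to note: the "Enabled Aviary locator interface" wording is inferred from the "Disabled" line and does not appear in the logs shown here.

```python
import re

# Patterns taken from the web.log lines quoted in steps 3 and 19.
# The "Enabled ... locator" variant is an assumption (only "Disabled" is quoted).
PATTERNS = {
    "locator": re.compile(r"(Enabled|Disabled) Aviary locator interface"),
    "job": re.compile(
        r"(Enabled|Disabled) Aviary interface for job submission and control"),
    "query": re.compile(
        r"(Enabled|Disabled) Aviary interface for query operations"),
}


def aviary_state(log_text):
    """Return the last reported state of each Aviary interface.

    The result maps 'locator', 'job' and 'query' to True (enabled) or
    False (disabled); an interface never mentioned in the log is absent.
    """
    state = {}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                state[name] = match.group(1) == "Enabled"
    return state
```

To use it, read /var/log/cumin/web.log after restarting Cumin and pass the text to aviary_state. Later lines win, so the result reflects the most recent startup.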

Comment 8 Trevor McKay 2012-04-27 01:49:29 UTC
Note, the suds logs are another resource for testing.  They show the requests and responses to the aviary servers, including the URLs with port numbers.  Either suds.client.log or suds.transport.log in /var/log/cumin should work for this; Cumin does not otherwise log the endpoints.  Turn on suds logging with this in cumin.conf:

aviary-suds-logs: True

Testing with the locator (using the same set up from above)

1) Edit /etc/condor/config.d/61aviary.config and set AVIARY_PUBLISH_LOCATION

AVIARY_PUBLISH_LOCATION = True

2) Restart condor

3) Optionally use /usr/share/condor/aviary/locator.py to check that the job service and query service are up and using the locator:

# cd /usr/share/condor/aviary
# export PYTHONPATH=`pwd`/module:$PYTHONPATH
# ./locator.py --type=ANY
CUSTOM | QUERY_SERVER | query@hostname | http://hostname:41760/services/query/
SCHEDULER | JOB | job@hostname | http://hostname:56696/services/job/

4) Edit /etc/cumin/cumin.conf.  Make sure that aviary-locator is set and that aviary-job-servers and aviary-query-servers are non-blank (value doesn't matter if locator is on)

aviary-job-servers: yeah I want this
aviary-query-servers: and this too
aviary-locator: http://localhost:9000

5) Restart Cumin

6) If aviary-suds-logs has been turned on, this is a nice way to see the ports being used:

# tail -f /var/log/cumin/suds.client.log | grep "sending to"

7) Create some Submissions and drill into jobs, hold, release etc.  Operations should work.  Output from the tail command will look something like this:

sending to (http://hostname:41760/services/query/getSubmissionSummary)
sending to (http://hostname:41760/services/query/getSubmissionSummary)
sending to (http://hostname:56696/services/job/submitJob)
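
As an alternative to reading the tail output by eye, the "sending to" lines can be grouped by service. This is a rough sketch under stated assumptions: the function name endpoints_seen is hypothetical, and it assumes every URL follows the /services/<name>/<operation> layout seen in the sample output above.

```python
import re
from urllib.parse import urlparse

# Matches the "sending to (URL)" fragments shown in the suds.client.log samples.
SENDING = re.compile(r"sending to \((\S+)\)")


def endpoints_seen(log_text):
    """Map each service name (e.g. 'query', 'job') to the set of host:port pairs used."""
    seen = {}
    for match in SENDING.finditer(log_text):
        url = urlparse(match.group(1))
        # e.g. ['services', 'query', 'getSubmissionSummary']
        parts = url.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] == "services":
            seen.setdefault(parts[1], set()).add(url.netloc)
    return seen
```

With the locator in use, the ports reported for the query and job services should match what locator.py prints rather than the well-known 9090 and 9091.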

Testing that Cumin will query the locator for a new endpoint if the endpoint moves:

8) Turn on debug level logging in cumin.conf:

log-level: debug

9) For convenience, kill cumin-web and let the master restart it, so that only cumin-web is restarted

# pkill cumin-web

10) Submit, drill into submissions, etc to establish endpoints and see Cumin working

11) Stop the condor query server

# condor_off -daemon QUERY_SERVER

12) Try to drill into a submission in Cumin.  The operation will fail:

Cannot locate query service on HOSTNAME via aviary locator.

13) /var/log/cumin/web.log will contain lines like these.  The failed operation causes Cumin to contact the locator and look for a new endpoint before retrying the operation:

10594 2012-04-26 21:15:29,227 DEBUG AviaryOperations: refresh server list for CUSTOM QUERY_SERVER
10594 2012-04-26 21:15:29,444 INFO AviaryOperations: locator returned no endpoints for CUSTOM QUERY_SERVER
10594 2012-04-26 21:15:29,446 INFO AviaryOperations: failed to locate query service on hostname

14)  The suds.client.log will also show Cumin trying to contact the locator:

sending to (http://localhost:9000/services/locator/locate)

15) Restart the condor query server

# condor_on -daemon QUERY_SERVER

16) Try to drill into a submission again in Cumin. The web.log and suds.client.log should show Cumin contacting the locator.  The suds.client.log should show Cumin sending to a new endpoint for the query server, and the operation should succeed.

17) Steps analogous to 11-16 can be done for the job service as well.  Submit a job to make sure the job service is running.

18) Use condor_restart -schedd to cause the job service to restart with a new endpoint.  /usr/share/condor/aviary/locator.py can be used to verify when the job server comes back.

19) Submit another job, or hold/release an existing one.  Operation should succeed.

20) /var/log/cumin/web.log will show Cumin contacting the locator for a new endpoint

11085 2012-04-26 21:38:24,256 DEBUG AviaryOperations: refresh server list for SCHEDULER JOB

21) /var/log/cumin/suds.client.log will show Cumin contacting the locator and sending to the new endpoint:

sending to (http://localhost:9000/services/locator/locate)
sending to (http://hostname:58630/services/job/holdJob)
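
The behavior exercised in steps 11-21 (a failed call triggers a locator lookup, then a retry against the new endpoint) can be sketched as follows. This is not Cumin's implementation. EndpointCache, call_with_relocate, locate and invoke are hypothetical names, and Python's built-in ConnectionError stands in for whatever transport error Cumin actually handles.

```python
# Sketch only: hypothetical names, not Cumin's AviaryOperations code.


class EndpointCache:
    """Remembers one endpoint per service and refreshes it via a locator callback."""

    def __init__(self, locate):
        # locate: callable taking a service name, returning a URL or None
        self._locate = locate
        self._endpoints = {}

    def get(self, service, refresh=False):
        """Return the cached endpoint, asking the locator if needed or requested."""
        if refresh or service not in self._endpoints:
            url = self._locate(service)
            if url is None:
                self._endpoints.pop(service, None)
                raise LookupError(
                    "Cannot locate %s service via locator" % service)
            self._endpoints[service] = url
        return self._endpoints[service]


def call_with_relocate(cache, service, invoke):
    """Call invoke(url); on a connection failure, re-query the locator and retry once."""
    try:
        return invoke(cache.get(service))
    except ConnectionError:
        # The cached endpoint is stale (for example, the daemon restarted on
        # a new port), so refresh from the locator before the single retry.
        return invoke(cache.get(service, refresh=True))
```

If the locator has no endpoint for the service at retry time, the lookup raises instead of retrying, which corresponds to the "Cannot locate query service" failure in step 12.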

Comment 9 Trevor McKay 2012-05-04 19:23:35 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
    The initial release of CuminAviary as a Technology Preview feature required manual configuration of Aviary endpoints in the /etc/cumin/cumin.conf file.

Consequence
    In larger scale deployments with multiple condor schedulers using Aviary services, manual configuration of the endpoints in Cumin presents a maintenance burden.

Change
    Cumin has been extended to use the Aviary locator service.  This service allows Cumin to discover all of the Aviary endpoints in the pool by querying a service at a single well-known URL.  Use of the locator service is off by default but may be enabled in /etc/cumin/cumin.conf.  See the Management Console Installation Guide for more information.

Result
    Enabling the locator service relieves the maintenance burden in deployments with multiple Aviary services.

Comment 15 Stanislav Graf 2013-01-28 07:54:10 UTC
(In reply to comment #7) 

All works as expected (cumin-0.1.5658-1)

Comment 16 Stanislav Graf 2013-01-28 09:14:27 UTC
(In reply to comment #8)

All works as expected (cumin-0.1.5658-1)

Comment 17 Stanislav Graf 2013-02-11 14:39:22 UTC
SSL testing done. (cumin-0.1.5675-1)

--> VERIFIED on RHEL 5/6 i386/x86_64

Comment 19 errata-xmlrpc 2013-03-06 18:39:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html

