Bug 1387344

Summary: sahara-api services not running after Liberty to Mitaka Upgrade
Product: Red Hat OpenStack
Reporter: Randy Perryman <randy_perryman>
Component: openstack-tripleo-heat-templates
Assignee: Marios Andreou <mandreou>
Status: CLOSED NOTABUG
QA Contact: mlammon
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.0 (Liberty)
CC: achernet, arkady_kanevsky, david_paterson, egafford, jcoufal, jschluet, kasmith, ltoscano, mandreou, mburns, morazi, ohochman, randy_perryman, rhel-osp-director-maint, sathlang, sumedh_sathaye, wayne_allen
Target Milestone: ---
Flags: mandreou: needinfo-
Target Release: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1381628
Environment:
Last Closed: 2016-12-02 17:54:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1381628
Bug Blocks: 1305654, 1337794
Attachments:
Description                      Flags
Network Environment Files Used   none
Dell Specific Environment        none
Config File                      none
api-paste                        none

Description Randy Perryman 2016-10-20 16:31:54 UTC
+++ This bug was initially created as a clone of Bug #1381628 +++

Description of problem:
As described in the upstream bug at https://bugs.launchpad.net/tripleo/+bug/1630247, Sahara services are enabled by default in OSP9 but disabled by default in OSP10.

This bug is to track the related changes for the downstream package builds (see upstream bug for details on reviews)

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from marios on 2016-10-06 13:04:22 EDT ---

Adding a note here on request:

For the duration of the OSP9 to OSP10 upgrade we handle this special case and default to keeping the existing sahara services, both after the controller upgrade step and after the converge step. This is done with the review at [0] which has now landed into stable/newton.

However, *any* subsequent stack update operations will *have* to include the existing -e 'services/sahara.yaml' [1] environment file as part of the deployment command. Not doing so would mean that none of the Sahara-related configuration would be applied, since by default Newton specifies that sahara services map to 'OS::Heat::None' [2]

If the operator decides they do *not* want sahara after their upgrade, they need to include the provided -e 'major-upgrade-remove-sahara.yaml' environment file as part of the deployment command for the controller upgrade and converge steps. The sahara services would not be restarted after the major upgrade in this case.



[0] https://review.openstack.org/#/c/382748/
[1] https://github.com/openstack/tripleo-heat-templates/blob/master/environments/services/sahara.yaml
[2] https://github.com/openstack/tripleo-heat-templates/blob/883addf267933395c580e0eab3efc379401c946c/overcloud-resource-registry-puppet.j2.yaml#L138

--- Additional comment from marios on 2016-10-12 09:59:02 EDT ---

moved to POST as it merged into newton upstream

--- Additional comment from Randy Perryman on 2016-10-20 10:40:58 EDT ---

I am seeing the same thing with the upgrade from OSP 8 to OSP 9. Is the same fix available for this?

--- Additional comment from Randy Perryman on 2016-10-20 12:30:00 EDT ---

What rpm supplies the -e 'major-upgrade-remove-sahara.yaml'?  

I have installed the following:

openstack-tripleo-heat-templates-2.0.0-34.el7ost.noarch

Comment 1 Randy Perryman 2016-10-20 16:33:14 UTC
I have just completed the Liberty to Mitaka upgrade and find that Sahara is now installed and trying to run, but the service is not configured and should not be installed.

Comment 2 Marios Andreou 2016-10-21 08:17:19 UTC
Hi Randy... indeed the Sahara service was new for Mitaka. It *should* be configured with reasonable defaults (i.e. even if you aren't setting any of the Sahara config explicitly). For example, a grep for 'sahara' on the current stable/mitaka branch of tripleo-heat-templates (https://github.com/openstack/tripleo-heat-templates/tree/stable/mitaka) shows:


        ./puppet/hieradata/database.yaml:72:sahara::db::mysql::user: sahara
        ./puppet/hieradata/database.yaml:73:sahara::db::mysql::host: "%{hiera('mysql_virtual_ip')}"
        ./puppet/controller.yaml:1310:        sahara_password: {get_param: SaharaPassword}
        ./puppet/controller.yaml:1311:        sahara_dsn:
        ./puppet/controller.yaml-1312-          list_join:

        ./puppet/controller.yaml:1735:                sahara::admin_password: {get_input: sahara_password}
        ./puppet/controller.yaml:1736:                sahara::auth_uri: {get_input: keystone_auth_uri}
        ./puppet/controller.yaml:1737:                sahara::admin_user: sahara
        ./puppet/controller.yaml:1738:                sahara::identity_uri: {get_input: keystone_identity_uri}
        ./puppet/controller.yaml:1739:                sahara::use_neutron: true

        ./puppet/hieradata/controller.yaml:59:sahara::admin_tenant_name: 'service'
        ./puppet/manifests/overcloud_controller_pacemaker.pp:1429:    pacemaker::constraint::base { 'sahara-api-then-sahara-engine-constraint':



So are you seeing a Sahara-related error during the Liberty to Mitaka upgrade which you believe to be related to missing configuration? It *could* be a bug and I'm trying to understand more about what is happening here. Are you sure you've used the Mitaka templates during your upgrade (e.g. Liberty templates do *not* set up config for Sahara...)?

If so, can we see a trace or more info about the error you get?

Comment 4 Luigi Toscano 2016-10-21 09:24:42 UTC
Hi,
is this a real bug or just a clone of the "9-to-10" upgrade for the "8-to-9" case?

The two scenarios are different:
9 has no composable roles, so Sahara is forcibly on. Unless Sahara is not working after the upgrade, there is not much to be done (and I would say that this should be closed).
10 has composable roles, but this is in the scope of the other bug (rhbz#1387343).

Comment 5 Luigi Toscano 2016-10-21 09:31:03 UTC
(In reply to Luigi Toscano from comment #4)
> Hi,
> is this a real bug or just a clone of the "9-to-10" upgrade for the "8-to-9"
> case?
> 
> The two scenarios are different:
> 9 has no composable roles, so Sahara is forcibly on. Unless Sahara is not
> working after the upgrade, there is not much to be done (and I would say
> that this should be closed).
> 10 has composable roles, but this is in the scope of the other bug
> (rhbz#1387343).

Correction: the main bug for 10 is the original one (1381628). I'd suggest closing this one as a duplicate of 1387343 and moving the discussion there (I still think that 1387343 should be closed too, for the reasons given above).

Comment 6 Randy Perryman 2016-10-21 12:01:16 UTC
*** Bug 1387343 has been marked as a duplicate of this bug. ***

Comment 7 Randy Perryman 2016-10-21 12:21:12 UTC
I was not expecting Sahara to be installed and configured, as I have not set up my install to support it. The reason for the bug is that sahara-api failed to start on any of my servers.

Comment 8 Randy Perryman 2016-10-21 12:22:11 UTC
 openstack-sahara-api_start_0 on overcloud-controller-1 'not running' (7): call=1433, status=complete, exitreason='none',
    last-rc-change='Fri Oct 21 12:05:11 2016', queued=0ms, exec=2118ms

Comment 10 Randy Perryman 2016-10-21 12:27:51 UTC
sosreports are attached to bug https://bugzilla.redhat.com/show_bug.cgi?id=1385143

Comment 11 Randy Perryman 2016-10-21 12:29:29 UTC
from Sahara api log:

2016-10-21 12:06:19.969 25006 INFO keystonemiddleware.auth_token [-] Starting Keystone auth_token middleware
2016-10-21 12:06:19.971 25006 WARNING keystonemiddleware.auth_token [-] Use of the auth_admin_prefix, auth_host, auth_port, auth_protocol, identity_uri, admin_token, admin_user, admin_password, and admin_tenant_name configuration options was deprecated in the Mitaka release in favor of an auth_plugin and its related options. This class may be removed in a future release.
2016-10-21 12:06:19.976 25006 INFO sahara.main [-] Driver distributed successfully loaded
2016-10-21 12:06:19.979 25006 ERROR oslo.service.wsgi [-] Could not bind to :8386
2016-10-21 12:06:19.980 25006 CRITICAL sahara [-] error: [Errno 98] Address already in use
2016-10-21 12:06:19.980 25006 ERROR sahara Traceback (most recent call last):
2016-10-21 12:06:19.980 25006 ERROR sahara   File "/usr/bin/sahara-api", line 10, in <module>
2016-10-21 12:06:19.980 25006 ERROR sahara     sys.exit(main())
2016-10-21 12:06:19.980 25006 ERROR sahara   File "/usr/lib/python2.7/site-packages/sahara/cli/sahara_api.py", line 60, in main
2016-10-21 12:06:19.980 25006 ERROR sahara     api_service = server.SaharaWSGIService("sahara-api", app)
2016-10-21 12:06:19.980 25006 ERROR sahara   File "/usr/lib/python2.7/site-packages/sahara/main.py", line 70, in __init__
2016-10-21 12:06:19.980 25006 ERROR sahara     use_ssl=sslutils.is_enabled(CONF))
2016-10-21 12:06:19.980 25006 ERROR sahara   File "/usr/lib/python2.7/site-packages/oslo_service/wsgi.py", line 115, in __init__
2016-10-21 12:06:19.980 25006 ERROR sahara     self.socket = self._get_socket(host, port, backlog)
2016-10-21 12:06:19.980 25006 ERROR sahara   File "/usr/lib/python2.7/site-packages/oslo_service/wsgi.py", line 143, in _get_socket
2016-10-21 12:06:19.980 25006 ERROR sahara     sock = eventlet.listen(bind_addr, family, backlog=backlog)
2016-10-21 12:06:19.980 25006 ERROR sahara   File "/usr/lib/python2.7/site-packages/eventlet/convenience.py", line 44, in listen
2016-10-21 12:06:19.980 25006 ERROR sahara     sock.listen(backlog)
2016-10-21 12:06:19.980 25006 ERROR sahara   File "/usr/lib64/python2.7/socket.py", line 224, in meth
2016-10-21 12:06:19.980 25006 ERROR sahara     return getattr(self._sock,name)(*args)
2016-10-21 12:06:19.980 25006 ERROR sahara error: [Errno 98] Address already in use
2016-10-21 12:06:19.980 25006 ERROR sahara
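
The [Errno 98] in the trace above is the generic socket error EADDRINUSE: something on the node already held the address and port that sahara-api tried to listen on. As a minimal sketch of that failure mode (this is not Sahara or oslo.service code; the helper name try_listen and the use of the loopback address are assumptions made for illustration), the following Python opens one listener and then attempts a second one on the same address and port:

```python
import errno
import socket


def try_listen(host, port):
    """Try to bind and listen on (host, port).

    Returns (socket, None) on success or (None, errno_code) on failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, None


# The first listener stands in for whatever already holds the port.
# Port 0 asks the kernel for any free port, so the sketch does not
# depend on 8386 being available on the machine running it.
first, first_err = try_listen("127.0.0.1", 0)
port = first.getsockname()[1]

# The second listener stands in for sahara-api starting on the same address.
second, second_err = try_listen("127.0.0.1", port)
address_in_use = second_err == errno.EADDRINUSE
print("second listener hit EADDRINUSE:", address_in_use)

first.close()
```

In the trace the error surfaces at sock.listen() rather than at bind(); the helper above catches it at either call.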

Comment 12 Randy Perryman 2016-10-21 12:31:20 UTC
haproxy configuration:


listen sahara
  bind 192.168.120.136:8386 transparent
  bind 192.168.190.125:8386 transparent
  server overcloud-controller-0 :8386 check fall 5 inter 2000 rise 2


------------
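
Note that the server line above has no IP address before :8386. A small check such as the following can flag server entries like that. This is only a sketch in Python: the function name servers_missing_address is hypothetical, and it handles just the simple "server <name> <host>:<port>" form shown in this comment, not the full HAProxy configuration grammar.

```python
def servers_missing_address(haproxy_text):
    """Return the names of `server` lines whose address has an empty host."""
    missing = []
    for line in haproxy_text.splitlines():
        fields = line.split()
        # A server line looks like: server <name> <host>:<port> [options...]
        if len(fields) >= 3 and fields[0] == "server":
            name, address = fields[1], fields[2]
            host, sep, _port = address.rpartition(":")
            # Flag only "<nothing>:<port>"; an address without a colon
            # is a bare host and is left alone.
            if sep and not host:
                missing.append(name)
    return missing


# The listen block from this comment, copied verbatim.
SAMPLE = """\
listen sahara
  bind 192.168.120.136:8386 transparent
  bind 192.168.190.125:8386 transparent
  server overcloud-controller-0 :8386 check fall 5 inter 2000 rise 2
"""

print(servers_missing_address(SAMPLE))  # prints ['overcloud-controller-0']
```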

Comment 13 Randy Perryman 2016-10-21 12:37:14 UTC
To fix, add the following to haproxy:

  bind 192.168.120.136:8386 transparent
  bind 192.168.190.125:8386 transparent
  server overcloud-controller-0 192.168.140.102:8386 check fall 5 inter 2000 rise 2
  server overcloud-controller-1 192.168.140.104:8386 check fall 5 inter 2000 rise 2
  server overcloud-controller-2 192.168.140.106:8386 check fall 5 inter 2000 rise 2
 

and in sahara.conf

host = 192.168.140.102 (IP of Server)

Comment 14 Randy Perryman 2016-10-21 12:42:03 UTC
So two items:
1. Why is sahara installed when it was not asked for?
2. Why was it not properly configured?

Comment 15 Luigi Toscano 2016-10-22 00:16:53 UTC
(In reply to Randy Perryman from comment #14)
> So two items:
> 1. Why is sahara installed when it was not asked for?

I can answer this: because OSP9 has no composable roles and Sahara is always installed and enabled. This is no longer the case with OSP10, where it is disabled by default on new installations.

> 2. Why was it not properly configured?
That's the good question.

Comment 16 David Paterson 2016-11-07 19:27:01 UTC
If Sahara is enabled by default, why isn't the upgrade working with default settings? Please pursue resolution, and if more information is needed on our end, please specify what you need.

Comment 17 Mike Burns 2016-12-01 16:58:52 UTC
any update on this bug for why it's not getting enabled successfully on upgrade?

Comment 18 Marios Andreou 2016-12-02 09:53:18 UTC
Hi folks, to be clear the description in comment #0 is wrong (and potentially confusing) for the scenario being tracked here. I am removing the external trackers above as they do not apply for this BZ (they were brought in with the cloning of BZ 1381628 which is specific to OSP9 to OSP 10 upgrade).

I think it's worth clarifying that the issue seen here is: "sahara-api is not running on any of the controller nodes after the OSP8 to OSP9 upgrade". Randy has provided a trace of the error in comment #11.

In a couple of comments, notably comment #14, Randy brings up the issue of 'why is sahara installed at all?' @Randy unfortunately prior to OSP10 it isn't possible to configure the list of services that are deployed on nodes. In OSP8 there was no Sahara. In OSP9 there is Sahara. If removal of that service is important enough (I mean other than overcoming the potential misconfiguration here) then I'd suggest filing a new BZ like "remove sahara from the installed services in OSP9 because foo" - it should be tracked as a stand-alone issue if it is to be considered and implemented.

Randy can we please have the templates you used for this deployment? Looking at the trace in comment #11 it seems the culprit is quite clearly "2016-10-21 12:06:19.980 25006 CRITICAL sahara [-] error: [Errno 98] Address already in use". I am not sure why you hit this issue when it wasn't seen in our CI/QE testing - we may well have missed something but we should start by sanity checking the values you provided on deploy (i.e. the templates and environment files you used for the upgrade process).

In the meantime I'll also follow up with the Storage and Deployment teams to have a look once those are available.

thanks, marios

Comment 20 Randy Perryman 2016-12-02 13:29:11 UTC
Marios, that install is long gone, so I do not have the templates. I do have the two templates we run for network-environment.yaml and dell-environment.yaml. The rest were all stock from the install. I will attach them. Also, could the "Address already in use" error be a by-product of the service starting, being moved under PCS, and then attempting a restart?

Comment 21 Randy Perryman 2016-12-02 13:30:12 UTC
Created attachment 1227325 [details]
Network Environment Files Used

Comment 22 Randy Perryman 2016-12-02 13:30:42 UTC
Created attachment 1227326 [details]
Dell Specific Environment

Comment 23 Randy Perryman 2016-12-02 17:12:00 UTC
Using the following openstack-tripleo-heat-templates-2.0.0-36.el7ost.noarch
----------------------
-e ~/pilot/templates/overcloud/environments/network-isolation.yaml \
-e ~/pilot/templates/overcloud/environments/storage-environment.yaml \
-e ~/pilot/templates/overcloud/environments/puppet-pacemaker.yaml \
-e ~/pilot/templates/overcloud/environments/major-upgrade-aodh.yaml \
-e ~/pilot/templates/dell-environment.yaml \
-e ~/pilot/templates/network-environment.yaml \
---------------------

The templates are copied to  ~/pilot/templates/overcloud.

Comment 24 Randy Perryman 2016-12-02 17:20:09 UTC
Created attachment 1227422 [details]
Config File

Sahara

Comment 25 Randy Perryman 2016-12-02 17:21:09 UTC
Created attachment 1227423 [details]
api-paste

Comment 26 Randy Perryman 2016-12-02 17:33:36 UTC
Just discovered I have not defined:  SaharaApiNetwork: internal_api
in my network-environment.yaml  

Is there a document that explicitly calls out all endpoints that need to be mapped?
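
One way to spot an unmapped entry such as SaharaApiNetwork is to compare the keys of a custom service-to-network map against the defaults shipped with the templates. The following is a rough Python sketch under stated assumptions: both maps are assumed to be already loaded as plain dicts, the function name missing_network_keys is hypothetical, and the sample values are illustrative only (SaharaApiNetwork: internal_api is the one entry taken from this comment).

```python
def missing_network_keys(default_map, custom_map):
    """Return keys present in default_map but absent from custom_map, sorted."""
    return sorted(set(default_map) - set(custom_map))


# Illustrative values only; a real comparison would use the full default
# map from the templates and the map from network-environment.yaml.
default_map = {
    "AodhApiNetwork": "internal_api",
    "SaharaApiNetwork": "internal_api",
}
custom_map = {
    "AodhApiNetwork": "internal_api",
}

print(missing_network_keys(default_map, custom_map))  # prints ['SaharaApiNetwork']
```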

Comment 27 Randy Perryman 2016-12-02 17:54:20 UTC
Can be closed; not a bug.