Description of problem: see upstream bug.
Adding link to review.
Hi, after the upgrade we are bitten by the deprecation of the hdp plugin. This is fixed in the template here: https://bugs.launchpad.net/tripleo/+bug/1611107. It needs to be fixed during the upgrade as well, I think. Traceback for reference:

2016-08-19 17:11:18.916 9258 ERROR sahara Traceback (most recent call last):
2016-08-19 17:11:18.916 9258 ERROR sahara   File "/usr/bin/sahara-api", line 10, in <module>
2016-08-19 17:11:18.916 9258 ERROR sahara     sys.exit(main())
2016-08-19 17:11:18.916 9258 ERROR sahara   File "/usr/lib/python2.7/site-packages/sahara/cli/sahara_api.py", line 53, in main
2016-08-19 17:11:18.916 9258 ERROR sahara     app = setup_api()
2016-08-19 17:11:18.916 9258 ERROR sahara   File "/usr/lib/python2.7/site-packages/sahara/cli/sahara_api.py", line 43, in setup_api
2016-08-19 17:11:18.916 9258 ERROR sahara     server.setup_common(possible_topdir, 'API')
2016-08-19 17:11:18.916 9258 ERROR sahara   File "/usr/lib/python2.7/site-packages/sahara/main.py", line 84, in setup_common
2016-08-19 17:11:18.916 9258 ERROR sahara     plugins_base.setup_plugins()
2016-08-19 17:11:18.916 9258 ERROR sahara   File "/usr/lib/python2.7/site-packages/sahara/plugins/base.py", line 163, in setup_plugins
2016-08-19 17:11:18.916 9258 ERROR sahara     PLUGINS = PluginManager()
2016-08-19 17:11:18.916 9258 ERROR sahara   File "/usr/lib/python2.7/site-packages/sahara/plugins/base.py", line 85, in __init__
2016-08-19 17:11:18.916 9258 ERROR sahara     self._load_cluster_plugins()
2016-08-19 17:11:18.916 9258 ERROR sahara   File "/usr/lib/python2.7/site-packages/sahara/plugins/base.py", line 111, in _load_cluster_plugins
2016-08-19 17:11:18.916 9258 ERROR sahara     ", ".join(requested_plugins - loaded_plugins))
2016-08-19 17:11:18.916 9258 ERROR sahara ConfigurationError: Plugins couldn't be loaded: hdp
2016-08-19 17:11:18.916 9258 ERROR sahara Error ID: a4a4e95a-6384-4af5-95
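The traceback means the plugins list in /etc/sahara/sahara.conf still names the removed hdp plugin. A minimal sketch of the cleanup (pure string handling; the config path and option name match upstream Sahara defaults, and strip_hdp is a made-up helper, not part of any TripleO tooling):

```shell
# Drop the removed 'hdp' entry from a comma-separated plugins value,
# e.g. the DEFAULT/plugins option in /etc/sahara/sahara.conf.
strip_hdp() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -v '^hdp$' | paste -sd, -
}

# One would then write the cleaned value back, for instance with crudini:
#   crudini --set /etc/sahara/sahara.conf DEFAULT plugins "$(strip_hdp "$old")"
```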
Even if it is fixed in the template, you still have a problem. The template value, if I'm not mistaken, is only applied when Sahara is managed by TripleO, which means you need to pass something like -e environments/services/sahara.yaml. BUT when you no longer want Sahara after the upgrade, the packages are not removed; they are upgraded to the latest version, which then tries to run with the old configuration => BOOM. That is the bug described here. So on upgrade we probably also need to remove the Sahara packages and configuration by default when they are not needed (i.e. when sahara.yaml is not specified). Or re-enable it by default. Or...?
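The decision described above ("did the operator pass sahara.yaml to the deploy?") can be sketched like this; the function name is invented for illustration and this is not actual upgrade tooling:

```shell
# Return 0 if the operator opted back into Sahara by passing the
# services/sahara.yaml environment file to `openstack overcloud deploy`.
sahara_requested() {
  for env_file in "$@"; do
    case "$env_file" in
      */environments/services/sahara.yaml) return 0 ;;
    esac
  done
  return 1
}

# An upgrade step could then remove the unmanaged packages when it
# returns 1 (package names assumed from RDO naming):
#   sahara_requested "$@" || yum -y remove 'openstack-sahara*'
```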
I was trying the suggested w/a by Tosky:

(1) added -e sahara.yaml:

[stack@undercloud72 ~]$ sudo find / -name sahara.yaml
/usr/share/openstack-tripleo-heat-templates/environments/services/sahara.yaml
[stack@undercloud72 ~]$ openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services/sahara.yaml

(2) Applied that patch: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=0a3cd4dec3dcae5f8bc94e73436c2c76069762f1

Still got:

deploy_status_code: Deployment exited with non-zero status code: 1
Stack overcloud UPDATE_FAILED
Heat Stack update failed.
(In reply to Omri Hochman from comment #5)
> I was trying the suggested w/a by Tosky

It looks like the sahara service is running after applying the patch; gnocchi is down.

2016-09-22 14:10:57.320 19813 ERROR gnocchi.cli DBError: (pymysql.err.InternalError) (1054, u"Unknown column 'metric.unit' in 'field list'") [SQL: u'SELECT metric.id AS metric_id, metric.archive_policy_name AS metric_archive_policy_name, metric.created_by_user_id AS metric_created_by_user_id, metric.created_by_project_id AS metric_created_by_project_id, metric.resource_id AS metric_resource_id, metric.name AS metric_name, metric.unit AS metric_unit, metric.status AS metric_status, archive_policy_1.name AS archive_policy_1_name, archive_policy_1.back_window AS archive_policy_1_back_window, archive_policy_1.definition AS archive_policy_1_definition, archive_policy_1.aggregation_methods AS archive_policy_1_aggregation_methods \nFROM metric LEFT OUTER JOIN archive_policy AS archive_policy_1 ON archive_policy_1.name = metric.archive_policy_name \nWHERE metric.status = %s ORDER BY metric.id ASC'] [parameters: ('delete',)]
2016-09-22 14:10:57.320 19813 ERROR gnocchi.cli
Omri, the gnocchi bug is filed and fixed here: https://bugzilla.redhat.com/show_bug.cgi?id=1378497
Deployed RHOS 9 latest, upgraded to RHOS 10 with the latest puddle (2016-11-14.1). I no longer see this issue.

[stack@undercloud-0 ~]$ ssh heat-admin.2.10
Last login: Tue Nov 15 19:04:39 2016 from gateway
[heat-admin@controller-0 ~]$ sudo -i
[root@controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Tue Nov 15 19:08:40 2016
Last change: Tue Nov 15 01:10:37 2016 by root via crm_resource on controller-0

3 nodes and 19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 ip-fd00.fd00.fd00.4000..10	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-192.0.2.6	(ocf::heartbeat:IPaddr2):	Started controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 ip-2620.52.0.13b8.5054.ff.fe3e.1	(ocf::heartbeat:IPaddr2):	Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-0 ]
     Slaves: [ controller-1 controller-2 ]
 ip-fd00.fd00.fd00.3000..10	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-fd00.fd00.fd00.2000..10	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-fd00.fd00.fd00.2000..11	(ocf::heartbeat:IPaddr2):	Started controller-2
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Apart from pcs, do you see the service running (IIRC it should be out of pacemaker now)? Can you contact it with something simple like `openstack dataprocessing plugin list`?
After checking the upgraded environment, we found that Sahara services are indeed running on all controllers (directly with systemctl, no more pacemaker), and also Sahara answers to CLI commands (starting from `openstack dataprocessing plugin list`).
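A quick way to re-run this kind of check later; pcs_has_resource is a made-up helper that just greps a captured `pcs status` dump, and the systemd unit names in the comment are assumptions from RDO packaging:

```shell
# Check whether a resource name appears in saved `pcs status` output;
# after the upgrade, sahara should NOT show up there anymore.
pcs_has_resource() {
  # $1: captured `pcs status` output, $2: resource name to look for
  printf '%s\n' "$1" | grep -qi "$2"
}

# On a live controller, the positive checks would be:
#   systemctl is-active openstack-sahara-api openstack-sahara-engine
#   openstack dataprocessing plugin list
```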
See above comments from Luigi. We confirmed this on 2016-11-16.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html