Bug 1387590
Summary: | fence_compute - Fixes for fix_plug/domain_name and nova force_down functionality. | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Marian Krcmarik <mkrcmari>
Component: | fence-agents | Assignee: | Andrew Beekhof <abeekhof>
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | 7.3 | CC: | abeekhof, cchen, cluster-maint, fdinitto, mkrcmari, mori, mschuppe, oalbrigt, snagar, ushkalim, vfarias
Target Milestone: | rc | Keywords: | ZStream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | fence-agents-4.0.11-53.el7 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1393789 1440487 | Environment: |
Last Closed: | 2017-08-01 16:10:32 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1393789, 1440487 | |
Attachments: | | |
Description
Marian Krcmarik
2016-10-21 10:40:13 UTC
For every version (2 -> 2.27) I get:

{u'version': {u'status': u'CURRENT', u'updated': u'2013-07-23T11:33:21Z', u'links': [{u'href': u'https://192.168.24.2:13774/v2.1/', u'rel': u'self'}, {u'href': u'http://docs.openstack.org/', u'type': u'text/html', u'rel': u'describedby'}], u'min_version': u'2.1', u'version': u'2.38', u'media-types': [{u'base': u'application/json', u'type': u'application/vnd.openstack.compute+json;version=2.1'}], u'id': u'v2.1'}}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/novaclient/v2/versions.py", line 104, in list
    return self._list(version_url, "versions")
  File "/usr/lib/python2.7/site-packages/novaclient/base.py", line 255, in _list
    data = body[response_key]
KeyError: 'versions'

when calling nova.versions.list()

(In reply to Andrew Beekhof from comment #3)

I am getting the same reply (traceback) from RHOSP10. It works without problems with older releases (even with some older RHOSP10 puddles). I am not sure whether it is a bug and, if so, where exactly.
The nova server now returns a dictionary with a single element whose key is "version". It used to return a dictionary with a single element called "versions", whose value was a list of version dictionaries. So I guess this is either a bug in nova or a change of behaviour?

(In reply to Marian Krcmarik from comment #4)
Maybe we could just create a new nova client instance with the specified version 2.11 (the first API version in which force_down was introduced), used only for calling nova.services.force_down(). If the call to nova.services.force_down() raises novaclient.exceptions.NotAcceptable, the fence agent would assume that force_down is not supported on that version and skip it.

(In reply to Marian Krcmarik from comment #4)

After much investigation, the reason is that this API call requires a session.
One can verify this by changing use_session to 'False' in /usr/lib/python2.7/site-packages/novaclient/shell.py:

    # Do not use Keystone session for cases with no session support. The
    # presence of auth_plugin means os_auth_system is present and is not
    # keystone.
    use_session = True

and re-running:

    nova version-list

It is possible to use the versions.list() call if the client is created as:

    from novaclient import client
    from novaclient import api_versions
    from keystoneauth1 import loading
    from novaclient.shell import OpenStackComputeShell

    shell = OpenStackComputeShell()
    parser = shell.get_base_parser([])
    (args, args_list) = parser.parse_known_args([])
    keystone_session = (
        loading.load_session_from_argparse_arguments(args))
    keystone_auth = (
        loading.load_auth_from_argparse_arguments(args))
    nova = client.Client(api_versions.APIVersion("2.0"), 'admin', None, 'admin',
                         'https://192.168.24.2:13000/v2.0',
                         session=keystone_session, auth=keystone_auth)

But that seems like it would be more fragile, not less.

Created attachment 1219127
fix
this patch appears to do the trick
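The force_down handling proposed earlier in this thread (call nova.services.force_down() on a client pinned to API version 2.11 and treat NotAcceptable as "not supported, skip it") can be sketched roughly as follows. This is only an illustration and not the contents of the attached patch. The NotAcceptable class and FakeServices below are hypothetical stand-ins for novaclient.exceptions.NotAcceptable and nova.services so that the snippet runs without novaclient installed, and the force_down(host, binary, force_down=...) call shape is an assumption about the 2.11 client.

```python
import logging


class NotAcceptable(Exception):
    """Stand-in for novaclient.exceptions.NotAcceptable (assumption)."""


def try_force_down(services, host, binary="nova-compute"):
    """Mark the compute service on `host` as down if the API allows it.

    `services` is expected to behave like `nova.services` on a client
    created for API version 2.11. Returns True if force_down was applied
    and False if the server rejected the call as unsupported.
    """
    try:
        services.force_down(host, binary, force_down=True)
        return True
    except NotAcceptable:
        # Per the proposal: assume force_down is not supported at this
        # API version, skip it, and let fencing continue.
        logging.warning("force_down not supported, skipping for %s", host)
        return False


class FakeServices(object):
    """Hypothetical test double standing in for nova.services."""

    def __init__(self, supported):
        self.supported = supported
        self.calls = []

    def force_down(self, host, binary, force_down=None):
        if not self.supported:
            raise NotAcceptable("API version 2.11 not accepted")
        self.calls.append((host, binary, force_down))
```

With a real client, `services` would be the `services` attribute of a client created with version "2.11"; any other exception is deliberately left to propagate so that genuine failures are not mistaken for an unsupported API.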
Based on testing of the build that includes the patch, all the problems appear to be solved except for one: the nova compute service remains marked as down even though the compute node is up and running again after fencing. As agreed with Andrew, I created a separate bug for that: https://bugzilla.redhat.com/show_bug.cgi?id=1394418

Hi Andrew,

The patch in comment #8 seems to have a problem. Please forgive me if I'm wrong. The version I'm using is fence-agents-compute-4.0.11-47.el7_3.2.x86_64.

The create_nova_connection() function talks to the overcloud nova when fence-nova starts, but fence-nova cannot start and fails with the following error. This is output after "pcs cluster stop --all" followed by "pcs cluster start --all":

Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ Nova connection failed. ConnectionError: ('Connection aborted.', error(111, 'Connection refused')) ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ Nova connection failed. ConnectionError: ('Connection aborted.', error(111, 'Connection refused')) ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ Couldn't obtain a supported connection to nova, tried: ['2.11', '2'] ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ Please use '-h' for usage ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ ]

So I think that when fence-nova starts, the OpenStack cluster is not yet ready to provide the nova service, or the floating IP is not ready, etc. As a result, fence_compute cannot talk to nova and fence-nova cannot start.
I confirmed that after the cluster starts up, the following script produces no error output.

    from novaclient import client

    versions = [ "2.11", "2" ]
    for version in versions:
        nova = client.Client(version,"admin","jFFG4PzWPmqUaTCVc9FEJTWkJ","admin","http://10.0.0.4:5000/v2.0")
        try:
            nova.hypervisors.list()
            print "ok"
        except Exception as e:
            print "Nova connection failed. %s: %s" % (e.__class__.__name__, e)

I tried "pcs stonith cleanup fence-nova", but it seems that the whole cluster is cleaned up, not only fence-nova. As a result, this does not help to solve the issue.

Best Regards,
Chen

Hi,

With Andrew's and Chen's support, I commented out the fail_usage() line in /sbin/fence_compute and then succeeded in starting fence-nova, with OSP8 and fence-agents-compute-4.0.11-47.el7_3.2.x86_64.

# diff -u /sbin/fence_compute.orig /sbin/fence_compute
--- /sbin/fence_compute.orig 2017-02-16 14:37:50.256058816 +0900
+++ /sbin/fence_compute 2017-02-16 14:39:24.897601432 +0900
@@ -332,7 +332,7 @@
         except Exception as e:
             logging.warning("Nova connection failed. %s: %s" % (e.__class__.__name__, e))
 
-    fail_usage("Couldn't obtain a supported connection to nova, tried: %s" % repr(versions))
+    #fail_usage("Couldn't obtain a supported connection to nova, tried: %s" % repr(versions))
 
 def define_new_opts():
     all_opt["endpoint-type"] = {

Verified based on Comment #11
fence-agents-4.0.11-51.el7

(In reply to Udi Shkalim from comment #16)
> Verified based on Comment #11
> fence-agents-4.0.11-51.el7

Udi, we may need to create an additional test, as we didn't notice that the agent breaks when nova isn't up.

Moving back to modified :-(

This is the patch we want... We still want to know, but it shouldn't be fatal on its own. All uses of nova include a check for it being set first.
diff --git a/fence/agents/compute/fence_compute.py b/fence/agents/compute/fence_compute.py
index 0a238b6..bc4cb5b 100644
--- a/fence/agents/compute/fence_compute.py
+++ b/fence/agents/compute/fence_compute.py
@@ -329,7 +329,7 @@ def create_nova_connection(options):
         except Exception as e:
             logging.warning("Nova connection failed. %s: %s" % (e.__class__.__name__, e))
 
-    fail_usage("Couldn't obtain a supported connection to nova, tried: %s" % repr(versions))
+    logging.warning("Couldn't obtain a supported connection to nova, tried: %s\n" % repr(versions))
 
 def define_new_opts():
     all_opt["endpoint-type"] = {

(In reply to Andrew Beekhof from comment #18)
> This is the patch we want...

New build with the new patch.

*** Bug 1430393 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1874
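The behaviour the final patch aims for, where a failed nova connection is logged but is not fatal and callers check whether nova is set before using it, can be sketched as below. This is an illustration under stated assumptions, not the agent's actual code: connect_to_nova, make_client, and the FakeNova classes are hypothetical names introduced here so that the snippet runs without novaclient or a live cloud.

```python
import logging


def connect_to_nova(make_client, versions=("2.11", "2")):
    """Return the first client that answers a probe call, else None.

    `make_client` is a hypothetical factory taking an API version string;
    in the agent this role is played by novaclient's client.Client(...).
    """
    for version in versions:
        try:
            nova = make_client(version)
            nova.hypervisors.list()  # probe call used in this thread
            return nova
        except Exception as e:
            logging.warning("Nova connection failed. %s: %s",
                            e.__class__.__name__, e)
    # Not fatal: warn, and let callers check for None before using nova.
    logging.warning("Couldn't obtain a supported connection to nova, "
                    "tried: %s", list(versions))
    return None


class _FakeHypervisors(object):
    """Hypothetical stand-in for nova.hypervisors."""

    def __init__(self, up):
        self.up = up

    def list(self):
        if not self.up:
            raise IOError("Connection refused")
        return []


class FakeNova(object):
    """Hypothetical stand-in for a nova client."""

    def __init__(self, version, up):
        self.version = version
        self.hypervisors = _FakeHypervisors(up)
```

A caller would then guard each use, for example `if nova is not None: ...`, so that the agent can still start while the nova endpoint is temporarily unreachable.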