Bug 1387590
| Summary: | fence_compute - Fixes for fix_plug/domain_name and nova force_down functionality. |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Reporter: | Marian Krcmarik <mkrcmari> |
| Component: | fence-agents |
| Assignee: | Andrew Beekhof <abeekhof> |
| Status: | CLOSED ERRATA |
| QA Contact: | cluster-qe <cluster-qe> |
| Severity: | urgent |
| Priority: | urgent |
| Version: | 7.3 |
| CC: | abeekhof, cchen, cluster-maint, fdinitto, mkrcmari, mori, mschuppe, oalbrigt, snagar, ushkalim, vfarias |
| Target Milestone: | rc |
| Keywords: | ZStream |
| Target Release: | --- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | fence-agents-4.0.11-53.el7 |
| Doc Type: | If docs needed, set a value |
| Last Closed: | 2017-08-01 16:10:32 UTC |
| Type: | Bug |
| Bug Blocks: | 1393789, 1440487 (view as bug list) |
For every version (2 -> 2.27) I get:
```
{u'version': {u'status': u'CURRENT', u'updated': u'2013-07-23T11:33:21Z', u'links': [{u'href': u'https://192.168.24.2:13774/v2.1/', u'rel': u'self'}, {u'href': u'http://docs.openstack.org/', u'type': u'text/html', u'rel': u'describedby'}], u'min_version': u'2.1', u'version': u'2.38', u'media-types': [{u'base': u'application/json', u'type': u'application/vnd.openstack.compute+json;version=2.1'}], u'id': u'v2.1'}}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/novaclient/v2/versions.py", line 104, in list
    return self._list(version_url, "versions")
  File "/usr/lib/python2.7/site-packages/novaclient/base.py", line 255, in _list
    data = body[response_key]
KeyError: 'versions'
```

when calling nova.versions.list()
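The failure above could be worked around defensively in caller code. A minimal sketch (`extract_versions` is a hypothetical helper, not part of novaclient) that tolerates both response shapes:

```python
def extract_versions(body):
    """Return a list of version dicts from either response shape.

    Older nova endpoints return {"versions": [...]}; the endpoint hit
    above returns {"version": {...}} instead, which is what triggers
    the KeyError in novaclient's _list().
    """
    if "versions" in body:
        return body["versions"]
    if "version" in body:
        return [body["version"]]
    return []
```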
(In reply to Andrew Beekhof from comment #3)
> For every version (2 -> 2.27) I get:
> [...]
> KeyError: 'versions'
>
> when calling nova.versions.list()

I am getting the same reply (traceback) from RHOSP10. It works without problems with older releases (even with some older RHOSP10 puddle). I am not sure whether it is a bug, and possibly where exactly. The nova server now returns a dictionary with one element whose key is called "version"; it used to return a dictionary with one element called "versions", whose value was a list of version dictionaries. So I guess a bug in nova or a change of behaviour?
(In reply to Marian Krcmarik from comment #4)
> I am getting the same reply (traceback) from RHOSP10. [...]
> So I guess a bug in nova or a change of behaviour?

Maybe let's just create a new nova client instance with version 2.11 specified (the first API version where force_down was introduced), which would be used only for calling nova.services.force_down(). If the call of nova.services.force_down() raises novaclient.exceptions.NotAcceptable, then the fence agent would assume force_down is not supported on that version and skip it.
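A sketch of that fallback, assuming a `make_client(version)` helper that wraps client creation with the agent's credentials (a hypothetical name), and with a local `NotAcceptable` stand-in for novaclient.exceptions.NotAcceptable so the sketch stays self-contained:

```python
class NotAcceptable(Exception):
    """Stand-in for novaclient.exceptions.NotAcceptable."""

def force_down_if_supported(make_client, host):
    """Mark a compute service down via a 2.11 client; skip if unsupported.

    2.11 is the first microversion exposing services.force_down(); a
    server that cannot honour it raises NotAcceptable, in which case
    the agent would simply skip the force_down step.
    """
    nova = make_client("2.11")
    try:
        nova.services.force_down(host, "nova-compute", force_down=True)
        return True
    except NotAcceptable:
        return False
```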
(In reply to Marian Krcmarik from comment #4)
> The nova server now returns a dictionary with one element whose key is called
> "version"; it used to return a dictionary with one element called "versions",
> whose value was a list of version dictionaries. [...]

After much investigation, the reason is that this API call requires a session. One can verify this by changing use_session to 'False' in /usr/lib/python2.7/site-packages/novaclient/shell.py:

```
# Do not use Keystone session for cases with no session support. The
# presence of auth_plugin means os_auth_system is present and is not
# keystone.
use_session = True
```

and re-running:

    nova version-list

It is possible to use the versions.list() call if the client is created as:
```python
from novaclient import client
from novaclient import api_versions
from keystoneauth1 import loading
from novaclient.shell import OpenStackComputeShell

shell = OpenStackComputeShell()
parser = shell.get_base_parser([])
(args, args_list) = parser.parse_known_args([])
keystone_session = loading.load_session_from_argparse_arguments(args)
keystone_auth = loading.load_auth_from_argparse_arguments(args)
nova = client.Client(api_versions.APIVersion("2.0"), 'admin', None, 'admin',
                     'https://192.168.24.2:13000/v2.0',
                     session=keystone_session, auth=keystone_auth)
```
But that seems like it would be more fragile, not less.
Created attachment 1219127 [details]
fix
This patch appears to do the trick.
Based on testing the build with the included patch, it seems that all the problems were solved except for one: the nova compute service remains marked as down even though the compute node is up and running again after fencing. I created a separate bug for that, as agreed with Andrew: https://bugzilla.redhat.com/show_bug.cgi?id=1394418

Hi Andrew,

The patch in comment #8 seems to have some problem. Please forgive me if I'm wrong. The version I'm using is fence-agents-compute-4.0.11-47.el7_3.2.x86_64. The create_nova_connection() function will talk to overcloud nova when fence-nova starts, but fence-nova cannot start, with the following error. This is output after "pcs cluster stop --all" and then "pcs cluster start --all":

```
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ Nova connection failed. ConnectionError: ('Connection aborted.', error(111, 'Connection refused')) ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ Nova connection failed. ConnectionError: ('Connection aborted.', error(111, 'Connection refused')) ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ Couldn't obtain a supported connection to nova, tried: ['2.11', '2'] ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ Please use '-h' for usage ]
Feb 16 02:14:17 [12351] overcloud-controller-0.localdomain stonith-ng: warning: log_action: fence_compute[13428] stderr: [ ]
```

So I think that when fence-nova starts, the OpenStack cluster is not yet ready to provide the nova service, or the floating IP is not ready, etc. As a result, fence_compute cannot talk to nova, and fence-nova cannot start.
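One way to tolerate a not-yet-ready endpoint at startup would be a bounded retry around the connection attempt. This is only a sketch of that idea, not what the agent does; `connect_with_retry` and its parameters are illustrative:

```python
import logging
import time

def connect_with_retry(create_connection, attempts=5, delay=2.0):
    """Call create_connection() until it succeeds or attempts run out.

    Returns the connection object, or None if every attempt failed
    (e.g. while the cluster is still bringing nova and its VIP up).
    """
    for attempt in range(1, attempts + 1):
        try:
            return create_connection()
        except Exception as e:
            logging.warning("Nova connection attempt %d/%d failed. %s: %s",
                            attempt, attempts, e.__class__.__name__, e)
            if attempt < attempts:
                time.sleep(delay)
    return None
```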
I confirmed that after the cluster starts up, the following script has no error output:

```python
from novaclient import client

versions = ["2.11", "2"]
for version in versions:
    nova = client.Client(version, "admin", "jFFG4PzWPmqUaTCVc9FEJTWkJ", "admin",
                         "http://10.0.0.4:5000/v2.0")
    try:
        nova.hypervisors.list()
        print "ok"
    except Exception as e:
        print "Nova connection failed. %s: %s" % (e.__class__.__name__, e)
```

I tried "pcs stonith cleanup fence-nova", but it seems that the whole cluster is cleaned up, not only fence-nova, so this cannot help to solve the issue.

Best Regards,
Chen

Hi,
With Andrew's and Chen's support, I commented out the fail_usage() line in /sbin/fence_compute and then succeeded in starting fence-nova, with OSP8 and fence-agents-compute-4.0.11-47.el7_3.2.x86_64.
```
# diff -u /sbin/fence_compute.orig /sbin/fence_compute
--- /sbin/fence_compute.orig	2017-02-16 14:37:50.256058816 +0900
+++ /sbin/fence_compute	2017-02-16 14:39:24.897601432 +0900
@@ -332,7 +332,7 @@
 	except Exception as e:
 		logging.warning("Nova connection failed. %s: %s" % (e.__class__.__name__, e))
-	fail_usage("Couldn't obtain a supported connection to nova, tried: %s" % repr(versions))
+	#fail_usage("Couldn't obtain a supported connection to nova, tried: %s" % repr(versions))
 
 def define_new_opts():
 	all_opt["endpoint-type"] = {
```
Verified based on comment #11.
fence-agents-4.0.11-51.el7

(In reply to Udi Shkalim from comment #16)
> Verified based on comment #11
> fence-agents-4.0.11-51.el7

Udi, we may need to create an additional test, as we didn't notice that the agent breaks when nova isn't up. Moving back to MODIFIED :-(

This is the patch we want...
We still want to know, but it shouldn't be fatal on its own.
All uses of nova include a check for it being set first.
```
diff --git a/fence/agents/compute/fence_compute.py b/fence/agents/compute/fence_compute.py
index 0a238b6..bc4cb5b 100644
--- a/fence/agents/compute/fence_compute.py
+++ b/fence/agents/compute/fence_compute.py
@@ -329,7 +329,7 @@ def create_nova_connection(options):
 	except Exception as e:
 		logging.warning("Nova connection failed. %s: %s" % (e.__class__.__name__, e))
-	fail_usage("Couldn't obtain a supported connection to nova, tried: %s" % repr(versions))
+	logging.warning("Couldn't obtain a supported connection to nova, tried: %s\n" % repr(versions))
 
 def define_new_opts():
 	all_opt["endpoint-type"] = {
```
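With fail_usage() downgraded to a warning, create_nova_connection() can now complete without a usable client, so the guard that "all uses of nova include a check for it being set first" matters. A rough sketch of that pattern (`host_is_up` is an illustrative name, not a function in fence_compute):

```python
import logging

def host_is_up(nova, host):
    """Return True/False when nova is reachable, None when it is not.

    `nova` may be None if the connection attempt failed at startup;
    every caller must check the handle before touching the client.
    """
    if nova is None:
        logging.warning("No nova connection; cannot query %s", host)
        return None
    services = nova.services.list(host=host, binary="nova-compute")
    return bool(services) and services[0].state == "up"
```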
(In reply to Andrew Beekhof from comment #18)
> This is the patch we want...

New build with the new patch.

*** Bug 1430393 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1874
Description of problem:

I have noticed some problems while playing with Instance HA on RHOSP10 with RHEL7.3. Some of them prevent the agent from functioning correctly; some would be good to have to improve functionality.

Problems which may prevent the agent from functioning well:

1. There is one condition in fix_plug_name(options) which seems to be always true:

       elif options["--plug"].find(options["--domain"]):

   The find() method returns the index if found, otherwise -1, so the correction could be:

       elif options["--domain"] in options["--plug"]:

   or:

       elif options["--plug"].find(options["--domain"]) > -1:

2. fix_plug_name(options) is called in main(), and it in turn calls fix_domain(), which tries to get the hypervisor list via the nova client. The problem is that the nova client object is created after fix_plug_name() is called, so fix_plug_name() should be called after creating the nova client, or the nova client should be created earlier.

3. There is a condition in set_power_status(_, options) which indicates when the status of the node should be set to "on":

       if options["--action"] == "on":
           if get_power_status(_, options) == "on":

   I believe the second condition should have the opposite logic:

       if get_power_status(_, options) != "on":

   because we want to set the compute node to "on" status when it is not actually on.

Problems to improve force_down functionality:

1. One of the main problems is that once the nova-compute service is marked as force_down, it won't switch by itself to "up" status after a reboot. The service must be unset from force_down status again; otherwise, even though nova-compute is running and functioning well, the service status will be down and the nova scheduler won't use the compute node for booting VMs. It would be nice to place this step, probably, in nova-compute-wait.

2.
Nova API microversioning is kind of strange to me, but if we want a successful result from calling:

       nova.services.force_down(options["--plug"], "nova-compute", force_down=False)

   we need to specify the nova version at client creation with microversion 2.11 or higher; otherwise we get a VersionNotFoundForAPIMethod exception. I did not find any way for the nova client to fall back when we request a higher version of the nova API at client creation than the server supports; in that case we get a NotAcceptable exception and the range of supported nova API versions. The only way I am able to come up with is to query the server for its supported microversions. Something like:

```python
def get_max_api_version():
    max_version = None
    nova = nova_client.Client('2',
                              options["--username"],
                              options["--password"],
                              options["--tenant-name"],
                              options["--auth-url"],
                              insecure=options["--insecure"],
                              region_name=options["--region-name"],
                              endpoint_type=options["--endpoint-type"])
    versions = nova.versions.list()
    for version in versions:
        if version.status == "CURRENT":
            max_version = version.version
    if max_version:
        return max_version
    else:
        return "2"
```

Then the output of this method would be used as the version for the nova client instance that queries OpenStack in fence_compute. The method would return 2.3 for RHOS7 (so force_down not supported), 2.12 for RHOS8 (force_down supported), 2.27 for RHOS9, etc.

Version-Release number of selected component (if applicable): fence-agents-compute-4.0.11-47.el7.x86_64

How reproducible: Always

Additional info: If needed, we can break the valid problems into separate bugs and use this one as a tracker; up to the assignee.
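The always-true find() condition from problem 1 in the description can be demonstrated directly:

```python
# str.find() returns an index (or -1), not a boolean, so using it as a
# truth value inverts the test in two ways: -1 (absent) is truthy, and
# 0 (a match at the very start) is falsy.
plug = "compute-0"
domain = ".example.com"

assert plug.find(domain) == -1           # not found...
assert bool(plug.find(domain)) is True   # ...yet truthy, so the elif fires

prefixed = "example.com"
assert prefixed.find("example") == 0            # found at the start...
assert bool(prefixed.find("example")) is False  # ...yet falsy

full = "compute-0.example.com"
assert (domain in full) is True          # the correct membership test
assert full.find(domain) > -1            # equivalent explicit form
```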