Bug 1619092 - [InstanceHA] check-run-nova-compute connecting by default to publicURL
Summary: [InstanceHA] check-run-nova-compute connecting by default to publicURL
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z3
Target Release: 13.0 (Queens)
Assignee: Michele Baldessari
QA Contact: pkomarov
URL:
Whiteboard:
Depends On: 1623181
Blocks: epmosp13bugs 1637805 1639358
Reported: 2018-08-20 04:44 UTC by Robin Cernin
Modified: 2022-03-13 15:59 UTC
CC List: 26 users

Fixed In Version: openstack-tripleo-heat-templates-8.0.4-24.el7ost
Doc Type: Bug Fix
Doc Text:
One of the instance HA scripts (check-run-nova-compute) connected to the publicURL Keystone endpoint. It now uses the internalURL endpoint by default. Additionally, an operator can override this via the '[placement]/valid_interfaces' configuration entry in nova.conf.
Clone Of:
Clones: 1637805
Environment:
Last Closed: 2018-11-13 22:28:18 UTC
Target Upstream Version:
Embargoed:




Links
System                 | ID             | Private | Priority | Status | Last Updated            | Summary
Launchpad              | 1788584        | 0       | None     | None   | 2018-08-23 09:21:11 UTC | None
OpenStack gerrit       | 595903         | 0       | 'None'   | MERGED | 2020-11-20 11:04:35 UTC | IHA Default the compute endpoint check script to internal
Red Hat Issue Tracker  | OSP-13692      | 0       | None     | None   | 2022-03-13 15:59:27 UTC | None
Red Hat Product Errata | RHBA-2018:3587 | 0       | None     | None   | 2018-11-13 22:29:36 UTC | None

Description Robin Cernin 2018-08-20 04:44:51 UTC
Description of problem:

If the architecture does not allow the compute nodes to reach the publicURL endpoint, check-run-nova-compute fails and the container is marked as unhealthy.

https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/extraconfig/tasks/instanceha/check-run-nova-compute#L112-L132

As a workaround, the client.Client calls in check-run-nova-compute can be modified to use the internalURL endpoint instead:

# Excerpt from check-run-nova-compute: clientargs, version, options,
# keystone_session and keystone_auth are defined elsewhere in the script.
# The workaround is the endpoint_type='internalURL' argument in both branches.
if clientargs:
    # OSP < Ocata
    # ArgSpec(args=['version', 'username', 'password', 'project_id', 'auth_url'],
    #         varargs=None,
    #         keywords='kwargs', defaults=(None, None, None, None))
    nova = client.Client(version,
                         None,  # User
                         None,  # Password
                         None,  # Tenant
                         None,  # Auth URL
                         insecure=options["insecure"],
                         region_name=options["os_region_name"][0],
                         session=keystone_session, auth=keystone_auth,
                         http_log_debug="verbose" in options,
                         endpoint_type='internalURL')
else:
    # OSP >= Ocata
    # ArgSpec(args=['version'], varargs='args', keywords='kwargs', defaults=None)
    nova = client.Client(version,
                         region_name=options["os_region_name"][0],
                         session=keystone_session, auth=keystone_auth,
                         http_log_debug="verbose" in options,
                         endpoint_type='internalURL')

However, this configuration is not the default and cannot be set through director. Does this mean that the publicURL must always be reachable from the compute nodes?

Version-Release number of selected component (if applicable):

Queens OSP13
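Which address the script contacts depends on which entry of the Keystone service catalog the client selects for the compute service. The following is a simplified, hypothetical model of that lookup, not the novaclient or keystoneauth1 code: the catalog entries are the overcloud compute endpoints from the verification in comment 38, and the helper names (normalize_interface, select_endpoint) are made up for illustration. It assumes the legacy publicURL/internalURL spellings map to the catalog interfaces public/internal by dropping the "URL" suffix.

```python
# Hypothetical, simplified model of Keystone service-catalog endpoint lookup.
# It is NOT the novaclient/keystoneauth1 implementation; it only illustrates
# why endpoint_type='publicURL' vs 'internalURL' changes the address contacted.

# Overcloud compute endpoints as listed by `openstack endpoint list`.
CATALOG = [
    {"service_type": "compute", "interface": "public",
     "url": "http://10.0.0.110:8774/v2.1"},
    {"service_type": "compute", "interface": "internal",
     "url": "http://172.17.1.10:8774/v2.1"},
    {"service_type": "compute", "interface": "admin",
     "url": "http://172.17.1.10:8774/v2.1"},
]


def normalize_interface(endpoint_type):
    """Map legacy names such as 'internalURL' to catalog names such as 'internal'."""
    if endpoint_type.endswith("URL"):
        return endpoint_type[:-len("URL")]
    return endpoint_type


def select_endpoint(catalog, service_type, endpoint_type):
    """Return the URL of the first endpoint matching service type and interface."""
    interface = normalize_interface(endpoint_type)
    for entry in catalog:
        if entry["service_type"] == service_type and entry["interface"] == interface:
            return entry["url"]
    raise LookupError("no %s endpoint with interface %r" % (service_type, interface))


if __name__ == "__main__":
    # Old default: the compute node must be able to reach the public endpoint.
    print(select_endpoint(CATALOG, "compute", "publicURL"))
    # Workaround / new default: traffic stays on the internal API network.
    print(select_endpoint(CATALOG, "compute", "internalURL"))
```

With publicURL the compute node needs a route to the public endpoint address (10.0.0.110 here); with internalURL it only needs the internal API network.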

Comment 14 Cody Swanson 2018-09-12 20:49:35 UTC
One thing I've noticed with this change is that the path to the python executable changes between versions.

openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch.rpm

head -n 1 /usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/instanceha/check-run-nova-compute
#!/bin/python -utt

openstack-tripleo-heat-templates-8.0.4-24.el7ost.noarch.rpm

head -n 1 /usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/instanceha/check-run-nova-compute
#!/usr/bin/python -utt

I'm not sure whether this change is intentional; both paths are valid on my RHOS13 undercloud lab system:

[root@undercloud-0 ~]# ls -ld /bin/python /usr/bin/python
lrwxrwxrwx. 1 root root 7 Jul 28 11:37 /bin/python -> python2
lrwxrwxrwx. 1 root root 7 Jul 28 11:37 /usr/bin/python -> python2

I just wanted to highlight this in case it is an unintended regression.
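The comparison above (head -n 1 plus ls -ld) can also be scripted. This is a hypothetical helper, not part of tripleo-heat-templates: read_shebang returns the interpreter and flags from a script's first line, and resolve_interpreter follows symlinks (like readlink -f) so that two different shebang paths can be checked for pointing at the same binary.

```python
import os


def read_shebang(path):
    """Return (interpreter, [flags]) from a script's first line, or None.

    Rough equivalent of `head -n 1 <script>` followed by parsing the '#!' line.
    """
    with open(path, "r") as handle:
        first_line = handle.readline().strip()
    if not first_line.startswith("#!"):
        return None
    # Split on whitespace for display purposes only; the kernel itself passes
    # everything after the interpreter path as a single argument.
    parts = first_line[2:].split()
    if not parts:
        return None
    return parts[0], parts[1:]


def resolve_interpreter(path):
    """Follow every symlink in path, like `readlink -f`, to the real file."""
    return os.path.realpath(path)
```

For example, resolve_interpreter("/bin/python") == resolve_interpreter("/usr/bin/python") would be expected to be true on a RHEL 7 host such as the undercloud above, where /bin is itself a symlink to /usr/bin and both names point at python2.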

Comment 15 Michele Baldessari 2018-09-12 20:51:57 UTC
(In reply to Cody Swanson from comment #14)
> I just wanted to highlight this in case it is an unintended regression.

Yeah, this is intended (see https://bugzilla.redhat.com/show_bug.cgi?id=1612088): we want /usr/bin/python.

Comment 26 Andrew Beekhof 2018-10-03 02:18:58 UTC
> So we fenced it again and this time either did NOT unfence or the logs ran out before the compute came back up.


And the reason we fenced again is that, due to the long power-on cycle, the node wasn't ready by the time we tried to connect. Setting reconnect_interval will also help here, as it tells the cluster not to try to connect immediately.

Comment 27 Andrew Beekhof 2018-10-08 01:44:24 UTC
Can we get confirmation whether the proposed fix was sufficient?

Comment 38 pkomarov 2018-10-14 12:21:34 UTC
Verified,

[stack@undercloud-0 ~]$ cat core_puddle_version 
2018-10-02.1
[stack@undercloud-0 ~]$ 


[stack@undercloud-0 ~]$ ansible compute -b -mshell -a'cat /var/lib/nova/instanceha/check-run-nova-compute|grep internalURL'
 [WARNING]: Found both group and host with same name: undercloud

overcloud-novacomputeiha-0 | SUCCESS | rc=0 >>
    nova_endpoint_type = 'internalURL'
    # We default to internalURL but we allow this to be overridden via

overcloud-novacomputeiha-1 | SUCCESS | rc=0 >>
    nova_endpoint_type = 'internalURL'
    # We default to internalURL but we allow this to be overridden via
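The grep output shows the new default, nova_endpoint_type = 'internalURL', and the Doc Text says an operator can override it through '[placement]/valid_interfaces' in nova.conf. The sketch below is a hypothetical reconstruction of that behaviour, not the code merged in gerrit 595903: it assumes the option holds a comma-separated list of interface names (for example internal,public), takes the first entry, and appends the "URL" suffix to match the default's spelling. It uses Python 3's standard configparser; the real script may read nova.conf differently.

```python
import configparser

# Default from the fix: talk to the internal compute endpoint.
DEFAULT_ENDPOINT_TYPE = "internalURL"


def nova_endpoint_type(nova_conf_text):
    """Return the endpoint_type to pass to novaclient, given nova.conf contents.

    Hypothetical sketch: defaults to internalURL and, when
    [placement]/valid_interfaces is set, uses its first entry instead.
    """
    # strict=False tolerates repeated options, which oslo.config files may contain;
    # interpolation=None keeps '%' characters in values (e.g. in URLs) literal.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    if not parser.has_option("placement", "valid_interfaces"):
        return DEFAULT_ENDPOINT_TYPE
    raw = parser.get("placement", "valid_interfaces")
    interfaces = [item.strip() for item in raw.split(",") if item.strip()]
    if not interfaces:
        return DEFAULT_ENDPOINT_TYPE
    first = interfaces[0]
    # Keep the '<interface>URL' spelling used for the default value.
    return first if first.endswith("URL") else first + "URL"
```

Falling back to the default whenever the option is missing or empty keeps the safe behaviour (internal network only) unless the operator explicitly asks for something else.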




Verification as in:
https://review.openstack.org/#/c/595903/

[stack@undercloud-0 ~]$ . stackrc 
(undercloud) [stack@undercloud-0 ~]$ openstack endpoint list |grep comput
| 1cf4cdfd4f1f4fe59c556283db92a964 | regionOne | nova             | compute                 | True    | internal  | http://192.168.24.1:8774/v2.1                  |
| a5ddeeeb70674d91b200fa407425bae2 | regionOne | nova             | compute                 | True    | admin     | http://192.168.24.1:8774/v2.1                  |
| e263d4f49c324a009fd0ba3822ce3f94 | regionOne | nova             | compute                 | True    | public    | http://192.168.24.1:8774/v2.1                  |
(undercloud) [stack@undercloud-0 ~]$ . overcloudrc 
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list |grep comput
| 1d1681389ac54c448ae08dfec30c2125 | regionOne | nova         | compute        | True    | public    | http://10.0.0.110:8774/v2.1                   |
| 5cc9634f850b4089b5dc2603e93e1eda | regionOne | nova         | compute        | True    | internal  | http://172.17.1.10:8774/v2.1                  |
| a4b33f4b4ae94d2ca6faba941d5e7024 | regionOne | nova         | compute        | True    | admin     | http://172.17.1.10:8774/v2.1                  |
(overcloud) [stack@undercloud-0 ~]$ openstack endpoint list |grep comput|grep internal|sed 's@.*//@@g'|sed 's@:8774.*@@g'
172.17.1.10
(overcloud) [stack@undercloud-0 ~]$ export internal_api_ip=`openstack endpoint list |grep comput|grep internal|sed 's@.*//@@g'|sed 's@:8774.*@@g'`
(overcloud) [stack@undercloud-0 ~]$ echo $internal_api_ip
172.17.1.10
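The grep/sed pipeline above pulls the internal compute address out of the table printed by openstack endpoint list. The same extraction can be done a little more robustly in Python by splitting each table row on "|" and parsing the URL with urllib.parse. This is a sketch that assumes the default seven-column layout shown above (ID, Region, Service Name, Service Type, Enabled, Interface, URL); the function names are hypothetical. With -f value or -f json output the parsing would be simpler still.

```python
from urllib.parse import urlsplit


def endpoint_rows(table_text):
    """Yield the cells of each data row of an `openstack endpoint list` table."""
    for line in table_text.splitlines():
        line = line.strip()
        # Data rows start and end with '|'; border lines start with '+'.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and anything that is not the 7-column layout.
        if len(cells) == 7 and cells[0] != "ID":
            yield cells


def endpoint_host(table_text, service_type="compute", interface="internal"):
    """Return the host part of the matching endpoint URL (port stripped), or None."""
    for _id, _region, _name, svc_type, _enabled, iface, url in endpoint_rows(table_text):
        if svc_type == service_type and iface == interface:
            return urlsplit(url).hostname
    return None


SAMPLE = """
| 1d1681389ac54c448ae08dfec30c2125 | regionOne | nova | compute | True | public   | http://10.0.0.110:8774/v2.1  |
| 5cc9634f850b4089b5dc2603e93e1eda | regionOne | nova | compute | True | internal | http://172.17.1.10:8774/v2.1 |
| a4b33f4b4ae94d2ca6faba941d5e7024 | regionOne | nova | compute | True | admin    | http://172.17.1.10:8774/v2.1 |
"""

if __name__ == "__main__":
    print(endpoint_host(SAMPLE))  # -> 172.17.1.10
```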
(overcloud) [stack@undercloud-0 ~]$ ansible compute -b -mshell -a"tcpdump -c 10 -i any -nn host $internal_api_ip and port 8774"
 [WARNING]: Found both group and host with same name: undercloud

overcloud-novacomputeiha-0 | SUCCESS | rc=0 >>
12:16:37.185523 ethertype IPv4, IP 172.17.1.10.8774 > 172.17.1.17.57724: Flags [F.], seq 566107819, ack 4214161515, win 243, options [nop,nop,TS val 6267312 ecr 4294735945], length 0
12:16:37.185523 IP 172.17.1.10.8774 > 172.17.1.17.57724: Flags [F.], seq 0, ack 1, win 243, options [nop,nop,TS val 6267312 ecr 4294735945], length 0
12:16:37.197940 IP 172.17.1.17.57724 > 172.17.1.10.8774: Flags [F.], seq 1, ack 1, win 259, options [nop,nop,TS val 4294745915 ecr 6267312], length 0
12:16:37.198887 IP 172.17.1.17.57728 > 172.17.1.10.8774: Flags [S], seq 3810977952, win 29200, options [mss 1460,sackOK,TS val 4294745916 ecr 0,nop,wscale 7], length 0
12:16:37.200734 ethertype IPv4, IP 172.17.1.10.8774 > 172.17.1.17.57724: Flags [.], ack 2, win 243, options [nop,nop,TS val 6267327 ecr 4294745915], length 0
12:16:37.200757 ethertype IPv4, IP 172.17.1.10.8774 > 172.17.1.17.57728: Flags [S.], seq 3928995206, ack 3810977953, win 28960, options [mss 1460,sackOK,TS val 6267327 ecr 4294745916,nop,wscale 7], length 0
12:16:37.200734 IP 172.17.1.10.8774 > 172.17.1.17.57724: Flags [.], ack 2, win 243, options [nop,nop,TS val 6267327 ecr 4294745915], length 0
12:16:37.200757 IP 172.17.1.10.8774 > 172.17.1.17.57728: Flags [S.], seq 3928995206, ack 3810977953, win 28960, options [mss 1460,sackOK,TS val 6267327 ecr 4294745916,nop,wscale 7], length 0
12:16:37.200847 IP 172.17.1.17.57728 > 172.17.1.10.8774: Flags [.], ack 1, win 229, options [nop,nop,TS val 4294745918 ecr 6267327], length 0
12:16:37.201044 IP 172.17.1.17.57728 > 172.17.1.10.8774: Flags [P.], seq 1:471, ack 1, win 229, options [nop,nop,TS val 4294745918 ecr 6267327], length 470
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
10 packets captured
10 packets received by filter
0 packets dropped by kernel
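The capture is the actual evidence for the fix: every packet on port 8774 is exchanged between the compute node (172.17.1.17) and the internal API address (172.17.1.10). A hypothetical helper like the one below could automate that check by parsing the "IP src.port > dst.port:" part of each line and confirming that the API side is always the internal address. It only understands the IPv4 one-line format seen here and is not a general tcpdump parser.

```python
import re

# Matches "IP 172.17.1.10.8774 > 172.17.1.17.57724:" in tcpdump's IPv4 one-line output.
PACKET_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def api_peers(capture_text, port=8774):
    """Return the set of addresses seen on the API side (the given port)."""
    peers = set()
    for line in capture_text.splitlines():
        match = PACKET_RE.search(line)
        if not match:
            continue  # skip banner/summary lines such as "10 packets captured"
        src, sport, dst, dport = match.groups()
        if int(sport) == port:
            peers.add(src)
        if int(dport) == port:
            peers.add(dst)
    return peers


def only_internal(capture_text, internal_ip, port=8774):
    """True if API-side packets were seen and every API-side address is internal_ip."""
    return api_peers(capture_text, port) == {internal_ip}
```

For example, feeding the captured output for one compute node into only_internal(text, "172.17.1.10") should return True after the fix, and False if any request still went to the public endpoint.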

Comment 43 errata-xmlrpc 2018-11-13 22:28:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3587

