Bug 1525550 - composable networks VIP is not brought up on nodes (OSP-12)
Summary: composable networks VIP is not brought up on nodes (OSP-12)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z2
Target Release: 12.0 (Pike)
Assignee: Bob Fournier
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-13 14:45 UTC by Bob Fournier
Modified: 2018-05-10 20:04 UTC (History)
10 users

Fixed In Version: puppet-tripleo-7.4.8-2.el7ost openstack-tripleo-heat-templates-7.0.9-3.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1531593
Environment:
Last Closed: 2018-03-28 17:27:14 UTC
Target Upstream Version:


Attachments (Terms of Use)
sos from controller (15.86 MB, application/x-xz)
2018-05-03 04:42 UTC, Chris Janiszewski
no flags Details
templates used for deployment (22.69 KB, application/x-gzip)
2018-05-03 04:44 UTC, Chris Janiszewski
no flags Details
sosreport-undercloud-parta (15.00 MB, application/x-xz)
2018-05-03 04:45 UTC, Chris Janiszewski
no flags Details
sosreport-undercloud-partb (10.86 MB, application/octet-stream)
2018-05-03 04:46 UTC, Chris Janiszewski
no flags Details


Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 532912 None MERGED Add composable network VIPs for puppet configuration 2020-02-12 18:16:55 UTC
OpenStack gerrit 532913 None MERGED Configure VIPs for all networks including composable networks 2020-02-12 18:16:55 UTC
Red Hat Product Errata RHBA-2018:0607 None None None 2018-03-28 17:28:17 UTC

Description Bob Fournier 2017-12-13 14:45:20 UTC
Description of problem:

When using Ironic in the overcloud in conjunction with a custom network created in network_data.yaml, it was found that the VIP was created successfully but was not added to the interface on the node.

This is the VIP that was created for the OcProvisioning network:
(undercloud) [stack@host01 ~]$ openstack port show oc_provisioning_virtual_ip -c fixed_ips
+-----------+----------------------------------------------------------------------------+
| Field     | Value                                                                      |
+-----------+----------------------------------------------------------------------------+
| fixed_ips | ip_address='172.21.2.10', subnet_id='30fac020-2702-41ad-b478-37c3d6d0b580' |
+-----------+----------------------------------------------------------------------------+

On the controller node that uses this network, only a single IP associated with the network is brought up, not the VIP.

11: vlan205: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether ee:be:ca:e2:1c:39 brd ff:ff:ff:ff:ff:ff
    inet 172.21.2.18/24 brd 172.21.2.255 scope global vlan205
       valid_lft forever preferred_lft forever
    inet6 fe80::ecbe:caff:fee2:1c39/64 scope link 
       valid_lft forever preferred_lft forever

i.e. VIP 172.21.2.10 is not on this interface

Compare this to a non-custom network, which does get the VIP; 172.23.3.19 is the VIP
for the StorageMgmt network:
13: vlan2001: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 7a:1a:e3:30:26:a9 brd ff:ff:ff:ff:ff:ff
    inet 172.23.3.18/24 brd 172.23.3.255 scope global vlan2001
       valid_lft forever preferred_lft forever
    inet 172.23.3.19/32 brd 172.23.3.255 scope global vlan2001
       valid_lft forever preferred_lft forever
    inet6 fe80::781a:e3ff:fe30:26a9/64 scope link 
       valid_lft forever preferred_lft forever

This configuration uses haproxy, and this is how the VIP (again for StorageMgmt) is assigned to the interface:

16:47:53 localhost journal: #033[mNotice: /Stage[main]/Tripleo::Profile::Pacemaker::Haproxy_bundle/Tripleo::Pacemaker::Haproxy_with_vip[haproxy_and_storage_mgmt_vip]/Pacemaker::Resource::Ip[storage_mgmt_vip]/Pcmk_resource[ip-172.23.3.19]/ensure: created#033[0m
Dec 11 16:47:53 localhost journal: #033[0;32mInfo: Pacemaker::Resource::Ip[storage_mgmt_vip]: Unscheduling all events on Pacemaker::Resource::Ip[storage_mgmt_vip]#033[0m
Dec 11 16:47:53 localhost IPaddr2(ip-172.23.3.19)[78806]: INFO: Adding inet address 172.23.3.19/32 with broadcast address 172.23.3.255 to device vlan2001
Dec 11 16:47:53 localhost IPaddr2(ip-172.23.3.19)[78806]: INFO: Bringing device vlan2001 up

The haproxy code in puppet-tripleo only uses the standard isolated networks and has no mechanism for custom networks - https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/pacemaker/haproxy.pp#L140
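As a rough illustration of why the hard-coded network list drops composable networks (this is a Python sketch, not the actual Puppet code; `STANDARD_NETWORKS`, `vips_old`, and `vips_new` are invented names):

```python
# Illustrative sketch only -- the real logic lives in puppet-tripleo's
# haproxy.pp. Names below are invented for illustration.

# Old behavior: only a fixed set of isolated networks gets a VIP resource.
STANDARD_NETWORKS = ["external", "internal_api", "storage", "storage_mgmt"]

def vips_old(network_virtual_ips):
    """Return VIPs that a hard-coded network list would configure."""
    return {net: data["ip_address"]
            for net, data in network_virtual_ips.items()
            if net in STANDARD_NETWORKS}

def vips_new(network_virtual_ips):
    """Return VIPs for every network in the hash, composable ones included."""
    return {net: data["ip_address"]
            for net, data in network_virtual_ips.items()}

hiera = {
    "internal_api":    {"index": 1, "ip_address": "172.17.1.12"},
    "oc_provisioning": {"index": 4, "ip_address": "172.21.2.10"},
}
# The custom network's VIP is silently dropped by the old logic:
assert "oc_provisioning" not in vips_old(hiera)
assert vips_new(hiera)["oc_provisioning"] == "172.21.2.10"
```

Iterating over the full hash is essentially what the linked Gerrit changes do on the Puppet side.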


Version-Release number of selected component (if applicable):

puddle 12.0-20171129.1

puppet-tripleo-7.4.3-11.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-17.el7ost.noarch


How reproducible: Every time


Steps to Reproduce:

New network in network_data.yaml
 # custom network for Overcloud provisioning
- name: OcProvisioning 
  name_lower: oc_provisioning 
  vip: true
  ip_subnet: '172.21.2.0/24'
  allocation_pools: [{'start': '172.21.2.10', 'end': '172.21.2.200'}]
  ipv6_subnet: 'fd00:fd00:fd00:7000::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:7000::10', 'end': 'fd00:fd00:fd00:7000:ffff:ffff:ffff:fffe'}]

It's using VLAN 205:
 OcProvisioningNetworkVlanID: 205

It's added for the Controller in roles_data.yaml:
  networks:
   <snip>
    - OcProvisioning

It's added to ServiceNetMap:
ServiceNetMap:
     IronicApiNetwork: oc_provisioning # changed from ctlplane
     IronicNetwork: oc_provisioning # changed from ctlplane

After OC deployment the network was created fine and the IP was added to 
the overcloud-controller node:
11: vlan205: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether ee:be:ca:e2:1c:39 brd ff:ff:ff:ff:ff:ff
    inet 172.21.2.18/24 brd 172.21.2.255 scope global vlan205
       valid_lft forever preferred_lft forever
    inet6 fe80::ecbe:caff:fee2:1c39/64 scope link 
       valid_lft forever preferred_lft forever

Actual results:

The VIP, 172.21.2.10 in this case, should be added to the vlan205 interface on the controller, but it's not.

11: vlan205: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether ee:be:ca:e2:1c:39 brd ff:ff:ff:ff:ff:ff
    inet 172.21.2.18/24 brd 172.21.2.255 scope global vlan205
       valid_lft forever preferred_lft forever
    inet6 fe80::ecbe:caff:fee2:1c39/64 scope link 
       valid_lft forever preferred_lft forever

Expected results:

VIP added to vlan205 interface on controller.


Additional info:

Comment 1 Bob Fournier 2018-01-05 15:37:57 UTC
Upstream patches are here:
https://review.openstack.org/#/c/531037/
https://review.openstack.org/#/c/531036/

When merged they must be backported to OSP-12.

Comment 5 mlammon 2018-03-14 22:00:10 UTC
Installed latest osp 12 2018-03-10.1

Env:
[stack@undercloud-0 ~]$ rpm -qa | grep puppet-tripleo
puppet-tripleo-7.4.8-4.el7ost.noarch

To verify https://bugzilla.redhat.com/show_bug.cgi?id=1525550

Verified that puppet creates a network_virtual_ips table on the controller:
[heat-admin@controller-0 ~]$ sudo cat /etc/puppet/hieradata/vip_data.json
...
    "network_virtual_ips": {
        "internal_api": {
            "index": 1,
            "ip_address": "172.17.1.12"
        },
        "storage": {
            "index": 2,
            "ip_address": "172.17.3.11"
        },
        "storage_mgmt": {
            "index": 3,
            "ip_address": "172.17.4.15"
        }
    },

Verified that VIP is correctly configured on controller:
[heat-admin@controller-0 ~]$ ip a | grep -B 4 172.17.1.12
9: vlan20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether b2:d1:3c:78:99:eb brd ff:ff:ff:ff:ff:ff
    inet 172.17.1.20/24 brd 172.17.1.255 scope global vlan20
       valid_lft forever preferred_lft forever
    inet 172.17.1.12/32 brd 172.17.1.255 scope global vlan20

Verified that the rpm version exceeds the Fixed In Version:
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep puppet-tripleo
puppet-tripleo-7.4.8-4.el7ost.noarch
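The manual verification above (comparing the network_virtual_ips entries in vip_data.json with the addresses actually plumbed on the node) can be scripted. A hypothetical helper, assuming it is fed the JSON file contents and `ip -o addr`-style output:

```python
import json

def missing_vips(vip_json, ip_addr_output):
    """Return hiera VIPs that do not appear in `ip addr` output.

    vip_json: contents of vip_data.json (a JSON string).
    ip_addr_output: output of `ip -o addr` (one interface/address per line).
    """
    vips = json.loads(vip_json)["network_virtual_ips"]
    configured = set()
    for line in ip_addr_output.splitlines():
        parts = line.split()
        if "inet" in parts:
            # Token after "inet" is the CIDR, e.g. "172.17.1.12/32"
            cidr = parts[parts.index("inet") + 1]
            configured.add(cidr.split("/")[0])
    return {net: d["ip_address"] for net, d in vips.items()
            if d["ip_address"] not in configured}

vip_json = ('{"network_virtual_ips": '
            '{"internal_api": {"index": 1, "ip_address": "172.17.1.12"}}}')
ip_out = "9: vlan20    inet 172.17.1.12/32 brd 172.17.1.255 scope global vlan20"
assert missing_vips(vip_json, ip_out) == {}  # VIP present -> nothing missing
```

An empty result means every VIP puppet wrote into hiera is actually configured on an interface; in the failure mode this bug describes, the custom network's VIP would show up in the returned dict.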

Comment 8 errata-xmlrpc 2018-03-28 17:27:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0607

Comment 9 Chris Janiszewski 2018-05-03 04:30:04 UTC
I am hitting this issue even though I seem to have the required rpms:

(undercloud) [stack@undercloud ~]$ rpm -qa | grep puppet-tripleo
puppet-tripleo-7.4.8-5.el7ost.noarch


sudo cat /etc/puppet/hieradata/vip_data.json
    "network_virtual_ips": {
        "custombm": {
            "index": 4,
            "ip_address": "172.31.10.14"
        },
        "internal_api": {
            "index": 1,
            "ip_address": "172.31.1.14"
        },
        "storage": {
            "index": 2,
            "ip_address": "172.31.3.14"
        },
        "storage_mgmt": {
            "index": 3,
            "ip_address": "172.31.4.14"
        }
    },

pcs status doesn't show that custom VIP:

[root@chrisj-controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: chrisj-controller-0 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
Last updated: Thu May  3 04:26:21 2018
Last change: Thu May  3 04:11:20 2018 by root via cibadmin on chrisj-controller-0

4 nodes configured
17 resources configured

Online: [ chrisj-controller-0 ]
GuestOnline: [ galera-bundle-0@chrisj-controller-0 rabbitmq-bundle-0@chrisj-controller-0 redis-bundle-0@chrisj-controller-0 ]

Full list of resources:

 Docker container: rabbitmq-bundle [172.31.0.10:8787/rhosp12/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):      Started chrisj-controller-0
 Docker container: galera-bundle [172.31.0.10:8787/rhosp12/openstack-mariadb:pcmklatest]
   galera-bundle-0      (ocf::heartbeat:galera):        Master chrisj-controller-0
 Docker container: redis-bundle [172.31.0.10:8787/rhosp12/openstack-redis:pcmklatest]
   redis-bundle-0       (ocf::heartbeat:redis): Master chrisj-controller-0
 ip-172.31.0.40 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.8.20 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.1.15 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.1.14 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.3.14 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.4.14 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 Docker container: haproxy-bundle [172.31.0.10:8787/rhosp12/openstack-haproxy:pcmklatest]
   haproxy-bundle-docker-0      (ocf::heartbeat:docker):        Started chrisj-controller-0
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started chrisj-controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


I am also unable to ping it:
[root@chrisj-controller-0 ~]# ping -c 1 172.31.10.14
PING 172.31.10.14 (172.31.10.14) 56(84) bytes of data.
From 172.31.10.28 icmp_seq=1 Destination Host Unreachable

--- 172.31.10.14 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

The only difference I can see is I have defined the custom VIP in here:
  CustomBMVirtualFixedIPs: [{'ip_address':'172.31.10.14'}]

Any ideas?

Comment 10 Chris Janiszewski 2018-05-03 04:42:43 UTC
Created attachment 1430421 [details]
sos from controller

Comment 11 Chris Janiszewski 2018-05-03 04:44:01 UTC
Created attachment 1430422 [details]
templates used for deployment

Comment 12 Chris Janiszewski 2018-05-03 04:45:55 UTC
Created attachment 1430423 [details]
sosreport-undercloud-parta

Comment 13 Chris Janiszewski 2018-05-03 04:46:31 UTC
Created attachment 1430424 [details]
sosreport-undercloud-partb

Comment 14 Bob Fournier 2018-05-03 14:27:23 UTC
Chris - have you tried it without the CustomBMVirtualFixedIPs setting? In our testing this worked fine with the custom network in network_data.yaml and "vip: true", although in that case the VIP came from the allocation range.

Looking through the controller sosreport it looks like the VIP was not created:
8: vlan320    inet 172.31.10.28/24 brd 172.31.10.255 scope global vlan320\       valid_lft forever preferred_lft forever
8: vlan320    inet6 fe80::984f:17ff:feb7:eda6/64 scope link \       valid_lft forever preferred_lft forever

resulting in:
containers/nova/nova-compute.log:2018-05-03 04:15:04.530 1 ERROR ironicclient.common.http [req-22a4ade9-d6a7-418a-bdf8-abef5e876740 - - - - -] Error contacting Ironic server: Unable to establish connection to http://172.31.10.14:6385/v1/nodes/detail: HTTPConnectionPool(host='172.31.10.14', port=6385): Max retries exceeded with url: /v1/nodes/detail (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fa44d1f5590>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',)). Attempt 61 of 61: ConnectFailure: Unable to establish connection to http://172.31.10.14:6385/v1/nodes/detail: HTTPConnectionPool(host='172.31.10.14', port=6385): Max retries exceeded with url: /v1/nodes/detail (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fa44d1f5590>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',))

Comment 15 Dan Sneddon 2018-05-03 17:45:12 UTC
(In reply to Chris Janiszewski from comment #9)
> I am hitting this issue even though I seems to have required rpms:

Chris, looking through your network templates, it appears that you are not instantiating the CustomBM network correctly. Looking at network-isolation.yaml, I see the other networks, but not CustomBM:

  OS::TripleO::Network::Ports::ExternalVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/external.yaml
  OS::TripleO::Network::Ports::InternalApiVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api.yaml
  OS::TripleO::Network::Ports::StorageVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage.yaml
  OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt.yaml
  OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/vip.yaml

You will need to instantiate the CustomBM network, ports, and VIP in the network-isolation.yaml, otherwise the network won't be included in the deployment:

  OS::TripleO::Network::CustomBM: /usr/share/openstack-tripleo-heat-templates/network/custombm.yaml
  OS::TripleO::Network::Ports::CustomBMVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/custombm.yaml
  OS::TripleO::Controller::Ports::CustomBMPort: /usr/share/openstack-tripleo-heat-templates/network/ports/custombm.yaml

Comment 16 Chris Janiszewski 2018-05-03 20:17:27 UTC
Thanks guys,

I have worked this issue around by just creating vip directly with pcs using:
pcs resource create ip-172.31.10.14 ocf:heartbeat:IPaddr2 \
    ip=172.31.10.14 cidr_netmask=32 nic=vlan320 \
    op monitor interval=30s
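The workaround above amounts to hand-creating the IPaddr2 resource that should have been created automatically. A hypothetical helper to build the same pcs command from a vip_data.json entry might look like this (the nic name still has to be supplied by hand, since it is not in the hiera data):

```python
def pcs_vip_command(ip_address, nic, netmask=32, monitor_interval="30s"):
    """Build a `pcs resource create` command line for an IPaddr2 VIP (sketch)."""
    return (
        f"pcs resource create ip-{ip_address} ocf:heartbeat:IPaddr2 "
        f"ip={ip_address} cidr_netmask={netmask} nic={nic} "
        f"op monitor interval={monitor_interval}"
    )

cmd = pcs_vip_command("172.31.10.14", "vlan320")
assert cmd.startswith("pcs resource create ip-172.31.10.14 ocf:heartbeat:IPaddr2")
assert "nic=vlan320" in cmd
```

Note this only papers over the symptom; the resource created this way is not managed by the overcloud templates and will not survive a stack update.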

I need to demo this environment tomorrow but after that I'll attempt this again with steps provided by Dan in Comment #15. If that doesn't work I'll try Bob's suggestion.

Comment 17 Chris Janiszewski 2018-05-04 13:53:55 UTC
(In reply to Dan Sneddon from comment #15)
> Chris, looking through your network templates, it appears that you are not
> instantiating the CustomBM network correctly. Looking at
> network-isolation.yaml, I see the other networks, but not CustomBM:


I forgot to include my deploy script.

(undercloud) [stack@undercloud ~]$ cat deploy.sh 
#!/bin/bash
source ~/stackrc
cd ~/
time openstack overcloud deploy  --templates --stack chrisj \
  -r /home/stack/templates/roles_data.yaml \
  -n /home/stack/templates/network_data.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /home/stack/templates/ceph-custom-config.yaml \
  -e /home/stack/templates/enable-tls.yaml \
  -e /home/stack/templates/ExtraConfig.yaml \
  -e /home/stack/templates/inject-trust-anchor.yaml \
  -e /home/stack/templates/inject-trust-anchor-hiera.yaml \
  -e /home/stack/templates/fernet.yaml \
  -e /home/stack/templates/deployment-artifacts.yaml \
  -e /home/stack/templates/docker-registry.yaml \
  -e /home/stack/templates/logging-environment.yaml \
  -e /home/stack/templates/monitoring-environment.yaml \
  -e /home/stack/templates/collectd-environment.yaml

As you can see, I am not using the network-isolation.yaml from my templates directory, but the generic one in /usr/share/openstack-tripleo-heat-templates/environments/ (which, by the way, doesn't exist).
I was hoping the deployment would generate one with the right ports, but it hasn't. Nevertheless, the assignment of the VLANs and subnets to physical ports worked properly. Sorry for the confusion of leaving this wrong network-isolation.yaml file in my custom templates.

Comment 18 Dan Sneddon 2018-05-07 21:11:56 UTC
(In reply to Chris Janiszewski from comment #17)

> As you can see I am not using the network-isolation.yaml from my templates
> directory .. but the generic one in
> /usr/share/openstack-tripleo-heat-templates/environments/ (which btw doesn't
> exist).
> I was hoping the deployment would generate one with the right ports, but it
> hasn't. Nevertheless the assignment of the VLANs and subnets to physical
> ports worked properly. Sorry for the confusion of leaving this wrong
> network-isolation.yaml file in my custom templates.

Chris, it appears to me from looking at the SOS reports that a VIP of 172.31.10.14 is getting assigned via the tripleo-heat-templates. I even see some evidence that HAProxy is trying to host a listener on that IP, although the IP doesn't show up in `ip addr`:

sos_commands/process/lsof_-b_M_-n_-l:
haproxy    76700           42454   24u     IPv4             462191       0t0        TCP 172.31.10.14:6385 (LISTEN)

sos_commands/networking/netstat_-W_-neopa:
tcp        0      0 172.31.10.14:6385       0.0.0.0:*               LISTEN      0          462191     76700/haproxy        off (0.00/0/0)


Looking at the output of sos_commands/pacemaker/pcs_status, I see the other VIPs, but not the custom VIP:

 ip-172.31.0.40 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.8.20 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.1.15 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.1.14 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.3.14 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0
 ip-172.31.4.14 (ocf::heartbeat:IPaddr2):       Started chrisj-controller-0

So it looks to me like the pacemaker config is not working correctly for custom networks. There isn't enough data in the sos report to fully diagnose the issue, so I'm going to take your templates and deploy a similar config in a test lab.
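Dan's observation above — a process listening on an address that no interface carries — is a useful signal on its own. A hypothetical check over netstat-style lines and the set of locally configured addresses:

```python
def phantom_listeners(netstat_lines, local_addresses):
    """Return (addr, port) pairs a process listens on but no interface carries.

    netstat_lines: lines in `netstat -W -neopa` style.
    local_addresses: set of IPs currently configured on the host.
    """
    phantoms = []
    for line in netstat_lines:
        fields = line.split()
        # Expected layout: proto recvq sendq local remote state ...
        if len(fields) > 5 and fields[5] == "LISTEN":
            addr, _, port = fields[3].rpartition(":")
            # Wildcard binds are fine; a specific address must exist locally.
            if addr not in ("0.0.0.0", "::") and addr not in local_addresses:
                phantoms.append((addr, port))
    return phantoms

lines = ["tcp 0 0 172.31.10.14:6385 0.0.0.0:* LISTEN 0 462191 76700/haproxy"]
assert phantom_listeners(lines, {"172.31.10.28"}) == [("172.31.10.14", "6385")]
```

In this bug's failure mode, HAProxy binds the VIP inside its container while pacemaker never adds the address to the interface, which is exactly the mismatch this check flags.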

Comment 19 Dan Sneddon 2018-05-08 07:59:37 UTC
I could not reproduce these symptoms. I used the same network_data.yaml, and copied the settings from the network-environment.yaml. The VIP deployed correctly for me.

Comment 20 Chris Janiszewski 2018-05-10 20:04:32 UTC
Just to conclude this issue: for this functionality to work, it's not just puppet-tripleo-7.4.8-4.el7ost.noarch that is required; the overcloud images also need to be updated to the latest version. That was the missing piece on my end.

I can also confirm that assigning the VIP in parameter_defaults worked. In my case:
  CustomBMVirtualFixedIPs: [{'ip_address':'172.31.10.14'}]

I no longer have this issue with the latest images. Thanks for the help.

