Bug 1400308

Summary: VIP on the storage management network gets created even though the controllers are not connected to this network
Product: Red Hat OpenStack
Component: rhosp-director
Version: 11.0 (Ocata)
Target Release: 10.0 (Newton)
Target Milestone: ---
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Keywords: Triaged, ZStream
Reporter: Marius Cornea <mcornea>
Assignee: Emilien Macchi <emacchi>
QA Contact: Omri Hochman <ohochman>
CC: amaumene, bnemec, dbecker, emacchi, mburns, mcornea, morazi, rhel-osp-director-maint
Last Closed: 2018-01-19 21:13:55 UTC

Description Marius Cornea 2016-11-30 21:12:08 UTC
Description of problem:

A VIP on the storage management network gets created even though the controllers are not connected to this network, so the pcs resource fails to start.

I'm doing a deployment with composable roles where the controllers are not connected to the storage management network and the ports on this network are nooped:

  ## Disable unused network from the preexisting Controller role ##
  OS::TripleO::Controller::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  StorageMgmtNetCidr: 10.0.1.0/25
  StorageMgmtAllocationPools: [{'start': '10.0.1.10', 'end': '10.0.1.100'}]
  StorageMgmtNetworkVlanID: 301


Nevertheless, on the controller nodes I can see a pcs resource corresponding to this network, which obviously couldn't start:

 ip-10.0.1.14	(ocf::heartbeat:IPaddr2):	Stopped
Failed Actions:
* ip-10.0.1.14_start_0 on poc-controller-2 'unknown error' (1): call=39, status=complete, exitreason='Unable to find nic or netmask.',
    last-rc-change='Wed Nov 30 19:40:04 2016', queued=0ms, exec=38ms
* ip-10.0.1.14_start_0 on poc-controller-0 'unknown error' (1): call=39, status=complete, exitreason='Unable to find nic or netmask.',
    last-rc-change='Wed Nov 30 19:40:03 2016', queued=0ms, exec=35ms
* ip-10.0.1.14_start_0 on poc-controller-1 'unknown error' (1): call=35, status=complete, exitreason='Unable to find nic or netmask.',
    last-rc-change='Wed Nov 30 19:40:04 2016', queued=0ms, exec=39ms


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.1.0-7.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with controllers not connected to the storage management network
2. Check PCS status

Actual results:
There is a VIP created on that network which is stopped.

Expected results:
There should be no VIP resource created, as the controllers' ports on the storage management network were specifically nooped.

Additional info:
Below are the controller role and the network environment file that I used:

- name: Controller
  CountDefault: 1
  ServicesDefault:
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CinderVolume
    - OS::TripleO::Services::Core
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::MySQL
    - OS::TripleO::Services::RabbitMQ
    - OS::TripleO::Services::HAproxy
    - OS::TripleO::Services::Memcached
    - OS::TripleO::Services::Pacemaker
    - OS::TripleO::Services::Redis
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::VipHosts


resource_registry:
  ## Preexisting roles nic configs ##
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/openstack_deployment/nic-configs/compute.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/openstack_deployment/nic-configs/controller.yaml

  ## Disable unused network from the preexisting Controller role ##
  OS::TripleO::Controller::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Controller::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml

  ##  ServiceAPI role nic configs and enabled networks ##
  OS::TripleO::ServiceAPI::Net::SoftwareConfig: /home/stack/openstack_deployment/nic-configs/serviceapi.yaml
  OS::TripleO::ServiceAPI::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management.yaml
  OS::TripleO::ServiceAPI::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api.yaml
  OS::TripleO::ServiceAPI::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage.yaml

  ## Networker role nic configs and enabled networks ##
  OS::TripleO::Networker::Net::SoftwareConfig: /home/stack/openstack_deployment/nic-configs/networker.yaml
  OS::TripleO::Networker::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/tenant.yaml
  OS::TripleO::Networker::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management.yaml
  OS::TripleO::Networker::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api.yaml

  ## Telemetry role nic configs and enabled networks ##
  OS::TripleO::Telemetry::Net::SoftwareConfig: /home/stack/openstack_deployment/nic-configs/telemetry.yaml
  OS::TripleO::Telemetry::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management.yaml
  OS::TripleO::Telemetry::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api.yaml

  ## ComputeNFV role nic configs and enabled networks ##
  OS::TripleO::ComputeNFV::Net::SoftwareConfig: /home/stack/openstack_deployment/nic-configs/computenfv.yaml
  OS::TripleO::ComputeNFV::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/tenant.yaml
  OS::TripleO::ComputeNFV::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management.yaml
  OS::TripleO::ComputeNFV::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api.yaml
  OS::TripleO::ComputeNFV::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage.yaml

  ## CephMON role nic configs and enabled networks ##
  OS::TripleO::CephMON::Net::SoftwareConfig: /home/stack/openstack_deployment/nic-configs/cephmon.yaml
  OS::TripleO::CephMON::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management.yaml
  OS::TripleO::CephMON::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage.yaml

  ## CephOSD role nic configs and enabled networks ##
  OS::TripleO::CephOSD::Net::SoftwareConfig: /home/stack/openstack_deployment/nic-configs/cephosd.yaml
  OS::TripleO::CephOSD::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management.yaml
  OS::TripleO::CephOSD::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage.yaml
  OS::TripleO::CephOSD::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt.yaml


parameter_defaults:
  InternalApiNetCidr: 10.0.0.0/25
  InternalApiAllocationPools: [{'start': '10.0.0.10', 'end': '10.0.0.100'}]
  InternalApiNetworkVlanID: 200

  StorageNetCidr: 10.0.0.128/25
  StorageAllocationPools: [{'start': '10.0.0.138', 'end': '10.0.0.200'}]
  StorageNetworkVlanID: 300

  StorageMgmtNetCidr: 10.0.1.0/25
  StorageMgmtAllocationPools: [{'start': '10.0.1.10', 'end': '10.0.1.100'}]
  StorageMgmtNetworkVlanID: 301

  ExternalNetCidr: 172.16.18.0/25
  ExternalAllocationPools: [{'start': '172.16.18.25', 'end': '172.16.18.100'}]
  ExternalInterfaceDefaultRoute: 172.16.18.126
  ExternalNetworkVlanID: 100

  TenantNetCidr: 10.0.1.128/25
  TenantAllocationPools: [{'start': '10.0.1.138', 'end': '10.0.1.200'}]

  ManagementNetCidr: 172.16.17.128/25
  ManagementAllocationPools: [{'start': '172.16.17.181', 'end': '172.16.17.230'}]
  ManagementInterfaceDefaultRoute: 172.16.17.254

  ControlPlaneSubnetCidr: "25"
  ControlPlaneDefaultRoute: 192.168.0.1

  EC2MetadataIp: 192.168.0.1
  DnsServers: ["172.16.17.254","172.16.17.254"]
  NtpServer: ["clock.redhat.com","clock.redhat.com"]

nic config:
[stack@undercloud-0 ~]$ cat openstack_deployment/nic-configs/controller.yaml 
heat_template_version: 2015-04-30

parameters:
  ControlPlaneIp:
    default: ''
    description: IP address/subnet on the ctlplane network
    type: string
  ExternalIpSubnet:
    default: ''
    description: IP address/subnet on the external network
    type: string
  InternalApiIpSubnet:
    default: ''
    description: IP address/subnet on the internal API network
    type: string
  StorageIpSubnet:
    default: ''
    description: IP address/subnet on the storage network
    type: string
  StorageMgmtIpSubnet:
    default: ''
    description: IP address/subnet on the storage mgmt network
    type: string
  TenantIpSubnet:
    default: ''
    description: IP address/subnet on the tenant network
    type: string
  ManagementIpSubnet: # Only populated when including environments/network-management.yaml
    default: ''
    description: IP address/subnet on the management network
    type: string
  ExternalNetworkVlanID:
    default: 10
    description: Vlan ID for the external network traffic.
    type: number
  InternalApiNetworkVlanID:
    default: 20
    description: Vlan ID for the internal_api network traffic.
    type: number
  StorageNetworkVlanID:
    default: 30
    description: Vlan ID for the storage network traffic.
    type: number
  StorageMgmtNetworkVlanID:
    default: 40
    description: Vlan ID for the storage mgmt network traffic.
    type: number
  TenantNetworkVlanID:
    default: 50
    description: Vlan ID for the tenant network traffic.
    type: number
  ManagementNetworkVlanID:
    default: 60
    description: Vlan ID for the management network traffic.
    type: number
  ManagementInterfaceDefaultRoute:
    default: '10.0.1.1'
    description: default route for the external network
    type: string
  ExternalInterfaceDefaultRoute:
    default: '10.0.0.1'
    description: default route for the external network
    type: string
  ControlPlaneSubnetCidr: # Override this via parameter_defaults
    default: '24'
    description: The subnet CIDR of the control plane network.
    type: string
  ControlPlaneDefaultRoute: # Override this via parameter_defaults
    description: The default route of the control plane network.
    type: string
  DnsServers: # Override this via parameter_defaults
    default: []
    description: A list of DNS servers (2 max for some implementations) that will be added to resolv.conf.
    type: comma_delimited_list
  EC2MetadataIp: # Override this via parameter_defaults
    description: The IP address of the EC2 metadata server.
    type: string

resources:
  OsNetConfigImpl:
    properties:
      config:
        os_net_config:
          network_config:
            -
              type: interface
              name: nic1
              use_dhcp: false
              dns_servers: {get_param: DnsServers}
              addresses:
                -
                  ip_netmask:
                    list_join:
                      - '/'
                      - - {get_param: ControlPlaneIp}
                        - {get_param: ControlPlaneSubnetCidr}
              routes:
                -
                  ip_netmask: 169.254.169.254/32
                  next_hop: {get_param: EC2MetadataIp}
            -
              type: ovs_bridge
              name: {get_input: bridge_name}
              use_dhcp: false
              members:
                -
                  type: interface
                  name: nic2
                  primary: true
                -
                  type: vlan
                  vlan_id: {get_param: ExternalNetworkVlanID}
                  dns_servers: {get_param: DnsServers}
                  addresses:
                  -
                    ip_netmask: {get_param: ExternalIpSubnet}
                  routes:
                    -
                      default: true
                      next_hop: {get_param: ExternalInterfaceDefaultRoute}

            -
              type: ovs_bridge
              name: br-infra
              use_dhcp: false
              members:
                -
                  type: interface
                  name: nic3
                  use_dhcp: false
                  primary: true
                -
                  type: vlan
                  vlan_id: {get_param: InternalApiNetworkVlanID}
                  addresses:
                    -
                      ip_netmask: {get_param: InternalApiIpSubnet}

            -
              type: ovs_bridge
              name: br-storage
              use_dhcp: false
              members:
                -
                  type: interface
                  name: nic4
                  use_dhcp: false
                  primary: true
                -
                  type: vlan
                  vlan_id: {get_param: StorageNetworkVlanID}
                  addresses:
                    -
                      ip_netmask: {get_param: StorageIpSubnet}

          # Uncomment when including environments/network-management.yaml
            -
              type: interface
              name: nic5
              use_dhcp: false
              use_dhcpv6: false
              addresses:
                -
                  ip_netmask: {get_param: ManagementIpSubnet}

      group: os-apply-config
    type: OS::Heat::StructuredConfig

outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value: {get_resource: OsNetConfigImpl}

Comment 1 Emilien Macchi 2016-12-12 17:17:40 UTC
Marius, could you please report the bug in launchpad/tripleo and link it to here?
It would help to dispatch the bug upstream.

Thanks

Comment 2 Marius Cornea 2016-12-13 09:04:56 UTC
(In reply to Emilien Macchi from comment #1)
> Marius, could you please report the bug in launchpad/tripleo and link it to
> here?
> It would help to dispatch the bug upstream.
> 
> Thanks

Sure, I added it.

Comment 4 Ben Nemec 2017-07-21 17:00:01 UTC
I've been looking into this, and my investigation has raised a few questions:

1) VIPs are managed by the keepalived service, which is not included in the controller role on this bug.  Is that a problem?

2) Does it make sense to split VIPs across multiple roles?  In particular since we only have examples of keepalived running on a single role, I'm unsure whether it makes sense to have a network enabled but not attached to the role where VIPs are being managed.  This is assuming the storage_mgmt network is still enabled in this deployment, of course.  If not, then the VIP port in network-isolation.yaml also needs to be noop'd, which I believe would fix this as well.

I've added pidone as a secondary DFG since VIPs are pacemaker managed and we might need their input on what is allowed here.

Comment 5 Alexandre Maumené 2018-01-18 01:45:23 UTC
Hi all,

This is quite an old bug and it seems everybody forgot about it. Anyway, when you disable the VIP like this:
OS::TripleO::Network::Ports::StorageMgmtVipPort: ../my_templates/network/ports/noop.yaml

and of course the StorageMgmt port like this:
OS::TripleO::Controller::Ports::StorageMgmtPort: ../my_templates/network/ports/noop.yaml

no VIP is configured in pcs. I think this bug can be closed.
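
For reference, the two overrides from this comment can be combined in a single environment file. This is a minimal sketch, not a tested configuration: the file name is hypothetical, and the noop.yaml path is the stock one from the original report rather than the local copy referenced in this comment.

```yaml
# storage-mgmt-disable.yaml (hypothetical file name)
resource_registry:
  # Do not create a StorageMgmt port on the controllers.
  OS::TripleO::Controller::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  # Do not create the StorageMgmt VIP port either; per this comment,
  # no VIP resource then shows up in pcs.
  OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
```

Such a file would presumably need to be passed with -e after network-isolation.yaml, since later environment files override earlier resource_registry mappings.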

Comment 6 Ben Nemec 2018-01-19 21:13:55 UTC
Closing per the previous comment.