Bug 1674078 - [Hackfest]OSP director unable to deploy OCP without CNS on multinode setup
Summary: [Hackfest]OSP director unable to deploy OCP without CNS on multinode setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z2
Target Release: 14.0 (Rocky)
Assignee: Martin André
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-09 00:50 UTC by wasantha gamage
Modified: 2019-10-23 06:39 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-9.2.1-0.20190119154866.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-30 17:51:15 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Launchpad               1811664         0        None      None    None     2019-02-11 15:03:24 UTC
Red Hat Product Errata  RHBA-2019:0878  0        None      None    None     2019-04-30 17:51:23 UTC

Description wasantha gamage 2019-02-09 00:50:10 UTC
Description of problem:
An OCP deployment driven by OSP director fails on a multinode setup (3x master, 3x infra and 3x worker) when CNS is not enabled. I disabled CNS by not including the GlusterFS service in the OCP infra and worker roles and by excluding /usr/share/openstack-tripleo-heat-templates/environments/openshift-cns.yaml from the deploy command.

It seems that /usr/share/ansible/openshift-ansible/roles/openshift_hosted/tasks/storage/glusterfs.yml is still run even though no Gluster configuration is present.

Version-Release number of selected component (if applicable):
RHOSP 14

How reproducible:


Steps to Reproduce:
1. Deploy OCP without CNS enabled on a 3x master, 3x infra and 3x worker setup.
The deployment fails as shown below.


Actual results:
   "",
    "PLAY [Create Hosted Resources - registry storage] ******************************",
    "",
    "TASK [Gathering Facts] *********************************************************",
    "\u001b[0;32mok: [openshift-openshiftmaster-1]\u001b[0m",
    "",
    "TASK [openshift_hosted : include_tasks] ****************************************",
    "\u001b[0;36mincluded: /usr/share/ansible/openshift-ansible/roles/openshift_hosted/tasks/storage/glusterfs.yml for openshift-openshiftmaster-1\u001b[0m",
    "",
    "TASK [openshift_hosted : Get registry DeploymentConfig] ************************",
    "\u001b[0;32mok: [openshift-openshiftmaster-1]\u001b[0m",
    "",
    "TASK [openshift_hosted : Wait for registry pods] *******************************",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (60 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (59 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (58 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (57 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (56 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (55 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (54 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (53 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (52 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (51 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (50 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (49 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (48 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (47 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (46 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (45 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (44 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (43 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (42 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (41 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (40 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (39 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (38 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (37 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (36 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (35 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (34 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (33 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (32 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (31 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (30 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (29 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (28 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (27 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (26 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (25 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (24 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (23 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (22 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (21 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (20 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (19 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (18 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (17 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (16 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (15 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (14 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (13 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (12 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (11 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (10 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (9 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (8 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (7 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (6 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (5 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (4 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (3 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (2 retries left).\u001b[0m",
    "\u001b[1;30mFAILED - RETRYING: Wait for registry pods (1 retries left).\u001b[0m",
    "\u001b[0;31mfatal: [openshift-openshiftmaster-1]: FAILED! => {\"attempts\": 60, \"changed\": false, \"results\": {\"cmd\": \"/bin/oc get pod --selector=docker-registry=default -o json -n default\", \"results\": [{\"apiVersion\": \"v1\", \"items\": [], \"kind\": \"List\", \"metadata\": {\"resourceVersion\": \"\", \"selfLink\": \"\"}}], \"returncode\": 0}, \"state\": \"list\"}\u001b[0m",
    "",
    "PLAY RECAP *********************************************************************",
    "\u001b[0;32mlocalhost\u001b[0m                  : \u001b[0;32mok=22  \u001b[0m changed=0    unreachable=0    failed=0   ",
    "\u001b[0;33mopenshift-openshiftinfra-0\u001b[0m : \u001b[0;32mok=177 \u001b[0m \u001b[0;33mchanged=73  \u001b[0m unreachable=0    failed=0   ",
    "\u001b[0;33mopenshift-openshiftinfra-1\u001b[0m : \u001b[0;32mok=177 \u001b[0m \u001b[0;33mchanged=73  \u001b[0m unreachable=0    failed=0   ",
    "\u001b[0;33mopenshift-openshiftinfra-2\u001b[0m : \u001b[0;32mok=178 \u001b[0m \u001b[0;33mchanged=73  \u001b[0m unreachable=0    failed=0   ",
    "\u001b[0;33mopenshift-openshiftmaster-0\u001b[0m : \u001b[0;32mok=357 \u001b[0m \u001b[0;33mchanged=153 \u001b[0m unreachable=0    failed=0   ",
    "\u001b[0;31mopenshift-openshiftmaster-1\u001b[0m : \u001b[0;32mok=653 \u001b[0m \u001b[0;33mchanged=270 \u001b[0m unreachable=0    \u001b[0;31mfailed=1   \u001b[0m",
    "\u001b[0;33mopenshift-openshiftmaster-2\u001b[0m : \u001b[0;32mok=357 \u001b[0m \u001b[0;33mchanged=153 \u001b[0m unreachable=0    failed=0   ",
    "\u001b[0;33mopenshift-openshiftworker-0\u001b[0m : \u001b[0;32mok=177 \u001b[0m \u001b[0;33mchanged=73  \u001b[0m unreachable=0    failed=0   ",
    "\u001b[0;33mopenshift-openshiftworker-1\u001b[0m : \u001b[0;32mok=177 \u001b[0m \u001b[0;33mchanged=73  \u001b[0m unreachable=0    failed=0   ",
    "\u001b[0;33mopenshift-openshiftworker-2\u001b[0m : \u001b[0;32mok=177 \u001b[0m \u001b[0;33mchanged=73  \u001b[0m unreachable=0    failed=0   ",
    "",
    "",
    "INSTALLER STATUS ***************************************************************",
    "\u001b[0;32mInitialization              : Complete (0:02:04)\u001b[0m",
    "\u001b[0;32mHealth Check                : Complete (0:00:53)\u001b[0m",
    "\u001b[0;32mNode Bootstrap Preparation  : Complete (0:09:04)\u001b[0m",
    "\u001b[0;32metcd Install                : Complete (0:01:30)\u001b[0m",
    "\u001b[0;32mMaster Install              : Complete (0:08:13)\u001b[0m",
    "\u001b[0;32mMaster Additional Install   : Complete (0:06:18)\u001b[0m",
    "\u001b[0;32mNode Join                   : Complete (0:01:49)\u001b[0m",
    "\u001b[0;31mHosted Install              : In Progress (0:12:02)\u001b[0m",
    "\tThis phase can be restarted by running: playbooks/openshift-hosted/config.yml",
    "",
    "",
    "Failure summary:",
    "",
    "",
    "  1. Hosts:    openshift-openshiftmaster-1",
    "     Play:     Create Hosted Resources - registry storage",
    "     Task:     Wait for registry pods",
    "     Message:  \u001b[0;31mFailed without returning a message.\u001b[0m"

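The log above is hard to scan because the ANSI color codes survive as literal \u001b[...m text inside the JSON-quoted strings. A minimal sketch for stripping them, assuming the output was saved to a file; the helper name strip_ansi_escapes and the file name deploy.log are hypothetical:

```shell
# Remove color codes that appear as the literal text \u001b[<digits;digits>m,
# which is how they show up in the JSON-quoted deploy output.
# strip_ansi_escapes is a hypothetical helper name, not part of any tool.
strip_ansi_escapes() {
    sed -E 's/\\u001b\[[0-9;]*m//g'
}

# Example usage (deploy.log is an assumed file name):
#   strip_ansi_escapes < deploy.log | grep -E 'FAILED|fatal'
```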

Expected results:

The deployment continues, using ephemeral storage for the registry.
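
The failing task polls "oc get pod --selector=docker-registry=default -n default" and gives up because the item list stays empty. A minimal sketch of the same check for manual verification, assuming oc is on the PATH of a master node and already logged in with sufficient rights; the function name count_registry_pods is hypothetical:

```shell
# Count pods matching the registry selector used by the failing task.
# "-o name" prints one "pod/<name>" line per pod, so counting non-empty
# lines gives the number of registry pods. "|| true" keeps the exit
# status at 0 when grep finds no lines (it still prints 0).
count_registry_pods() {
    oc get pod --selector=docker-registry=default -n default -o name | grep -c . || true
}

# Example usage:
#   if [ "$(count_registry_pods)" -eq 0 ]; then
#       echo "no registry pods found"
#   fi
```

A count of zero corresponds to the empty "items" list reported in the fatal message above.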

Additional info:

###Deploy command ####
(undercloud) [stack@undercloud templates]$ cat ../scripts/openshift-deploy.sh 
#!/bin/bash

exec openstack overcloud deploy \
        --stack openshift \
        --timeout 90 \
        --verbose \
        --templates \
        -r /home/stack/wasantha/templates/openshift_roles_data.yaml \
        -n /home/stack/wasantha/templates/network_data_openshift.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/openshift.yaml \
        -e /home/stack/wasantha/templates/openshift_env.yaml \
        -e /home/stack/wasantha/templates/containers-prepare-parameter.yaml \
        -e /home/stack/wasantha/templates/node-count.yaml \
        -e /home/stack/wasantha/templates/rhsm.yaml 
(undercloud) [stack@undercloud templates]$ 
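
To double-check that the CNS environment file really is absent from the deploy command, the wrapper script can be searched for it. A sketch, with uses_cns_env as a hypothetical helper name:

```shell
# Succeeds (exit 0) if the given deploy script references the
# openshift-cns.yaml environment file, fails (exit 1) otherwise.
uses_cns_env() {
    grep -q -- 'environments/openshift-cns\.yaml' "$1"
}

# Example usage against the script shown above:
#   if uses_cns_env ../scripts/openshift-deploy.sh; then
#       echo "CNS environment is included"
#   else
#       echo "CNS environment is not included"
#   fi
```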

### OCP roles ######

(undercloud) [stack@undercloud templates]$ cat openshift_roles_data.yaml 
###############################################################################
# File generated by TripleO
###############################################################################
###############################################################################
# Role: OpenShiftMaster                                                            #
###############################################################################
- name: OpenShiftMaster
  description: |
    OpenShiftMaster role
  CountDefault: 1
  RoleParametersDefault:
    OpenShiftNodeGroupName: 'node-config-master'
    DockerSkipUpdateReconfiguration: true
  tags:
    - primary
    - controller
    - openshift
  networks:
    - External
    - InternalApi
    - Storage
  # For systems with both IPv4 and IPv6, you may specify a gateway network for
  # each, such as ['ControlPlane', 'External']
  default_route_networks: ['External']
  ServicesDefault:
    - OS::TripleO::Services::ContainerImagePrepare
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::HAproxy
    - OS::TripleO::Services::Keepalived
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::OpenShift::Master
    - OS::TripleO::Services::Rhsm
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::TripleoPackages
###############################################################################
# Role: OpenShiftWorker                                                            #
###############################################################################
- name: OpenShiftWorker
  description: |
    OpenShiftWorker role
  CountDefault: 1
  RoleParametersDefault:
    OpenShiftNodeGroupName: 'node-config-compute'
    DockerSkipUpdateReconfiguration: true
  tags:
    - openshift
  networks:
    - External
    - InternalApi
    - Storage
  # For systems with both IPv4 and IPv6, you may specify a gateway network for
  # each, such as ['ControlPlane', 'External']
  #default_route_networks: ['ControlPlane']
  default_route_networks: ['External']
  ServicesDefault:
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::OpenShift::Worker
    - OS::TripleO::Services::Rhsm
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::TripleoFirewall
###############################################################################
# Role: OpenShiftInfra                                                        #
###############################################################################
- name: OpenShiftInfra
  description: |
    OpenShiftInfra role, a specialized worker that only runs infra pods.
  CountDefault: 1
  RoleParametersDefault:
    OpenShiftNodeGroupName: 'node-config-infra'
    DockerSkipUpdateReconfiguration: true
  tags:
    - openshift
  networks:
    - External
    - InternalApi
    - Storage
  # For systems with both IPv4 and IPv6, you may specify a gateway network for
  # each, such as ['ControlPlane', 'External']
  #default_route_networks: ['ControlPlane']
  default_route_networks: ['External']
  ServicesDefault:
    - OS::TripleO::Services::Docker
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::OpenShift::Infra
    - OS::TripleO::Services::Rhsm
    - OS::TripleO::Services::Sshd
    - OS::TripleO::Services::TripleoFirewall
(undercloud) [stack@undercloud templates]$ 
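
The roles file above lists no Gluster-related service under any role. A small sketch for confirming that on a roles file, using a case-insensitive search so it does not depend on the exact service name; list_gluster_services is a hypothetical helper name:

```shell
# Print every line of a roles file that mentions "gluster" in any case,
# prefixed with its line number. Prints nothing when no role references it.
list_gluster_services() {
    grep -in 'gluster' "$1" || true
}

# Example usage:
#   list_gluster_services openshift_roles_data.yaml
```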


### OCP environment

(undercloud) [stack@undercloud templates]$ cat openshift_env.yaml 
resource_registry:
  OS::TripleO::OpenShiftMaster::Net::SoftwareConfig: /home/stack/wasantha/templates/nic-configs/master-nic.yaml
  OS::TripleO::OpenShiftWorker::Net::SoftwareConfig: /home/stack/wasantha/templates/nic-configs/worker-nic.yaml
  OS::TripleO::OpenShiftInfra::Net::SoftwareConfig: /home/stack/wasantha/templates/nic-configs/infra-nic.yaml
parameter_defaults:
  # by default Director assigns the VIP random from the allocation pool
  # by using the FixedIPs we can set the VIPs to predictable IPs before starting the deployment
  
  CloudName: openshift.localdomain
  PublicVirtualFixedIPs: [{'ip_address':'192.168.122.150'}]
#  ExternalNetCidr: 192.168.122.0/24
#  ExternalAllocationPools: [{'start': '192.168.122.100', 'end': '192.168.122.150'}]
#  ExternalInterfaceDefaultRoute: 192.168.122.1
  
  CloudNameInternal: internal.openshift.localdomain
  InternalApiVirtualFixedIPs: [{'ip_address':'172.17.1.150'}]
  
  CloudDomain: openshift.localdomain
  
  ## Required for CNS deployments only
#  OpenShiftInfraParameters:
#    OpenShiftGlusterDisks:
#      - /dev/vdb
#  
#  ## Required for CNS deployments only
#  OpenShiftWorkerParameters:
#    OpenShiftGlusterDisks:
#      - /dev/vdb
  
  NtpServer: ["clock.redhat.com","clock2.redhat.com"]
  
  ControlPlaneDefaultRoute: 172.16.0.1
  EC2MetadataIp: 172.16.0.1
  ControlPlaneSubnetCidr: 24
  
  # The DNS server below should have entries for resolving {internal,public,apps}.openshift.localdomain names
  DnsServers:
     - 192.168.122.1
     - 8.8.8.8
  
  OpenShiftGlobalVariables:
    openshift_master_identity_providers:
    - name: 'htpasswd_auth'
      login: 'true'
      challenge: 'true'
      kind: 'HTPasswdPasswordIdentityProvider'
    openshift_master_htpasswd_users:
      sysadmin: '$apr1$n4fQwl1x$zLwDsPAZoxQ.O/VL0AIQA.'
      admin: '$apr1$7cvR/xPO$Ih3qDOIlyLuplSIRI1iCQ1'
    #openshift_master_cluster_hostname should match the CloudNameInternal parameter
    openshift_master_cluster_hostname: internal.openshift.localdomain
    #openshift_master_cluster_public_hostname should match the CloudName parameter
    openshift_master_cluster_public_hostname: public.openshift.localdomain
    openshift_master_default_subdomain: apps.openshift.localdomain
    # skip memory check because of virtual env
    openshift_disable_check: memory_availability
(undercloud) [stack@undercloud templates]$ 



#### Node count #####
(undercloud) [stack@undercloud templates]$ cat node-count.yaml 
parameter_defaults:
  OpenShiftMasterCount: 3
  OvercloudOpenShiftMasterFlavor: m1.OpenShiftMaster

  OpenShiftInfraCount: 3
  OvercloudOpenShiftInfraFlavor: m1.OpenShiftInfra

  OpenShiftWorkerCount: 3
  OvercloudOpenShiftWorkerFlavor: m1.OpenShiftWorker
(undercloud) [stack@undercloud templates]$
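
The three counts above add up to the nine overcloud hosts listed in the PLAY RECAP. A sketch for totalling them from an environment file, assuming each count sits on its own "<Role>Count: <n>" line; total_node_count is a hypothetical helper name:

```shell
# Sum every "<Role>Count: <n>" value in a node-count environment file.
# Lines such as the Overcloud...Flavor settings are ignored because their
# key does not end in "Count:".
total_node_count() {
    awk '/^[ \t]*[A-Za-z]+Count:/ { sum += $2 } END { print sum + 0 }' "$1"
}

# Example usage:
#   total_node_count node-count.yaml
```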

Comment 1 John Trowbridge 2019-02-11 15:03:25 UTC
This should be fixed by https://bugs.launchpad.net/tripleo/+bug/1811664; the fix was likely not in the puddle you were testing.
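
One way to tell whether a given undercloud already carries the fix is to compare the installed package against the "Fixed In Version" field of this bug. A rough sketch, assuming GNU sort is available; sort -V only approximates RPM version ordering, and version_at_least is a hypothetical helper name:

```shell
# Succeeds if version $2 (installed) sorts at or above version $1 (required)
# under GNU "sort -V". This is an approximation of RPM version comparison,
# not a replacement for it.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example usage on the undercloud:
#   installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' openstack-tripleo-heat-templates)
#   if version_at_least '9.2.1-0.20190119154866.el7ost' "$installed"; then
#       echo "installed templates are at or above the fixed version"
#   fi
```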

Comment 10 errata-xmlrpc 2019-04-30 17:51:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0878

