Bug 1539691 - 3.9.0-0.31.0 - web console pod does not start because master is not schedulable
Summary: 3.9.0-0.31.0 - web console pod does not start because master is not schedulable
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Vadim Rutkovsky
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-01-29 13:05 UTC by Mike Fiedler
Modified: 2018-04-13 12:17 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-13 12:17:42 UTC
Target Upstream Version:
Embargoed:



Description Mike Fiedler 2018-01-29 13:05:07 UTC
Description of problem:

OpenShift and openshift-ansible 3.9.0-0.31.0

During the install, the webconsole pod is stuck in Pending with the following scheduling events:
Events:                          
  Type     Reason            Age               From               Message                                                              
  ----     ------            ----              ----               -------                                                              
  Warning  FailedScheduling  5s (x13 over 2m)  default-scheduler  0/5 nodes are available: 1 NodeUnschedulable, 4 MatchNodeSelector. 

Making the master schedulable allows the install to proceed.
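
As a workaround sketch (the hostname is this cluster's master from the inventory below; any master works), the node can be marked schedulable from the CLI:

# oc adm manage-node ec2-54-149-171-156.us-west-2.compute.amazonaws.com --schedulable=true

Alternatively, openshift_schedulable=true can be set on the master's entry in the [nodes] group of the inventory before running the install.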

see also:  https://bugzilla.redhat.com/show_bug.cgi?id=1535673


Version-Release number of the following components:

openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.rpm
openshift-ansible-docs-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.rpm
openshift-ansible-playbooks-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.rpm
openshift-ansible-roles-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.rpm

How reproducible:  Always

Steps to Reproduce:
1.  Use openshift-ansible 3.9.0-0.31.0 to install OpenShift 3.9.0-0.31.0, as sketched below.
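
A minimal sketch of the install invocation (the inventory path is illustrative; the playbook path is the default location for the openshift-ansible 3.9 RPMs):

# ansible-playbook -i /path/to/inventory /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml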

Actual results:

The check that the web console is running stays stuck until the master is manually made schedulable.

TASK [openshift_web_console : Verify that the web console is running] **********
Monday 29 January 2018  12:50:43 +0000 (0:00:01.414)       0:11:37.128 ******** 
FAILED - RETRYING: Verify that the web console is running (120 retries left).
FAILED - RETRYING: Verify that the web console is running (119 retries left).
FAILED - RETRYING: Verify that the web console is running (118 retries left).
FAILED - RETRYING: Verify that the web console is running (117 retries left).
FAILED - RETRYING: Verify that the web console is running (116 retries left).
FAILED - RETRYING: Verify that the web console is running (115 retries left).
FAILED - RETRYING: Verify that the web console is running (114 retries left).
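
The task keeps retrying until the console deployment reports ready; the same state can be checked by hand with something like the following, using the namespace and deployment name the installer creates (seen later in this bug):

# oc get pods -n openshift-web-console -o wide
# oc describe deployment webconsole -n openshift-web-console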

Inventory (some info redacted) 

[OSEv3:children]
masters
nodes

etcd

[OSEv3:vars]

#The following parameters is used by post-actions
iaas_name=AWS
use_rpm_playbook=true
openshift_playbook_rpm_repos=[{'id': 'aos-playbook-rpm', 'name': 'aos-playbook-rpm', 'baseurl': 'http://download.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.9/latest/x86_64/os', 'enabled': 1, 'gpgcheck': 0}]

update_is_images_url=registry.reg-aws.openshift.com:443

#The following parameters is used by openshift-ansible
ansible_ssh_user=root

openshift_cloudprovider_kind=aws

openshift_cloudprovider_aws_access_key=<redacted>


openshift_cloudprovider_aws_secret_key=<redacted>

openshift_master_default_subdomain_enable=true
openshift_master_default_subdomain=apps.0129-os8.qe.rhcloud.com

openshift_auth_type=allowall

openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]



openshift_release=v3.9
openshift_deployment_type=openshift-enterprise
openshift_cockpit_deployer_prefix=registry.reg-aws.openshift.com:443/openshift3/
oreg_url=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
oreg_auth_user={{ lookup('env','REG_AUTH_USER') }}
oreg_auth_password={{ lookup('env','REG_AUTH_PASSWORD') }}
openshift_docker_additional_registries=registry.reg-aws.openshift.com:443
openshift_docker_insecure_registries=registry.reg-aws.openshift.com:443
openshift_service_catalog_image_prefix=registry.reg-aws.openshift.com:443/openshift3/ose-
ansible_service_broker_image_prefix=registry.reg-aws.openshift.com:443/openshift3/ose-
ansible_service_broker_image_tag=v3.9
template_service_broker_prefix=registry.reg-aws.openshift.com:443/openshift3/ose-
template_service_broker_version=v3.9
openshift_web_console_prefix=registry.reg-aws.openshift.com:443/openshift3/ose-
openshift_enable_service_catalog=true
osm_cockpit_plugins=['cockpit-kubernetes']
osm_use_cockpit=false
openshift_docker_options=--log-opt max-size=100M --log-opt max-file=3 --signature-verification=false
use_cluster_metrics=true
openshift_master_cluster_method=native
openshift_master_dynamic_provisioning_enabled=true
openshift_hosted_router_registryurl=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
openshift_hosted_registry_registryurl=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
osm_default_node_selector=region=primary
openshift_registry_selector="region=infra,zone=default"
openshift_hosted_router_selector="region=infra,zone=default"
openshift_disable_check=disk_availability,memory_availability,package_availability,docker_image_availability,docker_storage,package_version
openshift_master_portal_net=172.24.0.0/14
openshift_portal_net=172.24.0.0/14
osm_cluster_network_cidr=172.20.0.0/14
osm_host_subnet_length=9
openshift_node_kubelet_args={"pods-per-core": ["0"], "max-pods": ["510"], "image-gc-high-threshold": ["80"], "image-gc-low-threshold": ["70"]}
debug_level=2
openshift_set_hostname=true
openshift_override_hostname_check=true
os_sdn_network_plugin_name=redhat/openshift-ovs-networkpolicy
openshift_hosted_router_replicas=1
openshift_hosted_registry_storage_kind=object
openshift_hosted_registry_storage_provider=s3
openshift_hosted_registry_storage_s3_accesskey=<redacted>
openshift_hosted_registry_storage_s3_secretkey=<redacted>
openshift_hosted_registry_storage_s3_bucket=aoe-svt-test
openshift_hosted_registry_storage_s3_region=us-west-2
openshift_hosted_registry_replicas=1
openshift_hosted_prometheus_deploy=true
openshift_prometheus_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_prometheus_image_version=v3.9
openshift_prometheus_proxy_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_prometheus_proxy_image_version=v3.9
openshift_prometheus_alertmanager_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_prometheus_alertmanager_image_version=v3.9
openshift_prometheus_alertbuffer_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_prometheus_alertbuffer_image_version=v3.9
openshift_metrics_install_metrics=false
openshift_metrics_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_metrics_image_version=v3.9
openshift_metrics_cassandra_storage_type=dynamic
openshift_metrics_cassandra_pvc_size=25Gi
openshift_logging_install_logging=false
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.9
openshift_logging_storage_kind=dynamic
openshift_logging_es_pvc_size=50Gi
openshift_logging_es_pvc_dynamic=true
openshift_clusterid=mffiedler-39
openshift_image_tag=v3.9.0-0.31.0

[lb]


[etcd]
ec2-54-149-171-156.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-171-156.us-west-2.compute.amazonaws.com


[masters]
ec2-54-149-171-156.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-171-156.us-west-2.compute.amazonaws.com



[nodes]
ec2-54-149-171-156.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-171-156.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_scheduleable=false

ec2-54-149-182-141.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-182-141.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"

ec2-54-149-182-141.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-182-141.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"

ec2-34-217-73-171.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-34-217-73-171.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
ec2-54-213-250-6.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-213-250-6.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
ec2-34-209-72-237.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-34-209-72-237.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"



Expected results:

Successful install

Comment 1 Scott Dodson 2018-01-29 13:39:02 UTC
We need to make sure that masters are schedulable by default; I think that would address this.

Comment 2 Vadim Rutkovsky 2018-01-30 10:33:01 UTC
Created PR https://github.com/openshift/openshift-ansible/pull/6932

Comment 3 Scott Dodson 2018-02-01 15:16:38 UTC
Let's add a check to ensure that, if the console is deployed, the masters are not set to openshift_schedulable=false. openshift_sanitize_inventory is likely a good place for this.

Comment 4 Vadim Rutkovsky 2018-02-01 17:04:57 UTC
(In reply to Scott Dodson from comment #3)
> Let's add a check to ensure that, if the console is deployed, the masters are
> not set to openshift_schedulable=false. openshift_sanitize_inventory is
> likely a good place for this.

Created https://github.com/openshift/openshift-ansible/pull/6984 to address this
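
Until such a check exists, a quick manual sanity check (sketch): unschedulable nodes show SchedulingDisabled in the STATUS column, so the masters' state can be confirmed before the web console roles run with:

# oc get nodes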

Comment 5 Vadim Rutkovsky 2018-02-02 17:51:33 UTC
Fix is available in openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7

Comment 6 Johnny Liu 2018-02-05 08:48:06 UTC
"Taint master nodes" task is not merged into openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch yet.

Comment 7 Johnny Liu 2018-02-05 09:45:15 UTC
After going through the code, it seems the PR would introduce some other issues.

1. The service catalog would have no available node to deploy on.
By default, the installer labels the first master node with "openshift-infra=apiserver"; once the taint is added to all masters, the service catalog daemonset fails to deploy its pod.

2. By default, the installer deploys the logging fluentd daemonset on all nodes, including the masters; once the taint is added to all master nodes, no fluentd pod runs on the masters and logging cannot collect logs from them.

Comment 8 Vadim Rutkovsky 2018-02-05 09:57:05 UTC
(In reply to Johnny Liu from comment #7)
> After going through the code, it seems the PR would introduce some other
> issues.
> 
> 1. The service catalog would have no available node to deploy on.
> By default, the installer labels the first master node with
> "openshift-infra=apiserver"; once the taint is added to all masters, the
> service catalog daemonset fails to deploy its pod.
> 
> 2. By default, the installer deploys the logging fluentd daemonset on all
> nodes, including the masters; once the taint is added to all master nodes,
> no fluentd pod runs on the masters and logging cannot collect logs from
> them.

Good points, these will need to be discussed. It sounds like the service catalog and logging templates should add tolerations too.

(In reply to Johnny Liu from comment #6)
> "Taint master nodes" task is not merged into
> openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch yet.

Correct, tainting masters is still being discussed and is out of scope of this issue.
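
For illustration only, the kind of toleration mentioned above could be patched into a daemonset roughly like this; the daemonset name, namespace and master taint key below are assumptions for a default 3.9 install, not something the PRs here add:

# oc patch daemonset logging-fluentd -n logging --type=json -p \
  '[{"op": "add", "path": "/spec/template/spec/tolerations", "value": [{"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"}]}]'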

Comment 9 liujia 2018-02-23 05:52:52 UTC
@Vadim Rutkovsky 
The upgrade hit an issue related to the schedulable master: some application pods are scheduled on the master node after the upgrade. I think this is not the expected result.

# oc get node
NAME                               STATUS    ROLES     AGE       VERSION
qe-jliu-r-master-etcd-1            Ready     master    2h        v1.9.1+a0ce1bc657
qe-jliu-r-node-registry-router-1   Ready     <none>    2h        v1.9.1+a0ce1bc657


# oc get pod -o wide --all-namespaces |grep master
default                 registry-console-2-hlgln         1/1       Running     0          1h        10.129.0.4    qe-jliu-r-master-etcd-1
install-test            mongodb-1-psr9l                  1/1       Running     0          1h        10.129.0.5    qe-jliu-r-master-etcd-1
install-test            nodejs-mongodb-example-1-k56zh   1/1       Running     0          1h        10.129.0.18   qe-jliu-r-master-etcd-1
openshift-web-console   webconsole-54877f6577-g7tb8      1/1       Running     0          1h        10.129.0.2    qe-jliu-r-master-etcd-1
test                    mysql-1-ptblc                    1/1       Running     0          1h        10.129.0.19   qe-jliu-r-master-etcd-1

Not sure if this issue is in the scope of this bug, or whether I should track it in a new bug?
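
A quick way to audit everything running on that master (sketch; node name taken from the output above):

# oc adm manage-node qe-jliu-r-master-etcd-1 --list-pods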

Comment 10 Vadim Rutkovsky 2018-02-23 09:58:58 UTC
(In reply to liujia from comment #9)
> @Vadim Rutkovsky 
> The upgrade hit an issue related to the schedulable master: some application
> pods are scheduled on the master node after the upgrade. I think this is not
> the expected result.

Right, that's certainly not expected

> Not sure if this issue is in the scope of this bug, or whether I should
> track it in a new bug?

Let's file a new bug for this (and move this one to VERIFIED), as it's getting pretty complex to track. The new bug should be a blocker for 3.9.

Comment 11 Johnny Liu 2018-03-01 02:34:25 UTC
Verified this bug with openshift-ansible-3.9.1-1.git.0.9862628.el7.noarch, and PASS.

Now the master is schedulable and the web console is deployed successfully.
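
For reference, the verified state can be spot-checked with:

# oc get nodes
# oc get pods -n openshift-web-console

The master no longer shows SchedulingDisabled and the webconsole pod is Running.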

