Description of problem:
OpenShift and openshift-ansible 3.9.0-0.31.0

During the install the webconsole pod is stuck in Pending with the following issue:

Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  5s (x13 over 2m)  default-scheduler  0/5 nodes are available: 1 NodeUnschedulable, 4 MatchNodeSelector.

Making the master schedulable allows the install to proceed.

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1535673

Version-Release number of the following components:
openshift-ansible-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.rpm
openshift-ansible-docs-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.rpm
openshift-ansible-playbooks-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.rpm
openshift-ansible-roles-3.9.0-0.31.0.git.0.e0a0ad8.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. Use openshift-ansible 3.9.0-0.31.0 to install OpenShift 3.9.0-0.31.0

Actual results:
The check for the web console to be running was stuck until I manually made the master schedulable.

TASK [openshift_web_console : Verify that the web console is running] **********
Monday 29 January 2018  12:50:43 +0000 (0:00:01.414)       0:11:37.128 ********
FAILED - RETRYING: Verify that the web console is running (120 retries left).
FAILED - RETRYING: Verify that the web console is running (119 retries left).
FAILED - RETRYING: Verify that the web console is running (118 retries left).
FAILED - RETRYING: Verify that the web console is running (117 retries left).
FAILED - RETRYING: Verify that the web console is running (116 retries left).
FAILED - RETRYING: Verify that the web console is running (115 retries left).
FAILED - RETRYING: Verify that the web console is running (114 retries left).

Inventory (some info redacted):

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
#The following parameters is used by post-actions
iaas_name=AWS
use_rpm_playbook=true
openshift_playbook_rpm_repos=[{'id': 'aos-playbook-rpm', 'name': 'aos-playbook-rpm', 'baseurl': 'http://download.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.9/latest/x86_64/os', 'enabled': 1, 'gpgcheck': 0}]
update_is_images_url=registry.reg-aws.openshift.com:443

#The following parameters is used by openshift-ansible
ansible_ssh_user=root
openshift_cloudprovider_kind=aws
openshift_cloudprovider_aws_access_key=<redacted>
openshift_cloudprovider_aws_secret_key=<redacted>
openshift_master_default_subdomain_enable=true
openshift_master_default_subdomain=apps.0129-os8.qe.rhcloud.com
openshift_auth_type=allowall
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
openshift_release=v3.9
openshift_deployment_type=openshift-enterprise
openshift_cockpit_deployer_prefix=registry.reg-aws.openshift.com:443/openshift3/
oreg_url=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
oreg_auth_user={{ lookup('env','REG_AUTH_USER') }}
oreg_auth_password={{ lookup('env','REG_AUTH_PASSWORD') }}
openshift_docker_additional_registries=registry.reg-aws.openshift.com:443
openshift_docker_insecure_registries=registry.reg-aws.openshift.com:443
openshift_service_catalog_image_prefix=registry.reg-aws.openshift.com:443/openshift3/ose-
ansible_service_broker_image_prefix=registry.reg-aws.openshift.com:443/openshift3/ose-
ansible_service_broker_image_tag=v3.9
template_service_broker_prefix=registry.reg-aws.openshift.com:443/openshift3/ose-
template_service_broker_version=v3.9
openshift_web_console_prefix=registry.reg-aws.openshift.com:443/openshift3/ose-
openshift_enable_service_catalog=true
osm_cockpit_plugins=['cockpit-kubernetes']
osm_use_cockpit=false
openshift_docker_options=--log-opt max-size=100M --log-opt max-file=3 --signature-verification=false
use_cluster_metrics=true
openshift_master_cluster_method=native
openshift_master_dynamic_provisioning_enabled=true
openshift_hosted_router_registryurl=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
openshift_hosted_registry_registryurl=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
osm_default_node_selector=region=primary
openshift_registry_selector="region=infra,zone=default"
openshift_hosted_router_selector="region=infra,zone=default"
openshift_disable_check=disk_availability,memory_availability,package_availability,docker_image_availability,docker_storage,package_version
openshift_master_portal_net=172.24.0.0/14
openshift_portal_net=172.24.0.0/14
osm_cluster_network_cidr=172.20.0.0/14
osm_host_subnet_length=9
openshift_node_kubelet_args={"pods-per-core": ["0"], "max-pods": ["510"], "image-gc-high-threshold": ["80"], "image-gc-low-threshold": ["70"]}
debug_level=2
openshift_set_hostname=true
openshift_override_hostname_check=true
os_sdn_network_plugin_name=redhat/openshift-ovs-networkpolicy
openshift_hosted_router_replicas=1
openshift_hosted_registry_storage_kind=object
openshift_hosted_registry_storage_provider=s3
openshift_hosted_registry_storage_s3_accesskey=<redacted>
openshift_hosted_registry_storage_s3_secretkey=<redacted>
openshift_hosted_registry_storage_s3_bucket=aoe-svt-test
openshift_hosted_registry_storage_s3_region=us-west-2
openshift_hosted_registry_replicas=1
openshift_hosted_prometheus_deploy=true
openshift_prometheus_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_prometheus_image_version=v3.9
openshift_prometheus_proxy_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_prometheus_proxy_image_version=v3.9
openshift_prometheus_alertmanager_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_prometheus_alertmanager_image_version=v3.9
openshift_prometheus_alertbuffer_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_prometheus_alertbuffer_image_version=v3.9
openshift_metrics_install_metrics=false
openshift_metrics_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_metrics_image_version=v3.9
openshift_metrics_cassandra_storage_type=dynamic
openshift_metrics_cassandra_pvc_size=25Gi
openshift_logging_install_logging=false
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.9
openshift_logging_storage_kind=dynamic
openshift_logging_es_pvc_size=50Gi
openshift_logging_es_pvc_dynamic=true
openshift_clusterid=mffiedler-39
openshift_image_tag=v3.9.0-0.31.0

[lb]

[etcd]
ec2-54-149-171-156.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-171-156.us-west-2.compute.amazonaws.com

[masters]
ec2-54-149-171-156.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-171-156.us-west-2.compute.amazonaws.com

[nodes]
ec2-54-149-171-156.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-171-156.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_scheduleable=false
ec2-54-149-182-141.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-182-141.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
ec2-54-149-182-141.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-149-182-141.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
ec2-34-217-73-171.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-34-217-73-171.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
ec2-54-213-250-6.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-54-213-250-6.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
ec2-34-209-72-237.us-west-2.compute.amazonaws.com ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/home/slave3/workspace/Launch Environment Flexy/private/config/keys/id_rsa_perf" openshift_public_hostname=ec2-34-209-72-237.us-west-2.compute.amazonaws.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"

Expected results:
Successful install
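The workaround referenced in the description (making the master schedulable by hand) can also be scripted. A minimal sketch, assuming a single master, that its registered node name matches the inventory hostname, and that an admin kubeconfig is present on the master (this is not part of openshift-ansible itself):

# Hypothetical ad-hoc play for the manual workaround described above.
- hosts: masters
  gather_facts: false
  tasks:
    - name: Mark the master node schedulable so the web console pod can be placed
      # oc adm manage-node is the 3.x command for toggling node schedulability
      command: oc adm manage-node {{ inventory_hostname }} --schedulable=true

Running this against the same inventory unblocks the "Verify that the web console is running" retries.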
We need to make sure that masters are schedulable by default; I think that would address this.
Created PR https://github.com/openshift/openshift-ansible/pull/6932
Let's add a check to ensure that if the console is deployed, the masters are not set to openshift_schedulable=false. openshift_sanitize_inventory is likely a good place for this.
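A rough illustration of what such a check could look like (only a sketch, not the code from the PR referenced in the next comment; the group name oo_masters_to_config and the variable defaults are assumptions):

# Hypothetical per-host task for the openshift_sanitize_inventory role.
- name: Ensure masters are schedulable when the web console is deployed
  fail:
    msg: >-
      The web console requires at least one schedulable master; remove
      openshift_schedulable=false from the master entries under [nodes],
      or set openshift_web_console_install=false.
  when:
    - openshift_web_console_install | default(true) | bool
    - inventory_hostname in groups.oo_masters_to_config | default([])
    - not (openshift_schedulable | default(true) | bool)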
(In reply to Scott Dodson from comment #3)
> Let's add a check to ensure that if the console is deployed, the masters are
> not set to openshift_schedulable=false. openshift_sanitize_inventory is
> likely a good place for this.

Created https://github.com/openshift/openshift-ansible/pull/6984 to address this.
The fix is available in openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.
"Taint master nodes" task is not merged into openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch yet.
After going through the code, it seems the PR would introduce some other issues:

1. The service catalog would have no available node to deploy on. By default, the installer labels the first master node with "openshift-infra=apiserver"; once the taint is added to all masters, the service catalog daemonset would fail to deploy its pod.

2. By default, the installer deploys logging fluentd via a daemonset on all nodes, including the masters; once the taint is added to all master nodes, no fluentd pod runs on the masters, so logging cannot collect logs from them.
(In reply to Johnny Liu from comment #7)
> After going through the code, it seems the PR would introduce some other
> issues:
>
> 1. The service catalog would have no available node to deploy on. By
> default, the installer labels the first master node with
> "openshift-infra=apiserver"; once the taint is added to all masters, the
> service catalog daemonset would fail to deploy its pod.
>
> 2. By default, the installer deploys logging fluentd via a daemonset on all
> nodes, including the masters; once the taint is added to all master nodes,
> no fluentd pod runs on the masters, so logging cannot collect logs from
> them.

Good points; these will be discussed. It sounds like the service catalog and logging templates should add tolerations too.

(In reply to Johnny Liu from comment #6)
> The "Taint master nodes" task is not merged into
> openshift-ansible-3.9.0-0.36.0.git.0.da68f13.el7.noarch yet.

Correct, tainting masters is still being discussed and is out of scope for this issue.
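If masters do end up tainted along those lines, daemonset workloads such as the service catalog apiserver and logging fluentd would need a matching toleration. A minimal pod-spec fragment, assuming a node-role.kubernetes.io/master:NoSchedule taint (an assumption, since the final taint scheme was still under discussion):

# Hypothetical DaemonSet pod template fragment; the taint key/effect are assumptions.
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule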
@Vadim Rutkovsky

The upgrade hit the issue related to the schedulable master: some app pods are scheduled on the master node after the upgrade. I think this is not an expected result.

# oc get node
NAME                               STATUS    ROLES     AGE       VERSION
qe-jliu-r-master-etcd-1            Ready     master    2h        v1.9.1+a0ce1bc657
qe-jliu-r-node-registry-router-1   Ready     <none>    2h        v1.9.1+a0ce1bc657

# oc get pod -o wide --all-namespaces |grep master
default                 registry-console-2-hlgln         1/1   Running   0   1h   10.129.0.4    qe-jliu-r-master-etcd-1
install-test            mongodb-1-psr9l                  1/1   Running   0   1h   10.129.0.5    qe-jliu-r-master-etcd-1
install-test            nodejs-mongodb-example-1-k56zh   1/1   Running   0   1h   10.129.0.18   qe-jliu-r-master-etcd-1
openshift-web-console   webconsole-54877f6577-g7tb8      1/1   Running   0   1h   10.129.0.2    qe-jliu-r-master-etcd-1
test                    mysql-1-ptblc                    1/1   Running   0   1h   10.129.0.19   qe-jliu-r-master-etcd-1

Not sure if the issue is in the scope of this bug, or should I track it in a new bug?
(In reply to liujia from comment #9)
> @Vadim Rutkovsky
> The upgrade hit the issue related to the schedulable master: some app pods
> are scheduled on the master node after the upgrade. I think this is not an
> expected result.

Right, that's certainly not expected.

> Not sure if the issue is in the scope of this bug, or should I track it in a
> new bug?

Let's file a new bug for this (and move this one to VERIFIED), as it gets pretty complex to track here. The new bug should be a blocker for 3.9.
Verified this bug with openshift-ansible-3.9.1-1.git.0.9862628.el7.noarch, and it PASSED. The master is now schedulable, and the web console is deployed successfully.