Description of problem:

With a standard OCP 3.9 install onto a new RHEL VM using "atomic-openshift-installer install" with a combined master+node, create a new project and attempt to deploy a pod to that project. The pod gets stuck in Pending, and the master logs state that the pod cannot be scheduled.

Version-Release number of selected component (if applicable):

How reproducible:
Has occurred on my only install and also reported by Gary Lamperillo

Steps to Reproduce:
1. Install clean OCP 3.9
2. oc new-project demo
3. Deploy any image to the demo namespace

Actual results:
Pod gets stuck in Pending, and the master logs state that the pod cannot be scheduled.

Expected results:
Pod is scheduled and created.

Additional info:
Pods can be deployed to the default project but not to any project created post-install. Comparing the default project and a test project, the former contains the following annotation:

    openshift.io/node-selector: ""

The latter project does not. When I add the missing annotation to the new project via 'oc edit project test', the pods are scheduled and created as expected.

-----------------------------------------------------------
[root@ocp3x-master ~]# oc version
oc v3.9.14
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ocp3x-master.example.com:8443
openshift v3.9.14
kubernetes v1.9.1+a0ce1bc657
-------------------------------------------------------------
[root@ocp3x-master ~]# yum info atomic-openshift-utils
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
Name        : atomic-openshift-utils
Arch        : noarch
Version     : 3.9.14
Release     : 1.git.3.c62bc34.el7
Size        : 151 k
Repo        : installed
From repo   : rhel-7-server-ose-3.9-rpms
Summary     : Atomic OpenShift Utilities
URL         : https://github.com/openshift/openshift-ansible
License     : ASL 2.0
Description : Atomic OpenShift Utilities includes
            : - atomic-openshift-installer
            : - other utilities
------------------------------------------------------------------
[root@ocp3x-master ~]# cat .config/openshift/installer.cfg.yml
ansible_callback_facts_yaml: /root/.config/openshift/.ansible/callback_facts.yaml
ansible_inventory_path: /root/.config/openshift/hosts
ansible_log_path: /tmp/ansible.log
deployment:
  ansible_ssh_user: root
  hosts:
  - connect_to: ocp3x-master.example.com
    hostname: ocp3x-master.example.com
    ip: 192.168.122.3
    node_labels: '{''region'': ''infra''}'
    public_hostname: ocp3x-master.example.com
    public_ip: 192.168.122.3
    roles:
    - master
    - etcd
    - node
    - storage
  master_routingconfig_subdomain: example.com
  openshift_disable_check: memory_availability,disk_availability,docker_storage
  openshift_enable_service_catalog: 'False'
  openshift_master_cluster_hostname: None
  openshift_master_cluster_public_hostname: None
  proxy_exclude_hosts: ''
  proxy_http: ''
  proxy_https: ''
  roles:
    etcd: {}
    master: {}
    node: {}
    storage: {}
variant: openshift-enterprise
variant_version: '3.9'
version: v2
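For reference, the manual 'oc edit project' change above can also be applied non-interactively. This is a sketch of the same workaround, assuming a new project named 'demo' as in the steps above; the empty project-level node selector overrides the cluster-wide default selector so the pod can land on the combined master+node:

    # Verify the annotation is missing on the new project
    oc get namespace demo -o yaml | grep node-selector

    # Add the empty node selector, equivalent to the manual edit described above
    oc annotate namespace demo openshift.io/node-selector=""

Once the annotation is in place, re-deploying should schedule the pod normally.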
This is happening because we only label non-master, non-infra nodes with node-role.kubernetes.io/compute=true and then set the default node selector to match that label. In the special case where all nodes are masters or all nodes are infra nodes, that's not going to work. I think we should special-case those and add 'node-role.kubernetes.io/compute=true' to the list of labels the installer sets.
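Until that installer change lands, a manual workaround consistent with this analysis (a sketch, assuming the single combined host from the report above) is to apply the compute label to the master/infra node yourself so the default node selector can match it:

    # Label the combined master+node so it satisfies the default
    # node selector (node-role.kubernetes.io/compute=true)
    oc label node ocp3x-master.example.com node-role.kubernetes.io/compute=true

Either this node label or the per-project empty node-selector annotation described in comment #0 should unblock scheduling.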
Just to be clear, this is only happening because of the default atomic-openshift-installer behavior. This is not something production environments would ever hit.
(In reply to Brenton Leanhardt from comment #2)
> Just to be clear, this is only happening because of the default
> atomic-openshift-installer behavior. This is not something production
> environments would ever hit.

Will it be fixed in OCP 3.9? Right now, a single Master/Node VM install is not an option.
Bug 1567028 has the same root cause; marking this as a dupe of that one, as that bug and its 3.10 counterpart already have a PR in progress.

*** This bug has been marked as a duplicate of bug 1567028 ***