Bug 1274239 - Old pods can't be deleted after changing the project defaultNodeSelector
Status: CLOSED CURRENTRELEASE
Product: OpenShift Container Platform
Classification: Red Hat
Component: Kubernetes
Version: 3.0.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Andy Goldstein
QA Contact: Jianwei Hou
 
Reported: 2015-10-22 06:42 EDT by Anping Li
Modified: 2015-11-23 09:26 EST (History)
Fixed In Version: atomic-openshift-3.0.2.905-0.git.0.85d6f88.el7aos
Doc Type: Bug Fix
Last Closed: 2015-11-23 09:26:08 EST
Type: Bug

Description Anping Li 2015-10-22 06:42:15 EDT
Description of problem:


Version-Release number of selected component (if applicable):
openshift version
openshift v3.0.2.902
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
always

Steps to Reproduce:
After changing the default project's node selector region from infra to primary, the old router/registry pods remain in Terminating status and can't be deleted.
The new router pod can't be deployed to the new region, so the router is out of service.

Steps:
1. The default project's defaultNodeSelector is 'region=infra':
   "annotations": {
                   "openshift.io/node-selector": "region=infra",
    <----skip----> }

2. Create the docker-registry and router.
3. Change the default project's defaultNodeSelector to 'region=primary'.
4. Delete the docker-registry and router pods.
5. Check the pod status and /var/log/messages.
6. Run oc delete all --all and create the docker-registry and router again. The old pods still can't be deleted and the router can't be deployed.


Actual results:

1) The old pods stay in Terminating status and the router deployment fails:
[root@openshift-141 ~]# oc get pods
NAME                      READY     STATUS        RESTARTS   AGE
docker-registry-1-jo0js   1/1       Terminating   0          1h
docker-registry-1-q7mz4   1/1       Running       0          7m
router-1-deploy           0/1       Error         0          6m
router-1-wrmry            1/1       Terminating   0          1h
[root@openshift-141 ~]# 

2) Errors were reported in /var/log/messages (Puddle AtomicOpenShift/3.1/2015-10-21.4):

Oct 22 17:01:30 openshift-141 docker: time="2015-10-22T17:01:30.327125391+08:00" level=info msg="GET /containers/json?all=1"
Oct 22 17:01:30 openshift-141 docker: time="2015-10-22T17:01:30.327352834+08:00" level=info msg="GET /containers/json?all=1"
Oct 22 17:01:30 openshift-141 docker: time="2015-10-22T17:01:30.327518611+08:00" level=info msg="GET /containers/json"
Oct 22 17:01:30 openshift-141 atomic-openshift-node: W1022 17:01:30.336541    2299 manager.go:108] Failed to updated pod status: error updating status for pod "router-1-wrmry_default": pods "router-1-wrmry" is forbidden: pod node label selector conflicts with its project node label selector
Oct 22 17:01:30 openshift-141 atomic-openshift-node: W1022 17:01:30.343719    2299 manager.go:108] Failed to updated pod status: error updating status for pod "docker-registry-1-jo0js_default": pods "docker-registry-1-jo0js" is forbidden: pod node label selector conflicts with its project node label selector
Oct 22 17:01:30 openshift-141 atomic-openshift-node: W1022 17:01:30.350993    2299 manager.go:108] Failed to updated pod status: error updating status for pod "router-1-wrmry_default": pods "router-1-wrmry" is forbidden: pod node label selector conflicts with its project node label selector
Oct 22 17:01:30 openshift-141 atomic-openshift-node: W1022 17:01:30.359496    2299 manager.go:108] Failed to updated pod status: error updating status for pod "docker-registry-1-jo0js_default": pods "docker-registry-1-jo0js" is forbidden: pod node label selector conflicts with its project node label selector
Oct 22 17:01:30 openshift-141 docker: time="2015-10-22T17:01:30.426992898+08:00" level=info msg="GET /containers/json


Expected results:
The new router pod can be created and the old pods can be deleted.
Comment 1 Andy Goldstein 2015-10-29 11:52:38 EDT
This doesn't appear to be a router issue. Reassigning to cluster infra team.
Comment 2 Andy Goldstein 2015-10-29 12:19:38 EDT
The only issue I see here is the following:

1. label a node with region=infra
2. set project's default node selector to region=infra
3. create a pod with node selector region=infra
4. change project's default node selector to region=primary
5. delete pod

At this point, the pod stays in Terminating, and the error message is: Failed to updated pod status: error updating status for pod "docker-registry-1-jo0js_default": pods "docker-registry-1-jo0js" is forbidden: pod node label selector conflicts with its project node label selector.

I am able to create a new pod by setting its node selector to region=primary (as long as there is a node with that label).

I will look into the deletion issue.
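
For illustration only, here is a minimal Go sketch of the kind of selector comparison that would produce the "conflicts with its project node label selector" error. It is not the actual OpenShift code: parseSelector and conflicts are hypothetical helpers, and the rule assumed here is that a conflict exists when the pod and the project give the same label key different values.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSelector is a hypothetical helper that turns "k1=v1,k2=v2"
// into a map. It only handles simple equality terms.
func parseSelector(s string) (map[string]string, error) {
	out := map[string]string{}
	if strings.TrimSpace(s) == "" {
		return out, nil
	}
	for _, part := range strings.Split(s, ",") {
		kv := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(kv) != 2 || kv[0] == "" {
			return nil, fmt.Errorf("invalid selector term %q", part)
		}
		out[kv[0]] = kv[1]
	}
	return out, nil
}

// conflicts reports whether the pod selector and the project selector
// assign different values to the same label key (assumed rule).
func conflicts(pod, project map[string]string) bool {
	for k, v := range pod {
		if pv, ok := project[k]; ok && pv != v {
			return true
		}
	}
	return false
}

func main() {
	project, _ := parseSelector("region=primary")
	oldPod, _ := parseSelector("region=infra")
	newPod, _ := parseSelector("region=primary")

	fmt.Println(conflicts(oldPod, project)) // true
	fmt.Println(conflicts(newPod, project)) // false
}
```

Under that assumption, the old pod (region=infra) conflicts with the changed project selector (region=primary), while a new pod created with region=primary does not, which matches the behavior described above.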
Comment 3 Andy Goldstein 2015-10-29 14:22:40 EDT
https://github.com/openshift/origin/pull/5500
Comment 4 Andy Goldstein 2015-10-29 14:24:35 EDT
PR is in the merge queue.
Comment 5 openshift-github-bot 2015-10-30 00:14:27 EDT
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/fe6e529d591478d712c0c361aeabe4691b1f44e4
Only run pod nodeenv admission on create

Only run pod nodeenv admission on create. Don't run it on update. Fixes
the following scenario:

1. label a node with region=infra
2. set project's default node selector to region=infra
3. create a pod with node selector region=infra
4. change project's default node selector to region=primary
5. try to delete pod

Without this fix, the nodeenv admission plugin will reject a pod update
with this error:

    Failed to updated pod status: error updating status for pod
    "docker-registry-1-jo0js_default": pods "docker-registry-1-jo0js" is
    forbidden: pod node label selector conflicts with its project node
    label selector.

The end result is the pod remains in the Terminating phase, instead of
being deleted.

Fixes bug 1274239
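
The idea behind the commit can be sketched roughly as follows. This is a simplified stand-in, not the code from the PR: the operation type, the opCreate/opUpdate constants, and admitPod are hypothetical names, and the conflict rule (same key, different value) is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// operation is a hypothetical stand-in for the admission operation type.
type operation string

const (
	opCreate operation = "CREATE"
	opUpdate operation = "UPDATE"
)

var errSelectorConflict = errors.New(
	"pod node label selector conflicts with its project node label selector")

// admitPod sketches a create-only admission check: the selector
// comparison runs on create and is skipped for every other operation,
// so status updates on existing pods are not rejected.
func admitPod(op operation, podSel, projectSel map[string]string) error {
	if op != opCreate {
		return nil
	}
	for k, v := range podSel {
		if pv, ok := projectSel[k]; ok && pv != v {
			return errSelectorConflict
		}
	}
	return nil
}

func main() {
	pod := map[string]string{"region": "infra"}
	project := map[string]string{"region": "primary"}

	fmt.Println(admitPod(opCreate, pod, project)) // rejected with the conflict error
	fmt.Println(admitPod(opUpdate, pod, project)) // <nil>
}
```

With the check skipped on update, the node can still write the status of an existing pod whose selector no longer matches the project default, so the pod can finish terminating.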
Comment 6 Anping Li 2015-10-30 00:56:39 EDT
The bug was opened against Enterprise. Was the fix merged into OSE 3.1?
Comment 7 Andy Goldstein 2015-10-30 06:52:51 EDT
It's not in OSE yet. I'll put together a PR for that.
Comment 8 Andy Goldstein 2015-10-30 09:14:11 EDT
Moving back to modified until this is in the next OSE build. Scott will update when that happens.
Comment 10 Anping Li 2015-11-02 01:01:59 EST
The pod can be deleted after changing the default selector, so moving to Verified.
Comment 11 Brenton Leanhardt 2015-11-23 09:26:08 EST
This fix is available in OpenShift Enterprise 3.1.
