Bug 1274239 - Old pods can't be deleted after changing the project defaultNodeSelector
Summary: Old pods can't be deleted after changing the project defaultNodeSelector
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.0.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Andy Goldstein
QA Contact: Jianwei Hou
 
Reported: 2015-10-22 10:42 UTC by Anping Li
Modified: 2015-11-23 14:26 UTC (History)

Fixed In Version: atomic-openshift-3.0.2.905-0.git.0.85d6f88.el7aos
Doc Type: Bug Fix
Last Closed: 2015-11-23 14:26:08 UTC



Description Anping Li 2015-10-22 10:42:15 UTC
Description of problem:


Version-Release number of selected component (if applicable):
openshift version
openshift v3.0.2.902
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
always

Steps to Reproduce:
After changing the default project's region from infra to primary, the old router/registry pods are stuck in Terminating status and can't be deleted.
The new router pod can't be deployed to the new region, so the router is out of service.

Steps:
1. Set the default project's defaultNodeSelector to 'region=infra':
   "annotations": {
                   "openshift.io/node-selector": "region=infra",
    <----skip----> }

2. Create the docker-registry and router.
3. Change the default project's defaultNodeSelector to 'region=primary'.
4. Delete the docker-registry and router pods.
5. Check the pod status and /var/log/messages.
6. Run `oc delete all --all` and create the docker-registry and router again. These pods still can't be deleted, and the router can't be deployed.
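For reference, step 3 amounts to updating the project's node-selector annotation to the new region (values taken from the report; only the changed field is shown):

```json
"annotations": {
    "openshift.io/node-selector": "region=primary"
}
```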


Actual results:

[root@openshift-141 ~]# oc get pods
NAME                      READY     STATUS        RESTARTS   AGE
docker-registry-1-jo0js   1/1       Terminating   0          1h
docker-registry-1-q7mz4   1/1       Running       0          7m
router-1-deploy           0/1       Error         0          6m
router-1-wrmry            1/1       Terminating   0          1h
[root@openshift-141 ~]# 

2) Errors were reported in /var/log/messages (puddle AtomicOpenShift/3.1/2015-10-21.4):

Oct 22 17:01:30 openshift-141 docker: time="2015-10-22T17:01:30.327125391+08:00" level=info msg="GET /containers/json?all=1"
Oct 22 17:01:30 openshift-141 docker: time="2015-10-22T17:01:30.327352834+08:00" level=info msg="GET /containers/json?all=1"
Oct 22 17:01:30 openshift-141 docker: time="2015-10-22T17:01:30.327518611+08:00" level=info msg="GET /containers/json"
Oct 22 17:01:30 openshift-141 atomic-openshift-node: W1022 17:01:30.336541    2299 manager.go:108] Failed to updated pod status: error updating status for pod "router-1-wrmry_default": pods "router-1-wrmry" is forbidden: pod node label selector conflicts with its project node label selector
Oct 22 17:01:30 openshift-141 atomic-openshift-node: W1022 17:01:30.343719    2299 manager.go:108] Failed to updated pod status: error updating status for pod "docker-registry-1-jo0js_default": pods "docker-registry-1-jo0js" is forbidden: pod node label selector conflicts with its project node label selector
Oct 22 17:01:30 openshift-141 atomic-openshift-node: W1022 17:01:30.350993    2299 manager.go:108] Failed to updated pod status: error updating status for pod "router-1-wrmry_default": pods "router-1-wrmry" is forbidden: pod node label selector conflicts with its project node label selector
Oct 22 17:01:30 openshift-141 atomic-openshift-node: W1022 17:01:30.359496    2299 manager.go:108] Failed to updated pod status: error updating status for pod "docker-registry-1-jo0js_default": pods "docker-registry-1-jo0js" is forbidden: pod node label selector conflicts with its project node label selector
Oct 22 17:01:30 openshift-141 docker: time="2015-10-22T17:01:30.426992898+08:00" level=info msg="GET /containers/json


Expected results:
The new router pod can be created, and the old pods can be deleted.

Comment 1 Andy Goldstein 2015-10-29 15:52:38 UTC
This doesn't appear to be a router issue. Reassigning to cluster infra team.

Comment 2 Andy Goldstein 2015-10-29 16:19:38 UTC
The only issue I see here is the following:

1. label a node with region=infra
2. set project's default node selector to region=infra
3. create a pod with node selector region=infra
4. change project's default node selector to region=primary
5. delete pod

At this point, the pod stays in terminating, and the error message is Failed to updated pod status: error updating status for pod "docker-registry-1-jo0js_default": pods "docker-registry-1-jo0js" is forbidden: pod node label selector conflicts with its project node label selector.

I am able to create a new pod by setting its node selector to region=primary (as long as there is a node with that label).

I will look into the deletion issue.

Comment 3 Andy Goldstein 2015-10-29 18:22:40 UTC
https://github.com/openshift/origin/pull/5500

Comment 4 Andy Goldstein 2015-10-29 18:24:35 UTC
PR is in the merge queue.

Comment 5 openshift-github-bot 2015-10-30 04:14:27 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/fe6e529d591478d712c0c361aeabe4691b1f44e4
Only run pod nodeenv admission on create

Only run pod nodeenv admission on create. Don't run it on update. Fixes
the following scenario:

1. label a node with region=infra
2. set project's default node selector to region=infra
3. create a pod with node selector region=infra
4. change project's default node selector to region=primary
5. try to delete pod

Without this fix, the nodeenv admission plugin will reject a pod update
with this error:

    Failed to updated pod status: error updating status for pod
    "docker-registry-1-jo0js_default": pods "docker-registry-1-jo0js" is
    forbidden: pod node label selector conflicts with its project node
    label selector.

The end result is the pod remains in the Terminating phase, instead of
being deleted.

Fixes bug 1274239
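The create-only admission behavior described in this commit can be sketched as follows. This is a simplified, illustrative model: the Pod and Operation types, the admit function, and the selector-matching logic are assumptions for the sketch, not the actual origin nodeenv plugin code.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a minimal stand-in for the real pod object; only the fields
// relevant to the node-selector check are modeled.
type Pod struct {
	Name         string
	NodeSelector map[string]string
}

// Operation distinguishes admission on create from admission on update.
type Operation string

const (
	Create Operation = "CREATE"
	Update Operation = "UPDATE"
)

// admit mirrors the fixed behavior: the project node selector is only
// enforced when a pod is created. Updates (such as the kubelet's status
// updates during deletion) pass through untouched, so Terminating pods
// can finish deleting even after the project selector changes.
func admit(op Operation, pod Pod, projectSelector map[string]string) error {
	if op != Create {
		// The fix: skip the conflict check on update.
		return nil
	}
	for k, v := range projectSelector {
		if existing, ok := pod.NodeSelector[k]; ok && existing != v {
			return errors.New("pod node label selector conflicts with its project node label selector")
		}
	}
	return nil
}

func main() {
	// The scenario from the bug: the pod was created under region=infra,
	// then the project's default selector was changed to region=primary.
	pod := Pod{Name: "router-1-wrmry", NodeSelector: map[string]string{"region": "infra"}}
	project := map[string]string{"region": "primary"}

	fmt.Println("create:", admit(Create, pod, project)) // conflict is rejected
	fmt.Println("update:", admit(Update, pod, project)) // allowed after the fix
}
```

Before the fix, the equivalent check also ran on UPDATE, so the kubelet's status update for a deleting pod was rejected with the "forbidden" error above, wedging the pod in Terminating.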

Comment 6 Anping Li 2015-10-30 04:56:39 UTC
This bug was opened against Enterprise. Was the fix merged into OSE 3.1?

Comment 7 Andy Goldstein 2015-10-30 10:52:51 UTC
It's not in OSE yet. I'll put together a PR for that.

Comment 8 Andy Goldstein 2015-10-30 13:14:11 UTC
Moving back to modified until this is in the next OSE build. Scott will update when that happens.

Comment 10 Anping Li 2015-11-02 06:01:59 UTC
The pods can be deleted after changing the default selector, so moving to Verified.

Comment 11 Brenton Leanhardt 2015-11-23 14:26:08 UTC
This fix is available in OpenShift Enterprise 3.1.

