Bug 1273799 - Failed to delete the pod after removing the pod manifest from the specified path on node
Status: CLOSED NOTABUG
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Paul Weil
QA Contact: chaoyang
Keywords: Regression
Depends On:
Blocks:
Reported: 2015-10-21 05:23 EDT by Meng Bo
Modified: 2015-10-29 08:39 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-29 08:39:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Meng Bo 2015-10-21 05:23:19 EDT
Description of problem:
Configure node-config.yaml to enable the pod manifest config:
podManifestConfig:
  Path: "/tmp/manifest"
  FileCheckIntervalSeconds: 30

Add a pod manifest to /tmp/manifest and wait 30 seconds; the pod gets created. Then remove the pod manifest from that path: the pod is not deleted unless the openshift-node service is restarted.
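
The expected behaviour can be pictured as a periodic scan of the manifest directory that compares each scan with the previous one. The sketch below only illustrates that idea and is not the kubelet's actual file source; `scan_manifests` and `diff_scans` are hypothetical names, and a temporary directory stands in for /tmp/manifest. On a node, the scan would repeat every FileCheckIntervalSeconds (30 in the config above).

```python
import os
import tempfile


def scan_manifests(path):
    """Return the set of regular file names currently present in path."""
    return {name for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))}


def diff_scans(previous, current):
    """Compare two scans and return (added, removed) file names, sorted."""
    return sorted(current - previous), sorted(previous - current)


if __name__ == "__main__":
    # Hypothetical stand-in for /tmp/manifest so the sketch is safe to run.
    manifest_dir = tempfile.mkdtemp()
    empty = scan_manifests(manifest_dir)

    # Adding a manifest should show up as an addition on the next scan.
    pod_file = os.path.join(manifest_dir, "hello-pod-2.json")
    with open(pod_file, "w") as handle:
        handle.write("{}")
    with_pod = scan_manifests(manifest_dir)
    print("after create:", diff_scans(empty, with_pod))

    # Removing it should show up as a removal, which is the event that
    # is expected to lead to the pod being deleted.
    os.remove(pod_file)
    print("after delete:", diff_scans(with_pod, scan_manifests(manifest_dir)))
```

In this model, the bug corresponds to the removal being detected but the pod surviving anyway.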

Version-Release number of selected component (if applicable):
openshift v3.0.2.901-61-g568adb6
kubernetes v1.1.0-alpha.1-653-g86b4e77


How reproducible:
always

Steps to Reproduce:
1. Modify node-config.yaml to enable podManifestConfig and restart the node.
2. Add a pod manifest to the path:
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "hello-pod-2",
    "labels": {
      "name": "hello-pod-2"
    }
  },
  "spec": {
    "containers": [{
      "name": "hello-pod-2",
      "image": "bmeng/hello-openshift",
      "securityContext": {
        "privileged": false,
        "seLinuxOptions": {
          "level": "s0:c1,c0"
        },
        "runAsUser": 1000000000
      }
    }]
  }
}

3. Remove the pod manifest after the pod has been created.
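
Steps 2 and 3 could be scripted roughly as follows. This is a sketch under assumptions: the helper names `add_manifest` and `remove_manifest` and the `<pod name>.json` file name are made up for illustration, and the demo writes to a temporary directory rather than the real /tmp/manifest path. The pod definition is the one from step 2.

```python
import json
import os
import tempfile

# Pod definition copied from step 2 of this report.
POD = {
    "kind": "Pod",
    "apiVersion": "v1",
    "metadata": {"name": "hello-pod-2", "labels": {"name": "hello-pod-2"}},
    "spec": {
        "containers": [{
            "name": "hello-pod-2",
            "image": "bmeng/hello-openshift",
            "securityContext": {
                "privileged": False,
                "seLinuxOptions": {"level": "s0:c1,c0"},
                "runAsUser": 1000000000,
            },
        }]
    },
}


def add_manifest(manifest_dir, pod):
    """Write the pod as <pod name>.json (assumed file name) and return its path."""
    path = os.path.join(manifest_dir, pod["metadata"]["name"] + ".json")
    with open(path, "w") as handle:
        json.dump(pod, handle, indent=2)
    return path


def remove_manifest(path):
    """Delete the manifest file; the node is then expected to remove the pod."""
    os.remove(path)


if __name__ == "__main__":
    # On a real node this would be the configured Path, /tmp/manifest.
    manifest_dir = tempfile.mkdtemp()
    manifest = add_manifest(manifest_dir, POD)
    print("wrote", manifest)
    # On a real node: wait at least FileCheckIntervalSeconds (30) here and
    # confirm the pod is running before removing the manifest.
    remove_manifest(manifest)
    print("removed", manifest)
```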

Actual results:
The pod is not deleted after the pod manifest is removed.

Expected results:
The pod should be removed.

Additional info:
This works correctly in the latest Origin environment.

Node logs after removing the manifest:

Oct 21 17:12:43 openshift-140.lab.eng.nay.redhat.com atomic-openshift-node[18507]: I1021 17:12:43.230593   18507 kubelet.go:1920] SyncLoop (REMOVE): "hello-pod-2-openshift-140.lab.eng.nay.redhat.com_default"
Oct 21 17:12:53 openshift-140.lab.eng.nay.redhat.com atomic-openshift-node[18507]: E1021 17:12:53.351779   18507 kubelet.go:1367] Failed creating a mirror pod "hello-pod-2-openshift-140.lab.eng.nay.redhat.com_default": pods "hello-pod-2-openshift-140.lab.eng.nay.redhat.com" already exists
Oct 21 17:13:03 openshift-140.lab.eng.nay.redhat.com atomic-openshift-node[18507]: E1021 17:13:03.366189   18507 kubelet.go:1367] Failed creating a mirror pod "hello-pod-2-openshift-140.lab.eng.nay.redhat.com_default": pods "hello-pod-2-openshift-140.lab.eng.nay.redhat.com" already exists
Oct 21 17:13:13 openshift-140.lab.eng.nay.redhat.com atomic-openshift-node[18507]: E1021 17:13:13.418657   18507 kubelet.go:1367] Failed creating a mirror pod "hello-pod-2-openshift-140.lab.eng.nay.redhat.com_default": pods "hello-pod-2-openshift-140.lab.eng.nay.redhat.com" already exists
Oct 21 17:13:23 openshift-140.lab.eng.nay.redhat.com atomic-openshift-node[18507]: E1021 17:13:23.416387   18507 kubelet.go:1367] Failed creating a mirror pod "hello-pod-2-openshift-140.lab.eng.nay.redhat.com_default": pods "hello-pod-2-openshift-140.lab.eng.nay.redhat.com" already exists
Oct 21 17:13:33 openshift-140.lab.eng.nay.redhat.com atomic-openshift-node[18507]: E1021 17:13:33.491462   18507 kubelet.go:1367] Failed creating a mirror pod "hello-pod-2-openshift-140.lab.eng.nay.redhat.com_default": pods "hello-pod-2-openshift-140.lab.eng.nay.redhat.com" already exists
Oct 21 17:13:43 openshift-140.lab.eng.nay.redhat.com atomic-openshift-node[18507]: E1021 17:13:43.452702   18507 kubelet.go:1367] Failed creating a mirror pod "hello-pod-2-openshift-140.lab.eng.nay.redhat.com_default": pods "hello-pod-2-openshift-140.lab.eng.nay.redhat.com" already exists
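
To check how often the mirror pod error recurs, the log lines above can be parsed with a short script. This is a minimal sketch: the regular expression is written against the exact message format shown in this report and may not match other kubelet versions, and `parse_mirror_failure` and `retry_gaps` are hypothetical helper names. The pod identifier in the log is treated as `<pod name>_<namespace>`, split at the last underscore.

```python
import re

# Matches the kubelet error line as it appears in the node log above.
MIRROR_ERR = re.compile(
    r'E\d{4} (\d{2}):(\d{2}):(\d{2})\.\d+\s+\d+ kubelet\.go:\d+\] '
    r'Failed creating a mirror pod "([^"]+)"'
)


def parse_mirror_failure(line):
    """Return (second_of_day, pod_name, namespace), or None if no match."""
    match = MIRROR_ERR.search(line)
    if match is None:
        return None
    hours, minutes, seconds, full_name = match.groups()
    # The logged identifier has the form <pod name>_<namespace>.
    name, _, namespace = full_name.rpartition("_")
    second_of_day = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return second_of_day, name, namespace


def retry_gaps(lines):
    """Return the gaps, in whole seconds, between consecutive failures."""
    parsed = (parse_mirror_failure(line) for line in lines)
    times = [entry[0] for entry in parsed if entry is not None]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    import sys
    # Usage: pipe the node log in on stdin.
    print(retry_gaps(sys.stdin))
```

Run against the six error lines above, the timestamps (17:12:53 through 17:13:43) are ten seconds apart, so the node keeps trying to recreate the mirror pod after the REMOVE event.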
Comment 2 Paul Weil 2015-10-28 09:20:17 EDT
Bo,

I see in the first comment you note 

"Additional info:
This works well with latest origin env."

Does this mean you cannot reproduce this in the latest build?
Comment 3 Meng Bo 2015-10-29 05:04:29 EDT
@pweil, I reported this against build openshift v3.0.2.901-61-g568adb6 on OSE. I then tried the latest Origin build, which should be version v1.0.6-9xx, and could not reproduce it there.

I also tried today's AtomicOpenShift build v3.0.2.903-114-g2849767 and could not reproduce it there either. Maybe the issue was fixed by some other code change.

But there are some log messages that look odd to me:


Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:40.562323   30873 kubelet.go:1975] SyncLoop (REMOVE, "file"): "hello-pod-openshift-155.lab.eng.nay.redhat.com_default"
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:40.569821   30873 kubelet.go:1787] Killing unwanted pod "hello-pod-openshift-155.lab.eng.nay.redhat.com"
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:40.571157   30873 manager.go:1414] Killing container "d92be618532d4ad1617d0a3c89cc94e053826697e9b4c8f630f09306bfceacef /" with 30 second grace period
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:40.601346   30873 kubelet.go:1972] SyncLoop (UPDATE, "api"): "hello-pod-openshift-155.lab.eng.nay.redhat.com_default"
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:40.601404   30873 kubelet.go:1975] SyncLoop (REMOVE, "api"): "hello-pod-openshift-155.lab.eng.nay.redhat.com_default"
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:40.633708   30873 manager.go:1446] Container "d92be618532d4ad1617d0a3c89cc94e053826697e9b4c8f630f09306bfceacef /" exited after 62.522984ms
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com ovs-vsctl[34097]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-port veth454fc07
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: E1029 16:57:40.739252   30873 manager.go:1337] Failed tearing down the infra container: exit status 1
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:40.740856   30873 manager.go:1414] Killing container "e05067660e4a6686cd5f794ef4161575c8ba244790f8ab8928e36659d2754ead /" with 30 second grace period
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:40.920967   30873 manager.go:1446] Container "e05067660e4a6686cd5f794ef4161575c8ba244790f8ab8928e36659d2754ead /" exited after 180.064709ms
Oct 29 16:57:40 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: E1029 16:57:40.921050   30873 kubelet.go:1790] Failed killing the pod "hello-pod-openshift-155.lab.eng.nay.redhat.com": failed to delete containers ([exit status 1])
Oct 29 16:57:42 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:42.644486   30873 helpers.go:96] Unable to get network stats from pid 33944: couldn't read network stats: failure opening /proc/33944/net/dev: open /proc/33944/net/dev: no such file or directory
Oct 29 16:57:43 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:43.644514   30873 helpers.go:96] Unable to get network stats from pid 33944: couldn't read network stats: failure opening /proc/33944/net/dev: open /proc/33944/net/dev: no such file or directory
Oct 29 16:57:45 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:45.644491   30873 helpers.go:96] Unable to get network stats from pid 33944: couldn't read network stats: failure opening /proc/33944/net/dev: open /proc/33944/net/dev: no such file or directory
Oct 29 16:57:49 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:49.644616   30873 helpers.go:96] Unable to get network stats from pid 33944: couldn't read network stats: failure opening /proc/33944/net/dev: open /proc/33944/net/dev: no such file or directory
Oct 29 16:57:54 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:54.117127   30873 helpers.go:96] Unable to get network stats from pid 33873: couldn't read network stats: failure opening /proc/33873/net/dev: open /proc/33873/net/dev: no such file or directory
Oct 29 16:57:55 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:55.117102   30873 helpers.go:96] Unable to get network stats from pid 33873: couldn't read network stats: failure opening /proc/33873/net/dev: open /proc/33873/net/dev: no such file or directory
Oct 29 16:57:57 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:57.117114   30873 helpers.go:96] Unable to get network stats from pid 33873: couldn't read network stats: failure opening /proc/33873/net/dev: open /proc/33873/net/dev: no such file or directory
Oct 29 16:57:57 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:57:57.644490   30873 helpers.go:96] Unable to get network stats from pid 33944: couldn't read network stats: failure opening /proc/33944/net/dev: open /proc/33944/net/dev: no such file or directory
Oct 29 16:58:01 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:58:01.117104   30873 helpers.go:96] Unable to get network stats from pid 33873: couldn't read network stats: failure opening /proc/33873/net/dev: open /proc/33873/net/dev: no such file or directory
Oct 29 16:58:09 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:58:09.117138   30873 helpers.go:96] Unable to get network stats from pid 33873: couldn't read network stats: failure opening /proc/33873/net/dev: open /proc/33873/net/dev: no such file or directory
Oct 29 16:58:12 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:58:12.644576   30873 helpers.go:96] Unable to get network stats from pid 33944: couldn't read network stats: failure opening /proc/33944/net/dev: open /proc/33944/net/dev: no such file or directory
Oct 29 16:58:13 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[30873]: I1029 16:58:13.644460   30873 helpers.go:96] Unable to get network stats from pid 33944: couldn't read network stats: failure opening /proc/33944/net/dev: open /proc/33944/net/dev: no such file or directory
Comment 4 Paul Weil 2015-10-29 08:39:52 EDT
Thanks Bo. I will close this issue then. The log messages you're showing look like they come from the upstream cadvisor code. Jimmi Dyson would be the correct person to help with that. Please open a new issue if you're still experiencing the /proc errors.
