Bug 1476433
| Field | Value |
|---|---|
| Summary | [free-stg] repeated invocations of oadm drain unable to drain node |
| Product | OpenShift Container Platform |
| Component | Containers |
| Version | 3.6.0 |
| Status | CLOSED DUPLICATE |
| Severity | urgent |
| Priority | unspecified |
| Reporter | Justin Pierce <jupierce> |
| Assignee | Antonio Murdaca <amurdaca> |
| QA Contact | DeShuai Ma <dma> |
| CC | aos-bugs, eparis, jhonce, jligon, jokerman, jupierce, mmccomas, mwoodson, sdodson, sjenning |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
| Last Closed | 2017-08-01 15:19:34 UTC |
| Bug Depends On | 1460729 |
| Attachments | Backtrace in readable form (attachment 1307158) |
Description (Justin Pierce, 2017-07-29 01:55:16 UTC)
Similar to https://bugzilla.redhat.com/show_bug.cgi?id=1473777, but killing the process and retrying does not clear the issue. Description of the pods still associated with the node:
[root@free-stg-master-03fb6 ~]# oc get pods --all-namespaces | grep -e oso-clamd-57h3b -e mongodb-8-jj7xp -e database-2-5fcz6
management-infra oso-clamd-57h3b 1/2 Error 1153 32d
xyz mongodb-8-jj7xp 0/1 Terminating 0 8d
yasun-1-1 database-2-5fcz6 1/1 Terminating 0 8d
[root@free-stg-master-03fb6 ~]# oc describe -n management-infra oso-clamd-57h3b
the server doesn't have a resource type "oso-clamd-57h3b"
[root@free-stg-master-03fb6 ~]# oc describe -n management-infra pod oso-clamd-57h3b
Name: oso-clamd-57h3b
Namespace: management-infra
Security Policy: privileged
Node: ip-172-31-75-193.us-east-2.compute.internal/172.31.75.193
Start Time: Mon, 26 Jun 2017 15:36:50 +0000
Labels: name=oso-clamd
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"management-infra","name":"oso-clamd","uid":"1e9f6f94-57b4-11e7-a29c-0203ad7dfcd7",...
openshift.io/scc=privileged
Status: Running
IP: 10.128.5.52
Controllers: DaemonSet/oso-clamd
Containers:
oso-clamd:
Container ID: docker://dce4c8ac7c293079496c939794019b760df86246e9f3e015970e2a4e5190ac72
Image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest
Image ID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd@sha256:90ba009c48d28494aac15240f41d311a5384db3e97649f5eeffc1cefb055b9e6
Port:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 29 Jul 2017 01:59:17 +0000
Finished: Sat, 29 Jul 2017 01:59:26 +0000
Ready: False
Restart Count: 1141
Environment:
OO_PAUSE_ON_START: false
Mounts:
/var/lib/clamav from clamsigs (ro)
/var/run/clamd.scan from clamsock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from clamd-deployer-token-dwzg1 (ro)
oso-clamd-update:
Container ID: docker://631352cbb39726531fab65c7b5621a513d570e47035a4a0ba8e049f8d730b713
Image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update:latest
Image ID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd-update@sha256:8bf75d87ffe3b31cd9c591590d17bd6a8d29855a5e796a1cb1a8c374fdad2ea9
Port:
State: Running
Started: Mon, 24 Jul 2017 21:35:45 +0000
Last State: Terminated
Reason: Error
Message: Error on reading termination log /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2: open /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2: no such file or directory
Exit Code: 1
Started: Wed, 19 Jul 2017 23:14:02 +0000
Finished: Mon, 24 Jul 2017 20:22:48 +0000
Ready: True
Restart Count: 12
Environment:
OO_PAUSE_ON_START: false
Mounts:
/token from token (rw)
/usr/bin/oc from usr-bin-oc (rw)
/var/lib/clamav from clamsigs (rw)
/var/run/clamd.scan from clamsock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from clamd-deployer-token-dwzg1 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
clamsock:
Type: HostPath (bare host directory volume)
Path: /var/run/clamd.scan
clamsigs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
token:
Type: Secret (a volume populated by a Secret)
SecretName: clamd-image-watcher-token
Optional: false
usr-bin-oc:
Type: HostPath (bare host directory volume)
Path: /usr/bin/oc
clamd-deployer-token-dwzg1:
Type: Secret (a volume populated by a Secret)
SecretName: clamd-deployer-token-dwzg1
Optional: false
QoS Class: BestEffort
Node-Selectors: type=compute
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3d 1h 19107 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{oso-clamd} Warning BackOff Back-off restarting failed container
3d 37m 847 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{oso-clamd} Normal Pulling pulling image "172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest"
3d 2m 19475 kubelet, ip-172-31-75-193.us-east-2.compute.internal Warning FailedSync Error syncing pod
[root@free-stg-master-03fb6 ~]# oc describe pod mongodb-8-jj7xp -n xyz
Name: mongodb-8-jj7xp
Namespace: xyz
Security Policy: restricted
Node: ip-172-31-75-193.us-east-2.compute.internal/172.31.75.193
Start Time: Thu, 20 Jul 2017 19:23:48 +0000
Labels: deployment=mongodb-8
deploymentconfig=mongodb
name=mongodb
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"xyz","name":"mongodb-8","uid":"3efa6e37-6cd6-11e7-bb9e-02306c0cdc4b","...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container mongodb; cpu limit for container mongodb
openshift.io/deployment-config.latest-version=8
openshift.io/deployment-config.name=mongodb
openshift.io/deployment.name=mongodb-8
openshift.io/scc=restricted
Status: Terminating (expires Fri, 28 Jul 2017 21:24:55 +0000)
Termination Grace Period: 30s
IP:
Controllers: ReplicationController/mongodb-8
Containers:
mongodb:
Container ID: docker://ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d
Image: registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
Image ID: docker-pullable://registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
Port: 27017/TCP
State: Running
Started: Mon, 24 Jul 2017 21:02:39 +0000
Ready: False
Restart Count: 0
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 60m
memory: 307Mi
Liveness: tcp-socket :27017 delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/sh -i -c mongo 127.0.0.1:27017/$MONGODB_DATABASE -u $MONGODB_USER -p $MONGODB_PASSWORD --eval="quit()"] delay=3s timeout=1s period=10s #success=1 #failure=3
Environment:
MONGODB_USER: <set to the key 'database-user' in secret 'nodejs-mongo-persistent'> Optional: false
MONGODB_PASSWORD: <set to the key 'database-password' in secret 'nodejs-mongo-persistent'> Optional: false
MONGODB_DATABASE: sampledb
MONGODB_ADMIN_PASSWORD: <set to the key 'database-admin-password' in secret 'nodejs-mongo-persistent'> Optional: false
Mounts:
/var/lib/mongodb/data from mongodb-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rbzg6 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
mongodb-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mongodb
ReadOnly: false
default-token-rbzg6:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-rbzg6
Optional: false
QoS Class: Burstable
Node-Selectors: type=compute
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4h 4s 519 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{mongodb} Normal Killing Killing container with id docker://mongodb:Need to kill Pod
[root@free-stg-master-03fb6 ~]# oc describe pod database-2-5fcz6 -n yasun-1-1
Name: database-2-5fcz6
Namespace: yasun-1-1
Security Policy: restricted
Node: ip-172-31-75-193.us-east-2.compute.internal/172.31.75.193
Start Time: Thu, 20 Jul 2017 19:23:48 +0000
Labels: deployment=database-2
deploymentconfig=database
name=database
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"yasun-1-1","name":"database-2","uid":"3cc07aeb-6cd6-11e7-bb9e-02306c0c...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container mysql; cpu limit for container mysql
openshift.io/deployment-config.latest-version=2
openshift.io/deployment-config.name=database
openshift.io/deployment.name=database-2
openshift.io/scc=restricted
Status: Terminating (expires Fri, 28 Jul 2017 21:24:56 +0000)
Termination Grace Period: 30s
IP:
Controllers: ReplicationController/database-2
Containers:
mysql:
Container ID: docker://5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24
Image: registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
Image ID: docker-pullable://registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
Port: 3306/TCP
State: Running
Started: Mon, 24 Jul 2017 21:08:44 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 60m
memory: 307Mi
Liveness: tcp-socket :3306 delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/sh -i -c MYSQL_PWD='aE6hTcmx' mysql -h 127.0.0.1 -u userVCI -D sampledb -e 'SELECT 1'] delay=5s timeout=1s period=10s #success=1 #failure=3
Environment:
MYSQL_USER: <set to the key 'database-user' in secret 'dancer-mysql-persistent'> Optional: false
MYSQL_PASSWORD: <set to the key 'database-password' in secret 'dancer-mysql-persistent'> Optional: false
MYSQL_DATABASE: sampledb
Mounts:
/var/lib/mysql/data from database-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qqx38 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
database-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: database
ReadOnly: false
default-token-qqx38:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qqx38
Optional: false
QoS Class: Burstable
Node-Selectors: type=compute
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4h 37s 519 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{mysql} Normal Killing Killing container with id docker://mysql:Need to kill Pod
[root@free-stg-master-03fb6 ~]#
YAML for the pods:
[root@free-stg-master-03fb6 ~]# oc get pod database-2-5fcz6 -n yasun-1-1 -o=yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/created-by: |
{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"yasun-1-1","name":"database-2","uid":"3cc07aeb-6cd6-11e7-bb9e-02306c0cdc4b","apiVersion":"v1","resourceVersion":"34780432"}}
kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
mysql; cpu limit for container mysql'
openshift.io/deployment-config.latest-version: "2"
openshift.io/deployment-config.name: database
openshift.io/deployment.name: database-2
openshift.io/scc: restricted
creationTimestamp: 2017-07-20T19:23:43Z
deletionGracePeriodSeconds: 30
deletionTimestamp: 2017-07-28T21:24:56Z
generateName: database-2-
labels:
deployment: database-2
deploymentconfig: database
name: database
name: database-2-5fcz6
namespace: yasun-1-1
ownerReferences:
- apiVersion: v1
blockOwnerDeletion: true
controller: true
kind: ReplicationController
name: database-2
uid: 3cc07aeb-6cd6-11e7-bb9e-02306c0cdc4b
resourceVersion: "36769986"
selfLink: /api/v1/namespaces/yasun-1-1/pods/database-2-5fcz6
uid: f17b1c10-6d80-11e7-91e2-02306c0cdc4b
spec:
containers:
- env:
- name: MYSQL_USER
valueFrom:
secretKeyRef:
key: database-user
name: dancer-mysql-persistent
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
key: database-password
name: dancer-mysql-persistent
- name: MYSQL_DATABASE
value: sampledb
image: registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 3306
timeoutSeconds: 1
name: mysql
ports:
- containerPort: 3306
protocol: TCP
readinessProbe:
exec:
command:
- /bin/sh
- -i
- -c
- MYSQL_PWD='aE6hTcmx' mysql -h 127.0.0.1 -u userVCI -D sampledb -e 'SELECT
1'
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: 512Mi
requests:
cpu: 60m
memory: 307Mi
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- NET_RAW
- SETGID
- SETUID
- SYS_CHROOT
privileged: false
runAsUser: 1005000000
seLinuxOptions:
level: s0:c71,c15
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/mysql/data
name: database-data
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-qqx38
readOnly: true
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: default-dockercfg-24g9f
nodeName: ip-172-31-75-193.us-east-2.compute.internal
nodeSelector:
type: compute
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1005000000
seLinuxOptions:
level: s0:c71,c15
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
volumes:
- name: database-data
persistentVolumeClaim:
claimName: database
- name: default-token-qqx38
secret:
defaultMode: 420
secretName: default-token-qqx38
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2017-07-20T19:23:48Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2017-07-25T22:22:13Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2017-07-20T19:23:43Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24
image: registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
imageID: docker-pullable://registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
lastState: {}
name: mysql
ready: true
restartCount: 0
state:
running:
startedAt: 2017-07-24T21:08:44Z
hostIP: 172.31.75.193
phase: Running
qosClass: Burstable
startTime: 2017-07-20T19:23:48Z
[root@free-stg-master-03fb6 ~]# oc get pod mongodb-8-jj7xp -n xyz -o=yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/created-by: |
{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"xyz","name":"mongodb-8","uid":"3efa6e37-6cd6-11e7-bb9e-02306c0cdc4b","apiVersion":"v1","resourceVersion":"34780694"}}
kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
mongodb; cpu limit for container mongodb'
openshift.io/deployment-config.latest-version: "8"
openshift.io/deployment-config.name: mongodb
openshift.io/deployment.name: mongodb-8
openshift.io/scc: restricted
creationTimestamp: 2017-07-20T19:23:43Z
deletionGracePeriodSeconds: 30
deletionTimestamp: 2017-07-28T21:24:55Z
generateName: mongodb-8-
labels:
deployment: mongodb-8
deploymentconfig: mongodb
name: mongodb
name: mongodb-8-jj7xp
namespace: xyz
ownerReferences:
- apiVersion: v1
blockOwnerDeletion: true
controller: true
kind: ReplicationController
name: mongodb-8
uid: 3efa6e37-6cd6-11e7-bb9e-02306c0cdc4b
resourceVersion: "36770066"
selfLink: /api/v1/namespaces/xyz/pods/mongodb-8-jj7xp
uid: f1c9f7b6-6d80-11e7-91e2-02306c0cdc4b
spec:
containers:
- env:
- name: MONGODB_USER
valueFrom:
secretKeyRef:
key: database-user
name: nodejs-mongo-persistent
- name: MONGODB_PASSWORD
valueFrom:
secretKeyRef:
key: database-password
name: nodejs-mongo-persistent
- name: MONGODB_DATABASE
value: sampledb
- name: MONGODB_ADMIN_PASSWORD
valueFrom:
secretKeyRef:
key: database-admin-password
name: nodejs-mongo-persistent
image: registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 27017
timeoutSeconds: 1
name: mongodb
ports:
- containerPort: 27017
protocol: TCP
readinessProbe:
exec:
command:
- /bin/sh
- -i
- -c
- mongo 127.0.0.1:27017/$MONGODB_DATABASE -u $MONGODB_USER -p $MONGODB_PASSWORD
--eval="quit()"
failureThreshold: 3
initialDelaySeconds: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: 512Mi
requests:
cpu: 60m
memory: 307Mi
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- NET_RAW
- SETGID
- SETUID
- SYS_CHROOT
privileged: false
runAsUser: 1000130000
seLinuxOptions:
level: s0:c11,c10
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/mongodb/data
name: mongodb-data
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-rbzg6
readOnly: true
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: default-dockercfg-2zp9s
nodeName: ip-172-31-75-193.us-east-2.compute.internal
nodeSelector:
type: compute
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1000130000
seLinuxOptions:
level: s0:c11,c10
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
volumes:
- name: mongodb-data
persistentVolumeClaim:
claimName: mongodb
- name: default-token-rbzg6
secret:
defaultMode: 420
secretName: default-token-rbzg6
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2017-07-20T19:23:48Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2017-07-25T22:22:10Z
message: 'containers with unready status: [mongodb]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2017-07-20T19:23:43Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d
image: registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
imageID: docker-pullable://registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
lastState: {}
name: mongodb
ready: false
restartCount: 0
state:
running:
startedAt: 2017-07-24T21:02:39Z
hostIP: 172.31.75.193
phase: Running
qosClass: Burstable
startTime: 2017-07-20T19:23:48Z
[root@free-stg-master-03fb6 ~]# oc get -n management-infra pod oso-clamd-57h3b -o=yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/created-by: |
{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"management-infra","name":"oso-clamd","uid":"1e9f6f94-57b4-11e7-a29c-0203ad7dfcd7","apiVersion":"extensions","resourceVersion":"22329955"}}
openshift.io/scc: privileged
creationTimestamp: 2017-06-26T15:36:49Z
generateName: oso-clamd-
labels:
name: oso-clamd
name: oso-clamd-57h3b
namespace: management-infra
ownerReferences:
- apiVersion: extensions/v1beta1
blockOwnerDeletion: true
controller: true
kind: DaemonSet
name: oso-clamd
uid: 1e9f6f94-57b4-11e7-a29c-0203ad7dfcd7
resourceVersion: "36799993"
selfLink: /api/v1/namespaces/management-infra/pods/oso-clamd-57h3b
uid: 456773b5-5a85-11e7-ba4c-02306c0cdc4b
spec:
containers:
- env:
- name: OO_PAUSE_ON_START
value: "false"
image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest
imagePullPolicy: Always
name: oso-clamd
resources: {}
securityContext:
capabilities:
drop:
- AUDIT_WRITE
- CHOWN
- DAC_OVERRIDE
- FOWNER
- FSETID
- KILL
- MKNOD
- NET_BIND_SERVICE
- NET_RAW
- SETFCAP
- SETGID
- SETPCAP
- SETUID
- SYS_CHROOT
privileged: true
runAsUser: 10000
seLinuxOptions:
level: s0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/clamav
name: clamsigs
readOnly: true
- mountPath: /var/run/clamd.scan
name: clamsock
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: clamd-deployer-token-dwzg1
readOnly: true
- env:
- name: OO_PAUSE_ON_START
value: "false"
image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update:latest
imagePullPolicy: Always
name: oso-clamd-update
resources: {}
securityContext:
privileged: false
runAsUser: 10001
seLinuxOptions:
level: s0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/clamav
name: clamsigs
- mountPath: /token
name: token
- mountPath: /usr/bin/oc
name: usr-bin-oc
- mountPath: /var/run/clamd.scan
name: clamsock
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: clamd-deployer-token-dwzg1
readOnly: true
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: clamd-deployer-dockercfg-zffvz
nodeName: ip-172-31-75-193.us-east-2.compute.internal
nodeSelector:
type: compute
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
seLinuxOptions:
level: s0
serviceAccount: clamd-deployer
serviceAccountName: clamd-deployer
terminationGracePeriodSeconds: 10
volumes:
- hostPath:
path: /var/run/clamd.scan
name: clamsock
- emptyDir: {}
name: clamsigs
- name: token
secret:
defaultMode: 420
secretName: clamd-image-watcher-token
- hostPath:
path: /usr/bin/oc
name: usr-bin-oc
- name: clamd-deployer-token-dwzg1
secret:
defaultMode: 420
secretName: clamd-deployer-token-dwzg1
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2017-06-26T15:36:50Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2017-07-29T01:59:26Z
message: 'containers with unready status: [oso-clamd]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2017-06-26T15:36:56Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://dce4c8ac7c293079496c939794019b760df86246e9f3e015970e2a4e5190ac72
image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest
imageID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd@sha256:90ba009c48d28494aac15240f41d311a5384db3e97649f5eeffc1cefb055b9e6
lastState:
terminated:
containerID: docker://dce4c8ac7c293079496c939794019b760df86246e9f3e015970e2a4e5190ac72
exitCode: 1
finishedAt: 2017-07-29T01:59:26Z
reason: Error
startedAt: 2017-07-29T01:59:17Z
name: oso-clamd
ready: false
restartCount: 1141
state:
waiting:
message: Back-off 5m0s restarting failed container=oso-clamd pod=oso-clamd-57h3b_management-infra(456773b5-5a85-11e7-ba4c-02306c0cdc4b)
reason: CrashLoopBackOff
- containerID: docker://631352cbb39726531fab65c7b5621a513d570e47035a4a0ba8e049f8d730b713
image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update:latest
imageID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd-update@sha256:8bf75d87ffe3b31cd9c591590d17bd6a8d29855a5e796a1cb1a8c374fdad2ea9
lastState:
terminated:
containerID: docker://6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419
exitCode: 1
finishedAt: 2017-07-24T20:22:48Z
message: 'Error on reading termination log /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2:
open /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2:
no such file or directory'
reason: Error
startedAt: 2017-07-19T23:14:02Z
name: oso-clamd-update
ready: true
restartCount: 12
state:
running:
startedAt: 2017-07-24T21:35:45Z
hostIP: 172.31.75.193
phase: Running
podIP: 10.128.5.52
qosClass: BestEffort
startTime: 2017-06-26T15:36:50Z
Created attachment 1307158 [details]
Backtrace in readable form
Unlike the previously referenced issues, this one does not appear to be an oadm drain issue; it is either a docker or a kubelet issue. This is yet another "pods stuck terminating" problem, which can have nearly infinite causes.

Quick triage:

database-2-5fcz6 (5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24):
Container 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24 failed to exit within 30 seconds of signal 15 - using the force
container kill failed because of 'container not found' or 'no such process': Cannot kill container 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24: rpc error: code = 2 desc = containerd: container not found

mongodb-8-jj7xp (ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d):
Container ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d failed to exit within 30 seconds of signal 15 - using the force
container kill failed because of 'container not found' or 'no such process': Cannot kill container ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d: rpc error: code = 2 desc = containerd: container not found

oso-clamd-57h3b (6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419):
Error removing mounted layer 6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419: failed to remove device 6af5352dd32d0c60a7bdf47dd956af1cd8f9175263cfae113b7cd666d89324db: Device is Busy
Handler for DELETE /v1.24/containers/6a8654f423a0 returned error: Driver devicemapper failed to remove root filesystem 6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419: failed to remove device 6af5352dd32d0c60a7bdf47dd956af1cd8f9175263cfae113b7cd666d89324db: Device is Busy

Removal of database-2-5fcz6 and mongodb-8-jj7xp is failing because containerd reports the containers don't exist; removal of oso-clamd-57h3b is failing because of a busy mount point.

The docker log is mostly a loop of the same errors over and over. There are three types:
- 34 unique device mapper ids produce recurring "Error removing mounted layer" errors.
- 48 unique container ids produce recurring "No such container" errors.
- database-2-5fcz6 and mongodb-8-jj7xp are the only two that produce recurring "Cannot kill container ... containerd: container not found" messages.

While the first two error types generate a lot of logging noise and are messy, I think the last one, "containerd: container not found", is indicative of the issue. I imagine that containerd is successfully deleting the container, but the kubelet isn't getting notified of the success and thus keeps trying to delete a container that containerd no longer knows about. Not sure how that would happen yet.

Which build of docker are you running?

$ rpm -q -a | grep docker

Thanks!

Why would you make that comment private?

# rpm -qa \*docker\*
docker-client-1.12.6-30.git97ba2c0.el7.x86_64
docker-common-1.12.6-30.git97ba2c0.el7.x86_64
docker-1.12.6-30.git97ba2c0.el7.x86_64
docker-rhel-push-plugin-1.12.6-30.git97ba2c0.el7.x86_64
atomic-openshift-docker-excluder-3.6.170-1.git.0.9eec78e.el7.noarch

I didn't notice before that 'docker ps' _does_ show the containers of which containerd has no knowledge. I'm not sure the kubelet can be blamed if docker itself is also under the impression that the container still exists. Unfortunately, the logs don't go back far enough to see what happened the first time the container stop/delete was attempted.
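The cross-checks described above can be scripted. The following is a minimal triage sketch, not part of the original report: it assumes docker 1.12 with the devicemapper storage driver, and the container and device IDs are placeholders copied from the log excerpts above.

```bash
#!/bin/bash
# Illustrative triage sketch: compare what docker believes about a stuck
# container with what is actually running, and look for holders of a
# "Device is Busy" devicemapper device. IDs are placeholders.

CONTAINER_ID=5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24
DM_DEVICE=6af5352dd32d0c60a7bdf47dd956af1cd8f9175263cfae113b7cd666d89324db

# 1) Does the docker daemon still list the container at all?
docker ps -a --no-trunc | grep "$CONTAINER_ID"

# 2) If docker reports a PID for a "running" container, verify the process
#    actually exists; a missing PID suggests docker and containerd disagree.
PID=$(docker inspect --format '{{.State.Pid}}' "$CONTAINER_ID" 2>/dev/null)
if [ -n "$PID" ] && [ "$PID" != "0" ] && [ ! -d "/proc/$PID" ]; then
    echo "docker reports PID $PID for $CONTAINER_ID, but no such process exists"
fi

# 3) For "Device is Busy" removal failures, list processes whose mount
#    namespaces still reference the devicemapper thin device.
grep -l "$DM_DEVICE" /proc/[0-9]*/mountinfo 2>/dev/null
```

If step 3 turns up a live process, that process's mount namespace is what keeps the layer pinned and the device "busy".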
Assigning to Containers to figure out why there is a disconnect between docker and containerd. Might also be a dup, but I'll let Containers decide.

Antonio, does this not look like a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1460729 ?

(In reply to Eric Paris from comment #10)
> Antonio, does this not look like a dup of
> https://bugzilla.redhat.com/show_bug.cgi?id=1460729 ?

From my POV, this bug is the same as the bug you linked previously. For reference, that bug was caused by building docker from the wrong docker and containerd commits. I referenced the commits here: https://bugzilla.redhat.com/show_bug.cgi?id=1460729#c41

Containerd, specifically, was missing this commit: https://github.com/projectatomic/containerd/commit/988be67cb9f5cf55a2591ab50ba25f4d9055b3ac

I backported that containerd commit from upstream, then aligned the docker commit to f55a11849bb4b654c511f0d9cfe7e25e833c2bed (which is part of a set of devicemapper fixes that should have gone in together, but for some reason we built from a different commit). The combination of the above, and more importantly the containerd backport, fixed this issue, as shown in https://bugzilla.redhat.com/show_bug.cgi?id=1460729#c54 (which is indeed VERIFIED now).

The commits listed in https://bugzilla.redhat.com/show_bug.cgi?id=1460729#c41 (the referenced bug) are:

docker: https://github.com/projectatomic/docker/commit/f55a11849bb4b654c511f0d9cfe7e25e833c2bed
containerd: https://github.com/projectatomic/containerd/commit/fa8fb3d455e1baf716f3131581f0ed8b07c573a6

*** This bug has been marked as a duplicate of bug 1460729 ***
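For anyone hitting the same symptom, a quick way to tell whether a node is still on an affected build is to compare the installed package NVR and the git commit the daemon reports against the fixed build referenced in bug 1460729. A minimal sketch, assuming the RHEL-packaged docker 1.12.6 shown in this report; the fixed build number itself lives in the duplicate bug and is not repeated here.

```bash
# Check which docker build and git commit a node is running before deciding
# whether it predates the containerd backport from bug 1460729.
rpm -q docker docker-common docker-client      # installed build, e.g. docker-1.12.6-30.git97ba2c0.el7
docker version | grep -i 'git commit'          # commit the client and daemon were built from
rpm -q --changelog docker | head -n 20         # recent package changelog entries
```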