Description of problem:
Following the recommended workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1473777 , I was killing oadm drain operations after they hung for an extended period.

Version-Release number of selected component (if applicable):
v3.6.172.0.1

How reproducible:
Unknown

Steps to Reproduce:
1. Repeated oadm drain operations on a cluster
2.
3.

Actual results:
oadm hangs while draining a particular node. Even if the operation is aborted with a kill of its PID, when openshift-ansible retries the operation, it hangs again.

Expected results:
oadm should always reliably succeed if there is capacity on the cluster.

Additional info:
Jenkins log: https://atomic-e2e-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/aos-cd/job/cluster%252Foperation/286/console

Look at the end of the job and search on "oadm drain ip-172-31-75-193.us-east-2.compute.internal". At least 6 attempts, each running 20 minutes or more, all failed to drain the node. The reason the oadm drain operation terminates at all is that I was running a kill on the associated PID on the master after waiting sufficient time for each operation to complete. The node never did drain.

[root@free-stg-master-03fb6 ~]# oadm manage-node ip-172-31-75-193.us-east-2.compute.internal --list-pods
Listing matched pods on node: ip-172-31-75-193.us-east-2.compute.internal
NAME               READY     STATUS        RESTARTS   AGE
oso-clamd-57h3b    1/2       Error         1150       32d
mongodb-8-jj7xp    0/1       Terminating   0          8d
database-2-5fcz6   1/1       Terminating   0          8d

Docker journal from failing node: http://file.rdu.redhat.com/~jupierce/share/free-stg-node-compute-c5a63-docker-drain-fail.log

In the log, lots of entries like:
Jul 28 22:28:23 ip-172-31-75-193.us-east-2.compute.internal dockerd-current[87955]: time="2017-07-28T22:28:23.188649580Z" level=info msg="Container 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24 failed to exit within 30 seconds of signal 15 - using the force"
Jul 28 22:28:23 ip-172-31-75-193.us-east-2.compute.internal dockerd-current[87955]: time="2017-07-28T22:28:23.189330600Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24: rpc error: code = 2 desc = containerd: container not found"
Jul 28 22:28:23 ip-172-31-75-193.us-east-2.compute.internal dockerd-current[87955]: time="2017-07-28T22:28:23.382396874Z" level=info msg="{Action=stop, LoginUID=4294967295, PID=22560}"
Jul 28 22:28:23 ip-172-31-75-193.us-east-2.compute.internal dockerd-current[87955]: time="2017-07-28T22:28:23.385072466Z" level=error msg="Handler for POST /v1.24/containers/4f4bc438133d900ce455a969f5764a3888f12ada0c4cfd7e024b81eda4b67048/stop?t=10 returned error: Container 4f4bc438133d900ce455a969f5764a3888f12ada0c4cfd7e024b81eda4b67048 is already stopped"
Jul 28 22:28:23 ip-172-31-75-193.us-east-2.compute.internal dockerd-current[87955]: time="2017-07-28T22:28:23.385133638Z" level=error msg="Handler for POST /v1.24/containers/4f4bc438133d900ce455a969f5764a3888f12ada0c4cfd7e024b81eda4b67048/stop returned error: Container 4f4bc438133d900ce455a969f5764a3888f12ada0c4cfd7e024b81eda4b67048 is already stopped"
Jul 28 22:28:23 ip-172-31-75-193.us-east-2.compute.internal dockerd-current[87955]: time="2017-07-28T22:28:23.523115223Z" level=info msg="{Action=stop, LoginUID=4294967295, PID=22560}"
Jul 28 22:28:23 ip-172-31-75-193.us-east-2.compute.internal dockerd-current[87955]: time="2017-07-28T22:28:23.524525391Z" level=warning msg="container kill failed because of 'container not found' or 'no
such process': Cannot kill container 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24: rpc error: code = 2 desc = containerd: container not found" Jul 28 22:28:32 ip-172-31-75-193.us-east-2.compute.internal dockerd-current[87955]: time="2017-07-28T22:28:32.560705254Z" level=info msg="Container ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d failed to exit within 30 seconds of signal 15 - using the force" [root@free-stg-node-compute-c5a63 ~]# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 631352cbb397 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update@sha256:8bf75d87ffe3b31cd9c591590d17bd6a8d29855a5e796a1cb1a8c374fdad2ea9 "/bin/sh -c /usr/loca" 4 days ago Up 4 days k8s_oso-clamd-update_oso-clamd-57h3b_management-infra_456773b5-5a85-11e7-ba4c-02306c0cdc4b_12 5b193c9c4c72 registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec "container-entrypoint" 4 days ago Up 4 days k8s_mysql_database-2-5fcz6_yasun-1-1_f17b1c10-6d80-11e7-91e2-02306c0cdc4b_0 ffcccf58ec0c registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece "container-entrypoint" 4 days ago Up 4 days k8s_mongodb_mongodb-8-jj7xp_xyz_f1c9f7b6-6d80-11e7-91e2-02306c0cdc4b_0 d1ebd69692fa registry.ops.openshift.com/openshift3/ose-pod:v3.6.152.0 "/usr/bin/pod" 4 days ago Up 4 days k8s_POD_oso-clamd-57h3b_management-infra_456773b5-5a85-11e7-ba4c-02306c0cdc4b_12 a74b18a71be0 registry.ops.openshift.com/ops/oso-rhel7-host-monitoring:stg "/bin/sh -c /usr/loca" 9 days ago Up 9 days oso-rhel7-host-monitoring Stack trace of oadm drain interrupted with SIGABRT: WARNING: Deleting pods with local storage: oso-clamd-57h3b; Ignoring DaemonSet-managed pods: oso-clamd-57h3b\nSIGABRT: abort\nPC=0x463551 m=0\n\ngoroutine 0 [idle]:\nruntime.futex(0xc031eb0, 0x0, 0x0, 0x0, 0xc400000000, 0x3600000035, 0x0, 0x0, 0x7ffdd4464528, 0x411f82, ...)\n\t/usr/lib/golang/src/runtime/sys_linux_amd64.s:387 +0x21\nruntime.futexsleep(0xc031eb0, 0x0, 0xffffffffffffffff)\n\t/usr/lib/golang/src/runtime/os_linux.go:45 +0x62\nruntime.notesleep(0xc031eb0)\n\t/usr/lib/golang/src/runtime/lock_futex.go:145 +0x82\nruntime.stopm()\n\t/usr/lib/golang/src/runtime/proc.go:1594 +0xad\nruntime.findrunnable(0xc420019500, 0x0)\n\t/usr/lib/golang/src/runtime/proc.go:2021 +0x228\nruntime.schedule()\n\t/usr/lib/golang/src/runtime/proc.go:2120 +0x14c\nruntime.goexit0(0xc4210f6820)\n\t/usr/lib/golang/src/runtime/proc.go:2257 +0x180\nruntime.mcall(0x7ffdd44646d0)\n\t/usr/lib/golang/src/runtime/asm_amd64.s:240 +0x5b\n\ngoroutine 1 [select, 1 minutes]:\ngithub.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).evictPods(0xc42101f970, 0xc4210f3800, 0x2, 0x2, 0xc4201df5d0, 0xe, 0xc42012b650, 0x0, 0x0)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:526 +0x44d\ngithub.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).deleteOrEvictPods(0xc42101f970, 0xc4210f3800, 0x2, 0x2, 0x0, 0x0)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:480 +0xfd\ngithub.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).deleteOrEvictPodsSimple(0xc42101f970, 0x1, 
0x0)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:255 +0x85\ngithub.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).RunDrain(0xc42101f970, 0x0, 0xc4210f9a90)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:242 +0x5b\ngithub.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.NewCmdDrain.func1(0xc4201d0fc0, 0xc4210f9a90, 0x1, 0x5)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:183 +0x81\ngithub.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc4201d0fc0, 0xc4210f99f0, 0x5, 0x5, 0xc4201d0fc0, 0xc4210f99f0)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:603 +0x439\ngithub.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc420442240, 0xc420028008, 0xc420028018, 0xc420442240)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:689 +0x367\ngithub.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(0xc420442240, 0x4, 0xc420442240)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:648 +0x2b\nmain.main()\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/cmd/openshift/openshift.go:36 +0x196\n\ngoroutine 17 [syscall, 1 minutes, locked to thread]:\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:2086 +0x1\n\ngoroutine 5 [chan receive]:\ngithub.com/openshift/origin/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0xc0305e0)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/github.com/golang/glog/glog.go:879 +0x7a\ncreated by github.com/openshift/origin/vendor/github.com/golang/glog.init.1\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/github.com/golang/glog/glog.go:410 +0x21d\n\ngoroutine 34 [syscall, 1 minutes]:\nos/signal.signal_recv(0xc42049bfa8)\n\t/usr/lib/golang/src/runtime/sigqueue.go:116 +0x157\nos/signal.loop()\n\t/usr/lib/golang/src/os/signal/signal_unix.go:22 +0x22\ncreated by os/signal.init.1\n\t/usr/lib/golang/src/os/signal/signal_unix.go:28 +0x41\n\ngoroutine 160 [select]:\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.poller.func1.1(0xc4201945a0, 0x3b9aca00, 0x7fffffffffffffff, 0xc42006c120)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:314 +0x1fd\ncreated by github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.poller.func1\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:328 +0x91\n\ngoroutine 96 [chan 
receive]:\ngithub.com/openshift/origin/vendor/github.com/coreos/etcd/pkg/logutil.(*MergeLogger).outputLoop(0xc42100b920)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:174 +0x94\ncreated by github.com/openshift/origin/vendor/github.com/coreos/etcd/pkg/logutil.NewMergeLogger\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:92 +0xd2\n\ngoroutine 110 [chan receive]:\ngithub.com/openshift/origin/vendor/github.com/coreos/etcd/pkg/logutil.(*MergeLogger).outputLoop(0xc42103e0a0)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:174 +0x94\ncreated by github.com/openshift/origin/vendor/github.com/coreos/etcd/pkg/logutil.NewMergeLogger\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:92 +0xd2\n\ngoroutine 121 [select]:\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x7c572c8, 0x12a05f200, 0x0, 0x1809301, 0xc42006c120)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:108 +0x16e\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0x7c572c8, 0x12a05f200, 0xc42006c120)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:52 +0x4d\ncreated by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/logs.InitLogs\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/logs/logs.go:50 +0xa8\n\ngoroutine 128 [IO wait]:\nnet.runtime_pollWait(0x7fe0773daf70, 0x72, 0x6)\n\t/usr/lib/golang/src/runtime/netpoll.go:160 +0x59\nnet.(*pollDesc).wait(0xc4210a8d80, 0x72, 0xc420036808, 0xc420010180)\n\t/usr/lib/golang/src/net/fd_poll_runtime.go:73 +0x38\nnet.(*pollDesc).waitRead(0xc4210a8d80, 0xa39aa80, 0xc420010180)\n\t/usr/lib/golang/src/net/fd_poll_runtime.go:78 +0x34\nnet.(*netFD).Read(0xc4210a8d20, 0xc421354000, 0x4000, 0x4000, 0x0, 0xa39aa80, 0xc420010180)\n\t/usr/lib/golang/src/net/fd_unix.go:243 +0x1a1\nnet.(*conn).Read(0xc420259540, 0xc421354000, 0x4000, 0x4000, 0x0, 0x0, 0x0)\n\t/usr/lib/golang/src/net/net.go:173 +0x70\ncrypto/tls.(*block).readFromUntil(0xc42121e300, 0xa390b00, 0xc420259540, 0x5, 0xc420259540, 0x0)\n\t/usr/lib/golang/src/crypto/tls/conn.go:476 +0x91\ncrypto/tls.(*Conn).readRecord(0xc4211e8380, 0x7c65b17, 0xc4211e8488, 0x0)\n\t/usr/lib/golang/src/crypto/tls/conn.go:578 +0xc4\ncrypto/tls.(*Conn).Read(0xc4211e8380, 0xc4210e5000, 0x1000, 0x1000, 0x0, 0x0, 0x0)\n\t/usr/lib/golang/src/crypto/tls/conn.go:1113 +0x116\nbufio.(*Reader).fill(0xc4211a8480)\n\t/usr/lib/golang/src/bufio/bufio.go:97 +0x10c\nbufio.(*Reader).Read(0xc4211a8480, 0xc420fa5d78, 0x9, 0x9, 0x0, 0x441ae0, 0x42e5cc)\n\t/usr/lib/golang/src/bufio/bufio.go:209 +0x1bc\nio.ReadAtLeast(0xa36f0c0, 0xc4211a8480, 0xc420fa5d78, 0x9, 0x9, 0x9, 0xc421aee580, 0xc420036d10, 0x1edd245)\n\t/usr/lib/golang/src/io/io.go:307 +0xa4\nio.ReadFull(0xa36f0c0, 0xc4211a8480, 0xc420fa5d78, 0x9, 0x9, 0xc421ae8de0, 0x50c7bd5, 0xa)\n\t/usr/lib/golang/src/io/io.go:325 
+0x58\ngithub.com/openshift/origin/vendor/golang.org/x/net/http2.readFrameHeader(0xc420fa5d78, 0x9, 0x9, 0xa36f0c0, 0xc4211a8480, 0x0, 0xc400000000, 0x28f2827, 0xc420036f80)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/frame.go:237 +0x7b\ngithub.com/openshift/origin/vendor/golang.org/x/net/http2.(*Framer).ReadFrame(0xc420fa5d40, 0xc421b2aea0, 0x0, 0x0, 0x0)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/frame.go:464 +0xa4\ngithub.com/openshift/origin/vendor/golang.org/x/net/http2.(*clientConnReadLoop).run(0xc420036f80, 0x7c59ef8, 0xc420498790)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/transport.go:1233 +0x97\ngithub.com/openshift/origin/vendor/golang.org/x/net/http2.(*ClientConn).readLoop(0xc421150000)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/transport.go:1170 +0xa6\ncreated by github.com/openshift/origin/vendor/golang.org/x/net/http2.(*Transport).NewClientConn\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/transport.go:492 +0x89b\n\ngoroutine 156 [chan receive]:\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.WaitFor(0xc42142df00, 0xc42121e360, 0xc42006c120, 0x1, 0x10)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:270 +0x5d\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.pollInternal(0xc42142df00, 0xc42121e360, 0x0, 0xc42142df00)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:187 +0x41\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.pollImmediateInternal(0xc42142df00, 0xc42121e360, 0xc42142df00, 0xc42121e360)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:212 +0x5a\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.PollImmediate(0x3b9aca00, 0x7fffffffffffffff, 0xc42121e360, 0x413aa8, 0x300)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:201 +0x4d\ngithub.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).waitForDelete(0xc42101f970, 0xc420b05500, 0x1, 0x1, 0x3b9aca00, 0x7fffffffffffffff, 0x1, 0xc42012b650, 0xa, 0xc4201fc9ca, ...)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:583 +0x134\ngithub.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).evictPods.func1(0xc42101f970, 0xc4201df5d0, 0xe, 0xc42012b650, 0x0, 0x0, 0x0, 0x0, 0xc4201fc9b0, 0xf, ...)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:508 +0x365\ncreated by 
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).evictPods\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:514 +0x19b\n\ngoroutine 142 [select]:\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.poller.func1.1(0xc420889f20, 0x3b9aca00, 0x7fffffffffffffff, 0xc42006c120)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:314 +0x1fd\ncreated by github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.poller.func1\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:328 +0x91\n\ngoroutine 157 [chan receive]:\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.WaitFor(0xc4214213c0, 0xc42153c030, 0xc42006c120, 0x1, 0x10)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:270 +0x5d\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.pollInternal(0xc4214213c0, 0xc42153c030, 0x0, 0xc4214213c0)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:187 +0x41\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.pollImmediateInternal(0xc4214213c0, 0xc42153c030, 0xc4214213c0, 0xc42153c030)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:212 +0x5a\ngithub.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait.PollImmediate(0x3b9aca00, 0x7fffffffffffffff, 0xc42153c030, 0x413aa8, 0x300)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:201 +0x4d\ngithub.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).waitForDelete(0xc42101f970, 0xc4211e3800, 0x1, 0x1, 0x3b9aca00, 0x7fffffffffffffff, 0x1, 0xc42012b650, 0xb, 0xc4201fcf40, ...)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:583 +0x134\ngithub.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).evictPods.func1(0xc42101f970, 0xc4201df5d0, 0xe, 0xc42012b650, 0x0, 0x0, 0x0, 0x0, 0xc4201fcf20, 0x10, ...)\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:508 +0x365\ncreated by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.(*DrainOptions).evictPods\n\t/builddir/build/BUILD/atomic-openshift-git-0.c989645/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/drain.go:514 +0x19b\n\nrax 0xca\nrbx 0xc030b10\nrcx 0xffffffffffffffff\nrdx 0x0\nrdi 0xc031eb0\nrsi 0x0\nrbp 0x7ffdd44644f8\nrsp 0x7ffdd44644b0\nr8 0x0\nr9 0x0\nr10 0x0\nr11 0x286\nr12 0x7c4bd5a\nr13 0xf4\nr14 0x58\nr15 0x4\nrip 0x463551\nrflags 0x286\ncs 0x33\nfs 0x0\ngs 0x0
Similar to https://bugzilla.redhat.com/show_bug.cgi?id=1473777, but killing the process and retrying does not clear the issue.
Describe of pods still associated with the node: [root@free-stg-master-03fb6 ~]# oc get pods --all-namespaces | grep -e oso-clamd-57h3b -e mongodb-8-jj7xp -e database-2-5fcz6 management-infra oso-clamd-57h3b 1/2 Error 1153 32d xyz mongodb-8-jj7xp 0/1 Terminating 0 8d yasun-1-1 database-2-5fcz6 1/1 Terminating 0 8d [root@free-stg-master-03fb6 ~]# oc describe -n management-infra oso-clamd-57h3b the server doesn't have a resource type "oso-clamd-57h3b" [root@free-stg-master-03fb6 ~]# oc describe -n management-infra pod oso-clamd-57h3b Name: oso-clamd-57h3b Namespace: management-infra Security Policy: privileged Node: ip-172-31-75-193.us-east-2.compute.internal/172.31.75.193 Start Time: Mon, 26 Jun 2017 15:36:50 +0000 Labels: name=oso-clamd Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"management-infra","name":"oso-clamd","uid":"1e9f6f94-57b4-11e7-a29c-0203ad7dfcd7",... openshift.io/scc=privileged Status: Running IP: 10.128.5.52 Controllers: DaemonSet/oso-clamd Containers: oso-clamd: Container ID: docker://dce4c8ac7c293079496c939794019b760df86246e9f3e015970e2a4e5190ac72 Image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest Image ID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd@sha256:90ba009c48d28494aac15240f41d311a5384db3e97649f5eeffc1cefb055b9e6 Port: State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Exit Code: 1 Started: Sat, 29 Jul 2017 01:59:17 +0000 Finished: Sat, 29 Jul 2017 01:59:26 +0000 Ready: False Restart Count: 1141 Environment: OO_PAUSE_ON_START: false Mounts: /var/lib/clamav from clamsigs (ro) /var/run/clamd.scan from clamsock (rw) /var/run/secrets/kubernetes.io/serviceaccount from clamd-deployer-token-dwzg1 (ro) oso-clamd-update: Container ID: docker://631352cbb39726531fab65c7b5621a513d570e47035a4a0ba8e049f8d730b713 Image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update:latest Image ID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd-update@sha256:8bf75d87ffe3b31cd9c591590d17bd6a8d29855a5e796a1cb1a8c374fdad2ea9 Port: State: Running Started: Mon, 24 Jul 2017 21:35:45 +0000 Last State: Terminated Reason: Error Message: Error on reading termination log /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2: open /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2: no such file or directory Exit Code: 1 Started: Wed, 19 Jul 2017 23:14:02 +0000 Finished: Mon, 24 Jul 2017 20:22:48 +0000 Ready: True Restart Count: 12 Environment: OO_PAUSE_ON_START: false Mounts: /token from token (rw) /usr/bin/oc from usr-bin-oc (rw) /var/lib/clamav from clamsigs (rw) /var/run/clamd.scan from clamsock (rw) /var/run/secrets/kubernetes.io/serviceaccount from clamd-deployer-token-dwzg1 (ro) Conditions: Type Status Initialized True Ready False PodScheduled True Volumes: clamsock: Type: HostPath (bare host directory volume) Path: /var/run/clamd.scan clamsigs: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: token: Type: Secret (a volume populated by a Secret) SecretName: clamd-image-watcher-token Optional: false usr-bin-oc: Type: HostPath (bare host directory volume) Path: /usr/bin/oc clamd-deployer-token-dwzg1: Type: Secret (a volume populated by a Secret) SecretName: clamd-deployer-token-dwzg1 Optional: false QoS Class: BestEffort Node-Selectors: type=compute 
Tolerations: <none> Events: FirstSeen LastSeen Count From SubObjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 3d 1h 19107 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{oso-clamd} Warning BackOff Back-off restarting failed container 3d 37m 847 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{oso-clamd} Normal Pulling pulling image "172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest" 3d 2m 19475 kubelet, ip-172-31-75-193.us-east-2.compute.internal Warning FailedSync Error syncing pod [root@free-stg-master-03fb6 ~]# oc describe pod mongodb-8-jj7xp -n xyz Name: mongodb-8-jj7xp Namespace: xyz Security Policy: restricted Node: ip-172-31-75-193.us-east-2.compute.internal/172.31.75.193 Start Time: Thu, 20 Jul 2017 19:23:48 +0000 Labels: deployment=mongodb-8 deploymentconfig=mongodb name=mongodb Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"xyz","name":"mongodb-8","uid":"3efa6e37-6cd6-11e7-bb9e-02306c0cdc4b","... kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container mongodb; cpu limit for container mongodb openshift.io/deployment-config.latest-version=8 openshift.io/deployment-config.name=mongodb openshift.io/deployment.name=mongodb-8 openshift.io/scc=restricted Status: Terminating (expires Fri, 28 Jul 2017 21:24:55 +0000) Termination Grace Period: 30s IP: Controllers: ReplicationController/mongodb-8 Containers: mongodb: Container ID: docker://ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d Image: registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece Image ID: docker-pullable://registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece Port: 27017/TCP State: Running Started: Mon, 24 Jul 2017 21:02:39 +0000 Ready: False Restart Count: 0 Limits: cpu: 1 memory: 512Mi Requests: cpu: 60m memory: 307Mi Liveness: tcp-socket :27017 delay=30s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [/bin/sh -i -c mongo 127.0.0.1:27017/$MONGODB_DATABASE -u $MONGODB_USER -p $MONGODB_PASSWORD --eval="quit()"] delay=3s timeout=1s period=10s #success=1 #failure=3 Environment: MONGODB_USER: <set to the key 'database-user' in secret 'nodejs-mongo-persistent'> Optional: false MONGODB_PASSWORD: <set to the key 'database-password' in secret 'nodejs-mongo-persistent'> Optional: false MONGODB_DATABASE: sampledb MONGODB_ADMIN_PASSWORD: <set to the key 'database-admin-password' in secret 'nodejs-mongo-persistent'> Optional: false Mounts: /var/lib/mongodb/data from mongodb-data (rw) /var/run/secrets/kubernetes.io/serviceaccount from default-token-rbzg6 (ro) Conditions: Type Status Initialized True Ready False PodScheduled True Volumes: mongodb-data: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: mongodb ReadOnly: false default-token-rbzg6: Type: Secret (a volume populated by a Secret) SecretName: default-token-rbzg6 Optional: false QoS Class: Burstable Node-Selectors: type=compute Tolerations: <none> Events: FirstSeen LastSeen Count From SubObjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 4h 4s 519 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{mongodb} Normal Killing Killing container with id docker://mongodb:Need to kill Pod 
[root@free-stg-master-03fb6 ~]# oc describe pod database-2-5fcz6 -n yasun-1-1 Name: database-2-5fcz6 Namespace: yasun-1-1 Security Policy: restricted Node: ip-172-31-75-193.us-east-2.compute.internal/172.31.75.193 Start Time: Thu, 20 Jul 2017 19:23:48 +0000 Labels: deployment=database-2 deploymentconfig=database name=database Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"yasun-1-1","name":"database-2","uid":"3cc07aeb-6cd6-11e7-bb9e-02306c0c... kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container mysql; cpu limit for container mysql openshift.io/deployment-config.latest-version=2 openshift.io/deployment-config.name=database openshift.io/deployment.name=database-2 openshift.io/scc=restricted Status: Terminating (expires Fri, 28 Jul 2017 21:24:56 +0000) Termination Grace Period: 30s IP: Controllers: ReplicationController/database-2 Containers: mysql: Container ID: docker://5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24 Image: registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec Image ID: docker-pullable://registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec Port: 3306/TCP State: Running Started: Mon, 24 Jul 2017 21:08:44 +0000 Ready: True Restart Count: 0 Limits: cpu: 1 memory: 512Mi Requests: cpu: 60m memory: 307Mi Liveness: tcp-socket :3306 delay=30s timeout=1s period=10s #success=1 #failure=3 Readiness: exec [/bin/sh -i -c MYSQL_PWD='aE6hTcmx' mysql -h 127.0.0.1 -u userVCI -D sampledb -e 'SELECT 1'] delay=5s timeout=1s period=10s #success=1 #failure=3 Environment: MYSQL_USER: <set to the key 'database-user' in secret 'dancer-mysql-persistent'> Optional: false MYSQL_PASSWORD: <set to the key 'database-password' in secret 'dancer-mysql-persistent'> Optional: false MYSQL_DATABASE: sampledb Mounts: /var/lib/mysql/data from database-data (rw) /var/run/secrets/kubernetes.io/serviceaccount from default-token-qqx38 (ro) Conditions: Type Status Initialized True Ready True PodScheduled True Volumes: database-data: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: database ReadOnly: false default-token-qqx38: Type: Secret (a volume populated by a Secret) SecretName: default-token-qqx38 Optional: false QoS Class: Burstable Node-Selectors: type=compute Tolerations: <none> Events: FirstSeen LastSeen Count From SubObjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 4h 37s 519 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{mysql} Normal Killing Killing container with id docker://mysql:Need to kill Pod [root@free-stg-master-03fb6 ~]# Yaml for pods: [root@free-stg-master-03fb6 ~]# oc get pod database-2-5fcz6 -n yasun-1-1 -o=yaml apiVersion: v1 kind: Pod metadata: annotations: kubernetes.io/created-by: | {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"yasun-1-1","name":"database-2","uid":"3cc07aeb-6cd6-11e7-bb9e-02306c0cdc4b","apiVersion":"v1","resourceVersion":"34780432"}} kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container mysql; cpu limit for container mysql' openshift.io/deployment-config.latest-version: "2" openshift.io/deployment-config.name: database openshift.io/deployment.name: database-2 openshift.io/scc: restricted 
creationTimestamp: 2017-07-20T19:23:43Z deletionGracePeriodSeconds: 30 deletionTimestamp: 2017-07-28T21:24:56Z generateName: database-2- labels: deployment: database-2 deploymentconfig: database name: database name: database-2-5fcz6 namespace: yasun-1-1 ownerReferences: - apiVersion: v1 blockOwnerDeletion: true controller: true kind: ReplicationController name: database-2 uid: 3cc07aeb-6cd6-11e7-bb9e-02306c0cdc4b resourceVersion: "36769986" selfLink: /api/v1/namespaces/yasun-1-1/pods/database-2-5fcz6 uid: f17b1c10-6d80-11e7-91e2-02306c0cdc4b spec: containers: - env: - name: MYSQL_USER valueFrom: secretKeyRef: key: database-user name: dancer-mysql-persistent - name: MYSQL_PASSWORD valueFrom: secretKeyRef: key: database-password name: dancer-mysql-persistent - name: MYSQL_DATABASE value: sampledb image: registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec imagePullPolicy: Always livenessProbe: failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 10 successThreshold: 1 tcpSocket: port: 3306 timeoutSeconds: 1 name: mysql ports: - containerPort: 3306 protocol: TCP readinessProbe: exec: command: - /bin/sh - -i - -c - MYSQL_PWD='aE6hTcmx' mysql -h 127.0.0.1 -u userVCI -D sampledb -e 'SELECT 1' failureThreshold: 3 initialDelaySeconds: 5 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 resources: limits: cpu: "1" memory: 512Mi requests: cpu: 60m memory: 307Mi securityContext: capabilities: drop: - KILL - MKNOD - NET_RAW - SETGID - SETUID - SYS_CHROOT privileged: false runAsUser: 1005000000 seLinuxOptions: level: s0:c71,c15 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/mysql/data name: database-data - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: default-token-qqx38 readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: default-dockercfg-24g9f nodeName: ip-172-31-75-193.us-east-2.compute.internal nodeSelector: type: compute restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1005000000 seLinuxOptions: level: s0:c71,c15 serviceAccount: default serviceAccountName: default terminationGracePeriodSeconds: 30 volumes: - name: database-data persistentVolumeClaim: claimName: database - name: default-token-qqx38 secret: defaultMode: 420 secretName: default-token-qqx38 status: conditions: - lastProbeTime: null lastTransitionTime: 2017-07-20T19:23:48Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2017-07-25T22:22:13Z status: "True" type: Ready - lastProbeTime: null lastTransitionTime: 2017-07-20T19:23:43Z status: "True" type: PodScheduled containerStatuses: - containerID: docker://5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24 image: registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec imageID: docker-pullable://registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec lastState: {} name: mysql ready: true restartCount: 0 state: running: startedAt: 2017-07-24T21:08:44Z hostIP: 172.31.75.193 phase: Running qosClass: Burstable startTime: 2017-07-20T19:23:48Z [root@free-stg-master-03fb6 ~]# oc get pod mongodb-8-jj7xp -n xyz -o=yaml apiVersion: v1 kind: Pod metadata: annotations: kubernetes.io/created-by: | 
{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"xyz","name":"mongodb-8","uid":"3efa6e37-6cd6-11e7-bb9e-02306c0cdc4b","apiVersion":"v1","resourceVersion":"34780694"}} kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container mongodb; cpu limit for container mongodb' openshift.io/deployment-config.latest-version: "8" openshift.io/deployment-config.name: mongodb openshift.io/deployment.name: mongodb-8 openshift.io/scc: restricted creationTimestamp: 2017-07-20T19:23:43Z deletionGracePeriodSeconds: 30 deletionTimestamp: 2017-07-28T21:24:55Z generateName: mongodb-8- labels: deployment: mongodb-8 deploymentconfig: mongodb name: mongodb name: mongodb-8-jj7xp namespace: xyz ownerReferences: - apiVersion: v1 blockOwnerDeletion: true controller: true kind: ReplicationController name: mongodb-8 uid: 3efa6e37-6cd6-11e7-bb9e-02306c0cdc4b resourceVersion: "36770066" selfLink: /api/v1/namespaces/xyz/pods/mongodb-8-jj7xp uid: f1c9f7b6-6d80-11e7-91e2-02306c0cdc4b spec: containers: - env: - name: MONGODB_USER valueFrom: secretKeyRef: key: database-user name: nodejs-mongo-persistent - name: MONGODB_PASSWORD valueFrom: secretKeyRef: key: database-password name: nodejs-mongo-persistent - name: MONGODB_DATABASE value: sampledb - name: MONGODB_ADMIN_PASSWORD valueFrom: secretKeyRef: key: database-admin-password name: nodejs-mongo-persistent image: registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece imagePullPolicy: Always livenessProbe: failureThreshold: 3 initialDelaySeconds: 30 periodSeconds: 10 successThreshold: 1 tcpSocket: port: 27017 timeoutSeconds: 1 name: mongodb ports: - containerPort: 27017 protocol: TCP readinessProbe: exec: command: - /bin/sh - -i - -c - mongo 127.0.0.1:27017/$MONGODB_DATABASE -u $MONGODB_USER -p $MONGODB_PASSWORD --eval="quit()" failureThreshold: 3 initialDelaySeconds: 3 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 resources: limits: cpu: "1" memory: 512Mi requests: cpu: 60m memory: 307Mi securityContext: capabilities: drop: - KILL - MKNOD - NET_RAW - SETGID - SETUID - SYS_CHROOT privileged: false runAsUser: 1000130000 seLinuxOptions: level: s0:c11,c10 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/mongodb/data name: mongodb-data - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: default-token-rbzg6 readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: default-dockercfg-2zp9s nodeName: ip-172-31-75-193.us-east-2.compute.internal nodeSelector: type: compute restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1000130000 seLinuxOptions: level: s0:c11,c10 serviceAccount: default serviceAccountName: default terminationGracePeriodSeconds: 30 volumes: - name: mongodb-data persistentVolumeClaim: claimName: mongodb - name: default-token-rbzg6 secret: defaultMode: 420 secretName: default-token-rbzg6 status: conditions: - lastProbeTime: null lastTransitionTime: 2017-07-20T19:23:48Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2017-07-25T22:22:10Z message: 'containers with unready status: [mongodb]' reason: ContainersNotReady status: "False" type: Ready - lastProbeTime: null lastTransitionTime: 2017-07-20T19:23:43Z status: "True" type: PodScheduled containerStatuses: - containerID: docker://ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d image: 
registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece imageID: docker-pullable://registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece lastState: {} name: mongodb ready: false restartCount: 0 state: running: startedAt: 2017-07-24T21:02:39Z hostIP: 172.31.75.193 phase: Running qosClass: Burstable startTime: 2017-07-20T19:23:48Z [root@free-stg-master-03fb6 ~]# oc get -n management-infra pod oso-clamd-57h3b -o=yaml apiVersion: v1 kind: Pod metadata: annotations: kubernetes.io/created-by: | {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"management-infra","name":"oso-clamd","uid":"1e9f6f94-57b4-11e7-a29c-0203ad7dfcd7","apiVersion":"extensions","resourceVersion":"22329955"}} openshift.io/scc: privileged creationTimestamp: 2017-06-26T15:36:49Z generateName: oso-clamd- labels: name: oso-clamd name: oso-clamd-57h3b namespace: management-infra ownerReferences: - apiVersion: extensions/v1beta1 blockOwnerDeletion: true controller: true kind: DaemonSet name: oso-clamd uid: 1e9f6f94-57b4-11e7-a29c-0203ad7dfcd7 resourceVersion: "36799993" selfLink: /api/v1/namespaces/management-infra/pods/oso-clamd-57h3b uid: 456773b5-5a85-11e7-ba4c-02306c0cdc4b spec: containers: - env: - name: OO_PAUSE_ON_START value: "false" image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest imagePullPolicy: Always name: oso-clamd resources: {} securityContext: capabilities: drop: - AUDIT_WRITE - CHOWN - DAC_OVERRIDE - FOWNER - FSETID - KILL - MKNOD - NET_BIND_SERVICE - NET_RAW - SETFCAP - SETGID - SETPCAP - SETUID - SYS_CHROOT privileged: true runAsUser: 10000 seLinuxOptions: level: s0 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/clamav name: clamsigs readOnly: true - mountPath: /var/run/clamd.scan name: clamsock - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: clamd-deployer-token-dwzg1 readOnly: true - env: - name: OO_PAUSE_ON_START value: "false" image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update:latest imagePullPolicy: Always name: oso-clamd-update resources: {} securityContext: privileged: false runAsUser: 10001 seLinuxOptions: level: s0 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/clamav name: clamsigs - mountPath: /token name: token - mountPath: /usr/bin/oc name: usr-bin-oc - mountPath: /var/run/clamd.scan name: clamsock - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: clamd-deployer-token-dwzg1 readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: clamd-deployer-dockercfg-zffvz nodeName: ip-172-31-75-193.us-east-2.compute.internal nodeSelector: type: compute restartPolicy: Always schedulerName: default-scheduler securityContext: seLinuxOptions: level: s0 serviceAccount: clamd-deployer serviceAccountName: clamd-deployer terminationGracePeriodSeconds: 10 volumes: - hostPath: path: /var/run/clamd.scan name: clamsock - emptyDir: {} name: clamsigs - name: token secret: defaultMode: 420 secretName: clamd-image-watcher-token - hostPath: path: /usr/bin/oc name: usr-bin-oc - name: clamd-deployer-token-dwzg1 secret: defaultMode: 420 secretName: clamd-deployer-token-dwzg1 status: conditions: - lastProbeTime: null lastTransitionTime: 2017-06-26T15:36:50Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2017-07-29T01:59:26Z message: 
'containers with unready status: [oso-clamd]' reason: ContainersNotReady status: "False" type: Ready - lastProbeTime: null lastTransitionTime: 2017-06-26T15:36:56Z status: "True" type: PodScheduled containerStatuses: - containerID: docker://dce4c8ac7c293079496c939794019b760df86246e9f3e015970e2a4e5190ac72 image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest imageID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd@sha256:90ba009c48d28494aac15240f41d311a5384db3e97649f5eeffc1cefb055b9e6 lastState: terminated: containerID: docker://dce4c8ac7c293079496c939794019b760df86246e9f3e015970e2a4e5190ac72 exitCode: 1 finishedAt: 2017-07-29T01:59:26Z reason: Error startedAt: 2017-07-29T01:59:17Z name: oso-clamd ready: false restartCount: 1141 state: waiting: message: Back-off 5m0s restarting failed container=oso-clamd pod=oso-clamd-57h3b_management-infra(456773b5-5a85-11e7-ba4c-02306c0cdc4b) reason: CrashLoopBackOff - containerID: docker://631352cbb39726531fab65c7b5621a513d570e47035a4a0ba8e049f8d730b713 image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update:latest imageID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd-update@sha256:8bf75d87ffe3b31cd9c591590d17bd6a8d29855a5e796a1cb1a8c374fdad2ea9 lastState: terminated: containerID: docker://6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419 exitCode: 1 finishedAt: 2017-07-24T20:22:48Z message: 'Error on reading termination log /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2: open /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2: no such file or directory' reason: Error startedAt: 2017-07-19T23:14:02Z name: oso-clamd-update ready: true restartCount: 12 state: running: startedAt: 2017-07-24T21:35:45Z hostIP: 172.31.75.193 phase: Running podIP: 10.128.5.52 qosClass: BestEffort startTime: 2017-06-26T15:36:50Z
Created attachment 1307158 [details] Backtrace in readable form
Unlike the previously referenced issues, this one does not appear to be an oadm drain issue. It is either a docker or a kubelet issue. This is yet another 'pods stuck terminating' problem, which can have nearly infinite causes.
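For triage of this kind of report, a quick way to enumerate every pod stuck in Terminating cluster-wide, together with the node it is on, is shown below (a minimal sketch; assumes oc is run on a master with cluster-admin):

# Lists namespace, pod, and hosting node for anything stuck in Terminating.
$ oc get pods --all-namespaces -o wide | grep Terminating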
Quick triage:

database-2-5fcz6 (5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24):
Container 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24 failed to exit within 30 seconds of signal 15 - using the force
container kill failed because of 'container not found' or 'no such process': Cannot kill container 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24: rpc error: code = 2 desc = containerd: container not found

mongodb-8-jj7xp (ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d):
Container ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d failed to exit within 30 seconds of signal 15 - using the force
container kill failed because of 'container not found' or 'no such process': Cannot kill container ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d: rpc error: code = 2 desc = containerd: container not found

oso-clamd-57h3b (6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419):
Error removing mounted layer 6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419: failed to remove device 6af5352dd32d0c60a7bdf47dd956af1cd8f9175263cfae113b7cd666d89324db: Device is Busy
Handler for DELETE /v1.24/containers/6a8654f423a0 returned error: Driver devicemapper failed to remove root filesystem 6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419: failed to remove device 6af5352dd32d0c60a7bdf47dd956af1cd8f9175263cfae113b7cd666d89324db: Device is Busy

database-2-5fcz6 and mongodb-8-jj7xp removal is failing because containerd reports the containers don't exist; oso-clamd-57h3b removal is failing because of a busy mount point.
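For the 'Device is Busy' case, one rough check is to look for processes whose mount namespaces still reference the devicemapper device that docker cannot remove (a sketch only; the device id is taken from the error text above, and whether it shows up in mountinfo depends on the devicemapper setup):

# Any PID whose mountinfo still references the device is a candidate for a leaked mount.
$ grep -l 6af5352dd32d0c60a7bdf47dd956af1cd8f9175263cfae113b7cd666d89324db /proc/*/mountinfo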
The docker log is mostly a loop of the same errors over and over. There are three types:

There are 34 unique device mapper ids that result in recurring "Error removing mounted layer" errors.
There are 48 unique container ids that result in recurring "No such container" errors.
database-2-5fcz6 and mongodb-8-jj7xp are the only two pods that result in recurring "Cannot kill container ... containerd: container not found" messages.

While the first two error types generate a ton of logging noise and are messy, I think the last one, "containerd: container not found", is indicative of the issue. My guess is that containerd successfully deleted the container, but the kubelet never got notification of the success, so it keeps trying to delete a container that containerd no longer knows about. Not sure how that would happen yet.
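One way to sanity-check that on the node would be to compare what docker believes with whether the container's process actually still exists (a sketch, assuming docker 1.12 and using one of the container ids from the log):

# PID docker thinks the container is running as; a stale or non-running PID here
# would confirm that docker's view has diverged from containerd's.
$ docker inspect --format '{{.State.Pid}}' 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24
$ ps -p <pid printed above>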
Which build of docker are you running?

$ rpm -q -a | grep docker

Thanks!
Why would you make that comment private?

# rpm -qa \*docker\*
docker-client-1.12.6-30.git97ba2c0.el7.x86_64
docker-common-1.12.6-30.git97ba2c0.el7.x86_64
docker-1.12.6-30.git97ba2c0.el7.x86_64
docker-rhel-push-plugin-1.12.6-30.git97ba2c0.el7.x86_64
atomic-openshift-docker-excluder-3.6.170-1.git.0.9eec78e.el7.noarch
I didn't notice before that 'docker ps' _does_ show the containers of which containerd has no knowledge. I'm not sure the kubelet can be blamed if docker itself is also under the impression that the container still exists. Unfortunately, the logs don't go back far enough to see what happened the first time the container stop/delete was attempted. Assigning to Containers to figure out why there is a disconnect between docker and containerd. Might also be a dup, but I'll let Containers decide.
Antonio, does this not look like a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1460729 ?
(In reply to Eric Paris from comment #10)
> Antonio, does this not look like a dup of
> https://bugzilla.redhat.com/show_bug.cgi?id=1460729 ?

From my POV, this bug is the same as the bug you linked. For reference, that bug was caused by building docker from the wrong docker and containerd commits. I referenced the commits here: https://bugzilla.redhat.com/show_bug.cgi?id=1460729#c41

Containerd, specifically, was missing this commit: https://github.com/projectatomic/containerd/commit/988be67cb9f5cf55a2591ab50ba25f4d9055b3ac

I backported that containerd commit from upstream, then aligned the docker commit to f55a11849bb4b654c511f0d9cfe7e25e833c2bed (which is part of a set of DevMapper fixes that should have gone in together, but for some reason we built from a different commit).

The combination of the above, and more importantly the containerd backport, fixed this issue as shown in https://bugzilla.redhat.com/show_bug.cgi?id=1460729#c54 (which is indeed VERIFIED now).
The commits listed in https://bugzilla.redhat.com/show_bug.cgi?id=1460729#c41 (the referenced bug) are here:

docker: https://github.com/projectatomic/docker/commit/f55a11849bb4b654c511f0d9cfe7e25e833c2bed
containerd: https://github.com/projectatomic/containerd/commit/fa8fb3d455e1baf716f3131581f0ed8b07c573a6
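To check whether an installed build already carries those fixes, the packaged git hash and the RPM changelog can be inspected (a sketch; the exact changelog wording varies per build):

# Short git hash docker was built from, and any changelog mention of the backport.
$ docker version | grep -i 'git commit'
$ rpm -q --changelog docker | head -20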
*** This bug has been marked as a duplicate of bug 1460729 ***