Bug 1673692

Summary: "docker ps" hangs after Docker daemon using live-restore is restarted and containers are deleted
Product: Red Hat Enterprise Linux 7 Reporter: Robb Manes <rmanes>
Component: dockerAssignee: Daniel Walsh <dwalsh>
Status: CLOSED DUPLICATE QA Contact: atomic-bugs <atomic-bugs>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.5CC: ahoness, amurdaca, andres, dornelas, lsm5, mnoguera, pasik, pdwyer, rbost
Target Milestone: rcKeywords: Extras, Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-14 22:46:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186913    

Description Robb Manes 2019-02-07 19:17:08 UTC
Description of problem:
Remarkably similar to or a regression of bz1566439.

On the latest shipped Docker, running with live-restore, deleting containers running httpd after a restart will cause all further attempts of `docker ps` to hang.  This is being seen in a high-profile OpenStack environment and I can reproduce it at will locally.

Version-Release number of selected component (if applicable):
# rpm -qa | grep docker
docker-client-1.13.1-90.git07f3374.el7.x86_64
docker-common-1.13.1-90.git07f3374.el7.x86_64
docker-1.13.1-90.git07f3374.el7.x86_64
docker-rhel-push-plugin-1.13.1-90.git07f3374.el7.x86_64

How reproducible:
Consistently

Steps to Reproduce:
To be followed up in a seperate comment.

Actual results:
`docker ps` appears to hang - notably, all Docker commands from the client continue to hang.

Expected results:
`docker` client commands should not hang.

Comment 3 Robb Manes 2019-02-07 19:23:45 UTC
I checked the client and it appears to be waiting on the daemon, as we'd expect:

# dlv attach 1480

(dlv) goroutines
  Goroutine 1 - User: /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp/ctxhttp.go:55 github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp.Do (0x7f05df)
  Goroutine 2 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 3 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 4 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 5 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/sigqueue.go:139 os/signal.signal_recv (0x447db6)
  Goroutine 6 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 9 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/lock_futex.go:227 runtime.notetsleepg (0x417832)
  Goroutine 10 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/netpoll.go:173 internal/poll.runtime_pollWait (0x42e4a7)
  Goroutine 11 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/transport.go:1822 net/http.(*persistConn).writeLoop (0x71e92b)
  Goroutine 18 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 21 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/transport.go:2033 net/http.(*persistConn).roundTrip (0x71f3b7)
[11 goroutines]

(dlv) goroutine 1

(dlv) bt
 0  0x0000000000433b0a in runtime.gopark
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292
 1  0x0000000000442fe0 in runtime.selectgo
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/select.go:392
 2  0x00000000007f05df in github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp.Do
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp/ctxhttp.go:55
 3  0x000000000080a08e in github.com/docker/docker/client.(*Client).doRequest
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/client/request.go:127
 4  0x0000000000809fb1 in github.com/docker/docker/client.(*Client).sendRequest
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/client/request.go:121
 5  0x00000000008093cb in github.com/docker/docker/client.(*Client).get
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/client/request.go:36
 6  0x00000000007f6db2 in github.com/docker/docker/client.(*Client).ContainerList
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/client/container_list.go:47
 7  0x000000000099e129 in github.com/docker/docker/cli/command/container.runPs
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/cli/command/container/list.go:121
 8  0x00000000009ab2e3 in github.com/docker/docker/cli/command/container.NewPsCommand.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/cli/command/container/list.go:37
 9  0x0000000000589d30 in github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).execute
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:646
10  0x000000000058a484 in github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:742
11  0x000000000058a17b in github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).Execute
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:695
12  0x0000000000a3b933 in main.main
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/cmd/docker/docker.go:104
13  0x00000000004336b2 in runtime.main
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:198
14  0x000000000045e951 in runtime.goexit
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/asm_amd64.s:2361

We're sitting in select waiting for a case to return, both channels from a goroutine:

// Do sends an HTTP request with the provided http.Client and returns an HTTP response.
// If the client is nil, http.DefaultClient is used.
// If the context is canceled or times out, ctx.Err() will be returned.
func Do(ctx context.Context, client *http.Client, req *http.Request) (*http.Response, error) {
- - - - 8< - - - -
       go func() {
	        resp, err := client.Do(req)
	        testHookDoReturned()
	        result <- responseAndError{resp, err}
	}()

	var resp *http.Response

	select {
	case <-ctx.Done():
- - - - 8< - - - -
        case r := <-result:
- - - - 8< - - - -

(dlv) frame 2
> runtime.gopark() /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 (PC: 0x433b0a)
Warning: debugging optimized function
Frame 2: /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp/ctxhttp.go:55 (PC: 7f05df)
    50:			result <- responseAndError{resp, err}
    51:		}()
    52:	
    53:		var resp *http.Response
    54:	
=>  55:		select {
    56:		case <-ctx.Done():
    57:			testHookContextDoneBeforeHeaders()
    58:			cancel()
    59:			// Clean up after the goroutine calling client.Do:
    60:			go func() {

Taking a look at the daemon now.

Comment 4 Robb Manes 2019-02-07 19:28:13 UTC
(In reply to Robb Manes from comment #0)
> Actual results:
> `docker ps` appears to hang - notably, all Docker commands from the client
> continue to hang.
> 
> Expected results:
> `docker` client commands should not hang.

This is in error, other commands work fine, just not `docker ps`.

Comment 6 Robb Manes 2019-02-07 23:16:32 UTC
I think maybe we're blocking here during this kill operation, at least on the other side of the docker client:

(dlv) goroutine
Thread 18486 at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/syscall/asm_linux_amd64.s:53
Goroutine 673:
	Runtime: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x4408fa)
	User: /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:239 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).Header (0x7e89a8)
	Go: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:2795 net/http.(*Server).Serve (0x73a74b)
	Start: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:1719 net/http.(*conn).serve (0x735eb0)

(dlv) bt
 0  0x00000000004408fa in runtime.gopark
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292
 1  0x00000000004506d0 in runtime.selectgo
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/select.go:392
 2  0x00000000007e89a8 in github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).Header
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:239
 3  0x000000000080b0a3 in github.com/docker/docker/vendor/google.golang.org/grpc.recvResponse
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:62
 4  0x000000000080c037 in github.com/docker/docker/vendor/google.golang.org/grpc.invoke
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:209
 5  0x000000000080b7ec in github.com/docker/docker/vendor/google.golang.org/grpc.Invoke
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:119
 6  0x00000000008bc992 in github.com/docker/docker/vendor/github.com/docker/containerd/api/grpc/types.(*aPIClient).Signal
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/docker/containerd/api/grpc/types/api.pb.go:1042
 7  0x00000000008ccda3 in github.com/docker/docker/libcontainerd.(*client).Signal
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/libcontainerd/client_unix.go:93
 8  0x000000000120607b in github.com/docker/docker/daemon.(*Daemon).kill
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/kill.go:163
 9  0x000000000120579d in github.com/docker/docker/daemon.(*Daemon).killWithSignal
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/kill.go:94
10  0x0000000001205ef6 in github.com/docker/docker/daemon.(*Daemon).killPossiblyDeadProcess
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/kill.go:153
11  0x0000000001223d9d in github.com/docker/docker/daemon.(*Daemon).containerStop
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/stop.go:52
12  0x00000000011ec03b in github.com/docker/docker/daemon.(*Daemon).cleanupContainer
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/delete.go:95
13  0x00000000011eb699 in github.com/docker/docker/daemon.(*Daemon).ContainerRm
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/delete.go:46
14  0x000000000125a2a8 in github.com/docker/docker/api/server/router/container.(*containerRouter).deleteContainers
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/router/container/container_routes.go:398
15  0x000000000125f049 in github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.deleteContainers)-fm
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/router/container/container.go:75
16  0x0000000001238c58 in github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/middleware/experimental.go:27
17  0x0000000001238fe8 in github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/middleware/version.go:47
18  0x00000000013eab8e in github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/pkg/authorization/middleware.go:69
19  0x0000000001237f9d in github.com/docker/docker/api/server/middleware.AuditMiddleware.WrapHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/middleware/audit_linux.go:27
20  0x0000000001249e45 in github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/server.go:146
21  0x00000000007374e4 in net/http.HandlerFunc.ServeHTTP
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:1947
22  0x0000000000760716 in github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
23  0x00000000012487b0 in github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/router_swapper.go:29
24  0x000000000073a2ac in net/http.serverHandler.ServeHTTP
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:2694
25  0x0000000000736501 in net/http.(*conn).serve
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:1830
26  0x000000000046e231 in runtime.goexit
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/asm_amd64.s:2361

(dlv) frame 10
> runtime.gopark() /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 (PC: 0x4408fa)
Warning: debugging optimized function
Frame 10: /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/kill.go:153 (PC: 1205ef6)
   148:		return nil
   149:	}
   150:	
   151:	// killPossibleDeadProcess is a wrapper around killSig() suppressing "no such process" error.
   152:	func (daemon *Daemon) killPossiblyDeadProcess(container *container.Container, sig int) error {
=> 153:		err := daemon.killWithSignal(container, sig)
   154:		if err == syscall.ESRCH {
   155:			e := errNoSuchProcess{container.GetPID(), sig}
   156:			logrus.Debug(e)
   157:			return e
   158:		}

The container we try to kill:

(dlv) p container
*github.com/docker/docker/container.Container {
	CommonContainer: github.com/docker/docker/container.CommonContainer {
		StreamConfig: *(*github.com/docker/docker/container/stream.Config)(0xc42008e080),
		State: *(*github.com/docker/docker/container.State)(0xc420444cb0),
		Root: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+27 more",
		BaseFS: "/var/lib/docker/devicemapper/mnt/347d19aca36709179640b166bffb562...+40 more",
		RWLayer: github.com/docker/docker/layer.RWLayer(*github.com/docker/docker/layer.referencedRWLayer) ...,
		ID: "aa0b529e662e16771dc2590288b4360902999ef7c5c17e426a7ba5fa61cf8315",
		Created: (*time.Time)(0xc42036c250),
		Managed: false,
		Path: "kolla_start",
		Args: []string len: 0, cap: 0, [],
		Config: *(*github.com/docker/docker/api/types/container.Config)(0xc4200f8000),
		ImageID: "sha256:e60c79449957d047adc0a4d0b617dd55585b9126f598d36dd5131e5c4...+7 more",
		NetworkSettings: *(*github.com/docker/docker/daemon/network.Settings)(0xc4201aa640),
		LogPath: "",
		Name: "/agitated_joliot",
		Driver: "devicemapper",
		MountLabel: "system_u:object_r:svirt_sandbox_file_t:s0:c532,c827",
		ProcessLabel: "system_u:system_r:svirt_lxc_net_t:s0:c532,c827",
		RestartCount: 0,
		HasBeenStartedBefore: true,
		HasBeenManuallyStopped: true,
		MountPoints: map[string]*github.com/docker/docker/volume.MountPoint [...],
		HostConfig: *(*github.com/docker/docker/api/types/container.HostConfig)(0xc420027400),
		ExecCommands: *(*github.com/docker/docker/daemon/exec.Store)(0xc4208ab0c0),
		SecretStore: github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/exec.SecretGetter nil,
		SecretReferences: []*github.com/docker/docker/api/types/swarm.SecretReference len: 0, cap: 0, nil,
		LogDriver: github.com/docker/docker/daemon/logger.Logger(*github.com/docker/docker/daemon/logger/journald.journald) ...,
		LogCopier: *(*github.com/docker/docker/daemon/logger.Copier)(0xc420024540),
		restartManager: github.com/docker/docker/restartmanager.RestartManager(*github.com/docker/docker/restartmanager.restartManager) ...,
		attachContext: *(*github.com/docker/docker/container.attachContext)(0xc4208ab120),},
	AppArmorProfile: "",
	HostnamePath: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+36 more",
	HostsPath: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+33 more",
	ShmPath: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+31 more",
	ResolvConfPath: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+39 more",
	SeccompProfile: "",
	NoNewPrivileges: false,}

Pretty sure this is just calling to another process, however, so seeing which ones we need to look at.

Comment 8 Robb Manes 2019-02-08 00:13:12 UTC
.....uhh, reliably, if I attach delve to /usr/bin/docker-containerd-current:

# ps aux | grep docker
root     14213  0.0  0.2  70936  8352 ?        Ssl  18:30   0:00 /usr/libexec/docker/rhel-push-plugin
root     18758  1.0  0.9 893444 35024 ?        Ssl  19:00   0:00 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver=journald --signature-verification=false --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --insecure-registry rhos-qe-mirror-qeos.usersys.redhat.com:5000 --add-registry registry.access.redhat.com --add-registry docker.io --add-registry registry.fedoraproject.org --add-registry quay.io --add-registry registry.centos.org
root     18766  0.5  0.2 238208 10600 ?        Ssl  19:00   0:00 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc --runtime-args --systemd-cgroup=true
root     18858  0.4  0.2 156800 10680 pts/0    Sl+  19:00   0:00 /usr/bin/docker-current rm -f 65469d55be21 4f6b2d14b291
root     18922  1.2  0.2  84124 10688 pts/1    Sl   19:00   0:00 /usr/bin/docker-current ps
root     18929  0.0  0.0 112712   980 pts/1    S+   19:00   0:00 grep --color=auto docker

# dlv attach 18766

My hung docker rm returns:

# docker rm -f 65469d55be21 4f6b2d14b291
Error response from daemon: Cannot kill container 65469d55be21607d8c02d2eaf99a83324e07b8c04fb53e375bb9509aa991d23d: rpc error: code = 14 desc = grpc: the connection is unavailable
Error response from daemon: Cannot kill container 4f6b2d14b291d4765e8fbd5f369f3b169e87293d697e1324107b2786abe9290c: rpc error: code = 14 desc = grpc: the connection is unavailable

And docker ps, which I had running in the background, returns:

CONTAINER ID        IMAGE                                                                    COMMAND             CREATED             STATUS              PORTS               NAMES
65469d55be21        rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-keystone   "kolla_start"       3 minutes ago       Up 3 minutes                            vibrant_minsky
4f6b2d14b291        rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-horizon    "kolla_start"       3 minutes ago       Up 3 minutes                            eager_bhaskara

I've done this four times now, spending significant time debugging docker client and the dockerd-current process, it only happens when I attach delve to the docker-containerd-current task.

GDB does it too:

# gdb -p 6407

Returns immediately, like it just unblocked:

Error response from daemon: Cannot kill container b50037abc5e1e03d1cb2bc3fcbb4a42c8642de6b254bbe548e5d5183aead162c: rpc error: code = 14 desc = grpc: the connection is unavailable
Error response from daemon: Cannot kill container a630d325228f5c5d5dc58673bae80c076160bf1525ab8d758ace5650b7716f41: rpc error: code = 14 desc = grpc: the connection is unavailable

Comment 10 Robb Manes 2019-02-08 01:28:11 UTC
I can't reproduce this on .88:

# rpm -qa | grep docker
docker-1.13.1-88.git07f3374.el7.x86_64
docker-rhel-push-plugin-1.13.1-88.git07f3374.el7.x86_64
docker-client-1.13.1-88.git07f3374.el7.x86_64
docker-common-1.13.1-88.git07f3374.el7.x86_64

The containerd commits between these versions:

$ git log 923a3878..b9680343 --oneline
b968034 use epoll_ctl(EPOLL_CTL_DEL) to remove oom fd from epoll
b54277a Set EFD_CLOEXEC for memory event fd.
3f92835 fix redundant if ... error to make ci happy
6e9b04b runtime: allocate cmdDoneCh on loadProcess

Of these, it looks like b968034 might block us in this scenario but trying to verify...

Comment 11 Antonio Murdaca 2019-02-08 17:39:23 UTC
We've identified the broken commit which introduced this regression and reverted it https://github.com/projectatomic/containerd/commit/7989550b83317f799af20ab4df3a5b6487767fc9

Comment 15 Robb Manes 2019-02-14 22:46:20 UTC

*** This bug has been marked as a duplicate of bug 1671861 ***