Bug 1673692 - "docker ps" hangs after Docker daemon using live-restore is restarted and containers are deleted
Keywords:
Status: CLOSED DUPLICATE of bug 1671861
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Daniel Walsh
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1186913
 
Reported: 2019-02-07 19:17 UTC by Robb Manes
Modified: 2023-10-06 18:07 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-14 22:46:20 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1566439 0 urgent CLOSED docker ps is hanging 2023-10-06 17:46:23 UTC
Red Hat Bugzilla 1671861 0 urgent CLOSED Director deployed OCP 3.11 deployment fails with openshift-ansible getting stuck when restarting docker service on maste... 2023-10-06 18:07:02 UTC
Red Hat Knowledge Base (Solution) 3895221 0 None None None 2019-02-08 01:02:27 UTC

Description Robb Manes 2019-02-07 19:17:08 UTC
Description of problem:
Remarkably similar to or a regression of bz1566439.

On the latest shipped Docker running with live-restore, deleting containers that run httpd after a daemon restart causes all subsequent `docker ps` invocations to hang. This is being seen in a high-profile OpenStack environment, and I can reproduce it at will locally.

Version-Release number of selected component (if applicable):
# rpm -qa | grep docker
docker-client-1.13.1-90.git07f3374.el7.x86_64
docker-common-1.13.1-90.git07f3374.el7.x86_64
docker-1.13.1-90.git07f3374.el7.x86_64
docker-rhel-push-plugin-1.13.1-90.git07f3374.el7.x86_64

How reproducible:
Consistently

Steps to Reproduce:
To be followed up in a separate comment.

Actual results:
`docker ps` appears to hang - notably, all Docker commands from the client continue to hang.

Expected results:
`docker` client commands should not hang.

Comment 3 Robb Manes 2019-02-07 19:23:45 UTC
I checked the client and it appears to be waiting on the daemon, as we'd expect:

# dlv attach 1480

(dlv) goroutines
  Goroutine 1 - User: /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp/ctxhttp.go:55 github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp.Do (0x7f05df)
  Goroutine 2 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 3 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 4 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 5 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/sigqueue.go:139 os/signal.signal_recv (0x447db6)
  Goroutine 6 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 9 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/lock_futex.go:227 runtime.notetsleepg (0x417832)
  Goroutine 10 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/netpoll.go:173 internal/poll.runtime_pollWait (0x42e4a7)
  Goroutine 11 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/transport.go:1822 net/http.(*persistConn).writeLoop (0x71e92b)
  Goroutine 18 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x433b0a)
  Goroutine 21 - User: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/transport.go:2033 net/http.(*persistConn).roundTrip (0x71f3b7)
[11 goroutines]

(dlv) goroutine 1

(dlv) bt
 0  0x0000000000433b0a in runtime.gopark
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292
 1  0x0000000000442fe0 in runtime.selectgo
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/select.go:392
 2  0x00000000007f05df in github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp.Do
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp/ctxhttp.go:55
 3  0x000000000080a08e in github.com/docker/docker/client.(*Client).doRequest
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/client/request.go:127
 4  0x0000000000809fb1 in github.com/docker/docker/client.(*Client).sendRequest
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/client/request.go:121
 5  0x00000000008093cb in github.com/docker/docker/client.(*Client).get
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/client/request.go:36
 6  0x00000000007f6db2 in github.com/docker/docker/client.(*Client).ContainerList
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/client/container_list.go:47
 7  0x000000000099e129 in github.com/docker/docker/cli/command/container.runPs
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/cli/command/container/list.go:121
 8  0x00000000009ab2e3 in github.com/docker/docker/cli/command/container.NewPsCommand.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/cli/command/container/list.go:37
 9  0x0000000000589d30 in github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).execute
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:646
10  0x000000000058a484 in github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:742
11  0x000000000058a17b in github.com/docker/docker/vendor/github.com/spf13/cobra.(*Command).Execute
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/spf13/cobra/command.go:695
12  0x0000000000a3b933 in main.main
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/cmd/docker/docker.go:104
13  0x00000000004336b2 in runtime.main
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:198
14  0x000000000045e951 in runtime.goexit
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/asm_amd64.s:2361

We're sitting in the select, waiting for one of the cases to fire; the result channel is fed by a goroutine:

// Do sends an HTTP request with the provided http.Client and returns an HTTP response.
// If the client is nil, http.DefaultClient is used.
// If the context is canceled or times out, ctx.Err() will be returned.
func Do(ctx context.Context, client *http.Client, req *http.Request) (*http.Response, error) {
- - - - 8< - - - -
	go func() {
		resp, err := client.Do(req)
		testHookDoReturned()
		result <- responseAndError{resp, err}
	}()

	var resp *http.Response

	select {
	case <-ctx.Done():
- - - - 8< - - - -
	case r := <-result:
- - - - 8< - - - -

(dlv) frame 2
> runtime.gopark() /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 (PC: 0x433b0a)
Warning: debugging optimized function
Frame 2: /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/golang.org/x/net/context/ctxhttp/ctxhttp.go:55 (PC: 7f05df)
    50:			result <- responseAndError{resp, err}
    51:		}()
    52:	
    53:		var resp *http.Response
    54:	
=>  55:		select {
    56:		case <-ctx.Done():
    57:			testHookContextDoneBeforeHeaders()
    58:			cancel()
    59:			// Clean up after the goroutine calling client.Do:
    60:			go func() {

Taking a look at the daemon now.

Comment 4 Robb Manes 2019-02-07 19:28:13 UTC
(In reply to Robb Manes from comment #0)
> Actual results:
> `docker ps` appears to hang - notably, all Docker commands from the client
> continue to hang.
> 
> Expected results:
> `docker` client commands should not hang.

This is in error, other commands work fine, just not `docker ps`.

Comment 6 Robb Manes 2019-02-07 23:16:32 UTC
I think we're blocking here during the kill operation, on the daemon side of the docker client connection:

(dlv) goroutine
Thread 18486 at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/syscall/asm_linux_amd64.s:53
Goroutine 673:
	Runtime: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 runtime.gopark (0x4408fa)
	User: /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:239 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).Header (0x7e89a8)
	Go: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:2795 net/http.(*Server).Serve (0x73a74b)
	Start: /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:1719 net/http.(*conn).serve (0x735eb0)

(dlv) bt
 0  0x00000000004408fa in runtime.gopark
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292
 1  0x00000000004506d0 in runtime.selectgo
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/select.go:392
 2  0x00000000007e89a8 in github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).Header
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:239
 3  0x000000000080b0a3 in github.com/docker/docker/vendor/google.golang.org/grpc.recvResponse
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:62
 4  0x000000000080c037 in github.com/docker/docker/vendor/google.golang.org/grpc.invoke
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:209
 5  0x000000000080b7ec in github.com/docker/docker/vendor/google.golang.org/grpc.Invoke
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:119
 6  0x00000000008bc992 in github.com/docker/docker/vendor/github.com/docker/containerd/api/grpc/types.(*aPIClient).Signal
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/docker/containerd/api/grpc/types/api.pb.go:1042
 7  0x00000000008ccda3 in github.com/docker/docker/libcontainerd.(*client).Signal
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/libcontainerd/client_unix.go:93
 8  0x000000000120607b in github.com/docker/docker/daemon.(*Daemon).kill
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/kill.go:163
 9  0x000000000120579d in github.com/docker/docker/daemon.(*Daemon).killWithSignal
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/kill.go:94
10  0x0000000001205ef6 in github.com/docker/docker/daemon.(*Daemon).killPossiblyDeadProcess
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/kill.go:153
11  0x0000000001223d9d in github.com/docker/docker/daemon.(*Daemon).containerStop
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/stop.go:52
12  0x00000000011ec03b in github.com/docker/docker/daemon.(*Daemon).cleanupContainer
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/delete.go:95
13  0x00000000011eb699 in github.com/docker/docker/daemon.(*Daemon).ContainerRm
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/delete.go:46
14  0x000000000125a2a8 in github.com/docker/docker/api/server/router/container.(*containerRouter).deleteContainers
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/router/container/container_routes.go:398
15  0x000000000125f049 in github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.deleteContainers)-fm
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/router/container/container.go:75
16  0x0000000001238c58 in github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/middleware/experimental.go:27
17  0x0000000001238fe8 in github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/middleware/version.go:47
18  0x00000000013eab8e in github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/pkg/authorization/middleware.go:69
19  0x0000000001237f9d in github.com/docker/docker/api/server/middleware.AuditMiddleware.WrapHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/middleware/audit_linux.go:27
20  0x0000000001249e45 in github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/server.go:146
21  0x00000000007374e4 in net/http.HandlerFunc.ServeHTTP
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:1947
22  0x0000000000760716 in github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
23  0x00000000012487b0 in github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP
    at /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/api/server/router_swapper.go:29
24  0x000000000073a2ac in net/http.serverHandler.ServeHTTP
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:2694
25  0x0000000000736501 in net/http.(*conn).serve
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/net/http/server.go:1830
26  0x000000000046e231 in runtime.goexit
    at /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/asm_amd64.s:2361

(dlv) frame 10
> runtime.gopark() /opt/rh/go-toolset-1.10./usr/lib/go-toolset-1.10-golang/src/runtime/proc.go:292 (PC: 0x4408fa)
Warning: debugging optimized function
Frame 10: /usr/src/debug/docker-07f3374cf1b29aadf824d8b8dcbf6e63403689c6/_build/src/github.com/docker/docker/daemon/kill.go:153 (PC: 1205ef6)
   148:		return nil
   149:	}
   150:	
   151:	// killPossibleDeadProcess is a wrapper around killSig() suppressing "no such process" error.
   152:	func (daemon *Daemon) killPossiblyDeadProcess(container *container.Container, sig int) error {
=> 153:		err := daemon.killWithSignal(container, sig)
   154:		if err == syscall.ESRCH {
   155:			e := errNoSuchProcess{container.GetPID(), sig}
   156:			logrus.Debug(e)
   157:			return e
   158:		}

The container we try to kill:

(dlv) p container
*github.com/docker/docker/container.Container {
	CommonContainer: github.com/docker/docker/container.CommonContainer {
		StreamConfig: *(*github.com/docker/docker/container/stream.Config)(0xc42008e080),
		State: *(*github.com/docker/docker/container.State)(0xc420444cb0),
		Root: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+27 more",
		BaseFS: "/var/lib/docker/devicemapper/mnt/347d19aca36709179640b166bffb562...+40 more",
		RWLayer: github.com/docker/docker/layer.RWLayer(*github.com/docker/docker/layer.referencedRWLayer) ...,
		ID: "aa0b529e662e16771dc2590288b4360902999ef7c5c17e426a7ba5fa61cf8315",
		Created: (*time.Time)(0xc42036c250),
		Managed: false,
		Path: "kolla_start",
		Args: []string len: 0, cap: 0, [],
		Config: *(*github.com/docker/docker/api/types/container.Config)(0xc4200f8000),
		ImageID: "sha256:e60c79449957d047adc0a4d0b617dd55585b9126f598d36dd5131e5c4...+7 more",
		NetworkSettings: *(*github.com/docker/docker/daemon/network.Settings)(0xc4201aa640),
		LogPath: "",
		Name: "/agitated_joliot",
		Driver: "devicemapper",
		MountLabel: "system_u:object_r:svirt_sandbox_file_t:s0:c532,c827",
		ProcessLabel: "system_u:system_r:svirt_lxc_net_t:s0:c532,c827",
		RestartCount: 0,
		HasBeenStartedBefore: true,
		HasBeenManuallyStopped: true,
		MountPoints: map[string]*github.com/docker/docker/volume.MountPoint [...],
		HostConfig: *(*github.com/docker/docker/api/types/container.HostConfig)(0xc420027400),
		ExecCommands: *(*github.com/docker/docker/daemon/exec.Store)(0xc4208ab0c0),
		SecretStore: github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/exec.SecretGetter nil,
		SecretReferences: []*github.com/docker/docker/api/types/swarm.SecretReference len: 0, cap: 0, nil,
		LogDriver: github.com/docker/docker/daemon/logger.Logger(*github.com/docker/docker/daemon/logger/journald.journald) ...,
		LogCopier: *(*github.com/docker/docker/daemon/logger.Copier)(0xc420024540),
		restartManager: github.com/docker/docker/restartmanager.RestartManager(*github.com/docker/docker/restartmanager.restartManager) ...,
		attachContext: *(*github.com/docker/docker/container.attachContext)(0xc4208ab120),},
	AppArmorProfile: "",
	HostnamePath: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+36 more",
	HostsPath: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+33 more",
	ShmPath: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+31 more",
	ResolvConfPath: "/var/lib/docker/containers/aa0b529e662e16771dc2590288b4360902999...+39 more",
	SeccompProfile: "",
	NoNewPrivileges: false,}

Pretty sure this is just calling out to another process, however, so I'm checking which ones we need to look at.

Comment 8 Robb Manes 2019-02-08 00:13:12 UTC
Uhh... reliably, if I attach delve to /usr/bin/docker-containerd-current:

# ps aux | grep docker
root     14213  0.0  0.2  70936  8352 ?        Ssl  18:30   0:00 /usr/libexec/docker/rhel-push-plugin
root     18758  1.0  0.9 893444 35024 ?        Ssl  19:00   0:00 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver=journald --signature-verification=false --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --insecure-registry rhos-qe-mirror-qeos.usersys.redhat.com:5000 --add-registry registry.access.redhat.com --add-registry docker.io --add-registry registry.fedoraproject.org --add-registry quay.io --add-registry registry.centos.org
root     18766  0.5  0.2 238208 10600 ?        Ssl  19:00   0:00 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc --runtime-args --systemd-cgroup=true
root     18858  0.4  0.2 156800 10680 pts/0    Sl+  19:00   0:00 /usr/bin/docker-current rm -f 65469d55be21 4f6b2d14b291
root     18922  1.2  0.2  84124 10688 pts/1    Sl   19:00   0:00 /usr/bin/docker-current ps
root     18929  0.0  0.0 112712   980 pts/1    S+   19:00   0:00 grep --color=auto docker

# dlv attach 18766

My hung docker rm returns:

# docker rm -f 65469d55be21 4f6b2d14b291
Error response from daemon: Cannot kill container 65469d55be21607d8c02d2eaf99a83324e07b8c04fb53e375bb9509aa991d23d: rpc error: code = 14 desc = grpc: the connection is unavailable
Error response from daemon: Cannot kill container 4f6b2d14b291d4765e8fbd5f369f3b169e87293d697e1324107b2786abe9290c: rpc error: code = 14 desc = grpc: the connection is unavailable

And docker ps, which I had running in the background, returns:

CONTAINER ID        IMAGE                                                                    COMMAND             CREATED             STATUS              PORTS               NAMES
65469d55be21        rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-keystone   "kolla_start"       3 minutes ago       Up 3 minutes                            vibrant_minsky
4f6b2d14b291        rhos-qe-mirror-qeos.usersys.redhat.com:5000/rhosp13/openstack-horizon    "kolla_start"       3 minutes ago       Up 3 minutes                            eager_bhaskara

I've done this four times now, spending significant time debugging the docker client and the dockerd-current process; it only happens when I attach delve to the docker-containerd-current task.

GDB does it too:

# gdb -p 6407

Returns immediately, like it just unblocked:

Error response from daemon: Cannot kill container b50037abc5e1e03d1cb2bc3fcbb4a42c8642de6b254bbe548e5d5183aead162c: rpc error: code = 14 desc = grpc: the connection is unavailable
Error response from daemon: Cannot kill container a630d325228f5c5d5dc58673bae80c076160bf1525ab8d758ace5650b7716f41: rpc error: code = 14 desc = grpc: the connection is unavailable

Comment 10 Robb Manes 2019-02-08 01:28:11 UTC
I can't reproduce this on .88:

# rpm -qa | grep docker
docker-1.13.1-88.git07f3374.el7.x86_64
docker-rhel-push-plugin-1.13.1-88.git07f3374.el7.x86_64
docker-client-1.13.1-88.git07f3374.el7.x86_64
docker-common-1.13.1-88.git07f3374.el7.x86_64

The containerd commits between these versions:

$ git log 923a3878..b9680343 --oneline
b968034 use epoll_ctl(EPOLL_CTL_DEL) to remove oom fd from epoll
b54277a Set EFD_CLOEXEC for memory event fd.
3f92835 fix redundant if ... error to make ci happy
6e9b04b runtime: allocate cmdDoneCh on loadProcess

Of these, it looks like b968034 might be what blocks us in this scenario, but I'm still trying to verify...

Comment 11 Antonio Murdaca 2019-02-08 17:39:23 UTC
We've identified the broken commit that introduced this regression and reverted it: https://github.com/projectatomic/containerd/commit/7989550b83317f799af20ab4df3a5b6487767fc9

Comment 15 Robb Manes 2019-02-14 22:46:20 UTC

*** This bug has been marked as a duplicate of bug 1671861 ***

