Bug 1916383 - Crictl rm timeout
Summary: Crictl rm timeout
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: sandboxed-containers
Version: 4.7
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Ariel Adam
QA Contact: Cameron Meadors
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-01-14 16:43 UTC by Qian Cai
Modified: 2023-03-09 01:00 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-09 01:00:56 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Qian Cai 2021-01-14 16:43:25 UTC
Description of problem:
After changed the kata container's resolv.conf and using ping, "crictl rm" will hang.

# crictl rm 3a35a3ad68940
ERRO[0130] removing container "3a35a3ad68940" failed: rpc error: code = Unknown desc = unable to stop container 3a35a3ad68940484a255a808d45a2d4113e43d36761be7478d293844fd14df6d: failed to stop container 3a35a3ad68940484a255a808d45a2d4113e43d36761be7478d293844fd14df6d: StopContainer with signal killed timed out after (2m0s) 
FATA[0130] unable to remove container(s)

Steps:

# cat pod-config.json
{
    "metadata": {
        "name": "alpine-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83djaidwnduwk28bcsb"
    },
    "log_directory": "/tmp",
    "linux": {
    }
}

# cat container-pod.json
{
  "metadata": {
      "name": "alpine"
  },
  "image":{
      "image": "alpine"
  },
  "command": [
      "sleep", "3600"
  ],
  "log_path":"alpine.0.log",
  "linux": {
	"security_context": {
		"capabilities": {
			"add_capabilities": [
				"net_raw"
			]
		}
	}
  }
}

# crictl runp --runtime=kata pod-config.json 
d1993b6a0479657f928b3b7f9d427ec2e8cea1a1898f2e1be993081e7eb43130

# crictl create d1993b6a0479657f928b3b7f9d427ec2e8cea1a1898f2e1be993081e7eb43130 container-pod.json pod-config.json 
30e947cd0a85eab9634cef2975acd18d3e98367eeac1826506cea9d6d0bb784b

# crictl exec -it 30e947cd0a85eab9634cef2975acd18d3e98367eeac1826506cea9d6d0bb784b sh

# Inside the container, changed the resolv.conf to use:
nameserver 10.19.42.41
nameserver 10.11.5.19
nameserver 10.5.30.160

# ping www.google.com
PING www.google.com (74.125.68.105): 56 data bytes
64 bytes from 74.125.68.105: seq=0 ttl=100 time=255.879 ms
64 bytes from 74.125.68.105: seq=1 ttl=100 time=256.039 ms
...

# exit

# crictl rm 3a35a3ad68940

Version-Release number of selected component (if applicable):
kata-agent-1.11.3-1.el8.x86_64
kata-shim-1.11.3-1.el8.x86_64
kata-osbuilder-1.11.3-1.el8.x86_64
kata-runtime-1.11.3-1.el8.x86_64
cri-o-1.20.0-0.rhaos4.7.git845747f.el8.40.x86_64
cri-tools-1.20.0-1.el8.x86_64

How reproducible:
always

Comment 1 Qian Cai 2021-01-14 21:33:27 UTC
It looks like it can be reproduced just running circtl exec/rm iteration a few times without needing ping or /etc/resolv.conf modification.

# crictl runp --runtime=kata pod-config.json 
72d24ec1c4899df48e709b099e933393af4a8753b087a6d230ad32753f27f6ce

# crictl create 72d24ec1c4899df48e709b099e933393af4a8753b087a6d230ad32753f27f6ce container-pod1.json pod-config.json
864d956bcef03d4a42b9a7ee0439f180a670231be3d16e274b81d6048f831158

# crictl exec -it 864d956bcef03d4a42b9a7ee0439f180a670231be3d16e274b81d6048f831158 sh
/ # exit

# crictl rm 864d956bcef03d4a42b9a7ee0439f180a670231be3d16e274b81d6048f831158
ERRO[0130] removing container "864d956bcef03d4a42b9a7ee0439f180a670231be3d16e274b81d6048f831158" failed: rpc error: code = Unknown desc = unable to stop container 864d956bcef03d4a42b9a7ee0439f180a670231be3d16e274b81d6048f831158: failed to stop container 864d956bcef03d4a42b9a7ee0439f180a670231be3d16e274b81d6048f831158: StopContainer with signal killed timed out after (2m0s) 
FATA[0130] unable to remove container(s)

Comment 2 Qian Cai 2021-01-22 14:00:57 UTC
Don't need exec to reproduce it at all.

Comment 3 Qian Cai 2021-01-22 15:12:51 UTC
RRO[2021-01-22 10:10:47.602109141-05:00] StopContainer with signal killed timed out after (2m0s)  file="oci/runtime_vm.go:550"
DEBU[2021-01-22 10:10:47.602219937-05:00] RuntimeVM.StopContainer() end                 file="oci/runtime_vm.go:551"
DEBU[2021-01-22 10:10:47.602571312-05:00] Response error: StopContainer with signal killed timed out after (2m0s)
github.com/cri-o/cri-o/internal/oci.(*runtimeVM).waitCtrTerminate
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/oci/runtime_vm.go:563
github.com/cri-o/cri-o/internal/oci.(*runtimeVM).StopContainer
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/oci/runtime_vm.go:549
github.com/cri-o/cri-o/internal/oci.(*Runtime).StopContainer
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/oci/oci.go:344
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).ContainerStop
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/lib/stop.go:19
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).Remove
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/lib/remove.go:26
github.com/cri-o/cri-o/server.(*Server).RemoveContainer
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/server/container_remove.go:21
github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8038
github.com/cri-o/cri-o/internal/log.UnaryInterceptor.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/log/interceptors.go:56
github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/cri-o/cri-o/server/metrics.UnaryInterceptor.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/server/metrics/interceptors.go:24
github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8040
github.com/cri-o/cri-o/vendor/google.golang.org/grpc.(*Server).processUnaryRPC
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1024
github.com/cri-o/cri-o/vendor/google.golang.org/grpc.(*Server).handleStream
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1313
github.com/cri-o/cri-o/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:722
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1374
failed to stop container bf85a2555b85dc12ba769ebdd3e8522614321e4f4d74165a0586f8e2185840e8
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).ContainerStop
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/lib/stop.go:25
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).Remove
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/lib/remove.go:26
github.com/cri-o/cri-o/server.(*Server).RemoveContainer
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/server/container_remove.go:21
github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8038
github.com/cri-o/cri-o/internal/log.UnaryInterceptor.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/log/interceptors.go:56
github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/cri-o/cri-o/server/metrics.UnaryInterceptor.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/server/metrics/interceptors.go:24
github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8040
github.com/cri-o/cri-o/vendor/google.golang.org/grpc.(*Server).processUnaryRPC
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1024
github.com/cri-o/cri-o/vendor/google.golang.org/grpc.(*Server).handleStream
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1313
github.com/cri-o/cri-o/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:722
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1374
unable to stop container bf85a2555b85dc12ba769ebdd3e8522614321e4f4d74165a0586f8e2185840e8
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).Remove
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/lib/remove.go:28
github.com/cri-o/cri-o/server.(*Server).RemoveContainer
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/server/container_remove.go:21
github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8038
github.com/cri-o/cri-o/internal/log.UnaryInterceptor.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/internal/log/interceptors.go:56
github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/cri-o/cri-o/server/metrics.UnaryInterceptor.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/server/metrics/interceptors.go:24
github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8040
github.com/cri-o/cri-o/vendor/google.golang.org/grpc.(*Server).processUnaryRPC
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1024
github.com/cri-o/cri-o/vendor/google.golang.org/grpc.(*Server).handleStream
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1313
github.com/cri-o/cri-o/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1
	/builddir/build/BUILD/cri-o-a1ab08ae436218d48f69641edafcf3aa35991c08/_output/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:722
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1374  file="go-grpc-middleware/chain.go:25" id=de84be29-a9ad-433b-9c06-ff59e34bcfc7 name=/runtime.v1alpha2.RuntimeService/RemoveContainer

Comment 4 Francesco Giudici 2021-01-22 16:40:15 UTC
Looked into a VM reproducing the issue.
Only adding a note here: this happens only if the container is not started (crictl start $CNT_ID).
If the container is started and stopped, it can be removed without issues.
My guess is that the container status is not completely initialized if it is not started.

Wondering if the issue is still present in a more recent kata containers version. Will check.

Comment 5 Qian Cai 2021-01-22 21:52:45 UTC
Tested on the today's master kata v2.0 with the lastest cri-o/crictl/cni. Failed in a bit different way.

# /usr/local/bin/crictl rm f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65
ERRO[0010] removing container "f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65" failed: rpc error: code = Unknown desc = unable to stop container f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65: failed to stop container f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65: ESRCH: No such process: unknown 
FATA[0010] unable to remove container(s)

time="2021-01-22T16:44:52.765058823-05:00" level=error msg="failed to create listener" error="\"/containerd-shim/default/d776cbbd43bbd8919db18ddc4e01baed717d026cf13d534564529b8b93164297/shim-monitor.sock\": unix socket path too long (> 106)" source=containerd-kata-shim-v2 subsystem=shim-management
time="2021-01-22T16:44:52.836240895-05:00" level=error msg="post event" error="failed to publish event: exit status 1" pid=22835 sandbox=d776cbbd43bbd8919db18ddc4e01baed717d026cf13d534564529b8b93164297 source=containerd-kata-shim-v2
time="2021-01-22T16:44:52.89471994-05:00" level=error msg="post event" error="failed to publish event: exit status 1" pid=22835 sandbox=d776cbbd43bbd8919db18ddc4e01baed717d026cf13d534564529b8b93164297 source=containerd-kata-shim-v2
time="2021-01-22T16:45:20.619967258-05:00" level=error msg="post event" error="failed to publish event: exit status 1" pid=22835 sandbox=d776cbbd43bbd8919db18ddc4e01baed717d026cf13d534564529b8b93164297 source=containerd-kata-shim-v2

DEBU[2021-01-22 16:45:20.441476269-05:00] /etc/system-fips does not exist on host, not mounting FIPS mode secret  file="secrets/secrets.go:203"
DEBU[2021-01-22 16:45:20.442169947-05:00] Oci.newRuntimeVM() start                      file="oci/runtime_vm.go:57"
DEBU[2021-01-22 16:45:20.442238859-05:00] Oci.newRuntimeVM() end                        file="oci/runtime_vm.go:72"
DEBU[2021-01-22 16:45:20.442271690-05:00] RuntimeVM.CreateContainer() start             file="oci/runtime_vm.go:81"
DEBU[2021-01-22 16:45:20.442299207-05:00] RuntimeVM.startRuntimeDaemon() start          file="oci/runtime_vm.go:184"
DEBU[2021-01-22 16:45:20.489011475-05:00] RuntimeVM.startRuntimeDaemon() end            file="oci/runtime_vm.go:253"
DEBU[2021-01-22 16:45:20.489397007-05:00] Start writing stream "stdout" to log file "/tmp/ubi8.0.log"  file="io/logger.go:54"
DEBU[2021-01-22 16:45:20.489450112-05:00] Start writing stream "stderr" to log file "/tmp/ubi8.0.log"  file="io/logger.go:54"
DEBU[2021-01-22 16:45:20.554329329-05:00] RuntimeVM.CreateContainer() end               file="oci/runtime_vm.go:180"
DEBU[2021-01-22 16:45:20.554410998-05:00] RuntimeVM.UpdateContainerStatus() start       file="oci/runtime_vm.go:612"
DEBU[2021-01-22 16:45:20.554442871-05:00] RuntimeVM.updateContainerStatus() start       file="oci/runtime_vm.go:626"
DEBU[2021-01-22 16:45:20.555526093-05:00] RuntimeVM.updateContainerStatus() end         file="oci/runtime_vm.go:669"
DEBU[2021-01-22 16:45:20.555573142-05:00] RuntimeVM.UpdateContainerStatus() end         file="oci/runtime_vm.go:619"
INFO[2021-01-22 16:45:20.568761093-05:00] Created container f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65: //  file="server/container_create.go:562" id=02d92bea-76ce-4498-90d7-2c5b87037c1b name=/runtime.v1alpha2.RuntimeService/CreateContainer
DEBU[2021-01-22 16:45:20.568852883-05:00] Response: &CreateContainerResponse{ContainerId:f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65,}  file="go-grpc-middleware/chain.go:25" id=02d92bea-76ce-4498-90d7-2c5b87037c1b name=/runtime.v1alpha2.RuntimeService/CreateContainer
time="2021-01-22T16:45:20.619967258-05:00" level=error msg="post event" error="failed to publish event: exit status 1" pid=22835 sandbox=d776cbbd43bbd8919db18ddc4e01baed717d026cf13d534564529b8b93164297 source=containerd-kata-shim-v2
DEBU[2021-01-22 16:45:48.054957389-05:00] Request: &ContainerStatusRequest{ContainerId:f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65,Verbose:false,}  file="go-grpc-middleware/chain.go:25" id=3680351c-5edf-4354-85b7-8aa252d9c769 name=/runtime.v1alpha2.RuntimeService/ContainerStatus
DEBU[2021-01-22 16:45:48.055262731-05:00] Response: &ContainerStatusResponse{Status:&ContainerStatus{Id:f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65,Metadata:&ContainerMetadata{Name:ubi8,Attempt:0,},State:CONTAINER_CREATED,CreatedAt:1611351920435995350,StartedAt:0,FinishedAt:0,ExitCode:0,Image:&ImageSpec{Image:registry.access.redhat.com/ubi8:latest,Annotations:map[string]string{},},ImageRef:registry.access.redhat.com/ubi8@sha256:82e0fbbf1f3e223550aefbc28f44dc6b04967fe25788520eac910ac8281cec9e,Reason:,Message:,Labels:map[string]string{},Annotations:map[string]string{},Mounts:[]*Mount{},LogPath:/tmp/ubi8.0.log,},Info:map[string]string{},}  file="go-grpc-middleware/chain.go:25" id=3680351c-5edf-4354-85b7-8aa252d9c769 name=/runtime.v1alpha2.RuntimeService/ContainerStatus
DEBU[2021-01-22 16:45:48.056006136-05:00] Request: &RemoveContainerRequest{ContainerId:f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65,}  file="go-grpc-middleware/chain.go:25" id=31338a72-b334-4407-a6e6-0072e01868ad name=/runtime.v1alpha2.RuntimeService/RemoveContainer
INFO[2021-01-22 16:45:48.056074795-05:00] Removing container: f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65  file="server/container_remove.go:14" id=31338a72-b334-4407-a6e6-0072e01868ad name=/runtime.v1alpha2.RuntimeService/RemoveContainer
DEBU[2021-01-22 16:45:48.056130711-05:00] RuntimeVM.StopContainer() start               file="oci/runtime_vm.go:497"
WARN[2021-01-22 16:45:58.058195667-05:00] StopContainer with signal terminated timed out after (10s)  file="oci/runtime_vm.go:540"
DEBU[2021-01-22 16:45:58.059207072-05:00] RuntimeVM.StopContainer() end                 file="oci/runtime_vm.go:546"
DEBU[2021-01-22 16:45:58.059524183-05:00] Response error: unknown
github.com/cri-o/cri-o/utils/errdefs.init
	/root/gosrc/src/github.com/cri-o/cri-o/utils/errdefs/errors.go:38
runtime.doInit
	/usr/lib/golang/src/runtime/proc.go:5474
runtime.doInit
	/usr/lib/golang/src/runtime/proc.go:5469
runtime.doInit
	/usr/lib/golang/src/runtime/proc.go:5469
runtime.doInit
	/usr/lib/golang/src/runtime/proc.go:5469
runtime.doInit
	/usr/lib/golang/src/runtime/proc.go:5469
runtime.main
	/usr/lib/golang/src/runtime/proc.go:190
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1373
ESRCH: No such process
github.com/cri-o/cri-o/utils/errdefs.FromGRPC
	/root/gosrc/src/github.com/cri-o/cri-o/utils/errdefs/grpc.go:98
github.com/cri-o/cri-o/internal/oci.(*runtimeVM).kill
	/root/gosrc/src/github.com/cri-o/cri-o/internal/oci/runtime_vm.go:840
github.com/cri-o/cri-o/internal/oci.(*runtimeVM).StopContainer
	/root/gosrc/src/github.com/cri-o/cri-o/internal/oci/runtime_vm.go:545
github.com/cri-o/cri-o/internal/oci.(*Runtime).StopContainer
	/root/gosrc/src/github.com/cri-o/cri-o/internal/oci/oci.go:344
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).ContainerStop
	/root/gosrc/src/github.com/cri-o/cri-o/internal/lib/stop.go:19
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).Remove
	/root/gosrc/src/github.com/cri-o/cri-o/internal/lib/remove.go:26
github.com/cri-o/cri-o/server.(*Server).RemoveContainer
	/root/gosrc/src/github.com/cri-o/cri-o/server/container_remove.go:21
github.com/cri-o/cri-o/server/cri/v1alpha2.(*service).RemoveContainer
	/root/gosrc/src/github.com/cri-o/cri-o/server/cri/v1alpha2/rpc_remove_container.go:16
k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler.func1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8038
github.com/cri-o/cri-o/internal/log.UnaryInterceptor.func1
	/root/gosrc/src/github.com/cri-o/cri-o/internal/log/interceptors.go:56
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/cri-o/cri-o/server/metrics.UnaryInterceptor.func1
	/root/gosrc/src/github.com/cri-o/cri-o/server/metrics/interceptors.go:24
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8040
google.golang.org/grpc.(*Server).processUnaryRPC
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1024
google.golang.org/grpc.(*Server).handleStream
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1313
google.golang.org/grpc.(*Server).serveStreams.func1.1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:722
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1373
failed to stop container f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).ContainerStop
	/root/gosrc/src/github.com/cri-o/cri-o/internal/lib/stop.go:25
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).Remove
	/root/gosrc/src/github.com/cri-o/cri-o/internal/lib/remove.go:26
github.com/cri-o/cri-o/server.(*Server).RemoveContainer
	/root/gosrc/src/github.com/cri-o/cri-o/server/container_remove.go:21
github.com/cri-o/cri-o/server/cri/v1alpha2.(*service).RemoveContainer
	/root/gosrc/src/github.com/cri-o/cri-o/server/cri/v1alpha2/rpc_remove_container.go:16
k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler.func1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8038
github.com/cri-o/cri-o/internal/log.UnaryInterceptor.func1
	/root/gosrc/src/github.com/cri-o/cri-o/internal/log/interceptors.go:56
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/cri-o/cri-o/server/metrics.UnaryInterceptor.func1
	/root/gosrc/src/github.com/cri-o/cri-o/server/metrics/interceptors.go:24
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8040
google.golang.org/grpc.(*Server).processUnaryRPC
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1024
google.golang.org/grpc.(*Server).handleStream
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1313
google.golang.org/grpc.(*Server).serveStreams.func1.1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:722
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1373
unable to stop container f56a22bbda449cd7a941d37388aedc04c46d197befacbb25135716ecd4dbfe65
github.com/cri-o/cri-o/internal/lib.(*ContainerServer).Remove
	/root/gosrc/src/github.com/cri-o/cri-o/internal/lib/remove.go:28
github.com/cri-o/cri-o/server.(*Server).RemoveContainer
	/root/gosrc/src/github.com/cri-o/cri-o/server/container_remove.go:21
github.com/cri-o/cri-o/server/cri/v1alpha2.(*service).RemoveContainer
	/root/gosrc/src/github.com/cri-o/cri-o/server/cri/v1alpha2/rpc_remove_container.go:16
k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler.func1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8038
github.com/cri-o/cri-o/internal/log.UnaryInterceptor.func1
	/root/gosrc/src/github.com/cri-o/cri-o/internal/log/interceptors.go:56
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/cri-o/cri-o/server/metrics.UnaryInterceptor.func1
	/root/gosrc/src/github.com/cri-o/cri-o/server/metrics/interceptors.go:24
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_RemoveContainer_Handler
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:8040
google.golang.org/grpc.(*Server).processUnaryRPC
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1024
google.golang.org/grpc.(*Server).handleStream
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:1313
google.golang.org/grpc.(*Server).serveStreams.func1.1
	/root/gosrc/src/github.com/cri-o/cri-o/vendor/google.golang.org/grpc/server.go:722
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1373  file="go-grpc-middleware/chain.go:25" id=31338a72-b334-4407-a6e6-0072e01868ad name=/runtime.v1alpha2.RuntimeService/RemoveContainer

Comment 7 Qian Cai 2021-02-24 20:27:22 UTC
Okay, I tracked down to https://github.com/kata-containers/kata-containers/blob/master/src/runtime/virtcontainers/kata_agent.go#L1585

If the container was created but not started yet, SIGKILL will return ESRCH, and the created container running those processes inside the VM (our container entry point sleep was not running yet. See the container-pod.json in comment #0).

# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0    968     4 ?        S    19:48   0:00 /pause
root           2  0.0  0.1   6884  2220 ?        S    19:48   0:00 /usr/bin/kata-agent init

If the container was started, we have those processes inside.

# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0    968     4 ?        S    19:48   0:00 /pause
root           2  0.0  0.0  23024   180 ?        S    19:48   0:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 3600

So my assumption is that StopContainer() will try to kill the entry point (sleep) but can't find it in the created container? However, I am yet to understand in,

https://github.com/cri-o/cri-o/blob/master/internal/oci/runtime_vm.go#L529

The first SIGTERM was successful for the created container but not the following SIGKILL.

Comment 9 Qian Cai 2021-02-25 21:56:21 UTC
It looks like we are getting here into the agent code here:

https://github.com/kata-containers/kata-containers/blob/a12772c6010f6886be7af86cd8c1a03008c4bb6c/src/agent/src/rpc.rs#L313

This is gonna to be fun to learn rust!

Comment 10 Qian Cai 2021-02-25 22:11:04 UTC
I think the next step is to verify that if find_process() does really find the init process or not.

https://github.com/kata-containers/kata-containers/blob/a12772c6010f6886be7af86cd8c1a03008c4bb6c/src/agent/src/rpc.rs#L1329

Comment 12 Qian Cai 2021-03-04 20:31:34 UTC
I have to context switch to another task, so save some the states for now. Basically, "crictl runp" will start a VM and spawn a single kata-agent process. Then, "crictl create" will spawn another kata-agent. The first puzzle here is why it needs two kata-agent processes?

Later, "crictl rm" will send two signals SIGTERM and then SIGKILL to one of kata-agent processes using the specific PID. Since the SIGTERM will terminate the process and the following SIGKILL found the PID is not longer existed. Hence, ESRCH. Even running "crictl rm" multiple times later would still use that non-existing kata-agent PID. So, the second puzzle here why that get_process() [1] would never be updated to always get the *current* processes inside the VM?


[1] https://github.com/kata-containers/kata-containers/blob/a12772c6010f6886be7af86cd8c1a03008c4bb6c/src/agent/rustjail/src/container.rs#L785

Comment 13 Qian Cai 2021-03-04 20:36:13 UTC
The rust agent debugging is somewhat broken for now. To workaround it needs,

https://github.com/kata-containers/kata-containers/pull/1483
https://github.com/kata-containers/kata-containers/issues/1473#issuecomment-788939831

Comment 16 Shiftzilla 2023-03-09 01:00:56 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-8836


Note You need to log in before you can comment on or make changes to this bug.