Bug 1475242 - Device busy - pod volumes not cleaned up and stuck in "Terminating" state
Summary: Device busy - pod volumes not cleaned up and stuck in "Terminating" state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.7.0
Assignee: Jhon Honce
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-07-26 09:28 UTC by Nicolas Nosenzo
Modified: 2020-09-10 11:02 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-28 22:05:56 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Nicolas Nosenzo 2017-07-26 09:28:11 UTC
Description of problem:

Occasionally pod volumes cannot be cleaned up, showing below message within docker logs:

Jul 25 06:20:43 worker-node.example oci-register-machine[13004]: 2017/07/25 06:20:43 Register machine: prestart f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a 12979 /var/lib/docker/devicemapper/mnt/4d50741a4fc69810c767eb397d504375f1daf5288fdf9482c4a4a8f94bae7e2d/rootfs
Jul 25 06:37:38 worker-node.example oci-register-machine[46422]: 2017/07/25 06:37:38 Register machine: poststop f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a 0 /var/lib/docker/devicemapper/mnt/4d50741a4fc69810c767eb397d504375f1daf5288fdf9482c4a4a8f94bae7e2d/rootfs
Jul 25 06:37:39 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:39.328451685Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop?t=2 returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:39 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:39.328487875Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.182012375Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop?t=30 returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.182071437Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.387715795Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a?v=1 returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.387760651Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"
Jul 25 06:38:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:38:40.584992060Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a?v=1 returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"
Jul 25 06:38:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:38:40.585057074Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"

$ cat docker_inspect_40b856db7232 | jq '.[0] | {"HostConfig.Binds": .HostConfig.Binds, GraphDriver: .GraphDriver, Mounts: .Mounts}'
{
  "HostConfig.Binds": [
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-secrets:/secrets:Z",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-client-secrets:/client-secrets:Z",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-token-myimq:/var/run/secrets/kubernetes.io/serviceaccount:ro,Z",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/etc-hosts:/etc/hosts",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/containers/hawkular-metrics/8b778dc7:/dev/termination-log"
  ],
  "GraphDriver": {
    "Name": "devicemapper",
    "Data": {
      "DeviceId": "73",
      "DeviceName": "docker-253:3-5505025-f488354b1435173b86c54c9c1e97c50e90c22b89981044ec2cf88e1f4da2aa83",
      "DeviceSize": "10737418240"
    }
  },
  "Mounts": [
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-client-secrets",
      "Destination": "/client-secrets",
      "Mode": "Z",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-token-myimq",
      "Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
      "Mode": "ro,Z",
      "RW": false,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/etc-hosts",
      "Destination": "/etc/hosts",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/containers/hawkular-metrics/8b778dc7",
      "Destination": "/dev/termination-log",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-secrets",
      "Destination": "/secrets",
      "Mode": "Z",
      "RW": true,
      "Propagation": "rprivate"
    }
  ]
}



Version-Release number of selected component (if applicable):

docker-1.12.6-28.git1398f24.el7.x86_64
atomic-openshift-3.4.1.37-1
kernel-3.10.0-514.21.2.el7.x86_64

How reproducible:
Intermittent on Customer env.  

Steps to Reproduce:
1. Deploy a pod.
2. Remove the pod.


Actual results:

dockerd-current[1135]: time="2017-07-25T06:37:40.387715795Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a?v=1 returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"


Expected results:

Pod get removed, showing a message like:
dockerd-current[]: time="" level=info msg="{Action=remove, LoginUID=..., PID=...}"



Additional info:

This topic was discussed on https://bugzilla.redhat.com/show_bug.cgi?id=1450554 which is already closed as CURRENTRELEASE

Comment 3 DeShuai Ma 2017-10-10 06:09:26 UTC
Verify on openshift v3.7.0-0.143.1
kubernetes v1.7.0+80709908fd
etcd 3.2.1
I don't see those error log after long opts in ocp env.

Comment 6 errata-xmlrpc 2017-11-28 22:05:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188


Note You need to log in before you can comment on or make changes to this bug.