Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1475242

Summary: Device busy - pod volumes not cleaned up and stuck in "Terminating" state
Product: OpenShift Container Platform Reporter: Nicolas Nosenzo <nnosenzo>
Component: ContainersAssignee: Jhon Honce <jhonce>
Status: CLOSED ERRATA QA Contact: DeShuai Ma <dma>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.4.1CC: aos-bugs, jokerman, mmccomas, pdwyer
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 22:05:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nicolas Nosenzo 2017-07-26 09:28:11 UTC
Description of problem:

Occasionally pod volumes cannot be cleaned up, showing below message within docker logs:

Jul 25 06:20:43 worker-node.example oci-register-machine[13004]: 2017/07/25 06:20:43 Register machine: prestart f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a 12979 /var/lib/docker/devicemapper/mnt/4d50741a4fc69810c767eb397d504375f1daf5288fdf9482c4a4a8f94bae7e2d/rootfs
Jul 25 06:37:38 worker-node.example oci-register-machine[46422]: 2017/07/25 06:37:38 Register machine: poststop f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a 0 /var/lib/docker/devicemapper/mnt/4d50741a4fc69810c767eb397d504375f1daf5288fdf9482c4a4a8f94bae7e2d/rootfs
Jul 25 06:37:39 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:39.328451685Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop?t=2 returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:39 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:39.328487875Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.182012375Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop?t=30 returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.182071437Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.387715795Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a?v=1 returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.387760651Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"
Jul 25 06:38:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:38:40.584992060Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a?v=1 returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"
Jul 25 06:38:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:38:40.585057074Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"

$ cat docker_inspect_40b856db7232 | jq '.[0] | {"HostConfig.Binds": .HostConfig.Binds, GraphDriver: .GraphDriver, Mounts: .Mounts}'
{
  "HostConfig.Binds": [
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-secrets:/secrets:Z",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-client-secrets:/client-secrets:Z",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-token-myimq:/var/run/secrets/kubernetes.io/serviceaccount:ro,Z",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/etc-hosts:/etc/hosts",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/containers/hawkular-metrics/8b778dc7:/dev/termination-log"
  ],
  "GraphDriver": {
    "Name": "devicemapper",
    "Data": {
      "DeviceId": "73",
      "DeviceName": "docker-253:3-5505025-f488354b1435173b86c54c9c1e97c50e90c22b89981044ec2cf88e1f4da2aa83",
      "DeviceSize": "10737418240"
    }
  },
  "Mounts": [
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-client-secrets",
      "Destination": "/client-secrets",
      "Mode": "Z",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-token-myimq",
      "Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
      "Mode": "ro,Z",
      "RW": false,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/etc-hosts",
      "Destination": "/etc/hosts",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/containers/hawkular-metrics/8b778dc7",
      "Destination": "/dev/termination-log",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-secrets",
      "Destination": "/secrets",
      "Mode": "Z",
      "RW": true,
      "Propagation": "rprivate"
    }
  ]
}



Version-Release number of selected component (if applicable):

docker-1.12.6-28.git1398f24.el7.x86_64
atomic-openshift-3.4.1.37-1
kernel-3.10.0-514.21.2.el7.x86_64

How reproducible:
Intermittent on Customer env.  

Steps to Reproduce:
1. Deploy a pod.
2. Remove the pod.


Actual results:

dockerd-current[1135]: time="2017-07-25T06:37:40.387715795Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a?v=1 returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"


Expected results:

Pod get removed, showing a message like:
dockerd-current[]: time="" level=info msg="{Action=remove, LoginUID=..., PID=...}"



Additional info:

This topic was discussed on https://bugzilla.redhat.com/show_bug.cgi?id=1450554 which is already closed as CURRENTRELEASE

Comment 3 DeShuai Ma 2017-10-10 06:09:26 UTC
Verify on openshift v3.7.0-0.143.1
kubernetes v1.7.0+80709908fd
etcd 3.2.1
I don't see those error log after long opts in ocp env.

Comment 6 errata-xmlrpc 2017-11-28 22:05:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188