This service will be undergoing maintenance at 00:00 UTC, 2017-10-23 It is expected to last about 30 minutes
Bug 1475242 - Device busy - pod volumes not cleaned up and stuck in "Terminating" state
Device busy - pod volumes not cleaned up and stuck in "Terminating" state
Status: VERIFIED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers (Show other bugs)
3.4.1
Unspecified Unspecified
unspecified Severity high
: ---
: 3.7.0
Assigned To: Jhon Honce
DeShuai Ma
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2017-07-26 05:28 EDT by Nicolas Nosenzo
Modified: 2017-10-10 02:09 EDT (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Nicolas Nosenzo 2017-07-26 05:28:11 EDT
Description of problem:

Occasionally pod volumes cannot be cleaned up, showing below message within docker logs:

Jul 25 06:20:43 worker-node.example oci-register-machine[13004]: 2017/07/25 06:20:43 Register machine: prestart f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a 12979 /var/lib/docker/devicemapper/mnt/4d50741a4fc69810c767eb397d504375f1daf5288fdf9482c4a4a8f94bae7e2d/rootfs
Jul 25 06:37:38 worker-node.example oci-register-machine[46422]: 2017/07/25 06:37:38 Register machine: poststop f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a 0 /var/lib/docker/devicemapper/mnt/4d50741a4fc69810c767eb397d504375f1daf5288fdf9482c4a4a8f94bae7e2d/rootfs
Jul 25 06:37:39 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:39.328451685Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop?t=2 returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:39 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:39.328487875Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.182012375Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop?t=30 returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.182071437Z" level=error msg="Handler for POST /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/stop returned error: Container f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a is already stopped"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.387715795Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a?v=1 returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"
Jul 25 06:37:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:37:40.387760651Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"
Jul 25 06:38:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:38:40.584992060Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a?v=1 returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"
Jul 25 06:38:40 worker-node.example dockerd-current[1135]: time="2017-07-25T06:38:40.585057074Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"

$ cat docker_inspect_40b856db7232 | jq '.[0] | {"HostConfig.Binds": .HostConfig.Binds, GraphDriver: .GraphDriver, Mounts: .Mounts}'
{
  "HostConfig.Binds": [
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-secrets:/secrets:Z",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-client-secrets:/client-secrets:Z",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-token-myimq:/var/run/secrets/kubernetes.io/serviceaccount:ro,Z",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/etc-hosts:/etc/hosts",
    "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/containers/hawkular-metrics/8b778dc7:/dev/termination-log"
  ],
  "GraphDriver": {
    "Name": "devicemapper",
    "Data": {
      "DeviceId": "73",
      "DeviceName": "docker-253:3-5505025-f488354b1435173b86c54c9c1e97c50e90c22b89981044ec2cf88e1f4da2aa83",
      "DeviceSize": "10737418240"
    }
  },
  "Mounts": [
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-client-secrets",
      "Destination": "/client-secrets",
      "Mode": "Z",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-token-myimq",
      "Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
      "Mode": "ro,Z",
      "RW": false,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/etc-hosts",
      "Destination": "/etc/hosts",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/containers/hawkular-metrics/8b778dc7",
      "Destination": "/dev/termination-log",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/origin/openshift.local.volumes/pods/ae111028-5be2-11e7-9d6e-fa163e6b6630/volumes/kubernetes.io~secret/hawkular-metrics-secrets",
      "Destination": "/secrets",
      "Mode": "Z",
      "RW": true,
      "Propagation": "rprivate"
    }
  ]
}



Version-Release number of selected component (if applicable):

docker-1.12.6-28.git1398f24.el7.x86_64
atomic-openshift-3.4.1.37-1
kernel-3.10.0-514.21.2.el7.x86_64

How reproducible:
Intermittent on Customer env.  

Steps to Reproduce:
1. Deploy a pod.
2. Remove the pod.


Actual results:

dockerd-current[1135]: time="2017-07-25T06:37:40.387715795Z" level=error msg="Handler for DELETE /containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a?v=1 returned error: Unable to remove filesystem for f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a: remove /var/lib/docker/containers/f08a75c541ac053ff6440936ecf78a78098e69cbce23e535991ff4bc35323e4a/shm: device or resource busy"


Expected results:

Pod get removed, showing a message like:
dockerd-current[]: time="" level=info msg="{Action=remove, LoginUID=..., PID=...}"



Additional info:

This topic was discussed on https://bugzilla.redhat.com/show_bug.cgi?id=1450554 which is already closed as CURRENTRELEASE
Comment 3 DeShuai Ma 2017-10-10 02:09:26 EDT
Verify on openshift v3.7.0-0.143.1
kubernetes v1.7.0+80709908fd
etcd 3.2.1
I don't see those error log after long opts in ocp env.

Note You need to log in before you can comment on or make changes to this bug.