Bug 1419577 - [3.3] Unmount device fails in certain cases
Summary: [3.3] Unmount device fails in certain cases
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Storage
Version: 3.x
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Hemant Kumar
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-06 14:50 UTC by Hemant Kumar
Modified: 2017-05-30 12:47 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-30 12:47:58 UTC
Target Upstream Version:



Description Hemant Kumar 2017-02-06 14:50:42 UTC
Description of problem:

Sometimes, when a pod is moved as a result of a node drain (or for another reason), the pod's bind mount of the volume is unmounted, but the AWS-attached device itself is not unmounted from the node. This prevents the pod from ever starting on the new node.


How reproducible:

Sometimes


Steps to Reproduce:
1. Create a multi-node cluster and create 5-6 deployments (not bare pods) on one node. Make sure the pods write to their mounted EBS PVs (for example, a busybox container writing to the volume).
2. Drain that node so that all pods running on it are moved.
3. Check whether all moved pods are running. I had to repeat these steps several times to reproduce the bug.

Actual results:

One or more pods can get stuck in ContainerCreating because AWS never attaches the device on the new node.


Expected results:

All pods should move successfully.


Additional info:

If the device the pod is using is "busy" (i.e., being written to), the first unmount fails with a "device is busy error". Eventually the container is deleted and the device is no longer busy, but the error-handling code in volumemanager never kicks in; instead, the device is deleted from the actual state of world on the assumption that it was unmounted, so the unmount is never retried. In other words, the current code silently swallows the unmount error, and because the error is not propagated, volumemanager thinks the device was successfully unmounted.
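The failure mode above can be illustrated with a minimal, self-contained Go sketch. It is not the volumemanager code: actualStateOfWorld, fakeNode, buggyUnmountDevice, fixedUnmountDevice and the device path /dev/xvdba are hypothetical simplifications. The buggy variant logs the umount error but still drops the device from its state, so no later pass retries it; the fixed variant propagates the error and keeps the device tracked until umount succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

// errDeviceBusy stands in for the error umount returns while the device is in use.
var errDeviceBusy = errors.New("device is busy")

// actualStateOfWorld is a hypothetical, heavily simplified stand-in for the
// volume manager's record of devices it believes are still mounted.
type actualStateOfWorld struct {
	mountedDevices map[string]bool
}

type unmountFunc func(device string) error

type unmountDeviceFunc func(asw *actualStateOfWorld, device string, unmount unmountFunc) error

// buggyUnmountDevice logs the unmount error but swallows it, so the device
// is dropped from the actual state of world even though it is still mounted.
func buggyUnmountDevice(asw *actualStateOfWorld, device string, unmount unmountFunc) error {
	if err := unmount(device); err != nil {
		fmt.Printf("unmount %s failed (ignored): %v\n", device, err)
	}
	delete(asw.mountedDevices, device)
	return nil
}

// fixedUnmountDevice propagates the error and only updates state on success,
// so the device stays tracked and the next reconcile pass retries it.
func fixedUnmountDevice(asw *actualStateOfWorld, device string, unmount unmountFunc) error {
	if err := unmount(device); err != nil {
		return fmt.Errorf("unmount %s: %w", device, err)
	}
	delete(asw.mountedDevices, device)
	return nil
}

// fakeNode models the node's real mount table; the device stays busy for
// busyTries umount attempts (e.g. until the writing container is deleted).
type fakeNode struct {
	mounted   map[string]bool
	busyTries int
}

func (n *fakeNode) unmount(device string) error {
	if n.busyTries > 0 {
		n.busyTries--
		return errDeviceBusy
	}
	delete(n.mounted, device)
	return nil
}

// simulate runs three reconcile passes over every device still tracked and
// reports whether the device is still tracked and/or still mounted afterwards.
func simulate(unmountDevice unmountDeviceFunc) (stillTracked, stillMounted bool) {
	const device = "/dev/xvdba" // hypothetical EBS device name
	asw := &actualStateOfWorld{mountedDevices: map[string]bool{device: true}}
	node := &fakeNode{mounted: map[string]bool{device: true}, busyTries: 1}
	for pass := 0; pass < 3; pass++ {
		for d := range asw.mountedDevices {
			if err := unmountDevice(asw, d, node.unmount); err != nil {
				fmt.Println("will retry on next pass:", err)
			}
		}
	}
	return asw.mountedDevices[device], node.mounted[device]
}

func main() {
	tracked, mounted := simulate(buggyUnmountDevice)
	fmt.Printf("buggy: tracked=%v mounted=%v\n", tracked, mounted)
	tracked, mounted = simulate(fixedUnmountDevice)
	fmt.Printf("fixed: tracked=%v mounted=%v\n", tracked, mounted)
}
```

In this simulation the buggy variant ends with the device untracked but still mounted (the stuck state this bug describes), while the fixed variant fails on the first pass, retries on the second, and ends with the device neither tracked nor mounted.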

https://github.com/openshift/ose/pull/602/files#diff-f7240ab860b1b30388948da95bb5b02aR237

Comment 2 Chao Yang 2017-02-15 10:53:51 UTC
The unmount issue passed verification on:
openshift v3.3.1.13
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

Create 5 apps as follows:
oc new-app php:5.6~https://github.com/openshift/sti-php --context-dir='5.6/test/test-app'
Create a dynamic PVC:
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "ebsc2",
    "annotations": {
      "volume.alpha.kubernetes.io/storage-class": "foo"
    },
    "labels": {
      "name": "dynamic-pvc"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "3Gi"
      }
    }
  }
}
oc volume dc/sti-php --add --type=persistentVolumeClaim --mount-path=/opt1 --name=v1 --claim-name=ebsc2 --overwrite
oadm manage-node ip-172-18-5-95.ec2.internal --evacuate --pod-selector="app=sti-php"
Check that the EBS volume is unmounted from node ip-172-18-5-95.ec2.internal.
Running grep "device is busy error" /var/log/messages finds no error messages.
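The unmount check on the node can also be scripted. The following is a minimal Go sketch, assuming the name of the EBS device backing the PV is known (the default /dev/xvdba is a hypothetical placeholder). It reads /proc/mounts and lists every mount point where that device still appears as the source; bind mounts of the same device show up as extra entries with the same source, so an empty result means neither the pod bind mount nor the device mount remains.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountPointsFor returns every mount point at which device appears as the
// source in text formatted like /proc/mounts
// ("source mountpoint fstype options dump pass").
func mountPointsFor(mounts, device string) []string {
	var points []string
	scanner := bufio.NewScanner(strings.NewReader(mounts))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[0] == device {
			points = append(points, fields[1])
		}
	}
	return points
}

func main() {
	device := "/dev/xvdba" // hypothetical; pass the real EBS device as the first argument
	if len(os.Args) > 1 {
		device = os.Args[1]
	}
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Println("cannot read /proc/mounts:", err)
		return
	}
	points := mountPointsFor(string(data), device)
	if len(points) == 0 {
		fmt.Printf("%s is not mounted\n", device)
		return
	}
	fmt.Printf("%s is still mounted at: %s\n", device, strings.Join(points, ", "))
}
```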

