Bug 1455675 - [3.5] Volume unmounted from node but not detached - no unmount request in logs
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.5.z
Assigned To: Matthew Wong
QA Contact: chaoyang
Docs Contact:
Depends On:
Blocks: 1457510
 
Reported: 2017-05-25 14:27 EDT by Hemant Kumar
Modified: 2017-06-15 14:40 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Volumes attached to non-running AWS instances are incorrectly marked as detached by the periodic 'verify volumes are attached' routine, because the routine does not consider non-running AWS instances to be nodes.
Consequence: Volumes that are incorrectly marked as detached are never detached, even when they later need to be.
Fix: The 'verify volumes are attached' routine now considers non-running AWS instances to be nodes.
Result: Volumes attached to non-running AWS instances are correctly tracked as attached and are detached when they later need to be.
Story Points: ---
Clone Of:
Clones: 1457510
Environment:
Last Closed: 2017-06-15 14:40:59 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
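
The cause given in the Doc Text can be illustrated with a small, self-contained sketch. This is not the Kubernetes or OpenShift code: `Instance`, `verify_volumes_attached`, and the `include_non_running` flag are hypothetical names chosen for illustration, and the logic only assumes what the Doc Text states, namely that a node the routine cannot see is treated as having nothing attached.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for a cloud instance record."""
    name: str
    state: str  # e.g. "running" or "stopped"
    attached_volumes: set = field(default_factory=set)


def verify_volumes_attached(instances, expected, include_non_running):
    """Return {node: {volume: still_attached}} for the expected attachments.

    `expected` maps a node name to the set of volume IDs the controller
    believes are attached. A node the routine cannot find is treated as
    having nothing attached (the failure mode described in the Doc Text).
    """
    known = {
        inst.name: inst
        for inst in instances
        if include_non_running or inst.state == "running"
    }
    result = {}
    for node, volumes in expected.items():
        inst = known.get(node)
        attached = inst.attached_volumes if inst else set()
        result[node] = {vol: vol in attached for vol in volumes}
    return result


# Illustrative data only: one stopped instance that still holds a volume.
instances = [
    Instance("node-a", "stopped", {"vol-1"}),
    Instance("node-b", "running", set()),
]
expected = {"node-a": {"vol-1"}}

# Old behaviour (assumed): non-running instances are filtered out.
before_fix = verify_volumes_attached(instances, expected, include_non_running=False)
# Fixed behaviour (assumed): non-running instances still count as nodes.
after_fix = verify_volumes_attached(instances, expected, include_non_running=True)
print(before_fix)
print(after_fix)
```

With the filter in place the volume on the stopped instance is reported as not attached, so nothing would later try to detach it. Once non-running instances are included, the same volume is reported as attached.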


Attachments
grep vol-0ed79e9051f8d1d4d /var/log/messages (46.02 KB, text/plain)
2017-06-05 08:06 EDT, Jianwei Hou
no flags


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2017:1425 | normal | SHIPPED_LIVE | OpenShift Container Platform 3.5, 3.4, 3.3, and 3.2 bug fix update | 2017-06-15 18:35:53 EDT

Comment 7 Jianwei Hou 2017-06-05 08:05:11 EDT
Tested on openshift v3.5.5.23

1. Create a PVC/PV and an rc.
2. Stop the node service on the node the pod is scheduled to.
3. The pod is scheduled to a new node, but is stuck in the 'ContainerCreating' state.
oc get pods
NAME        READY     STATUS              RESTARTS   AGE
ebs-dpz5p   1/1       Unknown             0          14m
ebs-s2h7b   0/1       ContainerCreating   0          7m

4. Bring back the node service; the pod ebs-dpz5p is deleted.
5. The volume is not unmounted and detached successfully. The new pod cannot become 'Running'.
oc get pods
NAME        READY     STATUS              RESTARTS   AGE
ebs-s2h7b   0/1       ContainerCreating   0          29m

# grep vol-0ed79e9051f8d1d4d /var/log/messages
```
Jun  5 07:57:12 ip-172-18-15-42 atomic-openshift-node: I0605 07:57:12.280650   20830 reconciler.go:189] UnmountVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d" (spec.Name: "pvol") from pod "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50" (UID: "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50").
Jun  5 07:57:12 ip-172-18-15-42 atomic-openshift-node: E0605 07:57:12.285100   20830 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d\" (\"a87c9660-49e1-11e7-b5c0-0e14d6e6ec50\")" failed. No retries permitted until 2017-06-05 07:59:12.285077064 -0400 EDT (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d" (volume.spec.Name: "pvol") pod "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50" (UID: "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50") with: remove /var/lib/origin/openshift.local.volumes/pods/a87c9660-49e1-11e7-b5c0-0e14d6e6ec50/volumes/kubernetes.io~aws-ebs/pvc-a050a061-49e1-11e7-b5c0-0e14d6e6ec50: device or resource busy
Jun  5 07:59:12 ip-172-18-15-42 atomic-openshift-node: I0605 07:59:12.309581   20830 reconciler.go:189] UnmountVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d" (spec.Name: "pvol") from pod "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50" (UID: "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50").
Jun  5 08:01:12 ip-172-18-15-42 journal: I0605 08:01:12.413553   20830 reconciler.go:189] UnmountVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d" (spec.Name: "pvol") from pod "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50" (UID: "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50").
Jun  5 08:01:12 ip-172-18-15-42 journal: E0605 08:01:12.418120   20830 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d\" (\"a87c9660-49e1-11e7-b5c0-0e14d6e6ec50\")" failed. No retries permitted until 2017-06-05 08:03:12.418098061 -0400 EDT (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d" (volume.spec.Name: "pvol") pod "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50" (UID: "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50") with: remove /var/lib/origin/openshift.local.volumes/pods/a87c9660-49e1-11e7-b5c0-0e14d6e6ec50/volumes/kubernetes.io~aws-ebs/pvc-a050a061-49e1-11e7-b5c0-0e14d6e6ec50: device or resource busy
Jun  5 08:01:12 ip-172-18-15-42 atomic-openshift-node: I0605 08:01:12.413553   20830 reconciler.go:189] UnmountVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d" (spec.Name: "pvol") from pod "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50" (UID: "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50").
Jun  5 08:01:12 ip-172-18-15-42 atomic-openshift-node: E0605 08:01:12.418120   20830 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d\" (\"a87c9660-49e1-11e7-b5c0-0e14d6e6ec50\")" failed. No retries permitted until 2017-06-05 08:03:12.418098061 -0400 EDT (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d" (volume.spec.Name: "pvol") pod "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50" (UID: "a87c9660-49e1-11e7-b5c0-0e14d6e6ec50") with: remove /var/lib/origin/openshift.local.volumes/pods/a87c9660-49e1-11e7-b5c0-0e14d6e6ec50/volumes/kubernetes.io~aws-ebs/pvc-a050a061-49e1-11e7-b5c0-0e14d6e6ec50: device or resource busy
```
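
When triaging an excerpt like the one above, it can help to collapse the repeated `UnmountVolume.TearDown` failures into a count per volume and reason. The sketch below is one way to do that. The regular expression is an assumption derived only from the lines quoted in this comment, not a documented log format; `summarize_unmount_failures` is a hypothetical helper, and `SAMPLE` is a trimmed copy of one failure line from the excerpt.

```python
import re
from collections import Counter

# Assumed pattern: volume name in quotes after "TearDown failed for volume",
# and the reason as the text after the last ": " on the line.
FAILURE_RE = re.compile(
    r'UnmountVolume\.TearDown failed for volume "(?P<volume>[^"]+)"'
    r'.*?: (?P<reason>[^:]+)$'
)

# Trimmed copy of a failure line from the excerpt above (middle part omitted).
SAMPLE = (
    'Jun  5 07:57:12 ip-172-18-15-42 atomic-openshift-node: E0605 07:57:12.285100   20830 '
    'nestedpendingoperations.go:262] Error: UnmountVolume.TearDown failed for volume '
    '"kubernetes.io/aws-ebs/aws://us-east-1d/vol-0ed79e9051f8d1d4d" (volume.spec.Name: "pvol") '
    'with: remove /var/lib/origin/openshift.local.volumes/pods/a87c9660-49e1-11e7-b5c0-0e14d6e6ec50'
    '/volumes/kubernetes.io~aws-ebs/pvc-a050a061-49e1-11e7-b5c0-0e14d6e6ec50: device or resource busy'
)


def summarize_unmount_failures(lines):
    """Count TearDown failures per (volume, reason) pair."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line.rstrip("\n"))
        if match:
            key = (match.group("volume"), match.group("reason").strip())
            counts[key] += 1
    return counts


summary = summarize_unmount_failures([SAMPLE, "an unrelated log line"])
for (volume, reason), count in summary.items():
    print(f"{count}x {volume}: {reason}")
```

To run it over the real log, the same function could be fed the lines of `/var/log/messages` (for example via `open(...)`), assuming the file is readable by the current user.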
Comment 8 Jianwei Hou 2017-06-05 08:06 EDT
Created attachment 1285030
grep vol-0ed79e9051f8d1d4d /var/log/messages
Comment 10 Hemant Kumar 2017-06-05 09:59:09 EDT
No, the original cluster wasn't containerized. None of the clusters in openshift.io are containerized. Is the cluster Jianwei is using containerized?
Comment 11 Matthew Wong 2017-06-05 11:15:47 EDT
Yes, nsenter_mount is doing the mounting. IMO we should test this bug against a non-containerized env and open a new bug against containerized.
Comment 12 Hemant Kumar 2017-06-05 11:18:58 EDT
Yes - I agree. If the failure was because atomic-openshift-node was running in a container, we might have to double-check several different code paths to fix that.

If this bug is fixed as is in non-containerized environments, we should go ahead and accept it as VERIFIED. @jianwei - would you agree to that?
Comment 13 Jianwei Hou 2017-06-06 01:49:48 EDT
I agree. I have verified that this is fixed on an RPM-installed OCP cluster, v3.5.5.23. The EBS volume was unmounted and detached from the old node, then attached and mounted to the new node.
Could you please move it to ON_QA status?
I'll open a new bug against containerized OCP.
Comment 14 Jianwei Hou 2017-06-06 02:12:46 EDT
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1459006 to track the containerized env issue.
Comment 16 errata-xmlrpc 2017-06-15 14:40:59 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1425
