Bug 1323105

Summary: [Dedicated] Pod with an attached PV cannot be terminated after re-deployment if scheduled to nodes other than ip-172-31-2-203.ec2.internal
Product: OpenShift Online
Reporter: Wenjing Zheng <wzheng>
Component: Storage
Assignee: Bradley Childs <bchilds>
Status: CLOSED CURRENTRELEASE
QA Contact: Jianwei Hou <jhou>
Severity: medium
Priority: medium
Version: 3.x
CC: aos-bugs, chaoyang, decarr, jhou, jokerman, mmccomas, mturansk
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-23 15:08:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Wenjing Zheng 2016-04-01 09:59:57 UTC
Description of problem:
The pod created after re-deployment cannot attach the EBS volume because the old pod does not terminate and detach the volume; the error "VolumeInUse: vol-9771d635 is already attached to an instance" appears in the old pod's events. There is no such issue if the pod is scheduled to node ip-172-31-2-203.ec2.internal.


Version-Release number of selected component (if applicable):
ded-stage-aws
atomic-openshift-3.2.0.8-1.git.0.f4edaed.el7.x86_64


How reproducible:
Most of the time, if pods are scheduled to nodes other than ip-172-31-2-203.ec2.internal

Steps to Reproduce:
1. Process mysql-persistent template:
$oc process -f https://raw.githubusercontent.com/openshift/origin/master/examples/db-templates/mysql-persistent-template.json | oc create -f -
2. Wait until the pod is running, then change the DB password via $ oc deploy mysql --latest
3. Check pod status
4. Describe pod
 
Actual results:
3. $ oc get pods
NAME                  READY     STATUS              RESTARTS   AGE
mysql-1-n4eac         0/1       ContainerCreating   0          2h        ip-172-31-2-202.ec2.internal
mysql-2-deploy        0/1       Error               0          3h        ip-172-31-2-202.ec2.internal
4. Below errors in Event list:
Events:
  FirstSeen     LastSeen        Count   From                                    SubobjectPath   Type            Reason          Message
  ---------     --------        -----   ----                                    -------------   --------        ------          -------
  58m           58m             1       {default-scheduler }                                    Normal          Scheduled       Successfully assigned mysql-1-n4eac to ip-172-31-2-203.ec2.internal
  57m           <invalid>       53      {kubelet ip-172-31-2-203.ec2.internal}                  Warning         FailedMount     Unable to mount volumes for pod "mysql-1-n4eac_wzheng2(77057ac5-f7c7-11e5-89b5-0eb4d24322f9)": Could not attach EBS Disk "aws://us-east-1c/vol-9771d635": Error attaching EBS volume: VolumeInUse: vol-9771d635 is already attached to an instance
                status code: 400, request id:
  57m           <invalid>       53      {kubelet ip-172-31-2-203.ec2.internal}          Warning FailedSync      Error syncing pod, skipping: Could not attach EBS Disk "aws://us-east-1c/vol-9771d635": Error attaching EBS volume: VolumeInUse: vol-9771d635 is already attached to an instance
                status code: 400, request id: 
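The failure in the events above can be modeled with a small sketch (hypothetical names; this is an illustration of the error semantics, not the real AWS cloud-provider code): attaching an EBS volume that is already attached to a different instance yields a VolumeInUse error with a 400 status.

```python
# Toy model of the attach conflict (hypothetical; illustrates the
# "already attached" failure, not the actual kubelet/AWS code path).
class VolumeInUse(Exception):
    """Raised when a volume is already attached to another instance."""

# volume -> instance currently holding the attachment
attachments = {"vol-9771d635": "ip-172-31-2-203.ec2.internal"}

def attach_ebs(volume_id, instance):
    current = attachments.get(volume_id)
    if current is not None and current != instance:
        # Mirrors AWS's 400 response: "VolumeInUse: vol-... is already attached"
        raise VolumeInUse(f"{volume_id} is already attached to an instance")
    attachments[volume_id] = instance

# The new pod's node tries to attach while the old node still holds it:
try:
    attach_ebs("vol-9771d635", "ip-172-31-2-202.ec2.internal")
except VolumeInUse as e:
    print("FailedMount:", e)
```

The attach only succeeds once the old node's attachment is released, which is why the new pod stays in ContainerCreating while the old pod lingers.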


Expected results:
The pod should be running, as it does on node ip-172-31-2-203.ec2.internal.

Additional info:

Comment 2 Mark Turansky 2016-04-05 19:00:18 UTC
I believe this issue is in the attach/detach logic tracked by https://github.com/kubernetes/kubernetes/issues/20262

Basically, Node A has a reconciliation loop in the Kubelet that will detach/unmount orphaned volumes (volumes no pod needs anymore). Unless and until the Kubelet on Node A processes that orphaned volume, it will remain inaccessible to Node B (volume-already-in-use error).

The centralized attach/detach controller proposed in the issue above seeks to mitigate this problem.
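The per-node cleanup described above can be sketched roughly as follows (hypothetical structure; the real reconciler lives in the kubelet's volume manager): a volume on the node is detached only once no pod on that node references it.

```python
# Sketch of the kubelet's per-node volume cleanup (hypothetical names;
# not the real implementation).
def reconcile_node_volumes(node_pods, node_volumes, detach):
    """Detach volumes on this node that no remaining pod references."""
    needed = set()
    for pod in node_pods:
        needed.update(pod.get("volumes", []))
    for vol in list(node_volumes):
        if vol not in needed:      # orphaned: no pod needs it anymore
            detach(vol)
            node_volumes.remove(vol)
    return node_volumes

# Until this loop runs on Node A, Node B's attach of the same
# volume fails with the "already in use" error above.
detached = []
remaining = reconcile_node_volumes(
    node_pods=[],                  # old pod is finally gone from Node A
    node_volumes=["vol-9771d635"],
    detach=detached.append,
)
print(detached)  # the orphaned volume gets detached
```

A centralized attach/detach controller moves this decision off the individual node, so a stuck kubelet cannot indefinitely hold the attachment.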

Comment 3 Jianwei Hou 2016-04-06 03:16:37 UTC
(In reply to Mark Turansky from comment #2)
> I believe this issue is the attach/detach logic handled by 
> https://github.com/kubernetes/kubernetes/issues/20262
> 
> Basically, Node A has a recon loop in Kubelet that will detach/unmount
> orphaned volumes (no pod needs them anymore).   Unless and until Kubelet on
> Node A processes that orphaned volume, it will remain unaccessible for Node
> B (volume already in use error).
> 
> The centralized attach/detach controller in the above PR seeks to mitigate
> this problem.

From what I saw in the environment, during the second deploy the first mysql pod was stuck in 'Terminating' status and the volume was not detached from the instance; at the same time the new mysql pod was trying to mount the same EBS volume, and it got the error above.

Comment 4 Mark Turansky 2016-04-06 12:22:36 UTC
Did you find why it was stuck in "terminating" status?

So long as that pod remains on the original node (even in an error state), its volumes are still needed on that node.

Only when the pod is gone completely from a node will Kubelet unmount and detach all volumes.
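The rule stated above can be captured in a minimal sketch (hypothetical names): a volume stays mounted and attached on a node while any pod on that node references it, regardless of the pod's phase.

```python
# Minimal sketch of the detach precondition (hypothetical names):
# a volume is still needed while ANY pod on the node references it,
# whatever phase that pod is in (Running, Error, Terminating, ...).
def volume_still_needed(volume, pods_on_node):
    return any(volume in p["volumes"] for p in pods_on_node)

stuck_pod = {"name": "mysql-1-n4eac", "phase": "Terminating",
             "volumes": ["vol-9771d635"]}

print(volume_still_needed("vol-9771d635", [stuck_pod]))  # True: cannot detach
print(volume_still_needed("vol-9771d635", []))           # False: safe to detach
```

This is why the pod stuck in Terminating matters: as long as it exists on the node at all, the kubelet will not release the volume for the replacement pod.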

Comment 5 Abhishek Gupta 2016-04-08 19:13:33 UTC
Clayton (I believe) mentioned on the bug scrub call today that pods stuck in the Terminating state were an issue Derek was chasing, and that he is currently working on a fix.

Comment 6 Derek Carr 2016-04-20 15:31:31 UTC
I think the pod stuck in terminating issue could have been resolved by:

https://github.com/kubernetes/kubernetes/pull/23746

The pod could have gotten stuck if the Docker image pull for mysql:latest returned a 50x error response each time the node attempted to start or restart the container.

Comment 7 Wenjing Zheng 2016-04-21 06:38:52 UTC
No such issue in ded-stage-aws now; the pod is running normally after re-deployment.