Bug 1397693
Summary: | EBS volume could not be detached from node if node service is stopped | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Chao Yang <chaoyang>
Component: | Storage | Assignee: | Hemant Kumar <hekumar>
Status: | CLOSED ERRATA | QA Contact: | Chao Yang <chaoyang>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 3.4.0 | CC: | aos-bugs, bchilds, chaoyang, eparis, hekumar, jhou, jsafrane, sdodson, tdawson, trankin, xtian
Target Milestone: | --- | |
Target Release: | 3.4.z | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | atomic-openshift-3.4.0.35-1.git.0.86b11df.el7 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-02-22 18:10:33 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description (Chao Yang, 2016-11-23 08:15:47 UTC)
Per hchen on IRC: this is fixed here: https://github.com/kubernetes/kubernetes/pull/36840, which is covered in this PR for OpenShift: https://github.com/openshift/origin/pull/12024. I see this merged in OSE-3.3. Where is the PR for origin-1.4?

The patch is merged in 3.3 (https://github.com/openshift/origin/pull/12024) and 3.4 (https://github.com/openshift/origin/pull/12124). Please re-test.

This has been merged into OCP and is in OCP v3.4.0.33 or newer.

This failed on openshift v3.4.0.33+71c05b2, kubernetes v1.4.0+776c994, etcd 3.1.0-rc.0. Some of the log on the master is below:

    Dec 6 22:38:30 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:30.498886 13926 reconciler.go:161] Volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5"/Node "ip-172-18-7-28.ec2.internal" is attached--touching.
    Dec 6 22:38:30 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:30.498902 13926 reconciler.go:165] Attempting to start AttachVolume for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5" to node "ip-172-18-7-27.ec2.internal"
    Dec 6 22:38:30 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:30.498918 13926 reconciler.go:168] Started AttachVolume for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5" to node "ip-172-18-7-27.ec2.internal"
    Dec 6 22:38:30 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:30.641721 13926 reconciler.go:161] Volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5"/Node "ip-172-18-7-28.ec2.internal" is attached--touching.
    Dec 6 22:38:30 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:30.641738 13926 reconciler.go:165] Attempting to start AttachVolume for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5" to node "ip-172-18-7-27.ec2.internal"
    Dec 6 22:38:30 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:30.761132 13926 aws.go:1241] Assigned mount device ba -> volume vol-3455cca5
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:30.822992 13926 reconciler.go:161] Volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5"/Node "ip-172-18-7-28.ec2.internal" is attached--touching.
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:30.823005 13926 reconciler.go:165] Attempting to start AttachVolume for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5" to node "ip-172-18-7-27.ec2.internal"
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:31.037276 13926 reconciler.go:161] Volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5"/Node "ip-172-18-7-28.ec2.internal" is attached--touching.
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:31.037292 13926 reconciler.go:165] Attempting to start AttachVolume for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5" to node "ip-172-18-7-27.ec2.internal"
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:31.161153 13926 reconciler.go:161] Volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5"/Node "ip-172-18-7-28.ec2.internal" is attached--touching.
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:31.161170 13926 reconciler.go:165] Attempting to start AttachVolume for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5" to node "ip-172-18-7-27.ec2.internal"
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:31.254002 13926 aws.go:1260] Releasing in-process attachment entry: ba -> volume vol-3455cca5
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: E1206 22:38:31.254017 13926 attacher.go:72] Error attaching volume "aws://us-east-1d/vol-3455cca5": Error attaching EBS volume: VolumeInUse: vol-3455cca5 is already attached to an instance
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: E1206 22:38:31.254098 13926 nestedpendingoperations.go:253] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5\"" failed. No retries permitted until 2016-12-06 22:38:31.754083838 -0500 EST (durationBeforeRetry 500ms). Error: recovered from panic "runtime error: invalid memory address or nil pointer dereference". (err=<nil>) Call stack:
    Dec 6 22:38:31 ip-172-18-11-221 atomic-openshift-master: I1206 22:38:31.299487 13926 reconciler.go:161] Volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-3455cca5"/Node "ip-172-18-7-28.ec2.internal" is attached--touching.
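For context on the VolumeInUse failures in the log: the attach keeps being retried while EC2 still considers the volume attached to the old node, because that node's stopped service never unmounted and detached it. The following is only a minimal sketch (not the cloud provider's aws.go code path) of how such an error surfaces from the EC2 API via aws-sdk-go. The attachEBSVolume helper and the target instance ID are made up for illustration; the volume ID comes from the log above and the device name is derived from the "ba" mount device it mentions.

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// attachEBSVolume is a hypothetical helper, not the cloud-provider code:
// it issues a single AttachVolume call and reports whether EC2 refused it
// because the volume is still attached somewhere else ("VolumeInUse").
func attachEBSVolume(svc *ec2.EC2, volumeID, instanceID, device string) error {
	_, err := svc.AttachVolume(&ec2.AttachVolumeInput{
		VolumeId:   aws.String(volumeID),
		InstanceId: aws.String(instanceID),
		Device:     aws.String(device),
	})
	if awsErr, ok := err.(awserr.Error); ok && awsErr.Code() == "VolumeInUse" {
		// This is the error seen in the master log: the volume is still
		// attached to the node whose atomic-openshift-node service was
		// stopped, so it was never cleanly unmounted and detached.
		return fmt.Errorf("volume %s is still attached elsewhere: %v", volumeID, awsErr)
	}
	return err
}

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := ec2.New(sess)
	// Volume ID and device taken from the log above; the instance ID is illustrative.
	if err := attachEBSVolume(svc, "vol-3455cca5", "i-0123456789abcdef0", "/dev/xvdba"); err != nil {
		log.Fatal(err)
	}
}
```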
This failed on atomic-openshift-3.4.0.35-1.git.0.86b11df.el7.x86_64. Please check the logs in https://bugzilla.redhat.com/show_bug.cgi?id=1397693#c21 and https://bugzilla.redhat.com/show_bug.cgi?id=1397693#c22.

@hchen They are stopped and terminated.

This has been merged into OCP and is in OCP v3.4.1.7 or newer.

This failed on openshift v3.4.1.7, kubernetes v1.4.0+776c994, etcd 3.1.0-rc.0:

FirstSeen | LastSeen | Count | From | SubobjectPath | Type | Reason | Message
---|---|---|---|---|---|---|---
5m | 5m | 1 | {default-scheduler } | | Normal | Scheduled | Successfully assigned registry-console-2-y5kgt to ip-172-18-15-88.ec2.internal
5m | 1m | 10 | {controller-manager } | | Warning | FailedMount | Failed to attach volume "pvc-5b55fcd2-f191-11e6-9747-0e758138dc84" on node "ip-172-18-15-88.ec2.internal" with: Error attaching EBS volume: VolumeInUse: vol-087a2a79f623c5358 is already attached to an instance status code: 400, request id:
3m | 1m | 2 | {kubelet ip-172-18-15-88.ec2.internal} | | Warning | FailedMount | Unable to mount volumes for pod "registry-console-2-y5kgt_default(df6cf382-f192-11e6-9747-0e758138dc84)": timeout expired waiting for volumes to attach/mount for pod "registry-console-2-y5kgt"/"default". list of unattached/unmounted volumes=[v1]
3m | 1m | 2 | {kubelet ip-172-18-15-88.ec2.internal} | | Warning | FailedSync | Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "registry-console-2-y5kgt"/"default". list of unattached/unmounted volumes=[v1]

I will re-test it on a newer OCP.

Can you elaborate on how it failed? As I said, if atomic-openshift-node is not running on the node, the volume being used won't be unmounted and hence it can't be detached. This is by design. I opened a Trello card (https://trello.com/c/hNeu0drR/418-enable-force-detach-when-volume-is-attached-to-a-node-and-node-is-down) to fix that problem, because fixing this behavior is more than a quick bug fix.

To recap, what I am saying is that none of the code we pushed fixes the bug originally reported in this ticket; the bug is asking for a feature that does not exist. What we *did* fix is the panic that happens when the error is logged via an event. So the following panic should be gone:

    Error: recovered from panic "runtime error: invalid memory address or nil pointer dereference". (err=<nil>) Call stack:
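As an illustration of this class of failure only, and not the actual upstream patch in https://github.com/kubernetes/kubernetes/pull/36840, the sketch below shows how calling an event recorder through a nil interface produces exactly this "invalid memory address or nil pointer dereference" runtime error, and the kind of guard that avoids it. The eventRecorder interface, attachState struct, and reportAttachError function are hypothetical names used for illustration.

```go
package main

import "fmt"

// eventRecorder mirrors the shape of the recorder used when attach errors
// are reported as events; the real one lives in client-go's record package.
type eventRecorder interface {
	Eventf(reason, messageFmt string, args ...interface{})
}

// attachState is a stand-in for per-volume state in the attach/detach
// controller; the recorder field may legitimately be nil for some callers.
type attachState struct {
	recorder eventRecorder
}

// reportAttachError logs an attach failure and, only when a recorder is
// present, also emits an event. Without the nil check, calling Eventf
// through a nil interface panics with the same runtime error recovered
// in the master log above.
func (s *attachState) reportAttachError(volumeID string, err error) {
	fmt.Printf("error attaching volume %q: %v\n", volumeID, err)
	if s.recorder != nil { // conceptually, the guard the fix boils down to
		s.recorder.Eventf("FailedAttachVolume", "attach of %q failed: %v", volumeID, err)
	}
}

func main() {
	s := &attachState{recorder: nil}
	s.reportAttachError("vol-3455cca5", fmt.Errorf("VolumeInUse: already attached to an instance"))
}
```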
Sorry for the misunderstanding. I re-tested this bug on OCP v3.4.1.7. While the pod using the EBS volume was in the "ContainerCreating" status, the error "runtime error: invalid memory address or nil pointer dereference" could not be found in /var/log/messages.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0289
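A minimal sketch of the verification step described in the re-test comment above: scanning /var/log/messages on the node for the panic signature. The file path and search string come from the comments; the program itself is only illustrative (a grep would do the same).

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

// Scans /var/log/messages for the panic signature that the fix is supposed
// to eliminate, mirroring the manual check from the verification comment.
func main() {
	const needle = "invalid memory address or nil pointer dereference"

	f, err := os.Open("/var/log/messages")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) // syslog lines can be long
	found := false
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), needle) {
			fmt.Println(scanner.Text())
			found = true
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	if !found {
		fmt.Println("no nil pointer dereference panics found")
	}
}
```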