1298284 – Races in persistent volume attachment when starting a pod.

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OKD
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	3.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jan Safranek
QA Contact:	Liang Xia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-01-13 16:19 UTC by Jan Safranek
Modified:	2016-05-12 17:10 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-05-12 17:10:25 UTC
Target Upstream Version:
Embargoed:

Description Jan Safranek 2016-01-13 16:19:40 UTC

Description of problem:
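When a pod uses a PersistentVolumeClaim, starting the pod can race with attaching the persistent volume, so the pod's containers may fail to start on the first attempt (see "Actual results" below).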

Version-Release number of selected component (if applicable):
origin-3.1.0

How reproducible:
~30%-50%

Steps to Reproduce:
1. create a PV and a claim (I use Cinder volumes, but I saw it on AWS and GCE too)
2. create a pod that uses the claim
3. In a loop:
  3.1 create the pod
  3.2 wait until it's running
  3.3 run 'kubectl describe pods'
  3.4 delete it
  3.5 wait until the volume is unmounted and detached from the node (this is important!)
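The loop in step 3 could be scripted roughly as follows. This is only a sketch, not the script used for this report: the pod file, the pod name, and the `volume_detached` check are hypothetical placeholders the caller must supply (how to detect a detached volume depends on the cloud provider), and it assumes `kubectl` is on PATH, configured for the cluster, and supports `-o jsonpath`.

```python
import subprocess
import time

# Substring of the message quoted under "Actual results".
ERROR_MARKER = "not all containers have started"


def kubectl(*args):
    """Run kubectl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def wait_for(predicate, timeout=300.0, interval=2.0, sleep=time.sleep):
    """Poll predicate until it returns True; raise TimeoutError otherwise."""
    waited = 0.0
    while not predicate():
        if waited >= timeout:
            raise TimeoutError("condition not met in time")
        sleep(interval)
        waited += interval


def pod_phase(run, pod):
    """Return the pod phase (for example "Running") as reported by kubectl."""
    return run("get", "pod", pod, "-o", "jsonpath={.status.phase}").strip()


def reproduce(pod_file, pod, iterations, volume_detached,
              run=kubectl, sleep=time.sleep):
    """Repeat steps 3.1-3.5 and count iterations that show the sync error.

    volume_detached is a caller-supplied callable (hypothetical here) that
    returns True once the volume is unmounted and detached from the node.
    """
    hits = 0
    for _ in range(iterations):
        run("create", "-f", pod_file)                                    # 3.1
        wait_for(lambda: pod_phase(run, pod) == "Running", sleep=sleep)  # 3.2
        if ERROR_MARKER in run("describe", "pods", pod):                 # 3.3
            hits += 1
        run("delete", "pod", pod)                                        # 3.4
        wait_for(volume_detached, sleep=sleep)                           # 3.5
    return hits
```

Passing `run` and `sleep` as parameters lets the loop be exercised with a fake runner, without a cluster. The returned count divided by `iterations` gives a rough reproduction rate to compare with the figure above.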

Actual results:
At step 3.3, you can see that the pod's container(s) had to be restarted; "kubectl describe pods" shows something like:

Error syncing pod, skipping: not all containers have started: 0 != 1

Expected results:
The containers start on the first try.

Additional info:
The pod does start eventually; it is just slower (~1 minute in my OpenStack setup, where attaching a volume is slow).

Fix: https://github.com/kubernetes/kubernetes/pull/19600

Comment 1 Mark Turansky 2016-01-13 16:24:56 UTC

In a conversation with Jan, we determined this is not a blocker because kubelet and the volume eventually reach the correct state.  It might take a few minutes to reconcile, making it a bad UX but not a blocker.

Comment 2 Mark Turansky 2016-02-04 13:11:40 UTC

Upstream PR is merged.  Awaiting rebase into Origin.

Comment 3 Jan Safranek 2016-02-08 11:10:21 UTC

In case there is no rebase, I filed an Origin PR: https://github.com/openshift/origin/pull/7107

Comment 4 Jan Safranek 2016-02-09 14:38:54 UTC

Origin PR merged

Comment 5 Jianwei Hou 2016-02-16 02:45:55 UTC

Verified with
openshift v1.1.2-260-gf556adc
kubernetes v1.2.0-origin
etcd 2.2.2+git

The issue described here is not reproducible, moving to verified.