Bug 1298284 - Races in persistent volume attachment when starting a pod.
Product: OpenShift Origin
Classification: Red Hat
Component: Storage
Priority: high
Severity: high
Assigned To: Jan Safranek
QA Contact: Liang Xia
Reported: 2016-01-13 11:19 EST by Jan Safranek
Modified: 2016-05-12 13:10 EDT
Doc Type: Bug Fix
Last Closed: 2016-05-12 13:10:25 EDT
Type: Bug

Attachments: None
Description Jan Safranek 2016-01-13 11:19:40 EST
Description of problem:
When a pod uses a PersistentVolumeClaim, this can happen:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. create a PV and a claim (I use Cinder volumes, but I saw it on AWS and GCE too)
2. create a pod that uses the claim
3. In a loop:
  3.1 create the pod
  3.2 wait until it's running
  3.3 run 'kubectl describe pods'
  3.4 delete it
  3.5 wait until the volume is unmounted and detached from the node (this is important!)
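The loop in step 3 can be sketched as a small script. This is only an illustration, not the script used for this report: the function names, the pod file, and the pod name are hypothetical, and the check for step 3.5 is left to a caller-supplied `volume_detached` callable because how to detect a detached volume depends on the cloud provider (Cinder, AWS, GCE). It assumes a `kubectl` that supports `-o jsonpath`.

```python
import subprocess
import time


def kubectl(*args):
    """Run kubectl with the given arguments and return its stdout as text."""
    result = subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def wait_until(predicate, timeout=300.0, interval=2.0, sleep=time.sleep):
    """Poll predicate() until it returns True; raise TimeoutError otherwise."""
    waited = 0.0
    while not predicate():
        if waited >= timeout:
            raise TimeoutError("condition not met in time")
        sleep(interval)
        waited += interval


def reproduce_once(pod_file, pod_name, volume_detached, run=kubectl, sleep=time.sleep):
    """Run one iteration of step 3 and return the output of step 3.3.

    volume_detached is a hypothetical, provider-specific callable that
    returns True once the volume is unmounted and detached from the node.
    """
    def pod_running():
        phase = run("get", "pod", pod_name, "-o", "jsonpath={.status.phase}")
        return phase.strip() == "Running"

    run("create", "-f", pod_file)                    # 3.1 create the pod
    wait_until(pod_running, sleep=sleep)             # 3.2 wait until it's running
    description = run("describe", "pods", pod_name)  # 3.3 capture the events
    run("delete", "pod", pod_name)                   # 3.4 delete it
    wait_until(volume_detached, sleep=sleep)         # 3.5 wait for detach
    return description
```

Calling `reproduce_once("pod.yaml", "mypod", my_detach_check)` repeatedly and inspecting each returned description corresponds to running the loop by hand.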

Actual results:
At step 3.3, you can see that the pod's container(s) had to be restarted; "kubectl describe pods" shows something like:

Error syncing pod, skipping: not all containers have started: 0 != 1
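To spot this message automatically in the "kubectl describe pods" output, a minimal sketch could look like the following. The helper name is hypothetical, and the pattern assumes the message keeps exactly the wording quoted above; the two numbers are returned in the order they are printed, without assuming what each one counts.

```python
import re

# Matches the sync error quoted above and captures both counts as printed.
SYNC_ERROR = re.compile(r"not all containers have started: (\d+) != (\d+)")


def find_start_failures(describe_output):
    """Return a (left, right) integer pair for every matching error line."""
    return [(int(a), int(b)) for a, b in SYNC_ERROR.findall(describe_output)]
```

An empty list means the containers started on the first try, which is the expected result.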

Expected results:
The containers start on the first try.

Additional info:
The pod does start eventually; it is just slower (~1 minute in my OpenStack setup, where attaching a volume is slow).

Fix: https://github.com/kubernetes/kubernetes/pull/19600
Comment 1 Mark Turansky 2016-01-13 11:24:56 EST
In a conversation with Jan, we determined this is not a blocker because kubelet and the volume eventually reach the correct state.  It might take a few minutes to reconcile, making it a bad UX but not a blocker.
Comment 2 Mark Turansky 2016-02-04 08:11:40 EST
Upstream PR is merged.  Awaiting rebase into Origin.
Comment 3 Jan Safranek 2016-02-08 06:10:21 EST
In case there is no rebase, I filed an Origin PR: https://github.com/openshift/origin/pull/7107
Comment 4 Jan Safranek 2016-02-09 09:38:54 EST
Origin PR merged
Comment 5 Jianwei Hou 2016-02-15 21:45:55 EST
Verified with
openshift v1.1.2-260-gf556adc
kubernetes v1.2.0-origin
etcd 2.2.2+git

The issue described here is no longer reproducible; moving to verified.
