Bug 1453078 - Master service restart can cause volume/pv/pvc un-sync
Summary: Master service restart can cause volume/pv/pvc un-sync
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Matthew Wong
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-05-22 05:58 UTC by Liang Xia
Modified: 2017-08-16 19:51 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-10 05:25:32 UTC
Target Upstream Version:
Embargoed:


Attachments
master logs (2.28 MB, application/x-gzip)
2017-05-22 06:23 UTC, Liang Xia


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1716 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.6 RPM Release Advisory 2017-08-10 09:02:50 UTC

Description Liang Xia 2017-05-22 05:58:54 UTC
Description of problem:
Some volume provisioning fails with the error "the resource XXX already exists" after a master restart.

Version-Release number of selected component (if applicable):
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

How reproducible:
Very often, but not always

Steps to Reproduce:
1. Log in to the server and create a project.
$ oc login $server
$ oc new-project project-name

2. Create PVCs in a loop.
$ for i in {1..100} ; do
  cp pvc.json pvc-$i.json
  sed -i "s/#NAME#/pvc-$RANDOM/" pvc-$i.json
  oc create -f pvc-$i.json
  rm -f pvc-$i.json
done

3. Restart the master service.
$ systemctl restart atomic-openshift-master

4. Wait several minutes.

5. Check the PVC status.
$ oc describe pvc

Actual results:
$ oc describe pvc
Name:        pvc-959
Namespace:    lxiap
StorageClass:    
Status:        Pending
Volume:        
Labels:        name=dynamic-pvc
Annotations:    volume.alpha.kubernetes.io/storage-class=foo
        volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Capacity:    
Access Modes:    
Events:
  FirstSeen  LastSeen  Count  From                          Type     Reason              Message
  ---------  --------  -----  ----                          ----     ------              -------
  8m         8m        1      persistent-volume-controller  Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  8m         8m        2      persistent-volume-controller  Warning  ProvisioningFailed  (same message as above)
  7m         6m        5      persistent-volume-controller  Warning  ProvisioningFailed  (same message as above)
  6m         6m        2      persistent-volume-controller  Warning  ProvisioningFailed  (same message as above)
  6m         6m        1      persistent-volume-controller  Warning  ProvisioningFailed  (same message as above)
  5m         5m        1      persistent-volume-controller  Warning  ProvisioningFailed  (same message as above)
  5m         11s       23     persistent-volume-controller  Warning  ProvisioningFailed  (same message as above)


Expected results:
The PVC should reuse the existing volume; see https://github.com/kubernetes/kubernetes/pull/38702

Additional info:
$ cat pvc.json 
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "#NAME#",
    "annotations": {
        "volume.alpha.kubernetes.io/storage-class": "foo"
    },
    "labels": {
        "name": "dynamic-pvc"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}

Comment 1 Liang Xia 2017-05-22 06:23:02 UTC
Created attachment 1280907 [details]
master logs

journalctl -u atomic-openshift-master > atomic-openshift-master.log

Comment 3 Matthew Wong 2017-05-23 22:08:18 UTC
The bug is that we check for the alreadyExists error too late; we should check it immediately after gce.service.Disks.Insert(gce.projectID, zone, diskToCreate).Do() returns, not after gce.waitForZoneOp(createOp, zone).

Comment 4 Matthew Wong 2017-05-24 18:46:18 UTC
https://github.com/openshift/origin/pull/14329

Comment 6 Liang Xia 2017-06-05 08:30:36 UTC
Verified the fix works on OCP v3.6.86.

Using the same steps as in comment 0, all PVCs become Bound.
After the PVCs are deleted, the PVs and underlying volumes are deleted automatically.

Comment 8 errata-xmlrpc 2017-08-10 05:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

