Bug 1453078

Summary: Master service restart can cause volume/pv/pvc un-sync
Product: OpenShift Container Platform Reporter: Liang Xia <lxia>
Component: Storage Assignee: Matthew Wong <mawong>
Status: CLOSED ERRATA QA Contact: Liang Xia <lxia>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.0 CC: aos-bugs, mawong, smunilla
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-10 05:25:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
master logs none

Description Liang Xia 2017-05-22 05:58:54 UTC
Description of problem:
Some volume provisioning attempts fail with the error "the resource XXX already exists" after the master service is restarted.

Version-Release number of selected component (if applicable):
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

How reproducible:
Very often, but not always

Steps to Reproduce:
1. Log in to the server and create a project.
$ oc login $server
$ oc new-project project-name

2. Keep creating PVCs.
$ for i in {1..100} ; do
  cp pvc.json pvc-$i.json
  sed -i "s/#NAME#/pvc-$RANDOM/" pvc-$i.json
  oc create -f pvc-$i.json
  rm -f pvc-$i.json
done

3. Restart the master service.
$ systemctl restart atomic-openshift-master

4. Wait for several minutes.

5. Check the PVC status.
$ oc describe pvc

Actual results:
$ oc describe pvc
Name:        pvc-959
Namespace:    lxiap
StorageClass:    
Status:        Pending
Volume:        
Labels:        name=dynamic-pvc
Annotations:    volume.alpha.kubernetes.io/storage-class=foo
        volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Capacity:    
Access Modes:    
Events:
  FirstSeen    LastSeen    Count    From                SubObjectPath    Type        Reason            Message
  ---------    --------    -----    ----                -------------    --------    ------            -------
  8m        8m        1    persistent-volume-controller            Warning        ProvisioningFailed    Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  8m        8m        2    persistent-volume-controller            Warning        ProvisioningFailed    Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  7m        6m        5    persistent-volume-controller            Warning        ProvisioningFailed    Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  6m        6m        2    persistent-volume-controller            Warning        ProvisioningFailed    Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  6m        6m        1    persistent-volume-controller            Warning        ProvisioningFailed    Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  5m        5m        1    persistent-volume-controller            Warning        ProvisioningFailed    Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  5m        11s        23    persistent-volume-controller            Warning        ProvisioningFailed    Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists


Expected results:
The PVC should reuse the existing volume; see https://github.com/kubernetes/kubernetes/pull/38702

Additional info:
$ cat pvc.json 
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "#NAME#",
    "annotations": {
        "volume.alpha.kubernetes.io/storage-class": "foo"
    },
    "labels": {
        "name": "dynamic-pvc"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}

Comment 1 Liang Xia 2017-05-22 06:23:02 UTC
Created attachment 1280907
master logs

journalctl -u atomic-openshift-master > atomic-openshift-master.log

Comment 3 Matthew Wong 2017-05-23 22:08:18 UTC
The bug is that we check for the alreadyExists error too late: we should check it right after gce.service.Disks.Insert(gce.projectID, zone, diskToCreate).Do(), not after gce.waitForZoneOp(createOp, zone).
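
To illustrate the ordering problem described in this comment, here is a minimal, self-contained sketch. It does not use the real GCE client: fakeDisks, insert, waitForOp, errAlreadyExists, createDiskLate and createDiskEarly are all hypothetical names invented for the illustration, and the assumption (taken from the comment) is that the 409 is returned by the insert call itself rather than by the later wait.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the GCE 409 "alreadyExists" error (hypothetical).
var errAlreadyExists = errors.New("googleapi: Error 409: alreadyExists")

// fakeDisks is a hypothetical in-memory stand-in for the GCE disks API.
type fakeDisks struct {
	existing map[string]bool
}

// insert mimics the insert call: a duplicate name is rejected immediately,
// before any long-running operation is started.
func (f *fakeDisks) insert(name string) (op string, err error) {
	if f.existing[name] {
		return "", errAlreadyExists
	}
	f.existing[name] = true
	return "op-" + name, nil
}

// waitForOp mimics waiting for the operation returned by insert.
func (f *fakeDisks) waitForOp(op string) error { return nil }

// createDiskLate checks for alreadyExists only after waiting on the
// operation, so a 409 returned by insert itself surfaces as a failure.
func createDiskLate(f *fakeDisks, name string) error {
	op, err := f.insert(name)
	if err != nil {
		return err // the 409 escapes here, before the check below
	}
	err = f.waitForOp(op)
	if errors.Is(err, errAlreadyExists) {
		return nil
	}
	return err
}

// createDiskEarly checks for alreadyExists right after insert and treats
// it as success, so the existing disk is reused.
func createDiskEarly(f *fakeDisks, name string) error {
	op, err := f.insert(name)
	if errors.Is(err, errAlreadyExists) {
		return nil // disk is already there; reuse it
	}
	if err != nil {
		return err
	}
	return f.waitForOp(op)
}

func main() {
	// Simulate a disk that was created just before the master restarted.
	name := "kubernetes-dynamic-pvc-example"
	disks := &fakeDisks{existing: map[string]bool{name: true}}
	fmt.Println("late check: ", createDiskLate(disks, name))
	fmt.Println("early check:", createDiskEarly(disks, name))
}
```

In this sketch the late check returns the 409 to the caller, which matches the repeated ProvisioningFailed events in the description, while the early check returns nil so provisioning can continue with the existing disk.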

Comment 4 Matthew Wong 2017-05-24 18:46:18 UTC
https://github.com/openshift/origin/pull/14329

Comment 6 Liang Xia 2017-06-05 08:30:36 UTC
Verified that the fix works on OCP v3.6.86.

Using the same steps as in comment 0, all PVCs become bound.
After the PVCs are deleted, the PVs and volumes are deleted automatically.

Comment 8 errata-xmlrpc 2017-08-10 05:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716