Description of problem:
Some volume provisioning failed with error "the resource XXX already exists" after master restart

Version-Release number of selected component (if applicable):
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

How reproducible:
Very often, but not always

Steps to Reproduce:
1. Login to server and create a project.
$ oc login $server
$ oc new-project project-name
2. Keep creating pvc.
$ for i in {1..100} ; do
    cp pvc.json pvc-$i.json
    sed -i "s/#NAME#/pvc-$RANDOM/" pvc-$i.json
    oc create -f pvc-$i.json
    rm -f pvc-$i.json
  done
3. Restart master service.
$ systemctl restart atomic-openshift-master
4. Wait for several minutes.
5. Check pvc status.
$ oc describe pvc

Actual results:
$ oc describe pvc
Name:          pvc-959
Namespace:     lxiap
StorageClass:
Status:        Pending
Volume:
Labels:        name=dynamic-pvc
Annotations:   volume.alpha.kubernetes.io/storage-class=foo
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Capacity:
Access Modes:
Events:
  FirstSeen  LastSeen  Count  From                          SubObjectPath  Type     Reason              Message
  ---------  --------  -----  ----                          -------------  ----     ------              -------
  8m         8m        1      persistent-volume-controller                 Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  8m         8m        2      persistent-volume-controller                 Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  7m         6m        5      persistent-volume-controller                 Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  6m         6m        2      persistent-volume-controller                 Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  6m         6m        1      persistent-volume-controller                 Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  5m         5m        1      persistent-volume-controller                 Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  5m         11s       23     persistent-volume-controller                 Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists

Expected results:
The pvc should reuse existing volume, see https://github.com/kubernetes/kubernetes/pull/38702

Additional info:
$ cat pvc.json
{
    "kind": "PersistentVolumeClaim",
    "apiVersion": "v1",
    "metadata": {
        "name": "#NAME#",
        "annotations": {
            "volume.alpha.kubernetes.io/storage-class": "foo"
        },
        "labels": {
            "name": "dynamic-pvc"
        }
    },
    "spec": {
        "accessModes": [
            "ReadWriteOnce"
        ],
        "resources": {
            "requests": {
                "storage": "1Gi"
            }
        }
    }
}
Created attachment 1280907
master logs

journalctl -u atomic-openshift-master > atomic-openshift-master.log
The bug is that we check for the alreadyExists error too late: it should be checked right after gce.service.Disks.Insert(gce.projectID, zone, diskToCreate).Do() returns, not after gce.waitForZoneOp(createOp, zone).
https://github.com/openshift/origin/pull/14329
Verified that the fix works on OCP v3.6.86.

Using the same steps as in comment #0, all PVCs become bound. After the PVCs are deleted, the PVs and volumes are deleted automatically.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716