Bug 1453078
| Summary: | Master service restart can cause volume/pv/pvc un-sync | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Liang Xia <lxia> |
| Component: | Storage | Assignee: | Matthew Wong <mawong> |
| Status: | CLOSED ERRATA | QA Contact: | Liang Xia <lxia> |
| Severity: | medium | Priority: | medium |
| Version: | 3.6.0 | CC: | aos-bugs, mawong, smunilla |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2017-08-10 05:25:32 UTC | Type: | Bug |
Created attachment 1280907 [details]
master logs

```
journalctl -u atomic-openshift-master > atomic-openshift-master.log
```

The bug is that we check for the alreadyExists error too late: the check should happen right after `gce.service.Disks.Insert(gce.projectID, zone, diskToCreate).Do()`, not after `gce.waitForZoneOp(createOp, zone)`.

Verified the fix works on OCP v3.6.86. Using the same steps as in comment 0, all PVCs become Bound. After the PVCs are deleted, the PVs and volumes are deleted automatically.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
Description of problem:
Some volume provisioning fails with the error "the resource XXX already exists" after a master restart.

Version-Release number of selected component (if applicable):
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

How reproducible:
Very often, but not always

Steps to Reproduce:
1. Log in to the server and create a project.

```
$ oc login $server
$ oc new-project project-name
```

2. Keep creating PVCs.

```
$ for i in {1..100} ; do
    cp pvc.json pvc-$i.json
    sed -i "s/#NAME#/pvc-$RANDOM/" pvc-$i.json
    oc create -f pvc-$i.json
    rm -f pvc-$i.json
  done
```

3. Restart the master service.

```
$ systemctl restart atomic-openshift-master
```

4. Wait for several minutes.

5. Check the PVC status.

```
$ oc describe pvc
```

Actual results:

```
$ oc describe pvc
Name:          pvc-959
Namespace:     lxiap
StorageClass:
Status:        Pending
Volume:
Labels:        name=dynamic-pvc
Annotations:   volume.alpha.kubernetes.io/storage-class=foo
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Capacity:
Access Modes:
Events:
  FirstSeen  LastSeen  Count  From                          Type     Reason              Message
  ---------  --------  -----  ----                          ----     ------              -------
  8m         8m        1      persistent-volume-controller  Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
  [the same ProvisioningFailed event repeats with the identical message]
  5m         11s       23     persistent-volume-controller  Warning  ProvisioningFailed  Failed to provision volume with StorageClass "": googleapi: Error 409: The resource 'projects/openshift-gce-devel/zones/us-central1-a/disks/kubernetes-dynamic-pvc-dd23a7dd-3e9d-11e7-8885-42010af00002' already exists, alreadyExists
```

Expected results:
The PVC should reuse the existing volume, see https://github.com/kubernetes/kubernetes/pull/38702

Additional info:

```
$ cat pvc.json
{
    "kind": "PersistentVolumeClaim",
    "apiVersion": "v1",
    "metadata": {
        "name": "#NAME#",
        "annotations": {
            "volume.alpha.kubernetes.io/storage-class": "foo"
        },
        "labels": {
            "name": "dynamic-pvc"
        }
    },
    "spec": {
        "accessModes": [ "ReadWriteOnce" ],
        "resources": {
            "requests": {
                "storage": "1Gi"
            }
        }
    }
}
```
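The expected behavior referenced above (kubernetes PR 38702) is that a 409 on creation should lead to reusing the existing disk when its parameters match the request. A minimal Go sketch of that idea, with hypothetical `insertDisk`/`getDisk` stand-ins for the cloud-provider calls and a `disk` struct reduced to the one field being compared:

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the googleapi 409 "alreadyExists" error.
var errAlreadyExists = errors.New("googleapi: Error 409: alreadyExists")

// disk models the few GCE PD fields the provisioner would compare.
type disk struct {
	name   string
	sizeGb int64
}

// provisionIdempotent sketches reuse-on-match: if creation fails because
// the disk already exists, look it up and treat a matching disk as
// success instead of failing the PVC forever.
func provisionIdempotent(req disk, insertDisk func(disk) error, getDisk func(string) (disk, bool)) error {
	err := insertDisk(req)
	if err == nil {
		return nil
	}
	if !errors.Is(err, errAlreadyExists) {
		return err
	}
	existing, ok := getDisk(req.name)
	if !ok {
		return err
	}
	if existing.sizeGb != req.sizeGb {
		return fmt.Errorf("disk %q exists with size %dGi, want %dGi", req.name, existing.sizeGb, req.sizeGb)
	}
	return nil // matching disk found: reuse it
}

func main() {
	store := map[string]disk{"kubernetes-dynamic-pvc-x": {name: "kubernetes-dynamic-pvc-x", sizeGb: 1}}
	err := provisionIdempotent(
		disk{name: "kubernetes-dynamic-pvc-x", sizeGb: 1},
		func(disk) error { return errAlreadyExists },
		func(name string) (disk, bool) { d, ok := store[name]; return d, ok },
	)
	fmt.Println(err) // <nil>
}
```

Under this scheme the repeated ProvisioningFailed events in the report would instead resolve to Bound claims, which is what the QA verification on v3.6.86 observed.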