| Summary: | PVC fails to bind | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Bob Kozdemba <bkozdemb> |
| Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
| Status: | CLOSED WORKSFORME | QA Contact: | Jianwei Hou <jhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.2.1 | CC: | aos-bugs, bkozdemb, eparis, hchen, jsafrane, mturansk, plarsen, wmeng |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-29 14:13:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Description (Bob Kozdemba, 2016-08-24 23:12:10 UTC)
Bob, please attach this information:

- output of `oc get pv -o yaml`
- output of `oc get pvc -o yaml`
- full logs of the openshift master from creation of the PV to the reported failure, preferably with log level 5.

@mturansk, does it ring a bell?

Created attachment 1194047 [details]: output of 'journalctl -u atomic-openshift-master -le'
This morning I powered up the VM and the PVC from last night did eventually bind. So I deleted it and created a new one, and the new PVC has been in Pending status for several minutes (~11 minutes). If it eventually succeeds without a reboot, I'll report back. How can I increase the log level for the master service?

I think you need to add or edit this line in /etc/sysconfig/openshift-master:

OPTIONS=--loglevel=5

As for the binding problems, please wait at least 15 minutes, which is the periodic sync interval that should retry any failed binding. It's much better in 3.3, where it's I think 15 seconds and things recover from errors much faster.

Created attachment 1194056 [details]: journalctl-level5.txt

This is from a different system that exhibits the same behaviour.
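The log-level change suggested above amounts to editing one line and restarting the master. A minimal sketch, assuming the /etc/sysconfig/openshift-master path and unit name from this thread (on some 3.2 installs the file is /etc/sysconfig/atomic-openshift-master instead; adjust accordingly):

```shell
# Sketch: raise the OpenShift master log level to 5 for verbose binder logging.
# The config path and unit name are taken from this thread; adjust for your install.
conf=/etc/sysconfig/openshift-master

# Replace the OPTIONS line if present, otherwise append it:
if grep -q '^OPTIONS=' "$conf"; then
  sed -i 's/^OPTIONS=.*/OPTIONS=--loglevel=5/' "$conf"
else
  echo 'OPTIONS=--loglevel=5' >> "$conf"
fi

systemctl restart atomic-openshift-master
# Then capture the verbose logs:
journalctl -u atomic-openshift-master -le
```

The grep/sed idiom handles both the "line already exists" and "line missing" cases mentioned in the comment ("add or edit this line").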
The error message "spec is immutable once a claim has been bound to a volume" during PVC binder addClaim() is a sign of a race condition that is fixed in kubernetes 1.3 (openshift 3.3).

(In reply to Jan Safranek from comment #6)
> As for the binding problems, please wait at least 15 minutes, which is the
> periodic sync interval that should retry any failed binding. It's much
> better in 3.3, where it's I think 15 seconds and things recover from
> errors much faster.

Still pending after 4 hours. I'll try a reboot.

[cloud-user@master ~]$ oc get pvc
NAME      STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
claim1    Pending   pv0001    0                        4h
[cloud-user@master ~]$

(In reply to Bob Kozdemba from comment #9)
> (In reply to Jan Safranek from comment #6)
> > As for the binding problems, please wait at least 15 minutes, which is the
> > periodic sync interval that should retry any failed binding. It's much
> > better in 3.3, where it's I think 15 seconds and things recover from
> > errors much faster.
>
> Still pending after 4 hours. I'll try a reboot.
>
> [cloud-user@master ~]$ oc get pvc
> NAME      STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
> claim1    Pending   pv0001    0                        4h
> [cloud-user@master ~]$

The reboot did not help, but deleting and re-creating the PV and PVC worked. This environment is hosted by Ravello/Google. I will try a fresh install on my libvirt laptop and report back.

Please also attach the output of:

$ oc get pv -o yaml
$ oc get pvc -o yaml

Created attachment 1194332 [details]: pvc
Created attachment 1194333 [details]: pv

These are *input* PV/PVCs; I need to see their full status and spec *after* the bug occurs (i.e. once you can see "Error updating claim with VolumeName" in the logs). Really, please run:

$ oc get pv -o yaml
$ oc get pvc -o yaml

(In reply to Jan Safranek from comment #14)
> These are *input* PV/PVCs; I need to see their full status and spec *after*
> the bug occurs (i.e. once you can see "Error updating claim with VolumeName"
> in the logs).

Sorry about that. Next time it fails, I'll post the output.

Created attachment 1194504 [details]: journalctl -u atomic-openshift-master -le

Just had a failure.
# oc get pv pv02 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  creationTimestamp: 2016-08-26T19:35:10Z
  name: pv02
  resourceVersion: "9906"
  selfLink: /api/v1/persistentvolumes/pv02
  uid: 337e1ff8-6bc4-11e6-a36d-525400f2f8cb
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: logging-es-1
    namespace: logging
    resourceVersion: "9904"
    uid: 0222db92-6bd9-11e6-8283-525400f2f8cb
  nfs:
    path: /storage/pv/pv02
    server: allinone.koz.laptop
  persistentVolumeReclaimPolicy: Recycle
status:
  phase: Bound
[root@allinone ~]#
[root@allinone ~]# oc get pvc logging-es-1 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: 2016-08-26T22:04:06Z
  name: logging-es-1
  namespace: logging
  resourceVersion: "9904"
  selfLink: /api/v1/namespaces/logging/persistentvolumeclaims/logging-es-1
  uid: 0222db92-6bd9-11e6-8283-525400f2f8cb
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: logging-volume
status:
  phase: Pending
[root@allinone ~]#
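The half-bound state visible above (the PVC's spec.volumeName is set while status.phase is still Pending) can be spotted without dumping full YAML by querying just those two fields. A sketch, assuming `oc` access to the affected cluster and the namespace/claim names from the output above:

```shell
# Sketch (assumes oc is logged in to the affected cluster).
# A healthy bound claim prints its PV name followed by "Bound"; the
# half-bound state above prints "logging-volume Pending" instead.
oc get pvc logging-es-1 -n logging \
  -o jsonpath='{.spec.volumeName} {.status.phase}{"\n"}'

# List every claim still in Pending, with whatever volumeName it carries:
oc get pvc --all-namespaces \
  -o jsonpath='{range .items[?(@.status.phase=="Pending")]}{.metadata.namespace}/{.metadata.name} -> {.spec.volumeName}{"\n"}{end}'
```

Any line showing a Pending claim with a non-empty volumeName is a candidate for the delete-and-recreate workaround that Bob reports below.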
Thanks for the logs! Summary:

- pv02 is recycled ~5 minutes before it's bound to logging-es-1. This should not matter; posting it here just in case it does.

- "Synchronizing PersistentVolumeClaim[logging-es-1] for binding" tries to bind pv02 to logging-es-1. pv02 is bound just fine (valid claimRef + status), however PVC logging-es-1 already has volumeName: logging-volume. No clue how it got there.

Bob, what is the PV named "logging-volume"? I don't see it in the logs, nor do I see PVC logging-es-1 getting bound to it.

This is exactly the situation we solved with the volume controller rewrite in 3.3; however, I've never seen it "live" in 3.2. It should be pretty hard to get a PVC that is half-bound to a PV, and you seem to reproduce it pretty regularly. Can I get access to your cluster? Or get all PVs/PVCs and the full master log (from startup), even if it is long?

cc @mark, does it ring a bell? How can a PVC be bound to a PV (i.e. pvc.Spec.VolumeName is set) and still have pvc.Status.Phase = Pending in the old volume controller? It should be Bound.

(In reply to Jan Safranek from comment #19)
> cc @mark, does it ring a bell? How can a PVC be bound to a PV (i.e.
> pvc.Spec.VolumeName is set) and still have pvc.Status.Phase = Pending in the
> old volume controller? It should be Bound.

According to the error, there's a problem writing the data to etcd: "Forbidden: spec is immutable once a claim has been bound to a volume." This would indicate a locking/timestamp issue, or a race condition where multiple threads have different versions of the record being written out of sequence.

(In reply to Jan Safranek from comment #18)
> Thanks for the logs!
>
> Summary:
>
> - pv02 is recycled ~5 minutes before it's bound to logging-es-1. This should
> not matter; posting it here just in case it does.
>
> - "Synchronizing PersistentVolumeClaim[logging-es-1] for binding" tries to
> bind pv02 to logging-es-1. pv02 is bound just fine (valid claimRef +
> status), however PVC logging-es-1 already has volumeName: logging-volume.
> No clue how it got there.
>
> Bob, what is the PV named "logging-volume"? I don't see it in the logs,
> nor do I see PVC logging-es-1 getting bound to it.
>
> This is exactly the situation we solved with the volume controller rewrite
> in 3.3; however, I've never seen it "live" in 3.2. It should be pretty hard
> to get a PVC that is half-bound to a PV, and you seem to reproduce it
> pretty regularly. Can I get access to your cluster? Or get all PVs/PVCs
> and the full master log (from startup), even if it is long?

Hi Jan: I've been able to reproduce the problem in a cloud VM and on my laptop. Since I'm traveling the next few days, it would be best if I give you access to the cloud VM. Send your public ssh key to bkozdemb and I'll set you up.

Bob

I asked for access to the machines on Sep 9th privately, and Bob was not able to reproduce the issue since then. I am closing this bug; feel free to reopen it when you stumble upon it again.