Bug 1507628
| Summary: | GlusterFS registry PVC not binding when default StorageClass specified | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hongkai Liu <hongkliu> |
| Component: | Installer | Assignee: | Jose A. Rivera <jarrpa> |
| Status: | CLOSED ERRATA | QA Contact: | Johnny Liu <jialiu> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.7.0 | CC: | ansverma, aos-bugs, hongkliu, jokerman, mifiedle, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | 3.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-07-30 19:09:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description Hongkai Liu 2017-10-30 18:26:38 UTC
It might be related to the fact that the default storage class on the cluster with AWS instances is aws gb2.

Hongkai Liu:
(In reply to Hongkai Liu from comment #4)
> It might be related to the fact that the default storage class on the
> cluster with AWS instances is aws gb2.

Correction: gp2

Jose A. Rivera:
To start, I'll note that use of this playbook has not been tested and is currently not supported. That doesn't mean I won't try to figure this out. :)

Please provide the output of "oc describe pvc registry-glusterfs-claim". Also provide the output of "oc get pv" and "oc describe pv registry-glusterfs-volume" if said volume exists. Thanks.

Hongkai Liu:
Please check the attachment: terminal output. That probably rings some bells already; at least, the output of "oc describe pvc registry-glusterfs-claim" is in there.

BTW, what is the difference between config.yml and registry.yml? The README says the latter has the same behavior as the former, which confuses me. :)

Jose A. Rivera:
Ah, right, I did actually read through that. I was evidently unable to keep the huge volume of info between the three attachments straight. :) Still, the other outputs would help.

See if you can do the following:

1. Get a YAML definition for the registry-glusterfs-claim PVC: "oc get pvc registry-glusterfs-claim -o yaml"
2. Delete the current PVC.
3. Remove the metadata except for the name.
4. Provide a storageClassName parameter of "" (empty string).
5. "oc create" the modified PVC YAML file.

config.yml will set up a GlusterFS cluster managed by heketi and (by default) create a StorageClass that will use it. registry.yml will set up a GlusterFS cluster managed by heketi without a StorageClass (by default) AND it will create a volume that is intended for use as storage for a hosted registry. registry.yml uses all the same Ansible as config.yml with slightly different defaults and then adds a few more tasks on top of that.

Hongkai Liu:
Today I did not see the 2nd PVC, registry-glusterfs-claim, but registry-claim is still there, so I ran the commands against that PVC. It seems that PVC is bound after the modification.

How do we fix the playbook? And can we expect the PVC to be attached to the docker-registry pod after running the registry.yml playbook, or does it just create the PVC and leave us to attach it manually?
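The empty string in step 4 is the crux of the bug: when spec.storageClassName is omitted, the DefaultStorageClass admission plugin fills in the cluster's default class (gp2 on this AWS cluster), so the claim waits on the EBS dynamic provisioner instead of binding to the pre-created GlusterFS PV. An explicit "" asks for a PV with no class. Side by side, trimmed to the relevant fields:

```yaml
# As the playbook creates it: storageClassName omitted, so the
# DefaultStorageClass admission plugin injects "gp2" (which is why the
# retrieved object below shows storageClassName: gp2) and the claim
# sits in Pending waiting on the AWS EBS provisioner.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: registry-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
---
# With an explicit empty string: no class is injected, and the claim
# binds statically to the pre-provisioned GlusterFS PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: registry-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: ""
```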
Hongkai Liu:

```
$ oc get pvc
NAME             STATUS    VOLUME    CAPACITY   ACCESSMODES   STORAGECLASS   AGE
registry-claim   Pending                                      gp2            10m

$ oc get pvc registry-claim -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
  creationTimestamp: 2017-10-31T15:20:03Z
  name: registry-claim
  namespace: default
  resourceVersion: "15700"
  selfLink: /api/v1/namespaces/default/persistentvolumeclaims/registry-claim
  uid: f809fb84-be4e-11e7-820e-02431c970084
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: gp2
status:
  phase: Pending

$ oc get pvc registry-claim -o yaml > registry-claim.yaml
$ vi registry-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: registry-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: ""

$ oc delete pvc registry-claim
persistentvolumeclaim "registry-claim" deleted
$ oc create -f registry-claim.yaml
persistentvolumeclaim "registry-claim" created

$ oc get pvc -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: 2017-10-31T15:35:46Z
    name: registry-claim
    namespace: default
    resourceVersion: "17488"
    selfLink: /api/v1/namespaces/default/persistentvolumeclaims/registry-claim
    uid: 2a459d9a-be51-11e7-820e-02431c970084
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 5Gi
    storageClassName: ""
    volumeName: registry-volume
  status:
    accessModes:
    - ReadWriteMany
    capacity:
      storage: 5Gi
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc get pv -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: 2017-10-31T15:20:01Z
    name: registry-volume
    namespace: ""
    resourceVersion: "17486"
    selfLink: /api/v1/persistentvolumes/registry-volume
    uid: f6e5d6d0-be4e-11e7-820e-02431c970084
  spec:
    accessModes:
    - ReadWriteMany
    capacity:
      storage: 5Gi
    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: registry-claim
      namespace: default
      resourceVersion: "17484"
      uid: 2a459d9a-be51-11e7-820e-02431c970084
    glusterfs:
      endpoints: glusterfs-registry-endpoints
      path: glusterfs-registry-volume
    persistentVolumeReclaimPolicy: Retain
  status:
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```

Jose A. Rivera:
The general fix is to provide a storageClassName of "" (empty string) in the playbooks. I do not know off the top of my head how to do this. The infra storage goes through a lot of layers, so I'd have to look carefully and see where the fix should go. I'm updating the BZ title to reflect the specific problem. I'll try to have someone look into this this week.

Hongkai Liu:
Hi Jose, do you have any update on this? Thanks.

Jose A. Rivera:
Not at this time. I'll try to get to it this week. Next week I'll be traveling, so I can't guarantee my time then.

Hongkai Liu:
(In reply to Jose A. Rivera from comment #12)
> Not at this time. I'll try to get to it this week. Next week I'll be
> traveling, so I can't guarantee my time then.

Understood. Thanks for the update.

Jose A. Rivera:
I'm having difficulty replicating this issue. Can you still hit it on the latest OCP 3.10 builds? If so, can you give me the output of "oc get pvc <PVC> -o yaml", where PVC is the name of one of the non-binding PVCs?

Hongkai Liu:
Hi Jose, I do not have a working env for 3.10 yet. Do you think it also makes sense to give it a try on 3.9? Thanks.

Jose A. Rivera:
Definitely does.

Hongkai Liu:
Cool. I will test it today with 3.9.
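Conceptually, the playbook-side fix Jose outlines above is for the PVC template to emit an explicit storageClassName, including when the value is the empty string, rather than omitting the field. A hypothetical Jinja2 fragment along those lines (the "claim" variable and its attributes are illustrative stand-ins, not the actual openshift-ansible code):

```yaml
# Hypothetical sketch only: "claim" is an illustrative variable, not the
# real openshift-ansible data structure.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "{{ claim.name }}"
spec:
  accessModes:
  - "{{ claim.access_mode }}"
  resources:
    requests:
      storage: "{{ claim.capacity }}"
{% if claim.storageclass is defined %}
  # Emitted even when the value is "", which is what keeps the default
  # StorageClass from being injected into the registry claim.
  storageClassName: "{{ claim.storageclass }}"
{% endif %}
```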
Hongkai Liu:
The playbook failed now. Will attach the ansible log and inventory file later.

```
$ git log --oneline -1
b2999772f (HEAD -> release-3.9, tag: openshift-ansible-3.9.17-1, origin/release-3.9) Automatic commit of package [openshift-ansible] release [3.9.17-1].

# yum list installed | grep openshift
atomic-openshift.x86_64    3.9.13-1.git.0.e0acf74.el7

$ dnf list ansible
Last metadata expiration check: 25 days, 3:21:13 ago on Sat 10 Mar 2018 12:19:47 PM UTC.
Installed Packages
ansible.noarch    2.4.3.0-1.fc27    @updates

$ ansible-playbook -i /tmp/2.file openshift-ansible/playbooks/openshift-glusterfs/registry.yml

TASK [openshift_persistent_volumes : include_tasks] **************************************************************************
task path: /home/fedora/openshift-ansible/roles/openshift_persistent_volumes/tasks/main.yml:39
included: /home/fedora/openshift-ansible/roles/openshift_persistent_volumes/tasks/pvc.yml for ec2-34-215-64-176.us-west-2.compute.amazonaws.com

TASK [openshift_persistent_volumes : Deploy PersistentVolumeClaim definitions] ***********************************************
task path: /home/fedora/openshift-ansible/roles/openshift_persistent_volumes/tasks/pvc.yml:2
fatal: [ec2-34-215-64-176.us-west-2.compute.amazonaws.com]: FAILED! => {
    "changed": false,
    "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'storageclass'"
}
```

Jose A. Rivera:
Ah, I think I get it. Created the following PR: https://github.com/openshift/openshift-ansible/pull/7778

Hongkai Liu:
Can you backport it to the 3.9 branch as well? I am not sure what is going to happen when I run the master playbook against a 3.9 cluster. Thanks.

Jose A. Rivera:
I'll backport it once it gets merged to master.

Jose A. Rivera:
Cherry-pick PR created: https://github.com/openshift/openshift-ansible/pull/7782

Hongkai Liu:
The failing task passed. However, the PVC does not seem to be attached to the registry pod.

Background of the test: the existing cluster had its registry pod using aws-s3 as the storage backend before the playbook ran.

```
$ git log --oneline -1
62289a155 (HEAD -> bz1507628) GlusterFS: Fix missing parameter for registry PVC

root@ip-172-31-30-135: ~ # oc get pod
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-4f4rl    1/1       Running   0          7h
registry-console-1-p4pkc   1/1       Running   0          7h
router-1-q8wq9             1/1       Running   0          7h

root@ip-172-31-30-135: ~ # oc get pvc
NAME                       STATUS    VOLUME                      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
registry-glusterfs-claim   Bound     registry-glusterfs-volume   100Gi      RWX                           7m

root@ip-172-31-30-135: ~ # oc volumes docker-registry-1-4f4rl
error: resource(s) were provided, but no name, label selector, or --all flag specified

root@ip-172-31-30-135: ~ # oc volumes pod docker-registry-1-4f4rl
pods/docker-registry-1-4f4rl
  empty directory as registry-storage
    mounted at /registry
  secret/registry-certificates as registry-certificates
    mounted at /etc/secrets
  secret/registry-config as docker-config
    mounted at /etc/registry
  secret/registry-token-lzqtl as registry-token-lzqtl
    mounted at /var/run/secrets/kubernetes.io/serviceaccount
```

Jose A. Rivera:
Hey, progress! Can you provide the log output from the successful run? I haven't dealt with a registry that was using S3 storage before, so I don't know how that is going to work... the Ansible I wrote assumes that the registry is writing all data to /registry inside the container, hence if swapcopy is on I just do an rsync from that directory to a local directory where the GlusterFS volume is mounted. I'm not sure how this would be done with S3. We may need to just not support that for now.
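With the claim bound but the registry pod still mounting an emptyDir as registry-storage, attaching the volume by hand comes down to swapping that volume on the deployment config. A sketch, using the default dc name and the claim name from the session above (untested here; it triggers a new registry deployment):

```
# Replace the emptyDir registry-storage volume with the bound GlusterFS
# claim; this rolls out a new docker-registry deployment.
$ oc set volume dc/docker-registry --add --overwrite \
    --name=registry-storage --type=persistentVolumeClaim \
    --claim-name=registry-glusterfs-claim
```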
Hongkai Liu:
I did not collect the ansible log the first time. I had to delete the glusterfs project and then rerun the playbook; I did not know how else to recover. Hopefully the rerun can reveal the details you want to see; let me know otherwise. The log of the 2nd run will be attached.

From what I know, aws s3 is configured for the registry by a configMap mounted at /etc/registry/config.yml. I believe the playbook has to change that configMap if we want to support that case (maybe even for other cases).

Jose A. Rivera:
....ohhh, I see what's going on. I never had the chance to really flesh out the swap and swapcopy implementations, so I don't think they will work as they are now. However, at this point we have resolved the initial problem of the BZ: you now have a GlusterFS volume, PV, and PVC ready to be used by an integrated registry. The catch is that you have to manually perform the volume swap and any data copying yourself. This is, for now, working as designed. If you want to pursue that feature further, I'd ask you to create a new RFE BZ for it and we'll take it in as we have time.

Hongkai Liu:
Hi Jose, I might have misunderstood you here. The PV and PVC are created and bound, BUT the PVC is not attached to the registry pod; it is an "empty directory". See my comment 26 above.

Jose A. Rivera:
Yes. The user is left to attach the PVC to the pods themselves. This is as designed. Your original problem was merely that the PVC was stuck in Pending status. The swap and swapcopy options are not officially documented anywhere (the GitHub README doesn't count!), have not been tested by me, and are thus not supported.

Hongkai Liu:
OK, I see now. Probably we can use the same logic for the configuration file of the registry. I am fine with both cases. Thanks for the fix.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816