Created attachment 1382512 [details]
logs from heketi pod

Description of problem:

I installed a gluster cluster (CRS) following these docs:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/container-native_storage_for_openshift_container_platform/#Deploy_CRS

I then deployed heketi on OCP:

cns-deploy -n storage-project -s ../id_gluster gluster-topology.json --admin-key makemestorage --user-key givemeinfo -w 1000000

It gave me a StorageClass I could add to OCP (I saw that it was using the "user" account rather than the admin account, but thought I'd test anyway).

So I added it, created a PVC using that StorageClass and noticed that it took a very long time to provision. I opened the heketi pod logs and saw that it was creating the volumes and creating the gluster bricks. I then checked through heketi-cli whether any volumes were registered in heketi, and saw a whopping 57 volumes for that single PVC.

I then logged on to the currently elected OCP master to check the controller logs and saw that the heketi API was responding with "Error: Administrator access required".

I then removed the PVC, deleted all the volumes (except the heketidb volume) and used heketi-cli to create a volume with the user credentials (not admin). I got the same error about administrator access being required, but having tailed the heketi pod logs I saw that it had responded with 401 Unauthorized and yet immediately started creating the volumes for my create-volume request.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
Read the description of the problem above.

Actual results:
heketi creates a volume although the credentials used do not have the permissions for it.

Expected results:
The volume should not be created.

Additional info:
glusterfs-rdma-3.8.4-54.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
glusterfs-cli-3.8.4-54.el7rhgs.x86_64
python-gluster-3.8.4-54.el7rhgs.noarch
glusterfs-libs-3.8.4-54.el7rhgs.x86_64
glusterfs-3.8.4-54.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
gluster-block-0.2.1-14.el7rhgs.x86_64
glusterfs-server-3.8.4-54.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.7.x86_64
glusterfs-geo-replication-3.8.4-54.el7rhgs.x86_64
glusterfs-api-3.8.4-54.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
heketi version 5.0.0-19
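For reproduction purposes, a PVC against a heketi-backed StorageClass looks roughly like the sketch below. This is only an illustration: the StorageClass name "glusterfs-storage", the PVC name, and the "heketi" deploymentconfig name are assumptions, not values recorded from the affected environment; the namespace and admin key are the ones used in the cns-deploy command above.

~~~~
# Hypothetical PVC against the heketi-backed StorageClass; names are placeholders.
cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
  namespace: storage-project
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: glusterfs-storage
EOF

# Watch heketi while the claim provisions (dc name assumed to be "heketi"):
oc logs -f -n storage-project dc/heketi
~~~~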
I tried to reproduce this issue quickly and could see that the volume creation is NOT happening in the backend. It gives the logs below and no volume creation takes place.

~~~~
[negroni] Started POST /volumes
[negroni] Completed 401 Unauthorized in 133.899µs
~~~~

Maybe the volume creation you noticed was happening for older requests in heketi's queue. Can you please reproduce this issue on a fresh setup with just one brand-new volume create request from heketi-cli?
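For reference, a single fresh volume-create request with the non-admin credentials, as suggested above, would look roughly like this. The heketi route hostname is a placeholder; the user and admin keys are the ones from the description.

~~~~
# Placeholder server URL; use the actual heketi route/service.
export HEKETI_CLI_SERVER=http://heketi-storage-project.example.com

# One brand-new request with the *user* (non-admin) key:
heketi-cli --user user --secret givemeinfo volume create --size=1

# Expected: a 401 / "Administrator access required" error and no new volume.
# Verify with the admin key that nothing was actually created:
heketi-cli --user admin --secret makemestorage volume list
~~~~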
Hi,

[negroni] Started GET /queue/0bb391c331634212c89960887a977210
[negroni] Completed 401 Unauthorized in 86.306µs
[heketi] INFO 2018/01/17 14:10:15 Creating brick 33a4e24986c6e32fe01e8954a8a7fdbf
[heketi] INFO 2018/01/17 14:10:15 Creating brick eb70b6bb5883ecdee2251ec4aa6a95cf
[heketi] INFO 2018/01/17 14:10:15 Creating brick ba36c08c06e4e3c93461c6394b7eff12
[sshexec] DEBUG 2018/01/17 14:10:15 /src/github.com/heketi/heketi/pkg/utils/ssh/ssh.go:176: Host: 192.168.1.7:22 Command: /bin/bash -c 'mkdir -p /var/lib/heketi/mounts/vg_0db132fd5508e6a3598e62a97fcf0e17/brick_eb70b6bb5883ecdee2251ec4aa6a95cf'

This is the output from my logs, and I am 100% sure that it's not an older request, because it was a new environment and I only had 1 or 2 PVC requests in the OpenShift cluster, not 50+.

Unfortunately I don't have a test cluster available at the moment to try this on.
(In reply to Takeshi Larsson from comment #7)
> Hi,
>
> [negroni] Started GET /queue/0bb391c331634212c89960887a977210
> [negroni] Completed 401 Unauthorized in 86.306µs
> [heketi] INFO 2018/01/17 14:10:15 Creating brick 33a4e24986c6e32fe01e8954a8a7fdbf
> [heketi] INFO 2018/01/17 14:10:15 Creating brick eb70b6bb5883ecdee2251ec4aa6a95cf
> [heketi] INFO 2018/01/17 14:10:15 Creating brick ba36c08c06e4e3c93461c6394b7eff12
> [sshexec] DEBUG 2018/01/17 14:10:15 /src/github.com/heketi/heketi/pkg/utils/ssh/ssh.go:176: Host: 192.168.1.7:22
> Command: /bin/bash -c 'mkdir -p /var/lib/heketi/mounts/vg_0db132fd5508e6a3598e62a97fcf0e17/brick_eb70b6bb5883ecdee2251ec4aa6a95cf'
>
> This is the output from my logs, and I am 100% sure that it's not an older
> request, because it was a new environment and I only had 1 or 2 PVC
> requests in the OpenShift cluster, not 50+.
>
> Unfortunately I don't have a test cluster available at the moment to try
> this on.

Takeshi, if even one previously failed request exists, it can actually result in 50+ requests, because the provisioner keeps retrying until the claim is satisfied. Since you don't have a setup to reproduce this issue, I think we should reopen it if someone hits it again. Unfortunately I am not able to reproduce this issue. What do you think?
Alright, sure. I'll put this on my todo list for now and reopen if I can recreate it. I did, however, reproduce this bug twice when I first hit the issue. Thanks for trying :)
(In reply to Takeshi Larsson from comment #9)
> Alright, sure. I'll put this on my todo list for now and reopen if I can
> recreate it.
> I did, however, reproduce this bug twice when I first hit the issue.
> Thanks for trying :)

One thing to note here is that it could also be specific to the setup you described. I see you passed an admin key and a user key to cns-deploy, and I am also interested in the StorageClass used for this PVC creation. I also see that you referred to a CRS setup. So I am not 100% sure there is no bug; as said above, it could be specific to the setup you followed, and if I follow the same steps I might hit it as well. What I will do now is remove the release flags from this bug and keep it open for some more time. If you or I are able to hit it again, we will revisit this bug. I need to check some more before I can say whether or not there is a bug.
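If the setup can be revisited, dumping the StorageClass would show which heketi account the provisioner is actually using. A rough sketch, with the StorageClass name as a placeholder:

~~~~
# Placeholder StorageClass name; substitute the one used for the PVC.
oc get storageclass glusterfs-storage -o yaml

# The parameters of interest for this bug are resturl, restuser and
# restuserkey (or secretName/secretNamespace): per the description they
# point at the "user" key rather than the admin key, which is what makes
# the 401 "Administrator access required" path relevant here.
~~~~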
This might be related to bz 1441708. We will most likely not be able to include a fix in cns-3.10, so I'm moving this out to cns-3.11 for re-review.