Bug 1555062
| Summary: | glusterfs-storage-block: cannot create 100 PVCs | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Hongkai Liu <hongkliu> |
| Component: | heketi | Assignee: | Humble Chirammal <hchiramm> |
| Status: | CLOSED DEFERRED | QA Contact: | Neha Berry <nberry> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.0 | CC: | akamra, aos-bugs, aos-storage-staff, ekuric, hchiramm, hongkliu, kramdoss, madam, pkarampu, pprakash, prasanna.kalever, rcyriac, rhs-bugs, rtalur, sankarshan, storage-qa-internal |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-07-11 15:05:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1585985 | | |
| Bug Blocks: | | | |
Description (Hongkai Liu, 2018-03-13 21:56:57 UTC)
Hongkai Liu (comment 17):

I tested again with the latest versions of OCP and CNS. Details are recorded here: http://file.rdu.redhat.com/~hongkliu/test_result/bz1555062/20180622/terminal.txt

This time it reached about 400 gluster-block PVCs (much better than before), although the process is VERY slow. I was hoping it would take about 10 seconds to create a PVC.

Issue A: While creation was still in progress, I started deleting the created PVCs. Heketi then showed logs with the pattern in comment 14. Heketi log: http://file.rdu.redhat.com/~hongkliu/test_result/bz1555062/20180622/hkt.log
The terminal.txt above shows the output of "systemctl status gluster-blockd.service" in each glusterfs pod. All of them are DEAD.

Issue B: There are lots of "glusterd.service failed" messages in the heketi logs even before I started deleting. At those times I checked the service status in the pods and they were all running. Not sure where heketi got the failed status from.

Issue C: Watching the logs, as the number of PVCs grew there was a HUGE time gap between heketi finishing creation of the block volume and Kubernetes attaching the PVC to the pod. What happened in between? At the beginning of the test this span was almost instant.

Humble:

Thanks Hongkai for your latest update (comment 17). It looks like the initial issue reported in this bugzilla got a resolution and there is another issue now. Can we open a new bugzilla with the details in comment 17? That would help us track this better.

Hongkai Liu:

@Humble, probably I should not have listed the issues all together. Maybe we should focus on the creation of gluster-block PVCs in this bz. I retried again: PVC creation after about 90 PVCs is VERY slow (sometimes over 10 minutes even for ONE PVC). Is this expected? I do not think the current behavior is reasonable. In my opinion, the time spent on the 1st PVC and the 101st PVC should be more or less the same, but it is quite different in this test.

Prasanna, need you to look at:

    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: self.shell.log.info("Configuration saved to %s" % savefile)
    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: File "/usr/lib/python2.7/site-packages/configshell_fb/log.py", line 147, in info
    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: self._log('info', msg)
    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: File "/usr/lib/python2.7/site-packages/configshell_fb/log.py", line 116, in _log
    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: self.con.display(msg, error=error)
    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: File "/usr/lib/python2.7/site-packages/configshell_fb/console.py", line 160, in display
    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: self.raw_write(text, output=output)
    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: File "/usr/lib/python2.7/site-packages/configshell_fb/console.py", line 143, in raw_write
    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: output.flush()
    Jun 22 19:22:53 ip-172-31-27-102.us-west-2.compute.internal gluster-blockd[429]: IOError: [Errno 32] Broken pipe

Is this a backtrace in configshell? Is it fixed in any of the latest builds?

Raghavendra (comment 23):

Hongkai Liu, would it be possible for you to test at multiple layers?

1. First, see how many blockvolume create operations can be performed in a loop using the gluster-block CLI.
2. Do the same with heketi-cli.
3. Do the same with PV create operations.

This will help us identify where the bottleneck is. I can do it on my machine, but considering that you are using a cloud provider, my results might be very different from yours.

Hongkai Liu (comment 24):

Could you please provide the test scripts/commands for comment 23? Thanks.

Raghavendra (comment 25):

(In reply to Hongkai Liu from comment #24)
> Could you please provide the test scripts/commands for comment 23? Thanks.

1. Testing at the gluster-block layer: create one block PV using OpenShift, then use heketi volume list to find the block hosting volume. You will use the block hosting volume name in the next command.

    $ oc rsh glusterfs-pod
    $ gluster-block create <FILE_VOLUME_NAME>/<ANY_NAME_FOR_BLOCKVOL> ha <HA_COUNT> auth <ENABLE/DISABLE> prealloc full <BLOCKHOST1,BLOCKHOST2,BLOCKHOST3> <SIZE_INT>GiB --json

This command has to be performed in a loop, while also keeping the free size of the hosting volume in mind.

2. Let us skip the heketi layer for now.

3. Same test as you performed before.

Hongkai, I request you to test the above with the latest builds.
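For illustration, here is a minimal sketch of how the loop in step 1 of comment 25 could be driven from inside a glusterfs pod (after `oc rsh glusterfs-pod`). The hosting volume name, host IPs, ha/auth settings, count, and size below are placeholders rather than values from this bug, and the total requested size has to stay within the free space of the block hosting volume:

```bash
#!/bin/bash
# Sketch: create COUNT block volumes on one block hosting volume and time each create.
# All values below are placeholders; substitute the real hosting volume name
# (from heketi volume list) and the block host IPs of the cluster under test.
HOSTING_VOL=vol_xxxxxxxxxxxxxxxx          # placeholder block hosting volume name
HOSTS=192.0.2.11,192.0.2.12,192.0.2.13    # placeholder block host IPs
COUNT=300
SIZE=1GiB

for i in $(seq 1 "$COUNT"); do
  start=$(date +%s)
  gluster-block create "$HOSTING_VOL/blockvol-$i" \
      ha 3 auth disable prealloc full "$HOSTS" "$SIZE" --json
  echo "blockvol-$i created in $(( $(date +%s) - start ))s"
done
```

The per-iteration timings can then be compared against the PVC-level timings from the Kubernetes test to see which layer slows down as the count grows.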
Hongkai Liu (comment 27):

@Raghavendra, that is similar to Comment 8, isn't it? I will redo it with the latest CNS again.

Raghavendra:

(In reply to Hongkai Liu from comment #27)
> @Raghavendra, that is similar to Comment 8, isn't it?

Yes, indeed!

> I will redo it with the latest CNS again.

Thanks!

Hongkai Liu:

1. Testing at the gluster-block layer: about 23 minutes to finish creating 300 blocks and about 39 minutes to finish deleting them. Version info and command output: http://file.rdu.redhat.com/~hongkliu/test_result/bz1555062/20180702/version.terminal.output.txt

3. Same test as performed before: about 3 hours to finish creating 300 pods with PVCs of gluster-block. It was very slow from 38 to 43. http://file.rdu.redhat.com/~hongkliu/test_result/bz1555062/20180702/pvc.log
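For completeness, a minimal sketch of how step 3 above (PV create operations through Kubernetes) could be timed per PVC. The storage class name, count, and size are assumptions and would need to match the cluster used in these tests:

```bash
#!/bin/bash
# Sketch: create COUNT gluster-block PVCs one at a time and record how long each
# takes to reach the Bound phase. Storage class, count, and size are assumptions.
SC=glusterfs-storage-block   # assumed storage class name
COUNT=300
SIZE=1Gi

for i in $(seq 1 "$COUNT"); do
  start=$(date +%s)
  oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc-$i
spec:
  storageClassName: $SC
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: $SIZE
EOF
  # Poll until the PVC binds, so the per-PVC latency (heketi plus the
  # provisioner and attach path) shows up in the output.
  until [ "$(oc get pvc block-pvc-$i -o jsonpath='{.status.phase}')" = "Bound" ]; do
    sleep 2
  done
  echo "block-pvc-$i bound in $(( $(date +%s) - start ))s"
done
```

Comparing these per-PVC timings with the gluster-block-layer loop above should indicate whether the slowdown after roughly 90 PVCs comes from gluster-block itself or from the layers above it.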