Bug 1559834

Summary: Created 219 volumes of 1 GB each but space utilized is greater than 1 TB
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rachael <rgeorge>
Component: heketi
Assignee: Michael Adam <madam>
Status: CLOSED DUPLICATE
QA Contact: Rachael <rgeorge>
Severity: high
Docs Contact:
Priority: unspecified
Version: cns-3.9
CC: hchiramm, kramdoss, pprakash, rcyriac, rhs-bugs, rtalur, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-23 20:28:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1641915

Description Rachael 2018-03-23 11:41:10 UTC
Description of problem:
A script was run to create 300 PVCs of 1 GB each. Each node has two devices, one of 1 TB and the other of 50 GB. After 219 volumes were created, the heketi logs show a no-space error, and heketi topology info shows 0 free space across all devices on all three nodes.


Version-Release number of selected component (if applicable):
rhgs-volmanager-rhel7   v3.9.0

How reproducible:

Actual results:
PVC creation fails with a no-space error after 219 volumes

Expected results:
All 300 PVC creations succeed, since the requested capacity fits on the devices

Comment 2 Rachael 2018-03-23 11:51:34 UTC
Logs are available here: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1559834/log/

Comment 3 Rachael 2018-03-26 13:58:40 UTC
Updated the logs with heketi.db and script used for pvc creation:  http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1559834/log/

Comment 4 Raghavendra Talur 2018-03-27 08:52:39 UTC
Looking through the heketi logs, I see

$ grep "Started async operation: Create Volume" heketi.log  | wc -l
1062
$ grep "Started POST /volumes" heketi.log   | wc -l
21624

I think this means that heketi accepted 1062 requests for volume creation, while 21624 requests reached negroni for volume create or volume expand. Either that many PVC requests were actually made, or the OpenShift storage provisioner issued that many through its retry mechanism.

I need logs from provisioner to debug further.
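A rough back-of-the-envelope check suggests why the devices read as full. This is only a sketch, assuming 1 TB ≈ 1000 GB and that every accepted create operation, including ones that later failed or were retried, reserves a 1 GB brick on each of the three nodes for a replica-3 volume:

```shell
# Assumption: each accepted "Create Volume" operation reserves one 1 GB brick
# per node (replica 3), and reservations from pending/failed operations are
# not released. 1 TB is taken as 1000 GB here.
accepted=1062                 # "Started async operation: Create Volume" count above
per_node_gb=$((1000 + 50))    # 1 TB device + 50 GB device per node
reserved_gb=$((accepted * 1)) # 1 GB reserved per accepted volume per node
echo "reserved ${reserved_gb} GB vs ${per_node_gb} GB capacity per node"
if [ "$reserved_gb" -gt "$per_node_gb" ]; then
    echo "reservations alone exceed per-node capacity"
fi
```

Under those assumptions the 1062 accepted operations alone would reserve more than the ~1050 GB available per node, which would explain topology info reporting 0 free space even though only 219 volumes actually exist.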

Comment 5 Raghavendra Talur 2018-03-27 13:26:53 UTC
oc describe pod heketi.....


  Type     Reason         Age                  From                                        Message                     
  ----     ------         ----                 ----                                        -------                     
  Warning  InspectFailed  1h (x19 over 5h)     kubelet, dhcp47-160.lab.eng.blr.redhat.com  Failed to inspect image "rhgs3/rhgs-volmanager-rhel7:v3.9.0": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  Unhealthy      59m (x380 over 23h)  kubelet, dhcp47-160.lab.eng.blr.redhat.com  Liveness probe failed: Get http://10.129.0.8:8080/hello: dial tcp 10.129.0.8:8080: getsockopt: connection refused
  Warning  Failed         38m (x55 over 6h)    kubelet, dhcp47-160.lab.eng.blr.redhat.com  Error: context deadline exceeded
  Normal   Pulled         34m (x257 over 23h)  kubelet, dhcp47-160.lab.eng.blr.redhat.com  Container image "rhgs3/rhgs-volmanager-rhel7:v3.9.0" already present on machine
  Normal   Killing        13m (x353 over 23h)  kubelet, dhcp47-160.lab.eng.blr.redhat.com  Killing container with id docker://heketi:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy      9m (x2760 over 23h)  kubelet, dhcp47-160.lab.eng.blr.redhat.com  Readiness probe failed: Get http://10.129.0.8:8080/hello: dial tcp 10.129.0.8:8080: getsockopt: connection refused
  Normal   Created        3m (x194 over 23h)   kubelet, dhcp47-160.lab.eng.blr.redhat.com  Created container
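The liveness and readiness probes in the events above poll heketi's /hello endpoint. The same check can be reproduced by hand (a sketch; 10.129.0.8 is the pod IP from the events and is only reachable from inside the cluster):

```shell
# Manually reproduce the check the kubelet performs against the heketi pod.
# 10.129.0.8 is the pod IP from the probe failures above; outside the cluster
# this is expected to fail the same way the probes do.
curl -sf --max-time 2 http://10.129.0.8:8080/hello \
    && echo "heketi is responsive" \
    || echo "probe would fail"
```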

Comment 6 Rachael 2018-03-27 13:36:47 UTC
Logs from provisioner added to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1559834/log/

Comment 19 Raghavendra Talur 2019-01-23 20:28:50 UTC

*** This bug has been marked as a duplicate of bug 1554467 ***