Bug 1583168 - On creating PVCs in bulk, volume count mismatch between heketi and gluster backend (Gluster lists one extra volume)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: John Mulligan
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: OCS-3.11.1-Engineering-Proposed-BZs OCS-3.11.1-devel-triage-done
Reported: 2018-05-28 11:47 UTC by Neha Berry
Modified: 2019-03-12 20:09 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-12 20:09:48 UTC
Embargoed:



Comment 11 Raghavendra Talur 2018-09-20 14:15:37 UTC
Going through the gluster command history, I found only create commands related to the volume in question (mongodb-13).

[2018-05-28 14:19:07.157559]  : volume create fl_glusterfs_mongodb-13_0eccac84-6282-11e8-bce8-005056a52b66 replica 3 10.70.42.84:/var/lib/heketi/mounts/vg_ac1e7f8b95ff5e45dfc512ff80a39500/brick_059f53211e900cf48c2c1b6ec6a57292/brick 10.70.42.86:/var/lib/heketi/mounts/vg_0e56fba30532535400683bfba6418693/brick_16c4a50bda4066e6a22372fbd9be1e9a/brick 10.70.41.217:/var/lib/heketi/mounts/vg_3df896c762972fe426d32a583c98938d/brick_b09d463676bc7cf2aa568dd0861a2367/brick : SUCCESS
[2018-05-28 14:19:52.902926]  : volume start fl_glusterfs_mongodb-13_0eccac84-6282-11e8-bce8-005056a52b66 : SUCCESS
[2018-05-28 14:25:44.807127]  : volume create fl_glusterfs_mongodb-14_f7731709-6282-11e8-bce8-005056a52b66 replica 3 10.70.42.84:/var/lib/heketi/mounts/vg_ac1e7f8b95ff5e45dfc512ff80a39500/brick_88ef0b992801d59cd0aeea998faea381/brick 10.70.42.86:/var/lib/heketi/mounts/vg_0e56fba30532535400683bfba6418693/brick_9199142f5b8d9a044cd3d671e9942e5e/brick 10.70.41.217:/var/lib/heketi/mounts/vg_4e3c85737db8bb8de87e5e04465c37ef/brick_123e57134fbf9b8c9187eb0851a4d24b/brick : SUCCESS
[2018-05-28 14:25:55.191166]  : volume start fl_glusterfs_mongodb-14_f7731709-6282-11e8-bce8-005056a52b66 : SUCCESS
[2018-05-28 14:26:00.810355]  : volume create fl_glusterfs_mongodb-13_f77396c8-6282-11e8-bce8-005056a52b66 replica 3 10.70.42.84:/var/lib/heketi/mounts/vg_ea22c9a72381f27d14a7656721e62a0b/brick_80950cb457ab663b8e51ad4ed8b9f534/brick 10.70.42.86:/var/lib/heketi/mounts/vg_0e56fba30532535400683bfba6418693/brick_d018321a2ed997aef577f61f1568b8e0/brick 10.70.41.217:/var/lib/heketi/mounts/vg_4e3c85737db8bb8de87e5e04465c37ef/brick_1a8f98451ad975f958e25b99143df9a3/brick : SUCCESS
[2018-05-28 14:26:05.606277]  : volume start fl_glusterfs_mongodb-13_f77396c8-6282-11e8-bce8-005056a52b66 : SUCCESS
[2018-05-28 14:51:36.020415]  : v status fl_glusterfs_mongodb-13_0eccac84-6282-11e8-bce8-005056a52b66 : SUCCESS
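
The same check can be sketched as a small script: scan command-history lines for `volume create` entries and group them by claim name, so that a claim backed by more than one gluster volume stands out. This is only an illustration. The regular expression assumes the `<prefix>_glusterfs_<claim>_<uuid>` volume naming seen in the lines above, the function names are hypothetical, and the sample lines are the ones above with the brick lists shortened to `<bricks>`.

```python
import re
from collections import defaultdict

# Assumption: volume names follow <prefix>_glusterfs_<claim>_<uuid>,
# as in the command history quoted above.
CREATE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+:\s+volume create\s+"
    r"(?P<vol>\S+?_glusterfs_(?P<claim>\S+)"
    r"_(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}))\s"
)

# Lines from the history above; brick arguments shortened to <bricks>.
SAMPLE = [
    "[2018-05-28 14:19:07.157559]  : volume create fl_glusterfs_mongodb-13_0eccac84-6282-11e8-bce8-005056a52b66 replica 3 <bricks> : SUCCESS",
    "[2018-05-28 14:19:52.902926]  : volume start fl_glusterfs_mongodb-13_0eccac84-6282-11e8-bce8-005056a52b66 : SUCCESS",
    "[2018-05-28 14:25:44.807127]  : volume create fl_glusterfs_mongodb-14_f7731709-6282-11e8-bce8-005056a52b66 replica 3 <bricks> : SUCCESS",
    "[2018-05-28 14:26:00.810355]  : volume create fl_glusterfs_mongodb-13_f77396c8-6282-11e8-bce8-005056a52b66 replica 3 <bricks> : SUCCESS",
]


def creates_by_claim(lines):
    """Map claim name -> list of (timestamp, volume name) for each create."""
    creates = defaultdict(list)
    for line in lines:
        match = CREATE_RE.match(line)
        if match:
            creates[match.group("claim")].append(
                (match.group("ts"), match.group("vol"))
            )
    return creates


def duplicated_claims(lines):
    """Return only the claims that had more than one volume created."""
    return {
        claim: vols
        for claim, vols in creates_by_claim(lines).items()
        if len(vols) > 1
    }


if __name__ == "__main__":
    for claim, vols in duplicated_claims(SAMPLE).items():
        print(claim)
        for ts, vol in vols:
            print("  ", ts, vol)
```

On the sample lines, only `mongodb-13` is reported, with the two volumes created at 14:19:07 and 14:26:00.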

I also found the following warnings in the events:

29m       29m       1         mongodb-13.1532d51460c932f9            PersistentVolumeClaim                                 Warning   ProvisioningFailed      persistentvolume-controller                  Failed to provision volume with StorageClass "gluster-container": failed to create volume: failed to create volume: Get http://172.31.45.8:8080/queue/0438e64f9bff557bc3b33b2bb6112d22: dial tcp 172.31.45.8:8080: getsockopt: connection refused
29m       29m       1         mongodb-13.1532d5158e4479d3            PersistentVolumeClaim                                 Warning   ProvisioningFailed      persistentvolume-controller                  Failed to provision volume with StorageClass "gluster-container": failed to create volume: failed to create volume: Post http://172.31.45.8:8080/volumes: dial tcp 172.31.45.8:8080: getsockopt: connection refused

Noticing that heketi logs start only at 14:25,

```
Heketi 6.0.0
[heketi] INFO 2018/05/28 14:25:21 Loaded kubernetes executor
```

it can be assumed that the heketi pod restarted between 14:19:52.902926 and 14:25:55.191166. Hence, it is possible that heketi created the volume but failed before the provisioner queried the result. The throttling feature in heketi would reduce the occurrence of such bugs, but it does not fully fix them.

The real fix would be a handle that both the provisioner and heketi can use to identify requests, so that a retried request can be matched to a volume that was already created.
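
One way to picture such a handle is as an idempotency key: the provisioner generates an identifier once per claim and sends it with every attempt, and the server records the result under that identifier, so a retry after a lost response returns the existing volume instead of creating another. The sketch below is a toy in-memory model of that idea, not heketi's API. `VolumeServer`, `create_volume`, and `request_id` are hypothetical names, and a real server would need to persist the mapping so that it survives a restart.

```python
import uuid


class VolumeServer:
    """Toy stand-in for a volume manager (hypothetical, in-memory only)."""

    def __init__(self):
        self.volumes = {}      # volume name -> size in GB
        self.by_request = {}   # request_id -> volume name

    def create_volume(self, request_id, claim, size_gb):
        # A request_id seen before means this is a retry: return the
        # volume created earlier rather than creating a second one.
        if request_id in self.by_request:
            return self.by_request[request_id]
        name = f"vol_{claim}_{uuid.uuid4().hex}"
        self.by_request[request_id] = name
        self.volumes[name] = size_gb
        return name


def provision_with_retry(server, request_id, claim, size_gb):
    """Simulate a lost response: the first call succeeds on the server,
    the caller never sees the result, and it retries with the same handle."""
    server.create_volume(request_id, claim, size_gb)  # response "lost"
    return server.create_volume(request_id, claim, size_gb)


if __name__ == "__main__":
    # Same handle on both attempts: one volume.
    server = VolumeServer()
    handle = str(uuid.uuid4())
    provision_with_retry(server, handle, "mongodb-13", 1)
    print(len(server.volumes))

    # A fresh handle per attempt models the behaviour in this bug:
    # the retry is indistinguishable from a new request.
    server = VolumeServer()
    server.create_volume(str(uuid.uuid4()), "mongodb-13", 1)
    server.create_volume(str(uuid.uuid4()), "mongodb-13", 1)
    print(len(server.volumes))
```

With a shared handle the server ends up with one volume; with a new handle per attempt it ends up with two, which matches the extra volume seen on the gluster side.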

