Bug 1534146
Summary: heketi volume create fails in spite of having sufficient free devices

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | krishnaram Karthick <kramdoss> |
| Component: | heketi | Assignee: | John Mulligan <jmulligan> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | krishnaram Karthick <kramdoss> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | cns-3.6 | CC: | aprajapa, hchiramm, jcall, kramdoss, madam, pprakash, rcyriac, rhs-bugs, rtalur, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-24 11:42:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description (krishnaram Karthick, 2018-01-13 14:44:25 UTC)

Created attachment 1380743 [details]
heketi_logs
Comment #6 (Michael Adam)

Usually, heketi would only start to create bricks etc. if it believes the space on the devices it has chosen is sufficient. So there are two circumstances under which these lvcreate operations could fail:

1) The really available space on the chosen device is pretty much exactly the size requested. Since heketi always allocates a little more (due to metadata requirements, and possibly a lot more if the snapshot factor is set), this little more could just be too much for the device. The bug here is that the original estimation in heketi's allocator/placer code is done with the input size, not with the space that would really be requested.

2) The free space recorded in the heketi db could have gone out of sync with gluster.

The sizes I see in the paste and log do actually rather point to the second case, because the diff between the requested and the available extents is quite high.

==> Please check whether the free space info matches the info on the gluster side.

I also note that I am surprised that heketi does not seem to over-allocate at all in this case, not even for the metadata.

All that said, with CNS 3.9 we have introduced a retry mechanism which should in this case let heketi try different device constellations if the previous one fails at the executor level.

==> Could you try whether this is still an issue with 3.9?

(In reply to Michael Adam from comment #6)
> 1) The really available space on the chosen device is pretty much exactly
> the size requested. Since heketi always allocates a little more (due to
> metadata requirements, and possibly a lot more if the snapshot factor is
> set), this little more could just be too much for the device.
> The bug here is that the original estimation in heketi's allocator/placer
> code is done with the input size, not with the space that would really be
> requested.

Unless I'm overlooking something, the final decision whether a particular brick will fit on a device includes (an estimate of) the metadata overhead. See:

https://github.com/heketi/heketi/blob/master/apps/glusterfs/device_entry.go#L368
https://github.com/heketi/heketi/blob/master/apps/glusterfs/device_entry.go#L407

It's possible that this underestimates the amount of space LVM needs from the underlying device, though.

> 2) The free space recorded in the heketi db could have gone out of sync
> with gluster.
>
> The sizes I see in the paste and log do actually rather point to the
> second case, because the diff between the requested and the available
> extents is quite high.
>
> ==> Please check whether the free space info matches the info on the
> gluster side.
>
> I also note that I am surprised that heketi does not seem to over-allocate
> at all in this case, not even for the metadata.
>
> All that said, with CNS 3.9 we have introduced a retry mechanism which
> should in this case let heketi try different device constellations if the
> previous one fails at the executor level.
>
> ==> Could you try whether this is still an issue with 3.9?

Agreed.
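Failure mode 1 above hinges on comparing the device's free space against the padded size (snapshot factor plus metadata overhead) rather than the raw requested size. A minimal sketch of that decision, with illustrative names and an assumed 0.5% thin-pool metadata overhead; this is not heketi's actual code (that lives in the device_entry.go files linked above), just the shape of the check being discussed:

```go
package main

import "fmt"

// tpMetadataPercent is an assumed thin-pool metadata overhead of ~0.5%
// of the pool size; the real overhead depends on LVM configuration.
const tpMetadataPercent = 0.5

// requiredKiB estimates the total space to reserve for a brick: the
// requested size scaled by the snapshot factor, plus the estimated
// thin-pool metadata overhead.
func requiredKiB(requestedKiB uint64, snapFactor float64) uint64 {
	poolSize := uint64(float64(requestedKiB) * snapFactor)
	metadata := uint64(float64(poolSize) * tpMetadataPercent / 100.0)
	return poolSize + metadata
}

// fits makes the decision described in the comment: compare the padded
// size, not the raw input size, against the device's free space.
func fits(freeKiB, requestedKiB uint64, snapFactor float64) bool {
	return requiredKiB(requestedKiB, snapFactor) <= freeKiB
}

func main() {
	// A device whose free space equals the raw request exactly: the raw
	// size would fit, but the padded size does not (case 1 in comment #6).
	free := uint64(100 * 1024 * 1024) // 100 GiB expressed in KiB
	fmt.Println(fits(free, free, 1.0))           // false: metadata pushes it over
	fmt.Println(fits(free, free-1024*1024, 1.0)) // true: 99 GiB leaves headroom
}
```

If the allocator instead tested `requestedKiB <= freeKiB`, the placement would succeed and the subsequent lvcreate would fail on the node, which is the symptom reported here.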