Bug 1534146
| Summary: | heketi volume create fails inspite of having sufficient free devices | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | krishnaram Karthick <kramdoss> | ||||
| Component: | heketi | Assignee: | John Mulligan <jmulligan> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | krishnaram Karthick <kramdoss> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | cns-3.6 | CC: | aprajapa, hchiramm, jcall, kramdoss, madam, pprakash, rcyriac, rhs-bugs, rtalur, storage-qa-internal | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-09-24 11:42:13 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
krishnaram Karthick
2018-01-13 14:44:25 UTC
Created attachment 1380743 [details]
heketi_logs
Usually, heketi would only start to create bricks etc if it
believes the space on the devices it has chosen is sufficient.
So there are two circumstances under which these lvcreate
operations could fail:
1) The really available space on the chosen device is pretty
much exactly the size requested. Since heketi always allocates
a little more (due to metadata requirements, and possibly a lot
more if the snapshot factor is set), this little more could just
be too much for the device. The bug here is that the original
estimation in heketi's allocator/placer code is done with the
input size, not with the space that would really be requested.
2) The free space recorded in the heketi db could have gone out
of sync with gluster.
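
To illustrate case 1, here is a minimal sketch of the difference between checking the input size and checking the space that would really be requested. This is not heketi's code (heketi is written in Go). The function names, the 4 MiB extent size, and the 0.5% metadata estimate are assumptions made up for this illustration.

```python
import math

EXTENT_KIB = 4096  # assumed extent size of 4 MiB, for illustration only


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def round_up_to_extent(size_kib, extent_kib=EXTENT_KIB):
    """Round a size up to a whole number of extents."""
    return ceil_div(size_kib, extent_kib) * extent_kib


def requested_kib(brick_kib, snapshot_factor=1.0, metadata_per_mille=5):
    """Space really asked of the device: pool data plus metadata.

    metadata_per_mille is a hypothetical stand-in for the metadata estimate.
    """
    pool_kib = round_up_to_extent(math.ceil(brick_kib * snapshot_factor))
    meta_kib = round_up_to_extent(ceil_div(pool_kib * metadata_per_mille, 1000))
    return pool_kib + meta_kib


def fits_by_input_size(brick_kib, free_kib):
    """The estimate described in case 1: compares only the input size."""
    return brick_kib <= free_kib


def fits_by_requested_size(brick_kib, free_kib, snapshot_factor=1.0):
    """Compares what would actually be requested from the device."""
    return requested_kib(brick_kib, snapshot_factor) <= free_kib
```

Under these assumptions, a 100 GiB brick on a device with exactly 100 GiB free passes the first check but fails the second, which is the situation where a later lvcreate would run out of extents.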
The sizes I see in the paste and log do actually rather
point to the second case, because the diff between the
requested and the available extents is quite high.
==> Please check whether the free space info matches
the info on the gluster side.
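
One possible way to do that comparison is sketched below. It assumes the free space per volume group has been collected on the gluster node with something like `vgs --noheadings --units k --nosuffix -o vg_name,vg_free`, and that the values recorded by heketi have been gathered separately into a dict keyed by volume group name. The helper names and the sample layout are hypothetical.

```python
def parse_vgs_free(output):
    """Parse 'vg_name vg_free' lines (KiB, no suffix) into a dict."""
    free = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, value = parts
        free[name] = int(float(value))
    return free


def find_drift(recorded_kib, actual_kib, tolerance_kib=0):
    """Return {vg: (recorded, actual)} where the two views disagree.

    A volume group missing on the node is reported with actual=None.
    """
    drift = {}
    for vg, recorded in recorded_kib.items():
        actual = actual_kib.get(vg)
        if actual is None or abs(recorded - actual) > tolerance_kib:
            drift[vg] = (recorded, actual)
    return drift
```

Any entry returned by `find_drift` would point to the second case, where the db and the node disagree about the free space.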
I also note that I am surprised that heketi does not seem
to over-allocate at all in this case, not even for the metadata.
All that said, with CNS 3.9 we have introduced a retry mechanism
which should in this case let heketi try different device
constellations if the previous one fails at the executor level.
==> Could you try whether this is an issue still with 3.9?
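
The idea behind such a retry can be sketched roughly as follows. This is only an illustration of the general pattern, not the mechanism as implemented in heketi. `ExecutorError`, `create_with_retry`, and the `create_bricks` callback are hypothetical names.

```python
from itertools import islice


class ExecutorError(Exception):
    """Raised when the (hypothetical) executor step fails on a node."""


def create_with_retry(candidates, create_bricks, max_tries=3):
    """Try successive device sets until brick creation succeeds.

    candidates: iterable of device sets proposed by the placer.
    create_bricks: callable that raises ExecutorError on failure.
    """
    errors = []
    for devices in islice(candidates, max_tries):
        try:
            return create_bricks(devices)
        except ExecutorError as exc:
            # Remember the failure and move on to the next device set.
            errors.append((devices, exc))
    raise ExecutorError(f"all {len(errors)} placement attempts failed")
```

With this pattern, a failed lvcreate on one device set no longer fails the whole volume create as long as another candidate set succeeds within `max_tries`.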
(In reply to Michael Adam from comment #6)
> Usually, heketi would only start to create bricks etc if it
> believes the space on the devices it has chosen is sufficient.
> So there are two circumstances under which these lvcreate
> operations could fail:
>
> 1) The really available space on the chosen device is pretty
> much exactly the size requested. Since heketi always allocates
> a little more (due to metadata requirements, and possibly a lot
> more if the snapshot factor is set), this little more could just
> be too much for the device. The bug here is that the original
> estimation in heketi's allocator/placer code is done with the
> input size, not with the space that would really be requested.
>

Unless I'm overlooking something, the final decision on whether a particular brick will fit on a device includes (an estimate of) the metadata overhead. See
https://github.com/heketi/heketi/blob/master/apps/glusterfs/device_entry.go#L368
and
https://github.com/heketi/heketi/blob/master/apps/glusterfs/device_entry.go#L407

It's possible that this underestimates the amount of space LVM needs from the underlying device, though.

> 2) The free space recorded in the heketi db could have gone out
> of sync with gluster.
>
> The sizes I see in the paste and log do actually rather
> point to the second case, because the diff between the
> requested and the available extents is quite high.
>
> ==> Please check whether the free space info matches
> the info on the gluster side.
>
> I also note that I am surprised that heketi does not seem
> to over-allocate at all in this case, not even for the metadata.
>
>
> All that said, with CNS 3.9 we have introduced a retry mechanism
> which should in this case let heketi try different device
> constellations if the previous one fails at the executor level.
>
> ==> Could you try whether this is an issue still with 3.9?

Agreed.