Description of problem:

While running one of our automation tests, we found that heketi fails to create a volume even though there are free devices on each of the nodes. There seems to be an issue with the way heketi chooses the device from which a volume is created. Please go through the sequence of events below.

[root@dhcp46-207 ~]# heketi-cli volume list
Id:0560fa1f5ebc2c5beefb74c1cae695c3 Cluster:83767272f27df11472bb41f0318ba7a5 Name:vol_0560fa1f5ebc2c5beefb74c1cae695c3 [block]
Id:25521d472ab31a0770eeb897c0429424 Cluster:83767272f27df11472bb41f0318ba7a5 Name:heketidbstorage

[root@dhcp46-207 ~]# heketi-cli -s http://172.30.32.27:8080 volume create --size=96 --json
Error: Unable to execute command on glusterfs-631sd: Volume group "vg_678e090ff1169da372984f54deecd810" has insufficient free space (24287 extents): 24576 required.

[root@dhcp46-207 ~]# heketi-cli node list
Id:83f4ae5b08a4f298419f57f844be4968 Cluster:83767272f27df11472bb41f0318ba7a5
Id:c3e3bb2db015271036585dc8a8cc70b0 Cluster:83767272f27df11472bb41f0318ba7a5
Id:df0f1ec19a211f5c33f3e2e457c31027 Cluster:83767272f27df11472bb41f0318ba7a5
Id:f999739b27b9506be50cb1a63ddfcad3 Cluster:83767272f27df11472bb41f0318ba7a5

[root@dhcp46-207 ~]# heketi-cli node info 83f4ae5b08a4f298419f57f844be4968
Node Id: 83f4ae5b08a4f298419f57f844be4968
State: online
Cluster Id: 83767272f27df11472bb41f0318ba7a5
Zone: 1
Management Hostname: dhcp46-199.lab.eng.blr.redhat.com
Storage Hostname: 10.70.46.199
Devices:
Id:3afaa7d53c5df10b66f65ad85f6478d7 Name:/dev/sdd State:online Size (GiB):599 Used (GiB):502 Free (GiB):97
Id:a2b6eae009ca8818e0a096f4d4835f31 Name:/dev/sde State:online Size (GiB):99 Used (GiB):2 Free (GiB):97

[root@dhcp46-207 ~]# heketi-cli node info c3e3bb2db015271036585dc8a8cc70b0
Node Id: c3e3bb2db015271036585dc8a8cc70b0
State: online
Cluster Id: 83767272f27df11472bb41f0318ba7a5
Zone: 1
Management Hostname: dhcp46-193.lab.eng.blr.redhat.com
Storage Hostname: 10.70.46.193
Devices:
Id:a55564f7750b405cfa1c9cf5a19355d0 Name:/dev/sde State:online Size (GiB):99 Used (GiB):0 Free (GiB):99
Id:acb84ee02e32288941b4898f843e3a28 Name:/dev/sdd State:online Size (GiB):599 Used (GiB):2 Free (GiB):597

[root@dhcp46-207 ~]# heketi-cli node info df0f1ec19a211f5c33f3e2e457c31027
Node Id: df0f1ec19a211f5c33f3e2e457c31027
State: online
Cluster Id: 83767272f27df11472bb41f0318ba7a5
Zone: 2
Management Hostname: dhcp46-197.lab.eng.blr.redhat.com
Storage Hostname: 10.70.46.197
Devices:
Id:502e4028f5213bffecb51438b22f0702 Name:/dev/sde State:online Size (GiB):99 Used (GiB):0 Free (GiB):99
Id:678e090ff1169da372984f54deecd810 Name:/dev/sdd State:online Size (GiB):599 Used (GiB):502 Free (GiB):97

[root@dhcp46-207 ~]# heketi-cli node info f999739b27b9506be50cb1a63ddfcad3
Node Id: f999739b27b9506be50cb1a63ddfcad3
State: online
Cluster Id: 83767272f27df11472bb41f0318ba7a5
Zone: 2
Management Hostname: dhcp46-201.lab.eng.blr.redhat.com
Storage Hostname: 10.70.46.201
Devices:
Id:a8e913f6e09833e8acaaab0fbb98cbba Name:/dev/sdd State:online Size (GiB):599 Used (GiB):502 Free (GiB):97
Id:bcef7506a6724c8498cc46c42c239d97 Name:/dev/sde State:online Size (GiB):99 Used (GiB):2 Free (GiB):97

[root@dhcp46-207 ~]# heketi-cli -s http://172.30.32.27:8080 volume create --size=95 --json
Error: Unable to execute command on glusterfs-ttjvf: Volume group "vg_3afaa7d53c5df10b66f65ad85f6478d7" has insufficient free space (24287 extents): 24320 required.

[root@dhcp46-207 ~]# heketi-cli -s http://172.30.32.27:8080 volume create --size=94 --json
{"size":94,"name":"vol_b86607e1e242370e52876e3e31b2825a","durability":{"type":"replicate","replicate":{"replica":3},"disperse":{"data":4,"redundancy":2}},"snapshot":{"enable":false,"factor":1},"id":"b86607e1e242370e52876e3e31b2825a","cluster":"83767272f27df11472bb41f0318ba7a5","mount":{"glusterfs":{"hosts":["10.70.46.193","10.70.46.197","10.70.46.199","10.70.46.201"],"device":"10.70.46.193:vol_b86607e1e242370e52876e3e31b2825a","options":{"backup-volfile-servers":"10.70.46.197,10.70.46.199,10.70.46.201"}}},"blockinfo":{},"bricks":[{"id":"92a414343f4c019b9f938316146c29c0","path":"/var/lib/heketi/mounts/vg_a2b6eae009ca8818e0a096f4d4835f31/brick_92a414343f4c019b9f938316146c29c0/brick","device":"a2b6eae009ca8818e0a096f4d4835f31","node":"83f4ae5b08a4f298419f57f844be4968","volume":"b86607e1e242370e52876e3e31b2825a","size":98566144},{"id":"a64b35d069fe5795c29854693aaf10ef","path":"/var/lib/heketi/mounts/vg_678e090ff1169da372984f54deecd810/brick_a64b35d069fe5795c29854693aaf10ef/brick","device":"678e090ff1169da372984f54deecd810","node":"df0f1ec19a211f5c33f3e2e457c31027","volume":"b86607e1e242370e52876e3e31b2825a","size":98566144},{"id":"e15fcdc988af50ff6e966bcfb4e2cc7f","path":"/var/lib/heketi/mounts/vg_a55564f7750b405cfa1c9cf5a19355d0/brick_e15fcdc988af50ff6e966bcfb4e2cc7f/brick","device":"a55564f7750b405cfa1c9cf5a19355d0","node":"c3e3bb2db015271036585dc8a8cc70b0","volume":"b86607e1e242370e52876e3e31b2825a","size":98566144}]}
[root@dhcp46-207 ~]#
[root@dhcp46-207 ~]# heketi-cli -s http://172.30.32.27:8080 volume create --size=2 --json
Error: Unable to execute command on glusterfs-631sd: Volume group "vg_678e090ff1169da372984f54deecd810" has insufficient free space (102 extents): 512 required.
[root@dhcp46-207 ~]#
[root@dhcp46-207 ~]#
[root@dhcp46-207 ~]# heketi-cli node info f999739b27b9506be50cb1a63ddfcad3
Node Id: f999739b27b9506be50cb1a63ddfcad3
State: online
Cluster Id: 83767272f27df11472bb41f0318ba7a5
Zone: 2
Management Hostname: dhcp46-201.lab.eng.blr.redhat.com
Storage Hostname: 10.70.46.201
Devices:
Id:a8e913f6e09833e8acaaab0fbb98cbba Name:/dev/sdd State:online Size (GiB):599 Used (GiB):502 Free (GiB):97
Id:bcef7506a6724c8498cc46c42c239d97 Name:/dev/sde State:online Size (GiB):99 Used (GiB):2 Free (GiB):97

[root@dhcp46-207 ~]# heketi-cli node info df0f1ec19a211f5c33f3e2e457c31027
Node Id: df0f1ec19a211f5c33f3e2e457c31027
State: online
Cluster Id: 83767272f27df11472bb41f0318ba7a5
Zone: 2
Management Hostname: dhcp46-197.lab.eng.blr.redhat.com
Storage Hostname: 10.70.46.197
Devices:
Id:502e4028f5213bffecb51438b22f0702 Name:/dev/sde State:online Size (GiB):99 Used (GiB):0 Free (GiB):99
Id:678e090ff1169da372984f54deecd810 Name:/dev/sdd State:online Size (GiB):599 Used (GiB):596 Free (GiB):2

[root@dhcp46-207 ~]# heketi-cli node info c3e3bb2db015271036585dc8a8cc70b0
Node Id: c3e3bb2db015271036585dc8a8cc70b0
State: online
Cluster Id: 83767272f27df11472bb41f0318ba7a5
Zone: 1
Management Hostname: dhcp46-193.lab.eng.blr.redhat.com
Storage Hostname: 10.70.46.193
Devices:
Id:a55564f7750b405cfa1c9cf5a19355d0 Name:/dev/sde State:online Size (GiB):99 Used (GiB):94 Free (GiB):5
Id:acb84ee02e32288941b4898f843e3a28 Name:/dev/sdd State:online Size (GiB):599 Used (GiB):2 Free (GiB):597

[root@dhcp46-207 ~]# heketi-cli node info 83f4ae5b08a4f298419f57f844be4968
Node Id: 83f4ae5b08a4f298419f57f844be4968
State: online
Cluster Id: 83767272f27df11472bb41f0318ba7a5
Zone: 1
Management Hostname: dhcp46-199.lab.eng.blr.redhat.com
Storage Hostname: 10.70.46.199
Devices:
Id:3afaa7d53c5df10b66f65ad85f6478d7 Name:/dev/sdd State:online Size (GiB):599 Used (GiB):502 Free (GiB):97
Id:a2b6eae009ca8818e0a096f4d4835f31 Name:/dev/sde State:online Size (GiB):99 Used (GiB):96 Free (GiB):3

Version-Release number of selected component (if applicable):
cns-deploy-5.0.0-54.el7rhgs.x86_64
heketi-client-5.0.0-16.el7rhgs.x86_64

How reproducible:
Always (depending on the size of the volume being created).

Steps to Reproduce:
1. Have 2 or more devices on each node.
2. Check the free size of the devices on each node (97 GB in this case).
3. Try to create a volume smaller than this value (94 GB in this case) --> volume created.
4. Try to create a volume of size 1 GB.

Actual results:
Volume creation fails, although there is close to 90 GB of free space on other devices.

Expected results:
heketi should choose the free space from other available devices and create the volume.

Additional info:
1) This must be a day 1 bug.
2) heketi logs will be attached.
Created attachment 1380743 [details] heketi_logs
Usually, heketi would only start to create bricks etc. if it believes the space on the devices it has chosen is sufficient. So there are two circumstances under which these lvcreate operations could fail:

1) The really available space on the chosen device is pretty much exactly the size requested. Since heketi always allocates a little more (due to metadata requirements, and possibly a lot more if the snapshot factor is set), this little more could just be too much for the device. The bug here is that the original estimation in heketi's allocator/placer code is done with the input size, not with the space that would really be requested.

2) The free space recorded in the heketi db could have gone out of sync with gluster.

The sizes I see in the paste and log do actually rather point to the second case, because the diff between the requested and the available extents is quite high.

==> Please check whether the free space info matches the info on the gluster side.

I also note that I am surprised that heketi does not seem to over-allocate at all in this case, not even for the metadata.

All that said, with CNS 3.9 we have introduced a retry mechanism which should in this case let heketi try different device constellations if the previous one fails at the executor level.

==> Could you try whether this is an issue still with 3.9?
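To put numbers on the extent figures in the error messages, here is a small sketch that converts between GiB and LVM extents. It assumes a physical extent size of 4 MiB (the LVM default, not confirmed for these volume groups); the helper names are made up for illustration. Under that assumption the "required" counts in all three failures are exactly the requested size with no overhead added, and the 24287 free extents come to a bit less than 95 GiB, while heketi reported 97 GiB free on those devices.

```python
# Assumption: the volume groups use LVM's default 4 MiB physical extent size.
EXTENT_MIB = 4
MIB_PER_GIB = 1024


def gib_to_extents(size_gib: int) -> int:
    """Number of extents needed to hold size_gib GiB."""
    return size_gib * MIB_PER_GIB // EXTENT_MIB


def extents_to_gib(extents: int) -> float:
    """Capacity in GiB represented by the given number of extents."""
    return extents * EXTENT_MIB / MIB_PER_GIB


# (requested GiB, extents required, extents free), copied from the errors above.
failures = [(96, 24576, 24287), (95, 24320, 24287), (2, 512, 102)]

for size_gib, required, free in failures:
    # The required count matches the plain input size, i.e. no overhead on top.
    matches_input = gib_to_extents(size_gib) == required
    print(
        f"--size={size_gib}: required={required} extents "
        f"(equals input size: {matches_input}), "
        f"free={free} extents = {extents_to_gib(free):.2f} GiB"
    )
```

This only restates what the lvcreate errors report; it says nothing about why heketi's recorded free space differs from what the volume group has.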
(In reply to Michael Adam from comment #6)
> Usually, heketi would only start to create bricks etc. if it
> believes the space on the devices it has chosen is sufficient.
> So there are two circumstances under which these lvcreate
> operations could fail:
>
> 1) The really available space on the chosen device is pretty
> much exactly the size requested. Since heketi always allocates
> a little more (due to metadata requirements, and possibly a lot
> more if the snapshot factor is set), this little more could just
> be too much for the device. The bug here is that the original
> estimation in heketi's allocator/placer code is done with the
> input size, not with the space that would really be requested.
>

Unless I'm overlooking something, the final decision on whether a particular brick will fit on a device includes (an estimate of) the metadata overhead. See
https://github.com/heketi/heketi/blob/master/apps/glusterfs/device_entry.go#L368
and
https://github.com/heketi/heketi/blob/master/apps/glusterfs/device_entry.go#L407

It's possible that this underestimates the amount of space LVM needs from the underlying device, though.

> 2) The free space recorded in the heketi db could have gone out
> of sync with gluster.
>
> The sizes I see in the paste and log do actually rather
> point to the second case, because the diff between the
> requested and the available extents is quite high.
>
> ==> Please check whether the free space info matches
> the info on the gluster side.
>
> I also note that I am surprised that heketi does not seem
> to over-allocate at all in this case, not even for the metadata.
>
>
> All that said, with CNS 3.9 we have introduced a retry mechanism
> which should in this case let heketi try different device
> constellations if the previous one fails at the executor level.
>
> ==> Could you try whether this is an issue still with 3.9?

Agreed.
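To illustrate the difference between checking a device against the input size and checking it against the space that would really be requested, here is a minimal sketch. It is not heketi's code: the `Device` class, the function names, and the 0.5% metadata ratio are all hypothetical and chosen only for illustration. The free-space figures are the ones heketi-cli reported for node df0f1ec19a211f5c33f3e2e457c31027 before the failed --size=2 request.

```python
from dataclasses import dataclass
from typing import List, Optional

KIB_PER_GIB = 1024 * 1024


@dataclass
class Device:
    name: str
    free_kib: int  # free space as recorded by the allocator, in KiB


def estimated_total_kib(brick_kib: int, snapshot_factor: float = 1.0,
                        metadata_ratio: float = 0.005) -> int:
    """Brick size plus overhead. metadata_ratio is a made-up placeholder,
    not the formula heketi uses."""
    data_kib = int(brick_kib * snapshot_factor)
    return data_kib + int(data_kib * metadata_ratio)


def fits_naive(device: Device, brick_kib: int) -> bool:
    """Check against the input size only (the behaviour suspected in case 1)."""
    return device.free_kib >= brick_kib


def fits_with_overhead(device: Device, brick_kib: int) -> bool:
    """Check against the space that would actually be requested."""
    return device.free_kib >= estimated_total_kib(brick_kib)


def pick_device(devices: List[Device], brick_kib: int) -> Optional[Device]:
    """Return the first device that passes the overhead-aware check, if any."""
    for device in devices:
        if fits_with_overhead(device, brick_kib):
            return device
    return None


devices = [
    Device("/dev/sdd", 2 * KIB_PER_GIB),   # reported as 2 GiB free
    Device("/dev/sde", 99 * KIB_PER_GIB),  # reported as 99 GiB free
]
brick_kib = 2 * KIB_PER_GIB

print("naive check accepts /dev/sdd:", fits_naive(devices[0], brick_kib))
chosen = pick_device(devices, brick_kib)
print("overhead-aware choice:", chosen.name if chosen else None)
```

With any non-zero overhead the 2 GiB brick no longer fits on the device that has exactly 2 GiB recorded as free, so the selection moves on to the other device on the same node. Whether that matches what the linked device_entry.go code does would need to be checked against the source.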