Description of problem:
Observing the below "Assertion failed" messages in the rebalance logs.

[2017-02-27 02:41:30.131290] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7f303b897750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7f303b896fd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7f303b8de6dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2017-02-27 02:43:14.836106] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7f303b897750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7f303b896fd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7f303b8de6dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2017-02-27 02:44:27.161614] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7f303b897750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7f303b896fd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7f303b8de6dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2017-02-27 02:44:33.495690] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7f303b897750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7f303b896fd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7f303b8de6dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2017-02-27 02:45:09.172526] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7f303b897750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7f303b896fd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7f303b8de6dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)

On a different node, different volume:

[2017-03-01 06:40:32.458913] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7fe904ceb750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7fe904ceafd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7fe904d326dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2017-03-01 06:40:51.784109] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7fe904ceb750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7fe904ceafd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7fe904d326dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2017-03-01 06:40:52.208967] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7fe904ceb750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7fe904ceafd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7fe904d326dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2017-03-01 06:41:01.669186] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7fe904ceb750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7fe904ceafd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7fe904d326dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2017-03-01 06:41:15.839481] E [mem-pool.c:314:__gf_free] (-->/lib64/libglusterfs.so.0(dict_destroy+0x40) [0x7fe904ceb750] -->/lib64/libglusterfs.so.0(data_destroy+0x55) [0x7fe904ceafd5] -->/lib64/libglusterfs.so.0(__gf_free+0xfc) [0x7fe904d326dc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)

Version-Release number of selected component (if applicable):
3.7.9-12.el7rhgs.x86_64

How reproducible:
Happens regularly on the customer's trusted storage pool.

Additional info:
23-node storage pool
58 volumes
All volumes showing the assertion errors have a distribute component (distribute, distributed-replicate, etc.).
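For context, the failing check is GlusterFS's memory-accounting canary: each allocation carries a hidden header (holding the size) and a 4-byte trailer magic written immediately after the user buffer, and __gf_free() asserts that the magic is intact, so a mismatch means something wrote past the end of the buffer before it was freed. Below is a minimal illustrative sketch of that header/trailer canary technique; the names MEM_TRAILER_MAGIC, guarded_malloc, and guarded_free are invented for illustration, and this is not the actual mem-pool.c code:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for GlusterFS's internal constants/structs. */
#define MEM_TRAILER_MAGIC 0xBAADF00D

struct mem_header {
    size_t size;   /* user-visible allocation size */
};

/* Allocate size bytes plus a hidden header and a 4-byte trailer canary. */
static void *guarded_malloc(size_t size)
{
    struct mem_header *hdr = malloc(sizeof(*hdr) + size + sizeof(uint32_t));
    if (!hdr)
        return NULL;
    hdr->size = size;
    void *user = hdr + 1;
    uint32_t magic = MEM_TRAILER_MAGIC;
    /* memcpy avoids unaligned stores at user+size. */
    memcpy((char *)user + size, &magic, sizeof(magic));
    return user;
}

/* Free and verify the canary; a mismatch means a heap overrun. */
static void guarded_free(void *user)
{
    if (!user)
        return;
    struct mem_header *hdr = (struct mem_header *)user - 1;
    uint32_t magic;
    memcpy(&magic, (char *)user + hdr->size, sizeof(magic));
    /* The moral equivalent of the assertion seen in the logs. */
    assert(MEM_TRAILER_MAGIC == magic);
    free(hdr);
}

int main(void)
{
    char *buf = guarded_malloc(8);
    memcpy(buf, "1234567", 8);   /* in bounds: canary survives */
    guarded_free(buf);
    puts("ok");
    return 0;
}

Writing even a single byte past the end of buf before guarded_free() would corrupt the canary and trip the same class of assertion, which is why these log messages point to a memory overrun rather than to a specific translator.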
Could not reproduce the issue on my test machine. Created an 8x2 volume, populated it with data (directories and files), and ran multiple remove-bricks, but found no errors. Will give it a few more tries.
@Susant: Do you need any additional information from Cal or the customer that might be useful for reproducing this?

@Cal: Can you also try to reproduce the issue with a miniature version of the customer environment?
(In reply to Bipin Kunal from comment #7)
> @Susant: Do you need any additional information from Cal or the customer
> that might be useful for reproducing this?
>
> @Cal: Can you also try to reproduce the issue with a miniature version of
> the customer environment?

The problem at hand points to a memory overrun. I went through the rebalance code and could not find any evidence of such a problem, and whether it was caused by some other translator (e.g. AFR) cannot be confirmed from the logs, since they do not identify the translator responsible. A reproducer would be highly helpful here. In the meantime, some more information would be useful:

1. Xattr information on the directories and files.
2. What kind of operations were running in parallel?

-Susant
@Bipin: I'll try to set up a simplified reproducer tomorrow.

@Susant: I'll ask the customer to supply the additional information.

-Cal
@Susant: Can you supply a command that will return the xattr information you need? I'm not sure exactly what you're looking for. I've asked the customer about what else might have been running in parallel.
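For reference, extended attributes are usually dumped straight off the brick with getfattr -d -m . -e hex <brick-path>/<file>. A minimal C equivalent using listxattr(2)/getxattr(2) is sketched below (Linux-specific; the path argument is whatever brick file or directory is of interest, and the fixed 4 KiB buffers are a simplification):

#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

/* Dump every extended attribute of a file as name=0x<hex-value>. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }
    const char *path = argv[1];

    char names[4096];
    ssize_t nlen = listxattr(path, names, sizeof(names));
    if (nlen < 0) {
        perror("listxattr");
        return 1;
    }

    /* names is a concatenation of NUL-terminated attribute names. */
    for (char *n = names; n < names + nlen; n += strlen(n) + 1) {
        char value[4096];
        ssize_t vlen = getxattr(path, n, value, sizeof(value));
        if (vlen < 0) {
            perror(n);
            continue;
        }
        printf("%s=0x", n);
        for (ssize_t i = 0; i < vlen; i++)
            printf("%02x", (unsigned char)value[i]);
        putchar('\n');
    }
    return 0;
}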
Does the customer have hardlinks to his files?
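Hard links are relevant because DHT rebalance has to treat files with a link count greater than 1 specially during migration. A quick way to look for them on a brick is find <brick-path> -type f -links +1; the same check in C via stat(2) (a sketch, with the path as a placeholder):

#include <stdio.h>
#include <sys/stat.h>

/* Report whether a file has more than one hard link. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }
    struct stat st;
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    printf("%s: %lu link(s)%s\n", argv[1],
           (unsigned long)st.st_nlink,
           st.st_nlink > 1 ? " (hardlinked)" : "");
    return 0;
}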
Verified this BZ on glusterfs version 3.8.4-33.el7rhgs.x86_64. Followed the same steps as in Comment 107; the script didn't throw any errors, and all the files on the bricks migrated successfully as expected. Moving this BZ to Verified.
*** Bug 1467495 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774