Bug 1757399 - Rebalance is causing glusterfs crash on client node
Summary: Rebalance is causing glusterfs crash on client node
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact:
URL:
Whiteboard:
Depends On: 1756325 1759141 1760779
Blocks: 1769315 1786983 1804522
 
Reported: 2019-10-01 12:10 UTC by Nithya Balachandran
Modified: 2020-03-03 07:46 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1756325
Clones: 1769315 1786983 1804522
Environment:
Last Closed: 2020-03-03 07:46:57 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 23506 0 None Merged cluster/dht: Correct fd processing loop 2019-10-02 13:43:36 UTC

Comment 1 Nithya Balachandran 2019-10-01 12:13:07 UTC
An unsafe loop over the open fds in the dht rebalance check tasks caused the client process to crash: it ended up operating on an fd that had already been freed.
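
The fix (Gerrit 23506 below) corrects this loop. As a minimal, hypothetical sketch of the general "pin under the lock, process outside it" pattern that avoids the use-after-free, consider the following; the types and helpers (my_fd_t, my_inode_t, process_fd) are simplified stand-ins, not the actual dht or libglusterfs code:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct my_inode my_inode_t;

typedef struct my_fd {
    int refcount;              /* protected by inode->lock */
    struct my_fd *next;        /* inode->fd_list linkage (simplified) */
    my_inode_t *inode;
} my_fd_t;

struct my_inode {
    pthread_mutex_t lock;
    my_fd_t *fd_list;          /* head of the list of open fds */
};

/* Stand-in for the blocking per-fd work (syncop calls in the real code). */
static void
process_fd(my_fd_t *fd)
{
    printf("processing fd %p\n", (void *)fd);
}

static void
my_fd_unref(my_fd_t *fd)
{
    pthread_mutex_lock(&fd->inode->lock);
    fd->refcount--;
    /* The real fd_unref() unlinks and destroys the fd (fd_destroy ->
     * mem_put) once the count drops to zero. */
    pthread_mutex_unlock(&fd->inode->lock);
}

/*
 * Safer pattern: pin every fd with a reference while the inode lock is
 * held, then do the blocking work on the private copy outside the lock.
 * Iterating the live list and blocking per entry without a reference is
 * the unsafe variant: another thread can drop the last ref meanwhile and
 * the loop ends up touching freed memory, which is what the core shows.
 */
static void
process_open_fds(my_inode_t *inode)
{
    my_fd_t **fds = NULL;
    int count = 0, taken = 0;

    pthread_mutex_lock(&inode->lock);
    for (my_fd_t *fd = inode->fd_list; fd; fd = fd->next)
        count++;

    fds = calloc(count ? count : 1, sizeof(*fds));
    if (fds) {
        for (my_fd_t *fd = inode->fd_list; fd && taken < count; fd = fd->next) {
            fd->refcount++;            /* our reference keeps this fd alive */
            fds[taken++] = fd;
        }
    }
    pthread_mutex_unlock(&inode->lock);

    for (int i = 0; i < taken; i++) {
        process_fd(fds[i]);
        my_fd_unref(fds[i]);           /* drop our reference once done */
    }
    free(fds);
}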


It looks like the fd has already been freed.


fd->inode is set to NULL in fd_destroy. fds are allocated from the mempools using mem_get. Checking the pool header info:



(gdb) f 1
#1  0x00007f3923004af7 in fd_unref (fd=0x7f3910ccec28) at fd.c:515
515	    LOCK(&fd->inode->lock);
(gdb) p *fd
$1 = {pid = 13340, flags = 33345, refcount = {lk = 0x7f3910ccec38 "\t", value = 9}, inode_list = {next = 0x7f3910ccec40, prev = 0x7f3910ccec40}, inode = 0x0, lock = {
    spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}}, _ctx = 0x7f3910c73b70, xl_count = 39, lk_ctx = 0x7f39100ede90, 
  anonymous = false}
(gdb) p ((pooled_obj_hdr_t *)fd)-1
$2 = (pooled_obj_hdr_t *) 0x7f3910ccec00
(gdb) p sizeof(pooled_obj_hdr_t)
$3 = 40
(gdb) p/x sizeof(pooled_obj_hdr_t)
$4 = 0x28
(gdb) p *$2
$5 = {magic = 3735929054, next = 0x7f3910c3cec0, pool_list = 0x7f3910000960, power_of_two = 8, pool = 0x7f39100605c0}
(gdb) p/x *$2
$6 = {magic = 0xdeadc0de, next = 0x7f3910c3cec0, pool_list = 0x7f3910000960, power_of_two = 0x8, pool = 0x7f39100605c0}
(gdb) 


$6->magic = 0xdeadc0de 

#define GF_MEM_INVALID_MAGIC 0xDEADC0DE

In mem_put:

    hdr->magic = GF_MEM_INVALID_MAGIC;


As fd_destroy calls mem_put, this indicates that the memory has already been freed.
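
For context on the pointer arithmetic above: mem_get() hands out memory that sits immediately after a small per-object header, which is why stepping back one pooled_obj_hdr_t from the fd lands on the pool header. A hedged sketch of that layout and of the freed-object check done by hand in gdb (the field names follow the dump above; the exact types approximate the libglusterfs definitions):

#define GF_MEM_INVALID_MAGIC 0xDEADC0DE   /* written by mem_put(), as quoted above */

typedef struct pooled_obj_hdr {
    unsigned long magic;                  /* 0xdeadc0de once the object is freed */
    struct pooled_obj_hdr *next;
    void *pool_list;
    unsigned int power_of_two;
    void *pool;
} pooled_obj_hdr_t;

/* The object handed out by mem_get() starts immediately after this
 * header, so stepping one header back from a pooled object recovers it. */
static inline pooled_obj_hdr_t *
obj_to_hdr(void *obj)
{
    return ((pooled_obj_hdr_t *)obj) - 1;
}

/* Mirrors the manual gdb check: GF_MEM_INVALID_MAGIC in the header
 * means the object has already been returned to the pool. */
static inline int
pooled_obj_is_freed(void *obj)
{
    return obj_to_hdr(obj)->magic == GF_MEM_INVALID_MAGIC;
}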

To double-check, examine the memory header for fd->_ctx, which is allocated using GF_CALLOC:

(gdb) p fd->_ctx
$13 = (struct _fd_ctx *) 0x7f3910c73b70
(gdb) p *(((struct mem_header *)0x7f3910c73b70) -1)
$14 = {type = 269061216, size = 139883059273280, mem_acct = 0x0, magic = 0, padding = {0, 0, 0, 0, 0, 0, 0, 0}}
(gdb) p/x *(((struct mem_header *)0x7f3910c73b70) -1)
$15 = {type = 0x10098c60, size = 0x7f39100ede40, mem_acct = 0x0, magic = 0x0, padding = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}
(gdb) 



The header struct members are invalid as well, confirming that the fd and its _ctx have already been freed.
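
The same back-one-header check applies to GF_CALLOC'd memory, which carries an accounting header in front of the returned pointer. A hedged sketch of that check (the header layout follows the dump above; GF_MEM_HEADER_MAGIC and the exact field types are assumptions for illustration, not the verbatim gluster definitions):

#include <stddef.h>
#include <stdint.h>

#define GF_MEM_HEADER_MAGIC 0xCAFEBABE    /* assumed live-allocation marker */

struct mem_header {
    uint32_t type;
    size_t size;
    void *mem_acct;
    uint32_t magic;
    int padding[8];
};

/* A GF_CALLOC'd block is preceded by its accounting header; a live
 * pointer should show a sane header, while magic == 0 and garbage in
 * the other members (as above) mean the pointer is stale. */
static inline int
gf_header_looks_valid(void *ptr)
{
    struct mem_header *hdr = ((struct mem_header *)ptr) - 1;
    return hdr->magic == GF_MEM_HEADER_MAGIC;
}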

Comment 2 Worker Ant 2019-10-01 12:18:39 UTC
REVIEW: https://review.gluster.org/23506 (cluster/dht: Correct fd processing loop) posted (#1) for review on master by N Balachandran

Comment 3 Worker Ant 2019-10-02 13:43:37 UTC
REVIEW: https://review.gluster.org/23506 (cluster/dht: Correct fd processing loop) merged (#6) on master by Xavi Hernandez

Comment 4 Worker Ant 2020-02-19 17:06:18 UTC
REVIEW: https://review.gluster.org/24132 (cluster/dht: Correct fd processing loop) posted (#2) for review on release-5 by Barak Sason Rofman

Comment 5 Worker Ant 2020-02-20 09:17:55 UTC
REVISION POSTED: https://review.gluster.org/24132 (cluster/dht: Correct fd processing loop) posted (#3) for review on release-5 by MOHIT AGRAWAL

Comment 6 Susant Kumar Palai 2020-03-03 07:46:57 UTC
Patch: https://review.gluster.org/#/c/glusterfs/+/24132/

