Bug 1752739 - fuse mount crash observed with sharding + truncate
Summary: fuse mount crash observed with sharding + truncate
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: sharding
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 4
Assignee: Vinayak Hariharmath
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1751085
Blocks:
 
Reported: 2019-09-17 06:45 UTC by bipin
Modified: 2021-04-29 07:20 UTC
CC List: 12 users

Fixed In Version: glusterfs-6.0-50
Doc Type: No Doc Update
Doc Text:
Clone Of: 1751722
Environment:
Last Closed: 2021-04-29 07:20:36 UTC
Target Upstream Version:
vharihar: needinfo+




Links
Red Hat Product Errata RHBA-2021:1462 (last updated 2021-04-29 07:20:53 UTC)

Description bipin 2019-09-17 06:45:58 UTC
+++ This bug was initially created as a clone of Bug #1751722 +++

+++ This bug was initially created as a clone of Bug #1751085 +++

Description of problem:

The Gluster fuse mount crashes in the shard translator while truncating a file from a very large size (exabytes) down to a very small size.

See bt:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --process-name fuse --volfile-server=tendrl25.lab.eng.blr.r'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f01260a2eca in shard_common_resolve_shards (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60, 
    post_res_handler=0x7f01260ad770 <shard_post_resolve_truncate_handler>) at shard.c:1030
1030	            local->inode_list[i] = inode_ref(res_inode);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libselinux-2.5-14.1.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7_6.8.x86_64
(gdb) bt
#0  0x00007f01260a2eca in shard_common_resolve_shards (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60, 
    post_res_handler=0x7f01260ad770 <shard_post_resolve_truncate_handler>) at shard.c:1030
#1  0x00007f01260a3bfd in shard_refresh_internal_dir (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60, type=type@entry=SHARD_INTERNAL_DIR_DOT_SHARD) at shard.c:1317
#2  0x00007f01260ad90d in shard_truncate_begin (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60) at shard.c:2596
#3  0x00007f01260b506d in shard_post_lookup_truncate_handler (frame=0x7f0104000a58, this=0x7f0118015b60) at shard.c:2659
#4  0x00007f01260a1f8b in shard_lookup_base_file_cbk (frame=0x7f0104000a58, cookie=<optimized out>, this=0x7f0118015b60, op_ret=<optimized out>, op_errno=<optimized out>, 
    inode=<optimized out>, buf=0x7f0104013bb0, xdata=0x7f010c00faf8, postparent=0x7f0104013c48) at shard.c:1702
#5  0x00007f01265e4922 in afr_discover_unwind (frame=0x7f010400bd28, this=<optimized out>) at afr-common.c:3011
#6  0x00007f01265e4eeb in afr_discover_done (frame=<optimized out>, this=<optimized out>) at afr-common.c:3106
#7  0x00007f01265f27fd in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f010400bd28, this=this@entry=0x7f0118010da0) at afr-common.c:2761
#8  0x00007f01265f3608 in afr_discover_cbk (frame=frame@entry=0x7f010400bd28, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, 
    inode=inode@entry=0x7f011805b9d8, buf=buf@entry=0x7f011f7fcb40, xdata=0x7f010c00faf8, postparent=postparent@entry=0x7f011f7fcbe0) at afr-common.c:3147
#9  0x00007f012687c412 in client4_0_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f010c010838) at client-rpc-fops_v2.c:2641
#10 0x00007f012ee6c021 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f01180510c0, pollin=pollin@entry=0x7f010c001a20) at rpc-clnt.c:755
#11 0x00007f012ee6c387 in rpc_clnt_notify (trans=0x7f0118051380, mydata=0x7f01180510f0, event=<optimized out>, data=0x7f010c001a20) at rpc-clnt.c:922
#12 0x00007f012ee689f3 in rpc_transport_notify (this=this@entry=0x7f0118051380, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f010c001a20) at rpc-transport.c:542
#13 0x00007f0129778875 in socket_event_poll_in (notify_handled=true, this=0x7f0118051380) at socket.c:2522
#14 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f0118051380, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, 
    event_thread_died=0 '\000') at socket.c:2924
#15 0x00007f012f126806 in event_dispatch_epoll_handler (event=0x7f011f7fd130, event_pool=0x55cbb4dc2300) at event-epoll.c:648
#16 event_dispatch_epoll_worker (data=0x7f0118049a60) at event-epoll.c:761
#17 0x00007f012df01dd5 in start_thread (arg=0x7f011f7fe700) at pthread_create.c:307
#18 0x00007f012d7c902d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) p local->first_block
$1 = 0
(gdb) p local->last_block 
$2 = -1
(gdb) p local->prebuf.ia_size
$3 = 18446744073709547520
(gdb) p local->num_blocks
$4 = 0
(gdb) p local->block_size 
$5 = 67108864
(gdb) p (local->prebuf.ia_size - 1)/local->block_size
$6 = 274877906943
(gdb) p (int) $6
$7 = -1




It turns out that the quotient from dividing a very large unsigned 64-bit integer by a relatively small unsigned 64-bit integer is assigned to a signed 32-bit variable, and sometimes this quotient is larger than the largest signed 32-bit integer. In this case, local->last_block gets assigned '-1' and the resulting local->num_blocks becomes 0. This leads to a GF_CALLOC with size 0 for local->inode_list[] here in shard_truncate_begin():

2580	    local->inode_list = GF_CALLOC(local->num_blocks, sizeof(inode_t *),
2581	                                  gf_shard_mt_inode_list);
2582	    if (!local->inode_list)
2583	        goto err;
2584	

When members of local->inode_list[] beyond its boundary are accessed, the illegal memory access causes the process to crash.
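
For illustration, here is a minimal standalone C sketch of the arithmetic described above, using the values from the gdb session. The variable names mirror shard.c, but this is not the actual gluster code:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Values taken from the gdb session above. */
    uint64_t ia_size = 18446744073709547520ULL; /* bogus ~16EB file size */
    uint64_t block_size = 67108864ULL;          /* 64MB shard block size */
    uint64_t offset = 0;                        /* truncating down to 0 */

    /* The 64-bit quotient (274877906943) does not fit in a signed 32-bit
     * field; on this platform the assignment wraps it to -1. */
    int first_block = offset / block_size;         /* 0 */
    int last_block = (ia_size - 1) / block_size;   /* wraps to -1 */
    int num_blocks = last_block - first_block + 1; /* 0 */

    printf("first_block=%d last_block=%d num_blocks=%d\n",
           first_block, last_block, num_blocks);

    /* GF_CALLOC(num_blocks, sizeof(inode_t *)) then allocates 0 bytes,
     * so any later access to inode_list[i] is out of bounds. */
    return 0;
}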

While this explains the cause of the crash, what it doesn't explain is why the size of the file was as big as 16EB when the writes on the file in question wouldn't extend its size beyond 4KB. That is still being investigated. I'll update the bz once I have the RC.

Version-Release number of selected component (if applicable):

master

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from bipin on 2019-09-12 06:43:23 UTC ---

Seeing a similar crash with the below steps:

1. Complete the RHHI-V deployment
2. Create 30 VMs using a pool
3. After an hour or so, the FUSE mount crash could be seen


(gdb) bt
    #0  0x00007ff30b541a1a in shard_common_resolve_shards (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470, post_res_handler=0x7ff30b54b2d0 <shard_post_resolve_truncate_handler>) at shard.c:1030
    #1  0x00007ff30b5425e5 in shard_refresh_internal_dir (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470, type=type@entry=SHARD_INTERNAL_DIR_DOT_SHARD) at shard.c:1321
    #2  0x00007ff30b54b46e in shard_truncate_begin (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470) at shard.c:2573
    #3  0x00007ff30b551cc8 in shard_post_lookup_truncate_handler (frame=0x7ff2d400eba8, this=0x7ff304013470) at shard.c:2637
    #4  0x00007ff30b5409f2 in shard_lookup_base_file_cbk (frame=0x7ff2d400eba8, cookie=<optimized out>, this=0x7ff304013470, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>,
        buf=0x7ff2d402f700, xdata=0x7ff2fc02fd48, postparent=0x7ff2d402f9a0) at shard.c:1705
    #5  0x00007ff30b7a8381 in dht_discover_complete (this=this@entry=0x7ff304011cc0, discover_frame=discover_frame@entry=0x7ff2d400c028) at dht-common.c:754
    #6  0x00007ff30b7a92d4 in dht_discover_cbk (frame=0x7ff2d400c028, cookie=0x7ff30400f330, this=0x7ff304011cc0, op_ret=<optimized out>, op_errno=117, inode=0x7ff2fc00c578, stbuf=0x7ff2d400aba0,
        xattr=0x7ff2fc02fd48, postparent=0x7ff2d400ac10) at dht-common.c:1133
    #7  0x00007ff30ba61315 in afr_discover_done (frame=0x7ff2d4068fe8, this=<optimized out>) at afr-common.c:3027
    #8  0x00007ff30ba6c175 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7ff2d4068fe8, this=this@entry=0x7ff30400f330) at afr-common.c:2769
    #9  0x00007ff30ba6d089 in afr_discover_cbk (frame=frame@entry=0x7ff2d4068fe8, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7ff2fc00c578,
        buf=buf@entry=0x7ff30c6e3900, xdata=0x7ff2fc0166e8, postparent=postparent@entry=0x7ff30c6e3970) at afr-common.c:3077
    #10 0x00007ff30bcacf3d in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7ff2d40180a8) at client-rpc-fops.c:2872
    #11 0x00007ff313a90ac0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7ff304049ac0, pollin=pollin@entry=0x7ff2fc00b880) at rpc-clnt.c:778
    #12 0x00007ff313a90e2b in rpc_clnt_notify (trans=<optimized out>, mydata=0x7ff304049af0, event=<optimized out>, data=0x7ff2fc00b880) at rpc-clnt.c:971
    #13 0x00007ff313a8cba3 in rpc_transport_notify (this=this@entry=0x7ff304049e80, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7ff2fc00b880) at rpc-transport.c:557
    #14 0x00007ff30eba55e6 in socket_event_poll_in (this=this@entry=0x7ff304049e80, notify_handled=<optimized out>) at socket.c:2322
    #15 0x00007ff30eba7c2a in socket_event_handler (fd=11, idx=2, gen=4, data=0x7ff304049e80, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2482
    #16 0x00007ff313d498b0 in event_dispatch_epoll_handler (event=0x7ff30c6e3e70, event_pool=0x555e43424750) at event-epoll.c:643
    #17 event_dispatch_epoll_worker (data=0x555e4347fca0) at event-epoll.c:759
    #18 0x00007ff312b26ea5 in start_thread (arg=0x7ff30c6e4700) at pthread_create.c:307
    #19 0x00007ff3123ec8cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

(gdb) f 0 
#0  0x00007ff30b541a1a in shard_common_resolve_shards (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470, post_res_handler=0x7ff30b54b2d0 <shard_post_resolve_truncate_handler>) at shard.c:1030
1030	                        local->inode_list[i] = inode_ref (res_inode);
(gdb) p local->num_blocks
 $1 = 0

Comment 2 SATHEESARAN 2019-09-17 07:20:07 UTC
Steps that were done to hit this issue:

1. Complete the RHHI-V deployment
2. Create 30 VMs using a pool
3. After an hour or so, the FUSE mount crash could be seen

So far, this issue has been hit only on 1 QE setup and not consistently,
but it has been hit twice in more than ~10 attempts.

Comment 6 SATHEESARAN 2019-09-23 11:19:57 UTC
As RHHI-V bug is not fixed with oVirt's approach to detect storage block size,
this bug is no longer required urgently with RHGS 3.5.0.

@Krutika, please validate my statement.
If that's true, we can drop it for RHGS 3.5.0 but target the same for RHGS 3.5.z ?

Comment 7 SATHEESARAN 2019-09-25 13:32:31 UTC
(In reply to SATHEESARAN from comment #6)
> As RHHI-V bug is not fixed with oVirt's approach to detect storage block
> size,
> this bug is no longer required urgently with RHGS 3.5.0.
> 
> @Krutika, please validate my statement.
> If that's true, we can drop it for RHGS 3.5.0 but target the same for RHGS
> 3.5.z ?

Please read it as 'RHHI-V bug is fixed with oVirt's approach'

Comment 8 Krutika Dhananjay 2019-09-26 06:15:52 UTC
(In reply to SATHEESARAN from comment #6)
> As RHHI-V bug is not fixed with oVirt's approach to detect storage block
> size,
> this bug is no longer required urgently with RHGS 3.5.0.
> 
> @Krutika, please validate my statement.
> If that's true, we can drop it for RHGS 3.5.0 but target the same for RHGS
> 3.5.z ?

This sounds reasonable to me.

Comment 9 SATHEESARAN 2019-09-27 07:15:46 UTC
Based on comment 8, removing the blocker request to include this bug in RHGS 3.5.0.
Also, the original issue is now fixed with RHV 4.3.6 and no longer leads to a crash.

But the underlying issue is genuine: there is a problem with the sharding + truncate call,
and the RCA has been done by Krutika. So requesting that the fix be included in RHGS 3.5.z,
so that any future use of shard + truncate does not lead to such problems.

Comment 24 SATHEESARAN 2021-01-04 13:03:28 UTC
When this issue was found, it was initially reported as a fuse mount crash;
later, through investigation, it was found to be 2 separate problems:

1. Crash in FOPs because of integer overflow
2. Sharding doesn't support truncate

Currently, the fix addresses only (1). Excerpt from the commit message:
"This patch fixes a crash in FOPs that operate on really large sharded
files where number of participant shards could sometimes exceed
signed int32 max."

But the other part, truncate support in sharding, is not yet available;
it will be tracked with a separate bug.

Based on this, updating the bug summary.
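
For illustration only, and not the actual upstream patch, here is a minimal sketch of the kind of change the commit message describes: keeping the shard-range arithmetic in unsigned 64-bit variables so the participant-shard count never passes through a signed 32-bit field. The helper name and signature are hypothetical:

#include <stdio.h>
#include <inttypes.h>

/* Hypothetical helper (not gluster code): compute the participant shard
 * range for a truncate without narrowing anything to 32 bits. */
static void shard_block_range(uint64_t cur_size, uint64_t offset,
                              uint64_t block_size, uint64_t *first_block,
                              uint64_t *last_block, uint64_t *num_blocks)
{
    *first_block = offset / block_size;
    *last_block = cur_size ? (cur_size - 1) / block_size : 0;
    *num_blocks = *last_block - *first_block + 1;
}

int main(void)
{
    uint64_t first, last, num;

    /* Same values as in the crash: ~16EB current size, 64MB shards,
     * truncating down to 0. */
    shard_block_range(18446744073709547520ULL, 0, 67108864ULL,
                      &first, &last, &num);

    /* num is 274877906944 here instead of wrapping to 0. */
    printf("first=%" PRIu64 " last=%" PRIu64 " num=%" PRIu64 "\n",
           first, last, num);
    return 0;
}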

Comment 25 SATHEESARAN 2021-01-04 13:09:00 UTC
Tested with the RHGS 3.5.4 interim build (glusterfs-6.0-51.el8rhgs)

1. Created a replica 3 volume and enabled sharding
2. Created 2 fuse mounts
3. Created a file of size 512B from one of the fuse mounts
4. Truncated that file to 0 from one fuse mount
5. Truncated the same file to 0 again from the other fuse mount

I/O errors are noted, as sharding has not supported truncate from day 1,
but no crashes are observed.
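
For reference, the truncate in steps 4 and 5 boils down to a truncate(2) call on the file through the fuse mount; a minimal sketch, with a hypothetical mount path:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical path; substitute the actual fuse mount point and file. */
    const char *path = "/mnt/glusterfs/testfile";

    /* Equivalent of running "truncate -s 0 <file>" from each fuse mount. */
    if (truncate(path, 0) != 0) {
        perror("truncate");
        return 1;
    }
    return 0;
}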

Also tested this scenario in an RHHI-V setup, with 10 VMs running workloads.
No issues were seen.

Comment 26 SATHEESARAN 2021-04-27 07:19:36 UTC
I mistakenly passed on the information to Vinayak that this bug was fixed in RHGS 3.5.3,
which led him to close this bug.

Moving this bug back to ON_QA

Comment 27 SATHEESARAN 2021-04-27 07:20:13 UTC
As per comment 25, this bug is verified for RHGS 3.5.4.

Comment 29 errata-xmlrpc 2021-04-29 07:20:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1462

