This bug was initially created as a copy of Bug #1751722.

I am copying this bug because the changes introduced in 4.3.6 for 4K block size detection cause a fuse mount crash in gluster.

Description of problem:
Gluster fuse mount crashes in the shard translator while truncating a file from a really high size (exabytes) to a really low size. See bt:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --process-name fuse --volfile-server=tendrl25.lab.eng.blr.r'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f01260a2eca in shard_common_resolve_shards (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60, post_res_handler=0x7f01260ad770 <shard_post_resolve_truncate_handler>) at shard.c:1030
1030        local->inode_list[i] = inode_ref(res_inode);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libselinux-2.5-14.1.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7_6.8.x86_64

(gdb) bt
#0  0x00007f01260a2eca in shard_common_resolve_shards (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60, post_res_handler=0x7f01260ad770 <shard_post_resolve_truncate_handler>) at shard.c:1030
#1  0x00007f01260a3bfd in shard_refresh_internal_dir (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60, type=type@entry=SHARD_INTERNAL_DIR_DOT_SHARD) at shard.c:1317
#2  0x00007f01260ad90d in shard_truncate_begin (frame=frame@entry=0x7f0104000a58, this=this@entry=0x7f0118015b60) at shard.c:2596
#3  0x00007f01260b506d in shard_post_lookup_truncate_handler (frame=0x7f0104000a58, this=0x7f0118015b60) at shard.c:2659
#4  0x00007f01260a1f8b in shard_lookup_base_file_cbk (frame=0x7f0104000a58, cookie=<optimized out>, this=0x7f0118015b60, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7f0104013bb0, xdata=0x7f010c00faf8, postparent=0x7f0104013c48) at shard.c:1702
#5  0x00007f01265e4922 in afr_discover_unwind (frame=0x7f010400bd28, this=<optimized out>) at afr-common.c:3011
#6  0x00007f01265e4eeb in afr_discover_done (frame=<optimized out>, this=<optimized out>) at afr-common.c:3106
#7  0x00007f01265f27fd in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f010400bd28, this=this@entry=0x7f0118010da0) at afr-common.c:2761
#8  0x00007f01265f3608 in afr_discover_cbk (frame=frame@entry=0x7f010400bd28, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7f011805b9d8, buf=buf@entry=0x7f011f7fcb40, xdata=0x7f010c00faf8, postparent=postparent@entry=0x7f011f7fcbe0) at afr-common.c:3147
#9  0x00007f012687c412 in client4_0_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f010c010838) at client-rpc-fops_v2.c:2641
#10 0x00007f012ee6c021 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f01180510c0, pollin=pollin@entry=0x7f010c001a20) at rpc-clnt.c:755
#11 0x00007f012ee6c387 in rpc_clnt_notify (trans=0x7f0118051380, mydata=0x7f01180510f0, event=<optimized out>, data=0x7f010c001a20) at rpc-clnt.c:922
#12 0x00007f012ee689f3 in rpc_transport_notify (this=this@entry=0x7f0118051380, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f010c001a20) at rpc-transport.c:542
#13 0x00007f0129778875 in socket_event_poll_in (notify_handled=true, this=0x7f0118051380) at socket.c:2522
#14 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f0118051380, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2924
#15 0x00007f012f126806 in event_dispatch_epoll_handler (event=0x7f011f7fd130, event_pool=0x55cbb4dc2300) at event-epoll.c:648
#16 event_dispatch_epoll_worker (data=0x7f0118049a60) at event-epoll.c:761
#17 0x00007f012df01dd5 in start_thread (arg=0x7f011f7fe700) at pthread_create.c:307
#18 0x00007f012d7c902d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

(gdb) p local->first_block
$1 = 0
(gdb) p local->last_block
$2 = -1
(gdb) p local->prebuf.ia_size
$3 = 18446744073709547520
(gdb) p local->num_blocks
$4 = 0
(gdb) p local->block_size
$5 = 67108864
(gdb) p (local->prebuf.ia_size - 1)/local->block_size
$6 = 274877906943
(gdb) p (int) $6
$7 = -1

Turns out the quotient resulting from dividing a really large unsigned 64-bit int by a relatively small unsigned 64-bit int is assigned to a signed int32 variable, and sometimes this quotient is larger than the largest signed int32. In this case, local->last_block gets assigned '-1' and the resulting local->num_blocks after that becomes 0. This leads to a GF_CALLOC with size 0 of local->inode_list[] here in shard_truncate_begin():

2580         local->inode_list = GF_CALLOC(local->num_blocks, sizeof(inode_t *),
2581                                       gf_shard_mt_inode_list);
2582         if (!local->inode_list)
2583             goto err;
2584

When members of local->inode_list[] beyond its boundary are accessed, there is illegal memory access and the process crashes.

While this explains the cause of the crash, what it doesn't explain is why the size of the file was as big as 16 EiB when the writes on the file in question wouldn't extend its size beyond 4 KiB. That is still being investigated. I'll update the bz once I have the RC.

Version-Release number of selected component (if applicable):
master

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from bipin on 2019-09-12 06:43:23 UTC ---

Seeing a similar crash with the below steps:
1. Complete the RHHI-V deployment
2. Create 30 VMs using a pool
3. After an hour or so, the FUSE mount crash could be seen

(gdb) bt
#0  0x00007ff30b541a1a in shard_common_resolve_shards (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470, post_res_handler=0x7ff30b54b2d0 <shard_post_resolve_truncate_handler>) at shard.c:1030
#1  0x00007ff30b5425e5 in shard_refresh_internal_dir (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470, type=type@entry=SHARD_INTERNAL_DIR_DOT_SHARD) at shard.c:1321
#2  0x00007ff30b54b46e in shard_truncate_begin (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470) at shard.c:2573
#3  0x00007ff30b551cc8 in shard_post_lookup_truncate_handler (frame=0x7ff2d400eba8, this=0x7ff304013470) at shard.c:2637
#4  0x00007ff30b5409f2 in shard_lookup_base_file_cbk (frame=0x7ff2d400eba8, cookie=<optimized out>, this=0x7ff304013470, op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>, buf=0x7ff2d402f700, xdata=0x7ff2fc02fd48, postparent=0x7ff2d402f9a0) at shard.c:1705
#5  0x00007ff30b7a8381 in dht_discover_complete (this=this@entry=0x7ff304011cc0, discover_frame=discover_frame@entry=0x7ff2d400c028) at dht-common.c:754
#6  0x00007ff30b7a92d4 in dht_discover_cbk (frame=0x7ff2d400c028, cookie=0x7ff30400f330, this=0x7ff304011cc0, op_ret=<optimized out>, op_errno=117, inode=0x7ff2fc00c578, stbuf=0x7ff2d400aba0, xattr=0x7ff2fc02fd48, postparent=0x7ff2d400ac10) at dht-common.c:1133
#7  0x00007ff30ba61315 in afr_discover_done (frame=0x7ff2d4068fe8, this=<optimized out>) at afr-common.c:3027
#8  0x00007ff30ba6c175 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7ff2d4068fe8, this=this@entry=0x7ff30400f330) at afr-common.c:2769
#9  0x00007ff30ba6d089 in afr_discover_cbk (frame=frame@entry=0x7ff2d4068fe8, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7ff2fc00c578, buf=buf@entry=0x7ff30c6e3900, xdata=0x7ff2fc0166e8, postparent=postparent@entry=0x7ff30c6e3970) at afr-common.c:3077
#10 0x00007ff30bcacf3d in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7ff2d40180a8) at client-rpc-fops.c:2872
#11 0x00007ff313a90ac0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7ff304049ac0, pollin=pollin@entry=0x7ff2fc00b880) at rpc-clnt.c:778
#12 0x00007ff313a90e2b in rpc_clnt_notify (trans=<optimized out>, mydata=0x7ff304049af0, event=<optimized out>, data=0x7ff2fc00b880) at rpc-clnt.c:971
#13 0x00007ff313a8cba3 in rpc_transport_notify (this=this@entry=0x7ff304049e80, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7ff2fc00b880) at rpc-transport.c:557
#14 0x00007ff30eba55e6 in socket_event_poll_in (this=this@entry=0x7ff304049e80, notify_handled=<optimized out>) at socket.c:2322
#15 0x00007ff30eba7c2a in socket_event_handler (fd=11, idx=2, gen=4, data=0x7ff304049e80, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2482
#16 0x00007ff313d498b0 in event_dispatch_epoll_handler (event=0x7ff30c6e3e70, event_pool=0x555e43424750) at event-epoll.c:643
#17 event_dispatch_epoll_worker (data=0x555e4347fca0) at event-epoll.c:759
#18 0x00007ff312b26ea5 in start_thread (arg=0x7ff30c6e4700) at pthread_create.c:307
#19 0x00007ff3123ec8cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

(gdb) f 0
#0  0x00007ff30b541a1a in shard_common_resolve_shards (frame=frame@entry=0x7ff2d400eba8, this=this@entry=0x7ff304013470, post_res_handler=0x7ff30b54b2d0 <shard_post_resolve_truncate_handler>) at shard.c:1030
1030        local->inode_list[i] = inode_ref(res_inode);
(gdb) p local->num_blocks
$1 = 0

--- Additional comment from Worker Ant on 2019-09-12 08:46:16 UTC ---

REVIEW: https://review.gluster.org/23407 (features/shard: Convert shard block indices to uint64) posted (#1) for review on master by Krutika Dhananjay
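For illustration, a minimal standalone C sketch (not the actual shard.c code) of the arithmetic described above: the uint64 quotient 274877906943 does not fit in a signed 32-bit index, so last_block ends up as -1, num_blocks ends up as 0, and the zero-sized allocation leaves inode_list[] with no usable entries. Variable names mirror the gdb output; the uint64-to-int32 narrowing is implementation-defined in C, but on this platform it wraps to -1 exactly as gdb shows, and widening the indices to uint64 (as the posted patch does) avoids it.

/* Minimal standalone sketch (not the actual shard.c code) showing how the
 * quotient seen in gdb truncates when stored in a signed 32-bit variable. */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    uint64_t ia_size    = 18446744073709547520ULL; /* ~16 EiB, from gdb */
    uint64_t block_size = 67108864;                /* 64 MiB shard block size */
    uint64_t offset     = 0;                       /* truncating to size 0 */

    /* Same arithmetic, but with int32_t indices (implicit narrowing). */
    int32_t first_block = offset / block_size;         /* 0 */
    int32_t last_block  = (ia_size - 1) / block_size;  /* 274877906943 -> -1 */
    int32_t num_blocks  = last_block - first_block + 1; /* -1 - 0 + 1 = 0 */

    printf("first=%d last=%d num=%d\n", first_block, last_block, num_blocks);

    /* calloc(0, ...) returns NULL or an empty allocation, so any later
     * access to inode_list[i] walks off the end of the buffer. */
    void **inode_list = calloc(num_blocks, sizeof(void *));
    printf("inode_list=%p\n", (void *)inode_list);

    /* With 64-bit indices the quotient stays intact and num_blocks is
     * never 0 for a non-zero file size. */
    uint64_t last64 = (ia_size - 1) / block_size;
    printf("last as uint64: %" PRIu64 "\n", last64);
    return 0;
}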
According to https://bugzilla.redhat.com/show_bug.cgi?id=1751722#c13, the issue is caused by multiple hosts trying to detect the block size using the same file (__DIRECT_IO_TEST__). This is expected to work on a POSIX filesystem, but it breaks Gluster's metadata management.

Block size detection is now implemented in vdsm, so ioprocess does not strictly need any change. However, implementing block size detection in a way that is compatible with existing Gluster is best done in ioprocess, so I'm using this bug for the fix in ioprocess.
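For context, a rough C sketch of a Gluster-friendly probe as described above, and as seen in the strace output later in this bug: each probe creates its own uniquely named hidden file with O_DIRECT, unlinks it immediately, then tries direct writes of increasing size until one succeeds. The file-name pattern and the probe sizes are illustrative assumptions, not the exact ioprocess implementation.

/* Rough sketch of a per-host block size probe compatible with Gluster:
 * each probe uses its own uniquely named temporary file instead of the
 * shared __DIRECT_IO_TEST__ file, so concurrent hosts never touch the
 * same inode. Name pattern and sizes are assumptions, not the exact
 * ioprocess code. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static ssize_t probe_block_size(const char *dir)
{
    char path[4096];
    /* Unique hidden name per probe; a real implementation would use a UUID. */
    snprintf(path, sizeof(path), "%s/.prob-%d-%ld", dir, getpid(), (long)random());

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL | O_DIRECT | O_DSYNC, 0600);
    if (fd < 0)
        return -1;

    /* Unlink right away so no leftover files remain even if we crash. */
    unlink(path);

    static const size_t sizes[] = {512, 4096};
    ssize_t detected = -1;

    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        void *buf;
        if (posix_memalign(&buf, sizes[i], sizes[i]) != 0)
            break;
        memset(buf, 0, sizes[i]);

        /* O_DIRECT writes fail with EINVAL until the buffer size and
         * alignment match the storage logical block size. */
        if (pwrite(fd, buf, sizes[i], 0) == (ssize_t)sizes[i]) {
            detected = sizes[i];
            free(buf);
            break;
        }
        free(buf);
    }

    close(fd);
    return detected;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s MOUNTPOINT\n", argv[0]);
        return 1;
    }
    printf("%zd\n", probe_block_size(argv[1]));
    return 0;
}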
Created attachment 1617335 [details]
stress test for verifying with gluster storage

To verify:
- Install python2-ioprocess on all hosts
- Mount the same gluster volume on all hosts
- Run the attached test script on all hosts for a couple of minutes

$ python probe_block_size_stress.py /mount/path
512
512
...

Expected results:
- The script outputs the same block size on all hosts
- No errors in the script's output
- No leftover hidden files in the gluster mountpoint (e.g. ".probe-xxx-yyy")
- The gluster fuse mount does not crash
Created attachment 1617906 [details] stress test for verifying with gluster storage - improved This version flushes output so it is easy to run with nohup.
Created attachment 1617907 [details] yum repo for testing ioprocess 1.3.0-1 build
I'm testing now on Avihai's setup - 3 hosts connected to gluster storage.

On each host:
1. Copy ioprocess.repo (attachment 1617907 [details]) to /etc/yum.repos.d/
2. yum upgrade ioprocess python2-ioprocess
3. Copy probe_block_size_stress.py (attachment 1617906 [details]) to /root/
4. Run:
   nohup python probe_block_size_stress.py \
       /rhev/data-center/mnt/glusterSD/gluster01.example.com:_storage__local__ge8__volume__0/ &

Started around 20:05, probing in a tight loop - about 2 seconds per probe, 1800 calls per hour (these systems are extremely slow).

All hosts report the correct block size. In strace we see:

$ strace -f -tt -T -p 26494
[pid 26500] 20:21:13.893810 open("/rhev/data-center/mnt/glusterSD/gluster01.example.com:_storage__local__ge8__volume__0//.prob-f260033a-694c-4bdf-bcef-4b46f82306b9", O_WRONLY|O_CREAT|O_EXCL|O_DSYNC|O_DIRECT, 0600 <unfinished ...>
[pid 26500] 20:21:13.907069 <... open resumed> ) = 0 <0.013147>
[pid 26500] 20:21:13.907195 unlink("/rhev/data-center/mnt/glusterSD/gluster01.example.com:_storage__local__ge8__volume__0//.prob-f260033a-694c-4bdf-bcef-4b46f82306b9") = 0 <0.005863>
[pid 26500] 20:21:13.913227 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.930909>
[pid 26500] 20:21:15.844328 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.011338>

Krutika, does it make sense that writing 1 byte takes 1.93 seconds?

Will run for a couple of hours.
Created attachment 1617908 [details]
ioprocess trace showing extremely slow unaligned writes

Krutika, should we file a bug about the slow unaligned writes?

# grep pwrite64 ioprocess.trace | head -20
26500 20:27:40.037730 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.009804>
26500 20:27:40.133382 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.911269>
26500 20:27:42.044969 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.012527>
26500 20:27:42.105923 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.940483>
26500 20:27:44.046644 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.028278>
26500 20:27:44.162981 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.883644>
26500 20:27:46.046896 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.012689>
26500 20:27:46.135306 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.912911>
26500 20:27:48.048518 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.011936>
26500 20:27:48.105805 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.952041>
26500 20:27:50.058156 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.011621>
26500 20:27:50.133522 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.922091>
26500 20:27:52.055897 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.009861>
26500 20:27:52.100258 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.951614>
26500 20:27:54.052096 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.015949>
26500 20:27:54.125715 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.928027>
26500 20:27:56.054076 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.016307>
26500 20:27:56.142113 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.914674>
26500 20:27:58.057043 pwrite64(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 0) = 512 <0.011393>
26500 20:27:58.099114 pwrite64(0, "\0", 1, 0) = -1 EINVAL (Invalid argument) <1.955300>
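For reference, a small hypothetical timing helper (not part of ioprocess or the attached script) that reproduces the pattern in the trace: a 1-byte unaligned pwrite on an O_DIRECT fd, expected to fail with EINVAL, timed against a 512-byte aligned one. It only measures the latency difference; whether the ~1.9s EINVAL path is expected is the open question above.

/* Hypothetical timing helper: compares the latency of a 1-byte unaligned
 * O_DIRECT pwrite (expected EINVAL) against a 512-byte aligned one,
 * mirroring the pattern in the trace. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static void timed_pwrite(int fd, const void *buf, size_t len)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = pwrite(fd, buf, len, 0);
    int saved_errno = errno;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("pwrite(%zu) = %zd (%s) in %.3fs\n",
           len, n, n < 0 ? strerror(saved_errno) : "ok", secs);
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE_ON_GLUSTER_MOUNT\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_WRONLY | O_CREAT | O_DIRECT | O_DSYNC, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    void *aligned;
    if (posix_memalign(&aligned, 512, 512) != 0)
        return 1;
    memset(aligned, 0, 512);
    char one = '\0';

    timed_pwrite(fd, &one, 1);       /* unaligned: EINVAL, ~1.9s in the trace */
    timed_pwrite(fd, aligned, 512);  /* aligned: succeeds in ~10ms in the trace */

    free(aligned);
    close(fd);
    return 0;
}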
Ending the test after ~8 hours, see comment 8 for details.

vdsm1:

root     14617  0.3  0.0 225540  6596 ?  Sl  Sep22  2:36 python probe_block_size_stress.py /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage_
root     14618  0.0  0.0 384948  1044 ?  Sl  Sep22  0:44  \_ /usr/libexec/ioprocess --read-pipe-fd 7 --write-pipe-fd 6 --max-threads 0 --max-queued-requests -1

[root@storage-ge8-vdsm1 ~]# cat nohup.out | sort -u
512

[root@storage-ge8-vdsm1 ~]# wc -l nohup.out
22400 nohup.out

vdsm2:

root     19989  0.3  0.0 225540  6596 ?  Sl  Sep22  2:34 python probe_block_size_stress.py /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage_
root     19990  0.0  0.0 384948  2832 ?  Sl  Sep22  0:37  \_ /usr/libexec/ioprocess --read-pipe-fd 7 --write-pipe-fd 6 --max-threads 0 --max-queued-requests -1

[root@storage-ge8-vdsm2 ~]# cat nohup.out | sort -u
512

[root@storage-ge8-vdsm2 ~]# wc -l nohup.out
22396 nohup.out

vdsm3:

root     26493  0.3  0.0 225540  8636 ?  Sl  Sep22  2:35 python probe_block_size_stress.py /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage_
root     26494  0.0  0.0 384948  2832 ?  Sl  Sep22  0:38  \_ /usr/libexec/ioprocess --read-pipe-fd 7 --write-pipe-fd 6 --max-threads 0 --max-queued-requests -1

[root@storage-ge8-vdsm3 ~]# cat nohup.out | sort -u
512

[root@storage-ge8-vdsm3 ~]# wc -l nohup.out
22390 nohup.out

Checking storage for leftover .probe files:

[root@storage-ge8-vdsm3 ~]# ls -lh /rhev/data-center/mnt/glusterSD/gluster01.example.com\:_storage__local__ge8__volume__0/.probe*
ls: cannot access /rhev/data-center/mnt/glusterSD/gluster01.example.com:_storage__local__ge8__volume__0/.probe*: No such file or directory

All hosts issued ~22400 probes during the test. Since the normal probing interval is 5 minutes, this test is equivalent to 1866 hours (77 days).
(In reply to Nir Soffer from comment #10)
> Ending the test after ~8 hours, see comment 8 for details.
> ...
> All hosts issued ~22400 probes during the test. Since the normal probing
> interval is 5 minutes, this test is equivalent to 1866 hours (77 days).

Hi Nir,

So these tests are good enough to prove that there are no metadata issues with the gluster mount, right?

Do you need QE to run these tests again on our setup?
(In reply to SATHEESARAN from comment #11)
> So these tests are good enough to prove that there are no metadata issues
> with the gluster mount, right?

I think they are, but I'm not a gluster expert. I'm pretty sure the new code is not affected by the metadata management issue (https://bugzilla.redhat.com/show_bug.cgi?id=1751722#c13), but this code is writing to deleted files, which is not a typical usage pattern, so it may expose other issues in gluster.

> Do you need QE to run these tests again on our setup?

I think it would be nice to run this on real storage, which can be 100x faster than the setup I tested for an hour or so.
Created attachment 1618048 [details]
reproduce script for vdsm <= 4.3.6

To reproduce the original issue, run this script on multiple hosts, probing the same gluster volume mount:

# python reproduce.py /rhev/data-center/mnt/glusterSD/server:_path
512
512
...

Expected results:
- After some time the gluster fuse mount helper crashes.
- The size of /rhev/data-center/mnt/glusterSD/server:_path/__DIRECT_IO_TEST__ becomes negative, seen as 16 EiB in ls.
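For context, a rough C sketch of the failure mode the reproducer exercises, assuming every host probes against the same shared __DIRECT_IO_TEST__ file as older vdsm did. The exact logic of the attached reproduce.py is not visible in this bug, so the open flags (including O_TRUNC, which would drive the truncate path seen in the crash backtrace) and the probe sizes are assumptions.

/* Rough sketch of the reproducer's pattern, under the assumption that all
 * hosts probe the block size against the SAME shared file. Flags and sizes
 * are illustrative only, not the attached reproduce.py. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s GLUSTER_MOUNTPOINT\n", argv[0]);
        return 1;
    }
    char path[4096];
    snprintf(path, sizeof(path), "%s/__DIRECT_IO_TEST__", argv[1]);

    void *buf;
    if (posix_memalign(&buf, 512, 4096) != 0)
        return 1;
    memset(buf, 0, 4096);

    for (;;) {
        /* All hosts open and truncate the same file concurrently; per the
         * description above, this is what breaks the sharded file's size
         * metadata and eventually crashes the fuse mount. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_DSYNC, 0644);
        if (fd < 0) {
            perror("open");
            break;
        }
        if (pwrite(fd, buf, 512, 0) == 512)
            puts("512");
        else if (pwrite(fd, buf, 4096, 0) == 4096)
            puts("4096");
        close(fd);
    }
    free(buf);
    return 0;
}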
(In reply to SATHEESARAN from comment #11)
>> Status: POST → NEW
>> CC: nsoffer
>> Flags: needinfo?(sasundar) needinfo?(kdhananj) → needinfo?(nsoffer)

Accidentally changed the state of the bug from POST to NEW. Restoring the same.
(In reply to Nir Soffer from comment #9)
> Created attachment 1617908 [details]
> ioprocess trace showing extremely slow unaligned writes
>
> Krutika, should we file a bug about the slow unaligned writes?
> ...

Restoring the needinfo on Krutika, which I removed accidentally while addressing a needinfo on me.
Nir / Avihai,

I see there has been a good amount of testing with the latest ioprocess. As the next step, once these fixes are available as part of the builds - vdsm & ioprocess - we can do full-fledged sanity testing with these changes in place.
(In reply to SATHEESARAN from comment #17)

We will release a new ioprocess today with these changes. A new vdsm requiring this ioprocess will be available later this week (bug 1753898).
(In reply to Nir Soffer from comment #9)
> Created attachment 1617908 [details]
> ioprocess trace showing extremely slow unaligned writes
>
> Krutika, should we file a bug about the slow unaligned writes?
> ...

Sorry, was out sick.

Can you capture the gluster volume profile output:

# gluster volume profile $VOLNAME start
# run the test
# gluster volume profile $VOLNAME info > brick-profile.out
# gluster volume profile $VOLNAME stop

and share the "brick-profile.out" file?

-Krutika
Moving the bug to verified based on the below results:

Version:
=======
glusterfs-3.12.2-47.5.el7rhgs.x86_64
ioprocess-1.3.0-1.el7ev.x86_64
vdsm-4.30.33-1.el7ev.x86_64

1. Complete the RHHI-V deployment
2. Create a RHEL 7.6 template
3. Create 30 VMs using a pool
4. After an hour, delete the VMs in the pool
5. Repeat steps 3 and 4 for 3 iterations
6. Create VMs using the template and run kernel untar and FIO

Couldn't see any crash during the test, so moving the bug to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3030