Bug 1655000 - calling fstrim on XFS gets stuck
Summary: calling fstrim on XFS gets stuck
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kmod-kvdo
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: sclafani
QA Contact: Jakub Krysl
URL:
Whiteboard:
Depends On: 1657340
Blocks:
Reported: 2018-11-30 10:08 UTC by Jakub Krysl
Modified: 2019-05-22 05:13 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-20 01:44:45 UTC
Type: Bug
Target Upstream Version:



Description Jakub Krysl 2018-11-30 10:08:53 UTC
Description of problem:
While testing for BZ 1654997 I also tried mounting without the discard option and calling fstrim manually, but the command hung completely and the kvdo0:logQ0 process was using 100% CPU.
# mount /dev/mapper/vdo vdo/
[root@storageqe-90 ~]# dd if=/dev/urandom of=data.file count=1800 bs=1M status=progress                                                                                                                           
1831862272 bytes (1.8 GB, 1.7 GiB) copied, 29 s, 63.2 MB/s
1800+0 records in
1800+0 records out
1887436800 bytes (1.9 GB, 1.8 GiB) copied, 29.8898 s, 63.1 MB/s
[root@storageqe-90 ~]# cd vdo
[root@storageqe-90 vdo]# cp ../data.file .
[root@storageqe-90 vdo]# time sync

real    1m20.447s
user    0m0.000s
sys     0m0.002s
[root@storageqe-90 vdo]# rm -f data.file; time fstrim .
^C
real    26m35.559s
user    0m0.001s
sys     0m0.011s

Some kernel hung-task call traces also appeared (this is the first one):
[  861.194401] INFO: task fstrim:13080 blocked for more than 120 seconds.
[  861.200930]       Tainted: G           OE    --------- ---  4.18.0-40.el8.x86_64 #1
[  861.208586] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  861.216413] Call Trace:
[  861.218880]  ? __schedule+0x254/0x840
[  861.222546]  schedule+0x28/0x80
[  861.225692]  schedule_timeout+0x26d/0x390
[  861.229707]  io_schedule_timeout+0x19/0x40
[  861.233805]  wait_for_completion_io+0x11f/0x190
[  861.238342]  ? wake_up_q+0x70/0x70
[  861.241759]  submit_bio_wait+0x5b/0x80
[  861.245515]  blkdev_issue_discard+0x7a/0xd0
[  861.249755]  xfs_trim_extents+0x1a0/0x420 [xfs]
[  861.254326]  xfs_ioc_trim+0x162/0x200 [xfs]
[  861.258543]  xfs_file_ioctl+0x1e8/0xbb0 [xfs]
[  861.262904]  ? _copy_to_user+0x26/0x30
[  861.266663]  ? cp_new_stat+0x150/0x180
[  861.270426]  ? selinux_file_ioctl+0x161/0x200
[  861.274792]  do_vfs_ioctl+0xa4/0x620
[  861.278372]  ksys_ioctl+0x60/0x90
[  861.281693]  __x64_sys_ioctl+0x16/0x20
[  861.285454]  do_syscall_64+0x5b/0x1b0
[  861.289120]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  861.294181] RIP: 0033:0x7fca6fafe3fb
[  861.297772] Code: Bad RIP value.
[  861.301008] RSP: 002b:00007ffe1bddcdd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  861.308577] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fca6fafe3fb
[  861.315708] RDX: 00007ffe1bddcde0 RSI: 00000000c0185879 RDI: 0000000000000003
[  861.322842] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[  861.329976] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe1bdde3b0
[  861.337115] R13: 00007fca70acbec0 R14: 0000000000000000 R15: 0000000000000000
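
A minimal sketch for spotting warnings like the one above in a kernel log. The function names count_hung_tasks and list_hung_tasks are made up for illustration; the pattern assumes the message format shown in this trace ("INFO: task NAME:PID blocked for more than N seconds").

```shell
#!/bin/sh
# Hypothetical helpers; both read kernel log text from stdin.

# Print the number of hung-task warnings found in the input.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function usable in scripts that run under "set -e".
count_hung_tasks() {
    grep -c 'INFO: task .* blocked for more than [0-9]* seconds' || true
}

# Print the distinct NAME:PID pairs of the blocked tasks, one per line.
list_hung_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than.*/\1/p' | sort -u
}
```

Typical use would be "dmesg | count_hung_tasks" or "dmesg | list_hung_tasks" on the affected machine; for the trace above the second command should print fstrim:13080.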

In this case, though, it seems the discarding itself had finished long before and the process simply never exited:
# cd ..
# time umount vdo

real    0m0.150s
user    0m0.004s
sys     0m0.000s


Version-Release number of selected component (if applicable):
vdo-6.2.0.273-9.el8.x86_64
kmod-kvdo-6.2.0.273-35.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. vdo create --name vdo --device /dev/sda
2. mkfs.xfs -K /dev/mapper/vdo
3. mkdir vdo
4. mount /dev/mapper/vdo vdo
5. dd if=/dev/urandom of=vdo/data.file count=1800 bs=1M
6. rm -f vdo/data.file
7. fstrim vdo

Actual results:
fstrim gets stuck and produces hung-task call traces; the kvdo0:logQ0 process takes 100% CPU

Expected results:
fstrim passes without issues

Additional info:
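
When scripting the reproducer it may help to put a time limit on step 7 so an automated run does not block forever. The sketch below wraps a command with timeout(1) from GNU coreutils. The helper name run_bounded and the 600 second limit in the usage comment are illustrative assumptions, not part of the original test. Note that a task stuck in uninterruptible sleep may not react to the signal right away, so the limit is best-effort.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit (in seconds) and
# print "ok", "timeout", or "failed:<status>" depending on how it ended.
run_bounded() {
    limit=$1
    shift
    status=0
    # timeout(1) exits with 124 when the limit is reached.
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)   echo ok ;;
        124) echo timeout ;;
        *)   echo "failed:$status" ;;
    esac
}

# Possible use for step 7 (needs root and the mounted VDO volume):
#   run_bounded 600 fstrim vdo
```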

Comment 1 Andy Walsh 2018-12-15 15:53:20 UTC
kernel-4.18.0-53.el8 should have provided the fix for this.  Jakub, can you verify that?

Comment 2 Jakub Krysl 2018-12-17 11:41:09 UTC
(In reply to Andy Walsh from comment #1)
> kernel-4.18.0-53.el8 should have provided the fix for this.  Jakub, can you
> verify that?

I ran it on the same compose with the new kernel; I only had to upgrade kmod-kvdo to a newer patch release to match the new kernel.
# rpm -qa *vdo*
vdo-6.2.0.273-9.el8.x86_64
kmod-kvdo-6.2.0.273-36.el8.x86_64
# uname -r
4.18.0-53.el8.x86_64

# time fstrim vdo
^C

real    22m22.984s
user    0m0.001s
sys     22m13.333s

(Running fstrim on the whole 10 TB drive would take a really long time...)

No call traces appeared during fstrim, so the issue is gone with the new kernel.

Comment 4 Andy Walsh 2019-02-20 01:44:45 UTC
This was caused by a kernel bug that was resolved in 4.18.0-53.el8.x86_64.

Comment 5 Andy Walsh 2019-02-20 01:46:28 UTC
Oops.  All supported architectures, not just x86_64.
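
A rough way to check whether a machine already runs a kernel at or above the fixed build named in this bug (4.18.0-53.el8). The helper name kernel_at_least is an assumption made for this sketch, and it relies on version sort (sort -V, as in GNU coreutils) rather than on RPM's own version comparison, so treat it as an approximation.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 is at least release $2.
# The two strings are version-sorted; if the required release $2 comes
# out first (or both are equal), $1 is new enough.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible use on the system under test:
#   if kernel_at_least "$(uname -r)" 4.18.0-53.el8; then
#       echo "kernel includes the fix"
#   else
#       echo "kernel predates the fix"
#   fi
```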

