Bug 1659605 - ext4 hung blk_mq_make_request
Summary: ext4 hung blk_mq_make_request
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-12-14 19:03 UTC by Sami Farin
Modified: 2019-02-21 21:06 UTC (History)
16 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-21 21:06:45 UTC


Attachments (Terms of Use)

Description Sami Farin 2018-12-14 19:03:47 UTC
Description of problem:
[  *** ] A start job is running for Create Vā€¦ and Directories (4min / no limit)[  247.178485] INFO: task kworker/u12:2:124 blocked for more than 120 seconds.
[  247.282884]       Not tainted 4.19.8-300.fc29.x86_64+debug #1
[  247.334594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.387284] kworker/u12:2   D12672   124      2 0x80000000
[  247.440122] Workqueue: writeback wb_workfn (flush-8:0)
[  247.493112] Call Trace:
[  247.545317]  ? __schedule+0x2e2/0xa70
[  247.545324]  schedule+0x2f/0x90
[  247.545327]  io_schedule+0x12/0x40
[  247.545330]  wbt_wait+0x1fe/0x300
[  247.545333]  ? trace_event_raw_event_wbt_step+0x130/0x130
[  247.796469]  rq_qos_throttle+0x31/0x40
[  247.796474]  blk_mq_make_request+0x111/0x6e0
[  247.796480]  ? generic_make_request+0x3c1/0x6c0
[  247.796483]  generic_make_request+0x2d5/0x6c0
[  247.796490]  ? submit_bio+0x45/0x140
[  247.796491]  submit_bio+0x45/0x140
[  247.796496]  ext4_io_submit+0x49/0x60
[  247.796498]  ext4_bio_write_page+0x202/0x5fa
[  247.796504]  mpage_submit_page+0x53/0x70
[  247.796506]  mpage_process_page_bufs+0xe7/0xf0
[  247.796508]  mpage_prepare_extent_to_map+0x254/0x3e0
[  247.796513]  ? cpuacct_account_field+0x62/0x1c0
[  247.796515]  ? __lock_is_held+0x5a/0xa0
[  247.796518]  ? ext4_init_io_end+0x1b/0x40
[  247.796522]  ? rcu_read_lock_sched_held+0x6b/0x80
[  247.796524]  ? kmem_cache_alloc+0x263/0x290
[  247.796529]  ext4_writepages+0x4d7/0x1170
[  247.796532]  ? __bfs+0x118/0x230
[  247.796549]  ? do_writepages+0x41/0xe0
[  247.796551]  do_writepages+0x41/0xe0
[  247.796554]  ? writeback_sb_inodes+0x12f/0x500
[  247.796555]  ? __lock_is_held+0x5a/0xa0
[  247.796561]  __writeback_single_inode+0x54/0x5f0
[  247.796566]  writeback_sb_inodes+0x1ff/0x500
[  247.796577]  __writeback_inodes_wb+0x5d/0xb0
[  247.796582]  wb_writeback+0x31f/0x470
[  247.796591]  wb_workfn+0x3ee/0x590
[  247.796600]  process_one_work+0x27d/0x600
[  247.796608]  worker_thread+0x3c/0x390
[  247.796612]  ? process_one_work+0x600/0x600
[  247.796614]  kthread+0x120/0x140
[  247.796616]  ? kthread_create_worker_on_cpu+0x70/0x70
[  247.796621]  ret_from_fork+0x27/0x50
[  247.796640] INFO: task journal-offline:760 blocked for more than 120 seconds.
[  247.796641]       Not tainted 4.19.8-300.fc29.x86_64+debug #1

[  370.547505] Showing all locks held in the system:
[  370.547513] 1 lock held by khungtaskd/44:
[  370.547514]  #0: 00000000f6b2a9f0 (rcu_read_lock){....}, at: debug_show_all_locks+0x15/0x183
[  370.547524] 4 locks held by kworker/u12:2/124:
[  370.547525]  #0: 000000001e434792 ((wq_completion)"writeback"){+.+.}, at: process_one_work+0x1f3/0x600
[  370.547530]  #1: 000000002b4c1e03 ((work_completion)(&(&wb->dwork)->work)){+.+.}, at: process_one_work+0x1f3/0x600
[  370.547534]  #2: 0000000063067612 (&type->s_umount_key#38){++++}, at: trylock_super+0x16/0x50
[  370.547540]  #3: 00000000b87b6163 (&sbi->s_journal_flag_rwsem){.+.+}, at: do_writepages+0x41/0xe0
[  370.547545] 5 locks held by plymouthd/549:
[  370.547546] 1 lock held by jbd2/sda1-8/600:
[  370.547548] 4 locks held by systemd-tmpfile/841:
[  370.547548]  #0: 000000002616d23a (sb_writers#12){.+.+}, at: mnt_want_write+0x20/0x50
[  370.547554]  #1: 0000000031d9d5d8 (&type->i_mutex_dir_key#9/1){+.+.}, at: do_unlinkat+0x142/0x310
[  370.547559]  #2: 00000000da65ddc4 (&sb->s_type->i_mutex_key#13){+.+.}, at: vfs_unlink+0x48/0x1a0
[  370.547564]  #3: 000000000d9b33b6 (jbd2_handle){++++}, at: start_this_handle+0x196/0x530
[  370.547569] 

[    0.292561] Running RCU self tests
[    0.292565] rcu: Hierarchical RCU implementation.
[    0.292567] rcu: 	RCU lockdep checking is enabled.
[    0.292569] rcu: 	RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=6.
[    0.292571] rcu: 	RCU callback double-/use-after-free debug enabled.
[    0.292573] 	Tasks RCU enabled.
[    0.292575] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=6
[    0.295279] NR_IRQS: 524544, nr_irqs: 472, preallocated irqs: 16
[    0.295508] rcu: 	Offload RCU callbacks from CPUs: (none).


Dec 14 16:45:22 cloudimg.localdomain kernel: rcu-torture:--- End of test: SUCCESS: nreaders=5 nfakewriters=4 stat_interval=60 verbose=1 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_barrier_cbs=0 onoff_interval=0 onoff_holdoff=0

Version-Release number of selected component (if applicable):
4.19.8-300.fc29.x86_64+debug

How reproducible:
100%

Steps to Reproduce:
1. reboot
2.
3.

Actual results:
hang

Expected results:
no hang

Additional info:
Ryzen 1600X, using qemu+kvm

I did fsck.ext4 -f for this partition (booted F29 live).

kernel params console=tty1 console=ttyS0,115200n8 LANG=en_US.utf8 vconsole.keymap=fi scsi_mod.use_blk_mq=Y rcu_nocbs=0-11 initrd=/boot/initramfs-4.19.8-300.fc29.x86_64+debug.img

Comment 1 Sami Farin 2018-12-15 13:49:00 UTC
4.19.9 boots without freezing

Comment 2 Sami Farin 2018-12-24 14:03:24 UTC
4.19.10 froze, but booted with scsi_mod.use_blk_mq=N

Comment 3 Justin M. Forbes 2019-01-29 16:13:03 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 4.20.5-200.fc29.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 4 Justin M. Forbes 2019-02-21 21:06:45 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.


Note You need to log in before you can comment on or make changes to this bug.