Bug 1659605

Summary: ext4 hung blk_mq_make_request
Product: [Fedora] Fedora Reporter: Sami Farin <hvtaifwkbgefbaei>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 29CC: airlied, bskeggs, ewk, hdegoede, ichavero, itamar, jarodwilson, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-21 21:06:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Sami Farin 2018-12-14 19:03:47 UTC
Description of problem:
[  *** ] A start job is running for Create V… and Directories (4min / no limit)[  247.178485] INFO: task kworker/u12:2:124 blocked for more than 120 seconds.
[  247.282884]       Not tainted 4.19.8-300.fc29.x86_64+debug #1
[  247.334594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.387284] kworker/u12:2   D12672   124      2 0x80000000
[  247.440122] Workqueue: writeback wb_workfn (flush-8:0)
[  247.493112] Call Trace:
[  247.545317]  ? __schedule+0x2e2/0xa70
[  247.545324]  schedule+0x2f/0x90
[  247.545327]  io_schedule+0x12/0x40
[  247.545330]  wbt_wait+0x1fe/0x300
[  247.545333]  ? trace_event_raw_event_wbt_step+0x130/0x130
[  247.796469]  rq_qos_throttle+0x31/0x40
[  247.796474]  blk_mq_make_request+0x111/0x6e0
[  247.796480]  ? generic_make_request+0x3c1/0x6c0
[  247.796483]  generic_make_request+0x2d5/0x6c0
[  247.796490]  ? submit_bio+0x45/0x140
[  247.796491]  submit_bio+0x45/0x140
[  247.796496]  ext4_io_submit+0x49/0x60
[  247.796498]  ext4_bio_write_page+0x202/0x5fa
[  247.796504]  mpage_submit_page+0x53/0x70
[  247.796506]  mpage_process_page_bufs+0xe7/0xf0
[  247.796508]  mpage_prepare_extent_to_map+0x254/0x3e0
[  247.796513]  ? cpuacct_account_field+0x62/0x1c0
[  247.796515]  ? __lock_is_held+0x5a/0xa0
[  247.796518]  ? ext4_init_io_end+0x1b/0x40
[  247.796522]  ? rcu_read_lock_sched_held+0x6b/0x80
[  247.796524]  ? kmem_cache_alloc+0x263/0x290
[  247.796529]  ext4_writepages+0x4d7/0x1170
[  247.796532]  ? __bfs+0x118/0x230
[  247.796549]  ? do_writepages+0x41/0xe0
[  247.796551]  do_writepages+0x41/0xe0
[  247.796554]  ? writeback_sb_inodes+0x12f/0x500
[  247.796555]  ? __lock_is_held+0x5a/0xa0
[  247.796561]  __writeback_single_inode+0x54/0x5f0
[  247.796566]  writeback_sb_inodes+0x1ff/0x500
[  247.796577]  __writeback_inodes_wb+0x5d/0xb0
[  247.796582]  wb_writeback+0x31f/0x470
[  247.796591]  wb_workfn+0x3ee/0x590
[  247.796600]  process_one_work+0x27d/0x600
[  247.796608]  worker_thread+0x3c/0x390
[  247.796612]  ? process_one_work+0x600/0x600
[  247.796614]  kthread+0x120/0x140
[  247.796616]  ? kthread_create_worker_on_cpu+0x70/0x70
[  247.796621]  ret_from_fork+0x27/0x50
[  247.796640] INFO: task journal-offline:760 blocked for more than 120 seconds.
[  247.796641]       Not tainted 4.19.8-300.fc29.x86_64+debug #1

[  370.547505] Showing all locks held in the system:
[  370.547513] 1 lock held by khungtaskd/44:
[  370.547514]  #0: 00000000f6b2a9f0 (rcu_read_lock){....}, at: debug_show_all_locks+0x15/0x183
[  370.547524] 4 locks held by kworker/u12:2/124:
[  370.547525]  #0: 000000001e434792 ((wq_completion)"writeback"){+.+.}, at: process_one_work+0x1f3/0x600
[  370.547530]  #1: 000000002b4c1e03 ((work_completion)(&(&wb->dwork)->work)){+.+.}, at: process_one_work+0x1f3/0x600
[  370.547534]  #2: 0000000063067612 (&type->s_umount_key#38){++++}, at: trylock_super+0x16/0x50
[  370.547540]  #3: 00000000b87b6163 (&sbi->s_journal_flag_rwsem){.+.+}, at: do_writepages+0x41/0xe0
[  370.547545] 5 locks held by plymouthd/549:
[  370.547546] 1 lock held by jbd2/sda1-8/600:
[  370.547548] 4 locks held by systemd-tmpfile/841:
[  370.547548]  #0: 000000002616d23a (sb_writers#12){.+.+}, at: mnt_want_write+0x20/0x50
[  370.547554]  #1: 0000000031d9d5d8 (&type->i_mutex_dir_key#9/1){+.+.}, at: do_unlinkat+0x142/0x310
[  370.547559]  #2: 00000000da65ddc4 (&sb->s_type->i_mutex_key#13){+.+.}, at: vfs_unlink+0x48/0x1a0
[  370.547564]  #3: 000000000d9b33b6 (jbd2_handle){++++}, at: start_this_handle+0x196/0x530
[  370.547569] 

[    0.292561] Running RCU self tests
[    0.292565] rcu: Hierarchical RCU implementation.
[    0.292567] rcu: 	RCU lockdep checking is enabled.
[    0.292569] rcu: 	RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=6.
[    0.292571] rcu: 	RCU callback double-/use-after-free debug enabled.
[    0.292573] 	Tasks RCU enabled.
[    0.292575] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=6
[    0.295279] NR_IRQS: 524544, nr_irqs: 472, preallocated irqs: 16
[    0.295508] rcu: 	Offload RCU callbacks from CPUs: (none).


Dec 14 16:45:22 cloudimg.localdomain kernel: rcu-torture:--- End of test: SUCCESS: nreaders=5 nfakewriters=4 stat_interval=60 verbose=1 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_barrier_cbs=0 onoff_interval=0 onoff_holdoff=0

Version-Release number of selected component (if applicable):
4.19.8-300.fc29.x86_64+debug

How reproducible:
100%

Steps to Reproduce:
1. reboot
2.
3.

Actual results:
hang

Expected results:
no hang

Additional info:
Ryzen 1600X, using qemu+kvm

I did fsck.ext4 -f for this partition (booted F29 live).

kernel params console=tty1 console=ttyS0,115200n8 LANG=en_US.utf8 vconsole.keymap=fi scsi_mod.use_blk_mq=Y rcu_nocbs=0-11 initrd=/boot/initramfs-4.19.8-300.fc29.x86_64+debug.img

Comment 1 Sami Farin 2018-12-15 13:49:00 UTC
4.19.9 boots without freezing

Comment 2 Sami Farin 2018-12-24 14:03:24 UTC
4.19.10 froze, but booted with scsi_mod.use_blk_mq=N

Comment 3 Justin M. Forbes 2019-01-29 16:13:03 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 4.20.5-200.fc29.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 4 Justin M. Forbes 2019-02-21 21:06:45 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.