Since kernel-5.3.0-0.rc0.git7.1.fc31 appeared in Rawhide, openQA btrfs install tests often seem to fail. The failure happens during package install; the installer just stops proceeding, either freezing or showing a spinner forever. Sometimes switching to a VT works, sometimes it doesn't. This appears to be the result of a circular locking problem (or at least, every instance of it where I have the logs seems to have a circular locking problem in the logs). The tracebacks look like this: 10:53:28,941 WARNING kernel: 10:53:28,941 WARNING kernel:====================================================== 10:53:28,941 WARNING kernel:WARNING: possible circular locking dependency detected 10:53:28,941 WARNING kernel:5.3.0-0.rc1.git2.1.fc31.x86_64 #1 Not tainted 10:53:28,941 WARNING kernel:------------------------------------------------------ 10:53:28,941 WARNING kernel:anaconda/2347 is trying to acquire lock: 10:53:28,941 WARNING kernel:0000000014191eab (&fs_info->reloc_mutex){+.+.}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs] 10:53:28,941 WARNING kernel:#012but task is already holding lock: 10:53:28,942 WARNING kernel:00000000761cc7de (sb_pagefaults#2){.+.+}, at: btrfs_page_mkwrite+0x69/0x570 [btrfs] 10:53:28,942 WARNING kernel:#012which lock already depends on the new lock. 10:53:28,942 WARNING kernel:#012the existing dependency chain (in reverse order) is: 10:53:28,942 WARNING kernel:#012-> #6 (sb_pagefaults#2){.+.+}: 10:53:28,942 WARNING kernel: __sb_start_write+0x131/0x240 10:53:28,942 WARNING kernel: btrfs_page_mkwrite+0x69/0x570 [btrfs] 10:53:28,942 WARNING kernel: do_page_mkwrite+0x2f/0x100 10:53:28,942 WARNING kernel: do_wp_page+0x311/0x5b0 10:53:28,942 WARNING kernel: __handle_mm_fault+0xce8/0x1730 10:53:28,942 WARNING kernel: handle_mm_fault+0x16e/0x360 10:53:28,942 WARNING kernel: do_user_addr_fault+0x20d/0x490 10:53:28,942 WARNING kernel: do_page_fault+0x31/0x210 10:53:28,942 WARNING kernel: async_page_fault+0x43/0x50 10:53:28,943 WARNING kernel:#012-> #5 (&mm->mmap_sem#2){++++}: 10:53:28,943 WARNING kernel: __might_fault+0x60/0x80 10:53:28,943 WARNING kernel: _copy_from_user+0x1e/0xa0 10:53:28,943 WARNING kernel: scsi_cmd_ioctl+0x218/0x440 10:53:28,943 WARNING kernel: cdrom_ioctl+0x3c/0x1272 10:53:28,943 WARNING kernel: sr_block_ioctl+0xa0/0xd0 10:53:28,943 WARNING kernel: blkdev_ioctl+0x32b/0xacf 10:53:28,943 WARNING kernel: block_ioctl+0x3f/0x50 10:53:28,943 WARNING kernel: do_vfs_ioctl+0x411/0x750 10:53:28,943 WARNING kernel: ksys_ioctl+0x5e/0x90 10:53:28,943 WARNING kernel: __x64_sys_ioctl+0x16/0x20 10:53:28,943 WARNING kernel: do_syscall_64+0x5c/0xb0 10:53:28,943 WARNING kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe 10:53:28,944 WARNING kernel:#012-> #4 (sr_mutex){+.+.}: 10:53:28,944 WARNING kernel: __mutex_lock+0x92/0x930 10:53:28,944 WARNING kernel: sr_block_open+0x81/0x100 10:53:28,944 WARNING kernel: __blkdev_get+0xed/0x590 10:53:28,944 WARNING kernel: blkdev_get+0x4a/0x380 10:53:28,944 WARNING kernel: do_dentry_open+0x14c/0x3c0 10:53:28,944 WARNING kernel: path_openat+0x4e6/0xc50 10:53:28,944 WARNING kernel: do_filp_open+0x91/0x100 10:53:28,944 WARNING kernel: do_sys_open+0x184/0x220 10:53:28,944 WARNING kernel: do_syscall_64+0x5c/0xb0 10:53:28,944 WARNING kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe 10:53:28,944 WARNING kernel:#012-> #3 (&bdev->bd_mutex){+.+.}: 10:53:28,944 WARNING kernel: __mutex_lock+0x92/0x930 10:53:28,944 WARNING kernel: __blkdev_get+0x7a/0x590 10:53:28,945 WARNING kernel: blkdev_get+0x214/0x380 10:53:28,945 WARNING kernel: blkdev_get_by_path+0x46/0x80 10:53:28,945 WARNING kernel: btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] 10:53:28,945 WARNING kernel: open_fs_devices+0x7a/0x2a0 [btrfs] 10:53:28,945 WARNING kernel: btrfs_open_devices+0x94/0xa0 [btrfs] 10:53:28,945 WARNING kernel: btrfs_mount_root+0x309/0x6d0 [btrfs] 10:53:28,945 WARNING kernel: legacy_get_tree+0x30/0x50 10:53:28,945 WARNING kernel: vfs_get_tree+0x28/0xf0 10:53:28,945 WARNING kernel: fc_mount+0xe/0x40 10:53:28,945 WARNING kernel: vfs_kern_mount.part.0+0x71/0x90 10:53:28,945 WARNING kernel: btrfs_mount+0x155/0x8b0 [btrfs] 10:53:28,945 WARNING kernel: legacy_get_tree+0x30/0x50 10:53:28,946 WARNING kernel: vfs_get_tree+0x28/0xf0 10:53:28,946 WARNING kernel: do_mount+0x800/0xaa0 10:53:28,946 WARNING kernel: ksys_mount+0x7e/0xc0 10:53:28,946 WARNING kernel: __x64_sys_mount+0x21/0x30 10:53:28,946 WARNING kernel: do_syscall_64+0x5c/0xb0 10:53:28,946 WARNING kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe 10:53:28,946 WARNING kernel:#012-> #2 (&fs_devs->device_list_mutex){+.+.}: 10:53:28,946 WARNING kernel: __mutex_lock+0x92/0x930 10:53:28,946 WARNING kernel: btrfs_run_dev_stats+0x46/0x4b0 [btrfs] 10:53:28,946 WARNING kernel: commit_cowonly_roots+0xb5/0x300 [btrfs] 10:53:28,946 WARNING kernel: btrfs_commit_transaction+0x4e7/0xa30 [btrfs] 10:53:28,946 WARNING kernel: create_subvol+0x392/0x760 [btrfs] 10:53:28,947 WARNING kernel: btrfs_mksubvol+0x508/0x570 [btrfs] 10:53:28,947 WARNING kernel: btrfs_ioctl_snap_create_transid+0x11d/0x190 [btrfs] 10:53:28,947 WARNING kernel: btrfs_ioctl_snap_create+0x5a/0x80 [btrfs] 10:53:28,947 WARNING kernel: btrfs_ioctl+0x1ac7/0x2ce0 [btrfs] 10:53:28,947 WARNING kernel: do_vfs_ioctl+0x411/0x750 10:53:28,947 WARNING kernel: ksys_ioctl+0x5e/0x90 10:53:28,947 WARNING kernel: __x64_sys_ioctl+0x16/0x20 10:53:28,947 WARNING kernel: do_syscall_64+0x5c/0xb0 10:53:28,947 WARNING kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe 10:53:28,947 WARNING kernel:#012-> #1 (&fs_info->tree_log_mutex){+.+.}: 10:53:28,947 WARNING kernel: __mutex_lock+0x92/0x930 10:53:28,947 WARNING kernel: btrfs_commit_transaction+0x48f/0xa30 [btrfs] 10:53:28,947 WARNING kernel: create_subvol+0x392/0x760 [btrfs] 10:53:28,948 WARNING kernel: btrfs_mksubvol+0x508/0x570 [btrfs] 10:53:28,948 WARNING kernel: btrfs_ioctl_snap_create_transid+0x11d/0x190 [btrfs] 10:53:28,948 WARNING kernel: btrfs_ioctl_snap_create+0x5a/0x80 [btrfs] 10:53:28,948 WARNING kernel: btrfs_ioctl+0x1ac7/0x2ce0 [btrfs] 10:53:28,948 WARNING kernel: do_vfs_ioctl+0x411/0x750 10:53:28,948 WARNING kernel: ksys_ioctl+0x5e/0x90 10:53:28,948 WARNING kernel: __x64_sys_ioctl+0x16/0x20 10:53:28,948 WARNING kernel: do_syscall_64+0x5c/0xb0 10:53:28,948 WARNING kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe 10:53:28,948 WARNING kernel:#012-> #0 (&fs_info->reloc_mutex){+.+.}: 10:53:28,948 WARNING kernel: __lock_acquire+0xdd1/0x1910 10:53:28,949 WARNING kernel: lock_acquire+0xa2/0x1b0 10:53:28,949 WARNING kernel: __mutex_lock+0x92/0x930 10:53:28,949 WARNING kernel: btrfs_record_root_in_trans+0x44/0x70 [btrfs] 10:53:28,949 WARNING kernel: start_transaction+0x95/0x4e0 [btrfs] 10:53:28,949 WARNING kernel: btrfs_dirty_inode+0x44/0xd0 [btrfs] 10:53:28,949 WARNING kernel: file_update_time+0xeb/0x140 10:53:28,949 WARNING kernel: btrfs_page_mkwrite+0xfe/0x570 [btrfs] 10:53:28,949 WARNING kernel: do_page_mkwrite+0x2f/0x100 10:53:28,949 WARNING kernel: do_wp_page+0x311/0x5b0 10:53:28,949 WARNING kernel: __handle_mm_fault+0xce8/0x1730 10:53:28,950 WARNING kernel: handle_mm_fault+0x16e/0x360 10:53:28,950 WARNING kernel: do_user_addr_fault+0x20d/0x490 10:53:28,950 WARNING kernel: do_page_fault+0x31/0x210 10:53:28,950 WARNING kernel: async_page_fault+0x43/0x50 10:53:28,950 WARNING kernel:#012other info that might help us debug this: 10:53:28,950 WARNING kernel:Chain exists of:#012 &fs_info->reloc_mutex --> &mm->mmap_sem#2 --> sb_pagefaults#2 10:53:28,950 WARNING kernel: Possible unsafe locking scenario: 10:53:28,950 WARNING kernel: CPU0 CPU1 10:53:28,950 WARNING kernel: ---- ---- 10:53:28,950 WARNING kernel: lock(sb_pagefaults#2); 10:53:28,950 WARNING kernel: lock(&mm->mmap_sem#2); 10:53:28,950 WARNING kernel: lock(sb_pagefaults#2); 10:53:28,950 WARNING kernel: lock(&fs_info->reloc_mutex); 10:53:28,950 WARNING kernel:#012 *** DEADLOCK *** 10:53:28,950 WARNING kernel:3 locks held by anaconda/2347: 10:53:28,950 WARNING kernel: #0: 0000000084975276 (&mm->mmap_sem#2){++++}, at: do_user_addr_fault+0x126/0x490 10:53:28,950 WARNING kernel: #1: 00000000761cc7de (sb_pagefaults#2){.+.+}, at: btrfs_page_mkwrite+0x69/0x570 [btrfs] 10:53:28,951 WARNING kernel: #2: 0000000051bb0e1b (sb_internal#2){.+.+}, at: start_transaction+0x381/0x4e0 [btrfs] 10:53:28,951 WARNING kernel:#012stack backtrace: 10:53:28,951 WARNING kernel:CPU: 1 PID: 2347 Comm: anaconda Not tainted 5.3.0-0.rc1.git2.1.fc31.x86_64 #1 10:53:28,951 WARNING kernel:Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 10:53:28,951 WARNING kernel:Call Trace: 10:53:28,951 WARNING kernel: dump_stack+0x85/0xc0 10:53:28,951 WARNING kernel: check_noncircular+0x17c/0x1b0 10:53:28,951 WARNING kernel: __lock_acquire+0xdd1/0x1910 10:53:28,951 WARNING kernel: lock_acquire+0xa2/0x1b0 10:53:28,951 WARNING kernel: ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] 10:53:28,951 WARNING kernel: ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] 10:53:28,951 WARNING kernel: __mutex_lock+0x92/0x930 10:53:28,952 WARNING kernel: ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] 10:53:28,952 WARNING kernel: ? rcu_read_lock_sched_held+0x6b/0x80 10:53:28,952 WARNING kernel: ? module_assert_mutex_or_preempt+0x14/0x40 10:53:28,952 WARNING kernel: ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] 10:53:28,952 WARNING kernel: ? kvm_sched_clock_read+0x14/0x30 10:53:28,952 WARNING kernel: ? sched_clock_cpu+0xc/0xc0 10:53:28,952 WARNING kernel: ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] 10:53:28,952 WARNING kernel: btrfs_record_root_in_trans+0x44/0x70 [btrfs] 10:53:28,952 WARNING kernel: start_transaction+0x95/0x4e0 [btrfs] 10:53:28,952 WARNING kernel: btrfs_dirty_inode+0x44/0xd0 [btrfs] 10:53:28,953 WARNING kernel: file_update_time+0xeb/0x140 10:53:28,953 WARNING kernel: btrfs_page_mkwrite+0xfe/0x570 [btrfs] 10:53:28,953 WARNING kernel: ? sched_clock+0x5/0x10 10:53:28,953 WARNING kernel: do_page_mkwrite+0x2f/0x100 10:53:28,953 WARNING kernel: do_wp_page+0x311/0x5b0 10:53:28,953 WARNING kernel: __handle_mm_fault+0xce8/0x1730 10:53:28,953 WARNING kernel: handle_mm_fault+0x16e/0x360 10:53:28,953 WARNING kernel: do_user_addr_fault+0x20d/0x490 10:53:28,953 WARNING kernel: do_page_fault+0x31/0x210 10:53:28,953 WARNING kernel: async_page_fault+0x43/0x50 10:53:28,953 WARNING kernel:RIP: 0033:0x7fce753f0f44 10:53:28,954 WARNING kernel:Code: 77 a1 48 83 fa 40 77 16 f3 0f 7f 07 f3 0f 7f 47 10 f3 0f 7f 44 17 f0 f3 0f 7f 44 17 e0 c3 48 8d 4f 40 f3 0f 7f 07 48 83 e1 c0 <f3> 0f 7f 44 17 f0 f3 0f 7f 47 10 f3 0f 7f 44 17 e0 f3 0f 7f 47 20 10:53:28,954 WARNING kernel:RSP: 002b:00007fce55f51208 EFLAGS: 00010206 10:53:28,954 WARNING kernel:RAX: 00007fce57197088 RBX: 0000000000000002 RCX: 00007fce571970c0 10:53:28,954 WARNING kernel:RDX: 0000000000007f78 RSI: 0000000000000000 RDI: 00007fce57197088 10:53:28,954 WARNING kernel:RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000e5ad8ec1 10:53:28,954 WARNING kernel:R10: 0000000000001050 R11: 0000000000000000 R12: 0000000000000000 10:53:28,954 WARNING kernel:R13: 0000000000000004 R14: 000056081a5270d8 R15: 000056081a5270d8
Proposing as a Beta blocker per criterion "When using both the installer-native and the blivet-gui-based custom partitioning flow, the installer must be able to: Correctly interpret, and modify as described below, any disk with a valid ms-dos or gpt disk label and partition table containing ext4 partitions, LVM and/or btrfs volumes, and/or software RAID arrays at RAID levels 0, 1 and 5 containing ext4 partitions..." It doesn't actually say that we should then be able to successfully complete an install to devices of those types, but I'm gonna say it's kinda implied.
I'm strongly against anything with btrfs being a blocker. If that's in the criteria I think we should see about removing btrfs simply because we don't have the resources to actually deal with btrfs besides reporting bugs upstream.
Agreed, btrfs has been a gamble pretty much always. See previous discussion around proposals to make btrfs default. Ext4 and xfs should be the only release blocking.
We can revisit that, sure. The storage criteria have always been...fun. The basic principle, though, is that stuff the installer offers prominently ought to work: so perhaps this could also be a reason to revisit dropping btrfs from the installer...
I wouldn't be opposed to that, though it is probably a bit late for F31. Either way, there is nothing about btrfs that should be release blocking. We have pretty much always discussed it as "use at your own risk"
This isn't a new lockdep splat. I've seen it in debug kernels since 5.0.0 and haven't seen it actually cause any problems in hundreds of VM and baremetall installations, and multiple production systems. Upstream is aware of it, and based on this explanation I'm not sure that it's actually a Btrfs problem or if it's just exposed by Btrfs. An actual deadlock attributed to this would get it a lot more attention. https://lore.kernel.org/linux-btrfs/20190703211210.GJ16275@worktop.programming.kicks-ass.net/
Well, after filing the bug report, I found several other cases where the bug happened but that circular locking backtrace isn't in the logs. So I'm not sure now if it's really related to the bug. But, the bug definitely seems real. After a long history of the btrfs tests either passing or failing for some other obviously identifiable reason, they suddenly started getting flaky on both prod and staging from the Fedora-Rawhide-20190720.n.1 onwards. Each compose since then at least one out of the four tests (we test UEFI and BIOS installs via both 'custom partitioning' and 'advanced custom partitioning') has failed.
OK it looks like this has been fixed already. https://lore.kernel.org/linux-btrfs/35b5e6a8-8e9b-037d-b248-36fee9da8717@suse.com/
OK, I'll see if the actual install-hangy-bug goes away over the next few composes, then (assuming that patch will land in our kernel builds soonish).
It's been merged for rc2. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4792ba1f1ff0db30369f7016c1611fda3f84b895 It's not been pulled in for 5.3.0-0.rc1.git4.1, but might be if there's a git5.1.
Discussed during the 2019-07-29 blocker review meeting: [1] The decision to delay the classification of this as a blocker bug was made as, while this seems fairly likely to be a blocker under current policy, the Anaconda team believes our current policy casts too wide a net. We will start a discussion of storage criteria on the mailing lists and reconvene on this bug next meeting. [1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-07-29/f31-blocker-review.2019-07-29-16.02.txt
I was able to reproduce this 1 in 2 attempts with Fedora-Workstation-Live-x86_64-Rawhide-20190730.n.0.iso (kernel 5.3.0-0.rc1.git3.1), and not at all since rc2 landed in Fedora-Workstation-Live-x86_64-Rawhide-20190731.n.0.iso, and four tests in openqa on 20190802 also succeeded. I think this can be set to CLOSED RAWHIDE.
yeah, this failure has not happened in openQA for some time. Let's call it fixed.