Bug 1956276 - VFS: Busy inodes after unmount of loop0p1. Self-destruct in 5 seconds. Have a nice day...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: fs-maint
QA Contact: Kun Wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-03 11:28 UTC by Frantisek Sumsal
Modified: 2022-07-30 07:27 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-30 07:27:36 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Frantisek Sumsal 2021-05-03 11:28:07 UTC
Description of problem:
After moving one of our systemd upstream CI jobs from CentOS 7 to CentOS 8, I noticed that sometimes the machines suddenly die without any indication of what went wrong. After further debugging, the issue appears to be a kernel panic during loopdev cleanup:

```
[  589.071352] systemd-journald[868]: Sent WATCHDOG=1 notification.
[  589.146592] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
[  589.862623] EXT4-fs (loop1p1): mounted filesystem with ordered data mode. Opts: (null)
[  590.706003] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
[  593.106680] VFS: Busy inodes after unmount of loop0p1. Self-destruct in 5 seconds.  Have a nice day...
[  593.116077] BUG: unable to handle kernel NULL pointer dereference at 0000000000000210
[  593.123932] PGD 0 P4D 0 
[  593.126479] Oops: 0000 [#1] SMP NOPTI
[  593.130158] CPU: 0 PID: 552 Comm: kworker/0:3 Kdump: loaded Tainted: G                 ---------r-  - 4.18.0-240.22.1.el8_3.x86_64 #1
[  593.142189] Hardware name: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
[  593.152048] Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
[  593.158110] RIP: 0010:ext4_evict_inode+0x7e/0x670 [ext4]
[  593.163441] Code: 02 00 48 8b 44 24 08 65 48 33 04 25 28 00 00 00 0f 85 fe 05 00 00 48 83 c4 10 5b 5d 41 5c c3 48 8b 43 28 48 8b 90 00 04 00 00 <48> 8b ba 10 02 00 00 48 85 ff 74 b5 0f b7 03 89 c1 66 81 e1 00 f0
[  593.182262] RSP: 0018:ffffac5783f37dc0 EFLAGS: 00010212
[  593.187504] RAX: ffff946193d8e800 RBX: ffff9461f46b8a18 RCX: 0000000000000011
[  593.194661] RDX: 0000000000000000 RSI: 000000000000009e RDI: ffff9461f46b8a18
[  593.201817] RBP: ffff9461f46b8b38 R08: ffff945b04b017c0 R09: 0000000000000000
[  593.208973] R10: 00000000000001a3 R11: 0000000000126c50 R12: ffffffffc08e6740
[  593.216131] R13: ffff94602669b800 R14: ffff94617451fe60 R15: ffffffff93ece780
[  593.223290] FS:  0000000000000000(0000) GS:ffff94621fa00000(0000) knlGS:0000000000000000
[  593.231405] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  593.237167] CR2: 0000000000000210 CR3: 0000000815d28000 CR4: 00000000000406f0
[  593.244323] Call Trace:
[  593.246788]  evict+0xd2/0x1a0
[  593.249770]  inode_switch_wbs_work_fn+0x395/0x590
[  593.254492]  ? add_timer+0x13f/0x1f0
[  593.258084]  process_one_work+0x1a7/0x360
[  593.262111]  worker_thread+0x30/0x390
[  593.265786]  ? create_worker+0x1a0/0x1a0
[  593.269723]  kthread+0x112/0x130
[  593.272965]  ? kthread_flush_work_fn+0x10/0x10
[  593.277427]  ret_from_fork+0x22/0x40
[  593.281015] Modules linked in: xt_nat xt_addrtype ipt_MASQUERADE iptable_nat nft_fib_ipv6 nft_masq nft_nat nft_fib_ipv4 nft_fib nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink dm_crypt loop ext4 mbcache jbd2 amd64_edac_mod edac_mce_amd kvm_amd ccp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr sp5100_tco k10temp fam15h_power i2c_piix4 acpi_cpufreq drm fuse ip_tables xfs libcrc32c dm_service_time sd_mod sg crc32c_intel e1000 ahci libahci libata dm_multipath dm_mirror dm_region_hash dm_log dm_mod
[  593.331370] Features: eBPF/cgroup
[  593.334704] CR2: 0000000000000210
```

The "phase" during which this happens contains a lot of loopdev shenanigans (basically each test creates an OS image, runs a test in it via QEMU or systemd-nspawn, and then does a cleanup; all steps involve calling `losetup`).
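For context, each iteration of that phase boils down to a loop-device lifecycle roughly like the sketch below. Sizes, paths, and the filesystem are placeholders, not the CI's actual values; the real images are partitioned (which is where `loop0p1` comes from) while this sketch formats the whole file, and the privileged steps are skipped when not run as root:

```shell
#!/bin/sh
# Rough shape of one test iteration: build an image, attach it to a loop
# device, mount it, run the payload, then tear everything down again.
# The panic in this bug fires during/after the teardown phase.
set -e

img=$(mktemp /tmp/osimage-XXXXXX.img)
dd if=/dev/zero of="$img" bs=1M count=64 status=none
# mkfs on a plain file needs no root (guarded in case e2fsprogs is absent)
command -v mkfs.ext4 >/dev/null && mkfs.ext4 -q -F "$img"

if [ "$(id -u)" -eq 0 ]; then
    # -P asks the kernel to scan the image for partitions
    # (that scan is what yields devices such as loop0p1)
    dev=$(losetup -f --show -P "$img")
    mnt=$(mktemp -d)
    mount "$dev" "$mnt"
    # ... run the test payload in the mounted image here ...
    umount "$mnt"
    losetup -d "$dev"
    rmdir "$mnt"
fi

rm -f "$img"
```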

Version-Release number of selected component (if applicable):
# uname -a
Linux n39.crusty.ci.centos.org 4.18.0-240.22.1.el8_3.x86_64 #1 SMP Thu Apr 8 19:01:30 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release 
CentOS Linux release 8.3.2011


How reproducible:
Unfortunately, there's no reliable reproducer; the issue may not appear for a couple of days and then happen multiple times in a short time span. I managed to harvest kdump's vmcore from one of the crashes; hopefully it'll be of some help, please see: https://mrc0mmand.fedorapeople.org/vfs-loopdev-panic.tar.gz

Since this worked flawlessly on RHEL 7, I'd consider this a regression, but I'm not sure if I should mark it as one, since it's a different release.

Comment 1 Frantisek Sumsal 2021-05-03 14:15:40 UTC
Symbolized stack trace from the aforementioned vmcore (for completeness):

```
crash> bt -Flps
PID: 552    TASK: ffff9461ffb497c0  CPU: 0   COMMAND: "kworker/0:3"
 #0 [ffffac5783f37b30] machine_kexec+446 at ffffffff92c5bf2e
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
    ffffac5783f37b38: 0000040faae1a200 ffff945a00000000 
    ffffac5783f37b48: 00000000b2003000 ffff945ab2003000 
    ffffac5783f37b58: 00000000b2002000 3e98320b00080800 
    ffffac5783f37b68: 0fcd040faae1a200 ffffac5783f37d18 
    ffffac5783f37b78: ffffac5783f37b90 ffffac5783f37d18 
    ffffac5783f37b88: __crash_kexec+109 
 #1 [ffffac5783f37b88] __crash_kexec+109 at ffffffff92d6051d
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/kernel/kexec_core.c: 956
    ffffac5783f37b90: noop_backing_dev_info [kmalloc-96]     
    ffffac5783f37ba0: [kmalloc-1k]     ext4_sops        
    ffffac5783f37bb0: [ext4_inode_cache(523:session-1.scope)] [ext4_inode_cache(523:session-1.scope)] 
    ffffac5783f37bc0: 0000000000126c50 00000000000001a3 
    ffffac5783f37bd0: 0000000000000000 [task_struct]    
    ffffac5783f37be0: [kmalloc-2k]     0000000000000011 
    ffffac5783f37bf0: 0000000000000000 000000000000009e 
    ffffac5783f37c00: [ext4_inode_cache(523:session-1.scope)] ffffffffffffffff 
    ffffac5783f37c10: ext4_evict_inode+126 0000000000000010 
    ffffac5783f37c20: 0000000000010212 ffffac5783f37dc0 
    ffffac5783f37c30: 0000000000000018 0fcd040faae1a200 
    ffffac5783f37c40: ffffac5783f37d18 0000000000000009 
    ffffac5783f37c50: crash_kexec+61   
 #2 [ffffac5783f37c50] crash_kexec+61 at ffffffff92d613fd
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/./include/linux/compiler.h: 219
    ffffac5783f37c58: ffffffff00000210 0000000000000046 
    ffffac5783f37c68: oops_end+189     
 #3 [ffffac5783f37c68] oops_end+189 at ffffffff92c22d4d
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/arch/x86/kernel/dumpstack.c: 334
    ffffac5783f37c70: ffffac5783f37d18 0000000000000210 
    ffffac5783f37c80: [task_struct]    no_context+446   
 #4 [ffffac5783f37c88] no_context+446 at ffffffff92c6ba9e
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/arch/x86/mm/fault.c: 773
    ffffac5783f37c90: 0000000000000046 0000000000000010 
    ffffac5783f37ca0: 0000000000029dc0 0fcd040faae1a200 
    ffffac5783f37cb0: 0000000000000000 ffffac5783f37d18 
    ffffac5783f37cc0: 0000000000000000 0000000000000210 
    ffffac5783f37cd0: 0000000000000000 0000000000000000 
    ffffac5783f37ce0: do_page_fault+50 
 #5 [ffffac5783f37ce0] do_page_fault+50 at ffffffff92c6c5c2
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/./arch/x86/include/asm/jump_label.h: 38
    ffffac5783f37ce8: 0000000000000000 0000000000000000 
    ffffac5783f37cf8: 0000000000000000 0000000000000000 
    ffffac5783f37d08: 0000000000000000 page_fault+30    
 #6 [ffffac5783f37d10] page_fault+30 at ffffffff9360122e
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/arch/x86/entry/entry_64.S: 1183
    [exception RIP: ext4_evict_inode+126]
    RIP: ffffffffc08a953e  RSP: ffffac5783f37dc0  RFLAGS: 00010212
    RAX: ffff946193d8e800  RBX: ffff9461f46b8a18  RCX: 0000000000000011
    RDX: 0000000000000000  RSI: 000000000000009e  RDI: ffff9461f46b8a18
    RBP: ffff9461f46b8b38   R8: ffff945b04b017c0   R9: 0000000000000000
    R10: 00000000000001a3  R11: 0000000000126c50  R12: ffffffffc08e6740
    R13: ffff94602669b800  R14: ffff94617451fe60  R15: ffffffff93ece780
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/fs/ext4/ext4.h: 1531
    ffffac5783f37d18: noop_backing_dev_info [kmalloc-96]     
    ffffac5783f37d28: [kmalloc-1k]     ext4_sops        
    ffffac5783f37d38: [ext4_inode_cache(523:session-1.scope)] [ext4_inode_cache(523:session-1.scope)] 
    ffffac5783f37d48: 0000000000126c50 00000000000001a3 
    ffffac5783f37d58: 0000000000000000 [task_struct]    
    ffffac5783f37d68: [kmalloc-2k]     0000000000000011 
    ffffac5783f37d78: 0000000000000000 000000000000009e 
    ffffac5783f37d88: [ext4_inode_cache(523:session-1.scope)] ffffffffffffffff 
    ffffac5783f37d98: ext4_evict_inode+126 0000000000000010 
    ffffac5783f37da8: 0000000000010212 ffffac5783f37dc0 
    ffffac5783f37db8: 0000000000000018 0000000000000000 
    ffffac5783f37dc8: 0fcd040faae1a200 [ext4_inode_cache(523:session-1.scope)] 
    ffffac5783f37dd8: [ext4_inode_cache(523:session-1.scope)] ext4_sops        
    ffffac5783f37de8: evict+210        
 #7 [ffffac5783f37de8] evict+210 at ffffffff92ef8422
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/fs/inode.c: 566
    ffffac5783f37df0: [ext4_inode_cache(523:session-1.scope)] [ext4_inode_cache(523:session-1.scope)] 
    ffffac5783f37e00: [kmalloc-1k]     inode_switch_wbs_work_fn+917 
 #8 [ffffac5783f37e08] inode_switch_wbs_work_fn+917 at ffffffff92f0bc85
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/fs/fs-writeback.c: 471
    ffffac5783f37e10: [kmalloc-1k]     [ext4_inode_cache(523:session-1.scope)] 
    ffffac5783f37e20: [kmalloc-1k]     0000000100047901 
    ffffac5783f37e30: 0000000000000000 0000000000000000 
    ffffac5783f37e40: 0000000000000006 add_timer+319    
    ffffac5783f37e50: 0000000000000006 0fcd040faae1a200 
    ffffac5783f37e60: 0fcd040faae1a200 [kmalloc-96]     
    ffffac5783f37e70: ffff94621fa296c0 ffff94621fa33000 
    ffffac5783f37e80: 0000000000000000 [kmalloc-192]    
    ffffac5783f37e90: [kmalloc-96]     process_one_work+423 
 #9 [ffffac5783f37e98] process_one_work+423 at ffffffff92cd34b7
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/./arch/x86/include/asm/jump_label.h: 38
    ffffac5783f37ea0: 0000000a1fa32400 ffff94621fa296c0 
    ffffac5783f37eb0: ffff94621fa296c0 [kmalloc-192]    
    ffffac5783f37ec0: ffff94621fa296e0 [kmalloc-192]    
    ffffac5783f37ed0: [kmalloc-192]    worker_thread+48 
#10 [ffffac5783f37ed8] worker_thread+48 at ffffffff92cd3b80
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/./include/linux/compiler.h: 193
    ffffac5783f37ee0: [kmalloc-96]     [kmalloc-64]     
    ffffac5783f37ef0: ffffac5783b17dc8 [task_struct]    
    ffffac5783f37f00: [kmalloc-192]    worker_thread    
    ffffac5783f37f10: kthread+274      
#11 [ffffac5783f37f10] kthread+274 at ffffffff92cd9542
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/kernel/kthread.c: 280
    ffffac5783f37f18: [kmalloc-96]     kthread          
    ffffac5783f37f28: 0000000000000000 [kmalloc-64]     
    ffffac5783f37f38: 0000000000000000 0000000000000000 
    ffffac5783f37f48: 0000000000000000 ret_from_fork+34 
#12 [ffffac5783f37f50] ret_from_fork+34 at ffffffff93600242
    /usr/src/debug/kernel-4.18.0-240.22.1.el8_3/linux-4.18.0-240.22.1.el8_3.x86_64/arch/x86/entry/entry_64.S: 354

```

Comment 2 Eric Sandeen 2021-05-03 17:43:53 UTC
I put the vmcore on galvatron

# retrace-server-interact 568186200 crash

Comment 3 Ian Kent 2021-05-04 00:54:02 UTC
(In reply to Eric Sandeen from comment #2)
> I put the vmcore on galvatron
> 
> # retrace-server-interact 568186200 crash

This reads like there's an extra dput() but if we get this message
then those dentries shouldn't be in use. IIRC the place where this
message occurs is called during umount to dispose of the mount
dentry tree.

Once I get some time I'll see if I can track down the loopback
device related code and see if I can see anything obvious.

Without a reliable reproducer we are missing the actions leading
up to the problem occurring so we don't really have any leads to
point us to where to look ... unfortunately.

Comment 4 Ian Kent 2021-05-04 01:32:22 UTC
(In reply to Frantisek Sumsal from comment #0)
> Description of problem:
> After moving one of our systemd upstream CI jobs from CentOS 7 to CentOS 8,
> I noticed that sometimes the machines suddenly die without any indication of
> what went wrong. After further debugging, the issue appears to be a kernel
> panic during loopdev cleanup:

As we don't have a reproducer is it possible for us to look at the test(s)?

Even that is not really enough because, if this message occurs, the damage
has most likely been done at some (unknown) point before the back trace
triggers.

Comment 7 Frantisek Sumsal 2021-05-04 10:15:11 UTC
(In reply to Ian Kent from comment #4)
> (In reply to Frantisek Sumsal from comment #0)
> > Description of problem:
> > After moving one of our systemd upstream CI jobs from CentOS 7 to CentOS 8,
> > I noticed that sometimes the machines suddenly die without any indication of
> > what went wrong. After further debugging, the issue appears to be a kernel
> > panic during loopdev cleanup:
> 
> As we don't have a reproducer is it possible for us to look at the test(s)?
> 
> Even that is not really enough because, if this message occurs, the damage
> has most likely been done at some (unknown) point before the back trace
> triggers.

I agree that looking through the tests might not be enough. It's an entanglement of various
bash shenanigans, where the (sort of) main pinpoints are:

create an image: https://github.com/systemd/systemd/blob/d7ffd4f334ea8b0a7621c754e91fe146dd597ba3/test/test-functions#L1000
mount it: https://github.com/systemd/systemd/blob/d7ffd4f334ea8b0a7621c754e91fe146dd597ba3/test/test-functions#L1035
and the respective cleanup functions:
  * https://github.com/systemd/systemd/blob/d7ffd4f334ea8b0a7621c754e91fe146dd597ba3/test/test-functions#L990
  * https://github.com/systemd/systemd/blob/d7ffd4f334ea8b0a7621c754e91fe146dd597ba3/test/test-functions#L1051

Anyway, is there anything I could do to trace the culprit to make all this easier? I have a test/debug PR which I used
to get the vmcore in the first place, so if there's anything I can run along with the tests to gather more data, it
shouldn't be a problem.

Comment 8 Ian Kent 2021-05-04 10:47:47 UTC
(In reply to Frantisek Sumsal from comment #7)
> (In reply to Ian Kent from comment #4)
> > (In reply to Frantisek Sumsal from comment #0)
> > > Description of problem:
> > > After moving one of our systemd upstream CI jobs from CentOS 7 to CentOS 8,
> > > I noticed that sometimes the machines suddenly die without any indication of
> > > what went wrong. After further debugging, the issue appears to be a kernel
> > > panic during loopdev cleanup:
> > 
> > As we don't have a reproducer is it possible for us to look at the test(s)?
> > 
> > Even that is not really enough because, if this message occurs, the damage
> > has most likely been done at some (unknown) point before the back trace
> > triggers.
> 
> I agree that looking through the tests might not be enough. It's an
> entanglement of various
> bash shenanigans, where the (sort of) main pinpoints are:

It might help though.

> 
> create an image:
> https://github.com/systemd/systemd/blob/
> d7ffd4f334ea8b0a7621c754e91fe146dd597ba3/test/test-functions#L1000
> mount it:
> https://github.com/systemd/systemd/blob/
> d7ffd4f334ea8b0a7621c754e91fe146dd597ba3/test/test-functions#L1035
> and the respective cleanup functions:
>   *
> https://github.com/systemd/systemd/blob/
> d7ffd4f334ea8b0a7621c754e91fe146dd597ba3/test/test-functions#L990
>   *
> https://github.com/systemd/systemd/blob/
> d7ffd4f334ea8b0a7621c754e91fe146dd597ba3/test/test-functions#L1051
> 
> Anyway, is there anything I could do to trace the culprit to make all this
> easier? I have a test/debug PR which I used
> to get the vmcore in the first place, so if there's anything I can run along
> with the tests to gather more data, it
> shouldn't be a problem.

Unfortunately, based on past experience with this type of problem, there's
not much that can be done.

It's a matter of going through changes between kernel versions manually,
hoping your suspicions about where it's happening are right, and hoping
you don't miss it if it's there.

About all we can do is narrow down the search by using different kernel
revisions.

If we could setup a test system and run unaltered upstream kernels to
see if there's a regression, when it happened, and if there has been
a resolution, that would make things much more straight forward.

If there's no regression then it has to be something we've done which,
while tedious, also narrows the search.

Comment 9 Ian Kent 2021-05-04 10:50:48 UTC
(In reply to Ian Kent from comment #8)
> 
> About all we can do is narrow down the search by using different kernel
> revisions.
> 
> If we could setup a test system and run unaltered upstream kernels to
> see if there's a regression, when it happened, and if there has been
> a resolution, that would make things much more straight forward.

If we can use upstream kernels, built from a git checkout, and we do
find a regression then we can do a git bisect, also very tedious, but
gets you the git commit that broke it.
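The bisect mechanics Ian describes can be sketched on a toy repository. Everything below (the throwaway repo, the `broken` marker file, the run script) is illustrative; on a kernel tree the run script would instead build and boot the kernel and exercise the reproducer:

```shell
#!/bin/sh
# git bisect on a throwaway repo: ten commits, the "regression" (a marker
# file) appears in commit 7, and `git bisect run` finds it automatically.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.com
git config user.name bisect

for i in 1 2 3 4 5 6 7 8 9 10; do
    if [ "$i" -ge 7 ]; then touch broken; fi   # the bug is "introduced" here
    echo "$i" > file
    git add -A
    git commit -qm "commit $i"
    git tag "c$i"
done

# Start with a known-bad and a known-good revision, then let bisect drive;
# the run script exits 0 for a good revision, non-zero when the bug shows.
git bisect start c10 c1 >/dev/null
git bisect run sh -c '! test -e broken' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
if [ "$first_bad" = "$(git rev-parse c7)" ]; then
    echo "first bad commit is c7, as expected"
fi
git bisect reset >/dev/null
```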

Comment 10 Ian Kent 2021-05-04 10:56:14 UTC
(In reply to Ian Kent from comment #9)
> (In reply to Ian Kent from comment #8)
> > 
> > About all we can do is narrow down the search by using different kernel
> > revisions.
> > 
> > If we could setup a test system and run unaltered upstream kernels to
> > see if there's a regression, when it happened, and if there has been
> > a resolution, that would make things much more straight forward.
> 
> If we can use upstream kernels, built from a git checkout, and we do
> find a regression then we can do a git bisect, also very tedious, but
> gets you the git commit that broke it.

It might be a little bit harder but we could use the RHEL source from
git for a bisect too ... I haven't done that before but it should work.
The principle is the same.

Comment 11 Ian Kent 2021-05-04 11:08:00 UTC
(In reply to Ian Kent from comment #10)
> (In reply to Ian Kent from comment #9)
> > (In reply to Ian Kent from comment #8)
> > > 
> > > About all we can do is narrow down the search by using different kernel
> > > revisions.
> > > 
> > > If we could setup a test system and run unaltered upstream kernels to
> > > see if there's a regression, when it happened, and if there has been
> > > a resolution, that would make things much more straight forward.
> > 
> > If we can use upstream kernels, built from a git checkout, and we do
> > find a regression then we can do a git bisect, also very tedious, but
> > gets you the git commit that broke it.
> 
> It might be a little bit harder but we could use the RHEL source from
> git for a bisect too ... I haven't done that before but it should work.
> The principle is the same.

Obvious first thing to do is check if it happens on 8.0 GA, then 8.1 etc.

Comment 12 Ian Kent 2021-05-04 11:11:28 UTC
(In reply to Ian Kent from comment #11)
> (In reply to Ian Kent from comment #10)
> > (In reply to Ian Kent from comment #9)
> > > (In reply to Ian Kent from comment #8)
> > > > 
> > > > About all we can do is narrow down the search by using different kernel
> > > > revisions.
> > > > 
> > > > If we could setup a test system and run unaltered upstream kernels to
> > > > see if there's a regression, when it happened, and if there has been
> > > > a resolution, that would make things much more straight forward.
> > > 
> > > If we can use upstream kernels, built from a git checkout, and we do
> > > find a regression then we can do a git bisect, also very tedious, but
> > > gets you the git commit that broke it.
> > 
> > It might be a little bit harder but we could use the RHEL source from
> > git for a bisect too ... I haven't done that before but it should work.
> > The principle is the same.
> 
> Obvious first thing to do is check if it happens on 8.0 GA, then 8.1 etc.

We should be able to use existing kernel packages for these, but reproducing
it could take a while. There may be kernel enhancements that have been added
since, but for the specific functionality needed here we can probably get
away with the existing packages.

Comment 13 Frantisek Sumsal 2021-05-04 11:20:38 UTC
Thanks for all the hints, Ian! Right now I'm trying to minimize the reproducer to make it easier to debug further with other kernel packages. I'll post an update once I have some fresh news.

Comment 14 Frantisek Sumsal 2021-05-06 21:18:18 UTC
After several tries I managed to reproduce the issue with both kernel-4.18.0-80.el8 (RHEL 8.0 GA) and 4.18.0-80.11.2.el8_0 (latest RHEL 8.0, just to be sure). It indeed took a while (~2 hours in the first case and ~30 hours in the second one). Unfortunately, I can't provide vmcores for these two occurrences, since the machines won't come back after the panic with these kernels (and I don't have any other means to access them).


kernel-4.18.0-80.el8:
```
[    0.000000] Linux version 4.18.0-80.el8.x86_64 (mockbuild.centos.org) (gcc version 8.2.1 20180905 (Red Hat 8.2.1-3) (GCC)) #1 SMP Tue Jun 4 09:19:46 UTC 2019
[    0.000000] Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-80.el8.x86_64 root=/dev/mapper/vg_n4-root ro crashkernel=auto resume=/dev/mapper/vg_n4-swap rd.lvm.lv=vg_n4/root rd.lvm.lv=vg_n4/swap biosdevname=0 net.ifnames=0 console=ttyS0,115200n8
[    0.000000] random: get_random_u32 called from bsp_init_amd+0x20b/0x2b0 with crng_init=0
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009cc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bef62fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bef63000-0x00000000bf162fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf163000-0x00000000bf5befff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf5bf000-0x00000000bf6befff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf6bf000-0x00000000bf7befff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf7bf000-0x00000000bf7fefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bf7ff000-0x00000000bf7fffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf800000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feb00000-0x00000000feb00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000083fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.7 present.
[    0.000000] DMI: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
...
[  473.261784] VFS: Busy inodes after unmount of dm-5. Self-destruct in 5 seconds.  Have a nice day...
[  473.262119] BUG: unable to handle kernel NULL pointer dereference at 0000000000000210
```

4.18.0-80.11.2.el8_0:
```
[    0.000000] Linux version 4.18.0-80.11.2.el8_0.x86_64 (mockbuild.centos.org) (gcc version 8.2.1 20180905 (Red Hat 8.2.1-3) (GCC)) #1 SMP Tue Sep 24 11:32:19 UTC 2019
[    0.000000] Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-80.11.2.el8_0.x86_64 root=/dev/mapper/vg_n29-root ro crashkernel=auto resume=/dev/mapper/vg_n29-swap rd.lvm.lv=vg_n29/root rd.lvm.lv=vg_n29/swap biosdevname=0 net.ifnames=0 console=ttyS0,115200n8
[    0.000000] random: get_random_u32 called from bsp_init_amd+0x20b/0x2b0 with crng_init=0
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009cc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bef62fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bef63000-0x00000000bf162fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf163000-0x00000000bf5befff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf5bf000-0x00000000bf6befff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf6bf000-0x00000000bf7befff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf7bf000-0x00000000bf7fefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bf7ff000-0x00000000bf7fffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf800000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feb00000-0x00000000feb00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000083fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.7 present.
[    0.000000] DMI: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
...
[  580.498205] VFS: Busy inodes after unmount of loop0p1. Self-destruct in 5 seconds.  Have a nice day...
[  580.500162] BUG: unable to handle kernel NULL pointer dereference at 0000000000000210
```

Note that in the first case the failing device is "dm-5", but that's just the second loop partition (loopXp2) encrypted with LUKS in one of the tests.

Comment 15 Ian Kent 2021-05-07 02:08:21 UTC
(In reply to Frantisek Sumsal from comment #14)
> After several tries I managed to reproduce the issue with both
> kernel-4.18.0-80.el8 (RHEL 8.0 GA) and 4.18.0-80.11.2.el8_0 (latest RHEL
> 8.0, just to be sure). It indeed took a while (~2 hours in the first case
> and ~30 hours in the second one). Unfortunately, I can't provide vmcores for
> these two occurrences, since the machines won't come back after the panic
> with these kernels (and I don't have any other means to access them).
> 

Right, that's probably the worst outcome we could have because it implies
the problem was introduced between RHEL-7 and RHEL-8.

Continuing to narrow down the kernel versions where it's broken, the
next step is to try upstream 4.18.0 and see if it's broken in that
kernel.

You will want a decent number of CPUs on the test machine so building
the kernel doesn't take ages; 16 isn't too bad.

I use a somewhat manual approach to building and testing upstream kernels
so I know things are put in the right place, but if you're familiar with
your own workflow you're welcome to use what works for you.

First clone the Linux tree:
cd <some working directory>
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git upstream.git
cd upstream.git
git checkout v4.18
cp /boot/config-`uname -r` .config
make -j16
<hit return on any new or changed default prompts to get an updated config>

kernel builds ...

Become root and:
make modules_install

Note the version on the last line of the modules_install output.

cp arch/x86_64/boot/bzImage /boot/vmlinuz-<version>
cp System.map /boot/System.map-<version>

cd /boot
dracut initramfs-<version>.img

grubby --copy-default --add-kernel=/boot/vmlinuz-<version> --initrd=/boot/initramfs-<version>.img --title="Upstream <version>"

And here's the catch.

The kernel we added to the boot menu above isn't the default boot kernel.

So either you need access to the boot menu or you need to set it as the default boot
kernel. The risk is that if the kernel doesn't boot (in this case it should), you've lost
your machine unless you can get to the boot menu.

Ideally, if you're working with a VM, you would take a snapshot and revert it if the machine
doesn't boot.

Anyway to set this kernel as the default boot kernel:
grubby --set-default=/boot/vmlinuz-<version>
reboot and cross fingers.

Umm ... I often use quotes around those grubby parameters so keep that in mind if
you see a problem.

Also, you can look at list of boot kernels with:
grubby --info=ALL

Comment 16 Frantisek Sumsal 2021-05-07 18:45:38 UTC
Thanks a lot for the detailed guide; it's been a while since I built a kernel from scratch,
so this definitely helped a lot. After some tinkering I was able to build & run the mainline
4.18 kernel:

```
dnf -y builddep kernel
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git upstream-kernel
pushd upstream-kernel
git checkout v4.18
make defconfig
cp "/boot/config-$(uname -r)" .config-local
# Get the SB cert (CONFIG_MODULE_SIG_KEY=certs/rhel.pem)
dnf download --source kernel
rpm -i kernel*.src.rpm
cp -v /root/rpmbuild/SOURCES/centos.pem certs/rhel.pem
sed -i 's/CONFIG_IOSCHED_BFQ=y/CONFIG_IOSCHED_BFQ=n/' .config-local
./scripts/kconfig/merge_config.sh .config .config-local
make -j16
make modules_install
cp -v arch/x86_64/boot/bzImage /boot/vmlinuz-4.18.0
chmod +x /boot/vmlinuz-4.18.0
cp -v System.map /boot/System.map-4.18.0
dracut /boot/initramfs-4.18.0.img
grubby --make-default --copy-default --add-kernel=/boot/vmlinuz-4.18.0 --initrd=/boot/initramfs-4.18.0.img --title="Upstream 4.18.0"
popd
```

I had to explicitly disable the BFQ scheduler, since it uncovered a potential bug in upstream
systemd, but that's something I'll have to bisect later.

For now I unleashed the "reproducer" on the mainline kernel, so I'll get back in touch once
there's something interesting to report.

Comment 17 Frantisek Sumsal 2021-05-10 19:02:17 UTC
I can confirm the issue is reproducible even with the ~vanilla 4.18 kernel:

```
[    0.000000] Linux version 4.18.0 (root.ci.centos.org) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Sun May 9 17:13:21 BST 2021
[    0.000000] Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0 root=/dev/mapper/vg_n12-root ro crashkernel=auto resume=/dev/mapper/vg_n12-swap rd.lvm.lv=vg_n12/root rd.lvm.lv=vg_n12/swap biosdevname=0 net.ifnames=0 console=ttyS0,115200n8 user_namespace.enable=1 systemd.condition-needs-update=1 systemd.clock_usec=1620579552233381 debug
[    0.000000] random: get_random_u32 called from bsp_init_amd+0x20b/0x2b0 with crng_init=0
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009cc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bef62fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bef63000-0x00000000bf162fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf163000-0x00000000bf5befff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf5bf000-0x00000000bf6befff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf6bf000-0x00000000bf7befff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf7bf000-0x00000000bf7fefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bf7ff000-0x00000000bf7fffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf800000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feb00000-0x00000000feb00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000083fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.7 present.
[    0.000000] DMI: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
...
[ 4352.454153]  loop2: p1 p2
[ 4352.546624] EXT4-fs (loop2p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4353.417607] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4353.652129]  loop2: p1 p2
[ 4353.738828] EXT4-fs (loop2p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4354.449487] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4354.811482] systemd-journald[796]: Sent WATCHDOG=1 notification.
[ 4355.884760] EXT4-fs (loop3p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4356.872828] VFS: Busy inodes after unmount of loop3p1. Self-destruct in 5 seconds.  Have a nice day...
[ 4356.873755] BUG: unable to handle kernel NULL pointer dereference at 00000000000001f0
```

```
[    0.000000] Linux version 4.18.0 (root.ci.centos.org) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Sun May 9 23:46:07 BST 2021
[    0.000000] Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0 root=/dev/mapper/vg_n18-root ro crashkernel=auto resume=/dev/mapper/vg_n18-swap rd.lvm.lv=vg_n18/root rd.lvm.lv=vg_n18/swap biosdevname=0 net.ifnames=0 console=ttyS0,115200n8 user_namespace.enable=1 systemd.condition-needs-update=1 systemd.clock_usec=1620603112953426 debug
[    0.000000] random: get_random_u32 called from bsp_init_amd+0x20b/0x2b0 with crng_init=0
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009cc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bef62fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bef63000-0x00000000bf162fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf163000-0x00000000bf5befff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf5bf000-0x00000000bf6befff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bf6bf000-0x00000000bf7befff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bf7bf000-0x00000000bf7fefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bf7ff000-0x00000000bf7fffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bf800000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feb00000-0x00000000feb00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000083fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.7 present.
[    0.000000] DMI: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
...
[ 4155.297055]  loop3: p1 p2
[ 4155.371571] EXT4-fs (loop3p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4155.624045]  loop3: p1 p2
[ 4155.698384] EXT4-fs (loop3p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4171.861834] EXT4-fs (loop2p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4172.101375] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4172.328740] EXT4-fs (loop2p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4172.627616] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
[ 4174.914588] systemd-journald[798]: Sent WATCHDOG=1 notification.
[ 4175.362209] VFS: Busy inodes after unmount of loop0p1. Self-destruct in 5 seconds.  Have a nice day...
[ 4175.364056] BUG: unable to handle kernel NULL pointer dereference at 00000000000001f0
```

I managed to reproduce it twice so far (after ~36 hours and ~6 hours). Right now it's running
for the third time, this time to capture the vmcore and full stack trace (I forgot to set
`kernel.panic` to > 0 in both previous cases, so I lost the machines along with the respective
cores, *sigh*).
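For future runs, the `kernel.panic` setting mentioned above can be made persistent so the machine reboots itself after a panic instead of hanging. A minimal sketch; the drop-in path and the 10-second delay are illustrative choices, not from the bug:

```shell
# Assumed drop-in path: reboot 10 s after a panic so the box comes back on
# its own, and escalate oopses (like the NULL deref above) to a full panic
# so kdump gets a chance to capture the vmcore.
cat <<'EOF' | sudo tee /etc/sysctl.d/90-panic-reboot.conf >/dev/null
kernel.panic = 10
kernel.panic_on_oops = 1
EOF
sudo sysctl --system >/dev/null   # apply without rebooting
```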

Comment 18 Ian Kent 2021-05-11 02:06:12 UTC
(In reply to Frantisek Sumsal from comment #17)
> I can confirm the issue is reproducible even with the ~vanilla 4.18 kernel:

So it sounds like an upstream regression.
At least we know that.

snip ...

> [ 4174.914588] systemd-journald[798]: Sent WATCHDOG=1 notification.
> [ 4175.362209] VFS: Busy inodes after unmount of loop0p1. Self-destruct in 5
> seconds.  Have a nice day...
> [ 4175.364056] BUG: unable to handle kernel NULL pointer dereference at
> 00000000000001f0
> ```
> 
> I managed to reproduce it twice so far (after ~36 hours and ~6 hours). Right
> now it's running for the third time,
> just to get the vmcore and full stack trace, just to be completely sure (I
> forgot to set `kernel.panic` to > 0
> in both cases, so I lost the machines along with the respective cores,
> *sigh*).

A vmcore may be useful to confirm it's a reference count bug but
it will have happened at some earlier time so we still need to do
a bisect. Which means finding adjacent kernel versions where it
works and where it doesn't for that process.

There's also the question of knowing which processes are active and
might be interacting poorly with one another.

I'd start with an upstream kernel about half way between the RHEL-7
base version and 4.18 and go back further if it fails or forward if
it doesn't just like what the git bisect does.
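The halving strategy Ian describes is exactly what `git bisect` automates. A toy, self-contained illustration of the mechanics, where throwaway tags `good`/`bad` stand in for the RHEL-7 base version and v4.18 (in the real run you would build, boot, and exercise each revision git checks out):

```shell
# Build a throwaway repo with five commits standing in for kernel releases;
# the "state" file records which "release" is checked out.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5; do echo "$i" > state; git add state; git commit -qm "rev $i"; done
git tag good HEAD~4   # stands in for the RHEL-7 base version (works)
git tag bad HEAD      # stands in for v4.18 (crashes)

git bisect start bad good                  # git checks out the midpoint (rev 3)
git bisect good >/dev/null                 # pretend rev 3 survives the reproducer
first_bad=$(git bisect bad | head -n 1)    # pretend rev 4 crashes
echo "$first_bad"                          # "<sha> is the first bad commit"
git bisect reset >/dev/null
```

Each `git bisect good`/`git bisect bad` halves the remaining range, so even the ~3000 commits between two kernel releases take only a dozen or so build-and-boot cycles.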

Comment 19 Frantisek Sumsal 2021-05-12 11:00:58 UTC
(In reply to Ian Kent from comment #18) 
> I'd start with an upstream kernel about half way between the RHEL-7
> base version and 4.18 and go back further if it fails or forward if
> it doesn't just like what the git bisect does.

Will do. The first issue I've encountered right away is that the older
kernels are not compatible with gcc-8 (e.g. 3.10 compiles only with
gcc-{3,4} [0], 4.0 with gcc-{3,4,5} [1], etc.).

To work around this I spent a couple of hours hunting down and rewriting
patches for gcc-4 so it compiles with gcc-8 [3]; hopefully it'll build
this time (it did at least locally). I'm not sure if there's an easier
way I'm missing, but I suspect the "fix" is included in 4.2 with
the `compiler-gcc*.h` merge commit [4].


[0]
https://github.com/torvalds/linux/blob/8bb495e3f02401ee6f76d1b1d77f3ac9f079e376/include/linux/compiler-gcc3.h
https://github.com/torvalds/linux/blob/8bb495e3f02401ee6f76d1b1d77f3ac9f079e376/include/linux/compiler-gcc4.h 

[1]
https://github.com/torvalds/linux/blob/39a8804455fb23f09157341d3ba7db6d7ae6ee76/include/linux/compiler-gcc3.h
https://github.com/torvalds/linux/blob/39a8804455fb23f09157341d3ba7db6d7ae6ee76/include/linux/compiler-gcc4.h
https://github.com/torvalds/linux/blob/39a8804455fb23f09157341d3ba7db6d7ae6ee76/include/linux/compiler-gcc5.h

[3] https://copr.fedorainfracloud.org/coprs/mrc0mmand/centos8-kernel-debug/build/2182361/

[4] https://github.com/torvalds/linux/commit/cb984d101b30eb7478d32df56a0023e4603cba7f

Comment 20 Ian Kent 2021-05-12 13:01:27 UTC
(In reply to Frantisek Sumsal from comment #19)
> (In reply to Ian Kent from comment #18) 
> > I'd start with an upstream kernel about half way between the RHEL-7
> > base version and 4.18 and go back further if it fails or forward if
> > it doesn't just like what the git bisect does.
> 
> Will do. The first issue I've encountered right away is that the older
> kernels are not compatible with gcc-8 (e.g. 3.10 compiles only with
> gcc-{3,4} [0], 4.0 with gcc-{3,4,5} [1], etc.).
> 
> To work around this I spent couple of hours hunting down and rewriting
> patches for gcc-4 so it compiles with gcc-8 [3], hopefully it'll build
> this time (it did at least locally). I'm not sure if there's an easier
> way I'm missing, but I suspect the "fix" is included in 4.2 with
> the `compiler-gcc*.h` merge commit [4].

Ouch!

Since it appears to be an upstream regression we should be able to
reproduce it on RHEL-7. What we need now is where it happened.

It will get tricky if we cross that kernel version during the testing,
but hopefully we can identify working and non-working kernel versions
on the same RHEL release.

Might save a bit of time trying this unless you've already sorted it
out.

Umm ... unless it's a compiler problem ...

Comment 21 Frantisek Sumsal 2021-05-20 14:35:03 UTC
Apologies for the radio silence, I've been dealing with some RHEL stuff and
also I can't seem to boot any kernel <4.1 even though it compiles properly.

When trying to boot the kernel either in a VM or on bare metal, the screen
is just blank, with no output. After adding `earlyprintk=serial,keep` it at
least shows:

```
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
```

but that's all. After failing to get anything else from the kernel using various
other debug/tracing-related kernel parameters, I tried to dig directly into
the memory via QEMU:

```
# grep __log_buf /boot/System.map-3.10 
ffffffff81d628dc b __log_buf
## (in QEMU console) memsave 0xffffffff81d628dc 1024 mem.dump

# sed -r 's/\x0+/\n/g' /root/systemd/test/TEST-01-BASIC/mem.dump

D
1
�CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0 #2
T
D
� ffffffff81b3f120 ffffffff81a01f08 ffffffff815bcd0a ffffffff81a01fe8
T
D
� ffffffff81b3f176 000000000000000a 000000000000000d 00000000000003fd
T
D
� 0000000001a11000 ffffffff81cde280 ffff880000014480 ffffffff81a01fb0



�Call Trace:
D
3
� [<ffffffff81b3f120>] ? early_idt_handler+0x30/0xc0
<
*
� [<ffffffff815bcd0a>] dump_stack+0x19/0x1b
D
1
� [<ffffffff81b3f176>] early_idt_handler+0x86/0xc0
@
/
� [<ffffffff81b3f231>] ? copy_bootdata+0x13/0xa2
H
6
� [<ffffffff81b3f54e>] ? x86_64_start_kernel+0xb5/0x100
(

�RIP 0xffff880000014480
```

I'm really not sure if I'm missing something obvious or there's another issue
somewhere else...

Comment 22 Ian Kent 2021-05-21 11:08:27 UTC
(In reply to Frantisek Sumsal from comment #21)
> Apologies for the radio silence, I've been dealing with some RHEL stuff and
> also I can't seem to boot any kernel <4.1 even though it compiles properly.

We keep getting the worst outcome we can get ...

Could be a code generation problem (gcc).
If it's RHEL-8, it could be that kernel features it needs are no longer
present in the earlier kernel.

All that's really needed is a kernel revision where it works and one where
it doesn't.

If you're using RHEL-8 you could try RHEL-7, or perhaps just RHEL-7 gcc and
friends.

Comment 23 Frantisek Sumsal 2021-05-26 16:41:22 UTC
After some tinkering I managed to get both 3.10 and 4.18 (upstream)
up and running on CentOS 7, so now I'll run the reproducer against
the 4.18 build to see whether the issue still occurs there. I'll get
back once I have something interesting to report.

Comment 24 Frantisek Sumsal 2021-05-27 08:49:33 UTC
So, the good thing is I can reproduce the issue on CentOS 7 with upstream kernel
(4.18):

```
03:44:50 [ 5139.724637] EXT4-fs (loop2p1): mounted filesystem with ordered data mode. Opts: (null)
03:44:51 [ 5139.956414] EXT4-fs (loop3p1): mounted filesystem with ordered data mode. Opts: (null)
03:44:51 [ 5140.256999] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
03:44:51 [ 5140.789989] VFS: Busy inodes after unmount of loop0p1. Self-destruct in 5 seconds.  Have a nice day...
03:44:51 [ 5140.790595] BUG: unable to handle kernel NULL pointer dereference at 00000000000001f0
```

Hopefully, now the bisect won't be an issue.

Comment 25 Frantisek Sumsal 2021-06-21 10:25:00 UTC
My previous claim that I managed to compile & run upstream 3.10 was actually a lie, I somehow
got it mixed up with other kernels. Even on CentOS 7, upstream 3.10 won't boot, failing in
a similar way as before.

However, I got once again sidetracked by RHEL stuff, and the CentOS CI machine pool was upgraded
to CentOS 8.4 GA in the meantime. Just out of curiosity I started a new reproducer job there last
week, and to my surprise it's still running - almost 6 days and dozens of machines later it hasn't
yet encountered a single kernel panic (now that I said it I definitely jinxed it).

So, I guess we can close this one as "CURRENTRELEASE". In case it resurfaces again (I hope it
won't), I'll reopen it with possibly some more information.

Thanks for all the help!

Comment 26 Ian Kent 2021-06-21 10:29:24 UTC
(In reply to Frantisek Sumsal from comment #25)
> My previous claim that I managed to compile & run upstream 3.10 was actually
> a lie, I somehow
> got it mixed up with other kernels. Even on CentOS 7 upstream 3.10 won't
> boot with a similar
> issue as previously.
> 
> However, I got once again sidetracked by RHEL stuff, and the CentOS CI
> machine pool was upgraded
> to CentOS 8.4 GA in the meantime. Just out of curiosity I started a new
> reproducer job there last
> week, and to my surprise it's still running - almost 6 days and dozens of
> machines later it hasn't
> encountered yet a single kernel panic (now that I said it I definitely
> jinxed it).
> 
> So, I guess we can close this one as "CURRENTRELEASE". In case it resurfaces
> again (I hope it
> won't), I'll reopen it with possibly some more information.

Sounds good, hopefully all will remain ok.

Ian

Comment 27 Frantisek Sumsal 2021-07-30 15:15:25 UTC
*sigh*, unfortunately, I was just extremely lucky.

In the past couple of weeks I noticed several randomly aborted jobs, so I configured kdump on the CI machines and implemented a small hook to grab & store anything from /var/crash should the machine die. And today it indeed grabbed something:

```
[  431.306308] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
[  431.658379] EXT4-fs (loop1p1): mounted filesystem with ordered data mode. Opts: (null)
[  433.090165] EXT4-fs (loop1p1): mounted filesystem with ordered data mode. Opts: (null)
[  433.728277] proc: Bad value for 'hidepid'
[  438.768729] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null)
[  438.996278]  loop1: p1 p2
[  439.088861] EXT4-fs (loop1p1): mounted filesystem with ordered data mode. Opts: (null)
[  439.354820]  loop1: p1 p2
[  439.447872] EXT4-fs (loop1p1): mounted filesystem with ordered data mode. Opts: (null)
[  439.649286] VFS: Busy inodes after unmount of loop0p1. Self-destruct in 5 seconds.  Have a nice day...
[  439.650258] BUG: unable to handle kernel NULL pointer dereference at 0000000000000210
[  439.666478] PGD 0 P4D 0 
[  439.669026] Oops: 0000 [#1] SMP NOPTI
[  439.672706] CPU: 4 PID: 1501 Comm: kworker/4:3 Kdump: loaded Tainted: G                 ---------r-  - 4.18.0-305.10.2.el8_4.x86_64 #1
[  439.684828] Hardware name: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
[  439.694693] Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
[  439.700756] RIP: 0010:ext4_evict_inode+0x7e/0x670 [ext4]
[  439.706087] Code: 02 00 48 8b 44 24 08 65 48 33 04 25 28 00 00 00 0f 85 fe 05 00 00 48 83 c4 10 5b 5d 41 5c c3 48 8b 43 28 48 8b 90 00 04 00 00 <48> 8b ba 10 02 00 00 48 85 ff 74 b5 0f b7 03 89 c1 66 81 e1 00 f0
[  439.724908] RSP: 0018:ffffb5b5044ebdb0 EFLAGS: 00010216
[  439.730150] RAX: ffff8b2a36cbb000 RBX: ffff8b28d7722a40 RCX: 0000000000000002
[  439.737309] RDX: 0000000000000000 RSI: 000000000000002f RDI: ffff8b28d7722a40
[  439.744467] RBP: ffff8b28d7722b60 R08: ffff8b2a5fd297e0 R09: 0000000000000000
[  439.751624] R10: 000000dd00c54d58 R11: 0000000000000000 R12: ffffffffc0956780
[  439.758779] R13: ffff8b2945860c00 R14: ffff8b2945860c88 R15: ffff8b28cd708888
[  439.765937] FS:  0000000000000000(0000) GS:ffff8b2a5fd00000(0000) knlGS:0000000000000000
[  439.774051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  439.779815] CR2: 0000000000000210 CR3: 000000037ac10000 CR4: 00000000000406e0
[  439.786971] Call Trace:
[  439.789435]  evict+0xd2/0x1a0
[  439.792416]  inode_switch_wbs_work_fn+0x378/0x660
[  439.797139]  process_one_work+0x1a7/0x360
[  439.801166]  ? create_worker+0x1a0/0x1a0
[  439.805105]  worker_thread+0x30/0x390
[  439.808782]  ? create_worker+0x1a0/0x1a0
[  439.812721]  kthread+0x116/0x130
[  439.815964]  ? kthread_flush_work_fn+0x10/0x10
[  439.820425]  ret_from_fork+0x22/0x40
[  439.824014] Modules linked in: vrf veth vcan tun ipip tunnel4 geneve ip6_udp_tunnel udp_tunnel bridge ip_gre ip_tunnel bonding gre stp llc dummy xt_nat xt_addrtype ipt_MASQUERADE iptable_nat nft_fib_ipv6 nft_masq nft_nat nft_fib_ipv4 nft_fib nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink dm_crypt loop ext4 mbcache jbd2 amd64_edac_mod edac_mce_amd kvm_amd ccp kvm irqbypass crct10dif_pclmul crc32_pclmul sp5100_tco k10temp pcspkr ghash_clmulni_intel fam15h_power i2c_piix4 acpi_cpufreq drm fuse ip_tables xfs libcrc32c dm_service_time sd_mod t10_pi sg ahci libahci crc32c_intel libata e1000 dm_multipath dm_mirror dm_region_hash dm_log dm_mod
[  439.884909] Features: eBPF/cgroup
[  439.888241] CR2: 0000000000000210
```

Full kdump: https://mrc0mmand.fedorapeople.org/2021-07-30_4.18.0-305.10.2.el8_4_kdump.zip

Given the previous discussion I understand this won't help much (as I still don't have a reliable reproducer), but I'll leave it there for future reference.
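The crash-grabbing hook itself isn't shown in the bug; a minimal sketch of what such a hook might look like (the function name, paths, and archive format are all assumptions):

```shell
# Hypothetical boot-time hook: archive every kdump-produced crash directory
# under /var/crash so it survives the CI machine being recycled.
save_crashes() {
  crashdir=${1:-/var/crash}
  outdir=${2:-/var/tmp/saved-crashes}
  [ -d "$crashdir" ] || return 0          # nothing to save
  mkdir -p "$outdir"
  for d in "$crashdir"/*/; do
    [ -e "$d" ] || continue               # glob matched nothing
    name=$(basename "$d")
    # One tarball per crash directory (vmcore, vmcore-dmesg.txt, ...)
    tar -czf "$outdir/$name.tar.gz" -C "$crashdir" "$name"
  done
}
```

The real CI hook would additionally upload the tarballs somewhere durable, as the links in this comment suggest.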

Comment 29 Frantisek Sumsal 2021-08-04 20:04:56 UTC
Today's kdump: https://mrc0mmand.fedorapeople.org/2021-08-04_4.18.0-305.10.2.el8_4_kdump.zip

@ikent do the additional kdumps help in any way? (not sure if I should keep uploading them or if it's a waste of time).

Comment 30 Eric Sandeen 2021-08-04 20:14:41 UTC
(In reply to Frantisek Sumsal from comment #29)
> Today's kdump:
> https://mrc0mmand.fedorapeople.org/2021-08-04_4.18.0-305.10.2.el8_4_kdump.zip
> 
> @ikent do the additional kdumps help in any way? (not sure if I
> should keep uploading them or if it's a waste of time).

Just to be sure we're on the same page, can you explain exactly how you're running the "test-functions" test, assuming that's the one that is triggering the error?

Comment 31 Frantisek Sumsal 2021-08-04 20:26:07 UTC
(In reply to Eric Sandeen from comment #30)
> (In reply to Frantisek Sumsal from comment #29)
> > Today's kdump:
> > https://mrc0mmand.fedorapeople.org/2021-08-04_4.18.0-305.10.2.el8_4_kdump.zip
> > 
> > @ikent do the additional kdumps help in any way? (not sure if I
> > should keep uploading them or if it's a waste of time).
> 
> Just to be sure we're on the page, can you explain exactly how you're
> running the "test-functions" test, assuming that's the one that is
> triggering the error?

Once you have systemd compiled locally, it all boils down to `make -C test/TEST-XX-* clean setup run` (for example `make -C test/TEST-01-BASIC clean setup run`).

where `clean` just makes sure there are no left-over artifacts from a possible previous run (and unmounts/detaches any left-over mounts and loop devices); `setup` either copies or creates a new image, mounts it via a loop device, tweaks it to the test's needs, unmounts it, and detaches the respective loop device; and finally `run` either runs the test image directly in QEMU, or mounts the test image (again via a loop device), runs an nspawn container on the mounted image, and eventually unmounts it and detaches the loop device.

So it all involves multiple `losetup --show -P -f xxx.img` and `losetup -d /dev/loopX` calls, usually in parallel, as we run several such tests in parallel at once (in this case).
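That churn can be distilled into a rough sketch; this is illustrative only (the function name, image size, and lack of partitioning are simplifications of the real test-functions code), and it needs root to actually run:

```shell
# One attach/mount/umount/detach cycle, approximating what each test does.
# The real images are partitioned (hence the loopXp1 names in the logs).
loop_cycle() {
  img=$(mktemp /tmp/loopcycle-XXXXXX.img)
  truncate -s 64M "$img"
  mkfs.ext4 -q "$img"
  dev=$(losetup --show -P -f "$img") || return 1   # attach + partition scan
  mnt=$(mktemp -d)
  mount "$dev" "$mnt"
  # ... test payload would run against "$mnt" here ...
  umount "$mnt"
  losetup -d "$dev"       # detach; the unmount/detach path is the suspect
  rm -f "$img"; rmdir "$mnt"
}
# Several such loops running concurrently approximate the CI load, e.g.:
#   for i in 1 2 3 4; do ( while loop_cycle; do :; done ) & done; wait
```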

Comment 32 Ian Kent 2021-08-05 00:42:15 UTC
(In reply to Frantisek Sumsal from comment #29)
> Today's kdump:
> https://mrc0mmand.fedorapeople.org/2021-08-04_4.18.0-305.10.2.el8_4_kdump.zip
> 
> @ikent do the additional kdumps help in any way? (not sure if I
> should keep uploading them or if it's a waste of time).

We might need one at some point but the whole problem is we don't
know where to look. That kernel message says "something bad happened
at some earlier time, I'm cleaning up".

So I don't think there's a need for saving kernel dumps but we may
need one later if we get a lead on where to look.

Comment 36 RHEL Program Management 2022-01-30 07:27:01 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 37 Frantisek Sumsal 2022-01-30 17:27:16 UTC
Reopening, since it's still happening.

Comment 40 RHEL Program Management 2022-07-30 07:27:36 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

