Bug 2216972 - kernel BUG at lib/list_debug.c:30!
Summary: kernel BUG at lib/list_debug.c:30!
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Phil Sutter
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-23 13:27 UTC by Anthony Messina
Modified: 2023-07-20 23:20 UTC
CC List: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-20 23:20:30 UTC
Type: ---
Embargoed:


Attachments
nftables list ruleset output on first affected machine (11.47 KB, text/plain)
2023-07-06 12:32 UTC, Anthony Messina
/etc/firewalld and /lib/firewalld contents (22.50 KB, application/x-xz)
2023-07-07 01:42 UTC, Anthony Messina

Description Anthony Messina 2023-06-23 13:27:49 UTC
Upon reboot after upgrading to the testing kernel, kernel-6.3.9-200.fc38:

[    5.820767] list_add corruption. prev->next should be next (ffff934e00f9da10), but was ffff934e01aaa5a8. (prev=ffff934e266104e8).
[    5.820781] ------------[ cut here ]------------
[    5.820782] kernel BUG at lib/list_debug.c:30!
[    5.820786] fbcon: Taking over console
[    5.820791] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[    5.820795] CPU: 2 PID: 601 Comm: firewalld Not tainted 6.3.9-200.fc38.x86_64 #1
[    5.820800] Hardware name: Intel Corporation NUC7i7BNH/NUC7i7BNB, BIOS BNKBL357.86A.0088.2022.0125.1102 01/25/2022
[    5.820804] RIP: 0010:__list_add_valid+0x78/0xa0
[    5.820810] Code: 99 ff 0f 0b 48 89 c1 48 c7 c7 f0 d7 92 9e e8 df 9c 99 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 48 d8 92 9e e8 c8 9c 99 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 a0 d8 92 9e e8 b1 9c 99
[    5.820817] RSP: 0018:ffffa7b040a93730 EFLAGS: 00010246
[    5.820821] RAX: 0000000000000075 RBX: ffff934e00f9da00 RCX: 0000000000000000
[    5.820825] RDX: 0000000000000000 RSI: ffff93555ed21540 RDI: ffff93555ed21540
[    5.820828] RBP: ffff934e04c18ba8 R08: 0000000000000000 R09: ffffa7b040a935d8
[    5.820832] R10: 0000000000000003 R11: ffffffff9f146108 R12: ffff934e266104e8
[    5.820835] R13: ffff934e00f9da10 R14: ffffa7b040a93800 R15: ffff934e26568000
[    5.820838] FS:  00007f65d53d6740(0000) GS:ffff93555ed00000(0000) knlGS:0000000000000000
[    5.820843] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.820846] CR2: 00007fa21b6d9f78 CR3: 0000000108aa2006 CR4: 00000000003706e0
[    5.820850] Call Trace:
[    5.820853]  <TASK>
[    5.820855]  ? die+0x36/0x90
[    5.820860]  ? do_trap+0xda/0x100
[    5.820864]  ? __list_add_valid+0x78/0xa0
[    5.820868]  ? do_error_trap+0x6a/0x90
[    5.820872]  ? __list_add_valid+0x78/0xa0
[    5.820875]  ? exc_invalid_op+0x50/0x70
[    5.820880]  ? __list_add_valid+0x78/0xa0
[    5.820884]  ? asm_exc_invalid_op+0x1a/0x20
[    5.820891]  ? __list_add_valid+0x78/0xa0
[    5.820894]  nf_tables_bind_set+0x107/0x1e0 [nf_tables]
[    5.820914]  nft_lookup_init+0xd3/0x140 [nf_tables]
[    5.820932]  nf_tables_newrule+0x493/0xaf0 [nf_tables]
[    5.820951]  nfnetlink_rcv_batch+0x7ef/0x970 [nfnetlink]
[    5.820965]  nfnetlink_rcv+0x179/0x1a0 [nfnetlink]
[    5.820973]  netlink_unicast+0x19e/0x290
[    5.820978]  netlink_sendmsg+0x254/0x4d0
[    5.820984]  sock_sendmsg+0x93/0xa0
[    5.820989]  ____sys_sendmsg+0x270/0x300
[    5.820993]  ? copy_msghdr_from_user+0x7d/0xc0
[    5.820998]  ___sys_sendmsg+0x9a/0xe0
[    5.821006]  __sys_sendmsg+0x7a/0xd0
[    5.821011]  do_syscall_64+0x5d/0x90
[    5.821016]  ? sock_getsockopt+0x22/0x30
[    5.821020]  ? __sys_getsockopt+0x195/0x1c0
[    5.821024]  ? syscall_exit_to_user_mode+0x1b/0x40
[    5.821028]  ? do_syscall_64+0x6c/0x90
[    5.821032]  ? exc_page_fault+0x7c/0x180
[    5.821036]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[    5.821041] RIP: 0033:0x7f65d4d368b4
[    5.821056] Code: 15 59 f5 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 2d 7d 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
[    5.821063] RSP: 002b:00007ffde317ffb8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[    5.821069] RAX: ffffffffffffffda RBX: 000000000004dc00 RCX: 00007f65d4d368b4
[    5.821072] RDX: 0000000000000000 RSI: 00007ffde31910e0 RDI: 0000000000000006
[    5.821076] RBP: 00007ffde3191230 R08: 0000000000000004 R09: 0000000000000000
[    5.821080] R10: 00007ffde317ff8c R11: 0000000000000202 R12: 0000000000400000
[    5.821083] R13: 00007ffde3191250 R14: 00007ffde317ffd0 R15: 00007ffde317ffc0
[    5.821089]  </TASK>
[    5.821091] Modules linked in: nf_log_syslog nft_log nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 macvlan ip_set cfg80211 rfkill nf_tables nfnetlink snd_sof_pci_intel_skl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_bus snd_soc_avs snd_soc_hda_codec snd_soc_skl snd_soc_hdac_hda snd_hda_codec_hdmi snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core intel_rapl_msr intel_rapl_common intel_tcc_cooling snd_hda_codec_realtek x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_generic snd_compress ledtrig_audio vfat fat ac97_bus snd_pcm_dmaengine kvm_intel snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm snd_hda_codec iTCO_wdt mei_hdcp ee1004 mei_pxp intel_pmc_bxt iTCO_vendor_support snd_hda_core irqbypass rapl
[    5.821143]  intel_cstate snd_hwdep snd_pcsp snd_pcm intel_uncore intel_wmi_thunderbolt i2c_i801 mei_me snd_timer wmi_bmof e1000e i2c_smbus mei snd soundcore intel_pch_thermal intel_xhci_usb_role_switch acpi_pad auth_rpcgss sunrpc fuse tun loop zram i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_buddy crct10dif_pclmul crc32_pclmul drm_display_helper crc32c_intel polyval_clmulni polyval_generic cec rtsx_pci ghash_clmulni_intel sha512_ssse3 ttm video wmi
[    5.821209] ---[ end trace 0000000000000000 ]---
[    5.821212] RIP: 0010:__list_add_valid+0x78/0xa0
[    5.821217] Code: 99 ff 0f 0b 48 89 c1 48 c7 c7 f0 d7 92 9e e8 df 9c 99 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 48 d8 92 9e e8 c8 9c 99 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 a0 d8 92 9e e8 b1 9c 99
[    5.821224] RSP: 0018:ffffa7b040a93730 EFLAGS: 00010246
[    5.821228] RAX: 0000000000000075 RBX: ffff934e00f9da00 RCX: 0000000000000000
[    5.821232] RDX: 0000000000000000 RSI: ffff93555ed21540 RDI: ffff93555ed21540
[    5.821235] RBP: ffff934e04c18ba8 R08: 0000000000000000 R09: ffffa7b040a935d8
[    5.821239] R10: 0000000000000003 R11: ffffffff9f146108 R12: ffff934e266104e8
[    5.821242] R13: ffff934e00f9da10 R14: ffffa7b040a93800 R15: ffff934e26568000
[    5.821246] FS:  00007f65d53d6740(0000) GS:ffff93555ed00000(0000) knlGS:0000000000000000
[    5.821250] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.821254] CR2: 00007fa21b6d9f78 CR3: 0000000108aa2006 CR4: 00000000003706e0

Reproducible: Always

Afterward, I am unable to interact with the nftables firewall using the nft command.
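For context on the first line of the oops: with list debugging enabled, the kernel validates the neighbouring pointers before every linked-list insertion, and the trace shows that check (`__list_add_valid`) firing under `nf_tables_bind_set()`. The sketch below is a simplified userspace model of that kind of check, written for illustration only; it is not the kernel's lib/list_debug.c, and the names `list_add_valid` and `list_add_tail_checked` are hypothetical. Where the kernel reports the corruption and then hits BUG() (as in the logs above), the model prints a similar message and refuses the insert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal intrusive doubly linked list node, shaped like the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Simplified model of a debug-list sanity check: before linking `new`
 * between `prev` and `next`, verify the neighbours still point at each
 * other and that `new` is not already one of them. */
static bool list_add_valid(struct list_head *new, struct list_head *prev,
                           struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be next (%p), "
                "but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }
    if (new == prev || new == next) {
        fprintf(stderr, "list_add double add: new=%p, prev=%p, next=%p.\n",
                (void *)new, (void *)prev, (void *)next);
        return false;
    }
    return true;
}

/* Append `new` before `head` (like list_add_tail()). Where the kernel would
 * BUG() on a failed check, this model leaves the list untouched and returns false. */
static bool list_add_tail_checked(struct list_head *new, struct list_head *head)
{
    struct list_head *prev = head->prev, *next = head;

    if (!list_add_valid(new, prev, next))
        return false;
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
    return true;
}
```

In this model, overwriting the tail entry's `next` pointer (for example with zero, as in the `but was 0000000000000000` report in Comment 2) makes the next append fail with the `prev->next should be next` message, the same condition reported in these traces.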

Comment 1 Anthony Messina 2023-06-27 20:21:22 UTC
Also found this issue on a ThinkPad X1 Carbon. I use nftables sets (formerly ipsets) via firewalld, which may be related.


list_add corruption. prev->next should be next (ffff99cd41c78a10), but was ffff99cd5f2001e8. (prev=ffff99cd5f200a28).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:30!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 1217 Comm: firewalld Not tainted 6.3.9-200.fc38.x86_64 #1
Hardware name: LENOVO 20KHCTO1WW/20KHCTO1WW, BIOS N23ET84W (1.59 ) 01/03/2023
RIP: 0010:__list_add_valid+0x78/0xa0
Code: 99 ff 0f 0b 48 89 c1 48 c7 c7 f0 d7 92 9a e8 df 9c 99 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 48 d8 92 9a e8 c8 9c 99 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 a0 d8 92 9a e8 b1 9c 99
RSP: 0018:ffffbda3c120f740 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff99cd41c78a00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff99d0d16e1540 RDI: ffff99d0d16e1540
RBP: ffff99cd4a6662a8 R08: 0000000000000000 R09: ffffbda3c120f5e8
R10: 0000000000000003 R11: ffffffff9b146108 R12: ffff99cd5f200a28
R13: ffff99cd41c78a10 R14: ffffbda3c120f810 R15: ffff99cd74fd8000
FS:  00007f295c63a740(0000) GS:ffff99d0d16c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f201674ff78 CR3: 000000011f1d4004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? die+0x36/0x90
 ? do_trap+0xda/0x100
 ? __list_add_valid+0x78/0xa0
 ? do_error_trap+0x6a/0x90
 ? __list_add_valid+0x78/0xa0
 ? exc_invalid_op+0x50/0x70
 ? __list_add_valid+0x78/0xa0
 ? asm_exc_invalid_op+0x1a/0x20
 ? __list_add_valid+0x78/0xa0
 nf_tables_bind_set+0x107/0x1e0 [nf_tables]
 nft_lookup_init+0xd3/0x140 [nf_tables]
 nf_tables_newrule+0x493/0xaf0 [nf_tables]
 nfnetlink_rcv_batch+0x7ef/0x970 [nfnetlink]
 nfnetlink_rcv+0x179/0x1a0 [nfnetlink]
 netlink_unicast+0x19e/0x290
 netlink_sendmsg+0x254/0x4d0
 sock_sendmsg+0x93/0xa0
 ____sys_sendmsg+0x270/0x300
 ? copy_msghdr_from_user+0x7d/0xc0
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xd0
 do_syscall_64+0x5d/0x90
 ? __sys_setsockopt+0xf2/0x1d0
 ? syscall_exit_to_user_mode+0x1b/0x40
 ? do_syscall_64+0x6c/0x90
 ? syscall_exit_to_user_mode+0x1b/0x40
 ? do_syscall_64+0x6c/0x90
 ? exc_page_fault+0x7c/0x180
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f295bf368b4
Code: 15 59 f5 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 2d 7d 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:00007ffe83010828 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000004d800 RCX: 00007f295bf368b4
RDX: 0000000000000000 RSI: 00007ffe83021950 RDI: 0000000000000006
RBP: 00007ffe83021aa0 R08: 0000000000000004 R09: 0000000000000000
R10: 00007ffe830107fc R11: 0000000000000202 R12: 0000000000400000
R13: 00007ffe83021ac0 R14: 00007ffe83010840 R15: 00007ffe83010830
 </TASK>
Modules linked in: nf_log_syslog nft_log nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink bnep snd_hda_codec_hdmi rmi_smbus rmi_core snd_sof_pci_intel_skl snd_sof>
 uvcvideo rapl iwlwifi snd_hwdep intel_cstate uvc btusb videobuf2_vmalloc snd_seq btrtl videobuf2_memops btbcm snd_seq_device videobuf2_v4l2 btintel btmtk videobuf2_common think_lmi firmware_attributes_class cfg80211 intel_wmi_thunderbolt intel_uncore videodev snd_pcm bluetooth pcspkr snd_timer mc wmi_bmof i2c_i80>
 scsi_dh_emc scsi_dh_alua fuse dm_multipath
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_add_valid+0x78/0xa0
Code: 99 ff 0f 0b 48 89 c1 48 c7 c7 f0 d7 92 9a e8 df 9c 99 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 48 d8 92 9a e8 c8 9c 99 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 a0 d8 92 9a e8 b1 9c 99
RSP: 0018:ffffbda3c120f740 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff99cd41c78a00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff99d0d16e1540 RDI: ffff99d0d16e1540
RBP: ffff99cd4a6662a8 R08: 0000000000000000 R09: ffffbda3c120f5e8
R10: 0000000000000003 R11: ffffffff9b146108 R12: ffff99cd5f200a28
R13: ffff99cd41c78a10 R14: ffffbda3c120f810 R15: ffff99cd74fd8000
FS:  00007f295c63a740(0000) GS:ffff99d0d16c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f201674ff78 CR3: 000000011f1d4004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Comment 2 Anthony Messina 2023-06-29 10:54:07 UTC
Same issue on another NUC running kernel-6.3.10-200.fc38.x86_64.

list_add corruption. prev->next should be next (ffff9e5e867e2210), but was 0000000000000000. (prev=ffff9e5ed17a9128).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:30!
fbcon: Taking over console
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1330 Comm: firewalld Not tainted 6.3.10-200.fc38.x86_64 #1
Hardware name:  /NUC7i7BNB, BIOS BNKBL357.86A.0091.2023.0308.1014 03/08/2023
RIP: 0010:__list_add_valid+0x78/0xa0
Code: 99 ff 0f 0b 48 89 c1 48 c7 c7 90 d8 92 9f e8 bf 99 99 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 e8 d8 92 9f e8 a8 99 99 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 40 d9 92 9f e8 91 99 99
RSP: 0000:ffffb9a680e07720 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff9e5e867e2200 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9e65deca1540 RDI: ffff9e65deca1540
RBP: ffff9e5e864372a8 R08: 0000000000000000 R09: ffffb9a680e075c8
R10: 0000000000000003 R11: ffffffffa0146108 R12: ffff9e5ed17a9128
R13: ffff9e5e867e2210 R14: ffffb9a680e077f0 R15: ffff9e5ea7698000
FS:  00007f0054f89740(0000) GS:ffff9e65dec80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000558aa2884068 CR3: 0000000121316003 CR4: 00000000003706e0
Call Trace:
 <TASK>
 ? die+0x36/0x90
 ? do_trap+0xda/0x100
 ? __list_add_valid+0x78/0xa0
 ? do_error_trap+0x6a/0x90
 ? __list_add_valid+0x78/0xa0
 ? exc_invalid_op+0x50/0x70
 ? __list_add_valid+0x78/0xa0
 ? asm_exc_invalid_op+0x1a/0x20
 ? __list_add_valid+0x78/0xa0
 nf_tables_bind_set+0x103/0x180 [nf_tables]
 nft_lookup_init+0xd3/0x140 [nf_tables]
 nf_tables_newrule+0x493/0xb90 [nf_tables]
 nfnetlink_rcv_batch+0x7ef/0x970 [nfnetlink]
 nfnetlink_rcv+0x179/0x1a0 [nfnetlink]
 netlink_unicast+0x19e/0x290
 netlink_sendmsg+0x254/0x4d0
 sock_sendmsg+0x93/0xa0
 ____sys_sendmsg+0x270/0x300
 ? copy_msghdr_from_user+0x7d/0xc0
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xd0
 do_syscall_64+0x5d/0x90
 ? syscall_exit_to_user_mode+0x1b/0x40
 ? do_syscall_64+0x6c/0x90
 ? syscall_exit_to_user_mode+0x1b/0x40
 ? do_syscall_64+0x6c/0x90
 ? do_user_addr_fault+0x1e0/0x720
 ? exc_page_fault+0x7c/0x180
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f00549368b4
Code: 15 59 f5 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 2d 7d 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:00007ffddabebbb8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000004ec00 RCX: 00007f00549368b4
RDX: 0000000000000000 RSI: 00007ffddabfcce0 RDI: 0000000000000006
RBP: 00007ffddabfce30 R08: 0000000000000004 R09: 0000000000000000
R10: 00007ffddabebb8c R11: 0000000000000202 R12: 0000000000400000
R13: 00007ffddabfce50 R14: 00007ffddabebbd0 R15: 00007ffddabebbc0
 </TASK>
Modules linked in: nf_log_syslog nft_log nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 macvlan ip_set cfg80211 nf_tables nfnetlink bnep snd_sof_pci_intel_skl snd_sof_intel_hda_common sou>
 btbcm btintel snd_seq_device btmtk kvm bluetooth irqbypass rapl wmi_bmof snd_pcm intel_cstate rfkill vfat snd_timer intel_uncore fat i2c_i801 intel_wmi_thunderbolt snd mei_me i2c_smbus pcspkr mei soundcore intel_pch_thermal intel_xhci_usb_role_switch acpi_pad auth_rpcgss sunrpc loop zram dm_crypt i915 rtsx_pci_sd>
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_add_valid+0x78/0xa0
Code: 99 ff 0f 0b 48 89 c1 48 c7 c7 90 d8 92 9f e8 bf 99 99 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 e8 d8 92 9f e8 a8 99 99 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 40 d9 92 9f e8 91 99 99
RSP: 0000:ffffb9a680e07720 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff9e5e867e2200 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9e65deca1540 RDI: ffff9e65deca1540
RBP: ffff9e5e864372a8 R08: 0000000000000000 R09: ffffb9a680e075c8
R10: 0000000000000003 R11: ffffffffa0146108 R12: ffff9e5ed17a9128
R13: ffff9e5e867e2210 R14: ffffb9a680e077f0 R15: ffff9e5ea7698000
FS:  00007f0054f89740(0000) GS:ffff9e65dec80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000558aa2884068 CR3: 0000000121316003 CR4: 00000000003706e0

Comment 3 Anthony Messina 2023-06-29 11:47:03 UTC
I use multiple IPv4 and IPv6 sets in my nftables rulesets. The issue seems to be triggered by firewalld loading those ipsets (nftables sets) when the firewalld service starts at boot. When I comment out the ipsets for the first firewalld start at boot, then re-add them to the firewalld rules after boot and restart firewalld, I don't immediately see this issue.

Comment 4 Anthony Messina 2023-07-03 13:56:35 UTC
This issue continues with kernel-6.3.11-200.fc38.x86_64

list_add corruption. prev->next should be next (ffff90408641ca10), but was 0000000100000001. (prev=ffff904084b61c68).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:30!
fbcon: Taking over console
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 603 Comm: firewalld Not tainted 6.3.11-200.fc38.x86_64 #1
Hardware name: Intel Corporation NUC7i7BNH/NUC7i7BNB, BIOS BNKBL357.86A.0088.2022.0125.1102 01/25/2022
RIP: 0010:__list_add_valid+0x78/0xa0
Code: 99 ff 0f 0b 48 89 c1 48 c7 c7 78 da 92 b0 e8 9f 95 99 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 d0 da 92 b0 e8 88 95 99 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 28 db 92 b0 e8 71 95 99
RSP: 0018:ffffad2280907740 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff90408641ca00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9047deda1540 RDI: ffff9047deda1540
RBP: ffff904080f93368 R08: 0000000000000000 R09: ffffad22809075e8
R10: 0000000000000003 R11: ffffffffb1146108 R12: ffff904084b61c68
R13: ffff90408641ca10 R14: ffffad2280907810 R15: ffff9040a50d0000
FS:  00007fbe10be4740(0000) GS:ffff9047ded80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdfc8adcf78 CR3: 0000000115fc4005 CR4: 00000000003706e0
Call Trace:
 <TASK>
 ? die+0x36/0x90
 ? do_trap+0xda/0x100
 ? __list_add_valid+0x78/0xa0
 ? do_error_trap+0x6a/0x90
 ? __list_add_valid+0x78/0xa0
 ? exc_invalid_op+0x50/0x70
 ? __list_add_valid+0x78/0xa0
 ? asm_exc_invalid_op+0x1a/0x20
 ? __list_add_valid+0x78/0xa0
 nf_tables_bind_set+0x103/0x180 [nf_tables]
 nft_lookup_init+0xd3/0x140 [nf_tables]
 nf_tables_newrule+0x493/0xb90 [nf_tables]
 nfnetlink_rcv_batch+0x7ef/0x970 [nfnetlink]
 nfnetlink_rcv+0x179/0x1a0 [nfnetlink]
 netlink_unicast+0x19e/0x290
 netlink_sendmsg+0x254/0x4d0
 sock_sendmsg+0x93/0xa0
 ____sys_sendmsg+0x270/0x300
 ? copy_msghdr_from_user+0x7d/0xc0
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xd0
 do_syscall_64+0x5d/0x90
 ? syscall_exit_to_user_mode+0x1b/0x40
 ? do_syscall_64+0x6c/0x90
 ? do_user_addr_fault+0x237/0x600
 ? exc_page_fault+0x7c/0x180
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fbe105368b4
Code: 15 59 f5 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 2d 7d 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:00007ffd7e5e0288 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000004dc00 RCX: 00007fbe105368b4
RDX: 0000000000000000 RSI: 00007ffd7e5f13b0 RDI: 0000000000000006
RBP: 00007ffd7e5f1500 R08: 0000000000000004 R09: 0000000000000001
R10: 00007ffd7e5e025c R11: 0000000000000202 R12: 0000000000400000
R13: 00007ffd7e5f1520 R14: 00007ffd7e5e02a0 R15: 00007ffd7e5e0290
 </TASK>
Modules linked in: nf_log_syslog nft_log nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack macvlan nf_defrag_ipv6 nf_defrag_ipv4 cfg80211 ip_set rfkill nf_tables nfnetlink snd_sof_pci_intel_skl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp>
 snd_hwdep snd_pcsp snd_pcm rapl intel_cstate e1000e snd_timer i2c_i801 snd wmi_bmof intel_uncore mei_me i2c_smbus soundcore intel_wmi_thunderbolt intel_xhci_usb_role_switch mei intel_pch_thermal acpi_pad auth_rpcgss sunrpc fuse tun loop zram i915 rtsx_pci_sdmmc mmc_core crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni i2c_algo_bit polyval_generic drm_buddy drm_display_helper ghash_clmulni_intel cec rtsx_pci sha>
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_add_valid+0x78/0xa0
Code: 99 ff 0f 0b 48 89 c1 48 c7 c7 78 da 92 b0 e8 9f 95 99 ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 d0 da 92 b0 e8 88 95 99 ff <0f> 0b 48 89 f2 48 89 c1 48 89 fe 48 c7 c7 28 db 92 b0 e8 71 95 99
RSP: 0018:ffffad2280907740 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff90408641ca00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9047deda1540 RDI: ffff9047deda1540
RBP: ffff904080f93368 R08: 0000000000000000 R09: ffffad22809075e8
R10: 0000000000000003 R11: ffffffffb1146108 R12: ffff904084b61c68
R13: ffff90408641ca10 R14: ffffad2280907810 R15: ffff9040a50d0000
FS:  00007fbe10be4740(0000) GS:ffff9047ded80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdfc8adcf78 CR3: 0000000115fc4005 CR4: 00000000003706e0

Comment 5 Phil Sutter 2023-07-05 15:04:42 UTC
Between v6.3.8 and v6.3.9 upstream commit 1240eb93f0616 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE") was backported, which received some fixes:

26b5a5712eb85 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
3e70489721b6c ("netfilter: nf_tables: unbind non-anonymous set if rule construction fails")

The first one is part of v6.4, the second one still unreleased.

IIUC, all these deal with error paths only though. Could you please attach the firewalld config triggering this?
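The commits above concern error paths, one of them unbinding a non-anonymous set when rule construction fails. To illustrate how a missed unbind on an error path could later surface as a `list_add corruption` report when another rule binds the same set, here is a deliberately simplified C model. All names (`struct set`, `struct rule`, `build_rule`, `tail_is_consistent`) are hypothetical; this is not the nf_tables code and not a claim about the actual root cause of this bug. Caller-provided storage that is zeroed on failure stands in for memory that is freed and reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Intrusive doubly linked list node, shaped like the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
    struct list_head *prev = head->prev;

    new->next = head;
    new->prev = prev;
    prev->next = new;
    head->prev = new;
}

static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = NULL;
    entry->prev = NULL;
}

/* Hypothetical stand-ins: a named set tracks its users on a binding list,
 * and each rule embeds the node it links onto that list. */
struct set {
    struct list_head bindings;
};

struct rule {
    struct list_head binding;
};

/* Bind `set` from a rule under construction, then optionally fail a later
 * step. On failure the rule's storage is zeroed to mimic it being freed and
 * reused; `unbind_on_error` decides whether the error path unlinks it first. */
static bool build_rule(struct set *set, struct rule *rule, bool fail_later,
                       bool unbind_on_error)
{
    list_add_tail(&rule->binding, &set->bindings);
    if (!fail_later)
        return true;
    if (unbind_on_error)
        list_del(&rule->binding);
    memset(rule, 0, sizeof(*rule)); /* "freed and reused" */
    return false;
}

/* The condition a debug-list check tests before the next list_add_tail():
 * the current tail must still point back at the list head. */
static bool tail_is_consistent(const struct set *set)
{
    return set->bindings.prev->next == &set->bindings;
}
```

With `unbind_on_error` set, the failed rule leaves the binding list empty and consistent. Without it, the set's list head still points into the zeroed storage, so in this model the next bind would find `prev->next` equal to zero instead of the list head.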

Comment 6 Phil Sutter 2023-07-05 15:08:18 UTC
(In reply to Phil Sutter from comment #5)
> 26b5a5712eb85 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal
> with bound set/chain")
> 3e70489721b6c ("netfilter: nf_tables: unbind non-anonymous set if rule
> construction fails")
> 
> The first one is part of v6.4, the second one still unreleased.

Correction: the first one got backported into v6.3.10. So if either of them is
the fix, it would be the second one.

Comment 7 Anthony Messina 2023-07-06 12:32:52 UTC
Created attachment 1974292 [details]
nftables list ruleset output on first affected machine

Comment 8 Phil Sutter 2023-07-06 14:13:47 UTC
(In reply to Anthony Messina from comment #7)
> Created attachment 1974292 [details]
> nftables list ruleset output on first affected machine

Well, if you acquired this using 'nft list ruleset', it did not trigger the bug and neither will applying it. What I'm looking for is the firewalld config which causes the kernel stack trace after a call to 'systemctl start firewalld'.

I'll meanwhile prepare a kernel build for you to test with.

Thanks!

Comment 9 Anthony Messina 2023-07-07 01:42:01 UTC
(In reply to Phil Sutter from comment #8)
> (In reply to Anthony Messina from comment #7)
> > Created attachment 1974292 [details]
> > nftables list ruleset output on first affected machine
> 
> Well, if you acquired this using 'nft list ruleset', it did not trigger the
> bug and neither will applying it. What I'm looking for is the firewalld
> config which causes the kernel stack trace after a call to 'systemctl start
> firewalld'.
> 
> I'll meanwhile prepare a kernel build for you to test with.
> 
> Thanks!

I acquired the ruleset listing using kernel 6.3.8, the last released kernel that is not affected. Kernels 6.3.9, 6.3.10, and 6.3.11 all exhibit the issue as originally reported. With any kernel after 6.3.8, the issue occurs at boot while firewalld loads, after which `nft list ruleset` hangs and does not print the entire ruleset. In that state, the firewalld service cannot be stopped until systemd kills it during a reboot or poweroff.

I was looking for a good way to get you the "firewalld config", but there doesn't seem to be an option to "export" the final XML that's used to build the ruleset, so I'll just archive the /etc/firewalld and /lib/firewalld directories.

Comment 10 Anthony Messina 2023-07-07 01:42:59 UTC
Created attachment 1974390 [details]
/etc/firewalld and /lib/firewalld contents

Comment 11 Anthony Messina 2023-07-08 03:02:22 UTC
The issue occurs with kernel-6.4.2-201.fc38 in my testing as well.

Comment 12 Anthony Messina 2023-07-12 14:51:01 UTC
(In reply to Anthony Messina from comment #9)
> (In reply to Phil Sutter from comment #8)
> > (In reply to Anthony Messina from comment #7)
> > > Created attachment 1974292 [details]
> > > nftables list ruleset output on first affected machine
> > 
> > Well, if you acquired this using 'nft list ruleset', it did not trigger the
> > bug and neither will applying it. What I'm looking for is the firewalld
> > config which causes the kernel stack trace after a call to 'systemctl start
> > firewalld'.
> > 
> > I'll meanwhile prepare a kernel build for you to test with.
> > 
> > Thanks!

Thanks for looking at this.  Please let me know if there is anything additional you need.  For the moment, I'm holding my machines at kernel-6.3.8 since I use a similar nft set / ipset configuration for most of them.

Comment 13 Anthony Messina 2023-07-14 11:37:28 UTC
The issue occurs with kernel-6.4.3-200.fc38.x86_64

Comment 14 Anthony Messina 2023-07-20 23:20:30 UTC
This appears to be resolved with kernel-6.4.4-200.fc38, https://bodhi.fedoraproject.org/updates/FEDORA-2023-e4e985b5dd

