Bug 1958013 - Wireguard: kernel NULL pointer dereference when remove module
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Hangbin Liu
QA Contact: xmu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-07 02:22 UTC by Hangbin Liu
Modified: 2021-08-20 01:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-20 01:10:05 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Hangbin Liu 2021-05-07 02:22:25 UTC
Description of problem:
Got a kernel NULL pointer dereference when removing the wireguard module.

Version-Release number of selected component (if applicable):
5.12.0-1.el9

How reproducible:


Steps to Reproduce:
1. modprobe wireguard
2. modprobe -r wireguard
3.

Actual results:
[80908.910347] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information. 
[80908.919093] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason>. All Rights Reserved. 
[80908.929774] TECH PREVIEW: WireGuard may not be fully supported. 
[80908.929774] Please review provided documentation for limitations. 
[80925.506872] BUG: kernel NULL pointer dereference, address: 0000000000000000 
[80925.514645] #PF: supervisor read access in kernel mode 
[80925.520376] #PF: error_code(0x0000) - not-present page 
[80925.526108] PGD 0 P4D 0  
[80925.528930] Oops: 0000 [#1] SMP PTI 
[80925.532820] CPU: 20 PID: 46712 Comm: modprobe Tainted: G               X --------- ---  5.12.0-1.el9.x86_64 #1 
[80925.543980] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 2.11.0 11/02/2019 
[80925.552424] RIP: 0010:__list_del_entry_valid+0x2d/0x50 
[80925.558159] Code: 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 0f 84 fe 8d 4f 00 48 b8 22 01 00 00 00 00 ad de 49 39 c0 0f 84 1e 8e 4f 00 <49> 8b 30 48 39 fe 0f 85 fe 8d 4f 00 48 8b 52 08 48 39 f2 0f 85 e3 
[80925.579112] RSP: 0018:ffffc13e012fbeb8 EFLAGS: 00010217 
[80925.584941] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000000 
[80925.592901] RDX: 0000000000000000 RSI: ffffc13e012fbee8 RDI: ffffffffc06e9038 
[80925.600861] RBP: ffffffffc06e9038 R08: 0000000000000000 R09: 0000000000000000 
[80925.608821] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 
[80925.616780] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 
[80925.624740] FS:  00007f8d3dbcab80(0000) GS:ffff9caabfd00000(0000) knlGS:0000000000000000 
[80925.633767] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[80925.640176] CR2: 0000000000000000 CR3: 00000003700a8003 CR4: 00000000001706e0 
[80925.648136] Call Trace: 
[80925.650861]  crypto_unregister_alg+0x47/0xe0 
[80925.655626]  __do_sys_delete_module.constprop.0+0x174/0x280 
[80925.661844]  ? exit_to_user_mode_loop+0x4d/0x120 
[80925.666996]  do_syscall_64+0x33/0x40 
[80925.670984]  entry_SYSCALL_64_after_hwframe+0x44/0xae 
[80925.676619] RIP: 0033:0x7f8d3dcf708b 
[80925.680605] Code: 73 01 c3 48 8b 0d e5 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 1d 0c 00 f7 d8 64 89 01 48 
[80925.701558] RSP: 002b:00007ffd699ebfb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 
[80925.710003] RAX: ffffffffffffffda RBX: 000056501b791b80 RCX: 00007f8d3dcf708b 
[80925.717963] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000056501b791be8 
[80925.725923] RBP: 000056501b791b80 R08: 0000000000000000 R09: 0000000000000000 
[80925.733883] R10: 00007f8d3dd6aac0 R11: 0000000000000206 R12: 000056501b791be8 
[80925.741843] R13: 0000000000000000 R14: 000056501b78ef18 R15: 00007ffd699ee2f8 
[80925.749803] Modules linked in: libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 libblake2s_generic ip6_udp_tunnel udp_tunnel curve25519_x86_64(-) libcurve25519_generic libchacha rfkill intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp iTCO_wdt iTCO_vendor_support coretemp dcdbas kvm_intel kvm irqbypass mgag200 rapl intel_cstate pcspkr intel_uncore drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec mxm_wmi ipmi_ssif lpc_ich mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse ip_tables xfs libcrc32c sd_mod t10_pi crct10dif_pclmul ixgbe igb crc32_pclmul ahci crc32c_intel libahci mdio i2c_algo_bit libata ghash_clmulni_intel dca wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: wireguard] 
[80925.827215] CR2: 0000000000000000 
[80925.830923] ---[ end trace 50416f51d8b1f480 ]--- 
[80925.844620] RIP: 0010:__list_del_entry_valid+0x2d/0x50 
[80925.850357] Code: 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 0f 84 fe 8d 4f 00 48 b8 22 01 00 00 00 00 ad de 49 39 c0 0f 84 1e 8e 4f 00 <49> 8b 30 48 39 fe 0f 85 fe 8d 4f 00 48 8b 52 08 48 39 f2 0f 85 e3 
[80925.871312] RSP: 0018:ffffc13e012fbeb8 EFLAGS: 00010217 
[80925.877143] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000000 
[80925.885105] RDX: 0000000000000000 RSI: ffffc13e012fbee8 RDI: ffffffffc06e9038 
[80925.893067] RBP: ffffffffc06e9038 R08: 0000000000000000 R09: 0000000000000000 
[80925.901031] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 
[80925.908994] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 
[80925.916956] FS:  00007f8d3dbcab80(0000) GS:ffff9caabfd00000(0000) knlGS:0000000000000000 
[80925.925986] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[80925.932397] CR2: 0000000000000000 CR3: 00000003700a8003 CR4: 00000000001706e0 
[80925.940360] Kernel panic - not syncing: Fatal exception 
[80925.946201] Kernel Offset: 0x9400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) 
[80925.967974] ---[ end Kernel panic - not syncing: Fatal exception ]--- 

Expected results:


Additional info:
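One possible reading of the trace above, offered as a sketch rather than a confirmed diagnosis: the "Code:" bytes load a pointer from 0x8(%rdi) into R8, compare two registers against 0xdead000000000100 and 0xdead000000000122, and then fault on a read through R8 (the marked bytes <49> 8b 30). R8, RDX, and CR2 are all zero, which is what the prev and next fields of a zero-filled list entry that was never added to any list would look like. The user-space C model below illustrates that distinction. The struct, the helper names, and the enum are hypothetical stand-ins written for this note, not kernel code, and the poison constants are simply the values visible in the oops.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for a doubly linked list node (hypothetical model). */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Poison values as they appear in the oops register dump (64-bit assumed). */
#define POISON_NEXT ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define POISON_PREV ((struct list_head *)(uintptr_t)0xdead000000000122ULL)

enum entry_state {
    ENTRY_LINKED,        /* both neighbours point back at the entry */
    ENTRY_POISONED,      /* entry was already deleted and poisoned */
    ENTRY_NEVER_LINKED,  /* zero-filled: next or prev is NULL */
    ENTRY_CORRUPT        /* neighbours do not point back */
};

/* Classify an entry without ever dereferencing a NULL or poisoned pointer. */
static enum entry_state classify_entry(const struct list_head *e)
{
    if (e->next == POISON_NEXT || e->prev == POISON_PREV)
        return ENTRY_POISONED;
    if (e->next == NULL || e->prev == NULL)
        return ENTRY_NEVER_LINKED;
    if (e->prev->next != e || e->next->prev != e)
        return ENTRY_CORRUPT;
    return ENTRY_LINKED;
}

/* Make an empty circular list: the head points at itself. */
static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert entry e right after head h. */
static void list_add(struct list_head *e, struct list_head *h)
{
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}
```

In this model, an entry that was added with list_add() classifies as ENTRY_LINKED, while a zero-initialized entry classifies as ENTRY_NEVER_LINKED. A validity check that tests only for the two poison values and then reads through prev would fault at address 0 in the second case, which matches the shape of the oops.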

Comment 1 Hangbin Liu 2021-05-07 02:35:12 UTC
[root@wsfd-netdev21 ~]# uname -r
5.12.0-1.el9.x86_64
[root@wsfd-netdev21 ~]# modprobe wireguard
[root@wsfd-netdev21 ~]# lsmod | grep wireguard
wireguard              98304  0
libchacha20poly1305    16384  1 wireguard
libblake2s             16384  1 wireguard
ip6_udp_tunnel         16384  1 wireguard
udp_tunnel             24576  1 wireguard
curve25519_x86_64      36864  1 wireguard
libcurve25519_generic    49152  2 curve25519_x86_64,wireguard


The kernel crashed in crypto_unregister_alg(). I can't reproduce it on upstream 5.12.0-rc4. I will try 5.12 later.
CC'ing Herbert and Simo.

Comment 2 Hangbin Liu 2021-05-08 01:17:57 UTC
Tested with upstream 5.12.0: no crash. However, I built the crypto code as modules, so I'm not sure whether the crash is due to built-in crypto.

Comment 3 Hangbin Liu 2021-05-27 04:15:41 UTC
Tested with 5.13.0-0.rc2.19.el9.x86_64, no panic now.

Comment 4 xmu 2021-05-31 11:01:42 UTC
The same call trace was found on kernel 5.13.0-0.rc2.19.el9.x86_64. Reopening.


Updated reproducer:
Set up a WireGuard VPN, run traffic, and then remove the wireguard module.

Node 1 : 10.20.20.4
Node 2 : 10.20.20.3

node 1:
# modprobe wireguard
# umask 077
# wg genkey >private_appserver1
# ip link add dev wg0 type wireguard
# ip addr add 192.168.10.1/24 dev wg0
# wg set wg0 private-key ./private_appserver1
# ip link set wg0 up
# wg
# wg set wg0 peer MDaeWgZVULXP4gvOj4UmN7bW/uniQeBionqJyzEzSC0= allowed-ips 192.168.10.0/24  endpoint  10.20.20.3:54371
# ping 192.168.10.2 -c 4
# modprobe -r wireguard

node 2:
# umask 077
# wg genkey >private_dbserver1
# wg pubkey < private_dbserver1
# ip link add dev wg0 type wireguard
# ip addr add 192.168.10.2/24 dev wg0
# wg set wg0 private-key ./private_dbserver1
# ip link set wg0 up
# wg
# wg set wg0 peer 6yNLmpkbfsL2ijx7z996ZHl2bNFz9Psp9V6BhoHjvmk= allowed-ips 192.168.10.0/24 endpoint  10.20.20.4:42930
# ping 192.168.10.1 -c 4
# modprobe -r wireguard


[420346.399267] BUG: kernel NULL pointer dereference, address: 0000000000000000
[420346.407138] #PF: supervisor read access in kernel mode
[420346.412965] #PF: error_code(0x0000) - not-present page
[420346.418794] PGD 0 P4D 0
[420346.421716] Oops: 0000 [#1] SMP PTI
[420346.425704] CPU: 4 PID: 49375 Comm: modprobe Tainted: G           OE  X --------- ---  5.13.0-0.rc2.19.el9.x86_64 #1
[420346.437544] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016
[420346.445989] RIP: 0010:__list_del_entry_valid+0x2d/0x50
[420346.451824] Code: 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 0f 84 35 23 50 00 48 b8 22 01 00 00 00 00 ad de 49 39 c0 0f 84 55 23 50 00 <49> 8b 30 48 39 fe 0f 85 35 23 50 00 48 8b 52 08 48 39 f2 0f 85 1a
[420346.472874] RSP: 0018:ffff9f9287807eb0 EFLAGS: 00010217
[420346.478798] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000000
[420346.486857] RDX: 0000000000000000 RSI: ffff9f9287807ee0 RDI: ffffffffc06b4038
[420346.494915] RBP: ffffffffc06b4038 R08: 0000000000000000 R09: 0000000000000000
[420346.502972] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f9287807f58
[420346.511029] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[420346.519086] FS:  00007fd8a60f1b80(0000) GS:ffff8e699fa80000(0000) knlGS:0000000000000000
[420346.528212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420346.534719] CR2: 0000000000000000 CR3: 000000017014a004 CR4: 00000000001706e0
[420346.542777] Call Trace:
[420346.545601]  crypto_unregister_alg+0x47/0xe0
[420346.550465]  __do_sys_delete_module.constprop.0+0x174/0x280
[420346.556782]  do_syscall_64+0x40/0x80
[420346.560867]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[420346.566601] RIP: 0033:0x7fd8a621e18b
[420346.570683] Code: 73 01 c3 48 8b 0d e5 1c 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 1c 0c 00 f7 d8 64 89 01 48
[420346.591734] RSP: 002b:00007ffdf4fd6d08 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[420346.600277] RAX: ffffffffffffffda RBX: 0000555c3d1e7bd0 RCX: 00007fd8a621e18b
[420346.608335] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000555c3d1e7c38
[420346.616392] RBP: 0000555c3d1e7bd0 R08: 0000000000000000 R09: 0000000000000000
[420346.624450] R10: 00007fd8a6291ac0 R11: 0000000000000206 R12: 0000555c3d1e7c38
[420346.632507] R13: 0000000000000000 R14: 0000555c3d1e4ed8 R15: 00007ffdf4fd9048
[420346.640566] Modules linked in: libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 libblake2s_generic ip6_udp_tunnel udp_tunnel curve25519_x86_64(-) libcurve25519_generic libchacha binfmt_misc mlx5_ib ib_uverbs mlx5_core psample mlxfw ib_core tls pci_hyperv_intf rfkill sunrpc iTCO_wdt iTCO_vendor_support intel_rapl_msr dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate mgag200 i2c_algo_bit intel_uncore drm_kms_helper pcspkr syscopyarea sysfillrect sysimgblt mxm_wmi fb_sys_fops cec ipmi_ssif lpc_ich mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter fuse drm ip_tables xfs libcrc32c sr_mod cdrom sd_mod t10_pi ahci libahci crct10dif_pclmul crc32_pclmul i40e crc32c_intel libata ghash_clmulni_intel megaraid_sas tg3 wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: wireguard]
[420346.728274] CR2: 0000000000000000
[420346.732091] ---[ end trace 6d688b1d5861dbd0 ]---
[420346.754602] RIP: 0010:__list_del_entry_valid+0x2d/0x50
[420346.760446] Code: 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 0f 84 35 23 50 00 48 b8 22 01 00 00 00 00 ad de 49 39 c0 0f 84 55 23 50 00 <49> 8b 30 48 39 fe 0f 85 35 23 50 00 48 8b 52 08 48 39 f2 0f 85 1a
[420346.781501] RSP: 0018:ffff9f9287807eb0 EFLAGS: 00010217
[420346.787431] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000000
[420346.795496] RDX: 0000000000000000 RSI: ffff9f9287807ee0 RDI: ffffffffc06b4038
[420346.803559] RBP: ffffffffc06b4038 R08: 0000000000000000 R09: 0000000000000000
[420346.811621] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f9287807f58
[420346.819683] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[420346.827744] FS:  00007fd8a60f1b80(0000) GS:ffff8e699fa80000(0000) knlGS:0000000000000000
[420346.836873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420346.843385] CR2: 0000000000000000 CR3: 000000017014a004 CR4: 00000000001706e0
[420346.851449] Kernel panic - not syncing: Fatal exception
[420346.857388] Kernel Offset: 0x2ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[420346.888250] ---[ end Kernel panic - not syncing: Fatal exception ]---

Comment 6 Hangbin Liu 2021-05-31 12:13:46 UTC
(In reply to xmu from comment #4)
> The same call trace was found on kernel 5.13.0-0.rc2.19.el9.x86_64. reopen
> it.
> 
> 
> update a reproducer:
> setup wg VPN , and run traffic , and then remove wireguard module

Hmm, maybe the reason is that when there is traffic, we register the lib crypto, but unregistering it fails on modprobe -r wireguard.

I will check it.

Thanks
Hangbin

Comment 7 Herbert Xu 2021-05-31 12:16:24 UTC
Sounds like wireguard doesn't do module reference counting for the devices that it creates.  You shouldn't be able to unload a module if there are network devices using its code.  IOW after you've created the wireguard device, lsmod should show an elevated reference count on wireguard and you shouldn't be able to do modprobe -r on wireguard at all.

Comment 8 Herbert Xu 2021-05-31 12:18:33 UTC
I take that back.  You should be able to remove wireguard even with devices in existence.

Comment 9 Herbert Xu 2021-05-31 12:23:14 UTC
I retract my retraction :) Because wireguard is a virtual device, it does need to do module reference counting because otherwise there is no way to ensure that the devices have been properly destroyed prior to the module unload.

So this is something that needs to be fixed upstream by elevating the module reference count.
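
The reference-counting idea discussed in comments 7 to 9 can be sketched as a small user-space model. This is only an illustration under stated assumptions: the struct and the model_* functions are hypothetical names invented for this note, not wireguard or kernel code. In the kernel, the analogous operations are try_module_get() and module_put(); whether and where wireguard should call them is exactly what the comments above are debating.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a module whose devices pin it in memory. */
struct module_model {
    bool loaded;
    int  refcnt;   /* number of live devices holding a reference */
};

/* Creating a device takes a reference on the module. */
static int model_device_create(struct module_model *m)
{
    if (!m->loaded)
        return -ENODEV;
    m->refcnt++;
    return 0;
}

/* Destroying a device drops the reference it held. */
static void model_device_destroy(struct module_model *m)
{
    if (m->refcnt > 0)
        m->refcnt--;
}

/* Unloading is refused while any device still holds a reference. */
static int model_unload(struct module_model *m)
{
    if (!m->loaded)
        return -ENOENT;
    if (m->refcnt > 0)
        return -EBUSY;
    m->loaded = false;
    return 0;
}
```

With this scheme, an unload attempted while a device such as wg0 exists returns -EBUSY instead of tearing down code that is still in use. The unload succeeds only after the last device is destroyed.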

Comment 10 Hangbin Liu 2021-06-11 08:38:08 UTC
Herbert has applied the fix: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=1b82435d177
Let's wait for it to be backported to RHEL9.

Comment 11 Hangbin Liu 2021-07-16 01:40:23 UTC
Hi XiuMei,

This patch should have been merged into 5.13 and backported to the latest RHEL9.
Would you help check it? If it's fixed, I will close this bug.

Thanks
Hangbin

Comment 12 Hangbin Liu 2021-07-16 01:42:59 UTC
Hmm, no. It looks like it hasn't been merged into 5.13 yet. Please wait a while longer.

Comment 13 Hangbin Liu 2021-08-11 10:35:56 UTC
Hi Xiumei, the fix should have been backported to the latest RHEL9. Would you please help verify it?

Comment 14 xmu 2021-08-19 03:11:56 UTC
Hangbin, 
  The issue disappears on kernel 5.14.0-0.rc4.35.el9.1.x86_64.

Comment 15 Hangbin Liu 2021-08-20 01:10:05 UTC
Thanks for the confirmation. I will close this bug.

