Bug 1958013
| Summary: | Wireguard: kernel NULL pointer dereference when remove module | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Hangbin Liu <haliu> |
| Component: | kernel | Assignee: | Hangbin Liu <haliu> |
| kernel sub component: | Networking | QA Contact: | xmu |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | atragler, herbert.xu, jiji, kzhang, ssorce, sukulkar, xmu |
| Version: | 9.0 | Keywords: | Reopened |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-08-20 01:10:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Hangbin Liu
2021-05-07 02:22:25 UTC
[root@wsfd-netdev21 ~]# uname -r
5.12.0-1.el9.x86_64
[root@wsfd-netdev21 ~]# modprobe wireguard
[root@wsfd-netdev21 ~]# lsmod | grep wireguard
wireguard 98304 0
libchacha20poly1305 16384 1 wireguard
libblake2s 16384 1 wireguard
ip6_udp_tunnel 16384 1 wireguard
udp_tunnel 24576 1 wireguard
curve25519_x86_64 36864 1 wireguard
libcurve25519_generic 49152 2 curve25519_x86_64,wireguard

The kernel crashed in crypto_unregister_alg(). I can't reproduce it on upstream 5.12.0-rc4. I will try 5.12 later.

cc Herbert and Simo

Tested with upstream 5.12.0, no crash. But I built the crypto code as modules, so I'm not sure whether the crash is due to built-in crypto.

Tested with 5.13.0-0.rc2.19.el9.x86_64, no panic now.

The same call trace was found on kernel 5.13.0-0.rc2.19.el9.x86_64. Reopening.

Updated reproducer: set up a wg VPN, run traffic, and then remove the wireguard module.

Node 1: 10.20.20.4
Node 2: 10.20.20.3

node 1:
# modprobe wireguard
# umask 077
# wg genkey >private_appserver1
# ip link add dev wg0 type wireguard
# ip addr add 192.168.10.1/24 dev wg0
# wg set wg0 private-key ./private_appserver1
# ip link set wg0 up
# wg
# wg set wg0 peer MDaeWgZVULXP4gvOj4UmN7bW/uniQeBionqJyzEzSC0= allowed-ips 192.168.10.0/24 endpoint 10.20.20.3:54371
# ping 192.168.10.2 -c 4
# modprobe -r wireguard

node 2:
# umask 077
# wg genkey >private_dbserver1
# wg pubkey < private_dbserver1
# ip link add dev wg0 type wireguard
# ip addr add 192.168.10.2/24 dev wg0
# wg set wg0 private-key ./private_dbserver1
# ip link set wg0 up
# wg
# wg set wg0 peer 6yNLmpkbfsL2ijx7z996ZHl2bNFz9Psp9V6BhoHjvmk= allowed-ips 192.168.10.0/24 endpoint 10.20.20.4:42930
# ping 192.168.10.1 -c 4
# modprobe -r wireguard

[420346.399267] BUG: kernel NULL pointer dereference, address: 0000000000000000
[420346.407138] #PF: supervisor read access in kernel mode
[420346.412965] #PF: error_code(0x0000) - not-present page
[420346.418794] PGD 0 P4D 0
[420346.421716] Oops: 0000 [#1] SMP PTI
[420346.425704] CPU: 4 PID: 49375 Comm: modprobe Tainted: G OE X --------- --- 5.13.0-0.rc2.19.el9.x86_64 #1
[420346.437544] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016
[420346.445989] RIP: 0010:__list_del_entry_valid+0x2d/0x50
[420346.451824] Code: 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 0f 84 35 23 50 00 48 b8 22 01 00 00 00 00 ad de 49 39 c0 0f 84 55 23 50 00 <49> 8b 30 48 39 fe 0f 85 35 23 50 00 48 8b 52 08 48 39 f2 0f 85 1a
[420346.472874] RSP: 0018:ffff9f9287807eb0 EFLAGS: 00010217
[420346.478798] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000000
[420346.486857] RDX: 0000000000000000 RSI: ffff9f9287807ee0 RDI: ffffffffc06b4038
[420346.494915] RBP: ffffffffc06b4038 R08: 0000000000000000 R09: 0000000000000000
[420346.502972] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f9287807f58
[420346.511029] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[420346.519086] FS: 00007fd8a60f1b80(0000) GS:ffff8e699fa80000(0000) knlGS:0000000000000000
[420346.528212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420346.534719] CR2: 0000000000000000 CR3: 000000017014a004 CR4: 00000000001706e0
[420346.542777] Call Trace:
[420346.545601] crypto_unregister_alg+0x47/0xe0
[420346.550465] __do_sys_delete_module.constprop.0+0x174/0x280
[420346.556782] do_syscall_64+0x40/0x80
[420346.560867] entry_SYSCALL_64_after_hwframe+0x44/0xae
[420346.566601] RIP: 0033:0x7fd8a621e18b
[420346.570683] Code: 73 01 c3 48 8b 0d e5 1c 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 1c 0c 00 f7 d8 64 89 01 48
[420346.591734] RSP: 002b:00007ffdf4fd6d08 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[420346.600277] RAX: ffffffffffffffda RBX: 0000555c3d1e7bd0 RCX: 00007fd8a621e18b
[420346.608335] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000555c3d1e7c38
[420346.616392] RBP: 0000555c3d1e7bd0 R08: 0000000000000000 R09: 0000000000000000
[420346.624450] R10: 00007fd8a6291ac0 R11: 0000000000000206 R12: 0000555c3d1e7c38
[420346.632507] R13: 0000000000000000 R14: 0000555c3d1e4ed8 R15: 00007ffdf4fd9048
[420346.640566] Modules linked in: libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 libblake2s_generic ip6_udp_tunnel udp_tunnel curve25519_x86_64(-) libcurve25519_generic libchacha binfmt_misc mlx5_ib ib_uverbs mlx5_core psample mlxfw ib_core tls pci_hyperv_intf rfkill sunrpc iTCO_wdt iTCO_vendor_support intel_rapl_msr dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate mgag200 i2c_algo_bit intel_uncore drm_kms_helper pcspkr syscopyarea sysfillrect sysimgblt mxm_wmi fb_sys_fops cec ipmi_ssif lpc_ich mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter fuse drm ip_tables xfs libcrc32c sr_mod cdrom sd_mod t10_pi ahci libahci crct10dif_pclmul crc32_pclmul i40e crc32c_intel libata ghash_clmulni_intel megaraid_sas tg3 wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: wireguard]
[420346.728274] CR2: 0000000000000000
[420346.732091] ---[ end trace 6d688b1d5861dbd0 ]---
[420346.754602] RIP: 0010:__list_del_entry_valid+0x2d/0x50
[420346.760446] Code: 4c 8b 47 08 48 b8 00 01 00 00 00 00 ad de 48 39 c2 0f 84 35 23 50 00 48 b8 22 01 00 00 00 00 ad de 49 39 c0 0f 84 55 23 50 00 <49> 8b 30 48 39 fe 0f 85 35 23 50 00 48 8b 52 08 48 39 f2 0f 85 1a
[420346.781501] RSP: 0018:ffff9f9287807eb0 EFLAGS: 00010217
[420346.787431] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000000
[420346.795496] RDX: 0000000000000000 RSI: ffff9f9287807ee0 RDI: ffffffffc06b4038
[420346.803559] RBP: ffffffffc06b4038 R08: 0000000000000000 R09: 0000000000000000
[420346.811621] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f9287807f58
[420346.819683] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[420346.827744] FS: 00007fd8a60f1b80(0000) GS:ffff8e699fa80000(0000) knlGS:0000000000000000
[420346.836873] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420346.843385] CR2: 0000000000000000 CR3: 000000017014a004 CR4: 00000000001706e0
[420346.851449] Kernel panic - not syncing: Fatal exception
[420346.857388] Kernel Offset: 0x2ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[420346.888250] ---[ end Kernel panic - not syncing: Fatal exception ]---

(In reply to xmu from comment #4)
> The same call trace was found on kernel 5.13.0-0.rc2.19.el9.x86_64. Reopening.
>
> Updated reproducer:
> set up a wg VPN, run traffic, and then remove the wireguard module.

Hmm, maybe the reason is that when there is traffic, we register the lib crypto, but unregistering it fails during modprobe -r wireguard. I will check it.

Thanks
Hangbin

Sounds like wireguard doesn't do module reference counting for the devices that it creates. You shouldn't be able to unload a module if there are network devices using its code. IOW, after you've created the wireguard device, lsmod should show an elevated reference count on wireguard and you shouldn't be able to do modprobe -r on wireguard at all.

I take that back. You should be able to remove wireguard even with devices in existence.

I retract my retraction :) Because wireguard is a virtual device, it does need to do module reference counting, because otherwise there is no way to ensure that the devices have been properly destroyed prior to the module unload. So this is something that needs to be fixed upstream by elevating the module reference count.

Herbert has applied the fix:
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=1b82435d177

Let's just wait for it to be backported to RHEL9.

Hi XiuMei,

This patch should have been merged into 5.13 and backported to the latest RHEL9. Would you help check it? If it's fixed, I will close this bug.

Thanks
Hangbin

Hmm, no, it looks like it hasn't been merged into 5.13. Please keep waiting for some time.

Hi Xiumei,

The fix should have been backported to the latest RHEL9. Would you please help verify it?

Hangbin,

The issue disappears on kernel 5.14.0-0.rc4.35.el9.1.x86_64.

Thanks for the confirmation. I will close this bug. |
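Editor's illustration: the trace above faults in __list_del_entry_valid, called from crypto_unregister_alg, at address 0 with RDX and R08 both zero. That is consistent with deleting a list node whose next/prev pointers were never set, for example when a module's exit path unregisters something its init path never registered. The user-space sketch below models that failure mode only. Every name in it (struct alg, demo_init, demo_exit_buggy, demo_exit_safe) is hypothetical, and it is not the kernel code or the referenced fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Global registry; an empty list points at itself. */
static struct list_head alg_list = { &alg_list, &alg_list };

/* Hypothetical stand-in for an algorithm descriptor. */
struct alg {
    const char *name;
    struct list_head entry; /* stays zeroed (NULL/NULL) until registered */
};

static void list_add(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Returns false where an unchecked list_del would dereference NULL. */
static bool list_del_checked(struct list_head *node)
{
    if (node->next == NULL || node->prev == NULL)
        return false; /* node was never linked into any list */
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = NULL;
    return true;
}

static int register_alg(struct alg *a)
{
    assert(a != NULL);
    list_add(&a->entry, &alg_list);
    return 0;
}

static int unregister_alg(struct alg *a)
{
    return list_del_checked(&a->entry) ? 0 : -1;
}

/* Hypothetical module whose registration is conditional. */
static struct alg demo_alg = { .name = "demo" };
static bool demo_registered;

static void demo_init(bool hw_supported)
{
    if (hw_supported && register_alg(&demo_alg) == 0)
        demo_registered = true;
}

/* Buggy teardown: unregisters even if init skipped registration. */
static int demo_exit_buggy(void)
{
    return unregister_alg(&demo_alg);
}

/* Safe teardown: undoes only what init actually did. */
static int demo_exit_safe(void)
{
    if (!demo_registered)
        return 0;
    demo_registered = false;
    return unregister_alg(&demo_alg);
}
```

In this model, calling demo_init(false) and then demo_exit_buggy() reports an error at the point where the kernel would oops, while demo_exit_safe() returns cleanly because teardown mirrors the condition used at init.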