Bug 1745474

Summary: RHEL8: kernel-rt: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Product: Red Hat Enterprise Linux 8 Reporter: Chunyu Hu <chuhu>
Component: kernel-rt Assignee: Red Hat Real Time Maintenance <rt-maint>
kernel-rt sub component: NIC Drivers QA Contact: Ma Yuying <yuma>
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: bhu, jlelli, network-qe, qzhao, williams
Version: 8.1   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-27 03:02:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1680412    

Description Chunyu Hu 2019-08-26 08:04:55 UTC
Description of problem:

The kernel panics in i40e during reboot on host:
lenovo-sr950-01.lab.eng.pek2.redhat.com (6T memory).
https://beaker.engineering.redhat.com/view/lenovo-sr950-01.lab.eng.pek2.redhat.com#details

[  256.908991] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 
[  256.908992] PGD 0 P4D 0  
[  256.908996] Oops: 0000 [#1] PREEMPT SMP PTI 
[  256.908997] CPU: 217 PID: 43923 Comm: reboot Not tainted 4.18.0-135.rt24.80.el8.x86_64 #1 
[  256.908998] Hardware name: Lenovo ThinkSystem SR950 -[7X12CTO1WW]-/-[7X12CTO1WW]-, BIOS -[PSE118M-1.41]- 10/30/2018 
[  256.909009] RIP: 0010:__kthread_cancel_work_sync+0x14/0xd0 
[  256.909010] Code: 08 eb b0 e8 6e a5 fd ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 45 31 ed 41 54 55 53 48 83 ec 10 <48> 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 48 85 
[  256.909011] RSP: 0018:ffffb15559253c90 EFLAGS: 00010286 
[  256.909013] RAX: 0000000000000080 RBX: 0000000000000000 RCX: 0000000000000000 
[  256.909014] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000008 
[  256.909014] RBP: ffff94e6b488d000 R08: ffff94e6badc42f8 R09: ffff94e6badc4228 
[  256.909015] R10: 0000000000000000 R11: 0000000000000000 R12: ffff94e6b3999d58 
[  256.909016] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000155 
[  256.909017] FS:  00007fc4215ad480(0000) GS:ffff98a6bfa40000(0000) knlGS:0000000000000000 
[  256.909018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  256.909019] CR2: 0000000000000020 CR3: 0000047d35eb4005 CR4: 00000000007606e0 
[  256.909019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[  256.909020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
[  256.909020] PKRU: 55555554 
[  256.909021] Call Trace: 
[  256.909031]  ? delay_tsc+0xb3/0x100 
[  256.909038]  irq_set_affinity_notifier+0x8b/0xc0 
[  256.909055]  i40e_vsi_free_irq+0xb9/0x210 [i40e] 
[  256.909061]  i40e_vsi_close+0x25/0x80 [i40e] 
[  256.909067]  i40e_vsi_release+0x24e/0x2d0 [i40e] 
[  256.909073]  i40e_shutdown+0x49/0x120 [i40e] 
[  256.909081]  pci_device_shutdown+0x34/0x60 
[  256.909087]  device_shutdown+0x15a/0x210 
[  256.909092]  kernel_restart+0xe/0x30 
[  256.909095]  __do_sys_reboot+0x1d2/0x210 
[  256.909099]  ? vfs_writev+0xc5/0x100 
[  256.909103]  ? __audit_syscall_entry+0xd7/0x160 
[  256.909107]  ? syscall_trace_enter+0x1fb/0x300 
[  256.909109]  ? __audit_syscall_exit+0x228/0x290 
[  256.909111]  do_syscall_64+0x5b/0x1b0 
[  256.909120]  entry_SYSCALL_64_after_hwframe+0x65/0xca 
[  256.909122] RIP: 0033:0x7fc420803af7 
[  256.909123] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 61 93 2c 00 f7 d8 64 89 02 b8 
[  256.909124] RSP: 002b:00007ffc67518c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9 
[  256.909126] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc420803af7 
[  256.909127] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead 
[  256.909127] RBP: 00007ffc67518ca0 R08: 0000000000000002 R09: 0000000000000000 
[  256.909128] R10: 000000000000004b R11: 0000000000000246 R12: 0000000000000001 
[  256.909129] R13: 00000000fffffffe R14: 0000000000000006 R15: 0000000000000000 
[  256.909130] Modules linked in: nfsv4 dns_resolver nfs lockd grace fscache sunrpc iTCO_wdt iTCO_vendor_support intel_rapl skx_edac nfit libnvdimm intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate cdc_ether intel_uncore intel_rapl_perf usbnet pcspkr mii ipmi_ssif sg mei_me ipmi_si mei i2c_i801 lpc_ich ipmi_devintf ipmi_msghandler wmi ioatdma dca acpi_pad acpi_power_meter binfmt_misc xfs libcrc32c sd_mod crc32c_intel mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm i40e drm ahci libahci megaraid_sas libata dm_mirror dm_region_hash dm_log dm_mod 
[  256.909173] CR2: 0000000000000020 
[  257.307008] RIP: 0010:__kthread_cancel_work_sync+0x14/0xd0 
[  257.307010] Code: 08 eb b0 e8 6e a5 fd ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 45 31 ed 41 54 55 53 48 83 ec 10 <48> 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 48 85 
[  257.307012] RSP: 0018:ffffb15559253c90 EFLAGS: 00010286 
[  257.307015] RAX: 0000000000000080 RBX: 0000000000000000 RCX: 0000000000000000 
[  257.307017] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000008 
[  257.307019] RBP: ffff94e6b488d000 R08: ffff94e6badc42f8 R09: ffff94e6badc4228 
[  257.307020] R10: 0000000000000000 R11: 0000000000000000 R12: ffff94e6b3999d58 
[  257.307022] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000155 
[  257.307023] FS:  00007fc4215ad480(0000) GS:ffff98a6bfa40000(0000) knlGS:0000000000000000 
[  257.307025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  257.307026] CR2: 0000000000000020 CR3: 0000047d35eb4005 CR4: 00000000007606e0 
[  257.307027] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[  257.307027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
[  257.307029] PKRU: 55555554 
[  257.307030] Kernel panic - not syncing: Fatal exception 
[  257.388818] Kernel Offset: 0x1e800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) 
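
Note on reading the oops above: the faulting instruction is the one marked <48> in the Code: line, and its bytes (48 8b 6f 18) can be decoded by hand to check them against RDI and CR2. The Python sketch below does this for that single instruction form only. The helper name decode_mov_load is made up for illustration and is not part of any tool. The result shows an 8-byte load from RDI + 0x18, and with RDI = 0x8 that gives the reported address 0x20. This suggests the pointer passed to __kthread_cancel_work_sync was a small offset from NULL, but that is an inference from the dump, not a confirmed root cause.

```python
# Register names indexed by the 3-bit ModRM fields.
# Assumption: no REX.R / REX.B extension bits, which holds for REX prefix 0x48.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(insn: bytes):
    """Decode 'REX.W 8B /r' with an 8-bit displacement: mov disp8(%base),%dest.

    Hypothetical helper: handles only this one encoding, nothing else.
    """
    rex, opcode, modrm, disp8 = insn
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("only REX.W + MOV r64, r/m64 is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    # The 8-bit displacement is sign-extended by the CPU.
    disp = disp8 - 256 if disp8 >= 0x80 else disp8
    return REGS[reg], REGS[rm], disp


# Bytes starting at the <48> marker in the Code: line of the oops.
faulting = bytes.fromhex("488b6f18")
dest, base, disp = decode_mov_load(faulting)

# RDI value copied from the register dump.
rdi = 0x0000000000000008
fault_addr = rdi + disp

print(f"mov {disp:#x}(%{base}),%{dest}")
print(f"fault address = {fault_addr:#018x}")
```

The computed address can be compared with the CR2 value in the dump (0000000000000020).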

Version-Release number of selected component (if applicable):
4.18.0-135.rt24.80.el8.x86_64 

How reproducible:
always

Steps to Reproduce:
1. https://beaker.engineering.redhat.com/recipes/7274048#task98204289

Actual results:
kernel panic in i40e

Expected results:
no kernel panic in i40e for kernel-rt.

Additional info:
No panic in the non-rt variant.