Bug 1410661

Summary: skbuff: skb_over_panic
Product: [Fedora] Fedora
Reporter: Zac Durham <zacdurham>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: rawhide
CC: cz172638, gansalmon, ichavero, itamar, jonathan, kernel-maint, labbott, madhu.chinakonda, mchehab, mercy.speax
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-06 18:33:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
combined dmesg and lspci output (flags: none)

Description Zac Durham 2017-01-06 03:16:54 UTC
Created attachment 1237851
combined dmesg and lspci output

Description of problem:

I've recently observed crashes that render the wireless stack on the Surface Pro 3 utterly unusable. I've observed this not only on rawhide but also on f25 and several other distros using kernels 4.8 and later. The wireless hardware in the SP3 (same in the SP4, I believe) has never been stable under Linux, to my knowledge or experience. I submit this to the Fedora team in hopes some improvement can come to Linux users on the Surface Pro platform.

How reproducible:

Extremely. It doesn't seem to take much throughput on the interface to trigger the panic. I've seen it as soon as the desktop loads and (ostensibly) checks for available updates. Any appreciable yum or web browsing activity will surely trigger it. I've tried taking NetworkManager and similar utilities out of the equation, going down to bare essentials like wpa_supplicant, with the same result each time.


Steps to Reproduce:
1. Boot system (rawhide, f25 workstation)
2. Connect to a 2.4 or 5 GHz access point (N, AC)
3. Perform any amount of network activity and wait

Actual results:

Inevitably the following shows up in dmesg, and the wireless device becomes completely unusable.

[  148.409668] skbuff: skb_over_panic: text:ffffffffc0adb620 len:3570 put:500 head:ffff8f97b8cbf000 data:ffff8f97b8cbf0e4 tail:0xed6 end:0xec0 dev:<NULL>
[  148.409710] ------------[ cut here ]------------
[  148.409740] kernel BUG at net/core/skbuff.c:105!
[  148.409763] invalid opcode: 0000 [#1] SMP
[  148.409783] Modules linked in: uas usb_storage rfcomm fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep appletalk ax25 btusb ipx p8023 btrtl psnap btbcm p8022 btintel llc bluetooth vfat fat intel_rapl x86_pkg_temp_thermal hid_sensor_magn_3d intel_powerclamp hid_sensor_incl_3d hid_sensor_gyro_3d coretemp hid_sensor_als hid_sensor_rotation hid_sensor_accel_3d
[  148.410160]  kvm_intel hid_sensor_trigger kvm hid_sensor_iio_common industrialio_triggered_buffer irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_hdmi ghash_clmulni_intel snd_hda_codec_generic intel_cstate snd_soc_rt5640 snd_soc_rl6231 snd_soc_core hid_sensor_hub iTCO_wdt iTCO_vendor_support hid_multitouch snd_compress snd_pcm_dmaengine elan_i2c ac97_bus intel_uncore intel_rapl_perf mwifiex_pcie mwifiex snd_hda_intel snd_hda_codec uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_hda_core videobuf2_core snd_hwdep cfg80211 videodev joydev snd_seq snd_seq_device media snd_pcm rfkill i2c_i801 mei_me snd_timer lpc_ich snd mei shpchp soundcore surface3_button surfacepro3_button soc_button_array tpm_crb snd_soc_sst_acpi snd_soc_sst_match dw_dmac acpi_als i2c_designware_platform
[  148.410514]  kfifo_buf industrialio i2c_designware_core tpm_tis tpm_tis_core spi_pxa2xx_platform tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c i915 i2c_algo_bit drm_kms_helper drm crc32c_intel sdhci_acpi sdhci mmc_core video fjes i2c_hid hid_microsoft
[  148.410652] CPU: 3 PID: 897 Comm: kworker/u9:3 Tainted: G        W       4.10.0-0.rc0.git9.1.fc26.x86_64 #1
[  148.410703] Hardware name: Microsoft Corporation Surface Pro 3/Surface Pro 3, BIOS 3.11.1550 06/30/2016
[  148.410768] Workqueue: MWIFIEX_WORK_QUEUE mwifiex_main_work_queue [mwifiex]
[  148.410801] task: ffff8f9833120000 task.stack: ffffb1ca01af8000
[  148.410831] RIP: 0010:skb_panic+0x64/0x70
[  148.410848] RSP: 0018:ffffb1ca01afbc98 EFLAGS: 00010282
[  148.410884] RAX: 000000000000008a RBX: ffff8f9841f80000 RCX: 0000000000000000
[  148.410917] RDX: 0000000000000000 RSI: ffff8f98473ce388 RDI: ffff8f98473ce388
[  148.410946] RBP: ffffb1ca01afbcb8 R08: 0000000000000001 R09: 0000000000000001
[  148.410978] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f97b8c440fc
[  148.411014] R13: ffff8f97e10650e8 R14: 00000000000001f4 R15: ffff8f97c2132400
[  148.411045] FS:  0000000000000000(0000) GS:ffff8f9847200000(0000) knlGS:0000000000000000
[  148.411084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  148.411110] CR2: 00003db385b0a070 CR3: 000000015fe0c000 CR4: 00000000001406e0
[  148.411140] Call Trace:
[  148.411156]  skb_put+0x4d/0x50
[  148.411189]  mwifiex_11n_aggregate_pkt+0x1f0/0x690 [mwifiex]
[  148.411222]  mwifiex_wmm_process_tx+0x496/0x9a0 [mwifiex]
[  148.411251]  mwifiex_main_process+0x604/0x8f0 [mwifiex]
[  148.411278]  mwifiex_main_work_queue+0x1f/0x30 [mwifiex]
[  148.411306]  process_one_work+0x251/0x700
[  148.411327]  ? process_one_work+0x1cd/0x700
[  148.411347]  worker_thread+0x4e/0x4a0
[  148.411359]  kthread+0x10f/0x150
[  148.411370]  ? process_one_work+0x700/0x700
[  148.411383]  ? kthread_create_on_node+0x60/0x60
[  148.411397]  ret_from_fork+0x2a/0x40
[  148.411408] Code: cc 00 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 d8 82 d2 9f 48 89 04 24 e8 3f d9 a3 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e8 f7 75 cd ff 
[  148.411480] RIP: skb_panic+0x64/0x70 RSP: ffffb1ca01afbc98
[  148.423008] ---[ end trace ce7a32c51e9c2822 ]---
[  162.156359] mwifiex_pcie 0000:01:00.0: cmd_wait_q terminated: -110
[  162.156369] mwifiex_pcie 0000:01:00.0: failed to get signal information
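
For anyone reading the first line of the trace, here is a rough Python sketch that pulls the numbers out of the skb_over_panic message and checks them against each other. It is only an illustration: the function name analyze_skb_over_panic and the regular expression are made up for this note, and it assumes the usual 64-bit layout where tail and end are printed as offsets from head and len already includes the bytes being put.

```python
import re

# Hypothetical helper: matches the fields printed in the skb_over_panic line.
PANIC_RE = re.compile(
    r"skb_over_panic:.*?len:(?P<len>\d+) put:(?P<put>\d+) "
    r"head:(?P<head>[0-9a-f]+) data:(?P<data>[0-9a-f]+) "
    r"tail:0x(?P<tail>[0-9a-f]+) end:0x(?P<end>[0-9a-f]+)"
)

# The panic line exactly as it appears in the dmesg output above.
LINE = (
    "[  148.409668] skbuff: skb_over_panic: text:ffffffffc0adb620 len:3570 "
    "put:500 head:ffff8f97b8cbf000 data:ffff8f97b8cbf0e4 tail:0xed6 "
    "end:0xec0 dev:<NULL>"
)


def analyze_skb_over_panic(line):
    """Return the derived buffer figures, or None if the line does not match."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    length = int(m["len"])
    put = int(m["put"])
    head = int(m["head"], 16)
    data = int(m["data"], 16)
    tail = int(m["tail"], 16)  # assumed: offset from head
    end = int(m["end"], 16)    # assumed: offset from head
    headroom = data - head
    return {
        "headroom": headroom,
        "len_before_put": length - put,
        # Space that was left in the buffer before the failing put.
        "tailroom_before_put": end - (tail - put),
        # How far past the end of the buffer the put would have written.
        "overrun": tail - end,
        # Sanity check: data offset plus len should land exactly on tail.
        "consistent": headroom + length == tail,
    }


if __name__ == "__main__":
    for key, value in analyze_skb_over_panic(LINE).items():
        print(f"{key}: {value}")
```

For the line above this reports a 22-byte overrun: a 500-byte put was attempted with only 478 bytes of tailroom left in the buffer, and the call trace shows that put coming from mwifiex_11n_aggregate_pkt.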


Expected results:

For this not to happen, obviously. Stable and consistent wireless hardware support.

Additional info:

Attached are the relevant dmesg snippet and lspci -nn output. I have an sosreport should anyone want it through appropriate channels.

Thank you all,

Zac

Comment 1 Laura Abbott 2018-04-06 18:33:55 UTC
Doing some pruning, this bug looks to be several kernel versions old. Please test on a newer kernel and reopen if the problem still exists.