Bug 1762031 - kernel NULL pointer dereference when charger is unplugged
Summary: kernel NULL pointer dereference when charger is unplugged
Keywords:
Status: CLOSED DUPLICATE of bug 1785972
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-15 21:44 UTC by vincent.datrier
Modified: 2020-03-14 14:35 UTC (History)
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-02 08:42:51 UTC
Type: Bug
Embargoed:


Attachments
lshw output (sanitized) + dmesg (94.35 KB, text/plain)
2019-10-15 21:44 UTC, vincent.datrier
Dmesg with rawhide kernel (140.46 KB, text/plain)
2019-10-18 08:22 UTC, vincent.datrier

Description vincent.datrier 2019-10-15 21:44:31 UTC
Created attachment 1626197
lshw output (sanitized) + dmesg

1. Please describe the problem:

Every time I unplug my charger, I get an oops about a NULL pointer dereference. The computer then slows to a crawl until it freezes completely. ABRT tries to generate a report, but never gets to write any data. I installed kdump and forced the panic-on-oops setting to capture as much data as possible. I initially thought that tlp was the culprit, but deactivating it made no difference. Then, while reading dmesg during troubleshooting, I noticed that lockdown was blocking direct writes to registers. I disabled Secure Boot, but the problem is still there.


2. What is the Version-Release number of the kernel:
5.3.4-300.fc31.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

I only tried Fedora 30 and 31; both have the issue. Interestingly, I used Manjaro before and never had this specific problem.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Boot on AC power, unplug the charger. ``dmesg -w`` shows the oops the second I unplug the charger.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

I cannot download it; dnf complains about keys for rawhide not matching the repo (signing with Fedora 32 keys).


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

I had enabled Intel's GuC firmware loading, but disabled it again because I thought the issue had appeared after that change. With the kernel completely untainted, the issue remains.


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.


Here's the oops:

[ 1110.200260] BUG: kernel NULL pointer dereference, address: 0000000000000080
[ 1110.200268] #PF: supervisor read access in kernel mode
[ 1110.200271] #PF: error_code(0x0000) - not-present page
[ 1110.200274] PGD 0 P4D 0 
[ 1110.200281] Oops: 0000 [#1] SMP PTI
[ 1110.200288] CPU: 0 PID: 12998 Comm: kworker/0:0 Kdump: loaded Not tainted 5.3.4-300.fc31.x86_64 #1
[ 1110.200291] Hardware name: ASUSTeK COMPUTER INC. UX370UAR/UX370UAR, BIOS UX370UAR.310 04/17/2019
[ 1110.200303] Workqueue: events ucsi_connector_change [typec_ucsi]
[ 1110.200312] RIP: 0010:ucsi_displayport_remove_partner+0xa/0x20 [typec_ucsi]
[ 1110.200318] Code: 38 00 c7 43 28 00 00 00 00 48 83 c7 10 5b e9 2d 9b 01 d3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 48 85 ff 74 0f <48> 8b 47 78 48 c7 00 00 00 00 00 c6 40 3d 00 c3 66 0f 1f 44 00 00
[ 1110.200322] RSP: 0018:ffff9e2d84ec7df8 EFLAGS: 00010202
[ 1110.200326] RAX: 0000000000000008 RBX: ffff90fe43b70170 RCX: 00000000000056d3
[ 1110.200329] RDX: 00000000000056d2 RSI: 0000000000000001 RDI: 0000000000000008
[ 1110.200332] RBP: 0000000000000000 R08: ffffffff94528880 R09: ffff9e2d84ec7cf0
[ 1110.200334] R10: ffff90fecc3be1ff R11: 0000000000000000 R12: ffff90fe43b70170
[ 1110.200337] R13: 0000000000000001 R14: ffff90fe43b702c0 R15: ffff90fe43b70038
[ 1110.200341] FS:  0000000000000000(0000) GS:ffff90fe4ea00000(0000) knlGS:0000000000000000
[ 1110.200344] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1110.200348] CR2: 0000000000000080 CR3: 000000038e40a001 CR4: 00000000003606f0
[ 1110.200350] Call Trace:
[ 1110.200361]  ucsi_unregister_altmodes+0x7b/0x90 [typec_ucsi]
[ 1110.200370]  ucsi_unregister_partner.part.0+0x13/0x30 [typec_ucsi]
[ 1110.200377]  ucsi_connector_change+0x247/0x340 [typec_ucsi]
[ 1110.200389]  process_one_work+0x19d/0x340
[ 1110.200397]  worker_thread+0x50/0x3b0
[ 1110.200404]  kthread+0xfb/0x130
[ 1110.200411]  ? process_one_work+0x340/0x340
[ 1110.200416]  ? kthread_park+0x80/0x80
[ 1110.200425]  ret_from_fork+0x35/0x40
[ 1110.200431] Modules linked in: squashfs zstd_decompress loop rfcomm fuse ccm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables cmac ip6table_filter ip6_tables iptable_filter bnep sunrpc vfat fat uvcvideo btusb btrtl btbcm btintel videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 bluetooth videobuf2_common videodev mc ecdh_generic ecc snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc joydev snd_soc_sst_ipc iwlmvm snd_hda_codec_hdmi snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi x86_pkg_temp_thermal intel_powerclamp coretemp mac80211 snd_soc_core snd_hda_codec_realtek
[ 1110.200494]  snd_hda_codec_generic ledtrig_audio snd_compress spi_pxa2xx_platform kvm_intel ac97_bus mei_hdcp dw_dmac iTCO_wdt hid_multitouch snd_pcm_dmaengine snd_hda_intel typec_displayport iTCO_vendor_support libarc4 intel_rapl_msr snd_hda_codec gpio_keys kvm iwlwifi snd_hda_core snd_hwdep irqbypass snd_seq intel_cstate intel_uncore snd_seq_device snd_pcm intel_rapl_perf cfg80211 asus_nb_wmi hid_sensor_accel_3d snd_timer asus_wmi sparse_keymap wmi_bmof hid_sensor_trigger snd hid_sensor_iio_common industrialio_triggered_buffer mei_me kfifo_buf soundcore intel_xhci_usb_role_switch rfkill i2c_i801 mei industrialio roles idma64 processor_thermal_device ucsi_acpi typec_ucsi intel_rapl_common cros_ec_ishtp intel_lpss_pci cros_ec_core intel_lpss intel_soc_dts_iosf intel_pch_thermal typec int3403_thermal int340x_thermal_zone soc_button_array int3400_thermal asus_wireless acpi_pad acpi_thermal_rel binfmt_misc ip_tables xfs libcrc32c dm_crypt hid_sensor_hub intel_ishtp_loader intel_ishtp_hid i915
[ 1110.200553]  i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm nvme ghash_clmulni_intel serio_raw nvme_core intel_ish_ipc intel_ishtp wmi i2c_hid video pinctrl_sunrisepoint pinctrl_intel
[ 1110.200575] CR2: 0000000000000080

I attached my lshw output and vmcore-dmesg.
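
The register dump in the oops appears to be consistent with the faulting address. A minimal sketch of the arithmetic follows, assuming the bytes marked in the Code: line (<48> 8b 47 78) decode to a 64-bit load from RDI + 0x78. That decoding is a reading of the dump and has not been checked against the driver source. The helper name reg is hypothetical, and the timestamps are dropped from the copied lines.

```python
import re

# Lines copied from the oops above (timestamps dropped for brevity).
OOPS = """\
BUG: kernel NULL pointer dereference, address: 0000000000000080
RIP: 0010:ucsi_displayport_remove_partner+0xa/0x20 [typec_ucsi]
RDX: 00000000000056d2 RSI: 0000000000000001 RDI: 0000000000000008
CR2: 0000000000000080 CR3: 000000038e40a001 CR4: 00000000003606f0
"""


def reg(name, text):
    """Return the value of a register such as RDI or CR2 as an int."""
    match = re.search(rf"\b{name}: ([0-9a-f]{{16}})", text)
    if match is None:
        raise ValueError(f"register {name} not found")
    return int(match.group(1), 16)


# Assumption: the faulting instruction reads from RDI + 0x78
# (disp8 = 0x78 in the marked Code: bytes).
DISPLACEMENT = 0x78

rdi = reg("RDI", OOPS)
cr2 = reg("CR2", OOPS)
fault_address = rdi + DISPLACEMENT

print(f"RDI={rdi:#x} fault={fault_address:#x} CR2={cr2:#x}")
```

Under that assumption, RDI + 0x78 equals the address in CR2. Because RDI holds 0x8 rather than 0, a plain NULL check on the pointer would not catch it, which would explain why the function still dereferences it.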

Comment 1 vincent.datrier 2019-10-15 21:46:54 UTC
In addition, cold booting on battery power and then plugging in the charger generates the "asus_wmi: Unknown key cf pressed" message. I remember seeing a different "key" once while quickly plugging and unplugging the charger, but I don't remember which one.

Comment 2 vincent.datrier 2019-10-18 08:22:00 UTC
Created attachment 1627143
Dmesg with rawhide kernel

I managed to install the Rawhide kernel, and got some more output from the bug. Please find a new dmesg attached.

Comment 3 Andrea Gagliardi La Gala 2020-01-28 10:27:01 UTC
Is there anyone looking at this? I am experiencing the exact same issue on my ASUS UX391UA laptop.

Comment 4 Hans de Goede 2020-01-28 10:36:22 UTC
This seems to be related to bug 1745924. As I mentioned there I have been talking to the upstream ucsi driver maintainer (Heikki Krogerus) and he would like for someone who is having this problem to file a bug at bugzilla.kernel.org and work through the issue with him there.

Please go to https://bugzilla.kernel.org/enter_bug.cgi?product=Drivers and choose USB as the component, so that he can work directly with you to debug this. Please post a link to the new bug here, then I can assign the kernel.org bug to him.

Comment 5 Andrea Gagliardi La Gala 2020-01-31 08:25:33 UTC
Thanks Hans. I have submitted the bug: https://bugzilla.kernel.org/show_bug.cgi?id=206365

Comment 6 Hans de Goede 2020-02-02 08:42:51 UTC
Thank you, I've assigned the bug to Heikki and also sent him an email about it. Let's continue this in the bugzilla.kernel.org bugzilla.

Comment 7 Hans de Goede 2020-02-14 08:40:35 UTC
Ping?

Many people seem to be hitting this; also see bug 1762031, bug 1785972, bug 1798810 and bug 1800913. Can someone seeing this issue please provide the information requested in the upstream bug to help debug this?

https://bugzilla.kernel.org/show_bug.cgi?id=206365#c5

Comment 8 Hans de Goede 2020-02-14 09:41:16 UTC
After checking the upstream bug one more time, I noticed that yesterday Heikki provided a patch to test. I've started a test/scratch build of a Fedora kernel with that patch added:
https://koji.fedoraproject.org/koji/taskinfo?taskID=41491485
(note: still building at the moment; this takes a couple of hours)

See here for generic instructions for installing a kernel directly from koji:
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

If you can reproduce the bug, e.g. by unplugging your charger, then please give this new kernel a try and let us know if it fixes things. If this new kernel does not fix things, please collect the debugging info described here:
https://bugzilla.kernel.org/show_bug.cgi?id=206365#c5

Comment 9 Hans de Goede 2020-02-15 15:00:34 UTC
The kernel test build is ready for downloading, please give it a try.

Comment 10 Hans de Goede 2020-02-21 18:47:06 UTC
Upstream has provided a second patch which should fix this. I've done a scratch-build of a Fedora kernel with that patch added:
https://koji.fedoraproject.org/koji/taskinfo?taskID=41491485

Note the build has already finished, so you can get it right away. See here for generic instructions for installing a kernel directly from koji:
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

If you can reproduce the bug, e.g. by unplugging your charger, then please give this new kernel a try and let us know if it fixes things. If this new kernel does not fix things, please collect the debugging info described here:
https://bugzilla.kernel.org/show_bug.cgi?id=206365#c5

Comment 11 Hans de Goede 2020-02-23 21:10:11 UTC
I accidentally posted the link to the old, first test build in my last comment. 

The correct link for the new build is:
https://koji.fedoraproject.org/koji/taskinfo?taskID=41750652

Comment 12 Hans de Goede 2020-03-14 14:35:18 UTC
2 patches which should fix this have been posted upstream.

These 2 patches are on their way to the mainline kernel. I've added them as downstream patches to the Fedora kernels for F30 and F31 for now.

These 2 patches will be in the next official Fedora kernel build for F30 and F31, which will be either 5.5.9-201.fc31 or 5.5.10.

There are multiple bugs open for this issue. Since these 2 patches should fix all cases, I'm marking all the bugs as duplicates of bug 1785972, where most of the discussion on this has happened.

Please give the next official kernel builds a try once they hit updates-testing, and add a note to bug 1785972 to say whether or not they resolve things for you.

*** This bug has been marked as a duplicate of bug 1785972 ***

