Bug 1948468 - kernel NULL pointer dereference when using pinctrl_elkhartlake
Summary: kernel NULL pointer dereference when using pinctrl_elkhartlake
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 33
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-12 09:16 UTC by Gilles Buloz
Modified: 2022-08-26 12:16 UTC
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-22 15:21:31 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Full kernel log (295.59 KB, text/plain)
2021-04-12 09:16 UTC, Gilles Buloz
gzipped output of acpidump. Please note that this is a custom board with a custom BIOS based on an AMI BIOS (276.39 KB, application/gzip)
2022-08-26 12:16 UTC, Gilles Buloz


Links
GitHub thesofproject/linux issue 2828 (priority: None, status: closed): [BUG][EHL] Kernel boot failure on EHL with PINCTL enabled; last updated 2022-08-25 19:28:35 UTC
Linux Kernel 213365 (priority: P1, status: RESOLVED): Elkhart Lake: kernel NULL pointer dereference in intel_pinctrl_get_soc_data; last updated 2022-08-25 18:42:37 UTC

Description Gilles Buloz 2021-04-12 09:16:30 UTC
Created attachment 1771318
Full kernel log

Description of problem:
When booting an Elkhart Lake CPU system, I get "BUG: kernel NULL pointer dereference, address: 0000000000000000". This makes systemd-udevd hang on device INTC1020:00 for several minutes, leading to a very long boot time.

Workaround: blacklist the pinctrl_elkhartlake module with the following kernel command-line parameter:
module_blacklist=pinctrl_elkhartlake
Note: this module is included in my initrd image.
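For reference, `module_blacklist=` takes a comma-separated list of module names. The Python sketch below is a simplified, hypothetical checker (`blacklisted_modules` and `is_blacklisted` are illustrative names, not part of any kernel or Fedora tool) that tells whether a given module would be skipped. It assumes a whitespace-separated command line without quoted values, and that the last `module_blacklist=` occurrence wins; on a running system the input string could be read from `/proc/cmdline`.

```python
from __future__ import annotations


def blacklisted_modules(cmdline: str) -> list[str]:
    """Return the module names given to module_blacklist= on a kernel command line.

    Simplifying assumptions (not a full reimplementation of the kernel's parser):
    - parameters are split on whitespace; quoted values are not handled
    - if module_blacklist= appears more than once, the last occurrence wins
    """
    value = ""
    for param in cmdline.split():
        key, sep, val = param.partition("=")
        # The kernel treats '-' and '_' as equivalent in parameter names.
        if sep and key.replace("-", "_") == "module_blacklist":
            value = val
    # Drop empty entries such as the one produced by "a,,b".
    return [name for name in value.split(",") if name]


def is_blacklisted(cmdline: str, module: str) -> bool:
    """True if `module` is listed in the command line's module_blacklist= value."""
    return module in blacklisted_modules(cmdline)


# Hypothetical command line; on a live system, read it from /proc/cmdline instead.
example = "BOOT_IMAGE=/vmlinuz-5.11.11-200.fc33.x86_64 ro quiet module_blacklist=pinctrl_elkhartlake"
print(is_blacklisted(example, "pinctrl_elkhartlake"))  # True
```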

Release number and kernel:
Fedora 33 kernel 5.11.11-200.fc33.x86_64

Older versions:
The problem also occurs with previous kernel versions, probably ever since pinctrl_elkhartlake was added to the kernel.

Steps to Reproduce:
Just boot the system with the default kernel boot options. However, this issue probably occurs only with the specific ACPI tables of my system.

Kernel log: see attachment.

Comment 1 Gilles Buloz 2021-06-15 10:12:18 UTC
Problem still present with kernel 5.12.9-200.fc33.x86_64

Comment 2 Andy Shevchenko 2021-06-15 12:55:09 UTC
I hope a firmware update will fix this (when is another question, to which I have no answer). Otherwise, the driver would need to be rewritten.

Comment 3 Gilles Buloz 2021-06-15 14:57:09 UTC
Which firmware? Do you mean the BIOS? We are currently using a custom BIOS (modified by our BIOS team).

Comment 4 Andy Shevchenko 2021-06-15 19:11:36 UTC
(In reply to Gilles Buloz from comment #3)
> Which firmware ? Do you mean BIOS ? We are currently using a custom BIOS
> (modified by our BIOS team).

Yes.

I believe the issue is easy to fix, but the problem is that the fix has to go into all BIOSes for this platform (in other words, the reference firmware should do the same).

Comment 5 Gilles Buloz 2021-06-22 13:57:21 UTC
Is there at least a workaround on the BIOS or Linux side?
I have to blacklist the pinctrl_elkhartlake driver, and I suspect this is causing this audio codec issue: i2c_acpi_get_irq() calls acpi_dev_gpio_irq_get(), which returns -EPROBE_DEFER.
https://github.com/thesofproject/linux/issues/2990#issuecomment-865844386
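The failure chain described above can be illustrated with a toy model of deferred probing. This is a hypothetical Python simulation, not kernel code: a device whose supplier has not bound yet gets -EPROBE_DEFER and is retried whenever another device binds. Because a blacklisted driver never binds, its consumers (here a hypothetical `audio_codec` depending on `pinctrl`) stay deferred indefinitely. The real driver core has additional mechanisms that this sketch does not model.

```python
from __future__ import annotations

EPROBE_DEFER = 517  # numeric value used by the Linux kernel (include/linux/errno.h)


def toy_probe(supplier: str | None, bound: set[str]) -> int:
    """Toy probe(): succeed (0) unless the required supplier has not bound yet."""
    if supplier is not None and supplier not in bound:
        return -EPROBE_DEFER
    return 0


def run_deferred_probe(
    devices: dict[str, str | None], blacklisted: set[str]
) -> tuple[set[str], set[str]]:
    """Simulate deferred probing.

    `devices` maps each device to the supplier it depends on (or None).
    Devices whose driver is blacklisted are never probed at all.
    Returns (bound devices, devices still deferred).
    """
    bound: set[str] = set()
    pending = [name for name in devices if name not in blacklisted]
    progress = True
    # Retry deferred devices as long as the previous pass bound something new.
    while pending and progress:
        progress = False
        still_deferred = []
        for name in pending:
            if toy_probe(devices[name], bound) == -EPROBE_DEFER:
                still_deferred.append(name)
            else:
                bound.add(name)
                progress = True
        pending = still_deferred
    return bound, set(pending)


# Hypothetical topology: the audio codec needs a GPIO IRQ from the pin controller.
topology = {"audio_codec": "pinctrl", "pinctrl": None}
print(run_deferred_probe(topology, blacklisted={"pinctrl"}))
print(run_deferred_probe(topology, blacklisted=set()))
```

With the pin control driver loaded, both devices bind regardless of the order in which they are listed; with it blacklisted, the codec never leaves the deferred list.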

Comment 6 Andy Shevchenko 2021-06-22 14:21:45 UTC
Either the BIOS should provide proper tables, or the driver should be completely rewritten. There is no other "workaround" available. As the driver author, I don't know whether either of these is going to happen (so far I have no requirement to rewrite the driver).

Comment 7 Gilles Buloz 2021-06-22 14:27:52 UTC
OK, I understand.
As my company builds a specific BIOS for our boards, could you tell me which modification or addition would be required in the BIOS tables so that we can give it a try?

Comment 8 Andy Shevchenko 2021-06-22 15:21:31 UTC
I'm not a firmware guy; I have never seen the BIOS sources for this platform.

Nevertheless, the problem here is that this change needs to be synchronized among all stakeholders (vendor(s) and customer(s) together) to avoid ambiguity in ACPI HID usage. If you want to make such changes, you have to allocate a new ACPI HID (see https://uefi.org for how to do that if your company has neither a PNP nor an ACPI vendor ID).
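The ACPI specification defines two _HID formats: a PNP ID ("AAA####", three uppercase letters plus four hex digits) and an ACPI ID ("NNNN####", four uppercase letters or digits plus four hex digits), where the leading part is the registered vendor ID. `INTC1020` is an ACPI ID whose vendor part is `INTC` (Intel's), which is why a board vendor wanting its own HID needs its own vendor ID. The Python sketch below classifies a string against those two patterns; `classify_hid` is a hypothetical helper, and it additionally assumes the hex digits are uppercase, which may be stricter than the specification requires.

```python
from __future__ import annotations

import re

# _HID formats as described in the ACPI specification:
#   PNP ID  "AAA####"  : 3 uppercase letters, then 4 hex digits
#   ACPI ID "NNNN####" : 4 uppercase letters or digits, then 4 hex digits
# Assumption: hex digits must be uppercase (0-9, A-F).
PNP_ID = re.compile(r"([A-Z]{3})([0-9A-F]{4})")
ACPI_ID = re.compile(r"([A-Z0-9]{4})([0-9A-F]{4})")


def classify_hid(hid: str) -> tuple[str, str] | None:
    """Return (format, vendor prefix) for a _HID string, or None if it matches neither."""
    m = PNP_ID.fullmatch(hid)
    if m:
        return ("PNP ID", m.group(1))
    m = ACPI_ID.fullmatch(hid)
    if m:
        return ("ACPI ID", m.group(1))
    return None


print(classify_hid("INTC1020"))  # ('ACPI ID', 'INTC')
print(classify_hid("PNP0C09"))   # ('PNP ID', 'PNP')
```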

BUT... we really, really don't want to have two drivers for the very same hardware just because the platform is desynchronized between stakeholders. That said, I'll try hard to do anything to the driver in upstream until a global agreement on the road map here has been reached.

I would recommend pinging an Intel representative about this.

So, for now, I have taken the liberty of closing this report until we have a *crystal clear* understanding of what should be modified and in which way.
Feel free to reopen it when you get any new relevant information.

Comment 9 Andy Shevchenko 2021-06-22 15:22:50 UTC
I'm not a firmware guy; I have never seen the BIOS sources for this platform.

Nevertheless, the problem here is that this change needs to be synchronized among all stakeholders (vendor(s) and customer(s) together) to avoid ambiguity in ACPI HID usage. If you want to make such changes, you have to allocate a new ACPI HID (see https://uefi.org for how to do that if your company has neither a PNP nor an ACPI vendor ID).

BUT... we really, really don't want to have two drivers for the very same hardware just because the platform is desynchronized between stakeholders. That said, I'll try hard to avoid doing anything to the driver in upstream until a global agreement on the road map here has been reached.

I would recommend pinging an Intel representative about this.

So, for now, I have taken the liberty of closing this report until we have a *crystal clear* understanding of what should be modified and in which way.
Feel free to reopen it when you get any new relevant information.

Comment 11 Gilles Buloz 2022-08-26 07:53:21 UTC
The problem was fixed last month by updating the BIOS to a new one based on the AMI codebase "5.19_1AWHS_RC1.5.0_016" (Intel RC version 1.5.0 / Intel BKC version 2022_WW14 MR3 (ID #726728)), which includes the "pinctrl" fix (enabled by default).
I now use it with Fedora 36 without any problems.
This also solved my audio problems, because GPIO support was required there (see https://github.com/thesofproject/linux/issues/2990 ).

Comment 12 Andy Shevchenko 2022-08-26 09:13:26 UTC
(In reply to Gilles Buloz from comment #11)
> The problem has been fixed last month by updating the BIOS to a new one
> based on AMI codebase "5.19_1AWHS_RC1.5.0_016" (Intel RC version 1.5.0 /
> Intel BKC version 2022_WW14 MR3 (ID #726728)) that has the "pinctrl" fix
> (enabled by default).
> I now use it with Fedora 36 without problem.
> This also solved my audio problems because the GPIO support was required
> here (see https://github.com/thesofproject/linux/issues/2990 )

Thank you very much for this good news!

As I read above, Intel is deploying a new BIOS to OEMs/ODMs/customers for Elkhart Lake that fixes the pin control issue (and maybe others) and works with the driver that is in upstream. This means that all firmware vendors who want their products to run on a vanilla Linux kernel must update their BIOS. It also means that upstream has nothing to change in order to support customers. Everything is now clear.

Btw, Gilles, could you share the `acpidump -o elkhartlake-tables.dat` output file somewhere?

Comment 13 Gilles Buloz 2022-08-26 12:16:40 UTC
Created attachment 1907852
gzipped output of acpidump. Please note that this is a custom board with a custom BIOS based on an AMI BIOS

