Bug 1948468

Summary: kernel NULL pointer dereference when using pinctrl_elkhartlake
Product: [Fedora] Fedora
Reporter: Gilles Buloz <gilles.buloz>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 33
CC: acaringi, adscvr, airlied, alciregi, andy.shevchenko, bskeggs, hdegoede, jarodwilson, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, steved
Last Closed: 2021-06-22 15:21:31 UTC
Type: Bug
Attachments:
Full kernel log (flags: none)
gziped output of acpidump. Please note this is a custom board with a custom BIOS based on AMI BIOS (flags: none)

Description Gilles Buloz 2021-04-12 09:16:30 UTC
Created attachment 1771318 [details]
Full kernel log

Description of problem:
When booting an Elkhart Lake CPU system, I get a "BUG: kernel NULL pointer dereference, address: 0000000000000000". This makes "systemd-udevd" hang on device INTC1020:00 for several minutes, resulting in a very long boot time.

Workaround: blacklist the pinctrl_elkhartlake module by adding the following to the kernel command line:
module_blacklist=pinctrl_elkhartlake
Note: this module is included in my initrd image.
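
A minimal sketch for checking that the workaround above is actually in effect on a running system. It assumes the standard Linux /proc/cmdline and /proc/modules files; the function names are illustrative and not part of any existing tool.

```python
from pathlib import Path

MODULE = "pinctrl_elkhartlake"


def blacklisted_on_cmdline(cmdline: str, module: str = MODULE) -> bool:
    """Return True if `module` is listed in a module_blacklist= parameter."""
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        # module_blacklist takes a comma-separated list of module names
        if key == "module_blacklist" and sep and module in value.split(","):
            return True
    return False


def module_loaded(proc_modules: str, module: str = MODULE) -> bool:
    """Return True if `module` appears as a loaded module in /proc/modules text."""
    for line in proc_modules.splitlines():
        fields = line.split()
        if fields and fields[0] == module:
            return True
    return False


def main() -> None:
    try:
        cmdline = Path("/proc/cmdline").read_text()
        modules = Path("/proc/modules").read_text()
    except OSError as exc:
        # Not a Linux system, or /proc is not mounted
        print(f"cannot read /proc: {exc}")
        return
    print("blacklisted on command line:", blacklisted_on_cmdline(cmdline))
    print("module currently loaded:", module_loaded(modules))


if __name__ == "__main__":
    main()
```

With the workaround applied, the first line should report True and the second False.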

Release number and kernel:
Fedora 33, kernel 5.11.11-200.fc33.x86_64

Older versions:
The problem also occurs with earlier kernel versions, probably ever since pinctrl_elkhartlake was added to the kernel.

Steps to Reproduce:
Boot the system with the default kernel boot options. This issue probably occurs only with the specific ACPI tables of my system.

Kernel log: see attachment

Comment 1 Gilles Buloz 2021-06-15 10:12:18 UTC
Problem still present with kernel 5.12.9-200.fc33.x86_64

Comment 2 Andy Shevchenko 2021-06-15 12:55:09 UTC
I hope that a firmware update will fix this (when is another question, to which I have no answer). Otherwise the driver would need to be rewritten.

Comment 3 Gilles Buloz 2021-06-15 14:57:09 UTC
Which firmware? Do you mean the BIOS? We are currently using a custom BIOS (modified by our BIOS team).

Comment 4 Andy Shevchenko 2021-06-15 19:11:36 UTC
(In reply to Gilles Buloz from comment #3)
> Which firmware ? Do you mean BIOS ? We are currently using a custom BIOS
> (modified by our BIOS team).

Yes.

I believe the issue is easy to fix, but the problem is that the fix has to be made in all BIOSes for this platform (in other words, the reference firmware should do the same).

Comment 5 Gilles Buloz 2021-06-22 13:57:21 UTC
Is there at least a workaround on the BIOS or Linux side?
I have to blacklist the pinctrl_elkhartlake driver, and I suspect this is what causes the audio codec issue linked below, because i2c_acpi_get_irq() calls acpi_dev_gpio_irq_get(), which returns -EPROBE_DEFER.
https://github.com/thesofproject/linux/issues/2990#issuecomment-865844386
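
The dependency described in this comment can be illustrated with a toy model of deferred probing. This is not kernel code: the class and device names are hypothetical and the retry logic is simplified. It only shows why a consumer whose GPIO provider never registers (for example because its driver is blacklisted) keeps getting -EPROBE_DEFER.

```python
EPROBE_DEFER = 517  # errno value the Linux kernel uses for deferred probing


class ToyDriverCore:
    """Simplified stand-in for a driver core with a deferred-probe list."""

    def __init__(self):
        self.providers = set()  # registered resource providers (e.g. GPIO chips)
        self.deferred = {}      # consumer name -> provider it is waiting for
        self.bound = []         # consumers that probed successfully

    def probe(self, consumer, provider):
        if provider not in self.providers:
            # Provider is missing: remember the consumer and ask for a retry
            self.deferred[consumer] = provider
            return -EPROBE_DEFER
        self.deferred.pop(consumer, None)
        self.bound.append(consumer)
        return 0

    def register_provider(self, provider):
        self.providers.add(provider)
        # Retry every consumer that was waiting for this provider
        for consumer, needed in list(self.deferred.items()):
            if needed == provider:
                self.probe(consumer, needed)


core = ToyDriverCore()
# The pin controller is absent, so the codec probe is deferred
print(core.probe("audio-codec", "INTC1020:00"))
# Once the provider shows up, the deferred consumer binds
core.register_provider("INTC1020:00")
print(core.bound)
```

In this model the codec stays on the deferred list for as long as the provider is never registered, which matches the behaviour suspected above.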

Comment 6 Andy Shevchenko 2021-06-22 14:21:45 UTC
Either the BIOS should provide proper tables, or the driver should be completely rewritten. There is no other "workaround" available. As the driver author, I don't know whether either of these is going to happen (so far I have no requirement to rewrite the driver).

Comment 7 Gilles Buloz 2021-06-22 14:27:52 UTC
OK, I understand.
As my company builds a specific BIOS for our boards, could you tell me which modifications or additions to the BIOS tables would be required, so that we can give it a try?

Comment 9 Andy Shevchenko 2021-06-22 15:22:50 UTC
I'm not a firmware guy, and I have never seen the BIOS sources for this platform.

Nevertheless, the problem here is that this change needs to be synchronized across all stakeholders (vendor(s) and customer(s) together) to avoid ambiguity in the usage of the ACPI HID. If you want to make such changes, you have to allocate a new ACPI HID (see https://uefi.org on how to do that in case your company has neither a PNP nor an ACPI vendor ID).
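
For readers unfamiliar with the two ID types mentioned here, the sketch below classifies a hardware ID string by its shape. The patterns reflect my reading of the ACPI specification (a PNP ID is three uppercase letters plus four hex digits, an ACPI ID is four uppercase alphanumeric characters plus four hex digits); the function name is illustrative.

```python
import re

# Shapes taken from the ACPI specification's description of _HID values
PNP_ID = re.compile(r"[A-Z]{3}[0-9A-F]{4}")
ACPI_ID = re.compile(r"[A-Z0-9]{4}[0-9A-F]{4}")


def classify_hid(hid: str) -> str:
    """Return 'pnp', 'acpi', or 'invalid' for a hardware ID string."""
    if PNP_ID.fullmatch(hid):
        return "pnp"
    if ACPI_ID.fullmatch(hid):
        return "acpi"
    return "invalid"


for candidate in ("INTC1020", "PNP0C09", "INTC1020:00"):
    print(candidate, classify_hid(candidate))
```

Note that INTC1020:00, as seen in the kernel log, is a device instance name (HID plus instance suffix) rather than a HID, so it is reported as invalid here.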

BUT... we really, really don't want to have two drivers for the very same hardware because the platform is desynchronized between stakeholders. That said, I'll try hard to avoid doing anything to the driver upstream until a global agreement on the road map is reached.

I would recommend pinging an Intel representative about this.

So, for now I have taken the liberty of closing this report until we have a *crystal clear* understanding of what should be modified and in what way.
Feel free to reopen it when you have any new relevant information.

Comment 11 Gilles Buloz 2022-08-26 07:53:21 UTC
The problem was fixed last month by updating the BIOS to a new one based on the AMI codebase "5.19_1AWHS_RC1.5.0_016" (Intel RC version 1.5.0 / Intel BKC version 2022_WW14 MR3 (ID #726728)), which includes the "pinctrl" fix (enabled by default).
I now use it with Fedora 36 without problems.
This also solved my audio problems, because GPIO support was required there (see https://github.com/thesofproject/linux/issues/2990 )

Comment 12 Andy Shevchenko 2022-08-26 09:13:26 UTC
(In reply to Gilles Buloz from comment #11)
> The problem has been fixed last month by updating the BIOS to a new one
> based on AMI codebase "5.19_1AWHS_RC1.5.0_016" (Intel RC version 1.5.0 /
> Intel BKC version 2022_WW14 MR3 (ID #726728)) that has the "pinctrl" fix
> (enabled by default).
> I now use it with Fedora 36 without problem.
> This also solved my audio problems because the GPIO support was required
> here (see https://github.com/thesofproject/linux/issues/2990 )

Thank you very much for this good news!

As I understand from the above, Intel is deploying a new BIOS to OEMs/ODMs/customers for Elkhart Lake to fix the pin control issue (and maybe others) so that the upstream driver can be used. It means that all firmware vendors who want to run their products on a vanilla Linux kernel must update their BIOS. It also means that upstream has nothing to update in order to support customers. This is now all clear.

Btw, Gilles, could you share the file produced by `acpidump -o elkhartlake-tables.dat` somewhere?

Comment 13 Gilles Buloz 2022-08-26 12:16:40 UTC
Created attachment 1907852 [details]
gziped output of acpidump. Please note this is a custom board with a custom BIOS based on AMI BIOS