Bug 1830150 - [BISECTED] Fedora 31 with kernel 5.6 does not wake up from suspend
Summary: [BISECTED] Fedora 31 with kernel 5.6 does not wake up from suspend
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 1829096 (view as bug list)
Depends On:
Blocks:
Reported: 2020-04-30 22:59 UTC by William Bader
Modified: 2020-06-17 00:03 UTC (History)
CC List: 23 users

Fixed In Version: kernel-5.6.14-300.fc32 kernel-5.6.15-200.fc31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-25 02:46:42 UTC
Type: Bug


Attachments (Terms of Use)
journalctl from 5.6.7 with a failed resume from suspending (79.51 KB, text/plain)
2020-04-30 22:59 UTC, William Bader
journalctl from 5.5.17 with a successful resume from suspending (92.17 KB, text/plain)
2020-04-30 23:01 UTC, William Bader
more dmesg logs (27.46 KB, application/x-bzip)
2020-05-01 06:53 UTC, William Bader
patch against 5.7-rc2 that fixes the problem (1.38 KB, patch)
2020-05-05 03:30 UTC, William Bader
journalctl -b -1 --no-hostname -k from boot with second patch only (98.10 KB, text/plain)
2020-05-07 16:48 UTC, William Bader


Links
System ID Priority Status Summary Last Updated
Linux Kernel 207491 None None None 2020-05-05 16:26:54 UTC

Description William Bader 2020-04-30 22:59:41 UTC
Created attachment 1683497 [details]
journalctl from 5.6.7 with a failed resume from suspending

1. Please describe the problem:

I have Fedora 31 with the Mate Desktop on a Sony Vaio laptop.
When I use the system menu and select Shut Down... -> Suspend
the laptop goes to sleep, but when it wakes up, it shows the themed desktop background but the mouse isn't responsive and it doesn't show a login dialog.

2. What is the Version-Release number of the kernel:

The kernel 5.6.7-200.fc31.x86_64 has the problem.
The kernel 5.5.17-200.fc31.x86_64 works.
I have tried booting both kernels with no other changes, so the problem is the kernel and not the desktop.

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

I can try more kernels if requested.
I can try to bisect it if requested.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

It happens every time if I suspend. I tested it a few times to be sure.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

I can try this if requested.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
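
When a hang forces a power-off, the log of interest belongs to the previous boot. A minimal sketch of capturing it, using the flags named above with `-b -1` for the previous boot. The function name and the default output file are illustrative assumptions, not part of the bug template:

```shell
#!/bin/sh
# save_kernel_log [BOOT] [FILE]
#   BOOT = boot offset passed to journalctl -b (0 = current, -1 = previous)
#   FILE = output file (assumed default: dmesg.txt)
save_kernel_log() {
    boot="${1:-0}"
    out="${2:-dmesg.txt}"
    # -k limits the output to kernel messages; --no-hostname drops the host column.
    journalctl --no-hostname -k -b "$boot" > "$out"
}

# Example: save_kernel_log -1 dmesg-previous-boot.txt
```

The resulting file can be attached to the bug as-is.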

Comment 1 William Bader 2020-04-30 23:01:23 UTC
Created attachment 1683500 [details]
journalctl from 5.5.17 with a successful resume from suspending

Comment 2 Steve 2020-05-01 01:30:07 UTC
Thanks for your report. Could you test without any non-Fedora kernel modules:

Apr 30 21:44:54 kernel: vboxdrv: loading out-of-tree module taints kernel.

Can you get to a console (ctrl-alt-f2)?

Comment 3 Steve 2020-05-01 01:33:25 UTC
This is in updates-testing:

kernel-5.6.8-200.fc31 
https://bodhi.fedoraproject.org/updates/FEDORA-2020-b453269c4e

Comment 4 William Bader 2020-05-01 06:53:47 UTC
Created attachment 1683588 [details]
more dmesg logs

>Can you get to a console (ctrl-alt-f2)?

No, I can't switch to a console. I tried that first because I initially thought that the problem was Mate Desktop, but it seems to depend on the kernel. 5.6 kernels do not resume. 5.5 kernels are OK.
I tried ctrl-alt-f2, ctrl-alt-f3, etc. plus ctrl-alt-del.
I eventually gave up and shut down by holding the power button.
(When I was testing kernels, I did a few syncs before suspending. My system disk is an SSD with ext4.)

>This is in updates-testing: kernel-5.6.8-200.fc31 

I used koji to try kernel-5.6.8-200.fc31 (the most recent), kernel-5.6.3-300.fc31 (the first 5.6 kernel), and kernel-5.5.18-200.fc31 (the last 5.5 kernel).

I have attached journalctl logs.
5.6.8 and 5.6.3 hung on resume. 5.5.18 worked.

>Thanks for your report. Could you test without any non-Fedora kernel modules:
>Apr 30 21:44:54 kernel: vboxdrv: loading out-of-tree module taints kernel.

egrep -i 'vbox|taint' dmesg*.txt
finds a match only in the 5.5.18 dmesg, which resumes correctly.

virtualbox might not be in the kernels I installed from koji because it might not have automatically updated it the way that updating through dnfdragora would.

I don't have a pressing need for virtualbox, so I can remove it if necessary as long as I don't lose my VMs.

When it resumes it paints the blue-themed fedora background splash screen with the mouse pointer at the same location (I think) as when I clicked to suspend, so I think that the video isn't the problem. Not being able to switch to a text console means (I think) that it isn't a USB or network issue because I recently tried to boot a custom kernel without the ehci-pci driver loaded, and I could still switch to a text console and log in.
I can take a photo of the hung resume screen if it would help.

https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.6 mentions 'suspend' and 'resume' a lot of times.

$ grep -i suspend ChangeLog-5.6.txt | wc -l
549
$ grep -i resume ChangeLog-5.6.txt | wc -l
408

Comment 5 Hans de Goede 2020-05-01 09:03:58 UTC
This sounds like it is a duplicate of bug 1829096. See that bug for 2 possible workarounds (blacklist sony-laptop or remove it before suspend and modprobe it again after resume).

As mentioned in that bug we really need someone to bisect this to get the root cause of this issue.

Comment 6 Steve 2020-05-01 14:08:56 UTC
(In reply to William Bader from comment #4)
> more dmesg logs

Thanks for attaching the logs. Unfortunately, all the "bad" logs are truncated at:

Apr 30 23:36:26 kernel: PM: suspend entry (deep)

> No, I can't switch to a console.

OK.

> I don't have a pressing need for virtualbox, so I can remove it if necessary as long as I don't lose my VMs.

If the virtualbox kernel modules aren't being loaded in the failure cases, that is good enough.

> 5.6.8 and 5.6.3 hung on resume. 5.5.18 worked.

Following Hans's suggestion (Comment 5), try blacklisting sony-laptop by appending this to the kernel command-line from grub2:

"module_blacklist=sony-laptop" (without the quotes)

with 5.6.8.

Documentation:

The kernel’s command-line parameters
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

Comment 7 Steve 2020-05-01 15:06:34 UTC
(In reply to Steve from comment #6)
...
> "module_blacklist=sony-laptop" (without the quotes)
> 
> with 5.6.8.
...

Actually, it might be better to try that with 5.6.3, since that is the earliest "bad" case, according to your "lite" bisection results:

5.6.8 bad
5.6.7 bad
5.6.3 bad
5.5.18 good
5.5.17 good

There are some earlier 5.6 builds on Koji. This is the mainline 5.6 release version:

Information for build kernel-5.6.0-300.fc32
https://koji.fedoraproject.org/koji/buildinfo?buildID=1486276

If that is "bad", we will have to dig into the 5.6-rcN versions.

Comment 8 Steve 2020-05-01 16:58:55 UTC
(In reply to Steve from comment #7)
> If that is "bad", we will have to dig into the 5.6-rcN versions.

Here is what I believe is a complete list of 5.6-rcN builds. The gitN builds where N>0 are Fedora snapshot builds, so look for the git0 builds to see what was officially released:

$ koji list-builds --package=kernel --state=COMPLETE --after=2020-01-01 --quiet | grep 'kernel-5.6.0-0.rc.*fc32' | sort -Vr
kernel-5.6.0-0.rc7.git0.2.fc32                           jcline            COMPLETE
kernel-5.6.0-0.rc6.git0.2.fc32                           jcline            COMPLETE
kernel-5.6.0-0.rc5.git0.2.fc32                           jcline            COMPLETE <<< Occasionally, you will see a rebuild.*
kernel-5.6.0-0.rc5.git0.1.fc32                           jcline            COMPLETE <<< Here, git0.1 was rebuilt as git0.2.
kernel-5.6.0-0.rc4.git0.1.fc32                           jcline            COMPLETE
kernel-5.6.0-0.rc3.git0.1.fc32                           jcline            COMPLETE
kernel-5.6.0-0.rc2.git0.1.fc32                           pbrobinson        COMPLETE
kernel-5.6.0-0.rc1.git2.1.fc32                           jcline            COMPLETE
kernel-5.6.0-0.rc1.git0.1.fc32                           jcline            COMPLETE <<< 5.6-rc1
kernel-5.6.0-0.rc0.git5.1.fc32                           jcline            COMPLETE
kernel-5.6.0-0.rc0.git4.1.fc32                           jcline            COMPLETE
kernel-5.6.0-0.rc0.git3.1.fc32                           jcline            COMPLETE
kernel-5.6.0-0.rc0.git2.1.fc32                           jcline            COMPLETE
kernel-5.6.0-0.rc0.git1.1.fc32                           jcline            COMPLETE

And for background, here is Linus's announcement for 5.6-rc1:

From	Linus Torvalds <>
Date	Sun, 9 Feb 2020 16:43:41 -0800
Subject	Linux 5.6-rc1
https://lkml.org/lkml/2020/2/9/221

"The rc1 tag has been pushed out, and so the merge window for 5.6 is closed."
                                         ^^^^^^^^^^^^^^^^

The merge window is the period between the last mainline release and the -rc1 release for the next mainline release. That's when all the bugs get introduced. :-)

* The changelog shows that some Fedora patches were added for the rebuild:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1476218

Comment 9 William Bader 2020-05-01 17:20:41 UTC
Thanks for the replies.

>This sounds like it is a duplicate of bug 1829096. 

I booted with 5.6.8-200.fc31.x86_64 this morning.
I ran 'modprobe -r sony-laptop' (one of the work-arounds mentioned in that bug) and then suspend and resume worked.
I think that the bugs are duplicates.
I searched for 'suspend' before creating this issue. I must have missed it.
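
The manual work-around above (unload sony-laptop before suspending, load it again after resuming) could probably be automated with a systemd sleep hook. This is only a sketch: the hook file name is hypothetical, and it assumes systemd runs executables in /usr/lib/systemd/system-sleep/ with "pre" or "post" as the first argument.

```shell
#!/bin/sh
# Hypothetical hook file: /usr/lib/systemd/system-sleep/sony-laptop.sh
# systemd-sleep calls each hook with $1 = pre|post and $2 = the sleep type.
sony_laptop_hook() {
    case "$1" in
        pre)  modprobe -r sony-laptop ;;  # unload the driver before sleeping
        post) modprobe sony-laptop ;;     # load it again after resume
    esac
}

sony_laptop_hook "$@"
```

The file would need to be executable and owned by root. It only hides the symptom, so it is no substitute for finding the commit that introduced the problem.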

I suppose that means a bisection from 5.5.18 to 5.6.0.
I still have the kernel build area from bug 1818952 so I can try a bisection this weekend.

>Thanks for attaching the logs. Unfortunately, all the "bad" logs are truncated at:
>Apr 30 23:36:26 kernel: PM: suspend entry (deep)

I suppose that is a symptom of having to power off and reboot.
The person who first reported it booted with no_console_suspend.

>koji list-builds --package=kernel --state=COMPLETE --after=2020-01-01 --quiet | grep 'kernel-5.6.0-0.rc.*fc32' | sort -Vr

I have Fedora 31 on my laptop. When I replace 32 with 31 on that command, it does not show any builds.
Can I use a Fedora 32 kernel?

Comment 10 Steve 2020-05-01 17:48:56 UTC
(In reply to William Bader from comment #9)
...
> I have Fedora 31 on my laptop. When I replace 32 with 31 on that command, it does not show any builds.
> Can I use a Fedora 32 kernel?

Yes. I frequently do test installs where the Fedora version in the kernel version string doesn't match the Fedora release version of the test system.

The "fc32" in the kernel version really just identifies the Fedora repo that the kernel comes from. Kernels downloaded from Koji aren't coming from a Fedora repo, so you can "mix and match". :-)
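
As a rough sketch of how such a test install might look, assuming the `koji` command-line client is installed and that the kernel, kernel-core, and kernel-modules packages are enough to boot. The function name and the x86_64 default are illustrative assumptions:

```shell
#!/bin/sh
# install_koji_kernel NVR [ARCH]
#   NVR  = Koji build name, e.g. kernel-5.6.0-300.fc32
#   ARCH = RPM architecture (assumed default: x86_64)
install_koji_kernel() {
    nvr="$1"
    arch="${2:-x86_64}"
    ver="${nvr#kernel-}"    # strip the package name, leaving e.g. 5.6.0-300.fc32
    # Download every RPM of the build for this architecture into the current directory.
    koji download-build --arch="$arch" "$nvr" || return 1
    # Install only the packages assumed to be needed for booting the kernel.
    sudo dnf install \
        "./kernel-$ver.$arch.rpm" \
        "./kernel-core-$ver.$arch.rpm" \
        "./kernel-modules-$ver.$arch.rpm"
}

# Example: install_koji_kernel kernel-5.6.0-300.fc32
```

Running it in an empty scratch directory keeps the downloaded RPMs of different builds apart.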

Comment 11 Steve 2020-05-01 18:28:33 UTC
(In reply to William Bader from comment #9)
> I suppose that means a bisection from 5.5.18 to 5.6.0.

I'm not sure that is going to be possible, because they are on different stable branches. The way all of this branching and merging works is much easier to see with "gitk", which is a GUI app. "gitk" is the name of the package.

Usage:

$ cd linux-5.6
$ gitk

Comment 12 Steve 2020-05-01 19:10:38 UTC
I think this is what we want to bisect on:

Using the stable repo:

$ git config --get remote.origin.url
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/

Bisect on the linux-5.6.y branch:

$ git branch
  linux-5.4.y
  linux-5.5.y
* linux-5.6.y

Between these tags:

$ git log --oneline --no-merges --topo-order v5.5^..v5.6.3 | wc -l
12732

That range covers the 5.6-rc1 merge window. (Comment 8)

But it excludes all 5.5.y versions, which are on a separate branch.

log2(12732) ~= 14 bisection builds.
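
The estimate can be reproduced without a calculator. A small sketch (the function name is made up for illustration) that counts how many times a range of N commits must be halved, which is ceil(log2(N)):

```shell
#!/bin/sh
# bisect_steps N: smallest k with 2^k >= N, i.e. ceil(log2(N)).
bisect_steps() {
    n="$1"
    k=0
    pow=1
    while [ "$pow" -lt "$n" ]; do
        pow=$((pow * 2))
        k=$((k + 1))
    done
    echo "$k"
}

bisect_steps 12732    # prints 14
```

This is only an estimate of the number of builds; git may ask for extra tests, such as a merge base.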

Comment 13 William Bader 2020-05-01 19:47:46 UTC
>git log --oneline --no-merges --topo-order v5.5^..v5.6.3

I did git bisect good v5.5.18; git bisect bad v5.6.1
Bisecting: a merge base must be tested
[d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5

Can I continue with that or should I start over?
The last commit listed by your git log command is the first commit that my bisection is trying.
git log --oneline --no-merges --topo-order v5.5^..v5.6.3 | tail -1
d5226fa6dbae Linux 5.5

Bug 1829096 says that it happens with 5.6.0-300.fc32 so is it necessary to look all the way to 5.6.3?

Comment 14 Steve 2020-05-01 20:17:12 UTC
(In reply to William Bader from comment #13)
> >git log --oneline --no-merges --topo-order v5.5^..v5.6.3
> 
> I did git bisect good v5.5.18; git bisect bad v5.6.1
> Bisecting: a merge base must be tested
> [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
> 
> Can I continue with that or should I start over?
> The last commit listed by your git log command is the first commit that my bisection is trying.
> git log --oneline --no-merges --topo-order v5.5^..v5.6.3 | tail -1
> d5226fa6dbae Linux 5.5

"git log" lists commits in reverse order, so that is the "first" (earliest) commit. A "good" version on the 5.6.y branch needs to be found, but you don't need to start over. Git is telling you that "Linux 5.5" needs to be verified as "good" or "bad":

$ git checkout v5.5
Checking out files: 100% (11913/11913), done.
Note: checking out 'v5.5'.
...

Make, test.

> Bug 1829096 says that it happens with 5.6.0-300.fc32 so is it necessary to look all the way to 5.6.3?

No. I was going by what you reported. Using 5.6.0 would narrow the range.

You can get a list of tags with:

$ git tag --list

From that output, the exact tag for the upper bisection bound is "v5.6".
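
For reference, a bisection between those two tags could be started as sketched below. The wrapper function is hypothetical; the git subcommands are the standard ones, run from inside the kernel tree:

```shell
#!/bin/sh
# start_bisect GOOD BAD: begin a bisection between a known-good and a known-bad revision.
start_bisect() {
    good="$1"    # e.g. v5.5
    bad="$2"     # e.g. v5.6
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# Example: start_bisect v5.5 v5.6
```

After each build and suspend test, "git bisect good" or "git bisect bad" records the result and checks out the next commit to try.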

Comment 15 Steve 2020-05-01 20:46:29 UTC
I started over with a clean clone of the linux-5.6.y stable branch:

$ git clone --shallow-exclude linux-5.4.y --branch linux-5.6.y https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git linux-stable

Narrowing the range doesn't reduce the number of commits by much:

$ git log --oneline --no-merges --topo-order v5.5^..v5.6.3 | wc -l
12990

$ git log --oneline --no-merges --topo-order v5.5^..v5.6 | wc -l
12924

This shows that most of the commits are added in the merge window:

$ git log --oneline --no-merges --topo-order v5.5^..v5.6-rc1 | wc -l
11039

Comment 16 William Bader 2020-05-02 01:08:03 UTC
>Yes. I frequently do test installs where the Fedora version in the kernel version string doesn't match the Fedora release version of the test system.
>The "fc32" in the kernel version really just identifies the Fedora repo that the kernel comes from. Kernels downloaded from Koji aren't coming from a Fedora repo, so you can "mix and match". :-)

Thanks, I'll try that with the kernel for bug 1818952.

I am running bisections now.
[d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5 <- good
[9f68e3655aae6d49d6ba05dd263f99f33c2567af] Merge tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm <- bad
[fb95aae6e67c4e319a24b3eea32032d4246a5335] Merge tag 'sound-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound <- building

make oldconfig has been asking about a lot of new kernel options. For the second build, it asked for values for about 50 new options. I used values from the Fedora config in /boot.
Since the second build is probably the most recent commit that the bisection will explore, can I use its config for the other builds and take the default for questions?

With all of the koji and bisection kernels, I filled up /boot and I had to clean it up.

I have been booting from a 5.6.8 kernel that I downloaded from koji.
Since it has been ok, I tried removing the 5.6.7 kernel.
$ rpm -qa | grep '^kernel.*5.6.7'
kernel-devel-5.6.7-200.fc31.x86_64
kernel-core-5.6.7-200.fc31.x86_64
kernel-tools-libs-5.6.7-200.fc31.x86_64
kernel-debug-devel-5.6.7-200.fc31.x86_64
kernel-modules-5.6.7-200.fc31.x86_64
kernel-5.6.7-200.fc31.x86_64
kernel-headers-5.6.7-200.fc31.x86_64

I tried removing those packages, and dnf wanted to remove a very large list of packages, so I said no.

Once 5.6.8 becomes the stable Fedora kernel, will dnfdragora automatically update kernel-devel and kernel-tools (which I suspect have the big list of dependencies)?

Comment 17 Steve 2020-05-02 02:09:01 UTC
> make oldconfig has been asking about a lot of new kernel options. For the second build, it asked for values for about 50 new options. I used values from the Fedora config in /boot.

You did all 50 manually?

> Since the second build is probably the most recent commit that the bisection will explore, can I use its config for the other builds and take the default for questions?

Good question. Since we aren't trying to debug the config file, my view is that it should match the Fedora config as closely as possible. If that involves massive changes, we will need to come up with a way to update the config file more efficiently.

I don't have a good suggestion, but the software process is basically one of *merging* two config files and resolving any conflicts. Using the kernel's tools is probably the most reliable way, but I will refer you to a superb GUI app for viewing diffs and merging files:

meld
https://meldmerge.org/

"Meld is a visual diff and merge tool targeted at developers. Meld helps you compare files, directories, and version controlled projects. It provides two- and three-way comparison of both files and directories, and has support for many popular version control systems."

The Fedora package is "meld".

BTW, if you install "rcs", you can put your config files under version control without bothering git in the least. I used rcs for every config change I made while doing builds of longterm kernel 4.19.119 (https://www.kernel.org/), and it worked perfectly.

Comment 18 Steve 2020-05-02 02:21:56 UTC
> Since the second build is probably the most recent commit that the bisection will explore, can I use its config for the other builds and take the default for questions?

I changed my mind. :-)

If you can reproduce the problem with defaults, then that is good enough for bisection. There is some chance that the config file could affect the results, but in the end, git points you to a specific commit, which you can test (by reverting) with a specific config file.

Comment 19 Steve 2020-05-02 03:13:07 UTC
> I have been booting from a 5.6.8 kernel that I downloaded from koji.
> I tried removing those [5.6.7] packages, and dnf wanted to remove a very large list of packages, so I said no.

5.6.8 is in updates-testing, so try updating everything matching kernel\*, and then removing any remaining 5.6.7 kernel packages.

# dnf update kernel\* --enablerepo=updates-testing

Comment 20 Steve 2020-05-02 03:29:31 UTC
> make oldconfig has been asking about a lot of new kernel options.

There are two make targets that might help with, if not solve, the config-merging problem:

$ make help
...
  listnewconfig   - List new options
  helpnewconfig   - List new options and help text
...

Comment 21 Steve 2020-05-02 03:59:33 UTC
> ... and take the default for questions?

That can be automated:

$ make help
...
  olddefconfig    - Same as oldconfig but sets new symbols to their
                    default value without prompting
...

Comment 22 William Bader 2020-05-02 07:34:23 UTC
>> make oldconfig has been asking about a lot of new kernel options. For the second build, it asked for values for about 50 new options. I used values from the Fedora config in /boot.
>You did all 50 manually?

Yes. coronavirus lockdown. I kept notes:

1 [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5 / good
        CONFIG_X86_INTEL_MPX y / CONFIG_VIRTIO_BLK_SCSI n / CONFIG_I2C_PARPORT_LIGHT m / CONFIG_GPIO_LYNXPOINT n
        CONFIG_DRM_AMD_DC_DCN2_0 y / CONFIG_DRM_AMD_DC_DCN2_1 y / CONFIG_DRM_AMD_DC_DSC_SUPPORT y / CONFIG_EXFAT_FS n / CONFIG_THUNDERBOLT m / CONFIG_X86_PTDUMP n
        CONFIG_THUNDERBOLT_NET m
2 [9f68e3655aae6d49d6ba05dd263f99f33c2567af] Merge tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm / bad
        CONFIG_TIME_NS y / CONFIG_EFI_DISABLE_PCI_DMA n / CONFIG_INET_ESPINTCP y / CONFIG_MPTCP y / CONFIG_MPTCP_IPV6 y / CONFIG_MPTCP_HMAC_TEST n
        CONFIG_NET_DSA_TAG_AR9331 n / CONFIG_NET_SCH_FQ_PIE n / CONFIG_NET_SCH_ETS n / CONFIG_VSOCKETS_LOOPBACK m / CONFIG_ETHTOOL_NETLINK y
        CONFIG_WIREGUARD m / CONFIG_WIREGUARD_DEBUG n / CONFIG_NET_DSA_AR9331 n / CONFIG_NET_DSA_VITESSE_VSC73XX_SPI n / CONFIG_NET_DSA_VITESSE_VSC73XX_PLATFORM n
        CONFIG_BCM84881_PHY n / CONFIG_SERIAL_8250_16550A_VARIANTS n / CONFIG_PTP_1588_CLOCK_INES n / CONFIG_PINCTRL_LYNXPOINT m
        CONFIG_SENSORS_ADM1177 n / CONFIG_SENSORS_DRIVETEMP m / CONFIG_SENSORS_MAX31730 n / CONFIG_SENSORS_MAX20730 n / CONFIG_SENSORS_XDPE122 n
        CONFIG_REGULATOR_MP8859 n / CONFIG_SND_SOC_INTEL_USER_FRIENDLY_LONG_NAMES y / CONFIG_SND_SOC_INTEL_BDW_RT5650_MACH m / CONFIG_SND_SOC_INTEL_SOF_DA7219_MAX98373_MACH m
        CONFIG_SND_SOC_RT1308_SDW n / CONFIG_SND_SOC_RT700_SDW n / CONFIG_SND_SOC_RT711_SDW n / CONFIG_SND_SOC_RT715_SDW n / CONFIG_SND_SOC_WSA881X n / CONFIG_SND_SOC_MT6660 n
        CONFIG_INTEL_IDXD m / CONFIG_PLX_DMA n / CONFIG_DMABUF_HEAPS n / CONFIG_STAGING_EXFAT_FS n / CONFIG_INTEL_UNCORE_FREQ_CONTROL m / CONFIG_BMA400 n
        CONFIG_AD7091R5 n / CONFIG_LTC2496 n / CONFIG_DLHL60D n / CONFIG_PING n / CONFIG_PHY_INTEL_EMMC m / CONFIG_USB4 m / CONFIG_USB4_NET m / CONFIG_TEE n
        CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS 9 / CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE 256
3 [fb95aae6e67c4e319a24b3eea32032d4246a5335] Merge tag 'sound-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound / bad (resume showed blue line at bottom)
        CONFIG_GPIO_LYNXPOINT n / CONFIG_DRM_AMD_DC_DCN2_0 y / CONFIG_DRM_AMD_DC_DCN2_1 y / CONFIG_DRM_AMD_DC_DSC_SUPPORT y / CONFIG_EXFAT_FS n
        CONFIG_THUNDERBOLT m / CONFIG_THUNDERBOLT_NET m
4 [f76e4c167ea2212e23c15ee7e601a865e822c291] net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC / good
5 [c677124e631d97130e4ff7db6e10acdfb7a82321] Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

For 4 and 5, I used the same config as 2, and it asked the same questions as 3.
I suppose that as the bisection gets closer, the config files will need fewer changes.
Now that I think about it, I should probably start with the config of the last bad bisection.
5 is still building.

I suppose that for the purposes of the bisection only the ACPI options matter. I doubt that my laptop supports most of the new hardware and devices.

Comment 23 Steve 2020-05-02 17:54:31 UTC
For config 2, there are 52 options:

$ cat config-list-1.txt | sed -e 's/\//\n/g' | wc -l
52

> I suppose that for the purposes of the bisection only the ACPI options matter. I doubt that my laptop supports most of the new hardware and devices.

Good point. A lot of those options appear to be device-specific.

> CONFIG_EXFAT_FS n

That appears to be a recent addition:

$ git log --format=fuller fs/exfat/Kconfig
commit b9d1e2e6265f5dc25e9f5dbfbde3e53d8a4958ac
Author:     Namjae Jeon <...>
AuthorDate: Mon Mar 2 15:21:42 2020 +0900
Commit:     Al Viro <...>
CommitDate: Thu Mar 5 21:00:40 2020 -0500

    exfat: add Kconfig and Makefile
    
    This adds the Kconfig and Makefile for exfat.
...

Comment 24 William Bader 2020-05-03 23:37:00 UTC
I did the bisection.

6d232b29cfce65961db4a668c2c6c6987cd24d45 is the first bad commit
commit 6d232b29cfce65961db4a668c2c6c6987cd24d45
Author: Maximilian Luz <luzmaximilian@gmail.com>
Date:   Tue Dec 17 11:35:22 2019 -0800
    ACPICA: Dispatcher: always generate buffer objects for ASL create_field() operator  
    ACPICA commit 79a466b64e6af36cc83102f05915e56cb7dd89ab

Here are my notes. I think that I started with a config file from a Fedora 5.6 kernel.
I saved all of the kernels and most of the config files, so I can install any of the kernels and do more testing.

git bisect start
git bisect good v5.5.18
git bisect bad v5.6.1

(I started numbering local versions at 21 to avoid confusion with local versions from bisections for bug 1818952)
('bad' means that the resume froze with the blue-themed splash screen but no login dialog. 'bad (blue line)' means that the resume froze with a small blue bar across the bottom of the screen and the rest of the screen black. I am not sure if it is random or an indication of where it hung.)

Bisecting: a merge base must be tested <- I probably should have started the bisection at the start of the 5.6 branch.
21 [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5 / good
	CONFIG_X86_INTEL_MPX y / CONFIG_VIRTIO_BLK_SCSI n / CONFIG_I2C_PARPORT_LIGHT m / CONFIG_GPIO_LYNXPOINT n
	CONFIG_DRM_AMD_DC_DCN2_0 y / CONFIG_DRM_AMD_DC_DCN2_1 y / CONFIG_DRM_AMD_DC_DSC_SUPPORT y / CONFIG_EXFAT_FS n / CONFIG_THUNDERBOLT m / CONFIG_X86_PTDUMP n
	CONFIG_THUNDERBOLT_NET m
22 [9f68e3655aae6d49d6ba05dd263f99f33c2567af] Merge tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm / bad
	CONFIG_TIME_NS y / CONFIG_EFI_DISABLE_PCI_DMA n / CONFIG_INET_ESPINTCP y / CONFIG_MPTCP y / CONFIG_MPTCP_IPV6 y / CONFIG_MPTCP_HMAC_TEST n
	CONFIG_NET_DSA_TAG_AR9331 n / CONFIG_NET_SCH_FQ_PIE n / CONFIG_NET_SCH_ETS n / CONFIG_VSOCKETS_LOOPBACK m / CONFIG_ETHTOOL_NETLINK y
	CONFIG_WIREGUARD m / CONFIG_WIREGUARD_DEBUG n / CONFIG_NET_DSA_AR9331 n / CONFIG_NET_DSA_VITESSE_VSC73XX_SPI n / CONFIG_NET_DSA_VITESSE_VSC73XX_PLATFORM n
	CONFIG_BCM84881_PHY n / CONFIG_SERIAL_8250_16550A_VARIANTS n / CONFIG_PTP_1588_CLOCK_INES n / CONFIG_PINCTRL_LYNXPOINT m
	CONFIG_SENSORS_ADM1177 n / CONFIG_SENSORS_DRIVETEMP m / CONFIG_SENSORS_MAX31730 n / CONFIG_SENSORS_MAX20730 n / CONFIG_SENSORS_XDPE122 n
	CONFIG_REGULATOR_MP8859 n / CONFIG_SND_SOC_INTEL_USER_FRIENDLY_LONG_NAMES y / CONFIG_SND_SOC_INTEL_BDW_RT5650_MACH m / CONFIG_SND_SOC_INTEL_SOF_DA7219_MAX98373_MACH m
	CONFIG_SND_SOC_RT1308_SDW n / CONFIG_SND_SOC_RT700_SDW n / CONFIG_SND_SOC_RT711_SDW n / CONFIG_SND_SOC_RT715_SDW n / CONFIG_SND_SOC_WSA881X n / CONFIG_SND_SOC_MT6660 n
	CONFIG_INTEL_IDXD m / CONFIG_PLX_DMA n / CONFIG_DMABUF_HEAPS n / CONFIG_STAGING_EXFAT_FS n / CONFIG_INTEL_UNCORE_FREQ_CONTROL m / CONFIG_BMA400 n
	CONFIG_AD7091R5 n / CONFIG_LTC2496 n / CONFIG_DLHL60D n / CONFIG_PING n / CONFIG_PHY_INTEL_EMMC m / CONFIG_USB4 m / CONFIG_USB4_NET m / CONFIG_TEE n
	CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS 9 / CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE 256
23 [fb95aae6e67c4e319a24b3eea32032d4246a5335] Merge tag 'sound-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound / bad (resume showed blue line at bottom)
	CONFIG_GPIO_LYNXPOINT n / CONFIG_DRM_AMD_DC_DCN2_0 y / CONFIG_DRM_AMD_DC_DCN2_1 y / CONFIG_DRM_AMD_DC_DSC_SUPPORT y / CONFIG_EXFAT_FS n
	CONFIG_THUNDERBOLT m / CONFIG_THUNDERBOLT_NET m
24 [f76e4c167ea2212e23c15ee7e601a865e822c291] net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC / good
	same as 23
25 [c677124e631d97130e4ff7db6e10acdfb7a82321] Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip / bad
	same as 23
26 [6d277aca488fdf0a1e67cd14b5a58869f66197c9] Merge tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm / good
27 [9f2a43019edc097347900daade277571834a3e2c] Merge branch 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip / bad (blue line)
28 [a56c41e5d766871231828046f477611d6ee7d2db] Merge tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip / bad
29 [22a8f39c520fc577c02b4e5c99f8bb3b6017680b] Merge tag 'for-5.6/drivers-2020-01-27' of git://git.kernel.dk/linux-block / bad (blue line)
30 [34dabd81160f7bfb18b67c1161b3c4d7ca6cab83] Merge tag 'pnp-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm / bad
31 [3dd855147feff375dfa6737331628ea919e9da59] Merge branches 'acpi-battery', 'acpi-video', 'acpi-fan' and 'acpi-drivers' / bad (blue line)
32 [ff7a672f83b355365478a1fdfb60933ef34d8d02] Merge branch 'acpica' / bad (blue line)
33 [ae6252d8dfeb21f5067a09a8f4a6dd30851d70c1] ACPICA: Update version to 20191213 / bad (blue line)
	NET_DSA_MSCC_FELIX n / PINCTRL_EQUILIBRIUM n
34 [5ddbd77181dfca61b16d2e2222382ea65637f1b9] ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1 / good
	same as 33 <- I have ccache, but most of the builds took almost 2 hours until here, and then went down to about 40 minutes.
35 [6d232b29cfce65961db4a668c2c6c6987cd24d45] ACPICA: Dispatcher: always generate buffer objects for ASL create_field() operator / bad
	same as 33
36 [69e86e59ad2a2518704a31c35530e6e99963c358] ACPICA: acpisrc: add unix line ending support for non-windows build / good
	same as 33

Steps for each bisection:

make mrproper
git bisect good-or-bad
cp ../config-resume .config
uemacs .config # bump CONFIG_LOCALVERSION
make oldconfig
cp -p .config ../config-`grep Linux .config | head -1 | awk '{print $3}'`-`grep -i CONFIG_LOCALVERSION= .config | sed -e 's/.*=".//' -e 's/"//g'`
time make
time make binrpm-pkg
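
The per-step routine above could be scripted roughly as follows. This sketch is not what was actually run: it swaps the interactive "make oldconfig" for "make olddefconfig" (comment 21) and replaces the manual editor step with sed. The function names are made up, and the suffix is assumed not to contain "/" or "&".

```shell
#!/bin/sh
# set_localversion FILE SUFFIX: rewrite CONFIG_LOCALVERSION in a kernel config file.
set_localversion() {
    sed "s/^CONFIG_LOCALVERSION=.*/CONFIG_LOCALVERSION=\"$2\"/" "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# build_bisect_step BASE_CONFIG SUFFIX: configure and build the commit that
# git bisect has checked out. Run from the top of the kernel tree.
build_bisect_step() {
    make mrproper &&
    cp "$1" .config &&
    set_localversion .config "$2" &&
    make olddefconfig &&          # new options take their defaults, no prompts
    make -j"$(nproc)" &&
    make binrpm-pkg
}

# Example: build_bisect_step ../config-resume .localversion37
```

Taking defaults for new options means the config can drift from the Fedora config, which comment 18 considers acceptable for bisection as long as the problem still reproduces.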

Comment 25 Steve 2020-05-04 03:36:58 UTC
(In reply to William Bader from comment #24)
> I did the bisection.

Thanks for all your hard work doing this bisection and for your careful documentation. Also, thanks for updating the bug summary.

The history of that commit is complicated, so I may be misinterpreting it, but it appears to have been added to a branch based on v5.5-rc2, and it wasn't merged into mainline until shortly before v5.6-rc1.

So that explains why it isn't in 5.5.17. Basically, it was added in the merge window before v5.6-rc1.

$ git describe 6d232b29cfce65961db4a668c2c6c6987cd24d45
v5.5-rc2-4-g6d232b29cf

$ git describe --contains 6d232b29cfce65961db4a668c2c6c6987cd24d45
v5.6-rc1~191^2~2^2~4

I believe that the v5.5-rc6 tag here means that the 'acpica' merge was with a branch from v5.5-rc6:

$ git log --oneline --topo-order -n11 ff7a672f83b355365478a1fdfb60933ef34d8d02
ff7a672f83 Merge branch 'acpica' <<<- bad
be91c44288 ACPICA: Update version to 20200110
800ba7c5ea ACPICA: All acpica: Update copyrights to 2020 Including tool signons.
fbdd256fe7 ACPICA: Update the list of maintainers
ae6252d8df ACPICA: Update version to 20191213 <<<- bad
6d232b29cf ACPICA: Dispatcher: always generate buffer objects for ASL create_field() operator <<<- bad
69e86e59ad ACPICA: acpisrc: add unix line ending support for non-windows build <<<- good
5ddbd77181 ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1 <<<- good
22e38ca735 ACPICA: debugger: fix spelling mistake "adress" -> "address"
cea79e7e2f apei/ghes: Do not delay GHES polling
b3a987b026 (tag: v5.5-rc6) Linux 5.5-rc6

$ git describe ff7a672f83 <<<- Merge branch 'acpica'
v5.5-rc6-10-gff7a672f83

$ git describe --contains ff7a672f83
v5.6-rc1~191^2~2
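
Whether a given release contains the commit can also be checked directly. A minimal sketch, assuming a clone that has both tags; the helper name is hypothetical, and "git merge-base --is-ancestor" exits with 0 when the first revision is an ancestor of the second:

```shell
#!/bin/sh
# commit_in COMMIT REV: report whether COMMIT is part of the history of REV.
commit_in() {
    if git merge-base --is-ancestor "$1" "$2"; then
        echo "$1 is contained in $2"
    else
        echo "$1 is NOT contained in $2"
    fi
}

# Examples:
#   commit_in 6d232b29cfce v5.5.17
#   commit_in 6d232b29cfce v5.6-rc1
```

Note that the helper also reports "NOT contained" if git fails for another reason, such as an unknown tag, so a surprising answer is worth double-checking with "git tag --list".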

Comment 26 William Bader 2020-05-04 04:57:39 UTC
Thanks. It took two days. I was nervous about the number of times that I had to power off my laptop after failed resumes. I did a few syncs before each suspend attempt, although I hope that suspends flush everything to disk.

A lot of the bisection tests were merges. I suppose having a lot of merges is normal for the initial commits of a new kernel branch.

What should I do next?

I started this running. I suppose that it will fail to build because the patch in the commit changed the type of a variable and added a new variable.
git bisect reset   
git checkout master <- Linux 5.7-rc2
git branch william-resume-test
git checkout william-resume-test
git revert 6d232b29cfce65961db4a668c2c6c6987cd24d45
copy fedora 5.6.6 config
make olddefconfig
emacs .config
CONFIG_LOCALVERSION=".localversion41"
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT="buildidsalt1"
make oldconfig
make

Comment 27 Steve 2020-05-04 09:51:46 UTC
(In reply to William Bader from comment #26)
> Thanks. It took two days. I was nervous about the number of times that I had to power off my laptop after failed resumes. I did a few syncs before each suspend attempt, although I hope that suspends flush everything to disk.

syncing is the first thing the kernel does, according to the attached 5.5.17 log:

Apr 30 23:38:22 kernel: PM: suspend entry (deep)
Apr 30 23:38:31 kernel: Filesystems sync: 0.040 seconds

What I would like to know is how the VM host handled all those builds. Did it overheat?
 
> A lot of the bisection tests were merges. I suppose having a lot of merges is normal for the initial commits of a new kernel branch.

In mainline, merges are almost the only source of commits. And the merges are all committed by Linus. However, the merges may contain merges, as in this example:

Merge tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=743f05732f49bacd196306de87864aa074492026
(Click the second "parent" link to see the inner merges.)

> What should I do next?
> 
> I started this running. I suppose that it will fail to build because the patch in the commit changed the type of a variable and added a new variable.

Good point about the change. acobject.h is included by accommon.h, which is included in over 100 other files:

$ git grep 'acobject.h' | head -1
drivers/acpi/acpica/accommon.h:#include "acobject.h"		/* ACPI internal object */

$ git grep 'accommon.h' | wc -l
182

Checked with:

$ git log --oneline --no-decorate -n1 HEAD
ae83d0b416 Linux 5.7-rc2

> git bisect reset   
> git checkout master <- Linux 5.7-rc2

If you have done a "git pull", HEAD will be past 5.7-rc2, so checking out "master" may not have the intended effect (although this example doesn't show it, you could end up doing a snapshot build):

$ git pull
...
 * [new tag]               v5.7-rc4   -> v5.7-rc4

$ git log --oneline --no-decorate -n1 HEAD
0e698dfa28 Linux 5.7-rc4

> git branch william-resume-test
> git checkout william-resume-test
> git revert 6d232b29cfce65961db4a668c2c6c6987cd24d45

Doing the revert in a separate branch is a good idea. That avoids changing the "master" branch.

> copy fedora 5.6.6 config
> make olddefconfig
> emacs .config
> CONFIG_LOCALVERSION=".localversion41"
> CONFIG_LOCALVERSION_AUTO=y
> CONFIG_BUILD_SALT="buildidsalt1"
> make oldconfig
> make

Comment 28 Steve 2020-05-04 10:26:47 UTC
(In reply to Steve from comment #25)
...
> I believe that the v5.5-rc6 tag here means that the 'acpica' merge was with a branch from v5.5-rc6:
> 
> $ git log --oneline --topo-order -n11
...

Confirmed with gitk and with this command:

$ git log --oneline --first-parent -n3 ff7a672f83b355365478a1fdfb60933ef34d8d02
ff7a672f83 Merge branch 'acpica'
cea79e7e2f apei/ghes: Do not delay GHES polling
b3a987b026 (tag: v5.5-rc6) Linux 5.5-rc6

That exposes a problem: Commit cea79e7e2f isn't really part of the merge, yet it got pulled in anyway.

For the record, "GHES" is "Generic Hardware Error Source" (per drivers/acpi/apei/ghes.c).

Comment 29 Steve 2020-05-04 15:48:17 UTC
(In reply to Steve from comment #27)
...
> In mainline, merges are almost the only source of commits. And the merges are all committed by Linus. However, the merges may contain merges, as in this example:
...

Here is the mainline ACPI merge committed by Linus:

Merge tag 'acpi-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=55816dc1a50463ec0ea45954e87ec3dff70e2863

195 files changed, 511 insertions, 246 deletions

These show that the merge commit was added in the merge window for v5.6-rc1:

$ git describe 55816dc1a50463ec0ea45954e87ec3dff70e2863
v5.5-614-g55816dc1a5

$ git describe --contains 55816dc1a50463ec0ea45954e87ec3dff70e2863
v5.6-rc1~191

Comment 30 William Bader 2020-05-04 16:41:39 UTC
>What I would like to know is how the VM host handled all those builds. Did it overheat?
 
The VM has one core of a 16-core rack-mounted server in a cabinet with a dedicated cooling system.
It is a little old, so per-core, it is slightly slower than my laptop on quick cpu benchmarks, but it can run a full load without overheating, while my laptop gets hot and throttles its cpu pretty quickly.

>If you have done a "git pull", HEAD will be past 5.7-rc2, so checking out "master" may not have the intended effect

How important is that? I just wanted to switch back to a recent kernel.

I looked on google. Can I update my branch with
git checkout master <- to save my work, do I need to commit my changes before checking out another branch?
git pull
git checkout william-resume-test
git merge master <- some web sites said to add --no-ff. Would 'git rebase master' also work?

The build I started last night with the revert failed, but not with a compile error.
BTF: .tmp_vmlinux.btf: pahole (pahole) is not available
Failed to generate BTF for vmlinux
Try to disable CONFIG_DEBUG_INFO_BTF
make: *** [Makefile:1106: vmlinux] Error 1

I disabled that config option and started over.

When I was doing the bisections, I was wondering if anyone wrote a script to explore the bisection space.
Git has a procedure to recover from making bisection mistakes.
Instead of doing only one kernel build per night, a script could explore the next bisections if the first one was good or bad, and maybe even the next level after that, so in the morning, I could advance two or three steps quickly without waiting for builds.
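
The look-ahead idea above can be sketched for the simplest case. This is only a rough illustration, assuming a strictly linear history where commits are numbered from a known-good index to a known-bad index; real "git bisect" works on a commit graph with merges and picks its own test points, so the function name speculative_steps and the index numbering are hypothetical and not part of git.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of git): collect the commit indices that a
 * bisection could ask for within the next `depth` steps, assuming a linear
 * history where `good` is known good and `bad` is known bad (good < bad).
 * Returns the number of indices written to `out`, which must have room for
 * (1 << depth) - 1 entries.
 */
static size_t speculative_steps(int good, int bad, int depth, int *out)
{
	size_t n = 0;
	int mid;

	assert(out != NULL);

	if (depth <= 0 || bad - good < 2)
		return 0;	/* nothing left to test in this range */

	mid = good + (bad - good) / 2;
	out[n++] = mid;		/* the commit to build first */

	/* Commits to pre-build in case `mid` turns out to be bad ... */
	n += speculative_steps(good, mid, depth - 1, out + n);
	/* ... and in case `mid` turns out to be good. */
	n += speculative_steps(mid, bad, depth - 1, out + n);

	return n;
}
```

With 16 untested steps and a depth of 2, this lists the midpoint plus the two possible follow-up midpoints, so three kernels could be built overnight and two bisection steps answered in the morning.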

Comment 31 Steve 2020-05-04 17:59:39 UTC
>The VM has one core of a 16-core rack-mounted server in a cabinet with a dedicated cooling system.
>It is a little old, so per-core, it is slightly slower than my laptop on quick cpu benchmarks, but it can run a full load without overheating, while my laptop gets hot and throttles its cpu pretty quickly.

Thanks. That sounds like "big iron" to me. :-) But the actual hardware appears to be quite compact:

PowerEdge Rack Servers
https://www.dell.com/en-us/work/shop/servers-storage-networking/sf/poweredge-rack-servers

>>If you have done a "git pull", HEAD will be past 5.7-rc2, so checking out "master" may not have the intended effect

>How important is that? I just wanted to switch back to a recent kernel.

OK. I misunderstood. The git commands for documenting exactly what code is being discussed or analyzed are cumbersome, so I have been fretting about how to simplify them. So far, this is the best I have come up with:

$ git remote -v
origin	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ (fetch)
origin	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ (push)

$ git log --oneline -n1 HEAD
bb6d3fb354 (HEAD, tag: v5.6-rc1) Linux 5.6-rc1

>I looked on google. Can I update my branch with
>git checkout master <- to save my work, do I need to commit my changes before checking out another branch?
>git pull
>git checkout william-resume-test
>git merge master <- some web sites said to add --no-ff. Would 'git rebase master' also work?

Thanks for researching that. Git won't remove your working files. If you want to temporarily move them out of the way, you can use the "git stash" command.

>The build I started last night with the revert failed, but not with a compile error.
>BTF: .tmp_vmlinux.btf: pahole (pahole) is not available
>Failed to generate BTF for vmlinux
>Try to disable CONFIG_DEBUG_INFO_BTF
>make: *** [Makefile:1106: vmlinux] Error 1

>I disabled that config option and started over.

Evidently, "pahole" is part of yet another package that needs to be installed when doing builds:

$ dnf -Cq repoquery --whatprovides \*/pahole
dwarves-0:1.12-1.fc30.x86_64
dwarves-0:1.17-1.fc30.x86_64

>When I was doing the bisections, I was wondering if anyone wrote a script to explore the bisection space.
>Git has a procedure to recover from making bisection mistakes.
>Instead of doing only one kernel build per night, a script could explore the next bisections if the first one was good or bad, and maybe even the next level after that, so in the morning, I could advance two or three steps quickly without waiting for builds.

That's a good idea. Parallel bisections on a fully-cooled, multi-core server would be awesome. Linus thinks in terms of '"light cones" in physics' and uses the term "git space":

Subject: Re: git pull on Linux/ACPI release tree
From: Linus Torvalds <...>
Date: Tue, 10 Jan 2006 11:28:58 -0800 (PST)
https://lore.kernel.org/git/Pine.LNX.4.64.0601101111110.4939@g5.osdl.org/

Comment 32 William Bader 2020-05-04 19:01:26 UTC
>PowerEdge Rack Servers

The server is Dell hardware. We have been using Dell since SCO Xenix days. Back then, Dell was small, and they worked with small companies like us. I think that my boss actually spoke with Michael Dell a few times. A few years ago we began switching to Intel NUCs.

>Evidently, "pahole" is part of yet another package that needs to be installed when doing builds:

Thanks, I will try installing dwarves.

Building with CONFIG_DEBUG_INFO_BTF disabled was bad. The kernel rpm took half an hour to build (instead of the usual 4 minutes) and ended up as 1,092,970,184 bytes instead of the usual 77MB. How could disabling debug info make it so much bigger? Did disabling CONFIG_DEBUG_INFO_BTF mean that it still included debug info but in a less compact format?

That kernel won't fit in my /boot, so I am going to restart with CONFIG_DEBUG_INFO_BTF=y as before and hope that installing the dwarf debugging support library helps.

>The git commands for documenting exactly what code is being discussed or analyzed are cumbersome

I was surprised that 5.7-rc2 built with that commit reverted and no other changes.
If that change is isolated, and later acpi changes don't depend on it, maybe it isn't necessary to understand all of the other commits in that branch.
The commit that causes problems for me seems to be specifically for the ACPI implementation in MS Surface devices. Maybe there is a more precise test that will work for Surface devices without breaking my laptop.
"More specifically, the Microsoft AML interpreter always treats buffer
fields created by the create_field() operator as buffer. ACPICA
currently does this only when the field size is larger than the
maximum integer width. This causes problems with AML code shipped in
Microsoft Surface devices."

If I can build a recent kernel with only that commit reverted, I can also check if later acpi changes also cause problems.
When I was doing the bisection, I noticed that sometimes the resume would show the splash screen and sometimes just a blue line at the bottom of the screen.

>Linus thinks in terms of '"light cones" in physics' and uses the term "git space"

That is an interesting way to look at it.
I was thinking like a chess program that explores the move tree while the opponent is thinking.

>This sounds like it is a duplicate of bug 1829096. See that bug for 2 possible workarounds (blacklist sony-laptop or remove it before suspend and modprobe it again after resume).

The source is in drivers/platform/x86/sony-laptop.c
Maybe the ACPI code there needed to be updated to correspond to the change in the commit that is causing problems.

Comment 33 Steve 2020-05-04 20:48:28 UTC
>Building with CONFIG_DEBUG_INFO_BTF disabled was bad. The kernel rpm took half an hour to build (instead of the usual 4 minutes) and ended up as 1,092,970,184 bytes instead of the usual 77MB. How could disabling debug info make it so much bigger? Did disabling CONFIG_DEBUG_INFO_BTF mean that it still included debug info but in a less compact format?

DEBUG_INFO could be enabled. That would explain the large size.

With "make nconfig" and v5.6-rc1, disabling DEBUG_INFO hides all suboptions and disables CONFIG_DEBUG_INFO_BTF.

CONFIG_DEBUG_INFO_BTF cannot be enabled independently.

There could be a bug in the Kconfig file. What kernel version do you have checked out and what config tool are you using?

Tested with:

$ git log --oneline -n1 HEAD
bb6d3fb354 (HEAD, tag: v5.6-rc1) Linux 5.6-rc1
(That's what I currently have checked out ...)

Comment 34 William Bader 2020-05-04 21:36:43 UTC
>DEBUG_INFO could be enabled. That would explain the large size.

Thanks, I think that is the problem. I forgot to unset it.
It is enabled in the Fedora config files in my /boot, but I disabled it in most of the config files that I saved from this bisection and from the webcam bisection.
I started with a Fedora 5.6 config from my laptop instead of using one of the bisection configs.
I should have put that in my notes with the other config changes like CONFIG_LOCALVERSION.

>What kernel version do you have checked out and what config tool are you using?

I think that it was my mistake for not unsetting CONFIG_DEBUG_INFO.
I am using the distributed Fedora 31 tools.

$ make --version | head -1
GNU Make 4.2.1
$ gcc --version | head -1
gcc (GCC) 9.3.1 20200317 (Red Hat 9.3.1-1)

$ git log --oneline -n2 HEAD
f2aa8feff8ba (HEAD -> william-resume-test) Revert "ACPICA: Dispatcher: always generate buffer objects for ASL create_field() operator"
ae83d0b416db (tag: v5.7-rc2, origin/master, origin/HEAD, master) Linux 5.7-rc2

I am still wondering if the change in the commit that causes the problems needed a corresponding change in the sony-laptop module.
I have the same log messages as https://bugzilla.redhat.com/show_bug.cgi?id=1829096#c5
$ dmesg | grep sony
[    6.594616] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[    6.594775] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[    6.594776] sony_laptop: couldn't set up keyboard backlight function (-22)
[    6.596874] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[    6.597829] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[    6.598011] sony_laptop: Invalid acpi_object: expected 0x1 got 0x3
[    6.598012] sony_laptop: couldn't to read the thermal profiles
[    6.598013] sony_laptop: couldn't set up thermal profile function (-22)
[    6.598623] sony_laptop: SNC setup done.


I suspect that the change to pass integer-sized fields as buffers instead of integers in drivers/acpi/acpica/exfield.c means that sony_laptop is getting pointers where it expects simple values.
drivers/platform/x86/sony-laptop.c starts with the comment "Parts of this driver inspired from asus_acpi.c and ibm_acpi.c"
None of the other *-laptop.c and *_acpi.c drivers seem to have had ACPI changes since much before the Dec 2019 timeframe of the 6d232b29cfce65961db4a668c2c6c6987cd24d45 commit that is causing problems.
I wonder if the resume problem will affect more than just Sony laptops. That might be a hint whether the fix should be in exfield.c or sony-laptop.c.

Comment 35 Maximilian Luz 2020-05-04 21:46:49 UTC
> Maybe the ACPI code there needed to be updated to correspond to the change in the commit that is causing problems.

This pretty much sums it up. The referenced patch changes the behavior of the Linux AML interpreter to match the behavior of the Windows one. Prior to this patch, AML buffer fields created via CreateField(...) were automatically converted to type Integer on return if they were small enough (<= 64 bits on a 64-bit machine). It seems that this is the behavior expected in the sony-laptop driver.

As an example: The DSDT contains the following code:

  Mutex (SNM1, 0x00)
  Mutex (SNM2, 0x00)
  Name (SNBF, Buffer (0x0410){})
  CreateField (SNBF, Zero, 0x20, SNBD)
  CreateWordField (SNBF, 0x02, CPW0)
  CreateWordField (SNBF, 0x04, CPW1)
  CreateWordField (SNBF, Zero, RCW0)
  CreateWordField (SNBF, 0x02, RCW1)

  Method (SN07, 1, NotSerialized)
  {
      Acquire (SNM2, 0xFFFF)
      Local0 = Arg0
      SNBD = Local0
      SNCM ()
      Release (SNM2)
      Return (SNBD) /* \_SB_.PCI0.LPCB.SNC_.SNBD */
  }

Prior to the patch, SN07 returned an Integer type, and calls to SN07 from sony-laptop expect this: https://elixir.bootlin.com/linux/v5.6.10/source/drivers/platform/x86/sony-laptop.c#L913. This call now fails and we see

  sony_laptop: Invalid acpi_object: expected 0x1 got 0x3

in the logs (see https://elixir.bootlin.com/linux/v5.6.10/source/drivers/platform/x86/sony-laptop.c#L774). Type 0x3 is Buffer, type 0x1 is Integer. I assume that a failing call like this could break suspend/resume if it happens during either of them.

As far as I can see, the easiest solution would be to convert the returned buffer to an integer in sony_nc_int_call.

Finally a disclaimer: I don't have an affected device, so I can't do any testing and someone should probably verify this. I've only had a quick glance at the DSDT, but from what I can see, the patch shouldn't cause any issues in the AML code itself and is solely confined to the sony-laptop driver (or any other driver accessing the ACPI like that).

Comment 36 William Bader 2020-05-04 22:57:24 UTC
>the easiest solution would be to convert the returned buffer to an integer in sony_nc_int_call.

So in sony_nc_int_call() https://elixir.bootlin.com/linux/v5.6.10/source/drivers/platform/x86/sony-laptop.c#L774
would it be adding something like this?
if (object->type == ACPI_TYPE_BUFFER) {
  if (result) {
    size_t len = MIN(object->buffer.length, sizeof(int));
    *result = 0;
    memcpy(result, object->buffer.pointer, len);
  }
}

Can the buffer length ever be different from the size of an int so byte ordering matters on the memcpy?
It looks like values are copied directly in and out of the buffers.

The build is still running to test what happens when 6d232b29cfce65961db4a668c2c6c6987cd24d45 is reverted, so I can't try this for a few hours.

Comment 37 William Bader 2020-05-04 23:27:54 UTC
The build of 5.7-rc2 with 6d232b29cfce65961db4a668c2c6c6987cd24d45 reverted completed, and suspend and resume worked, so hopefully the change to use a buffer is the only issue.

Comment 38 Steve 2020-05-04 23:34:02 UTC
Adding Mattia to CC list.

Mattia: Commit 6d232b29cfce65961db4a668c2c6c6987cd24d45 causes a Sony Vaio laptop to hang on resuming from suspend. (Comment 24, Comment 37)

The hang does not occur when sony-laptop is unloaded before suspending. (Comment 9)

$ grep -A9 'SONY VAIO CONTROL DEVICE DRIVER' MAINTAINERS 
SONY VAIO CONTROL DEVICE DRIVER
M:	Mattia Dongili <malattia@linux.it>
L:	platform-driver-x86@vger.kernel.org
S:	Maintained
W:	http://www.linux.it/~malattia/wiki/index.php/Sony_drivers
F:	Documentation/admin-guide/laptops/sony-laptop.rst
F:	drivers/char/sonypi.c
F:	drivers/platform/x86/sony-laptop.c
F:	include/linux/sony-laptop.h

Comment 39 Steve 2020-05-04 23:37:23 UTC
Mattia: Comment 38 is intended for you, but a BZ mid-air collision resulted in your email address not being added when the comment was reposted.

Comment 40 Maximilian Luz 2020-05-04 23:39:35 UTC
(In reply to William Bader from comment #36)
> So in sony_nc_int_call()
> https://elixir.bootlin.com/linux/v5.6.10/source/drivers/platform/x86/sony-laptop.c#L774
> would it be adding something like this?
> if (object->type == ACPI_TYPE_BUFFER) {
>   if (result) {
>     size_t len = MIN(object->buffer.length, sizeof(int));
>     *result = 0;
>     memcpy(result, object->buffer.pointer, len);

This should work on x86 (or little-endian systems in general), but will cause issues on big-endian systems. Not sure if that's something that has to be guaranteed for ACPI in general, but I guess it's definitely guaranteed for the sony-laptop driver, so I think it should be fine.

> Can the buffer length ever be different from the size of an int so byte
> ordering matters on the memcpy?

Yes, the size can be different. In case of CreateField(...) it can be an arbitrary number of bits, which gets rounded up to bytes by the AML interpreter. As long as the endianness matches, this code should be fine. If it's smaller than sizeof(int), you're just copying the lower bytes and since you've zeroed result before, the value will be the same. If it's larger than sizeof(int), the length check will truncate it, which on x86 will basically be wrap-around. I assume that you have some sort of guarantee on the value, otherwise the function would have result be a u64 or something else in the first place.

> It looks like values are copied directly in and out of the buffers.

As far as I can tell, this is the same as what the previous ACPI code did (and still does for Integer fields). See https://elixir.bootlin.com/linux/v5.6.10/source/drivers/acpi/acpica/exfield.c#L208. Not sure why there isn't any explicit endianness handling there.

For reference, here's how ACPI code does general buffer to integer conversions via AML's ToInteger(...) function: https://elixir.bootlin.com/linux/v5.6.10/source/drivers/acpi/acpica/exconvrt.c#L107
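
The size and byte-order cases above can be tried in userspace with a minimal sketch. It is not the driver code: the helper name buffer_to_u32_le is hypothetical, and it assembles the bytes explicitly in little-endian order instead of using memcpy, so it would give the same value on a big-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): interpret up to 4 bytes of a
 * buffer as a little-endian unsigned value.
 * - Fewer than 4 bytes: the missing high bytes are treated as zero.
 * - More than 4 bytes: the extra bytes are ignored (silent truncation).
 */
static uint32_t buffer_to_u32_le(const uint8_t *buf, size_t len)
{
	uint32_t value = 0;
	size_t i;

	assert(buf != NULL || len == 0);

	if (len > sizeof(value))
		len = sizeof(value);

	/* Byte 0 is the least significant byte, regardless of host order. */
	for (i = 0; i < len; i++)
		value |= (uint32_t)buf[i] << (8 * i);

	return value;
}
```

On x86 this should match what the memcpy version produces; the difference only shows up on a big-endian host.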

Comment 41 Maximilian Luz 2020-05-04 23:43:14 UTC
> I assume that you have some sort of guarantee on the value, otherwise the function would have result be a u64 or something else in the first place.

For a patch to be submitted you should probably add a check for truncation of the value and print or return an error.
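
A rough userspace sketch of such a truncation check, under the assumption that "truncation" means a nonzero byte that does not fit into the result. The function name buffer_to_uint_checked and the choice of -ERANGE as the error are illustrative only; an actual patch would use the driver's own conventions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): convert a little-endian byte
 * buffer to an unsigned int, refusing to drop nonzero high bytes.
 * Returns 0 on success and stores the value in *out, or -ERANGE if the
 * buffer holds a value too large for unsigned int (*out is left untouched).
 */
static int buffer_to_uint_checked(const uint8_t *buf, size_t len,
				  unsigned int *out)
{
	unsigned int value = 0;
	size_t i;

	assert(buf != NULL && out != NULL);

	for (i = 0; i < len; i++) {
		if (i >= sizeof(value)) {
			/* This byte does not fit: acceptable only if zero. */
			if (buf[i] != 0)
				return -ERANGE;
			continue;
		}
		value |= (unsigned int)buf[i] << (8 * i);
	}

	*out = value;
	return 0;
}
```

A caller could log the error and skip the affected feature instead of silently using a wrapped-around value.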

Comment 42 Steve 2020-05-05 00:07:28 UTC
(In reply to Maximilian Luz from comment #35)
> ... or any other driver accessing the ACPI like that).

Could you suggest a systematic way to find ALL the drivers that could be affected?

And why didn't the data structure change cause a compile failure?

Comment 43 Steve 2020-05-05 00:37:00 UTC
(In reply to Steve from comment #42)
> (In reply to Maximilian Luz from comment #35)
> > ... or any other driver accessing the ACPI like that).
> 
> Could you suggest a systematic way to find ALL the drivers that could be affected?

Here is a simplistic first attempt:

$ git grep -l ACPI_TYPE_BUFFER . | wc -l
111

$ git grep -l 'union.*acpi_object' . | wc -l
157

$ git log --oneline -n1 HEAD
ae83d0b416 (HEAD, tag: v5.7-rc2) Linux 5.7-rc2

Comment 44 Maximilian Luz 2020-05-05 00:41:58 UTC
(In reply to Steve from comment #42)
> Could you suggest a systematic way to find ALL the drivers that could be
> affected?

There isn't any viable way. All calls to acpi_evaluate_{integer,object,dsm} functions that expect an integer to be returned _could potentially_ be affected. Almost all of them are likely not affected. To check if a device really is affected, you'd have to get the DSDTs/ACPI tables for that device. So in the end you'd have to get the DSDTs of every device that has drivers which call to any (non-standard) ACPI function and expect an Integer to be returned. So even if we limit ourselves to platform drivers, I don't see any sane way to do that.

> And why didn't the data structure change cause a compile failure?

Because it's not really a data structure change as far as the kernel code is concerned. The change is in the value returned from ACPI. ACPI/AML is interpreted at run-time, and you get a union acpi_object back. Same C data structure as before, but instead of acpi_object.type == 0x01 (being Integer), you now get acpi_object.type == 0x03 (being Buffer).
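
To illustrate why the compiler cannot catch this, here is a minimal sketch with simplified stand-in types. The demo_* names are hypothetical and only loosely modeled on union acpi_object; the point is that the C type stays the same and only the run-time tag differs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the ACPI object type codes (assumption). */
enum demo_type { DEMO_TYPE_INTEGER = 0x01, DEMO_TYPE_BUFFER = 0x03 };

/* One C type for every kind of reply; `type` says which member is valid. */
union demo_object {
	enum demo_type type;
	struct {
		enum demo_type type;
		uint64_t value;
	} integer;
	struct {
		enum demo_type type;
		uint32_t length;
		const uint8_t *pointer;
	} buffer;
};

/*
 * A caller that only understands Integer replies. It compiles regardless of
 * what the firmware returns; the mismatch is only detectable at run time.
 */
static int demo_expect_integer(const union demo_object *obj, uint64_t *out)
{
	assert(obj != NULL && out != NULL);

	if (obj->type != DEMO_TYPE_INTEGER)
		return -EINVAL;	/* e.g. "expected 0x1 got 0x3" */

	*out = obj->integer.value;
	return 0;
}
```

Passing a Buffer-tagged object to demo_expect_integer() builds without warnings and fails only when the call is made, which mirrors the "Invalid acpi_object" messages in the log.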

Comment 45 Steve 2020-05-05 00:57:59 UTC
(In reply to Maximilian Luz from comment #44)
> (In reply to Steve from comment #42)
> > Could you suggest a systematic way to find ALL the drivers that could be affected?
> 
> There isn't any viable way. All calls to acpi_evaluate_{integer,object,dsm}
> functions that expect an integer to be returned _could potentially_ be
> affected. Almost all of them are likely not affected. To check if a device
> really is affected, you'd have to get the DSDTs/ACPI tables for that device.
> So in the end you'd have to get the DSDTs of every device that has drivers
> which call to any (non-standard) ACPI function and expect an Integer to be
> returned. So even if we limit ourselves to platform drivers, I don't see any
> sane way to do that.

Thanks. So the only way to handle that case is to log an error and fail?

Basically, that means that the kernel has become a mere DSDT validator. :-)

> > And why didn't the data structure change cause a compile failure?
> 
> Because it's not really a data structure change as far as the kernel code is
> concerned. The change is in the value returned from ACPI. ACPI/AML is
> interpreted at run-time, and you get a union acpi_object back. Same C data
> structure as before, but instead of acpi_object.type == 0x01 (being
> Integer), you now get acpi_object.type == 0x03 (being Buffer).

The kernel needs to handle ALL replies, valid or not. And the kernel should fail gracefully, not hang.

William: If you are going to propose a patch, those comments are meant for you. :-)

Comment 46 Mattia Dongili 2020-05-05 02:50:59 UTC
I added a patch in https://bugzilla.kernel.org/show_bug.cgi?id=207491#c12.

Any testing is much appreciated, I'll send it off to platform-driver-x86@vger.kernel.org later today.

Comment 47 William Bader 2020-05-05 03:30:27 UTC
Created attachment 1685073 [details]
patch against 5.7-rc2 that fixes the problem

I have attached a patch to drivers/platform/x86/sony-laptop.c that fixes the problem by enhancing sony_nc_int_call() to support ACPI_TYPE_BUFFER.

I temporarily added debug code that produced the lines below in dmesg.
After the patch, the "kernel: sony_laptop: Invalid acpi_object: expected 0x1 got 0x3" messages no longer get tripped, and it looks like all of the buffers are 4 bytes long, although I think that the code in my patch will support smaller buffer sizes as long as the values are unsigned.
[    6.630992] sony_laptop: sony_nc_int_call, got int 0x0
[    6.631054] sony_laptop: sony_nc_int_call, got int 0x0
[    6.631463] sony_laptop: sony_nc_int_call, got int 0x100
[    6.631601] sony_laptop: sony_nc_int_call, got int 0x14b
[    6.631731] sony_laptop: sony_nc_int_call, got int 0x135
[    6.631856] sony_laptop: sony_nc_int_call, got int 0x13a
[    6.631976] sony_laptop: sony_nc_int_call, got int 0x0
[    6.632099] sony_laptop: sony_nc_int_call, got int 0x0
[    6.632235] sony_laptop: sony_nc_int_call, got int 0x0
[    6.632806] sony_laptop: sony_nc_int_call, got int 0x13f
[    6.633015] sony_laptop: sony_nc_int_call, got int 0x11d
[    6.633160] sony_laptop: sony_nc_int_call, got int 0x114
[    6.633508] sony_laptop: sony_nc_int_call, got int 0x0
[    6.636841] sony_laptop: sony_nc_int_call, got int 0x0
[    6.636957] sony_laptop: sony_nc_int_call, got int 0x148
[    6.637059] sony_laptop: sony_nc_int_call, got int 0x122
[    6.641685] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x2
[    6.641847] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x103
[    6.643138] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x201
[    6.645157] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x303
[    6.646210] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x70404
[    6.647397] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x201
[    6.648500] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x503
[    6.652743] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x70604
[    6.652972] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x3
[    6.657237] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x100
[    6.657421] sony_laptop: sony_nc_int_call, got int 0xce3c
[    6.659681] sony_laptop: sony_nc_int_call, got int 0xce20
[    6.660942] sony_laptop: SNC setup done.
[    6.990260] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x70404
[    6.991883] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x70604
[    9.707736] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x70404
[  389.917343] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x2
[  389.918290] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x201
[  389.919408] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x303
[  389.920181] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x503
[  389.921419] sony_laptop: sony_nc_int_call, got buffer len 4 -> 0x100
[  389.921541] sony_laptop: sony_nc_int_call, got int 0xce3c
[  389.922128] sony_laptop: sony_nc_int_call, got int 0xce20

Comment 48 Hans de Goede 2020-05-05 09:16:25 UTC
Hi All,

Thank you all for your great work on finding the cause of this. This potentially impacts a lot of other code than just the sony-laptop driver. So I believe that this is best discussed with the various upstream maintainers involved. I plan to start a mailing list thread about this soon.

William, Steve, Mattia: I would like to add you to the Cc of the upstream discussion, but this will leak the email address you use for bugzilla to pretty much the entire world. Can you please let me know if that is OK? Or mail me a different email address to use at hdegoede@redhat.com.

Regards,

Hans

Comment 49 Hans de Goede 2020-05-05 10:07:10 UTC
*** Bug 1829096 has been marked as a duplicate of this bug. ***

Comment 50 Steve 2020-05-05 13:24:10 UTC
(In reply to Mattia Dongili from comment #46)
> I added a patch in https://bugzilla.kernel.org/show_bug.cgi?id=207491#c12.
> 
> Any testing is much appreciated, I'll send it off to platform-driver-x86@vger.kernel.org later today.

Thanks, Mattia.

William: Could you add a link at the top of this bug report (under Links -> Linux Kernel), so that the upstream bug report is easy to find?

And thanks for your proposed patch (Comment 47) and for your feedback on the upstream patch.*

Hans: I bookmarked the upstream bug report, so I don't need any email.

* https://bugzilla.kernel.org/show_bug.cgi?id=207491#c13

Comment 51 Raphael Groner 2020-05-05 16:01:19 UTC
As epel8 seems somehow to be a fork of F29 someone could guess guidelines apply from Fedora 29 or 30, too.

Comment 52 Raphael Groner 2020-05-05 16:02:59 UTC
(In reply to Raphael Groner from comment #51)
> As epel8 seems somehow to be a fork of F29 someone could guess guidelines
> apply from Fedora 29 or 30, too.

Sorry, wrong bug. Please excuse and ignore my noise.

Comment 53 William Bader 2020-05-05 16:57:06 UTC
>Could you add a link at the top of this bug report

I added the link.

>this will leak the email address

The hotmail address that I used is ok to leak.

>And why didn't the data structure change cause a compile failure?

I wondered that also.
ACPI_OBJECT_COMMON_HEADER and ACPI_COMMON_FIELD_INFO are macros that declare common fields.
The formatting with everything on the same line makes them look like types.
When I write Qt code, for example, I put Q_OBJECT on its own line.
Also, is_create_field probably could have been added after buffer_obj, which would have reduced the diff to only one added line.

git log -p 6d232b29cfce65961db4a668c2c6c6987cd24d45
drivers/acpi/acpica/acobject.h:
-       ACPI_OBJECT_COMMON_HEADER ACPI_COMMON_FIELD_INFO union acpi_operand_object *buffer_obj; /* Containing Buffer object */
+       ACPI_OBJECT_COMMON_HEADER ACPI_COMMON_FIELD_INFO u8 is_create_field;    /* Special case for objects created by create_field() */
+       union acpi_operand_object *buffer_obj;  /* Containing Buffer object */

Comment 54 Steve 2020-05-05 17:50:03 UTC
(In reply to William Bader from comment #53)
> ACPI_OBJECT_COMMON_HEADER and ACPI_COMMON_FIELD_INFO are macros that declare common fields.

Thanks for pointing that out. In C, a *struct* should be used for that:

struct acpi_object_common_header {
        union acpi_operand_object       *next_object;       /* Objects linked to parent NS node */
        u8                              descriptor_type;    /* To differentiate various internal objs */
        u8                              type;               /* acpi_object_type */
        u16                             reference_count;    /* For object deletion management */
        u8                              flags;
        /* add padding here as needed */
};

From drivers/acpi/acpica/acobject.h:

#define ACPI_OBJECT_COMMON_HEADER \
        union acpi_operand_object       *next_object;       /* Objects linked to parent NS node */\
        u8                              descriptor_type;    /* To differentiate various internal objs */\
        u8                              type;               /* acpi_object_type */\
        u16                             reference_count;    /* For object deletion management */\
        u8                              flags;
        /*
         * Note: There are 3 bytes available here before the
         * next natural alignment boundary (for both 32/64 cases)
         */

Comment 55 Steve 2020-05-05 18:30:32 UTC
(In reply to Steve from comment #54)
> In C, a *struct* should be used for that:

Admittedly, this is entirely academic*, but the 4.3BSD kernel source code uses a struct in exactly that way here:

struct dinode {
        union {
                struct  icommon di_icom;
                char    di_size[128];
        } di_un;
};

Source:

./h/inode.h
srcsys.tar.gz
https://www.tuhs.org/Archive/Distributions/UCB/4.3BSD/

* Which means that I do not advocate making any changes in the Linux kernel based on this point.

Comment 56 Maximilian Luz 2020-05-05 18:45:58 UTC
If you want to make a change like this, you should pitch this to the ACPICA developers (https://github.com/acpica/acpica/). I believe the formatting issue is caused by their automated conversion tool. Basically, ACPICA is developed and maintained externally and updated in the Linux tree after each release made by them, see https://acpica.org/downloads/linux.

Comment 57 Steve 2020-05-05 19:24:24 UTC
(In reply to Maximilian Luz from comment #56)
> If you want to make a change like this, you should pitch this to the ACPICA developers (https://github.com/acpica/acpica/). I believe the formatting issue is caused by their automated conversion tool. Basically, ACPICA is developed and maintained externally and updated in the Linux tree after each release made by them, see https://acpica.org/downloads/linux.

Thanks for your reply. I didn't know that. Those macros-that-should-be-structs are in their source code:

https://github.com/acpica/acpica/blob/master/source/include/acobject.h

Comment 58 Steve 2020-05-06 04:43:09 UTC
(In reply to William Bader from comment #34)
> drivers/platform/x86/sony-laptop.c starts with the comment "Parts of this driver inspired from asus_acpi.c and ibm_acpi.c"

Thanks for pointing that out. Those files don't exist anymore:

$ find . -name 'asus_acpi.c'
$ find . -name 'ibm_acpi.c'

With:

$ git log --oneline --no-decorate -n1 HEAD
ae83d0b416 Linux 5.7-rc2

However, write_acpi_int_ret() in asus-laptop.c looks problematic:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/x86/asus-laptop.c?h=v5.7-rc2#n355

And there are quite a few other references in drivers/platform/x86/:

$ find drivers/platform/x86/ -name '*acpi*'
drivers/platform/x86/thinkpad_acpi.c
drivers/platform/x86/toshiba_acpi.c
drivers/platform/x86/system76_acpi.c

$ fgrep ACPI_TYPE_INTEGER drivers/platform/x86/*.c | wc -l
108

$ fgrep ACPI_TYPE_BUFFER drivers/platform/x86/*.c | wc -l
37

Comment 59 William Bader 2020-05-06 06:06:00 UTC
(In reply to comment #58)

>However, write_acpi_int_ret() in asus-laptop.c looks problematic:
>fgrep ACPI_TYPE_INTEGER drivers/platform/x86/*.c

Just scrolling through asus-laptop.c:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/x86/asus-laptop.c?h=v5.7-rc2#n1220

Probably any sequence that checks for ACPI_TYPE_INTEGER without also checking for ACPI_TYPE_BUFFER with a sequence like the one below is suspicious.
if (obj && obj->type == ACPI_TYPE_INTEGER)
  *result = obj->integer.value;
else
  err = -EIO;

$ git grep 'type == ACPI_TYPE_INTEGER' | wc -l
59

Someone suggested making a shared function for getting integer values so every driver doesn't need cut-and-paste copies of the same code to test for integers and buffers.
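
A rough idea of what such a shared helper could look like, written as a userspace sketch. The demo_ types below are hypothetical stand-ins, not the kernel's union acpi_object, and demo_get_integer() is not an existing kernel or ACPICA function. The type values 1 and 3 follow the "expected 0x1 got 0x3" style of numbering, and reading the buffer as little-endian is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the kernel's union acpi_object differs. */
enum demo_type {
	DEMO_TYPE_INTEGER = 1,
	DEMO_TYPE_STRING = 2,
	DEMO_TYPE_BUFFER = 3
};

struct demo_object {
	enum demo_type type;
	uint64_t integer_value;     /* valid when type is DEMO_TYPE_INTEGER */
	const uint8_t *buffer_ptr;  /* valid when type is DEMO_TYPE_BUFFER */
	size_t buffer_len;
};

/* Accept an integer directly, or a buffer of up to 8 bytes that is
 * assembled as a little-endian value (assumption). Anything else fails. */
static int demo_get_integer(const struct demo_object *obj, uint64_t *result)
{
	uint64_t value = 0;
	size_t i;

	if (!obj || !result)
		return -EINVAL;

	switch (obj->type) {
	case DEMO_TYPE_INTEGER:
		*result = obj->integer_value;
		return 0;
	case DEMO_TYPE_BUFFER:
		if (!obj->buffer_ptr || obj->buffer_len == 0 ||
		    obj->buffer_len > sizeof(value))
			return -EIO;
		for (i = 0; i < obj->buffer_len; i++)
			value |= (uint64_t)obj->buffer_ptr[i] << (8 * i);
		*result = value;
		return 0;
	default:
		return -EIO;
	}
}
```

A driver would then call one function instead of repeating the type test. Whether oversized buffers should be rejected or truncated is a design choice; the sketch rejects them.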

>> drivers/platform/x86/sony-laptop.c starts with the comment "Parts of this driver inspired from asus_acpi.c and ibm_acpi.c"
> Thanks for pointing that out. Those files don't exist anymore:

Out of curiosity, is there a way to find out what happened to them?
I tried git log --full-history -- drivers/platform/x86/asus_acpi.c
and then looking at the patches for the first commit it returned, but that didn't show the file being removed, unless I missed something.
I was thinking to find a few lines and then grep for them in the current master.

(In reply to comment #57)

>Those macros-that-should-be-structs are in their source code:

I wonder if they were trying to emulate C++ classes, and they didn't want to have an extra layer of identifiers with structs.
Or maybe they were worried about memory usage, and they didn't want each group of fields to be padded to a struct alignment.
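
The padding guess can be checked with a small userspace sketch. The demo_ structs are hypothetical and only mirror the field list of ACPI_OBJECT_COMMON_HEADER. With a nested struct, the next member starts after the header's tail padding; with the fields written out flat (which is what the macro produces), the next member can sit in that padding:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header struct with the same fields as the macro. */
struct demo_header {
	void *next_object;
	uint8_t descriptor_type;
	uint8_t type;
	uint16_t reference_count;
	uint8_t flags;
	/* the compiler pads here up to the struct's alignment */
};

/* Struct style: "extra" starts after the header's tail padding. */
struct demo_nested {
	struct demo_header header;
	uint8_t extra;
};

/* Macro style, written out by hand: "extra" can reuse that padding. */
struct demo_flat {
	void *next_object;
	uint8_t descriptor_type;
	uint8_t type;
	uint16_t reference_count;
	uint8_t flags;
	uint8_t extra;
};
```

Comparing offsetof(struct demo_flat, extra) with offsetof(struct demo_nested, extra) shows the difference. The exact numbers depend on the ABI, so I am not quoting any here.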

(In reply to comment #56)

>I believe the formatting issue is caused by their automated conversion tool.

That is quite a rabbit hole. Wouldn't it have been easier to use C typedefs or macros instead of postprocessing the code?

https://github.com/acpica/acpica/blob/master/source/tools/acpisrc/astable.c#L162

The last time I saw code like that was in the 1980's when I used a fortran preprocessor called SFORTRAN, written by Volvo (with variable names and comments in Swedish).
It had a similar system to translate portable type names using string matching and to emulate structures (by embedding files with named common blocks and large numbers of equivalence statements).

Comment 60 Hans de Goede 2020-05-06 09:33:41 UTC
(In reply to William Bader from comment #59)
> Probably any sequence that checks for ACPI_TYPE_INTEGER without also
> checking for ACPI_TYPE_BUFFER with a sequence like the one below is
> suspicious.
> if (obj && obj->type == ACPI_TYPE_INTEGER)
>   *result = obj->integer.value;
> else
>   err = -EIO;
> 
> $ git grep 'type == ACPI_TYPE_INTEGER' | wc -l
> 59

I'm afraid that it is not that simple. The problem only happens when the DSDT code for the ACPI method being called builds its return value using a CreateField call, rather than a CreateDWordField call or assigning an integer to an object and then returning that object. So in most cases the type == ACPI_TYPE_INTEGER check will be correct and will still work. The sony code was sort of special in this regard, and with some luck it is the only case affected by this.

Comment 61 Steve 2020-05-06 11:10:36 UTC
(In reply to William Bader from comment #59)
> Someone suggested making a shared function for getting integer values so every driver doesn't need cut-and-paste copies of the same code to test for integers and buffers.

The ACPICA is thoroughly documented, and there appears to be an API:

ACPI Component Architecture
User Guide and Programmer Reference
OS-Independent Kernel Subsystem, Debugger, and Utilities
Revision 6.2
May 31, 2017
https://acpica.org/sites/acpica/files/acpica-reference_18.pdf

Links from here:

ACPICA Documentation
https://acpica.org/documentation

Comment 62 Steve 2020-05-06 11:31:31 UTC
Following William's suggestion (Comment 59), here is another one for review:

$ grep -C4 -n '==.*ACPI_TYPE_INTEGER' drivers/platform/x86/*acpi.c
drivers/platform/x86/thinkpad_acpi.c-621-
drivers/platform/x86/thinkpad_acpi.c-622-	switch (res_type) {
drivers/platform/x86/thinkpad_acpi.c-623-	case 'd':		/* int */
drivers/platform/x86/thinkpad_acpi.c-624-		success = (status == AE_OK &&
drivers/platform/x86/thinkpad_acpi.c:625:			   out_obj.type == ACPI_TYPE_INTEGER);
drivers/platform/x86/thinkpad_acpi.c-626-		if (success && res)
drivers/platform/x86/thinkpad_acpi.c-627-			*res = out_obj.integer.value;
drivers/platform/x86/thinkpad_acpi.c-628-		break;
drivers/platform/x86/thinkpad_acpi.c-629-	case 'v':		/* void */

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/x86/thinkpad_acpi.c?h=v5.7-rc2#n621

Comment 63 Steve 2020-05-06 11:40:38 UTC
Here is a complete list for review:

$ grep -C4 -n '==.*ACPI_TYPE_INTEGER' drivers/platform/x86/*.c | less

"4" is arbitrary, so adjust as needed.

Comment 64 Steve 2020-05-06 12:42:07 UTC
(In reply to William Bader from comment #59)
> Out of curiosity, is there a way to find out what happened to them [asus_acpi.c and ibm_acpi.c]?

Yes, and the commit messages are quite informative:

This commit actually deletes asus_acpi.c:

platform/x86: drop deprecated asus_acpi driver
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/platform/x86?h=v5.7-rc2&id=7ec48ceda25c6c16ab3f69b6c318d3d196f7abd0

This one merely removes the module alias:

thinkpad-acpi: drop ibm-acpi alias
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/platform/x86?h=v5.7-rc2&id=257bc1cb3e29c8da62b9c9e0a4505011776c7040

I found those by searching the mainline git repo online. Specifically, I did separate "log msg" searches for "asus_acpi" and "ibm-acpi".*

These are the exact queries:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/platform/x86?h=v5.7-rc2&qt=grep&q=asus_acpi
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/platform/x86?h=v5.7-rc2&qt=grep&q=ibm-acpi

* NB: A search for "ibm_acpi" returns no results.
                       ^

Comment 65 Mattia Dongili 2020-05-06 12:55:47 UTC
ibm-acpi was from around here: http://ibm-acpi.sf.net
It's possible that the actual inspiration was from thinkpad_acpi.c, I picked up the sony-laptop driver (once was sony-acpi) from Stelian Pop and worked on it to upstream it.

Comment 66 Steve 2020-05-06 13:29:11 UTC
(In reply to Mattia Dongili from comment #65)
> ibm-acpi was from around here: http://ibm-acpi.sf.net
> It's possible that the actual inspiration was from thinkpad_acpi.c, I picked up the sony-laptop driver (once was sony-acpi) from Stelian Pop and worked on it to upstream it.

Thanks. That link doesn't seem to be working ATM, but this page has more history:

ThinkPad ACPI Extras Driver
Version 0.25
October 16th, 2013
https://www.kernel.org/doc/html/latest/admin-guide/laptops/thinkpad-acpi.html

Could you review thinkpad_acpi.c? (Comment 62)

If it looks problematic, we will probably need to see if this bug can be reproduced with an actual ThinkPad.

Comment 67 Steve 2020-05-06 13:35:55 UTC
(In reply to Steve from comment #66)
> (In reply to Mattia Dongili from comment #65)
> > ibm-acpi was from around here: http://ibm-acpi.sf.net
...
> Thanks. That link doesn't seem to be working ATM, ...

That link only works with unencrypted connections. For finding a reproducer, would it matter which ThinkPad model is used?

Comment 68 Maximilian Luz 2020-05-06 19:57:02 UTC
(In reply to Steve from comment #67)
> For finding a reproducer, would it matter which ThinkPad model is used?

Short answer: Yes.

Long answer: If you want to fully check that driver, you need to cover each non-standard ACPI function that can possibly be called directly in the driver. Since there are different models with different capabilities (just refer to the comment "Not all thinkpads have a hardware radio switch" in thinkpad_acpi.c) and thus different ACPI functions that can be called, you will need to find a set of models that covers everything.

Comment 69 Steve 2020-05-07 00:35:53 UTC
(In reply to Maximilian Luz from comment #68)
> (In reply to Steve from comment #67)
> > For finding a reproducer, would it matter which ThinkPad model is used?
> 
> Short answer: Yes.
> 
> Long answer: If you want to fully check that driver, you need to cover each non-standard ACPI function that can possibly be called directly in the driver. Since there are different models with different capabilities (just refer to the comment "Not all thinkpads have a hardware radio switch" in thinkpad_acpi.c) and thus different ACPI functions that can be called, you will need to find a set of models that covers everything.

Thanks for your analysis. The right way to conduct such a test would be with a unit test for each kernel module and a complete array of ACPI functions as inputs. Does the ACPICA have support for unit tests on the ACPI side?

Comment 70 William Bader 2020-05-07 02:46:09 UTC
>For finding a reproducer

Is it worth looking at recent bug reports?

Bug 1485013 (Touchpad don't work after wake up from suspend) (posted a long time ago but the last comment says "Still happens on F32 using kernel 5.6.8-300.fc32.x86_64.")

Bug 1830149 ([abrt] suspend_devices_and_enter: WARNING: CPU: 3 PID: 18844 at kernel/power/suspend_test.c:53 suspend_test_finish+0x74/0x80)

Bug 1630419 (Laptop freezes when mounting on a docking station connected to external monitors.)

Bug 1831600 (Sound devices disappear after suspend)

Bug 1082211 (suspend/resume triggers Machine Check Event on Lenovo Yoga 2 pro)

Comment 71 Steve 2020-05-07 05:44:44 UTC
(In reply to William Bader from comment #70)
> >For finding a reproducer
> 
> Is it worth looking at recent bug reports?

Thanks for doing that research. What search terms did you use?

> Bug 1485013 (Touchpad don't work after wake up from suspend) (posted a long time ago but the last comment says "Still happens on F32 using kernel 5.6.8-300.fc32.x86_64.")

The last reporter has had the problem long before 5.6. (Bug 1485013, Comment 19, cites 4.18.11)

> Bug 1830149 ([abrt] suspend_devices_and_enter: WARNING: CPU: 3 PID: 18844 at kernel/power/suspend_test.c:53 suspend_test_finish+0x74/0x80)

+++ 5.6.7-300.fc32.x86_64 with dell_* modules. Log attached. Numerous duplicates (all private, though).

> Bug 1630419 (Laptop freezes when mounting on a docking station connected to external monitors.)

+++ The last report is for 5.6.8-300 with a "ThinkPad X1 Carbon 6th Gen". I asked the reporter to open a new bug and to attach a log.

> Bug 1831600 (Sound devices disappear after suspend)

5.6.8, but no log and no HW info.

> Bug 1082211 (suspend/resume triggers Machine Check Event on Lenovo Yoga 2 pro)

5.5.7. And an MCE is probably not caused by a software problem.

==
In the above, "+++" means 5.6 with HW info or log.

Comment 72 William Bader 2020-05-07 06:08:21 UTC
>What search terms did you use?

I tried a few queries: kernel, suspend, resume, wake
Then I sorted by date to find the new reports, but it showed a lot of old reports with dates that were updated by automated messages.

Could the buffer/integer issue cause less obvious problems than a failed resume?

I might be mistaken, but I think that Mattia thinks that my laptop ultimately failed to resume due to the same NULL reference reading the thermal profiles that Dominik reported, but my laptop crashes differently (or I didn't do the right logging) to capture it. If that is correct, then the resume problem is specific to the unguarded dereference of th_handle in sony_nc_thermal_resume() fixed by Mattia's second patch, and laptops from other manufacturers might not crash on resume but still might lose other functionality when they think that an ACPICA evaluation failed.

Comment 73 Steve 2020-05-07 06:34:52 UTC
(In reply to William Bader from comment #72)
> >What search terms did you use?
> 
> I tried a few queries: kernel, suspend, resume, wake
> Then I sorted by date to find the new reports, but it showed a lot of old reports with dates that were updated by automated messages.

Thanks. The automated messages make searching for recent reports much harder.

> Could the buffer/integer issue cause less obvious problems than a failed resume?

Possibly. The Call Trace in this bug report has Dell module calls and acpi calls. I've suggested a possible reproducer and a blacklisting test to Arcadiy:

Bug 1831380 - page allocation failure in acpi_ut_initialize_buffer with kernel 5.6.8. Regression? 

> I might be mistaken, but I think that Mattia thinks that my laptop ultimately failed to resume due to the same NULL reference reading the thermal profiles that Dominik reported, but my laptop crashes differently (or I didn't do the right logging) to capture it. If that is correct, then the resume problem is specific to the unguarded dereference of th_handle in sony_nc_thermal_resume() fixed by Mattia's second patch, and laptops from other manufacturers might not crash on resume but still might lose other functionality when they think that an ACPICA evaluation failed.

Dereferencing an invalid pointer could cause almost anything to happen, so if that is what happened, the symptoms may be impossible to reliably describe.

Comment 74 Mattia Dongili 2020-05-07 07:08:24 UTC
(In reply to Steve from comment #73)
> (In reply to William Bader from comment #72)
...
> > I might be mistaken, but I think that Mattia thinks that my laptop ultimately failed to resume due to the same NULL reference reading the thermal profiles that Dominik reported, but my laptop crashes differently (or I didn't do the right logging) to capture it. If that is correct, then the resume problem is specific to the unguarded dereference of th_handle in sony_nc_thermal_resume() fixed by Mattia's second patch, and laptops from other manufacturers might not crash on resume but still might lose other functionality when they think that an ACPICA evaluation failed.
> 
> Dereferencing an invalid pointer could cause almost anything to happen, so
> if that is what happened, the symptoms may be impossible to reliably
> describe.

One way to assess it's the same issue would be to apply only the second patch I submitted on top of a vanilla 5.7-rc* or 5.6.
If the NULL check fixes your laptop's resume from suspend to ram then it's a pretty good confirmation it's the same NULL deref that Dominik reported.
You should still see the "Invalid acpi_object: expected 0x1 got 0x3" warnings and especially "couldn't set up thermal profile function (-22)", but successfully resume.

I tried that exact exercise on my Vaio Pro 11 and got a successful resume where it was failing before.
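
For anyone following along, the shape of the NULL check can be sketched in userspace like this. The demo_ names are hypothetical stand-ins for the driver's state and are not the sony-laptop code itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's thermal state. */
struct demo_thermal {
	unsigned int mode;  /* profile the driver last selected */
};

static struct demo_thermal *demo_th_handle; /* stays NULL if setup failed */
static unsigned int demo_hw_mode;           /* pretend firmware state */
static int demo_set_calls;                  /* counts writes to "hardware" */

static void demo_mode_set(unsigned int mode)
{
	demo_hw_mode = mode;
	demo_set_calls++;
}

/* Resume path: only touch the handle if setup actually created it. */
static void demo_thermal_resume(void)
{
	if (demo_th_handle && demo_hw_mode != demo_th_handle->mode)
		demo_mode_set(demo_th_handle->mode);
}
```

With the handle left NULL the resume function does nothing, which corresponds to the case where setup failed earlier and the feature is simply unavailable.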

Comment 75 Steve 2020-05-07 07:35:40 UTC
>the unguarded dereference of th_handle in sony_nc_thermal_resume()

The sony-laptop driver appears to support sysfs, so it might be possible to reproduce the problem by reading or writing a suitable file in /sys/.

Specifically, the sony-laptop driver code exports these names in sony_nc_thermal_setup(): "thermal_profiles", "thermal_control".

You might be able to find them with something like this:

$ find -L /sys/class/thermal/ -maxdepth 2 2>/dev/null

And then:

# cat /sys/class/thermal/path/to/file

Or:

# echo "balanced" > /sys/class/thermal/path/to/file

The possible values are:

$ grep -A5 -m1 'THM_PROFILE_MAX' drivers/platform/x86/sony-laptop.c 
#define THM_PROFILE_MAX 3
static const char * const snc_thermal_profiles[] = {
	"balanced",
	"silent",
	"performance"
};

Comment 76 Mattia Dongili 2020-05-07 08:25:00 UTC
(In reply to Steve from comment #75)
> >the unguarded dereference of th_handle in sony_nc_thermal_resume()
> 
> The sony-laptop driver appears to support sysfs, so it might be possible to
> reproduce the problem by reading or writing a suitable file in /sys/.
> 
> Specifically, the sony-laptop driver code exports these names in
> sony_nc_thermal_setup(): "thermal_profiles", "thermal_control".
...

The thermal_control sysfs files won't be there when sony_nc_thermal_setup fails to perform the setup (which is the case without the first of my two patches).
The problem on resume is that sony-laptop discovers "handles" that the DSDT exposes in Device (SNC)[1] on first loading and those "handles" exist after resume as well.
sony-laptop managed to successfully recover on module load (so no sysfs files) but it triggered the NULL pointer deref on resume.
See the recovery in [2] and the bug in [3].

The sysfs files are like this for the record:

$ grep . /sys/devices/platform/sony-laptop/*thermal*
/sys/devices/platform/sony-laptop/thermal_control:balanced
/sys/devices/platform/sony-laptop/thermal_profiles:balanced silent performance


[1]: https://github.com/torvalds/linux/blob/master/drivers/platform/x86/sony-laptop.c#L846-L855
[2]: https://github.com/torvalds/linux/blob/master/drivers/platform/x86/sony-laptop.c#L2281
[3]: https://github.com/torvalds/linux/blob/master/drivers/platform/x86/sony-laptop.c#L2300

Comment 77 Steve 2020-05-07 13:51:50 UTC
(In reply to Mattia Dongili from comment #76)

Thanks for your very informative reply.

> so no sysfs files

Would this be a diagnostic that doesn't require a suspend/resume cycle?

$ cat /sys/devices/platform/sony-laptop/thermal_control
cat: /sys/devices/platform/sony-laptop/thermal_control: No such file or directory

(That's on my non-Sony system.)

Comment 78 Maximilian Luz 2020-05-07 14:16:50 UTC
(In reply to Steve from comment #69)
> Does the ACPICA have support for unit tests on the ACPI side?

I believe they have unit testing set-up outside of the kernel. No idea what that covers.

Comment 79 Maximilian Luz 2020-05-07 14:22:23 UTC
(In reply to William Bader from comment #72)
> Could the buffer/integer issue cause less obvious problems than a failed resume?

Yes, definitely. I believe the thinkpad-acpi driver uses ACPI calls for hotkeys, so in that case it could be something like the hotkeys not working any more. Maybe even more subtle for other drivers.

Comment 80 Maximilian Luz 2020-05-07 15:12:37 UTC
(In reply to Steve from comment #73)
> (In reply to William Bader from comment #72)
> > Could the buffer/integer issue cause less obvious problems than a failed resume?
> 
> Possibly. The Call Trace in this bug report has Dell module calls and acpi
> calls. I've suggested a possible reproducer and a blacklisting test to
> Arcadiy:
> 
> Bug 1831380 - page allocation failure in acpi_ut_initialize_buffer with
> kernel 5.6.8. Regression? 

This does not look like a match to me. First off, the error seems to occur in an allocator function. There should be no way that ACPI related code causes allocation failures when allocating 64kb as Hans indicated.

Second, the run_smbios_call in the trace expects a Buffer object (https://elixir.bootlin.com/linux/v5.6.11/source/drivers/platform/x86/dell-smbios-wmi.c#L67). Now it could be that the error path, where it expects an Integer, fails and it always detects this as Buffer and thus always thinks the method ran successfully, but then I think that the error would occur in a different place, and not in an allocator function. Also I doubt that they would return error values (what the Integer seems to represent) via a buffer field and not an Integer directly. That doesn't make sense to me. I've also found a DSDT (attached in https://bugzilla.kernel.org/show_bug.cgi?id=204251), and it doesn't look like any WMI function would return a field created via CreateField.

Comment 81 William Bader 2020-05-07 16:48:59 UTC
Created attachment 1686244 [details]
journalctl -b -1 --no-hostname -k from boot with second patch only

I tested booting 5.7-rc4 with only Mattia's second patch to guard the null pointer reference, and resume worked, but the handle wasn't null during the resume, which I think agrees with not having errors about a null pointer in dmesg on vanilla kernels on my laptop. I was expecting to see the debug code show a null pointer, so maybe there is something else going on.
I attached a log of booting, running the sequence below to enable debug, doing a suspend and resume, and then shutting down. I added some debug code, which is also captured in the log.

$ sudo rmmod sony-laptop
$ sudo lsmod | grep sony
$ sudo modprobe sony-laptop debug=1
$ sudo lsmod | grep sony
sony_laptop            65536  0
rfkill                 28672  7 bluetooth,cfg80211,sony_laptop
video                  53248  2 i915,sony_laptop

$ git diff master
diff --git a/drivers/platform/x86/sony-laptop.c b/drivers/platform/x86/sony-laptop.c
index 51309f7ceede..00a4b438ac20 100644
--- a/drivers/platform/x86/sony-laptop.c
+++ b/drivers/platform/x86/sony-laptop.c
@@ -855,6 +855,11 @@ static int sony_nc_handles_setup(struct platform_device *pd)
        }
 
        if (debug) {
+#ifdef CONFIG_PM_SLEEP
+               dprintk("sony_nc_handles_setup, CONFIG_PM_SLEEP enabled\n");
+#else
+               dprintk("sony_nc_handles_setup, CONFIG_PM_SLEEP not set\n");
+#endif
                sysfs_attr_init(&handles->devattr.attr);
                handles->devattr.attr.name = "handles";
                handles->devattr.attr.mode = S_IRUGO;
@@ -2297,7 +2302,8 @@ static void sony_nc_thermal_resume(void)
 {
        unsigned int status = sony_nc_thermal_mode_get();
 
-       if (status != th_handle->mode)
+       dprintk("sony_nc_thermal_resume, th_handle %p\n", th_handle);
+       if (th_handle && status != th_handle->mode)
                sony_nc_thermal_mode_set(th_handle->mode);
 }
 #endif

$ git log | head
commit 0e698dfa282211e414076f9dc7e83c1c288314fd
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun May 3 14:56:04 2020 -0700

    Linux 5.7-rc4

commit 262f7a6b8317a06e7d51befb690f0bca06a473ea
Merge: ea91593350ec eb91db63a90d
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun May 3 11:30:08 2020 -0700

Comment 82 William Bader 2020-05-07 17:10:21 UTC
I get the thermal devices only when I boot with the kernel that has both patches.
I have the Fedora stable kernel booted now, and corresponding to not having kbd_backlight, the keyboard backlight isn't working. I hadn't noticed earlier because I am in a room with a lot of light. I searched for 'backlight' on bugzilla but didn't find anything that seems to be related to this.

$ uname -a # both patches (integer/buffer and null pointer)
Linux laptop 5.7.0-rc4.localversion48-dirty #1 SMP Tue May 5 01:44:29 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
$ ls /sys/devices/platform/sony-laptop/
battery_care_health   driver           kbd_backlight          modalias  power      thermal_control   touchpad
battery_care_limiter  driver_override  kbd_backlight_timeout  panel_id  subsystem  thermal_profiles  uevent

$ uname -a # only second patch (null pointer)
Linux laptop 5.7.0-rc4.localversion49-dirty #2 SMP Thu May 7 10:50:35 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
$ ls /sys/devices/platform/sony-laptop/
battery_care_health battery_care_limiter driver driver_override modalias panel_id power subsystem touchpad uevent

$ uname -a # Fedora stable 5.6.8
Linux laptop 5.6.8-200.fc31.x86_64 #1 SMP Wed Apr 29 19:10:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ ls /sys/devices/platform/sony-laptop/
battery_care_health battery_care_limiter driver driver_override modalias panel_id power subsystem touchpad uevent

Comment 83 Steve 2020-05-07 18:11:50 UTC
(In reply to Maximilian Luz from comment #80)

Thanks for your careful analysis of Bug 1831380 and nice work digging up the related DSDT. You have convinced me that the other bug is not directly related to this one.

(In reply to William Bader from comment #81)

Nice work on documenting your debugging code and on the debugging code itself. I like that you provide debug output for both possible values of CONFIG_PM_SLEEP.

> the handle wasn't null during the resume

Further, it is a valid pointer, because there is no fault when it is dereferenced here:

+       if (th_handle && status != th_handle->mode)

For the record, the attached log has:

May 07 17:14:13 kernel: sony_laptop: sony_nc_thermal_resume, th_handle 00000000fb01a452

(In reply to William Bader from comment #82)

Thanks for checking /sys/devices/platform/sony-laptop/ in all three cases.

That gives a diagnostic that doesn't require a suspend/resume test. Further, it is more definitive, since the missing files are clearly related to the sony-laptop driver.
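
If someone wants to script that check, a small helper along these lines might do. It is only a sketch: demo_count_missing() is a made-up name, and the attribute list is a subset taken from the directory listings in Comment 82:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Count how many of the named attribute files are missing under dir.
 * Returns -1 if a path would not fit in the buffer. */
static int demo_count_missing(const char *dir, const char *const names[],
			      size_t n)
{
	char path[512];
	int missing = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int len = snprintf(path, sizeof(path), "%s/%s", dir, names[i]);

		if (len < 0 || (size_t)len >= sizeof(path))
			return -1;
		if (access(path, F_OK) != 0)
			missing++;
	}
	return missing;
}

/* Attribute names seen only on the kernel with both patches applied. */
static const char *const demo_sony_attrs[] = {
	"thermal_control", "thermal_profiles", "kbd_backlight",
};
```

Calling demo_count_missing("/sys/devices/platform/sony-laptop", demo_sony_attrs, 3) on a Sony laptop should presumably return 0 on a fixed kernel and a positive count on an affected one. On a non-Sony system everything is reported missing, so the result only means something where the driver is loaded.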

Comment 84 William Bader 2020-05-20 04:56:17 UTC
Is there any way to check the status of the patch?
Mattia added me to the CC list of his submission, and it was accepted about two weeks ago.
It doesn't appear to be included in 5.7-rc6 from May 17.

Comment 85 Mattia Dongili 2020-05-20 06:13:37 UTC
I sent a follow-up on the platform-driver-x86 mailing list. I don't see the patches in either Linus' or the platform-driver-x86 tree.

Comment 86 Steve 2020-05-20 17:20:54 UTC
Here are Mattia's patches on patchwork. They are both in state "Accepted, archived":

[v4,0/2] Two fixes for one sony-laptop reported bug on 5.6
https://patchwork.kernel.org/cover/11535147/

Click Related: "show" to see each one:

[v4,1/2] platform/x86: sony-laptop: SNC calls should handle BUFFER types
https://patchwork.kernel.org/patch/11535143/

[v4,2/2] platform/x86: sony-laptop: Make resuming thermal profile safer
https://patchwork.kernel.org/patch/11535145/

Comment 87 Steve 2020-05-20 17:38:41 UTC
I don't think Andy has sent a pull request yet:

https://www.google.com/search?hl=en&as_q=Andy+Shevchenko+pull+platform-drivers-x86

Comment 88 Steve 2020-05-20 17:54:42 UTC
I found them in linux-next:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?qt=grep&q=sony-laptop

linux-next is linked from here:

https://www.kernel.org/

Comment 89 Steve 2020-05-20 18:39:13 UTC
(In reply to Steve from comment #88)
> I found them in linux-next:

BTW, the linux-next git repo is interesting because it is a semi-automatic "merg[e] [of] 316 trees (counting Linus' and 78 trees of bug fix patches pending for the current merge release)."

Date	Sun, 5 Apr 2020 14:49:19 +1000
From	Stephen Rothwell <>
Subject	linux-next: Tree for Apr 5
https://lkml.org/lkml/2020/4/5/1

The Next/Trees file lists all the trees that are merged:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Next/Trees

For example:

$ grep platform-drivers-x86 Trees-1.txt 
drivers-x86-fixes	git	git://git.infradead.org/linux-platform-drivers-x86.git#fixes
drivers-x86	git	git://git.infradead.org/linux-platform-drivers-x86.git#for-next

Comment 90 William Bader 2020-05-20 19:26:26 UTC
Thanks for the links!

Comment 91 Hans de Goede 2020-05-20 20:19:49 UTC
Yes, it looks like this patch is a bit slow finding its way upstream. Since it fixes a serious issue, I've added it as a downstream patch to the Fedora kernels for now.

The fix will be included in the 5.6.14 Fedora kernel update which is building now and will show up in updates-testing in about a day or so.

Comment 92 William Bader 2020-05-20 22:48:28 UTC
Hans, thanks! I asked about it because I have kept running the official Fedora kernels, and yesterday I closed the lid of my laptop by accident, and oops...

Comment 93 Fedora Update System 2020-05-21 13:41:25 UTC
FEDORA-2020-57bf620276 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-57bf620276

Comment 94 Fedora Update System 2020-05-21 13:41:48 UTC
FEDORA-2020-0c0b5d9004 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-0c0b5d9004

Comment 95 Fedora Update System 2020-05-21 13:42:15 UTC
FEDORA-2020-320f05784e has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2020-320f05784e

Comment 96 Fedora Update System 2020-05-22 03:02:23 UTC
FEDORA-2020-0c0b5d9004 has been pushed to the Fedora 31 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-0c0b5d9004`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-0c0b5d9004

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 97 Fedora Update System 2020-05-22 04:24:04 UTC
FEDORA-2020-57bf620276 has been pushed to the Fedora 32 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-57bf620276`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-57bf620276

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 98 Fedora Update System 2020-05-22 04:54:39 UTC
FEDORA-2020-320f05784e has been pushed to the Fedora 30 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-320f05784e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-320f05784e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 99 Fedora Update System 2020-05-25 02:46:42 UTC
FEDORA-2020-57bf620276 has been pushed to the Fedora 32 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 100 Fedora Update System 2020-05-30 02:04:13 UTC
FEDORA-2020-5436586091 has been pushed to the Fedora 31 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-5436586091`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-5436586091

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 101 Fedora Update System 2020-06-02 03:13:41 UTC
FEDORA-2020-5436586091 has been pushed to the Fedora 31 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 102 William Bader 2020-06-03 14:55:46 UTC
Kernel 5.6.15-200.fc31.x86_64 is working for me. Thanks!

Comment 103 Mattia Dongili 2020-06-17 00:03:58 UTC
For the record, the patches have just been picked up for the upcoming 5.6 and 5.7 stable releases.

