Created attachment 1674785 [details] kernel log from journalctl --no-hostname -k 1. Please describe the problem: The kernel does not find the built-in webcam. My laptop is a Sony Vaio, product VPCCB4Q1E, model PCG-71D14M. 2. What is the Version-Release number of the kernel: 5.5.11-200.fc31.x86_64 3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 : It worked fairly recently because I remember the webcam showing as an option in xsane. I don't normally use the webcam, but with the virus, I have had to use Zoom, and I noticed the Zoom doesn't find it, and other applications like cheese also don't find it. Going deeper, there is no /dev/video. cheese gives the messages below: ** Message: 19:50:31.949: cheese-application.vala:214: Error during camera setup: No device found (cheese:15772): cheese-CRITICAL **: 19:50:31.963: cheese_camera_device_get_name: assertion 'CHEESE_IS_CAMERA_DEVICE (device)' failed (cheese:15772): GLib-CRITICAL **: 19:50:31.963: g_variant_new_string: assertion 'string != NULL' failed (cheese:15772): GLib-CRITICAL **: 19:50:31.963: g_variant_ref_sink: assertion 'value != NULL' failed (cheese:15772): GLib-GIO-CRITICAL **: 19:50:31.963: g_settings_schema_key_type_check: assertion 'value != NULL' failed (cheese:15772): GLib-CRITICAL **: 19:50:31.963: g_variant_get_type_string: assertion 'value != NULL' failed (cheese:15772): GLib-GIO-CRITICAL **: 19:50:31.963: g_settings_set_value: key 'camera' in 'org.gnome.Cheese' expects type 's', but a GVariant of type '(null)' was given (cheese:15772): GLib-CRITICAL **: 19:50:31.963: g_variant_unref: assertion 'value != NULL' failed ** (cheese:15772): CRITICAL **: 19:50:31.963: cheese_preferences_dialog_setup_resolutions_for_device: assertion 'device != NULL' failed 4. Can you reproduce this issue? 
If so, please provide the steps to reproduce the issue below: The kernel never finds the webcam. It happens every time the system boots. I think that these are the relevant lines from /var/log/messages: Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: new high-speed USB device number 4 using ehci-pci Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: config 247 has too many interfaces: 120, using maximum allowed: 32 Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: config 247 descriptor has 1 excess byte, ignoring Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: config 247 has 0 interfaces, different from the descriptor's value: 120 Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: New USB device found, idVendor=05ca, idProduct=18c0, bcdDevice= 7.32 Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: Product: USB2.0 Camera Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: Manufacturer: Ricoh Company Ltd. Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: can't set config #247, error -32 lsusb shows Bus 002 Device 003: ID 093a:2510 Pixart Imaging, Inc. Optical Mouse Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 001 Device 004: ID 05ca:18c0 Ricoh Co., Ltd Bus 001 Device 005: ID 0489:e036 Foxconn / Hon Hai Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub 5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``: I haven't tried, but I can try if someone requests. 6. 
Are you running any modules that are not shipped directly with Fedora's kernel?: No, not as far as I know. Did Fedora 31 change something so that I now need to load firmware like http://vaio-utils.org/camera/ ? 7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag. ok
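The relevant kernel messages in the report all carry the port name "usb 1-1.3" and an "error -32" code (errno 32 is EPIPE, which the kernel's USB documentation describes as a stalled endpoint). A minimal sketch for pulling only those lines out of a saved log follows; the function name usb_port_errors is hypothetical and not part of any Fedora tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print kernel log lines for one USB port that report an error.
# Reads log text on stdin; the port name (for example 1-1.3) is the first argument.
usb_port_errors() {
    local port=$1
    # -F treats the port name literally, so the "." in 1-1.3 is not a wildcard.
    grep -F "usb ${port}:" | grep -E 'error -[0-9]+'
}

# Possible usage on a live system (not run here):
#   journalctl --no-hostname -k | usb_port_errors 1-1.3
```

Filtering on the port name rather than on "Camera" is deliberate: in the failing case the product string may never be read, so the port is the only stable identifier.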
Thanks for your report. There is a newer kernel in updates-testing: kernel-5.5.13-200.fc31 https://bodhi.fedoraproject.org/updates/FEDORA-2020-809ff0b166 # dnf update kernel --enablerepo=updates-testing There is also a new F32 kernel here: kernel-5.6.0-300.fc32 and kernel-headers-5.6.0-300.fc32 https://bodhi.fedoraproject.org/updates/FEDORA-2020-e8b6474ee5 The full list is here: https://bodhi.fedoraproject.org/updates/?packages=kernel
Updating from updates-testing would take the kernel from 5.5.11 to 5.5.13. The changelog for 5.5.12 doesn't show anything about cameras, and the changelog for 5.5.13 is very short. Is there any specific change in either one that you suspect would help? Is there an easy way to test the 5.6 Fedora 32 kernel without updating to Fedora 32 beta? The webcam is not critical, and I don't want to risk breaking Linux on my laptop. Will the 5.6 kernel come to Fedora 31 soon? Regards, William
(In reply to William Bader from comment #2) > Updating from updates-testing would take the kernel from 5.5.11 to 5.5.13. > The changelog for 5.5.12 doesn't show anything about cameras, and the changelog for 5.5.13 is very short. > Is there any specific change in either one that you suspect would help? Testing whether a newer kernel fixes a problem is fairly standard operating procedure, but if you prefer to wait until 5.5.13 reaches "stable" that is fine. > Is there an easy way to test the 5.6 Fedora 32 kernel without updating to Fedora 32 beta? Yes, very easy, but I would suggest trying 5.5.13 first. > The webcam is not critical, and I don't want to risk breaking Linux on my laptop. The "karma" reports on Bodhi for 5.5.13 are all positive: https://bodhi.fedoraproject.org/updates/FEDORA-2020-809ff0b166 And if there is a problem, you can still boot 5.5.11 from the grub2 menu. You have "rhgb quiet" on your kernel command-line, so you may need to press and hold, or repeatedly tap, the "Esc" key to see the grub2 menu. Please post back if you have trouble getting to the grub2 menu. (You can try that without installing a new kernel.) > Will the 5.6 kernel come to Fedora 31 soon? I don't know. The best that I can suggest is to monitor the kernel builds on Koji: https://koji.fedoraproject.org/koji/packageinfo?packageID=8 > Regards, William
Created attachment 1674891 [details] kernel 5.5.13 log with journalctl --no-hostname -k Updating to the 5.5.13-200.fc31.x86_64 kernel from updates-testing didn't help. It still has messages like 'usb 1-1.3: device descriptor read/64, error -32', which I think is the built-in webcam. I attached a new boot log. I updated the kernel with the command 'dnf update kernel --enablerepo=updates-testing' Do I have to do anything to eventually get back to the stable kernel? Will dnfdragora switch to the stable kernel once the stable kernel has a higher version than 5.5.13? I get a list of kernels for a few seconds when I reboot, so I don't have a pressing need to remove the updates-testing kernel. Regards, William
(In reply to William Bader from comment #4) > Created attachment 1674891 [details] > kernel 5.5.13 log with journalctl --no-hostname -k > > Updating to the 5.5.13-200.fc31.x86_64 kernel from updates-testing didn't > help. It still has messages like 'usb 1-1.3: device descriptor read/64, > error -32', which I think is the built-in webcam. I attached a new boot log. Thanks for testing and for attaching the log. > I updated the kernel with the command 'dnf update kernel --enablerepo=updates-testing' > Do I have to do anything to eventually get back to the stable kernel? > Will dnfdragora switch to the stable kernel once the stable kernel has a higher version than 5.5.13? Actually, 5.5.13 will become the stable kernel, unless there are serious problems with it, which there don't appear to be. If you would prefer to go back to using 5.5.11, you can keep choosing it in the grub2 menu, or you can remove 5.5.13 and wait for the update. To remove 5.5.13, you can use a very nice feature of dnf -- the "history" command: # dnf history info last # This will show you the last thing done by dnf. # dnf history undo last # This will undo the last thing done by dnf. Just as with updates, dnf will ASK before doing anything, so you can always answer "N". NB: Boot from 5.5.11 before undoing, because dnf won't let you remove the currently running kernel. Documentation: "man dnf". > I get a list of kernels for a few seconds when I reboot, so I don't have a pressing need to remove the updates-testing kernel. Good. The default timer is for 5 seconds, but if you tap any key, the timer will stop counting down, so you can look at the grub2 menu for as long as you like. :-) > Regards, William
You can also use a transaction ID to undo a particular dnf transaction: # dnf history | head # dnf history undo NNN # Where NNN is a transaction ID from the above listing.
(In reply to Steve from comment #3) ... > > Is there an easy way to test the 5.6 Fedora 32 kernel without updating to Fedora 32 beta? > > Yes, very easy, but I would suggest trying 5.5.13 first. ... Actually, you could test F32-Beta as a Live image by downloading the ISO file and installing it on a USB flash drive or a DVD. This has an F32-Beta download link and instructions for putting the Live image on a bootable device: https://getfedora.org/en/workstation/download/ The F32-Beta Live ISO image file name is: Fedora-Workstation-Live-x86_64-32_Beta-1.2.iso. The kernel is: $ uname -r 5.6.0-0.rc5.git0.2.fc32.x86_64
Well, 5.5.13 seems to be even worse than 5.5.11. The "Manufacturer:" isn't even reported. If you still have 5.5.13 installed, could you post the output from "lsusb" for comparison with Comment 0?

$ egrep -n 'Command line:|usb 1-1.3' dmesg-1.txt
4:Mar 30 15:00:58 kernel: Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.5.11-200.fc31.x86_64 root=UUID=01ea3428-c96d-4f4c-af30-2072ce724031 ro rhgb quiet elevator=noop LANG=en_US.UTF-8 mitigations=off
749:Mar 30 15:00:58 kernel: usb 1-1.3: new high-speed USB device number 4 using ehci-pci
750:Mar 30 15:00:58 kernel: usb 1-1.3: config 247 has too many interfaces: 120, using maximum allowed: 32
751:Mar 30 15:00:58 kernel: usb 1-1.3: config 247 descriptor has 1 excess byte, ignoring
752:Mar 30 15:00:58 kernel: usb 1-1.3: config 247 has 0 interfaces, different from the descriptor's value: 120
753:Mar 30 15:00:58 kernel: usb 1-1.3: New USB device found, idVendor=05ca, idProduct=18c0, bcdDevice= 7.32
754:Mar 30 15:00:58 kernel: usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
755:Mar 30 15:00:58 kernel: usb 1-1.3: Product: USB2.0 Camera
756:Mar 30 15:00:58 kernel: usb 1-1.3: Manufacturer: Ricoh Company Ltd.
757:Mar 30 15:00:58 kernel: usb 1-1.3: can't set config #247, error -32

$ egrep -n 'Command line:|usb 1-1.3' dmesg-2.txt
4:Mar 31 00:54:18 kernel: Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.5.13-200.fc31.x86_64 root=UUID=01ea3428-c96d-4f4c-af30-2072ce724031 ro rhgb quiet elevator=noop LANG=en_US.UTF-8 mitigations=off
750:Mar 31 00:54:18 kernel: usb 1-1.3: new full-speed USB device number 4 using ehci-pci
751:Mar 31 00:54:18 kernel: usb 1-1.3: device descriptor read/64, error -32
755:Mar 31 00:54:18 kernel: usb 1-1.3: device descriptor read/64, error -32
761:Mar 31 00:54:18 kernel: usb 1-1.3: new full-speed USB device number 5 using ehci-pci
763:Mar 31 00:54:18 kernel: usb 1-1.3: device descriptor read/64, error -32
764:Mar 31 00:54:18 kernel: usb 1-1.3: device descriptor read/64, error -32
777:Mar 31 00:54:19 kernel: usb 1-1.3: new full-speed USB device number 6 using ehci-pci
818:Mar 31 00:54:19 kernel: usb 1-1.3: device not accepting address 6, error -32
823:Mar 31 00:54:19 kernel: usb 1-1.3: new full-speed USB device number 7 using ehci-pci
824:Mar 31 00:54:20 kernel: usb 1-1.3: device not accepting address 7, error -32
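One difference between the two logs is the negotiated speed: the 5.5.11 boot enumerates the camera as "high-speed", while the 5.5.13 boot only reports "full-speed" before failing. A rough sketch for extracting that field from a saved log, so the two boots can be compared at a glance, could look like this; usb_port_speed is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct speeds at which a USB port enumerated.
# Reads kernel log text on stdin; the port name (for example 1-1.3) is the first argument.
usb_port_speed() {
    local port=$1
    awk -v tag="usb ${port}: new " '
        (i = index($0, tag)) {
            # The word right after "new " is the speed, e.g. "high-speed".
            rest = substr($0, i + length(tag))
            split(rest, w, " ")
            print w[1]
        }' | sort -u
}

# Possible usage with the attached logs (file names taken from the comment above):
#   usb_port_speed 1-1.3 < dmesg-1.txt
#   usb_port_speed 1-1.3 < dmesg-2.txt
```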
The kernel command-line has two non-standard options: elevator=noop # What is this for? mitigations=off # Disable all optional CPU mitigations. Could you try removing them from the kernel command-line in grub2 and booting without them. (Press "e" while the grub2 menu is displayed to edit the kernel command-line. The change is not permanent.) Also, let's make sure the kernel isn't tainted: $ cat /proc/sys/kernel/tainted ("0" means untainted.) Documentation: The kernel’s command-line parameters https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html Tainted kernels https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html
(In reply to Steve from comment #9) ... > Also, let's make sure the kernel isn't tainted: > > $ cat /proc/sys/kernel/tainted > > ("0" means untainted.) ... Could you disable or remove the vboxdrv kernel module for all future testing: $ fgrep -in taint dmesg* dmesg-1.txt:864:Mar 30 15:01:02 kernel: vboxdrv: loading out-of-tree module taints kernel. dmesg-1.txt:865:Mar 30 15:01:02 kernel: vboxdrv: module verification failed: signature and/or required key missing - tainting kernel
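The value in /proc/sys/kernel/tainted is a bitmask, so a non-zero number can be decoded into individual bits. A minimal sketch follows; taint_bits is a hypothetical helper, and the meaning of each bit should be looked up in the tainted-kernels document linked above. According to that document, bit 12 marks an out-of-tree module and bit 13 an unsigned module, which matches the two vboxdrv messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the numbers of the bits set in a kernel taint value.
# Prints "none" for 0, which means the kernel is untainted.
taint_bits() {
    local value=$1 bit=0 out=""
    while [ "$value" -gt 0 ]; do
        if [ $(( value & 1 )) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        value=$(( value >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "${out:-none}"
}

# Possible usage on a live system (not run here):
#   taint_bits "$(cat /proc/sys/kernel/tainted)"
```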
Thanks for the reply. I still have the 5.5.13 kernel booted. $ lsusb Bus 002 Device 003: ID 093a:2510 Pixart Imaging, Inc. Optical Mouse Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 001 Device 003: ID 0489:e036 Foxconn / Hon Hai Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub That is worse than before. It doesn't even identify the webcam. $ cat /proc/sys/kernel/tainted 0 The new kernel didn't load the vboxdrv module. Maybe the 5.5.11 stable module isn't compatible with 5.5.13, but shouldn't dnf have warned about that? Years ago when I got this laptop, I found a shop that let me boot a Fedora live CD to check that it would work. I have had other laptops that needed blobs with proprietary drivers for the video and the wifi. elevator=noop # What is this for? It is for a simple FIFO I/O scheduler. When I got the laptop, I replaced the Windows drive with an SSD, and at that time, the default Linux I/O scheduler did a lot of work to optimize for hard disks that was unnecessary on an SSD. It made a difference then, but it is probably not needed now. mitigations=off # Disable all optional CPU mitigations. It is a little risky, but my laptop lives on a home LAN. I have disabled all servers listening on outside ports. I have the firewall set to block every port that I don't need. The laptop has an i5-2450M CPU, which has all of the flaws, and enabling mitigations makes it run a few percent slower and slightly hotter. I made a file /etc/modprobe.d/blacklist-vboxdrv.conf with the line "blacklist vboxdrv". I used VirtualBox for a project a few years ago, and I have no immediate need for it. 
>Actually, 5.5.13 will become the stable kernel, unless there are serious problems with it, which there don't appear to be. So I can just leave it installed, and the stable will eventually catch up. >Actually, you could test F32-Beta as a Live image by downloading the ISO file and installing it on a USB flash drive or a DVD. I can't because I am in an area with coronavirus lockdown, and I don't have a blank DVD and I don't have a flash drive large enough. That is why I want to be very cautious about what I do. I'll reboot and see what happens.
Created attachment 1674987 [details] kernel 5.5.13 log with mitigation and no vbox with journalctl --no-hostname -k I rebooted 5.5.13 with the vbox module blacklisted and with the boot command edited to remove the options that you mentioned. It still didn't find the webcam.
(In reply to William Bader from comment #12) > Created attachment 1674987 [details] > kernel 5.5.13 log with mitigation and no vbox with journalctl --no-hostname > -k > > I rebooted 5.5.13 with the vbox module blacklisted and with the boot command > edited to remove the options that you mentioned. It still didn't find the > webcam. Thanks for testing that configuration, and for the "lsusb" output. Let's try an older kernel. This is the F31 release kernel: $ dnf -q repoquery kernel --repo=fedora kernel-0:5.3.7-301.fc31.x86_64 If that is not listed in the grub2 menu, you can boot the "rescue" kernel. It's the one with the long number in the name. You can check the kernel version with: $ file /boot/vmlinuz-0-rescue-* Please do not post the full name with the long number, it is your machine id and is considered confidential: "man machine-id", "/etc/machine-id".
(In reply to William Bader from comment #11) ... > Years ago when I got this laptop, I found a shop that let me boot a Fedora live CD to check that it would work. ... Do you have any older Live CDs or DVDs that you could try? Also, there is no Zoom app in Fedora. Where is that from? The idea would be to install Zoom into a Live image environment, assuming the Live image gives you a working camera. Although it would not survive a reboot, if the Zoom app is compatible with the Live image, using a Live image might serve as a workaround.
Created attachment 1675918 [details] photo of boot from Fedora 31 live CD with the webcam working The 5.3.7-301.fc31.x86_64 kernel on the Fedora 31 live CD works. It creates /dev/video0 and applications like 'cheese' work. So the problem is a kernel regression somewhere between 5.3.7 and 5.5.11. I have attached a photo as proof, and the photo has the kernel version numbers. I am running zoom by browsing to the meeting URL on chrome. It then uses xdg to run an executable, I think from the package zoom-3.5.372466.0322-1.x86_64 but the underlying problem is a kernel regression that fails to create /dev/video. Theoretically I could install zoom on the live CD, but my laptop has only 8GB RAM, and there isn't much space left on the live filesystem, and I would rather not mount my system disk from the live CD. I can use my phone for a zoom meeting work-around, but audio-only on my laptop is fine. Is there a way to install kernel binaries to try to bisect the version with the webcam regression? Is it OK to install only the kernel package, or do the corresponding kernel-core, kernel-headers, and kernel-modules also have to be installed? I am on virus lockdown, and I don't have a blank DVD or a spare pen drive, or an easy way to get one. I've always tested the live system on a DVD because I have never been successful getting my laptop to boot from a pen drive. While I'm thinking about it, shutting down the live CD is a pain. Once my laptop powers down, it doesn't listen to the eject button, and if I reboot, it starts reading the CD before checking the eject button. The live CD should have a short boot delay to make time to eject the CD and boot from the system disk. I end up either ejecting the CD while the live system is running and using the power button to turn off the laptop or else shutting down and then poking the drive with a paperclip. Regards, William
(In reply to William Bader from comment #15) > Created attachment 1675918 [details] > photo of boot from Fedora 31 live CD with the webcam working > > The 5.3.7-301.fc31.x86_64 kernel on the Fedora 31 live CD works. > It creates /dev/video0 and applications like 'cheese' work. > > So the problem is a kernel regression somewhere between 5.3.7 and 5.5.11. > I have attached a photo as proof, and the photo has the kernel version numbers. Thanks for testing with the Live CD and for the screenshot. > I am running zoom by browsing to the meeting URL on chrome. It then uses xdg > to run an executable, I think from the package zoom-3.5.372466.0322-1.x86_64 > but the underlying problem is a kernel regression that fails to create > /dev/video. Theoretically I could install zoom on the live CD, but my laptop > has only 8GB RAM, and there isn't much space left on the live filesystem, > and I would rather not mount my system disk from the live CD. You could install directly into the Live image without mounting your system disk. However, "zoom" doesn't seem to be a Fedora package. Do you have some non-Fedora repos configured? $ dnf repolist > I can use my phone for a zoom meeting work-around, but audio-only on my laptop is fine. > > Is there a way to install kernel binaries to try to bisect the version with the webcam regression? Yes. I will post a followup comment on that subject. > Is it OK to install only the kernel package, or do the corresponding > kernel-core, kernel-headers, and kernel-modules also have to be installed? > > I am on virus lockdown, and I don't have a blank DVD or a spare pen drive, > or an easy way to get one. I've always tested the live system on a DVD > because I have never been successful getting my laptop to boot from a pen > drive. While I'm thinking about it, shutting down the live CD is a pain. > Once my laptop powers down, it doesn't listen to the eject button, and if I > reboot, it starts reading the CD before checking the eject button. 
The live > CD should have a short boot delay to make time to eject the CD and boot from > the system disk. I end up either ejecting the CD while the live system is > running and using the power button to turn off the laptop or else shutting > down and then poking the drive with a paperclip. There is usually a special key that you can press to get a list of boot devices. > Regards, William
(In reply to William Bader from comment #15)
...
> Is there a way to install kernel binaries to try to bisect the version with the webcam regression?
>
> Is it OK to install only the kernel package, or do the corresponding kernel-core, kernel-headers, and kernel-modules also have to be installed?
...

All of the kernel builds are here:
https://koji.fedoraproject.org/koji/packageinfo?packageID=8

You probably only need kernel-core and kernel-modules. You might also need kernel-modules-extra. (It's hard to say without testing.)

The "kernel" package is a meta-package that pulls in other packages. It doesn't have any files in it and you don't need to install it for testing purposes:

$ rpm -ql kernel-5.5.13-200.fc31.x86_64
(contains no files)

However, dnf will try to remove old kernels unless you make this change:

$ grep installonly_limit /etc/dnf/dnf.conf
#installonly_limit=3
installonly_limit=0

Download into an empty directory and install with:

# dnf install kernel*.rpm

Documentation: "man dnf.conf"
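Because the grep above also prints the commented-out default, it can be handy to show only the value that is actually in effect. This is a sketch under two assumptions: that the last uncommented assignment in the file is the one dnf uses, and that no other configuration source overrides it. The function name effective_installonly_limit is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last uncommented installonly_limit value in a dnf.conf.
# Assumption: the last assignment wins; commented lines (starting with #) are ignored.
effective_installonly_limit() {
    local conf=${1:-/etc/dnf/dnf.conf}
    sed -n 's/^[[:space:]]*installonly_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1
}

# Possible usage (not run here):
#   effective_installonly_limit            # reads /etc/dnf/dnf.conf
#   effective_installonly_limit ./dnf.conf # reads a copy
```

No output means the option is not set explicitly, in which case dnf's built-in default applies (see "man dnf.conf").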
(In reply to Steve from comment #16) ... > There is usually a special key that you can press to get a list of boot devices. ... Your manual would probably say which key to press. This Sony document says to repeatedly press the F11 key while booting (in the section titled "To recover from Recovery Media"): https://www.sony.com/electronics/support/res/manuals/Z019/Z019650111.PDF
(In reply to William Bader from comment #15) ... > I am running zoom by browsing to the meeting URL on chrome. It then uses xdg > to run an executable, I think from the package zoom-3.5.372466.0322-1.x86_64 > but the underlying problem is a kernel regression that fails to create > /dev/video. Theoretically I could install zoom on the live CD, but my laptop > has only 8GB RAM, and there isn't much space left on the live filesystem, > and I would rather not mount my system disk from the live CD. ... If "zoom" is a standalone package, you could put it on a USB flash drive and mount that in the Live environment.
Steve, thanks for the replies. If I can find the kernel with the regression, is there any chance that someone would try to look into what happened? A long time ago I wrote MSDOS programs that used C and masm to write to video ram. If I can narrow the regression to a few commits, I might get lucky and see what caused the problem. The messages starting 'kernel: usb 1-1.3: config 247' come from https://github.com/torvalds/linux/blob/v5.4/drivers/usb/core/config.c#L582 or from the later https://github.com/torvalds/linux/blame/v5.5/drivers/usb/core/config.c#L631 Could the webcam hardware be doing something bad? https://github.com/torvalds/linux/commit/3dd550a2d36596a1b0ee7955da3b611c031d3873 (but this is for 5.3.0 and 5.3.7 works) https://github.com/torvalds/linux/blame/v5.5/drivers/usb/core/config.c#L645 (but this was in since 2.6.12. Is it worth booting again with the live CD to see if it gets the 'config 247 descriptor has 1 excess byte, ignoring'?) Do you think that any of the issues below could be the same as mine: https://bugzilla.kernel.org/show_bug.cgi?id=111291 (uvcvideo 1-1.4:1.0: Entity type for entity Extension 4 was not initialized! / modified 2020-01-19) https://bugzilla.kernel.org/show_bug.cgi?id=199715 (hp_accel: probe of HPQ6007:00 failed with error -22 (HP Envy x360) / modified 2020-04-01) https://bugzilla.kernel.org/show_bug.cgi?id=205271 (Internal Webcams of Samsung Galaxy Tab (W728N) not working / reported 2019-10-20) https://bugzilla.kernel.org/show_bug.cgi?id=206357 (Linux Kernel 5.4.7 - vgacon_invert_region use-after-free / reported 2020-01-30) If I want to try building a kernel from source, is the best way https://fedoraproject.org/wiki/Building_a_custom_kernel#Building_a_Kernel_from_the_Fedora_source_tree ? Regards, William
(In reply to William Bader from comment #20) Since you already know about git, and have a reliable reproducer, I would suggest doing a kernel bisection. That involves repeatedly building kernels with various git commits included or excluded: Bisecting a bug https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html Artem did that and identified the commit that caused an early boot hang: Bug 1790115 - [CRITICAL REGRESSION] Fedora's configuration of kernel >= 5.4 is not bootable He also opened a bug upstream: Bug_206175 - Fedora >= 5.4 kernels instantly freeze on boot without producing any display output https://bugzilla.kernel.org/show_bug.cgi?id=206175 But before doing any of that, I would suggest trying newer kernels from Bodhi: https://bodhi.fedoraproject.org/updates/?packages=kernel This one is in the pipeline: kernel-5.5.15-200.fc31, kernel-headers-5.5.15-200.fc31, & 1 more https://bodhi.fedoraproject.org/updates/FEDORA-2020-666f3b1ac3 And doing "lite" bisection by trying various Fedora builds, including snapshot builds, from Koji: https://koji.fedoraproject.org/koji/packageinfo?packageID=8 Snapshot builds have "gitN" in the version, where "N" is a number starting with "0".
(In reply to William Bader from comment #20) ... > Is it worth booting again with the live CD to see if it gets the 'config 247 descriptor has 1 excess byte, ignoring'? ... Attaching a log from 5.3.7 is a good idea. If you don't want to mount your system disk from the Live session, you could save the log to a USB flash instead. The USB flash drive will probably be automounted. You can find the mountpoint with: $ lsblk -f $ cd /run/media/liveuser/XXX-YYY $ ls $ journalctl --no-hostname -k > dmesg-5.3.7.txt $ cd Unmount the USB flash drive with the "Files" app. BTW, this is also a useful command: $ findmnt /dev/sda1 TARGET SOURCE FSTYPE OPTIONS /run/media/liveuser/XXX-YYY /dev/sda1 vfat rw,nosuid,nodev,relatime,uid=1000,gid=1000,fmask=0022,dmask=0022,codepage=437,ioc NB: "XXX-YYY" is the obfuscated vfat file system label on my USB flash drive. Tested in a VM with Fedora-Workstation-Live-x86_64-31-1.9.iso.
(In reply to Steve from comment #22) ... > Attaching a log from 5.3.7 is a good idea. If you don't want to mount your > system disk from the Live session, you could save the log to a USB flash > instead. ... > $ journalctl --no-hostname -k > dmesg-5.3.7.txt ... While you are in the Live session, it might be a good idea to do a full USB dump too: # lsusb -v > lsusb-v-5.3.7.txt # That's run as root.
(In reply to Steve from comment #17)
...
> All of the kernel builds are here:
> https://koji.fedoraproject.org/koji/packageinfo?packageID=8
...

The "koji" command can be used to list kernel builds and to download specific packages:

# dnf install koji

The "--after" date in the following koji command is the build date for 5.3.7 taken from "uname -a" when booted from the F31 Live image:

$ uname -a
Linux localhost-live 5.3.7-301.fc31.x86_64 #1 SMP Mon Oct 21 19:18:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Get a list of kernel builds for "fc31":

$ koji list-builds --package=kernel --state=COMPLETE --after=2019-10-21 --reverse --quiet | fgrep '.fc31'
kernel-5.5.9-200.fc31 jforbes COMPLETE
kernel-5.5.8-200.fc31 jforbes COMPLETE
...
kernel-5.3.11-300.fc31 jforbes COMPLETE
kernel-5.3.10-300.fc31 labbott COMPLETE

Specific kernel packages can then be downloaded as desired. I would suggest starting with the earliest 5.4 build, which appears to be 5.4.2:

$ koji download-build --rpm kernel-core-5.4.2-300.fc31.x86_64.rpm
Downloading: kernel-core-5.4.2-300.fc31.x86_64.rpm
[====================================] 100% 31.12 MiB

$ koji download-build --rpm kernel-modules-5.4.2-300.fc31.x86_64.rpm
Downloading: kernel-modules-5.4.2-300.fc31.x86_64.rpm
[====================================] 100% 28.41 MiB
NB: The packages are unsigned. However, they should pass this check: $ rpmkeys --checksig kernel*.rpm kernel-core-5.4.2-300.fc31.x86_64.rpm: digests OK kernel-modules-5.4.2-300.fc31.x86_64.rpm: digests OK
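When several builds have to be tried, typing the two download commands for each one gets tedious. The following sketch only prints the commands for a given build so they can be reviewed before being run; koji_download_cmds is a hypothetical helper, and it assumes that kernel-core and kernel-modules are the only packages needed, as suggested earlier in this bug.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the koji download commands for one kernel build.
# $1 is version-release (for example 5.4.2-300.fc31); $2 is the arch (default x86_64).
koji_download_cmds() {
    local vr=$1 arch=${2:-x86_64} pkg
    for pkg in kernel-core kernel-modules; do
        echo "koji download-build --rpm ${pkg}-${vr}.${arch}.rpm"
    done
}

# Possible usage: review the output first, then pipe it to sh in an empty directory.
#   koji_download_cmds 5.4.10-200.fc31
#   koji_download_cmds 5.4.10-200.fc31 | sh
```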
This will show you some snapshot (gitN) builds:

$ koji list-builds --package=kernel --state=COMPLETE --after=2019-09-01 --reverse --quiet | fgrep '.fc32'
                                                             ^^^^^^^^^^                               ^^

I moved the date back to get this build:

kernel-5.4.0-0.rc0.git1.1.fc32 jcline COMPLETE

Which is here:

kernel-5.4.0-0.rc0.git1.1.fc32
https://koji.fedoraproject.org/koji/buildinfo?buildID=1379422

Note, in particular, what the changelog says:

* Tue Sep 17 2019 Jeremy Cline <jcline> - 5.4.0-0.rc0.git1.1
- Linux v5.3-2061-gad062195731b
        ^^^^^^^^^^^^^^^^^^^^^^^

That is a commit ID that can be passed into a git command:

$ git show --oneline v5.3-2061-gad062195731b
ad06219573 Merge tag 'platform-drivers-x86-v5.4-1' of git://git.infradead.org/linux-platform-drivers-x86

The "git describe" command generates those strings:

$ git describe --abbrev=12 ad06219573
v5.3-2069-gad062195731b

The commit IDs in the changelog are very important because they are what you would use to start a kernel bisection.
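A "git describe" string has the form tag-count-gHASH, and only the part after "-g" identifies the commit. A small sketch for pulling the abbreviated hash out of a changelog entry is shown here; describe_to_commit is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the abbreviated commit hash from a "git describe" string.
# Returns non-zero for input that does not look like tag-count-gHASH (for example a bare tag).
describe_to_commit() {
    case $1 in
        *-[0-9]*-g[0-9a-f]*) echo "${1##*-g}" ;;
        *) return 1 ;;
    esac
}

# Possible usage inside a kernel git tree (not run here):
#   git show --oneline "$(describe_to_commit v5.3-2061-gad062195731b)"
```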
Created attachment 1676176 [details] good and bad dmesg and lsusb output

I used koji to try a few kernels back to 5.3.7-300, and none of them found the webcam. Then I tried the live cd, and it didn't work either, which was strange. I had been selecting 'restart' to reboot. I did a shutdown and a cold boot, and then the live cd found the webcam. Then I retried some of the kernels. 5.4.10-200 is good. 5.4.11-200 is bad.

5.3.7-300 good
5.3.9-300 good
5.4.2-300 good
5.4.10-200 good
5.4.11-200 bad
5.4.11-201 bad
5.4.11-202 bad
5.4.13-200 bad
5.4.20-200 bad
5.5.2-200 bad

So there is a regression in 5.4.11-200, and in addition there might be a second bug not resetting something during warm boots. I attached a tar with an example of 'journalctl --no-hostname -k' and 'lsusb -v' on a good and a bad kernel.

I tried F12 on reboot after shutting down the live cd. The console eventually got a "No operating system" message, which was a bit scary, but it let me eject the CD with the button, and then I could reboot from my system disk. Maybe F12 looks for a hidden recovery area on the original Sony OEM Windows hard drive.

I added sort -Vr to your koji list-builds command:

koji list-builds --package=kernel --state=COMPLETE --after=2019-10-21 --reverse --quiet | fgrep '.fc31' | sort -Vr

What is the next step? I want to avoid building a lot of kernels because this laptop runs hot, and my last laptop burnt up building gcc. It seems reproducible that a cold boot with 5.4.10-200 or earlier finds the webcam, while 5.4.11-200 or higher does not. There was a 5.4.11-201 and 5.4.11-202, which could be a sign that 5.4.11 had some big changes that broke some things.

I set installonly_limit=0 in dnf.conf. If I put it back to 3, will it purge the kernel-core and kernel-modules that I downloaded from koji and installed? I want to keep the good 5.4.10-200 kernel but I need to let dnf purge 5.5 kernels because /boot will fill up.
My /boot is 465MB, which is enough for normal updates but not big enough to keep every test kernel. Regards, William
After all of that, I booted back to 5.5.13-200.fc31.x86_64, and the webcam is working. I think that I did a warm boot from 5.4.10-200 (which I rechecked after 5.4.11-200 didn't work), so it seems like warm boots depend on whether the webcam was working before, while cold boots depend on the kernel that is booting. So 5.4.11-200 still has a regression, but there is a second bug, possibly hardware, that doesn't reinitialize something during warm boots. Regards, William
Very nice work investigating this problem, but first: > My /boot is 465MB, which is enough for normal updates but not big enough to keep every test kernel. You can use dnf to manually remove kernels you don't need: Get a list of installed kernels: # dnf list --installed kernel-core\* Remove packages for a specific kernel version (use copy and paste to form the command-line from the list): # dnf remove kernel\*-5.5.10-200.fc31 ^^ As usual, dnf will ask you before doing anything. The wildcard has to be in that exact position. That's my procedure when I have "installonly_limit=0" in dnf.conf. > I added sort -Vr to your koji list-builds command > koji list-builds --package=kernel --state=COMPLETE --after=2019-10-21 --reverse --quiet | fgrep '.fc31' | sort -Vr ^^^^^^^^^ ^ Thanks! I didn't know about the "-V" option, even though it is in the man page. I don't know why the "koji" command doesn't do a version sort. Your command-line reverses the "--reverse", so, if you prefer increasing order, both reversals can be removed: $ koji list-builds --package=kernel --state=COMPLETE --after=2019-10-21 --quiet | fgrep '.fc31' | sort -V
> 5.4.10-200 good > 5.4.11-200 bad Excellent. There are 165 commits in that range (the two "tag" commits don't count): $ git log --oneline v5.4.10^..v5.4.11 | wc -l 167 These explicitly mention "usb": $ git log --oneline --grep usb v5.4.10^..v5.4.11 7cbdf96cda usb: missing parentheses in USE_NEW_SCHEME 578289f847 USB: core: fix check for duplicate endpoints 158cbd970b usb: dwc3: gadget: Fix request complete check 72cd84ea52 net: usb: lan78xx: fix possible skb leak e36491f117 usb: typec: fusb302: Fix an undefined reference to 'extcon_get_state' 27fc4a9e4a net: usb: lan78xx: Fix error message format specifier 61e861528e USB: dummy-hcd: use usb_urb_dir_in instead of usb_pipein And these are the commit IDs for the two tags: $ fig-tags.sh 'v5.4.1[01]' fd74b603ed 2020-01-12 12:23:15 +0100 v5.4.11 7622136b11 2020-01-09 10:25:55 +0100 v5.4.10 fig-tags.sh is my shell script with this git command-line: git tag --list --format='%(objectname:short) %(creatordate:iso) %(refname:short)' --sort='-creatordate' -- "$@" ("fig" is not "git", so I don't get confused ... :-))
Created attachment 1676213 [details] git-log-oneline-v5.4.10-v5.4.11.txt

$ git log --oneline v5.4.10^..v5.4.11 > /tmp/git-log-oneline-v5.4.10-v5.4.11.txt
> I want to avoid building a lot of kernels because this laptop runs hot, and my last laptop burnt up building gcc. Do you have a desktop system that you could use for doing kernel builds? For kernel bisection, Artem used a desktop system to build the kernels and then transferred them to his laptop over the network. (Bug 1790115, Comment 55) A USB flash drive could also be used to transfer kernels between systems.
I have remote access to CentOS 6 and 7 VMs on a server in the office. Do I have to do anything special if I do a build on a more advanced CPU than my laptop? The CentOS 6 VM has 15GB free, gcc 4.4.7 20120313, and 4 virtual cpus. The CentOS 7 VM has 40GB free, gcc 4.8.5 20150623 (plus I have installed devtoolset-8), and 1 virtual cpu. How do I gather the results to copy them? A long time ago, I built custom kernels with slackware, and it was something like 'make menuconfig;make;make modules_install;make install', which would be messy to copy. How can I build something similar to the kernel-core and kernel-modules rpms that I downloaded with koji? I don't want to risk copying bzImage files and leaving my laptop unbootable, although I have the live cd and a usb drive with recent backup. Regards, William
(In reply to William Bader from comment #33) > I have remote access to CentOS 6 and 7 VMs on a server in the office. That's a good idea, but none of the Fedora tools have been tested on CentOS. Can anyone provision a Fedora 31 VM? > Do I have to do anything special if I do a build on a more advanced CPU than my laptop? If anything is to be configured, it would be in the kernel ".config" file. And, of course, the tools and packages needed to do the build. You will also need network access to install packages required for the build. > The CentOS 6 VM has 15GB free, gcc 4.4.7 20120313, and 4 virtual cpus. > The CentOS 7 VM has 40GB free, gcc 4.8.5 20150623 (plus I have installed devtoolset-8), and 1 virtual cpu. How much physical memory is on the server and how much memory has been allocated to the VMs? I have multiple VMs configured on my desktop system, which has 8GB of memory, but I never try to run more than one VM at a time, since each VM is configured with 4GB of memory. I'm not sure how well this would work, but could you create an F31 VM inside the CentOS VM? > How do I gather the results to copy them? A long time ago, I built custom kernels with slackware, and > it was something like 'make menuconfig;make;make modules_install;make install', which would be messy to copy. If you are asking how to control your session, you could login with "ssh". Or you could use remote desktop sharing. If you are asking how to transfer the kernel files from the VM to your laptop, you could use "scp". That is what I have done to transfer files from a VM on a Fedora host to the host: VM -> Host. I believe something like that would work for you too. Alternatively, you could set up an https or sftp server on the VM. (That, I don't have much experience with, though. And your sysadmin might have some concerns.) A third way would be to send the files up to the "cloud" and download them from there to your laptop. 
> How can I build something similar to the kernel-core and kernel-modules rpms that I downloaded with koji? That's a good question and I don't have a complete answer. I believe you would use a Fedora kernel config file. However, Fedora kernels are also built with Fedora patches, so I would suggest ignoring them and seeing if the problem occurs with a "vanilla" kernel build. > I don't want to risk copying bzImage files and leaving my laptop unbootable, although I have the live cd and a usb drive with recent backup. As long as you have grub2 configured and known-working kernels in /boot, you should be OK. I will have to research exactly how you configure grub2 to boot the test kernel, but I believe it would be as easy as adding a ".conf" file to /boot/loader/entries/. (Assuming you have BLS enabled.) > Regards, William
"Can anyone provision a Fedora 31 VM?" BTW, there is a Fedora Server edition, which is bare bones compared to the Fedora Workstation edition, but that might work for a terminal/command-line only session just for building kernels: Fedora Server. https://getfedora.org/en/server/
(In reply to Steve from comment #35) > "Can anyone provision a Fedora 31 VM?" Some companies offer free servers and, for a fee, you can get more memory and disk space. For example: Amazon Lightsail Virtual servers, storage, databases, and networking for a low, predictable price. https://aws.amazon.com/lightsail/
See, also: Linux virtual machines in Azure https://azure.microsoft.com/en-us/services/virtual-machines/linux/ Both AWS and Azure support Red Hat Linux.
Thanks, I will ask on Monday if someone in the office can provision a Fedora 31 VM. I have VPN access and about 20Mbps effective throughput, so copying 100MB files isn't a problem. When I asked about transferring kernel files, I meant whether it is just one vmlinuz file or a lot of little files with kernel modules, maps, ramfs images, modules, headers, that I have to locate, gather up, copy, and then install in the correct place on my laptop. I would do the builds on off hours, so I might only be able to do one per day. If I can build Fedora kernels, can I do bisections? If the rpmbuild script applies patches after pulling a git commit, will that confuse bisections? You said earlier that I could get some snapshots from https://koji.fedoraproject.org/koji/packageinfo?packageID=8 Could I use those snapshots to narrow the range of commits to narrow the range to search? It looks like they went from 5.4.10 to 5.4.11 without any git snapshots in between, so maybe it won't help.
(In reply to William Bader from comment #38) > Thanks, I will ask on Monday if someone in the office can provision a Fedora 31 VM. You might be able to remotely complete the install yourself after the VM is configured and the installer image is booted: Installing Using VNC https://docs.fedoraproject.org/en-US/fedora/f31/install-guide/advanced/VNC_Installations/ > I have VPN access and about 20Mbps effective throughput, so copying 100MB files isn't a problem. OK. > When I asked about transferring kernel files, I meant whether it is just one vmlinuz file or a lot of little files with kernel modules, maps, ramfs images, modules, headers, that I have to locate, gather up, copy, and then install in the correct place on my laptop. OK, you are asking about how to install the kernel on the target system. > I would do the builds on off hours, so I might only be able to do one per day. Artem did a complete bisection in about an hour and a half. (Bug 1790115, Comment 55) > If I can build Fedora kernels, can I do bisections? If the rpmbuild script applies patches after pulling a git commit, will that confuse bisections? The Fedora build tools can't do bisections, because they simply download a kernel tarball. You need a clone of the kernel git repo. The "git bisect" command chooses which commits are built in each iteration. > You said earlier that I could get some snapshots from https://koji.fedoraproject.org/koji/packageinfo?packageID=8 > Could I use those snapshots to narrow the range of commits to narrow the range to search? > It looks like they went from 5.4.10 to 5.4.11 without any git snapshots in between, so maybe it won't help. That's plenty narrow. Bisection is very efficient: log2(165 commits) is about 8 builds. 
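The build count for a given range can be estimated by repeated halving. This is a minimal sketch in plain shell arithmetic; the name bisect_steps is made up for illustration, and git's own "roughly N steps" figure may differ slightly from this worst-case bound.

```shell
# Hypothetical helper: upper bound on bisection steps for N candidate commits,
# found by halving (rounding up) until a single commit remains.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

bisect_steps 165    # prints 8
```

At roughly one build per evening, that bound translates into about a week of test boots for the 5.4.10 to 5.4.11 range.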
(Comment 30)

Documentation:

$ git bisect --help

Bisecting a bug
https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html

Subject: Re: git pull on Linux/ACPI release tree
From: Linus Torvalds <...>
Date: Tue, 10 Jan 2006 11:28:58 -0800 (PST)
https://lore.kernel.org/git/Pine.LNX.4.64.0601101111110.4939@g5.osdl.org/
(In reply to Steve from comment #39) > You need a clone of the kernel git repo. The "git bisect" command chooses which commits are built in each iteration. This can be done on your laptop: $ git clone --shallow-exclude=linux-5.3.y --branch linux-5.4.y https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4 Cloning into 'linux-5.4'... remote: Enumerating objects: 256511, done. remote: Counting objects: 100% (256511/256511), done. remote: Compressing objects: 100% (122936/122936), done. remote: Total 256511 (delta 166877), reused 181371 (delta 131845) Receiving objects: 100% (256511/256511), 219.49 MiB | 1.80 MiB/s, done. Resolving deltas: 100% (166877/166877), done. Checking out files: 100% (65704/65704), done. $ cd linux-5.4 $ git branch * linux-5.4.y $ git tag --list --sort=-version:refname This is a simulated partial bisection. Note that git estimates the number of steps needed: $ git bisect start $ git bisect bad v5.4.11 $ git bisect good v5.4.10 Bisecting: 82 revisions left to test after this (roughly 6 steps) [97d9e8620f57f28f415b23ad88b97c87b6d53390] bnx2x: Do not handle requests from VFs after parity $ git branch * (no branch, bisect started on linux-5.4.y) linux-5.4.y $ git status HEAD detached at 97d9e8620 You are currently bisecting, started from branch 'linux-5.4.y'. (use "git bisect reset" to get back to the original branch) nothing to commit, working tree clean At this point, you would do the first build.
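Each iteration after the first build ends with a verdict that is fed back to "git bisect". The following is only a sketch of how that verdict could be derived on the test laptop: the helper name webcam_verdict is hypothetical, and it assumes that a missing /dev/video* node is a reliable sign of the regression, which, per the earlier comments, only holds after a cold boot.

```shell
# Hypothetical helper: print "good" if at least one video device node exists
# in the given directory (default /dev), otherwise print "bad".
webcam_verdict() {
    dev_dir=${1:-/dev}
    for node in "$dev_dir"/video*; do
        if [ -e "$node" ]; then
            echo good
            return 0
        fi
    done
    echo bad
}

# After a cold boot into the freshly built test kernel, in the kernel tree:
#   git bisect "$(webcam_verdict)"   # mark this commit; git checks out the next one
#   git bisect log                   # review the decisions made so far
#   git bisect reset                 # when finished, return to linux-5.4.y
```

If the kernel tree lives on a separate build machine, the helper would be run on the laptop and the resulting "good" or "bad" typed by hand into "git bisect" on the build machine.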
(In reply to William Bader from comment #38) ... > You said earlier that I could get some snapshots from https://koji.fedoraproject.org/koji/packageinfo?packageID=8 ... Thanks for bringing that up. The short answer is that there are no snapshot builds for stable releases. Stable releases have a three-part version, such as 5.4.y, where y is greater than 0. The snapshot builds are for pre-release mainline builds. Grepping for "git" shows that: $ koji list-builds --package=kernel --state=COMPLETE --after=2019-10-21 --quiet | fgrep git | sort -Vr | less ... kernel-5.5.0-0.rc0.git6.1.fc32 jforbes COMPLETE kernel-5.4.0-0.rc8.git1.2.fc32 labbott COMPLETE ...
Thanks for the reply. If it could be done in an hour, I could try it on my laptop. My laptop is an i5-2450M CPU 2.50GHz with 4 cores (or more precisely, I think 2 cores with 2 threads per core), 8GB RAM, and a 512MB SSD. I think that if I limit 'make' to one job, it rotates the core to control the temperature. Do you have an estimate for how long the build would take? 1 hour? 4 hours? Long builds using all of the cores can bring the cpu up to 95C according to gkrellm. I don't know if my laptop will withstand an hour of that. I have /tmp as a ram disk that can grow up to 4MB. I have ccache set to use /tmp. I suppose that a kernel build would require moving ccache to use the SSD. I have done git bisections to find bugs in ghostscript and poppler. With ccache, the builds go very fast near the end if headers haven't changed. Thanks for the commands to start the bisection. The issue with provisioning a VM is that I don't have permission. I have root to the VMs I use but not to the host. Opening a GUI is not a problem. Is the procedure to build kernels at https://www.kernel.org/doc/html/latest/admin-guide/README.html ? When I do 'make config', do I need to select or unselect any options? Can I use 'make localmodconfig'? My /boot has config-5.*.fc31.x86_64 files. Is there a way to use /boot/config-5.4.10-200.fc31.x86_64 as the initial source for 'make oldconfig'? The kernel.org page doesn't say anything about initramfs. Will the 'make' build it, and I just have to find it and copy it to /boot with bzImage and then create a new file in /boot/loader/entries ? Regards, William
(In reply to William Bader from comment #42) > Thanks for the reply. If it could be done in an hour, I could try it on my laptop. > My laptop is an i5-2450M CPU 2.50GHz with 4 cores (or more precisely, I think 2 cores with 2 threads per core), 8GB RAM, and a 512MB SSD. > I think that if I limit 'make' to one job, it rotates the core to control the temperature. You read the man page again. :-) I now see that "make" has a "-j" option to limit the number of jobs run simultaneously. When doing a kernel build on my 2 core/4 thread CPU, without "-j", "top" shows up to *four* CPUs in use. In "top", press "1" to see individual CPU use. > Do you have an estimate for how long the build would take? 1 hour? 4 hours? Closer to 1 hour. > Long builds using all of the cores can bring the cpu up to 95C according to gkrellm. I don't know if my laptop will withstand an hour of that. On my desktop system, the temp. gets up to ~50C, and I manually turn up the case fans to full speed. Can you configure any settings in the BIOS setup to change how the fan speed changes with load? > I have /tmp as a ram disk that can grow up to 4MB. > I have ccache set to use /tmp. I suppose that a kernel build would require moving ccache to use the SSD. > I have done git bisections to find bugs in ghostscript and poppler. With ccache, the builds go very fast near the end if headers haven't changed. OK. > Thanks for the commands to start the bisection. Evidently you know more about doing bisections than I do. :-) > The issue with provisioning a VM is that I don't have permission. I have root to the VMs I use but not to the host. Opening a GUI is not a problem. OK. > Is the procedure to build kernels at https://www.kernel.org/doc/html/latest/admin-guide/README.html ? That looks good. > When I do 'make config', do I need to select or unselect any options? Can I use 'make localmodconfig'? Try to use the same settings as Fedora. > My /boot has config-5.*.fc31.x86_64 files. 
Is there a way to use /boot/config-5.4.10-200.fc31.x86_64 as the initial source for 'make oldconfig'? Yes, but I have to research the details. I did a diff on the Fedora config files for 5.4.10 and 5.4.11, and they appear to be the same. > The kernel.org page doesn't say anything about initramfs. Will the 'make' build it, and I just have to find it and copy it to /boot with bzImage and Excellent question. Again, that is something I have to research. In Fedora, the "dracut" command is used to build an initramfs during a kernel update. The dates on the initramfs files in /boot should correspond to when you installed the kernels: $ ls -l /boot/initramfs-5.5.8-200.fc31.x86_64.img $ rpm -qi kernel-5.5.8-200.fc31.x86_64 | grep 'Install' > then create a new file in /boot/loader/entries ? If you always use "bzImage" as the kernel name, you will only need to create one ".conf" file that you can use repeatedly for each test boot. I have not tried this, but you might be able to use a link from "bzImage" to a uniquely named "bzImage" file, such as "bzImage-1". > Regards, William
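For the ".conf" file itself, a minimal BLS entry might look like the output of the sketch below. This is untested against a self-built kernel. It assumes that /boot is a separate partition (so the paths are relative to it), that grubenv defines $kernelopts as on a stock Fedora 31 install, and that the initramfs was written to the default dracut location. The helper name bls_entry and the entry file name are made up for illustration.

```shell
# Hypothetical helper: print a minimal BLS entry for a self-built kernel.
# Assumes paths relative to a separate /boot partition and a grubenv that
# defines $kernelopts.
bls_entry() {
    kver=$1
    printf 'title Test kernel (%s)\n' "$kver"
    printf 'version %s\n' "$kver"
    printf 'linux /vmlinuz-%s\n' "$kver"
    printf 'initrd /initramfs-%s.img\n' "$kver"
    printf 'options $kernelopts\n'
}

# Example (as root); the file name is arbitrary but must end in ".conf":
#   bls_entry 5.4.10.localversion2-00083-g97d9e8620 > /boot/loader/entries/bisect-test.conf
```

Generating the entry from the version string keeps the vmlinuz and initramfs names in step, which avoids having to maintain a "bzImage" link by hand.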
Before starting the bisection, it would be a very good idea to do test builds of 5.4.10 and 5.4.11 to verify that they are indeed "good" and "bad", respectively, when you build them. So that's two more builds than what "git bisect" says. (Comment 40)
There is one difference in the config files for 5.4.10 and 5.4.11. They have different values for "CONFIG_BUILD_SALT". That seems to be an insignificant difference. However, for future reference here is the procedure:

This procedure does not require installing the kernel packages.

Download with "koji download-build --rpm" and extract with "rpmdev-extract" to get:

$ ls -1F
kernel-core-5.4.10-200.fc31.x86_64/
kernel-core-5.4.10-200.fc31.x86_64.rpm
kernel-core-5.4.11-202.fc31.x86_64/
kernel-core-5.4.11-202.fc31.x86_64.rpm

$ find . -name config
./kernel-core-5.4.10-200.fc31.x86_64/lib/modules/5.4.10-200.fc31.x86_64/config
./kernel-core-5.4.11-202.fc31.x86_64/lib/modules/5.4.11-202.fc31.x86_64/config

$ diff -u0 ./kernel-core-5.4.10-200.fc31.x86_64/lib/modules/5.4.10-200.fc31.x86_64/config ./kernel-core-5.4.11-202.fc31.x86_64/lib/modules/5.4.11-202.fc31.x86_64/config
...
@@ -3 +3 @@
-# Linux/x86_64 5.4.10-200.fc31.x86_64 Kernel Configuration
+# Linux/x86_64 5.4.11-202.fc31.x86_64 Kernel Configuration
@@ -30 +30 @@
-CONFIG_BUILD_SALT="5.4.10-200.fc31.x86_64"
+CONFIG_BUILD_SALT="5.4.11-202.fc31.x86_64"
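Since the header comment and CONFIG_BUILD_SALT differ between any two releases, it may be convenient to filter them out before comparing. A small sketch; the helper name config_diff is hypothetical, and the two patterns are taken from the diff output in this comment.

```shell
# Hypothetical helper: diff two kernel config files while ignoring the
# "# Linux/..." header comment and CONFIG_BUILD_SALT, which always differ.
# Returns diff's exit status: 0 if nothing else differs, 1 otherwise.
config_diff() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    grep -v -e '^# Linux/' -e '^CONFIG_BUILD_SALT=' "$1" > "$tmp_a"
    grep -v -e '^# Linux/' -e '^CONFIG_BUILD_SALT=' "$2" > "$tmp_b"
    diff -U 0 "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Example with the installed kernels:
#   config_diff /boot/config-5.4.10-200.fc31.x86_64 /boot/config-5.4.11-200.fc31.x86_64
```

No output and a zero exit status would mean the two configs are otherwise identical.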
Thanks. I kept the 5.4.10-200 and 5.4.11-200 kernels installed, so I can diff their config files in /boot. Sorry for not mentioning that. My laptop idles at nearly 60C. powertop shows about 150 wakeups/sec with a few tabs open in chrome, including this one. Can I use Linux power management to control the fans? I tried pwmconfig but it said "There are no pwm-capable sensor modules installed." Every once in a while a new kernel makes it run hotter: https://bugzilla.redhat.com/show_bug.cgi?id=1329101 I'm not sure how powertop is calibrated, but when it says the cpu is 90C, the vent on the side of the laptop is too hot to touch. I'll see if I can get a VM for builds on Monday. Regards, William
(In reply to William Bader from comment #42)
...
> My /boot has config-5.*.fc31.x86_64 files.
> Is there a way to use /boot/config-5.4.10-200.fc31.x86_64 as the initial source for 'make oldconfig'?
...

Since you have 5.4.10 and 5.4.11 installed, you can simply copy one of the corresponding config files from /boot to ".config" in the kernel build directory, and then run "make oldconfig".

Here is the procedure for 5.4.10:

$ git checkout v5.4.10
Note: checking out 'v5.4.10'.
...
HEAD is now at 7a02c1932 Linux 5.4.10

$ git branch
* (HEAD detached at v5.4.10)
  linux-5.4.y

$ cp -ip config-5.4.10 .config # config-5.4.10 is a link to the config file extracted from the kernel-core rpm, as in Comment 45.

$ make oldconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o
  HOSTCC  scripts/kconfig/expr.o
  LEX     scripts/kconfig/lexer.lex.c
  YACC    scripts/kconfig/parser.tab.[ch]
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/parser.tab.o
  HOSTCC  scripts/kconfig/preprocess.o
  HOSTCC  scripts/kconfig/symbol.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf --oldconfig Kconfig
#
# configuration written to .config
#

$ diff -u0 config-5.4.10 .config
...
@@ -3 +3 @@
-# Linux/x86_64 5.4.10-200.fc31.x86_64 Kernel Configuration
+# Linux/x86 5.4.10 Kernel Configuration
@@ -8315 +8314,0 @@
-CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT=y
@@ -8319 +8317,0 @@
-CONFIG_ALLOW_LOCKDOWN_LIFT_BY_SYSRQ=y
There doesn't seem to be any difference with v5.4.11: $ git checkout v5.4.11 $ cp -ip config-5.4.11 .config cp: overwrite '.config'? y $ make oldconfig $ diff -u0 config-5.4.11 .config ... @@ -3 +3 @@ -# Linux/x86_64 5.4.11-202.fc31.x86_64 Kernel Configuration +# Linux/x86 5.4.11 Kernel Configuration @@ -8315 +8314,0 @@ -CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT=y @@ -8319 +8317,0 @@ -CONFIG_ALLOW_LOCKDOWN_LIFT_BY_SYSRQ=y So the question is why are those config options in the Fedora configs, but not in the vanilla configs?
(In reply to Steve from comment #48) ... > So the question is why are those config options in the Fedora configs, but not in the vanilla configs? Fedora kernels have some patches that are not in the vanilla kernel, and those config options are in such patches: $ grep CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT .*.patch .efi-secureboot.patch:+#ifdef CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT $ grep CONFIG_ALLOW_LOCKDOWN_LIFT_BY_SYSRQ .*.patch .lift-lockdown-sysrq.patch:+#ifdef CONFIG_ALLOW_LOCKDOWN_LIFT_BY_SYSRQ .lift-lockdown-sysrq.patch:+#endif /* CONFIG_ALLOW_LOCKDOWN_LIFT_BY_SYSRQ */ Those patches were extracted from kernel-5.4.10-200.fc31.src.rpm in which there are 37 patches: $ ls -1 .*.patch | wc -l 37 To authentically rebuild the Fedora kernel, those patches would also need to be applied. My suggestion is to see if you can reproduce the problem after building *without* those patches.
I am now running a kernel build with:

$ time make -j 1

"top" shows only one "cc1" process running at a time, and there is, at most, high CPU utilization on only one CPU at a time (out of 4).

Further, the system is running as cool as a cucumber: ~40C to ~45C.
(In reply to Steve from comment #50) > I am now running a kernel build with: > > $ time make -j 1 Build time is about 2 hours, 15 minutes with an Intel i3 desktop CPU (3.40GHz max, 2 cores/4 threads; per /proc/cpuinfo; no other loads): real 134m39.631s user 115m36.398s sys 15m6.718s > "top" shows only one "cc1" process running at a time, and there is, at most, high CPU utilization on only one CPU at a time (out of 4). > > Further, the system is running as cool as a cucumber: ~40C to ~45C. Perhaps that is a bit too optimistic. Idle temps are ~25C to ~30C. Case fans are set to ~1000 RPM with manual controllers (dual Zalman Fan Mates).
Build results: $ ls -1sh vmlinux* 698M vmlinux* 810M vmlinux.o $ wc -l modules.builtin modules.order Module.symvers 258 modules.builtin 3514 modules.order 21317 Module.symvers ... The value of CONFIG_BUILD_SALT appears to get embedded in the output files: $ grep -l '5.4.11-202.fc31.x86_64' .config vmlinux* .config vmlinux vmlinux.o So it might be a good idea to change it, because the actual build was for: $ git branch * (HEAD detached at v5.4.10) linux-5.4.y And possibly set: $ grep -A1 'CONFIG_LOCALVERSION' .config CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_BUILD_SALT="5.4.11-202.fc31.x86_64"
(In reply to Steve from comment #52) ... > And possibly set: > > $ grep -A1 'CONFIG_LOCALVERSION' .config > CONFIG_LOCALVERSION="" > # CONFIG_LOCALVERSION_AUTO is not set > CONFIG_BUILD_SALT="5.4.11-202.fc31.x86_64" To change those, I used: $ make nconfig That gives you a nice ncurses user interface. Press "F2" to see what configuration option you are actually setting. Probably the only important setting is: CONFIG_LOCALVERSION_AUTO=y That inserts a version string into vmlinux and vmlinux.o: $ fgrep -a -m1 'Linux version' vmlinux-5.4.10.localversion2 Linux version 5.4.10.localversion2 (xxx@yyy) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #5 SMP ... ^^^^^^^^^^^^^^^^^^^^ ^^^^^^^ My user name and hostname were also being inserted. They are obfuscated here as "xxx@yyy". It turns out that environment variables are also used to configure the build. Specifically: export KBUILD_BUILD_USER="test-user-1" export KBUILD_BUILD_HOST="test-host-1" Those strings ultimately end up in this file: include/generated/compile.h. Here are all the changes I made and the changes made by "make oldconfig": $ diff -u0 config-5.4.10 .config.EXP3 ... @@ -3 +3 @@ -# Linux/x86_64 5.4.10-200.fc31.x86_64 Kernel Configuration +# Linux/x86 5.4.10 Kernel Configuration @@ -28,3 +28,3 @@ -CONFIG_LOCALVERSION="" -# CONFIG_LOCALVERSION_AUTO is not set -CONFIG_BUILD_SALT="5.4.10-200.fc31.x86_64" +CONFIG_LOCALVERSION=".localversion2" +CONFIG_LOCALVERSION_AUTO=y +CONFIG_BUILD_SALT="buildidsalt2" @@ -43 +43 @@ -CONFIG_DEFAULT_HOSTNAME="(none)" +CONFIG_DEFAULT_HOSTNAME="bisection-hostname" @@ -8315 +8314,0 @@ -CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT=y @@ -8319 +8317,0 @@ -CONFIG_ALLOW_LOCKDOWN_LIFT_BY_SYSRQ=y
This is a summary of the first simulated bisection build:

I removed the "-j 1" option, but caching in .ccache could explain some of the speedup:

$ time make
...
real 73m11.885s
user 60m20.050s
sys 8m2.752s

The new files have slightly different sizes from the ones in the earlier v5.4.10 build:

$ ls -1s vmlinux*
713764 vmlinux*
713756 vmlinux-5.4.10.localversion2*
828660 vmlinux.o
828652 vmlinux.o-5.4.10.localversion2

$ grep -a 'Linux version' vmlinux | head
Linux version 5.4.10.localversion2-00083-g97d9e8620 (test-user-1@test-host-1) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #6 SMP ...

That version string can be used directly in git commands:

$ git log --oneline -n1 5.4.10.localversion2-00083-g97d9e8620
97d9e8620 (HEAD) bnx2x: Do not handle requests from VFs after parity

Which matches:

$ git describe 97d9e8620
v5.4.10-83-g97d9e8620

The commit ID here:

$ git branch
* (no branch, bisect started on 9d61432ef)
  linux-5.4.y

Is:

$ git log --oneline -n1 9d61432ef
9d61432ef (tag: v5.4.11, refs/bisect/bad) Linux 5.4.11
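On the test machine, the same mapping can be done from the running kernel's version string, which helps confirm that the kernel that booted is the commit being bisected. A sketch, assuming CONFIG_LOCALVERSION_AUTO=y was set so that the "-g<hash>" suffix is present; the helper name commit_from_version is made up for illustration.

```shell
# Hypothetical helper: extract the abbreviated commit hash from a version
# string that ends in "-g<hex>", as produced with CONFIG_LOCALVERSION_AUTO=y.
# Prints nothing if the string has no such suffix (e.g. a Fedora kernel).
commit_from_version() {
    printf '%s\n' "$1" | sed -n 's/.*-g\([0-9a-f][0-9a-f]*\)$/\1/p'
}

commit_from_version "5.4.10.localversion2-00083-g97d9e8620"    # prints 97d9e8620

# On the booted test kernel:
#   commit_from_version "$(uname -r)"
```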
(In reply to Steve from comment #39) ... > > When I asked about transferring kernel files, I meant whether it is just one vmlinuz file or a lot of little files with kernel modules, maps, ramfs images, modules, headers, that I have to locate, gather up, copy, and then install in the correct place on my laptop. > > OK, you are asking about how to install the kernel on the target system. ... "make help" shows a list of packaging targets, including "tarxz-pkg": $ make help ... Kernel packaging: ... tarxz-pkg - Build the kernel as a xz compressed tarball ... Since xz compression is what is used at kernel.org, that is what I chose. However, xz compression consumes nearly 100% of one CPU, and it takes a long time, but there is an uncompressed make target: tar-pkg - Build the kernel as an uncompressed tarball $ time make tarxz-pkg ... DEPMOD 5.4.10.localversion2-00083-g97d9e8620 './System.map' -> './tar-install/boot/System.map-5.4.10.localversion2-00083-g97d9e8620' '.config' -> './tar-install/boot/config-5.4.10.localversion2-00083-g97d9e8620' './vmlinux' -> './tar-install/boot/vmlinux-5.4.10.localversion2-00083-g97d9e8620' './arch/x86/boot/bzImage' -> './tar-install/boot/vmlinuz-5.4.10.localversion2-00083-g97d9e8620' Tarball successfully created in ./linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar.xz real 48m51.941s user 43m21.183s sys 1m48.391s Compression reduced the size significantly: $ ls -1sh *5.4.10.localversion2-00083-g97d9e8620* 750M linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar.xz 698M vmlinux-5.4.10.localversion2-00083-g97d9e8620* 810M vmlinux.o-5.4.10.localversion2-00083-g97d9e8620
Thanks again for the information. >My suggestion is to see if you can reproduce the problem after building *without* those patches. I'll try that first. Also, building with patches might complicate the bisection. >Build time is about 2 hours, 15 minutes That might fry my laptop. I'll wait to see if I can get a VM on a server. >tar-pkg - Build the kernel as an uncompressed tarball Thanks, that looks useful. The tar is fine because scp -C uses gzip, so for a one-time transfer, the time to compress the tar with xz costs more than the transfer time it saves. >$ time make tarxz-pkg >... > DEPMOD 5.4.10.localversion2-00083-g97d9e8620 >'./System.map' -> './tar-install/boot/System.map-5.4.10.localversion2-00083-g97d9e8620' >'.config' -> './tar-install/boot/config-5.4.10.localversion2-00083-g97d9e8620' >'./vmlinux' -> './tar-install/boot/vmlinux-5.4.10.localversion2-00083-g97d9e8620' >'./arch/x86/boot/bzImage' -> './tar-install/boot/vmlinuz-5.4.10.localversion2-00083-g97d9e8620' >Tarball successfully created in ./linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar.xz Do I need both vmlinux and vmlinuz? If I don't need vmlinux, I could remove it from the tarball before downloading it. Regards, William
I don't know why you would need vmlinux, and there is another quirk about what is in the tar file: Uncompress so we can see what is in the tar file: $ time unxz linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar.xz real 1m41.307s user 1m8.632s sys 0m3.018s $ ls -sh linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar 5.2G linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar The tar file includes two links that are absolute paths to the build and source directories: $ tar -tvf linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar \*/build \*source lrwxrwxrwx root/root 0 2020-04-05 14:03 lib/modules/5.4.10.localversion2-00083-g97d9e8620/build -> /home/[removed]/linux-5.4 lrwxrwxrwx root/root 0 2020-04-05 14:03 lib/modules/5.4.10.localversion2-00083-g97d9e8620/source -> /home/[removed]/linux-5.4 Both vmlinux and vmlinuz are included: $ tar -tvf linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar boot drwxrwxr-x root/root 0 2020-04-05 14:06 boot/ -rw-rw-r-- root/root 10623360 2020-04-05 14:07 boot/vmlinuz-5.4.10.localversion2-00083-g97d9e8620 -rw-rw-r-- root/root 4444488 2020-04-05 14:06 boot/System.map-5.4.10.localversion2-00083-g97d9e8620 -rwxrwxr-x root/root 738610672 2020-04-05 14:06 boot/vmlinux-5.4.10.localversion2-00083-g97d9e8620 -rw-rw-r-- root/root 213361 2020-04-05 14:06 boot/config-5.4.10.localversion2-00083-g97d9e8620 Modules are also included: $ tar -tf linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar lib | wc -l 4353
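If vmlinux is not needed, one option is to skip it at extraction time rather than editing the tar file. The following is a sketch that unpacks into a staging directory instead of "/", so the result can be inspected first. The helper name extract_without_vmlinux is hypothetical, and the exclude pattern assumes member names of the form "boot/vmlinux-<version>" as in the listing above.

```shell
# Hypothetical helper: extract a kernel tar-pkg archive into a staging
# directory, skipping the large uncompressed boot/vmlinux-* image.
extract_without_vmlinux() {
    tar_file=$1
    dest_dir=$2
    mkdir -p "$dest_dir" &&
    tar --exclude='boot/vmlinux-*' -xf "$tar_file" -C "$dest_dir"
}

# Example:
#   extract_without_vmlinux linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar /var/tmp/kernel-stage
#   ls /var/tmp/kernel-stage/boot
```

The compressed boot/vmlinuz-* image is unaffected because the pattern ends in "vmlinux-", not "vmlinuz-".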
Since the tools create fully qualified names, "dracut" can be easily used to build an initramfs. Adapting the example in the "dracut" man page:

# dracut --kver 5.4.10.localversion2-00083-g97d9e8620

Disclaimer: Although I have used "dracut" to rebuild the initramfs for Fedora kernels, I have not tested it with self-built kernels.

NB: "Dracut" has a lot of options to control what goes into the initramfs.
I believe "grubby" can build the initramfs and write the ".conf" file for the grub2 menu item. Grubby has a lot of options, and I have never tried to use it, so it may take several test runs to get a suitable command-line:

# dnf install grubby

$ rpm -q grubby
grubby-8.40-36.fc31.x86_64

$ man grubby
I transferred my test kernel tar file to a VM (with "scp") and attempted to install with: # tar -xvf linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar -C / That filled up my VM's 12GB disk and somehow, while attempting to recover, all my Fedora kernel modules were lost.* After resizing the VM's disk to 32GB with "qemu-img resize" and resizing the file system with "gparted" (from the F31 Live image**), I was able to successfully install the test kernel. But there is a big problem: $ du -s /lib/modules/* | sort -n 79108 /lib/modules/5.3.7-301.fc31.x86_64 82548 /lib/modules/5.5.13-200.fc31.x86_64 4760148 /lib/modules/5.4.10.localversion2-00083-g97d9e8620 After drilling down into the directory, it appears that all of the kernel modules were built with debug info, which makes them huge. For example: $ ls -lh rtl8723ae.ko -rw-rw-r--. 1 root root 7.2M Apr 5 14:05 rtl8723ae.ko $ file rtl8723ae.ko rtl8723ae.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=9f15845d60f2df736064391cd75878bcf17ca516, with debug_info, not stripped * I recovered by reinstalling the packages. ** Which can be selected in the VM from the "SeaBIOS" boot device menu.
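A free-space check before extracting would have caught the full disk. A minimal sketch; the helper name has_free_kb is hypothetical, and it assumes that the size of an uncompressed tar file is a reasonable estimate of the space its contents will need.

```shell
# Hypothetical helper: succeed if the file system holding TARGET_DIR has at
# least NEEDED_KB kilobytes available, according to "df -Pk".
has_free_kb() {
    target_dir=$1
    needed_kb=$2
    avail_kb=$(df -Pk "$target_dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Example, with tar_file set to the uncompressed kernel tar file:
#   needed=$(du -k "$tar_file" | cut -f1)
#   has_free_kb / "$needed" && tar -xvf "$tar_file" -C /
```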
>it appears that all of the kernel modules were built with debug info Is there a way to build with no debug info and also with a low optimization level like -O0 or -O1 to make the builds faster? I didn't realize that it would be so complicated. Do kernel developers really have to do so much by hand, or is this because I can't use 'make install'? Regards, William
(In reply to William Bader from comment #61) > >it appears that all of the kernel modules were built with debug info > > Is there a way to build with no debug info and also with a low optimization level like -O0 or -O1 to make the builds faster? Yes. Unset CONFIG_DEBUG_INFO in .config. See below. > I didn't realize that it would be so complicated. Do kernel developers really have to do so much by hand, or is this because I can't use 'make install'? It IS complicated. Fedora kernels are built for FIVE architectures and in non-debug and debug versions: Information for build kernel-5.4.10-200.fc31 https://koji.fedoraproject.org/koji/buildinfo?buildID=1427976 See, in particular, the x86_64 "build.log" on that web page. > Regards, William Unsetting CONFIG_DEBUG_INFO (with "make nconfig") reduces the file sizes, but that is not what is in the kernel-core config file: $ diff -u0 .config.EXP3 .config.EXP4 ... @@ -28 +28 @@ -CONFIG_LOCALVERSION=".localversion2" +CONFIG_LOCALVERSION=".localversion3" @@ -8744,6 +8744 @@ -CONFIG_DEBUG_INFO=y -# CONFIG_DEBUG_INFO_REDUCED is not set -# CONFIG_DEBUG_INFO_SPLIT is not set -# CONFIG_DEBUG_INFO_DWARF4 is not set -CONFIG_DEBUG_INFO_BTF=y -# CONFIG_GDB_SCRIPTS is not set +# CONFIG_DEBUG_INFO is not set For the record: $ time make ... real 108m13.920s user 95m13.409s sys 12m27.289s
The size is MUCH reduced with CONFIG_DEBUG_INFO unset: $ ls -1sh linux-5.4.10.localversion*.tar 4.6G linux-5.4.10.localversion2-00083-g97d9e8620-x86.tar 332M linux-5.4.10.localversion3-00083-g97d9e8620-x86.tar And building the uncompressed tar file is very fast: $ time make tar-pkg ... Tarball successfully created in ./linux-5.4.10.localversion3-00083-g97d9e8620-x86.tar real 3m31.590s user 2m50.464s sys 0m55.519s
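Because the difference is a factor of more than ten, it may be worth checking the config file before starting a build that takes over an hour. A minimal sketch, where "config_has_debug_info" is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: succeed if the given kernel config file enables CONFIG_DEBUG_INFO.
# config_has_debug_info is a hypothetical helper name.
config_has_debug_info() {
    local config="${1:-.config}"
    # Match only the main option, not CONFIG_DEBUG_INFO_BTF and friends.
    grep -q '^CONFIG_DEBUG_INFO=y$' "$config"
}
```

For example: config_has_debug_info .config && echo "expect very large modules". I believe the kernel tree's "scripts/config --disable DEBUG_INFO" can be used instead of "make nconfig" to unset the option, but I have not tested that in this setup.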
(In reply to William Bader from comment #61)
...
> Do kernel developers really have to do so much by hand, or is this because I can't use 'make install'?

They use scripts. And I suspect that there is one for Fedora installs, but I haven't looked for it yet.

In the meantime, I wrote a shell script to complete the install and successfully booted with my kernel. Since the shell script must be run as root, I am only going to post the essentials:

KVER is the kernel version, for example: "5.4.10.localversion3-00083-g97d9e8620"
MKINITRD is the "mkinitrd" command. "dracut" could be used instead.
GRUBBY is the "grubby" command.

BOOT="/boot"
VMLINUZ="$BOOT/vmlinuz-$KVER"
INITRAMFS="$BOOT/initramfs-$KVER.img"

# mkinitrd won't overwrite an existing initramfs file; see the "--force" option.
$MKINITRD "$INITRAMFS" "$KVER"

# grubby creates new ".conf" files if there are already ones present, so the grub2 menu may have replicated entries.
# That should be harmless, but see "man grubby" for options to handle various sorts of updates.
$GRUBBY --add-kernel="$VMLINUZ" --initrd="$INITRAMFS" --title="$VMLINUZ" --copy-default --make-default
It sounds like you have only one computer. If so, I suggest borrowing, renting, or buying a second computer, so your system-under-test is not also your support/infrastructure/backup/what-do-I-do-now, computer.
(In reply to Steve from comment #64) > (In reply to William Bader from comment #61) > ... > > Do kernel developers really have to do so much by hand, or is this because I can't use 'make install'? > > They use scripts. And I suspect that there is one for Fedora installs, but I haven't looked for it yet. ... Here it is: $ rpm -q --scripts kernel-core-5.5.15-200.fc31.x86_64 postinstall scriptlet (using /bin/sh): if [ `uname -i` == "x86_64" -o `uname -i` == "i386" ] && [ -f /etc/sysconfig/kernel ]; then /bin/sed -r -i -e 's/^DEFAULTKERNEL=kernel-smp$/DEFAULTKERNEL=kernel/' /etc/sysconfig/kernel || exit $? fi preuninstall scriptlet (using /bin/sh): /bin/kernel-install remove 5.5.15-200.fc31.x86_64 /lib/modules/5.5.15-200.fc31.x86_64/vmlinuz || exit $? posttrans scriptlet (using /bin/sh): /bin/kernel-install add 5.5.15-200.fc31.x86_64 /lib/modules/5.5.15-200.fc31.x86_64/vmlinuz || exit $? $ rpm -qf /bin/kernel-install systemd-udev-243.8-1.fc31.x86_64 $ man kernel-install "kernel-install is used to install and remove kernel and initramfs images to and from the boot loader partition, ..."
Here is another reason for a difference in the sizes -- the Fedora kernel modules are compressed: $ find 5.5.15-200.fc31.x86_64 -name uvcvideo\* | xargs ls -l -rw-r--r--. 1 root root 46184 Apr 2 12:50 5.5.15-200.fc31.x86_64/kernel/drivers/media/usb/uvc/uvcvideo.ko.xz ^^^ $ find 5.4.10.localversion3-00083-g97d9e8620 -name uvcvideo\* | xargs ls -l -rw-rw-r--. 1 root root 213417 Apr 6 06:45 5.4.10.localversion3-00083-g97d9e8620/kernel/drivers/media/usb/uvc/uvcvideo.ko
I got the Fedora 31 VM. The first step is building vanilla 5.4.10 to test the build procedure and to confirm that the webcam works with the vanilla kernel. $ scp laptop:/boot/config-5.4.10-200.fc31.x86_64 . $ git clone --shallow-exclude=linux-5.3.y --branch linux-5.4.y https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4 $ cd linux-5.4/ $ git checkout v5.4.10 $ cp ../config-5.4.10-200.fc31.x86_64 .config edit .config a bit $ make oldconfig $ diff -u0 ../config-5.4.10-200.fc31.x86_64 .config --- ../config-5.4.10-200.fc31.x86_64 2020-01-09 15:12:02.000000000 -0500 +++ .config 2020-04-06 14:57:03.554427129 -0400 @@ -3 +3 @@ -# Linux/x86_64 5.4.10-200.fc31.x86_64 Kernel Configuration +# Linux/x86 5.4.10 Kernel Configuration @@ -7 +7 @@ -# Compiler: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1) +# Compiler: gcc (GCC) 9.3.1 20200317 (Red Hat 9.3.1-1) @@ -10 +10 @@ -CONFIG_GCC_VERSION=90201 +CONFIG_GCC_VERSION=90301 @@ -28,3 +28,3 @@ -CONFIG_LOCALVERSION="" -# CONFIG_LOCALVERSION_AUTO is not set -CONFIG_BUILD_SALT="5.4.10-200.fc31.x86_64" +CONFIG_LOCALVERSION=".localversion1" +CONFIG_LOCALVERSION_AUTO=y +CONFIG_BUILD_SALT="buildidsalt1" @@ -43 +43 @@ -CONFIG_DEFAULT_HOSTNAME="(none)" +CONFIG_DEFAULT_HOSTNAME="dev-william-1" @@ -8315 +8314,0 @@ -CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT=y @@ -8319 +8317,0 @@ -CONFIG_ALLOW_LOCKDOWN_LIFT_BY_SYSRQ=y @@ -8746,6 +8744 @@ -CONFIG_DEBUG_INFO=y -# CONFIG_DEBUG_INFO_REDUCED is not set -# CONFIG_DEBUG_INFO_SPLIT is not set -# CONFIG_DEBUG_INFO_DWARF4 is not set -CONFIG_DEBUG_INFO_BTF=y -# CONFIG_GDB_SCRIPTS is not set +# CONFIG_DEBUG_INFO is not set $ make # 75 minutes $ make targz-pkg # 2.5 minutes $ scp vm:linux-5.4.10.localversion1-x86.tar.gz laptop: # 1.5 minutes Now I have the tar on my laptop with boot/System.map-5.4.10.localversion1 boot/config-5.4.10.localversion1 boot/vmlinuz-5.4.10.localversion1 lib/modules/5.4.10.localversion1/... Is this the next step? 
Unpack the tarball as root in / KVER=5.4.10.localversion1 BOOT="/boot" VMLINUZ="$BOOT/vmlinuz-$KVER" INITRAMFS="$BOOT/initramfs-$KVER.img" mkinitrd "$INITRAMFS" "$KVER" grubby --add-kernel="$VMLINUZ" --initrd="$INITRAMFS" --title="$VMLINUZ" --copy-default # --make-default What does '/bin/kernel-install add "$KVER" "$VMLINUZ"' do? and then reboot. >It sounds like you have only one computer. If so, I suggest borrowing, renting, or buying a second computer, so your system-under-test is not also your support/infrastructure/backup/what-do-I-do-now, computer. It is a bit risky, but I did a backup to an external drive over the weekend, and I have the Fedora 31 Live CD. I think that if I mess up /boot, I can boot the live CD, mount the system disk, copy files from the live CD, and then reboot from the system disk and use rpm to restore the system files. Back in the old days, I used to patch SCO Xenix kernels with microemacs. Regards, William
(In reply to William Bader from comment #68) > I got the Fedora 31 VM. Awesome! > The first step is building vanilla 5.4.10 to test the build procedure and to > confirm that the webcam works with the vanilla kernel. > > $ scp laptop:/boot/config-5.4.10-200.fc31.x86_64 . > $ git clone --shallow-exclude=linux-5.3.y --branch linux-5.4.y > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4 > $ cd linux-5.4/ ... > $ make # 75 minutes > $ make targz-pkg # 2.5 minutes > $ scp vm:linux-5.4.10.localversion1-x86.tar.gz laptop: # 1.5 minutes OK, on all that. > Now I have the tar on my laptop with > boot/System.map-5.4.10.localversion1 > boot/config-5.4.10.localversion1 > boot/vmlinuz-5.4.10.localversion1 > lib/modules/5.4.10.localversion1/... Looks good. > Is this the next step? > Unpack the tarball as root in / # tar -xvf linux-5.4.10.localversion1-x86.tar.gz -C / ^^^^-- With "-C", you can run the tar command from your current directory. Tested by me. > KVER=5.4.10.localversion1 ... Don't do any of that. It's now "background" info. :-) > What does '/bin/kernel-install add "$KVER" "$VMLINUZ"' do? You are catching up with me. :-) I just completed a test with /bin/kernel-install and will post an exact command-line separately. > and then reboot. Yes. > >It sounds like you have only one computer. If so, I suggest borrowing, renting, or buying a second computer, so your system-under-test is not also your support/infrastructure/backup/what-do-I-do-now, computer. > > It is a bit risky, but I did a backup to an external drive over the weekend, > and I have the Fedora 31 Live CD. I think that if I mess up /boot, I can > boot the live CD, mount the system disk, copy files from the live CD, and > then reboot from the system disk and use rpm to restore the system files. > Back in the old days, I used to patch SCO Xenix kernels with microemacs. OK, you have taken all the precautions that I would. 
When I accidentally removed my kernel modules (in a VM, admittedly), that was a wake-up call. > Regards, William
After unpacking the tar file, run the following command as root (with your own kernel version, of course):

# /bin/kernel-install add 5.4.10.localversion3-00083-g97d9e8620 /boot/vmlinuz-5.4.10.localversion3-00083-g97d9e8620
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(That's copied right from my root shell history in my F31 test VM. The kernel version is in the line twice, so it isn't as complicated as it looks.)

I didn't add the "--verbose" option, but that might be a good idea. I got some error messages about missing shell scripts. There might be a package that needs to be installed.

After that, check that you have a new initramfs in /boot and a new grub2 ".conf" file in /boot/loader/entries/.

# ls -lt /boot/initramfs-*
# ls -lt /boot/loader/entries/

Reboot and test: "lsusb".
(In reply to William Bader from comment #68) > I got the Fedora 31 VM. ... Could you go into more detail about that? What software is used to configure and manage VMs on CentOS? Who decided how it would be provisioned (disk, memory)? How did you do the F31 install? On site? Or remotely? What software did you have to enable or configure on the installed F31 VM to use it remotely? My experience with VMs is on a desktop system with Fedora software: $ rpm -q virt-manager qemu-kvm virt-manager-2.1.0-2.fc30.noarch qemu-kvm-3.1.1-2.fc30.x86_64 And those are fantastic tools, I must say.
>Could you go into more detail about that? The office is on coronavirus lockdown. I am on lockdown also, stranded far from the office. I showed this bug report to the person who manages the VMs, and he connected remotely and created a Fedora 31 VM with 8GB RAM, 44GB disk (11GB currently used by the OS and the kernel build), and one virtual cpu that shows as an Intel Skylake Processor. Ten years ago, we had a computer room full of headless desktops and towers. We got a single big server and migrated everything to VMs on the big server. We do daily backups of the important VMs, but it would still be a pain to lose one, so only a few people have access, and I am not on that list. It is better that way, so I don't get the blame if something breaks. I don't know what tool he uses or how he installed Fedora. He installed gcc, but I had to install make, flex, bison, and a few libraries. Before I did an in-place update from Fedora 30 to 31 on my laptop, he made a Fedora 30 VM and tested the update, and I think that he has already set up a Fedora 31 VM for another project that needed an OS more recent than CentOS 7. I have some VMs on my laptop under VirtualBox, but my laptop supports only 8GB RAM, so I can't do much in the VMs. I suppose that since the webcam is a hardware issue, if it doesn't work on the OS on the bare metal, it won't work inside a VM.
I need help... I did 'tar -xvf linux-5.4.10.localversion1-x86.tar.gz -C /' Since /boot was almost full and I had a bunch of kernels from koji, I did 'sudo dnf remove kernel-modules-5.4.11-200.fc31.x86_64 kernel-core-5.4.11-200.fc31.x86_64' It got some errors, and when I checked, /lib/modules had only 5.4.10.localversion1 Before starting, I made a tar of all of /boot and of /lib/modules/5.5.13-200.fc31.x86_64 so I put the 5.5.13 modules back. To be safe, I downloaded kernel 5.5.14 core and modules 5.5.14 from koji and installed it. What went wrong? Every file in the tarball has 5.4.10 in its name. I also noticed that I can't run 32 bit executables. It wiped out all of /lib. Something seems to have broken dnf, so when I removed kernel-modules-5.4.11-200.fc31.x86_64, it removed all of /lib.
Is there a way to validate that all of the files from all of the installed rpms are present?
I've probably lost some information by restoring a /lib that is a few days old.
After doing in-place Fedora updates, I sometimes run 'rpm --rebuilddb' and 'dnf distro-sync', but those probably won't help now.
I restored the files by going to /lib on my backup drive and running 'sudo tar cf - . | (cd /lib ; sudo tar xf -)'.
I use rsync to make a few cycles of backups. I know that there are more efficient ways to make backups, but this makes it easy to copy files back as needed.
Since I did the backup before installing koji modules, my backup didn't restore the modules for 5.4.10-200.fc31 (the last kernel that supported the webcam). dnf wouldn't let me install it because it thought that it was already installed. So I ran 'sudo dnf reinstall kernel-modules-5.4.10*.rpm' and it came back with some errors. Are those bad or normal for a reinstall? Downloading Packages: Running transaction check Transaction check succeeded. Running transaction test Transaction test succeeded. Running transaction Preparing : 1/1 Reinstalling : kernel-modules-5.4.10-200.fc31.x86_64 1/2 Running scriptlet: kernel-modules-5.4.10-200.fc31.x86_64 1/2 depmod: WARNING: could not open modules.order at /lib/modules/5.4.10-200.fc31.x86_64: No such file or directory depmod: WARNING: could not open modules.builtin at /lib/modules/5.4.10-200.fc31.x86_64: No such file or directory Cleanup : kernel-modules-5.4.10-200.fc31.x86_64 2/2 Running scriptlet: kernel-modules-5.4.10-200.fc31.x86_64 2/2 depmod: WARNING: could not open modules.order at /lib/modules/5.4.10-200.fc31.x86_64: No such file or directory depmod: WARNING: could not open modules.builtin at /lib/modules/5.4.10-200.fc31.x86_64: No such file or directory Verifying : kernel-modules-5.4.10-200.fc31.x86_64 1/2 Verifying : kernel-modules-5.4.10-200.fc31.x86_64 2/2 Reinstalled: kernel-modules-5.4.10-200.fc31.x86_64 Complete! The modules.order and modules.builtin files are missing in /lib/modules/5.4.10-200.fc31.x86_64 but present in other module directories. I can't reinstall kernel-core-5.4.10-200.fc31.x86_64 because I don't have enough space in /boot, and I don't want to try removing any other kernels until I hear back from you and have an answer about why dnf removed my /lib.
(In reply to William Bader from comment #74) There was a BZ mid-air collision. > Is there a way to validate that all of the files from all of the installed rpms are present? Yes. I would suggest verifying your kernel packages first, because that produced a lot of error messages when I did that after I thought I had reinstalled all of the kernel packages: # rpm -Va kernel\* # You don't have to run that as root, but a few files will be flagged with "?", meaning they are unreadable (per "man rpm"). You can then verify all packages with: # rpm -Va I am running that right now and seeing a lot of missing files in /lib/. For example, this shows every file is missing: $ rpm -V kbd-misc What concerns me is whether running out of disk space caused the problem or if there is something wrong with the tar command being run as root. > I've probably lost some information by restoring a /lib that is a few days old. > After doing in-place Fedora updates, I sometimes run 'rpm --rebuilddb' and 'dnf distro-sync', but those probably won't help now. > I restored the files by going to /lib on my backup drive and running 'sudo tar cf - . | (cd /lib ; sudo tar xf -)' > I use rsync make a few cycles of backups. I know that there are more efficient ways to make backups, but this makes it easy to copy files back as needed. That sounds like a good system.
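The "rpm -Va" output can be long, so a small filter that keeps only the missing files may help when deciding which packages to reinstall. This is a sketch under the assumption that rpm reports a missing file on a line beginning with the word "missing" and ending with the path; "list_missing_files" is a hypothetical helper name, and paths containing spaces would be truncated.

```shell
#!/bin/bash
# Sketch: read "rpm -V" style output on stdin and print only the paths
# that rpm reports as missing. list_missing_files is a hypothetical name.
list_missing_files() {
    # Lines for missing files start with the word "missing"; the path is
    # the last whitespace-separated field on the line.
    awk '$1 == "missing" { print $NF }'
}
```

For example: rpm -Va kernel\* | list_missing_files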
This could be the problem: On my F30 primary system, /lib is a link: $ ls -lF -d /lib lrwxrwxrwx. 1 root root 7 Feb 11 2019 /lib -> usr/lib/ But in my F31 test VM, /lib is a directory: $ ls -lF -d /lib drwxrwxr-x. 3 root root 4096 Apr 6 06:44 /lib/ And here is where all the modules went: $ ls -F /usr/lib/modules 5.3.7-301.fc31.x86_64/ 5.5.11-200.fc31.x86_64/ 5.5.8-200.fc31.x86_64/ 5.5.10-200.fc31.x86_64/ 5.5.13-200.fc31.x86_64/ 5.5.9-200.fc31.x86_64/ The "/lib" link needs to be restored! And that points the finger at the tar command.
This is a bit convoluted, but I wanted to make sure the "ln" command produced the correct result:

# cd /
# ln -s usr/lib lib1
# mv -i lib lib2
# mv -i lib1 lib

Now, this verifies as expected:

# rpm -V kbd-misc
Thanks. I did cd /; mv lib lib-; ln -s usr/lib lib If I don't have problems after a day or two, I'll remove lib- I saw that the tarball had /lib, but I didn't realize that /lib on Fedora is a symlink.
(In reply to William Bader from comment #79) > Thanks. I did cd /; mv lib lib-; ln -s usr/lib lib > If I don't have problems after a day or two, I'll remove lib- > I saw that the tarball had /lib, but I didn't realize that /lib on Fedora is a symlink. I didn't either. And there is a tar option to preserve the link: $ man tar ... Overwrite control These options control tar actions when extracting a file over an existing copy on disk. ... --keep-directory-symlink Don't replace existing symlinks to directories when extracting.
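Given how quietly the link was replaced, a guard that checks the link before (and after) unpacking might be useful. A minimal sketch, assuming the Fedora layout described in this thread where /lib is a link to usr/lib; "check_lib_symlink" is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: succeed only if <root>/lib is a symbolic link.
# check_lib_symlink is a hypothetical helper name; the root directory
# argument exists so the check can be tried on a scratch directory first.
check_lib_symlink() {
    local root="${1:-/}"
    local lib="${root%/}/lib"

    if [ -L "$lib" ]; then
        return 0
    fi
    echo "warning: $lib is not a symbolic link" >&2
    return 1
}
```

For example: check_lib_symlink / && tar --keep-directory-symlink -xvf "$TARFILE" -C / (where TARFILE stands for the kernel tar file).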
I rebooted to 5.5.14-200.fc31.x86_64 so it seems as if nothing got messed up from having /lib changed. I moved the modules created by the tarball in the saved /lib- to the correct place. $ cd / $ sudo /bin/kernel-install --verbose add 5.4.10.localversion1-x86 /boot/vmlinuz-5.4.10.localversion1-x86 Kernel image argument /boot/vmlinuz-5.4.10.localversion1-x86 not a file $ sudo /bin/kernel-install --verbose add 5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/00-entry-directory.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/10-devicetree.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/20-grub.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/20-grubby.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/50-depmod.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 Running depmod -a 5.4.10.localversion1 +/usr/lib/kernel/install.d/50-dracut.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/51-dracut-rescue.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/90-loaderentry.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/95-akmodsposttrans.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 
/boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/95-kernel-hooks.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 +/usr/lib/kernel/install.d/99-grub-mkconfig.install add 5.4.10.localversion1 /boot/9f668c5979cb49f8ad387216c22a4693/5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 It made the initramfs. A directory list of /boot shows -rwxrwxr-x 1 root root 76420472 Apr 6 21:57 vmlinux-5.4.10.localversion1 <- do I need this? -rwxr-xr-x. 1 root root 5977368 Nov 18 2015 vmlinuz-0-rescue-9f668c5979cb49f8ad387216c22a4693 -rwxr-xr-x 1 root root 10285768 Jan 9 20:12 vmlinuz-5.4.10-200.fc31.x86_64 -rw-rw-r-- 1 root root 9468288 Apr 6 21:57 vmlinuz-5.4.10.localversion1 <- why isn't this executable? -rwxr-xr-x 1 root root 10846920 Mar 23 17:45 vmlinuz-5.5.11-200.fc31.x86_64 -rwxr-xr-x 1 root root 10842824 Mar 25 22:09 vmlinuz-5.5.13-200.fc31.x86_64 -rwxr-xr-x 1 root root 10842824 Apr 1 17:50 vmlinuz-5.5.14-200.fc31.x86_64 Can I remove vmlinux-5.4.10.localversion1? Why isn't vmlinuz-5.4.10.localversion1 executable? (I just made it match the permissions of the vmlinuz files.) /boot/loader/entries/9f668c5979cb49f8ad387216c22a4693-5.4.10.localversion1.conf is title Fedora (5.4.10.localversion1) 31 (Workstation Edition) version 5.4.10.localversion1 linux /vmlinuz-5.4.10.localversion1 initrd /initramfs-5.4.10.localversion1.img options $kernelopts id fedora-20200407072107-5.4.10.localversion1 grub_users $grub_users grub_arg --unrestricted grub_class kernel so it looks like it will use vmlinuz. 
It looks like the tarball files are in place: # tar df /u/william/linux-5.4.10.localversion1-x86.tar.gz boot/vmlinuz-5.4.10.localversion1: Mode differs <- because I made it executable lib: File type differs <- because I restored the symlink lib/modules: Mode differs lib/modules/5.4.10.localversion1/modules.alias: Mod time differs lib/modules/5.4.10.localversion1/modules.softdep: Mod time differs lib/modules/5.4.10.localversion1/modules.devname: Mod time differs lib/modules/5.4.10.localversion1/modules.dep.bin: Mod time differs lib/modules/5.4.10.localversion1/modules.symbols.bin: Mod time differs lib/modules/5.4.10.localversion1/modules.dep: Mod time differs lib/modules/5.4.10.localversion1/modules.alias.bin: Mod time differs lib/modules/5.4.10.localversion1/modules.builtin.bin: Mod time differs lib/modules/5.4.10.localversion1/modules.symbols: Mod time differs
The 5.4.10 new kernel booted, and the webcam works. $ uname -a Linux laptop37 5.4.10.localversion1 #1 SMP Mon Apr 6 14:59:42 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux $ l /dev/video* crw-rw----+ 1 root video 81, 0 Apr 7 08:37 /dev/video0 crw-rw----+ 1 root video 81, 1 Apr 7 08:37 /dev/video1 So next, I'll build 5.4.11 to confirm that it doesn't work with the webcam, and then start the bisections. Questions: To clean generated files before a build, should I 'make distclean' or is clean or mrproper better? I suspect that I need 'make distclean' before each 'git checkout' or 'git bisect', and then I have to remake the config file. How do I remove the old kernels from my laptop? sudo /bin/kernel-install --verbose remove 5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 and then hunt down the files in the tarball?
(In reply to William Bader from comment #81) > [partial session transcript] $ cd / It would probably be OK to continue from the directory with the tar file. $ sudo /bin/kernel-install --verbose add 5.4.10.localversion1-x86 /boot/vmlinuz-5.4.10.localversion1-x86 Kernel image argument /boot/vmlinuz-5.4.10.localversion1-x86 not a file ^^^^ That is in the tar file name, but not in the kernel file name. Presumably, builds for different architectures could be done on one platform, so the tar files would need to have distinct names, but the kernel files would not. > Can I remove vmlinux-5.4.10.localversion1? AFAIK, that isn't needed. However, I would suggest leaving files in place until you need to do a cleanup to recover disk space, at which point you can remove all of the files with something like: # rm -i *localversion* > Why isn't vmlinuz-5.4.10.localversion1 executable? (I just made it match the permissions of the vmlinuz files.) Good catch. That doesn't seem to affect the boot process, but it is not clear where those permissions are set. > [annotated session transcript] # tar df /u/william/linux-5.4.10.localversion1-x86.tar.gz boot/vmlinuz-5.4.10.localversion1: Mode differs <- because I made it executable lib: File type differs <- because I restored the symlink lib/modules: Mode differs OK. I didn't know about the "d" option for verifying a tar install. That sounds very useful. There doesn't seem to be a tar option to uninstall ...
(In reply to William Bader from comment #82) > The 5.4.10 new kernel booted, and the webcam works. Fantastic! > $ uname -a > Linux laptop37 5.4.10.localversion1 #1 SMP Mon Apr 6 14:59:42 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux > $ l /dev/video* > crw-rw----+ 1 root video 81, 0 Apr 7 08:37 /dev/video0 > crw-rw----+ 1 root video 81, 1 Apr 7 08:37 /dev/video1 Nice. > So next, I'll build 5.4.11 to confirm that it doesn't work with the webcam, and then start the bisections. > > Questions: > > To clean generated files before a build, should I 'make distclean' or is clean or mrproper better? > > I suspect that I need 'make distclean' before each 'git checkout' or 'git bisect', and then I have to remake the config file. I will post separately on that subject. > How do I remove the old kernels from my laptop? # cd /boot # rm -i *localversion* > sudo /bin/kernel-install --verbose remove 5.4.10.localversion1 /boot/vmlinuz-5.4.10.localversion1 and then hunt down the files in the tarball? That would probably work, but the files are in only two places: /boot/*localversion* and /lib/modules/*localversion*/, so suitable "rm" commands should suffice. If the removal process were for a released product, it could be more automated. However, I suggest leaving as many files in place for as long as possible until you need to recover disk space. From what you said before, that would probably only be in /boot. As for the grub2 ".conf" files, they can be left in place even if there is no kernel to run. Grub2 will simply send you back to the menu. If you want to save them, I believe they can be moved to a subdirectory and grub2 will ignore them. However, after the disappearing "/lib" scare, I believe it would be a good idea to write a shell script for the install process, so that no options are forgotten or mistyped. In outline: Accept one argument: the kernel version string. Run the tar command. Run the kernel-install command. 
It should be possible to run it from the directory with the tar file. If this were for a released product, a removal option could be added. But since this script must be run as root, testing and debugging a removal option would be risky and time-consuming. This is starting to sound like a software development project with specifications. :-)
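The outline above could be sketched roughly as follows. The tar and kernel-install command lines are the ones used earlier in this thread; everything else (the names "run" and "install_test_kernel", the DRY_RUN and TARFILE variables, and the default tar file name) is an assumption made for illustration, and there is deliberately no removal option.

```shell
#!/bin/bash
# Sketch of the install outline. Hypothetical names: run,
# install_test_kernel, DRY_RUN, TARFILE.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Accept one argument: the kernel version string.
install_test_kernel() {
    local kver="$1"
    if [ -z "$kver" ]; then
        echo "usage: install_test_kernel KERNEL-VERSION" >&2
        return 2
    fi

    # The tar file name has an "-x86" suffix; the kernel file name does not.
    # Set TARFILE to override, for example for a .tar.gz file.
    local tarfile="${TARFILE:-linux-$kver-x86.tar}"

    # --keep-directory-symlink leaves the /lib link in place.
    run tar --keep-directory-symlink -xvf "$tarfile" -C / || return 1

    # Builds the initramfs and writes the grub2 ".conf" file.
    run /bin/kernel-install add "$kver" "/boot/vmlinuz-$kver" || return 1
}
```

Running it first as "DRY_RUN=1 install_test_kernel 5.4.10.localversion1" prints the two command lines without executing them, which seems prudent for anything that will later run as root.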
> How do I remove the old kernels from my laptop? # cd /boot # rm -i *localversion* The kernel file names should all be *distinct* because: CONFIG_LOCALVERSION_AUTO=y So, it should be possible to *save* all the bisection kernels and module directories in one directory on your backup drive. (Or the tar files.) NB: The number N in ".localversionN" doesn't need to be incremented unless the config file itself changes. $ less init/Kconfig ... config LOCALVERSION_AUTO bool "Automatically append version information to the version string" ... This will try to automatically determine if the current tree is a release tree by looking for git tags that belong to the current top of tree revision. A string of the format -gxxxxxxxx will be added to the localversion [Comment 70 has an example.] if a git-based tree is found. The string generated by this will be appended after any matching localversion* files, and after the value set in CONFIG_LOCALVERSION. (The actual string used here is the first eight characters produced by running the command: $ git rev-parse --verify HEAD which is done within the script "scripts/setlocalversion".) ...
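Since the commit ID prefix is embedded in the version string, it can be pulled back out when matching a saved kernel to a bisection step. A small sketch, where "commit_prefix_from_kver" is a hypothetical helper name; it fails for a build made exactly at a release tag, because no "-g" suffix is appended in that case (as with 5.4.10.localversion1 in this thread).

```shell
#!/bin/bash
# Sketch: print the commit ID prefix that CONFIG_LOCALVERSION_AUTO appended
# to a kernel version string. commit_prefix_from_kver is a hypothetical name.
commit_prefix_from_kver() {
    case "$1" in
        *-g*)
            # Strip everything up to and including the last "-g".
            echo "${1##*-g}"
            ;;
        *)
            # No "-g<commit>" suffix, e.g. a build at a release tag.
            return 1
            ;;
    esac
}
```

For example, in the kernel source tree: git show --no-patch "$(commit_prefix_from_kver "$(uname -r)")"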
(In reply to Steve from comment #85) ... > A string of the format -gxxxxxxxx will be added to the localversion [Comment 70 has an example.] ... As a side note, the Fedora kernel snapshot builds use "gitN", where "N" is a small integer. Those are easier to read and they can be sorted numerically. The actual commit ID is in the changelog (Comment 26). "git bisect" saves a log, so it should be possible to figure out which build a specific kernel is for by looking at the log: $ git bisect log git bisect start # bad: [9d61432efb21c224b710f397809f3a4fef281f9c] Linux 5.4.11 git bisect bad 9d61432efb21c224b710f397809f3a4fef281f9c # good: [7a02c193298ec15f2ba1344b6bcd5d578a41b2e0] Linux 5.4.10 git bisect good 7a02c193298ec15f2ba1344b6bcd5d578a41b2e0
(In reply to William Bader from comment #82)
...
> To clean generated files before a build, should I 'make distclean' or is clean or mrproper better?
...

If you did some scratch builds while experimenting with the config file and such-like, starting with "make mrproper" would be a good idea.*

Since "make mrproper" removes the ".config" file, you would need to ensure that you have a "master" copy of your ".config" file. I use a crude form of version control. If I were starting over, I would make sure that the 'N' in '.localversionN' matches the 'N' in '.EXPN':

$ fgrep 'CONFIG_LOCALVERSION=' .config.EXP*
.config.EXP1:CONFIG_LOCALVERSION="localversion1"
.config.EXP2:CONFIG_LOCALVERSION=".localversion2"
.config.EXP3:CONFIG_LOCALVERSION=".localversion2"
.config.EXP4:CONFIG_LOCALVERSION=".localversion3"
.config.EXP4.old:CONFIG_LOCALVERSION=".localversion3"

NB: I believe that the ".old" file was generated by one of the kernel config commands.

As for the actual bisection builds, don't clean anything. "Make" is supposed to figure out what needs to be rebuilt. And that should save a lot of time on subsequent bisection builds (depending, of course, on what actually changed).

For the second simulated bisection build, the build time was only 14 minutes:

$ git bisect bad
Bisecting: 41 revisions left to test after this (roughly 5 steps)
[110440a0eb4e340a0f353f9df86783aa4365f899] ARM: exynos_defconfig: Restore debugfs support

$ time make
...
real 14m0.614s
...

$ fgrep -a 'Linux version' vmlinux
Linux version 5.4.10.localversion3-00041-g110440a0e (test-user-1@test-host-1) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #10 SMP ...
                                          ^^^^^^^^^--This matches the beginning of the commit ID above.

* Re "make distclean": We shouldn't have any patch files. Depending on your editor, there could be some editor backup files, but those shouldn't affect the builds. This will show you what it actually does:

$ grep -A7 'distclean: mrproper' Makefile
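The naming rule for saved config files (the 'N' in '.EXPN' matching the 'N' in '.localversionN') can be checked mechanically. This is only a sketch of that one convention; "check_config_tag" is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: succeed if a saved config named "<something>.EXPN" contains
# CONFIG_LOCALVERSION=".localversionN" with the same N.
# check_config_tag is a hypothetical helper name.
check_config_tag() {
    local file="$1"
    local name="${file##*/}"      # file name without the directory
    local n="${name##*.EXP}"      # text after the last ".EXP"

    # Require N to be a plain number (rejects names like ".config.EXP4.old").
    case "$n" in
        ''|*[!0-9]*) return 1 ;;
    esac

    grep -q "^CONFIG_LOCALVERSION=\"\.localversion$n\"\$" "$file"
}
```

For example: for f in .config.EXP[0-9]*; do check_config_tag "$f" || echo "mismatch: $f"; done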
(In reply to Steve from comment #85)
...
> So, it should be possible to *save* all the bisection kernels and module directories in one directory on your backup drive. (Or the tar files.)
...

Here is a non-destructive process:

# cd /boot
# mkdir ARCHIVE
# mv -i *localversion* ARCHIVE

# cd loader/entries/
# mkdir ARCHIVE
# mv -i *localversion* ARCHIVE

If you need to offload the kernel and initramfs from your /boot partition, you could make /boot/ARCHIVE a link to a directory on your backup drive.

And grub2 nicely ignores the ARCHIVE directory. :-)

NB: I am ignoring the modules on the assumption that they don't cause any problems in /lib/modules/.
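The archive steps above can be wrapped in a function so that they can be tried on a scratch copy of /boot before being run as root on the real one. A sketch only; "archive_test_kernels" is a hypothetical name, and like the commands above it leaves /lib/modules/ alone.

```shell
#!/bin/bash
# Sketch: move all *localversion* files in a boot directory and in its
# loader/entries/ subdirectory into ARCHIVE subdirectories.
# archive_test_kernels is a hypothetical helper name.
archive_test_kernels() {
    local boot="${1:-/boot}"
    local dir f

    for dir in "$boot" "$boot/loader/entries"; do
        [ -d "$dir" ] || continue
        mkdir -p "$dir/ARCHIVE"
        for f in "$dir"/*localversion*; do
            [ -e "$f" ] || continue    # the pattern matched nothing
            # -i asks before overwriting a file already in ARCHIVE.
            mv -i "$f" "$dir/ARCHIVE/"
        done
    done
}
```

For example: archive_test_kernels /boot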
# tar --keep-directory-symlink -xvf linux-5.4.10.localversion3-00083-g97d9e8620-x86.tar -C / ^^^^^^^^^^^^^^^^^^^^^^^^ That worked as expected -- /lib was a link before running the command and the link was still there after running the command. If you want to be more cautious, you could unpack the tar file into a local directory and then manually move the files and directories to /boot and /lib/modules/: # mkdir linux-5.4.10.localversion3-00083-g97d9e8620 # tar -xvf linux-5.4.10.localversion3-00083-g97d9e8620-x86.tar -C ./linux-5.4.10.localversion3-00083-g97d9e8620/
(In reply to William Bader from comment #82) ... > How do I remove the old kernels from my laptop? ... The kernel can be built as an rpm package and installed with dnf: $ make binrpm-pkg $ ls -1sh ~/rpmbuild/RPMS/x86_64/ total 74M 73M kernel-5.4.10.localversion3_00041_g110440a0e-10.x86_64.rpm 1.3M kernel-headers-5.4.10.localversion3_00041_g110440a0e-10.x86_64.rpm [scp to the target system] On the target system: # dnf install kernel*.rpm $ rpm -qi kernel-5.4.10.localversion3_00041_g110440a0e-10.x86_64 Name : kernel Version : 5.4.10.localversion3_00041_g110440a0e ... The initramfs file is built, the grub2 ".conf" file is written, and it boots. After rebooting with another kernel: # dnf list --installed kernel\*localversion\* Installed Packages kernel.x86_64 5.4.10.localversion3_00041_g110440a0e-10 @@commandline The remove command succeeds, although there are several error messages about missing files: # dnf remove kernel-5.4.10.localversion3_00041_g110440a0e-10 Dependencies resolved. ============================================================================================================================== Package Architecture Version Repository Size ============================================================================================================================== Removing: kernel x86_64 5.4.10.localversion3_00041_g110440a0e-10 @@commandline 270 M Transaction Summary ============================================================================================================================== Remove 1 Package Freed space: 270 M Is this ok [y/N]: y Running transaction check Transaction check succeeded. Running transaction test Transaction test succeeded. 
Running transaction Preparing : 1/1 Running scriptlet: kernel-5.4.10.localversion3_00041_g110440a0e-10.x86_64 1/1 Erasing : kernel-5.4.10.localversion3_00041_g110440a0e-10.x86_64 1/1 warning: file /lib/modules/5.4.10.localversion3-00041-g110440a0e/modules.symbols.bin: remove failed: No such file or directory warning: file /lib/modules/5.4.10.localversion3-00041-g110440a0e/modules.symbols: remove failed: No such file or directory warning: file /lib/modules/5.4.10.localversion3-00041-g110440a0e/modules.softdep: remove failed: No such file or directory warning: file /lib/modules/5.4.10.localversion3-00041-g110440a0e/modules.devname: remove failed: No such file or directory warning: file /lib/modules/5.4.10.localversion3-00041-g110440a0e/modules.dep.bin: remove failed: No such file or directory warning: file /lib/modules/5.4.10.localversion3-00041-g110440a0e/modules.dep: remove failed: No such file or directory warning: file /lib/modules/5.4.10.localversion3-00041-g110440a0e/modules.builtin.bin: remove failed: No such file or directory warning: file /lib/modules/5.4.10.localversion3-00041-g110440a0e/modules.alias.bin: remove failed: No such file or directory warning: file /lib/modules/5.4.10.localversion3-00041-g110440a0e/modules.alias: remove failed: No such file or directory warning: file /boot/vmlinuz-5.4.10.localversion3-00041-g110440a0e: remove failed: No such file or directory warning: file /boot/config-5.4.10.localversion3-00041-g110440a0e: remove failed: No such file or directory warning: file /boot/System.map-5.4.10.localversion3-00041-g110440a0e: remove failed: No such file or directory Running scriptlet: kernel-5.4.10.localversion3_00041_g110440a0e-10.x86_64 1/1 Verifying : kernel-5.4.10.localversion3_00041_g110440a0e-10.x86_64 1/1 Removed: kernel-5.4.10.localversion3_00041_g110440a0e-10.x86_64 Complete!
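A side note on the names in that output: the package version uses underscores where the kernel release string (as seen in the /lib/modules and /boot paths in the warnings) uses hyphens. Here is a minimal sketch of that mapping, assuming that replacing each '-' with '_' is the only transformation, which is all this output shows. The function name is made up for illustration.

```shell
# Hypothetical helper: derive the rpm version string from a kernel release
# string, assuming (as in the dnf output above) that each '-' becomes '_'.
kernel_release_to_rpm_version() {
    printf '%s\n' "$1" | tr '-' '_'
}

# Example: find the package version that should correspond to a release.
# kernel_release_to_rpm_version 5.4.10.localversion3-00041-g110440a0e
```

That can be handy for matching an entry in /lib/modules/ to the package name to pass to "dnf remove".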
(In reply to Steve from comment #90) ... > $ ls -1sh ~/rpmbuild/RPMS/x86_64/ ... "~/rpmbuild" was a pre-existing directory from previous experiments with building rpm packages. As a test, I renamed it and ran "make binrpm-pkg" again. The "~/rpmbuild" directory was recreated.
(In reply to Steve from comment #67) > Here is another reason for a difference in the sizes -- the Fedora kernel modules are compressed: > > $ find 5.5.15-200.fc31.x86_64 -name uvcvideo\* | xargs ls -l -rw-r--r--. 1 root root 46184 Apr 2 12:50 > 5.5.15-200.fc31.x86_64/kernel/drivers/media/usb/uvc/uvcvideo.ko.xz ... There appears to be another discrepancy between the packaged config file and what is actually built: $ fgrep 'CONFIG_MODULE_COMPRESS' config-5.4.10 # CONFIG_MODULE_COMPRESS is not set So, try a build with: $ diff -u0 .config.EXP4 .config.EXP5 ... @@ -28 +28 @@ -CONFIG_LOCALVERSION=".localversion3" +CONFIG_LOCALVERSION=".localversion5" # I bumped the number to match the config file version number. @@ -854 +854,3 @@ -# CONFIG_MODULE_COMPRESS is not set +CONFIG_MODULE_COMPRESS=y +# CONFIG_MODULE_COMPRESS_GZIP is not set +CONFIG_MODULE_COMPRESS_XZ=y $ time make ... real 12m7.546s ... $ time make binrpm-pkg ... real 5m9.922s ... Unfortunately, module compression seems to have made the package bigger: $ ls -1sh kernel-5.4.10.localversion*_00041_g110440a0e-*.x86_64.rpm 73M kernel-5.4.10.localversion3_00041_g110440a0e-10.x86_64.rpm 75M kernel-5.4.10.localversion5_00041_g110440a0e-11.x86_64.rpm
The version with compressed modules boots: $ uname -r 5.4.10.localversion5-00041-g110440a0e The compressed module directory is smaller than the Fedora kernel module directories: $ du -s /lib/modules/* | sort -n 65412 /lib/modules/5.4.10.localversion5-00041-g110440a0e 79108 /lib/modules/5.3.7-301.fc31.x86_64 ... 82600 /lib/modules/5.5.15-200.fc31.x86_64 258152 /lib/modules/5.4.10.localversion3-00083-g97d9e8620 That could be because the Fedora kernel module directories have additional files, including a copy of "vmlinuz": $ diff -q ./5.3.7-301.fc31.x86_64/ ./5.4.10.localversion5-00041-g110440a0e/ | cat -n 1 Only in ./5.3.7-301.fc31.x86_64/: bls.conf 2 Only in ./5.3.7-301.fc31.x86_64/: build ... 26 Only in ./5.3.7-301.fc31.x86_64/: vmlinuz 27 Only in ./5.3.7-301.fc31.x86_64/: .vmlinuz.hmac $ ls -sh ./5.3.7-301.fc31.x86_64/vmlinuz 8.9M ./5.3.7-301.fc31.x86_64/vmlinuz $ file ./5.3.7-301.fc31.x86_64/vmlinuz ./5.3.7-301.fc31.x86_64/vmlinuz: Linux kernel x86 boot executable bzImage, version 5.3.7-301.fc31.x86_64 ...
Thanks for the research. >That is in the tar file name, but not in the kernel file name. I typed it wrong and accidentally included it when I did a cut and paste from my shell window. >I didn't know about the "d" option for verifying a tar install. I used to use the old single letter options a lot. A long time ago, I modified a version of pdtar (the predecessor to gtar) to build on MSDOS and read and write SCO Xenix-compatible tar files to floppies using BIOS calls for IO. >I would make sure that the 'N' in '.localversionN' matches the 'N' in '.EXPN' I saved my config with cp -p .config ../config-`grep Linux .config | head -1 | awk '{print $3}'`-`grep -i CONFIG_LOCALVERSION= .config | sed -e 's/.*=".//' -e 's/"//g'` >rm -i *localversion* I removed the old kernel with the commands below. I've started deleting files by moving them to /tmp instead of deleting them. It makes typos easier to fix. I reboot my laptop every day, which clears out /tmp. mv -iv /boot/*localversion1 /boot/*localversion1.img /boot/loader/entries/*localversion1.conf /tmp/ mv -i /lib/modules/*localversion1/ /tmp/ My laptop is booted from 5.5.14-200.fc31.x86_64 from koji. Even idle, the cpu is 80C. I don't see anything unusual in top or gkrellm, and powertop shows 100 wakeups/second and 1.1% CPU use. [later] I updated to 5.5.15, and it seems ok. The cpu is below 60C. >I don't know why you would need vmlinux, and there is another quirk about what is in the tar file: >Both vmlinux and vmlinuz are included: Removing the large vmlinux kernel from /boot works. Just the vmlinuz kernel referenced in the /boot/loader/entries/ config file is enough. The vmlinuz kernel does not need to be executable. My /boot is only about 400MB, and the vmlinux file is about 100MB, so being able to remove it helps a lot. >The compressed module directory is smaller than the Fedora kernel module directories: I noticed the size difference, but as long as it boots, I didn't look into it. 
I only need it running enough to boot and test the webcam video. >make binrpm-pkg I can try that on the next bisection. I am making progress with the bisections. 5.4.10 good 5.4.11 bad [97d9e8620f57f28f415b23ad88b97c87b6d53390] bnx2x: Do not handle requests from VFs after parity / good Regards, William
> I typed it wrong and accidentally included it when I did a cut and paste from my shell window. Actually, that was very helpful, because it showed that the architecture is needed in some file names, but not in others. > I used to use the old single letter options a lot. The "ps" man page shows a lot of those "x" options, along with "-x" and "--xyz" options. That should make everyone happy, until they get confused about whether "x" means the same thing as "-x" or "--xyz". :-) > A long time ago, I modified a version of pdtar (the predecessor to gtar) to build on MSDOS and read and write SCO Xenix-compatible tar files to floppies using BIOS calls for IO. "tar" has been useful for a long time. Now, with various standards, compatibility problems don't seem to be as common. > I saved my config with > cp -p .config ../config-`grep Linux .config | head -1 | awk '{print $3}'`-`grep -i CONFIG_LOCALVERSION= .config | sed -e 's/.*=".//' -e 's/"//g'` Automating that is a good idea, although debugging something like that would take me a "few" tries. :-) I used to write big awk programs, so here is how I got the local version number with just awk: $ cat /tmp/foo/.config | awk '/CONFIG_LOCALVERSION=/ { print "\nDEBUG:", $0; gsub("[=\"]+", " "); sub(".localversion", ""); lver=$2 }; END { print ".config.EXP" lver }' DEBUG: CONFIG_LOCALVERSION=".localversion5" .config.EXP5 In a regular expression, "." matches any character, so the second regular expression ("sub") is not very robust. > I removed the old kernel with the commands below. > I've started deleting files by moving them to /tmp instead of deleting them. > It makes typos easier to fix. > I reboot my laptop every day, which clears out /tmp. > mv -iv /boot/*localversion1 /boot/*localversion1.img /boot/loader/entries/*localversion1.conf /tmp/ > mv -i /lib/modules/*localversion1/ /tmp/ Excellent idea. You are using /tmp something like the trash can in desktop environments. I use /tmp a lot, but I never thought of using it that way.
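Regarding the "." in that regular expression: a somewhat more robust variant would escape the dot and anchor the pattern. This is only a sketch, assuming the ".localversionN" naming convention used in this bug. The function name is invented for the example.

```shell
# Hypothetical helper: print the N from CONFIG_LOCALVERSION=".localversionN".
# The dot is escaped so that it only matches a literal '.', and the pattern is
# anchored so that commented-out or similarly named options are ignored.
config_localversion_number() {
    sed -n 's/^CONFIG_LOCALVERSION="\.localversion\([0-9][0-9]*\)"$/\1/p' "$1"
}

# Example: name the saved copy after the number found in the config itself.
# cp -p .config ".config.EXP$(config_localversion_number .config)"
```

If the option does not follow the convention (for example, the leading dot is missing), nothing is printed, which is easier to notice than a wrong number.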
> My laptop is booted from 5.5.14-200.fc31.x86_64 from koji. > Even idle, the cpu is 80C. I don't see anything unusual in top or gkrellm, and powertop shows 100 wakeups/second and 1.1% CPU use. There could be a mechanical problem -- have you tried vacuuming the vents? > [later] I updated to 5.5.15, and it seems ok. The cpu is below 60C. I've seen bug reports about fans running continuously, so it could happen the other way. Can you monitor fan speeds with gkrellm? I have a shell script for monitoring temperatures and fan speeds. Here is a snippet: $ egrep -n 'temp1|fan1' ~/bin/mbmon3 14: TEMP1=$(cat /sys/class/hwmon/hwmon1/temp1_input) 26: FAN1=$(cat /sys/class/hwmon/hwmon2/fan1_input) > Removing the large vmlinux kernel from /boot works. > Just the vmlinuz kernel referenced in the /boot/loader/entries/ config file is enough. > The vmlinuz kernel does not need to be executable. The kernel has an odd status: it is certainly executable code, but the boot loader is the only program that starts it. > My /boot is only about 400MB, and the vmlinux file is about 100MB, so being able to remove it helps a lot. ~500MB used to be the recommended size for /boot. Now I standardize on ~1000MB. > I noticed the size difference, but as long as it boots, I didn't look into it. > I only need it running enough to boot and test the webcam video. Agreed. >>make binrpm-pkg > I can try that on the next bisection. OK. > I am making progress with the bisections. > 5.4.10 good > 5.4.11 bad > [97d9e8620f57f28f415b23ad88b97c87b6d53390] bnx2x: Do not handle requests from VFs after parity / good Excellent. That is the same commit that I got for the first bisection build, except that I called it "bad". (per "git bisect log") The "git bisect" man page has a section on how to fix "a mistake in specifying the status of a revision" by using the "git bisect replay" subcommand. > Regards, William
(In reply to William Bader from comment #94) ... > >I would make sure that the 'N' in '.localversionN' matches the 'N' in '.EXPN' > > I saved my config with cp -p .config ../config-`grep Linux .config | head -1 | awk '{print $3}'`-`grep -i CONFIG_LOCALVERSION= .config | sed -e 's/.*=".//' -e 's/"//g'` ... As a side note, I was actually doing just the opposite: .config.EXP4 -> "make nconfig" -> .config.EXP5 -> "cp" -> .config. The problem with my procedure is that I could not remember the various numbers, so I had to background "make nconfig" to check the file names and then foreground it again to save the updated .config.EXP[N+1] file. Your approach is more like doing a git commit: Change the file and then save a copy with the new version number.
Making rpms, combined with a virus stay-at-home order for Apr 9-13, helped finish the bisection. The first bad commit is 7cbdf96cda1fbffb17ec26ea65e1fe63c9aed430 usb: missing parentheses in USE_NEW_SCHEME The bad commit changes drivers/usb/core/hub.c: -#define USE_NEW_SCHEME(i, scheme) ((i) / 2 == (int)scheme) +#define USE_NEW_SCHEME(i, scheme) ((i) / 2 == (int)(scheme)) That seems to be a reasonable change, because the macro is used only once in use_new_scheme(struct usb_device *udev, int retry, struct usb_port *port_dev) return USE_NEW_SCHEME(retry, old_scheme_first_port || old_scheme_first || quick_enumeration); hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, int retry_counter) has if (use_new_scheme(udev, retry_counter, port_dev)) { hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus, u16 portchange) has for (i = 0; i < SET_CONFIG_TRIES; i++) { status = hub_port_init(hub, udev, port1, i); #define SET_CONFIG_TRIES (2 * (use_both_schemes + 1)) My guess is that either old_scheme_first or quick_enumeration is set, which makes the old USE_NEW_SCHEME true for i = 0 and the new USE_NEW_SCHEME false. Maybe the USB errors in my first comment happen on the initial try if USE_NEW_SCHEME is false, but the errors aren't hard enough to make it retry the new scheme or happen too late to be able to retry. Notes since the last comment: >Unfortunately, module compression seems to have made the package bigger: Probably the archive is compressed, and compressing compressed files doesn't usually have much gain. At least for now, the transfer is fast enough, and my system disk (with /lib/modules) is large enough, that I don't need to worry about compressing modules. My main problem was the uncompressed vmlinux filling up my /boot. The RPM does not have the vmlinux, plus installing and removing the RPM is safer and easier than typing tar and rm commands. I think you were right when you suggested that the laptop could be dusty inside.
The last time I opened it up was a few years ago to install a larger SSD, and I blew out a lot of dust. I hear the fans spinning but something from the other side sounds like a hard disk, and my laptop doesn't have a hard disk. [later] It didn't stop when I shut down my laptop. It might be a hot water pipe for a radiator on the other side of a wall. [more later] I called a serviceman because the bathroom had no hot water. The serviceman couldn't fix it, and now the heat for the radiators is broken also. As far as I know, the hardware on my laptop does not expose the fan speed, and gkrellm does not show anything under 'Fans'. On 7cbdf96cda1fbffb17ec26ea65e1fe63c9aed430 , I accidentally downloaded and installed the headers rpm instead of the kernel. It replaced the 5.5.15-200 headers, and when I tried removing kernel-headers-5.4.10.localversion8_00165_g7cbdf96cd-1.x86_64.rpm , it wanted to remove a lot of packages. I reinstalled the 5.5.15-200 headers, and then I could remove the 5.4.10 headers. I was one 'y' away from doing some damage, although 'dnf history rollback #' would probably have fixed it. After I downloaded the rpm, I cleared the rpms from rpmbuild/RPMS/x86_64/ , and when I rebuilt the rpms with 'binrpm-pkg', the header rpm came out a slightly different size. It is a little worrying that repeating 'make binrpm-pkg' on the same kernel build produces rpms with different sizes. Does it embed timestamps or logs? 
Here are my bisections: 5.4.10 good 5.4.11 bad 2 good [97d9e8620f57f28f415b23ad88b97c87b6d53390] bnx2x: Do not handle requests from VFs after parity 3 good 5.4.10.localversion3-00124-g43b0b3300-rpm 4 good [72cd84ea52407323b241571691b2426fb25c41ef] net: usb: lan78xx: fix possible skb leak / 5.4.10.localversion4-00145-g72cd84ea5 5 good [f479506e5164cb9eff4c60531bd48026dd433e4a] macb: Don't unregister clks unconditionally 6 good [caef8a716245726ede87417113db03f045fc1989] net/mlx5e: Fix hairpin RSS table size 7 good [578289f8476c3044f73ff15e138bfca555567ffe] USB: core: fix check for duplicate endpoints 8 bad [7cbdf96cda1fbffb17ec26ea65e1fe63c9aed430] usb: missing parentheses in USE_NEW_SCHEME 9 good [093d658a06cd1831c629ceeee207572895c1a872] USB: serial: option: add Telit ME910G1 0x110a composition I had to remove the installed kernels from /boot, but I saved the rpms. Here is the message from git on reaching the end: 7cbdf96cda1fbffb17ec26ea65e1fe63c9aed430 is the first bad commit commit 7cbdf96cda1fbffb17ec26ea65e1fe63c9aed430 Author: Qi Zhou <atmgnd> Date: Sat Jan 4 11:02:01 2020 +0000 usb: missing parentheses in USE_NEW_SCHEME commit 1530f6f5f5806b2abbf2a9276c0db313ae9a0e09 upstream. According to bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices") the kernel will try the old enumeration scheme first for high speed devices. This can happen when a high speed device is plugged in. But due to missing parentheses in the USE_NEW_SCHEME define, this logic can get messed up and the incorrect result happens. 
Acked-by: Alan Stern <stern.edu> Signed-off-by: Qi Zhou <atmgnd> Link: https://lore.kernel.org/r/ht4mtag8ZP-HKEhD0KkJhcFnVlOFV8N8eNjJVRD9pDkkLUNhmEo8_cL_sl7xy9mdajdH-T8J3TFQsjvoYQT61NFjQXy469Ed_BbBw_x4S1E=@protonmail.com [ fixup changelog text - gregkh] Cc: stable <stable.org> Fixes: bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices") Signed-off-by: Greg Kroah-Hartman <gregkh> drivers/usb/core/hub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
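The effect of the added parentheses can be checked without building a kernel, because shell arithmetic follows the same operator precedence as C ("==" binds before "||"). The following is only a model of the two macro expansions, assuming that each of the three flags is 0 or 1. The function names are made up for the sketch and are not kernel code.

```shell
# Old macro: ((i) / 2 == (int)scheme) with the unparenthesized argument
# "a || b || c" pasted in, so "==" is evaluated before the "||" chain.
use_new_scheme_old() {
    echo $(( $1 / 2 == $2 || $3 || $4 ))
}

# New macro: ((i) / 2 == (int)(scheme)), so the "||" chain is evaluated first.
use_new_scheme_new() {
    echo $(( $1 / 2 == ($2 || $3 || $4) ))
}

# Arguments: retry old_scheme_first_port old_scheme_first quick_enumeration
for retry in 0 1 2 3; do
    echo "retry=$retry old=$(use_new_scheme_old "$retry" 0 0 1) new=$(use_new_scheme_new "$retry" 0 0 1)"
done
```

In this model, with any one flag set, the old expansion is true on every try, while the new one is false for tries 0 and 1 and true for tries 2 and 3. With all flags clear, the two expansions agree. That matches the guess that the two kernels take different enumeration paths on the first try.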
> Making rpms and combined with a virus stay-at-home order for Apr 9-13 helped finish the bisection. > The first bad commit is 7cbdf96cda1fbffb17ec26ea65e1fe63c9aed430 usb: missing parentheses in USE_NEW_SCHEME Awesome! Please put "[BISECTED]" at the beginning of the bug summary: "[BISECTED] built-in laptop webcam no longer found on Sony Vaio on Fedora 31"
> when I tried removing kernel-headers-5.4.10.localversion8_00165_g7cbdf96cd-1.x86_64.rpm , it wanted to remove a lot of packages. The "--noautoremove" option to dnf can sometimes reduce the number of packages that dnf wants to remove. BTW, I also accidentally transferred the headers package to my F31 test VM, but I didn't install it. My mistake was in creating a link in /tmp/ that made the path passed to "scp" simpler. From my shell history: 963 ln -s `pwd`/kernel-headers-5.4.10.localversion5_00124_g43b0b3300-12.x86_64.rpm /tmp/ # WRONG 968 ln -s `pwd`/kernel-5.4.10.localversion5_00124_g43b0b3300-12.x86_64.rpm /tmp/ # RIGHT Those commands were run from "~/rpmbuild/RPMS/x86_64". I didn't install the headers package because I ran some validation checks after transferring the rpm package. From the shell history in my F31 test VM: 960 rpmkeys --checksig -v kernel-headers-5.4.10.localversion5_00124_g43b0b3300-12.x86_64.rpm 963 rpm -qi -p kernel-headers-5.4.10.localversion5_00124_g43b0b3300-12.x86_64.rpm IIRC, the description from the second command caught my attention as being wrong. Interestingly, the "headers" in "kernel-headers ..." never caught my attention. I believe that is due to the file names being so long and tab-completion being so easy.
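Since the long file names make this mistake easy, a small guard before "scp" or "dnf install" might help. This is a sketch that only looks at the file name, assuming the naming pattern produced by "make binrpm-pkg" in this bug. The function name is hypothetical.

```shell
# Hypothetical guard: succeed only for a kernel image package file name,
# i.e. "kernel-" followed directly by a digit (the version), and not for
# "kernel-headers-..." or other sub-packages.
is_kernel_image_rpm() {
    case "$(basename "$1")" in
        kernel-[0-9]*.rpm) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# is_kernel_image_rpm "$rpm" && scp "$rpm" target:/tmp/
```

It does not replace the "rpm -qi -p" check, which looks at the package metadata rather than the name.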
(In reply to Steve from comment #99) > IIRC, the description from the second command caught my attention as being wrong. It was because the "scp" transfer seemed unusually fast. That provoked me to investigate further.
> My guess is that either old_scheme_first or quick_enumeration is set, which makes the old USE_NEW_SCHEME true for i = 0 and the new USE_NEW_SCHEME false. > Maybe the USB errors in my first comment happen on the initial try if USE_NEW_SCHEME is false, but the errors aren't hard enough to make it retry the new scheme or happen too late to be able to retry. You could be right about that. The whole approach may need to be rethought. But guessing in software about how long some otherwise unknown piece of hardware should take to do something is an insoluble problem. I believe that some hardware can export expected timings for various commands -- but first you have to talk to the hardware. Anyway, you have enough to open an upstream bug if you want to: https://bugzilla.kernel.org/
(In reply to Steve from comment #101) > Anyway, you have enough to open an upstream bug if you want to: > > https://bugzilla.kernel.org/ I guess you have to register to open a new bug. A search of bugs under Drivers/USB found some instructions for collecting debug info: https://bugzilla.kernel.org/show_bug.cgi?id=203419#c21 I'm not sure if they are exactly applicable when the problem occurs while booting and the device is built in, but some of those options could probably be put on the kernel command-line. You probably don't need to do the first step because: $ mount -t debugfs debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime,seclabel) Tested in an F31 VM with: $ uname -r 5.5.16-200.fc31.x86_64
Thanks for updating the bug summary with "[BISECTED]". (In reply to Steve from comment #102) ... > You probably don't need to do the first step because: > > $ mount -t debugfs > debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime,seclabel) ... No problem: $ grep CONFIG_DYNAMIC_DEBUG .config CONFIG_DYNAMIC_DEBUG=y Dynamic debug https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html See, in particular, the section titled "Debug messages during Boot Process" and the examples at the end under "Kernel command line:". To set up "dyndbg" from the kernel command-line, this appears to work (Mathias adds some additional options that I haven't worked out yet): $ grep '^GRUB_CMDLINE_LINUX' /etc/default/grub GRUB_CMDLINE_LINUX="dyndbg='module xhci_hcd =p ; module usbcore =p'" Although this looks wrong (the first quote should be after "dyndbg=".): $ cat /proc/cmdline BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.5.16-200.fc31.x86_64 root=UUID=54f79645-f858-46e0-af7a-97aecc88ff87 ro "dyndbg=module xhci_hcd =p ; module usbcore =p" Verify with: # egrep 'xhci_hcd|usbcore' /sys/kernel/debug/dynamic_debug/control | less # Look for "=p" in the third field.
(In reply to Steve from comment #103) ... $ grep '^GRUB_CMDLINE_LINUX' /etc/default/grub GRUB_CMDLINE_LINUX="dyndbg='module xhci_hcd =p ; module usbcore =p'" ... I repeatedly used the following process as I worked out that kernel command-line: $ sudo su Edit GRUB_CMDLINE_LINUX in /etc/default/grub. # cd /boot/grub2 # grub2-mkconfig -o grub.cfg # Rebuild grub.cfg. (Make a backup before starting.) # grep kernelopts grub.cfg # See what grub2-mkconfig actually generated. # reboot Press "e" when the grub2 menu is displayed and verify that the kernel command-line looks correct. I had various problems with nested quotes and what appeared to be incorrect interpretation of the ";" in the command-line. Press "ctrl-x" to boot. Login as usual, start a terminal session, and see what is actually on the kernel command-line: $ cat /proc/cmdline # As noted in Comment 103, the quotes end up like this "dyndbg=xxx" instead of like this dyndbg="xxx". ^ ^ ^ ^ $ sudo su # egrep 'xhci_hcd|usbcore' /sys/kernel/debug/dynamic_debug/control | less # Look for "=p" in the third field.
Thanks for the reply. >CONFIG_DYNAMIC_DEBUG=y That was already set in my .config, which came from a Fedora config. I would have expected that it would be off by default for performance reasons. >grep '^GRUB_CMDLINE_LINUX' /etc/default/grub Is there a way to set it other than /etc/default/grub ? I am worried that if I mess up that file, I could leave my laptop unbootable.
(In reply to William Bader from comment #105) > Thanks for the reply. > > >CONFIG_DYNAMIC_DEBUG=y > > That was already set in my .config, which came from a Fedora config. I would have expected that it would be off by default for performance reasons. Good point. Although, based on previous observations, the packaged config file does not match what is in the actual kernel. However, this much is certain: $ mount -t debugfs debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime,seclabel) > >grep '^GRUB_CMDLINE_LINUX' /etc/default/grub > > Is there a way to set it other than /etc/default/grub ? I am worried that if I mess up that file, I could leave my laptop unbootable. You could type it all in by hand and "mess" that up instead. :-) If the command-line were simple, typing it in by hand would definitely be the best way to change the kernel options. As far as making your system unbootable, that is possible, but the kernel is very lenient about bogus command-line options -- it ignores them and doesn't even tell you. :-) And grub2 is helpful too -- it will drop you into its rescue shell. See below. However, there are two recovery strategies: 1. Boot from the F31 Live image, mount /boot, and replace the bad grub.cfg with the backup grub.cfg. 2. Use the grub2 "normal" command. (NB: That works great but it would be a good idea to read the grub2 docs first and do a few practice runs before having to use it "in extremis". That is a perfect application for a VM -- practice breaking things and then fixing them. :-))
Further testing of the dyndbg configuration. NB: This is in a VM, so it may not be entirely realistic. # tail -f /sys/kernel/debug/dynamic_debug/control Plug in a known-good USB flash drive (The one used for this test has a Fedora 31 Live image on it). There is no change in the "tail -f" display. So "ctrl-c" "tail -f" and run: # tail -14 /sys/kernel/debug/dynamic_debug/control net/netfilter/xt_MASQUERADE.c:28 [xt_MASQUERADE]masquerade_tg_check =_ "bad rangesize %u\012" net/netfilter/xt_MASQUERADE.c:24 [xt_MASQUERADE]masquerade_tg_check =_ "bad MAP_IPS.\012" drivers/usb/storage/usb.c:1127 [usb_storage]storage_probe =_ "Use Bulk-Only transport with the Transparent SCSI protocol for dynamic id: 0x%04x 0x%04x\012" drivers/usb/storage/usb.c:1064 [usb_storage]usb_stor_probe2 =_ "waiting for device to settle before scanning\012" drivers/usb/storage/usb.c:914 [usb_storage]usb_stor_scan_dwork =_ "scan complete\012" drivers/usb/storage/usb.c:896 [usb_storage]usb_stor_scan_dwork =_ "starting scan\012" drivers/usb/storage/sierra_ms.c:110 [usb_storage]truinst_show =_ "SWIMS: failed SWoC query\012" drivers/usb/storage/sierra_ms.c:89 [usb_storage]debug_swoc =_ "SWIMS: Linux Version: %04X\012" drivers/usb/storage/sierra_ms.c:88 [usb_storage]debug_swoc =_ "SWIMS: Linux SKU: %04X\012" drivers/usb/storage/sierra_ms.c:87 [usb_storage]debug_swoc =_ "SWIMS: SWoC Rev: %02d\012" drivers/usb/storage/sierra_ms.c:69 [usb_storage]sierra_get_swoc_info =_ "SWIMS: Attempting to get TRU-Install info\012" drivers/usb/storage/sierra_ms.c:51 [usb_storage]sierra_set_ms_mode =_ "SWIMS: %s" drivers/usb/storage/uas.c:126 [uas]uas_scan_work =_ "scan complete\012" drivers/usb/storage/uas.c:124 [uas]uas_scan_work =_ "starting scan\012" None of those are for "usbcore", so get nasty: # grep 'usbcore' /sys/kernel/debug/dynamic_debug/control | wc -l 152 # grep -A8 -n 'deviceremovable' /sys/kernel/debug/dynamic_debug/control 1322:drivers/usb/core/hub.c:6073 [usbcore]usb_hub_adjust_deviceremovable =p 
"DeviceRemovable is changed to 1 according to platform information.\012" 1323:drivers/usb/core/hub.c:6057 [usbcore]usb_hub_adjust_deviceremovable =p "DeviceRemovable is changed to 1 according to platform information.\012" 1324-drivers/usb/core/hub.c:5906 [usbcore]usb_reset_device =p "%s for root hub!\012" 1325-drivers/usb/core/hub.c:5899 [usbcore]usb_reset_device =p "device reset not allowed in state %d\012" 1326-drivers/usb/core/hub.c:5736 [usbcore]usb_reset_and_verify_device =p "device reset not allowed in state %d\012" 1327-drivers/usb/core/hub.c:5595 [usbcore]hub_event =p "over-current change\012" 1328-drivers/usb/core/hub.c:5583 [usbcore]hub_event =p "power change\012" 1329-drivers/usb/core/hub.c:5543 [usbcore]hub_event =p "error resetting hub: %d\012" 1330-drivers/usb/core/hub.c:5539 [usbcore]hub_event =p "resetting for error %d\012" 1331-drivers/usb/core/hub.c:5530 [usbcore]hub_event =p "Can't autoresume: %d\012" There is more than that, but my main observation is that there is no clear indication as to which device those messages apply to. Notes: 1. qemu automatically forwards USB device detection from the host to the VM, which is a VERY nice feature, but it may not reflect the behavior of bare metal. 2. "14" and "8" are empirically determined numbers for demonstrative purposes only.
On second thought, it might not be so bad to type this in: dyndbg="module xhci_hcd =p ; module usbcore =p" Disclaimer: I have not tested that by hand-typing it. NB: The "=" sign is a flag change operator, of which there are three: "-", "+", "=". The doc explains what they do in the section that starts: "The flags specification comprises a change operation followed by one or more flag characters." Dynamic debug https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html
OK, NOW I tested: :-) dyndbg="module usbcore =p" As before, the quote ends up in the wrong place: $ cat /proc/cmdline BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.5.16-200.fc31.x86_64 root=UUID=54f79645-f858-46e0-af7a-97aecc88ff87 ro "dyndbg=module usbcore =p" However, the "=p" here means that it worked: # grep -m1 'usbcore' /sys/kernel/debug/dynamic_debug/control drivers/usb/core/hub.c:6073 [usbcore]usb_hub_adjust_deviceremovable =p "DeviceRemovable is changed to 1 according to platform information.\012"
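To avoid paging through the control file, the check can be reduced to a count. This is a sketch that assumes the line layout shown above (file:line, then "[module]function", then the flags field). The function name is invented, and the optional second argument exists only so that it can be tried on a saved copy of the file.

```shell
# Hypothetical helper: count the call sites of one module that have the "p"
# flag set in a dynamic_debug control listing.
# Usage: dyndbg_enabled_count MODULE [CONTROL_FILE]
dyndbg_enabled_count() {
    awk -v mod="[$1]" '
        # Field 2 starts with "[module]"; field 3 is the flags, e.g. "=p" or "=_".
        index($2, mod) == 1 && $3 ~ /^=.*p/ { n++ }
        END { print n + 0 }
    ' "${2:-/sys/kernel/debug/dynamic_debug/control}"
}

# Example (as root): dyndbg_enabled_count usbcore
```

A result of 0 would suggest that the "dyndbg" option on the kernel command-line was not applied to that module.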
(In reply to Steve from comment #106) ... > 2. Use the grub2 "normal" command. (NB: That works great but it would be a good idea to read the grub2 docs first and do a few practice runs before having to use it "in extremis". > That is a perfect application for a VM -- practice breaking things and then fixing them. :-)) Getting out of this "fix" is a perfect exercise in recovering from a grub2 boot failure. I made the following changes *intentionally*, but it was still a bit alarming to get the grub2 rescue prompt: # diff -u0 grub.cfg.BAK2 grub.cfg.EXP-UNBOOTABLE-1 ... @@ -128,2 +128,2 @@ -insmod blscfg -blscfg +#insmod blscfg +#blscfg In a VM, however ...
I just noticed that Mathias posted a patch *yesterday*: testpatch that doesn't clear TT buffer after protocol STALL https://bugzilla.kernel.org/show_bug.cgi?id=203419#c27 It applies to two files: $ grep diff 0001-xhci-testpatch-Don-t-clear-TT-buffer-on-ep0-protocol-1.patch diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c There is a git command to apply the patch: $ git apply --help And there is an option to "--reverse" the patch.
I'm doing a build now with the patch from Mathias: Go back to 5.4.11: $ git bisect reset Previous HEAD position was 43b0b3300 arm64: cpu_errata: Add Hisilicon TSV110 to spectre-v2 safe list HEAD is now at 9d61432ef Linux 5.4.11 Check that the patch applies cleanly to 5.4.11: $ git apply --check -v 0001-xhci-testpatch-Don-t-clear-TT-buffer-on-ep0-protocol-1.patch Apply the patch: $ git apply -v 0001-xhci-testpatch-Don-t-clear-TT-buffer-on-ep0-protocol-1.patch Verify that git knows that the code has been modified: $ git status HEAD detached at 9d61432ef Changes not staged for commit: (use "git add <file>..." to update what will be committed) (use "git checkout -- <file>..." to discard changes in working directory) modified: drivers/usb/host/xhci-ring.c modified: drivers/usb/host/xhci.c ... $ time make ...
(In reply to Steve from comment #112) ... > $ git apply -v 0001-xhci-testpatch-Don-t-clear-TT-buffer-on-ep0-protocol-1.patch ... There is a typo in that patch that causes a build failure, so a second patch is needed to patch the first patch: $ git apply -v 0002-Fix-a-typo-in-prevoius-patch.patch That second patch is linked from Vincenzo's comment: https://bugzilla.kernel.org/show_bug.cgi?id=203419#c28 After building the rpm, we are informed that our build is "dirty", presumably because the patch created uncommitted files: $ ls -1sh kernel-5.4.11.localversion6_dirty-14.x86_64.rpm 75M kernel-5.4.11.localversion6_dirty-14.x86_64.rpm After booting and enabling all of Mathias's debug settings*, insert and remove USB devices. Or see if the video works. :-) Mathias's debug messages should be visible: # grep 'TT' /sys/kernel/debug/dynamic_debug/control # grep 'xhci_hcd' /sys/kernel/debug/dynamic_debug/control | less Tested in a VM and on bare metal (a laptop). * From https://bugzilla.kernel.org/show_bug.cgi?id=203419#c21. (Actually, the "usbcore" logging doesn't seem to be needed.)
> As far as I know, the hardware on my laptop does not expose the fan speed, and gkrellm does not show anything under 'Fans'. The "lm_sensors" package has a "sensors-detect" command that can search for hardware monitoring chips and then configure the appropriate kernel modules to be loaded. Indeed, there is a whole "hwmon" directory for such kernel modules: $ find /lib/modules/`uname -r`/kernel/drivers/hwmon | sort This will find info supplied by modules that have already been loaded: $ find -L /sys/class/hwmon -maxdepth 2 2>/dev/null | xargs grep -s '' | sort | egrep 'temp|fan|label|name'
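If the raw "find | grep" output is hard to read, the same files can be summarized. This is a rough sketch, not what "sensors" does internally: it assumes the usual hwmon layout (hwmonN/name, tempN_input, tempN_label) and relies on the fact that sysfs reports temperatures in millidegrees Celsius. The function name hwmon_temps is made up for this example.

```shell
# Hypothetical helper: print every readable temperature under a hwmon tree.
# usage: hwmon_temps [ROOT]     (ROOT defaults to /sys/class/hwmon)
hwmon_temps() {
    root=${1:-/sys/class/hwmon}
    for f in "$root"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue             # also skips an unmatched glob
        dir=${f%/*}
        name=$(cat "$dir/name" 2>/dev/null)
        label=$(cat "${f%_input}_label" 2>/dev/null)
        milli=$(cat "$f")
        # Fall back to "?" or the file name when name/label are missing.
        awk -v n="${name:-?}" -v l="${label:-${f##*/}}" -v m="$milli" \
            'BEGIN { printf "%s %s: %.1f C\n", n, l, m / 1000 }'
    done
}
```

Run with no arguments, it reads the live /sys/class/hwmon tree; passing a directory makes it easy to try on a copy.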
>grub.cfg I didn't do this today because I have been up all night the last few nights doing bisections, and I don't want to do it when I am tired and risk making a mistake. >I'm doing a build now with the patch from Mathias: Since the 5.5.15 kernel has the problem, and since the patch is only a few days old, would it be better if I tried applying the patch to that kernel? If it fixes the problem on the old kernel, I would still have to retest it on the current kernel, so trying the new kernel first could save a step. >After booting and enabling all of Mathias's debug settings, insert and remove USB devices. Or see if the video works. :-) The webcam is integrated with the laptop. Wouldn't I need to enable the debug settings in /etc/default/grub? It is currently GRUB_TIMEOUT=5 GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)" GRUB_DEFAULT=saved GRUB_DISABLE_SUBMENU=true GRUB_TERMINAL_OUTPUT="console" GRUB_CMDLINE_LINUX="rhgb quiet elevator=noop LANG=en_US.UTF-8 mitigations=off" GRUB_DISABLE_RECOVERY="true" GRUB_ENABLE_BLSCFG=true Did you want me to change GRUB_CMDLINE_LINUX to GRUB_CMDLINE_LINUX="rhgb quiet elevator=noop LANG=en_US.UTF-8 mitigations=off dyndbg='module xhci_hcd =p ; module usbcore =p'" Do I still have an option to boot and edit the command line at boot time and add dyndbg='module xhci_hcd =p ; module usbcore =p' ? You said before that I could press 'e' from the grub menu to edit the command line. I think that it is safer because if I type it wrong, hopefully the worst that can happen is that it won't boot, and I can just power off and reboot. I have in my notes that after I added mitigations=off to /etc/default/grub, I ran "grub2-mkconfig -o /boot/grub2/grub.cfg". I still have the tar of /boot I made on Apr 6. >The "lm_sensors" package has a "sensors-detect" command I have a log below. Should I try YES for any of the risky probes? Is the main risk of the probe a crash or could it also do damage or write on the SSD? 
---------- $ sudo /usr/sbin/sensors-detect # sensors-detect revision $Revision$ <-- wrong RCS option to do the build? # System: Sony Corporation VPCCB4Q1E [C60A58AK] (laptop) # Board: Sony Corporation VAIO # Kernel: 5.5.15-200.fc31.x86_64 x86_64 # Processor: Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (6/42/7) This program will help you determine which kernel modules you need to load to use lm_sensors most effectively. It is generally safe and recommended to accept the default answers to all questions, unless you know what you're doing. Some south bridges, CPUs or memory controllers contain embedded sensors. Do you want to scan for them? This is totally safe. (YES/no): YES Silicon Integrated Systems SIS5595... No VIA VT82C686 Integrated Sensors... No VIA VT8231 Integrated Sensors... No AMD K8 thermal sensors... No AMD Family 10h thermal sensors... No AMD Family 11h thermal sensors... No AMD Family 12h and 14h thermal sensors... No AMD Family 15h thermal sensors... No AMD Family 16h thermal sensors... No AMD Family 17h thermal sensors... No AMD Family 15h power sensors... No AMD Family 16h power sensors... No Intel digital thermal sensor... Success! (driver `coretemp') Intel AMB FB-DIMM thermal sensor... No Intel 5500/5520/X58 thermal sensor... No VIA C7 thermal sensor... No VIA Nano thermal sensor... No Some Super I/O chips contain embedded sensors. We have to write to standard I/O ports to probe them. This is usually safe. Do you want to scan for Super I/O sensors? (YES/no): no Some hardware monitoring chips are accessible through the ISA I/O ports. We have to write to arbitrary I/O ports to probe them. This is usually safe though. Yes, you do have ISA I/O ports even if you do not have any ISA slots! Do you want to scan the ISA I/O ports? (YES/no): no Lastly, we can probe the I2C/SMBus adapters for connected hardware monitoring devices. 
This is the most risky part, and while it works reasonably well on most systems, it has been reported to cause trouble on some systems. Do you want to probe the I2C/SMBus adapters now? (YES/no): no Now follows a summary of the probes I have just done. Just press ENTER to continue: Driver `coretemp': * Chip `Intel digital thermal sensor' (confidence: 9) Do you want to overwrite /etc/sysconfig/lm_sensors? (YES/no): no To load everything that is needed, add this to one of the system initialization scripts (e.g. /etc/rc.d/rc.local): #----cut here---- # Chip drivers modprobe coretemp /usr/bin/sensors -s #----cut here---- You really should try these commands right now to make sure everything is working properly. Monitoring programs won't work until the needed modules are loaded. ---------- I did "modprobe coretemp" and "/usr/bin/sensors -s" manually. $ find -L /sys/class/hwmon -maxdepth 2 2>/dev/null | xargs grep -s '' | sort | egrep 'temp|fan|label|name' /sys/class/hwmon/hwmon0/name:ADP1 /sys/class/hwmon/hwmon0/temp1_label:temp ambient /sys/class/hwmon/hwmon0/temp2_label:temp /sys/class/hwmon/hwmon1/name:acpitz /sys/class/hwmon/hwmon1/temp1_crit:96000 /sys/class/hwmon/hwmon1/temp1_input:58000 /sys/class/hwmon/hwmon1/temp2_crit:96000 /sys/class/hwmon/hwmon1/temp2_input:58000 /sys/class/hwmon/hwmon2/name:BAT0 /sys/class/hwmon/hwmon2/temp1_label:temp ambient /sys/class/hwmon/hwmon2/temp2_label:temp /sys/class/hwmon/hwmon3/name:radeon /sys/class/hwmon/hwmon3/temp1_crit:120000 /sys/class/hwmon/hwmon3/temp1_crit_hyst:90000 /sys/class/hwmon/hwmon4/name:coretemp /sys/class/hwmon/hwmon4/temp1_crit:100000 /sys/class/hwmon/hwmon4/temp1_crit_alarm:0 /sys/class/hwmon/hwmon4/temp1_input:60000 /sys/class/hwmon/hwmon4/temp1_label:Package id 0 /sys/class/hwmon/hwmon4/temp1_max:86000 /sys/class/hwmon/hwmon4/temp2_crit:100000 /sys/class/hwmon/hwmon4/temp2_crit_alarm:0 /sys/class/hwmon/hwmon4/temp2_input:59000 /sys/class/hwmon/hwmon4/temp2_label:Core 0 
/sys/class/hwmon/hwmon4/temp2_max:86000 /sys/class/hwmon/hwmon4/temp3_crit:100000 /sys/class/hwmon/hwmon4/temp3_crit_alarm:0 /sys/class/hwmon/hwmon4/temp3_input:59000 /sys/class/hwmon/hwmon4/temp3_label:Core 1 /sys/class/hwmon/hwmon4/temp3_max:86000 hwmon has a lot of files: $ sudo find /lib/modules/`uname -r`/kernel/drivers/hwmon | wc -l 162 None of them say 'fan'. Is it aspeed-pwm-tacho? $ lc -R /lib/modules/5.5.15-200.fc31.x86_64/kernel/drivers/hwmon/ /lib/modules/5.5.15-200.fc31.x86_64/kernel/drivers/hwmon/: abituguru.ko.xz adt7462.ko.xz f71882fg.ko.xz k10temp.ko.xz ltc2945.ko.xz max6642.ko.xz sht15.ko.xz vt1211.ko.xz abituguru3.ko.xz adt7470.ko.xz f75375s.ko.xz k8temp.ko.xz ltc2947-core.ko.xz max6650.ko.xz sht21.ko.xz vt8231.ko.xz acpi_power_meter.ko.xz adt7475.ko.xz fam15h_power.ko.xz lineage-pem.ko.xz ltc2947-i2c.ko.xz max6697.ko.xz sht3x.ko.xz w83627ehf.ko.xz ad7314.ko.xz adt7x10.ko.xz fschmd.ko.xz lm63.ko.xz ltc2947-spi.ko.xz mcp3021.ko.xz shtc1.ko.xz w83627hf.ko.xz ad7414.ko.xz amc6821.ko.xz ftsteutates.ko.xz lm70.ko.xz ltc2990.ko.xz mlxreg-fan.ko.xz sis5595.ko.xz w83773g.ko.xz ad7418.ko.xz applesmc.ko.xz g760a.ko.xz lm73.ko.xz ltc4151.ko.xz nct6683.ko.xz smsc47b397.ko.xz w83781d.ko.xz adc128d818.ko.xz asb100.ko.xz g762.ko.xz lm75.ko.xz ltc4215.ko.xz nct6775.ko.xz smsc47m1.ko.xz w83791d.ko.xz adcxx.ko.xz asc7621.ko.xz gl518sm.ko.xz lm77.ko.xz ltc4222.ko.xz nct7802.ko.xz smsc47m192.ko.xz w83792d.ko.xz adm1021.ko.xz aspeed-pwm-tacho.ko.xz gl520sm.ko.xz lm78.ko.xz ltc4245.ko.xz nct7904.ko.xz tc654.ko.xz w83793.ko.xz adm1025.ko.xz asus_atk0110.ko.xz hwmon-vid.ko.xz lm80.ko.xz ltc4260.ko.xz npcm750-pwm-fan.ko.xz tc74.ko.xz w83795.ko.xz adm1026.ko.xz atxp1.ko.xz i5500_temp.ko.xz lm83.ko.xz ltc4261.ko.xz ntc_thermistor.ko.xz thmc50.ko.xz w83l785ts.ko.xz adm1029.ko.xz coretemp.ko.xz i5k_amb.ko.xz lm85.ko.xz max1111.ko.xz pc87360.ko.xz tmp102.ko.xz w83l786ng.ko.xz adm1031.ko.xz dell-smm-hwmon.ko.xz ibmaem.ko.xz lm87.ko.xz max16065.ko.xz pc87427.ko.xz tmp103.ko.xz 
adm9240.ko.xz dme1737.ko.xz ibmpex.ko.xz lm90.ko.xz max1619.ko.xz pcf8591.ko.xz tmp108.ko.xz ads7828.ko.xz ds1621.ko.xz ina209.ko.xz lm92.ko.xz max1668.ko.xz pmbus tmp401.ko.xz ads7871.ko.xz ds620.ko.xz ina2xx.ko.xz lm93.ko.xz max197.ko.xz powr1220.ko.xz tmp421.ko.xz adt7310.ko.xz emc1403.ko.xz ina3221.ko.xz lm95234.ko.xz max31722.ko.xz sch5627.ko.xz tmp513.ko.xz adt7410.ko.xz emc6w201.ko.xz it87.ko.xz lm95241.ko.xz max31790.ko.xz sch5636.ko.xz via-cputemp.ko.xz adt7411.ko.xz f71805f.ko.xz jc42.ko.xz lm95245.ko.xz max6639.ko.xz sch56xx-common.ko.xz via686a.ko.xz /lib/modules/5.5.15-200.fc31.x86_64/kernel/drivers/hwmon/pmbus: adm1275.ko.xz lm25066.ko.xz ltc3815.ko.xz max20751.ko.xz max8688.ko.xz pmbus_core.ko.xz tps53679.ko.xz ucd9200.ko.xz bel-pfe.ko.xz ltc2978.ko.xz max16064.ko.xz max34440.ko.xz pmbus.ko.xz tps40422.ko.xz ucd9000.ko.xz zl6100.ko.xz
>>I'm doing a build now with the patch from Mathias: > Since the 5.5.15 kernel has the problem, and since the patch is only a few days old, would it be better if I tried applying the patch to that kernel? I looked at the upstream bug reports, and it wasn't clear what kernel version they were using to test the patch, so I did what was easiest: $ git bisect reset And then applied the two patches, ran "make clean", rebuilt, and tested. > If it fixes the problem on the old kernel, I would still have to retest it on the current kernel, so trying the new kernel first could save a step. You are on the upstream 5.4.y branch, which is up to 5.4.31 (per kernel.org). To test against 5.5.15, you would need to clone the 5.5.y branch. You would also need to update the ".config" file. Anyway, I would go with testing against 5.4.11, since you already know it is "bad". Doing that takes the incremental approach -- make one change at a time. Clarification: You don't need to make any of the dyndbg kernel command-line changes to test the patch. Just test the same way you tested when doing the bisection. I apologize for not making that clear. And please take your time. You already did a great job getting the VM set up and doing the bisection, which confirms that the problem is indeed in the USB code. NB: I don't know for certain that the patch will fix the problem you are seeing. Since the patch applies to USB code, and it fixes problems with some other USB devices (per upstream bug reports), it seems like the patch is worth testing. Specifically, the patch applies to these two files in the USB code: $ grep diff 0001-xhci-testpatch-Don-t-clear-TT-buffer-on-ep0-protocol-1.patch diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
(In reply to Steve from comment #116) > ... it fixes problems with some other USB devices (per upstream bug reports) ... Important note: The patch applies to ALL USB devices, it is not device-specific. Indeed, Alan explicitly refers to "the USB-2.0 specification" here: https://bugzilla.kernel.org/show_bug.cgi?id=207065#c6 NB: Mathias posted his patch in TWO bug reports: This one, which was opened 2020-04-02: Bug_207065 - C-media USB audio device stops working from 5.2.0-rc3 onwards https://bugzilla.kernel.org/show_bug.cgi?id=207065 And this one, which was opened 2019-04-25: Bug_203419 - Logitech Group USB audio stopped working in 5.1-rc6 https://bugzilla.kernel.org/show_bug.cgi?id=203419
>>The "lm_sensors" package has a "sensors-detect" command >I have a log below. Should I try YES for any of the risky probes? I don't know, although I answered "yes" to everything when I ran "sensors-detect" on my systems (desktop, laptop). >Is the main risk of the probe a crash or could it also do damage or write on the SSD? According to the "sensors-detect" man page: "sensors-detect needs to access the hardware for most of the chip detections. By definition, it doesn't know which chips are there before it manages to identify them. This means that it can access chips in a way these chips do not like, causing problems ranging from SMBus lockup to permanent hardware damage (a rare case, thankfully.)" I don't think "sensors-detect" would access the SSD, because the SSD is a SATA device on a SATA controller on the PCI bus, which wouldn't be a likely candidate for a sensors device: $ lspci | grep SATA In the past, after I ran "memtest86+", I discovered that my BIOS settings were slightly changed. There is a separate program for monitoring HDD temperatures, and it uses SMART commands: $ rpm -qi hddtemp >I did "modprobe coretemp" and "/usr/bin/sensors -s" manually. If you run "sensors" without any arguments, it should report some sensors data, depending on what kernel modules are loaded. On my desktop system: $ sensors nct6791-isa-0290 Adapter: ISA adapter Vcore: +0.87 V (min = +0.00 V, max = +1.74 V) in1: +1.01 V (min = +0.00 V, max = +0.00 V) ALARM ... coretemp-isa-0000 Adapter: ISA adapter Package id 0: +29.0°C (high = +80.0°C, crit = +100.0°C) Core 0: +28.0°C (high = +80.0°C, crit = +100.0°C) Core 1: +27.0°C (high = +80.0°C, crit = +100.0°C) NB: I posted the "find" command so you could see where programs actually get sensors data. >hwmon has a lot of files [that are kernel modules]: There are a lot of sensors devices out there, so there are a lot of kernel modules to support them. :-) >None of them say 'fan'. Is it aspeed-pwm-tacho? 
Unless you know exactly what sensors chips are in your system, you are better off running "sensors-detect". However, if you want to know more about a specific kernel module: $ modinfo aspeed-pwm-tacho
Sometimes there are vendor-specific kernel modules, but they don't always work too well. This found some modules: $ find /lib/modules/`uname -r`/ -name \*sony\* Some of them might already be loaded: $ lsmod | grep sony
> Wouldn't I need to enable the debug settings in /etc/default/grub? Yes, to enable dyndbg logging for the camera while booting. However, dyndbg logging can be *tested* after booting by running the commands given by Mathias in a terminal window, then running this in a full-screen terminal window: $ journalctl --no-hostname -k -f And then inserting and removing a USB device. That won't test your camera, but it will ensure that dyndbg is correctly enabled. > It is currently > ... > GRUB_CMDLINE_LINUX="rhgb quiet elevator=noop LANG=en_US.UTF-8 mitigations=off" > ... > Did you want me to change GRUB_CMDLINE_LINUX to > GRUB_CMDLINE_LINUX="rhgb quiet elevator=noop LANG=en_US.UTF-8 mitigations=off dyndbg='module xhci_hcd =p ; module usbcore =p'" Yes, except that I would remove "rhgb quiet". Those options just make debugging harder. I ALWAYS remove "rhgb quiet" after installing a new system. What I usually do is make a "back-up" copy of the line I am about to modify and put a "#" in front so that it becomes a comment: #GRUB_CMDLINE_LINUX="list of options" <<<< This is the "back-up" copy preserved as a comment. GRUB_CMDLINE_LINUX="modified list of options" > Do I still have an option to boot and edit the command line at boot time and add dyndbg='module xhci_hcd =p ; module usbcore =p' ? Yes, and you can remove them if they are already there. > You said before that I could press 'e' from the grub menu to edit the command line. I think that it is safer because if I type it wrong, hopefully the worst that can happen is that it won't boot, and I can just power off and reboot. OK, and you can check your typing with: $ cat /proc/cmdline The main concern is reliable reproducibility during testing. > I have in my notes that after I added mitigations=off to /etc/default/grub, I ran "grub2-mkconfig -o /boot/grub2/grub.cfg". > I still have the tar of /boot I made on Apr 6. If you modify /etc/default/grub, you need to run grub2-mkconfig to update grub.cfg. It's a cumbersome process, admittedly.
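The "keep the old line as a comment" habit described above can be scripted. This is only a sketch: the function name set_grub_cmdline is made up, it assumes a single GRUB_CMDLINE_LINUX= line in the file, and it is safest to try it on a copy of /etc/default/grub first.

```shell
# Hypothetical helper: comment out the current GRUB_CMDLINE_LINUX line and
# add a new one right below it. A copy of the file is kept as FILE.bak.
# usage: set_grub_cmdline FILE "new list of options"
set_grub_cmdline() {
    file=$1
    new=$2
    cp -p "$file" "$file.bak" || return 1
    awk -v new="$new" '
        /^GRUB_CMDLINE_LINUX=/ {
            print "#" $0                               # the "back-up" copy
            print "GRUB_CMDLINE_LINUX=\"" new "\""     # the modified line
            next
        }
        { print }
    ' "$file.bak" > "$file"
}

# After changing the real /etc/default/grub, grub.cfg still has to be
# regenerated, as noted above:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

Checking the result with "grep GRUB_CMDLINE_LINUX FILE" before running grub2-mkconfig catches typing mistakes early.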
Created attachment 1678085 [details] shell script to enable dyndbg for USB testing (commands are from Mathias Nyman) # Enable dyndbg logging for USB testing. # # This command must be run as root. # # Usage: # # sudo dyndbg-1.sh # # < connect the USB device > # # Attach to the bug report the output from: # # journalctl --no-hostname -k > /tmp/dmesg-1.txt # sudo cat /sys/kernel/debug/tracing/trace > /tmp/trace-1.txt
(In reply to Steve from comment #121) > Created attachment 1678085 [details] > shell script to enable dyndbg for USB testing Those commands are from: Mathias Nyman 2020-04-02 13:34:44 UTC https://bugzilla.kernel.org/show_bug.cgi?id=203419#c21
Created attachment 1678098 [details] shell script to enable dyndbg for USB testing (commands are from Mathias Nyman) # Enable dyndbg logging for USB testing. # # Commands are from Mathias Nyman # https://bugzilla.kernel.org/show_bug.cgi?id=203419#c21 # # Usage: # # sudo dyndbg-1.sh # # < connect the USB device > # # Attach to the bug report the output from: # # journalctl --no-hostname -k > /tmp/dmesg-1.txt # sudo cat /sys/kernel/debug/tracing/trace > /tmp/trace-1.txt
I completed a test of enabling dyndbg from the kernel command-line on *bare metal* (a laptop) with the built-in web cam enabled. I have this in /etc/default/grub: $ grep '^GRUB_CMDLINE_LINUX' /etc/default/grub GRUB_CMDLINE_LINUX="dyndbg='module xhci_hcd =p ; module usbcore =p'" After running grub2-mkconfig and rebooting, the kernel command-line shows: $ cat /proc/cmdline BOOT_IMAGE=(hd1,msdos6)/vmlinuz-5.5.16-100.fc30.x86_64 root=/dev/mapper/[removed] ro "dyndbg=module xhci_hcd =p ; module usbcore =p" And this is what is in the log for the built-in web cam: $ journalctl --no-hostname -k | grep -v audit | grep -C1 -n 'usb 1-1.2' > /tmp/dmesg-usb-dyndbg-1.txt 871-Apr 11 06:24:29 kernel: usb 2-1-port2: status 0101, change 0000, 12 Mb/s 872:Apr 11 06:24:29 kernel: usb 1-1.2: new high-speed USB device number 3 using ehci-pci 873-Apr 11 06:24:29 kernel: random: systemd: uninitialized urandom read (16 bytes read) -- 882-Apr 11 06:24:29 kernel: usb 2-1.2: new high-speed USB device number 3 using ehci-pci 883:Apr 11 06:24:29 kernel: usb 1-1.2: skipped 1 descriptor after configuration 884:Apr 11 06:24:29 kernel: usb 1-1.2: skipped 6 descriptors after interface 885:Apr 11 06:24:29 kernel: usb 1-1.2: skipped 1 descriptor after endpoint 886:Apr 11 06:24:29 kernel: usb 1-1.2: skipped 10 descriptors after interface 887:Apr 11 06:24:29 kernel: usb 1-1.2: default language 0x0409 888:Apr 11 06:24:29 kernel: usb 1-1.2: udev 3, busnum 1, minor = 2 889:Apr 11 06:24:29 kernel: usb 1-1.2: New USB device found, idVendor=13d3, idProduct=5710, bcdDevice=11.30 890:Apr 11 06:24:29 kernel: usb 1-1.2: New USB device strings: Mfr=3, Product=1, SerialNumber=2 891:Apr 11 06:24:29 kernel: usb 1-1.2: Product: USB 2.0 UVC VGA WebCam 892:Apr 11 06:24:29 kernel: usb 1-1.2: Manufacturer: Azurewave 893:Apr 11 06:24:29 kernel: usb 1-1.2: SerialNumber: 0x0001 894:Apr 11 06:24:29 kernel: usb 1-1.2: usb_probe_device 895:Apr 11 06:24:29 kernel: usb 1-1.2: configuration #1 chosen from 1 choice 896:Apr 11 
06:24:29 kernel: usb 1-1.2: adding 1-1.2:1.0 (config #1, interface 0) 897:Apr 11 06:24:29 kernel: usb 1-1.2: adding 1-1.2:1.1 (config #1, interface 1) 898-Apr 11 06:24:29 kernel: usb 2-1.2: USB quirks for this device: 400 -- 1066-Apr 11 13:24:59 kernel: intel_rapl_common: Found RAPL domain uncore 1067:Apr 11 13:24:59 kernel: usb 1-1.2: usb auto-suspend, wakeup 0 1068-Apr 11 13:24:59 kernel: hub 1-1:1.0: hub_suspend
Let's amend the command-line by adding the "m" and "f" flags to show the module and function names (per doc, Comment 108): GRUB_CMDLINE_LINUX="dyndbg='module xhci_hcd =pmf ; module usbcore =pmf'" ^^ ^^ The output is MUCH more informative and allows for more precise grepping, if needed. Note that module ("usbcore") and function names (e.g. "usb_parse_configuration") are now logged: $ cat /proc/cmdline BOOT_IMAGE=(hd1,msdos6)/vmlinuz-5.5.16-100.fc30.x86_64 root=/dev/mapper/[removed] ro "dyndbg=module xhci_hcd =pmf ; module usbcore =pmf" $ journalctl --no-hostname -k | grep -v audit | grep -C1 -n 'usb 1-1.2' > /tmp/dmesg-usb-dyndbg-2.txt 847-Apr 11 07:12:14 systemd[1]: Running in initial RAM disk. 848:Apr 11 07:12:14 kernel: usb 1-1.2: new high-speed USB device number 3 using ehci-pci 849-Apr 11 07:12:14 systemd[1]: Set hostname to <removed>. 850:Apr 11 07:12:14 kernel: usbcore:usb_parse_configuration: usb 1-1.2: skipped 1 descriptor after configuration 851:Apr 11 07:12:14 kernel: usbcore:usb_parse_interface: usb 1-1.2: skipped 6 descriptors after interface 852:Apr 11 07:12:14 kernel: usbcore:usb_parse_endpoint: usb 1-1.2: skipped 1 descriptor after endpoint 853:Apr 11 07:12:14 kernel: usbcore:usb_parse_interface: usb 1-1.2: skipped 10 descriptors after interface 854:Apr 11 07:12:14 kernel: usbcore:usb_get_langid: usb 1-1.2: default language 0x0409 855:Apr 11 07:12:14 kernel: usbcore:usb_new_device: usb 1-1.2: udev 3, busnum 1, minor = 2 856:Apr 11 07:12:14 kernel: usb 1-1.2: New USB device found, idVendor=13d3, idProduct=5710, bcdDevice=11.30 857:Apr 11 07:12:14 kernel: usb 1-1.2: New USB device strings: Mfr=3, Product=1, SerialNumber=2 858:Apr 11 07:12:14 kernel: usb 1-1.2: Product: USB 2.0 UVC VGA WebCam 859:Apr 11 07:12:14 kernel: usb 1-1.2: Manufacturer: Azurewave 860:Apr 11 07:12:14 kernel: usb 1-1.2: SerialNumber: 0x0001 861:Apr 11 07:12:14 kernel: usbcore:usb_probe_device: usb 1-1.2: usb_probe_device 862:Apr 11 07:12:14 kernel: 
usbcore:usb_choose_configuration: usb 1-1.2: configuration #1 chosen from 1 choice 863:Apr 11 07:12:14 kernel: usbcore:usb_set_configuration: usb 1-1.2: adding 1-1.2:1.0 (config #1, interface 0) 864:Apr 11 07:12:14 kernel: usbcore:usb_set_configuration: usb 1-1.2: adding 1-1.2:1.1 (config #1, interface 1) 865-Apr 11 07:12:14 kernel: random: systemd: uninitialized urandom read (16 bytes read) -- 1067-Apr 11 14:12:32 kernel: input: HDA Intel PCH HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1b.0/sound/card0/input18 1068:Apr 11 14:12:34 kernel: usbcore:usb_port_suspend: usb 1-1.2: usb auto-suspend, wakeup 0 1069-Apr 11 14:12:34 kernel: usbcore:hub_suspend: hub 1-1:1.0: hub_suspend
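With the "m" and "f" flags on, the "module:function:" prefix can be used to summarize a log. A minimal sketch, assuming the log line format shown above ("kernel: usbcore:usb_parse_interface: ..."); the function name dyndbg_by_function is made up for this example.

```shell
# Hypothetical helper: count dyndbg messages per function for one module.
# Reads a kernel log on stdin.
# usage: journalctl --no-hostname -k | dyndbg_by_function usbcore
dyndbg_by_function() {
    module=$1
    sed -n "s/.*kernel: $module:\([A-Za-z0-9_]*\): .*/\1/p" |
        sort | uniq -c | sort -rn
}
```

Lines without the "module:function:" prefix (for example the "New USB device found" messages) are ignored, so the counts cover only dyndbg output.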
Here is a command-line for cloning 5.5.y: $ time git clone --shallow-exclude=linux-5.4.y --branch linux-5.5.y https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.5 Cloning into 'linux-5.5'... ... real 2m28.679s ... $ cd linux-5.5 $ git branch * linux-5.5.y $ git tag --list | sort -Vr v5.5.16 v5.5.15 ... v5.5-rc1 v5.5
># sensors-detect revision $Revision$ <-- wrong RCS option to do the build? I'm glad you pointed that out. I was trying to figure out a way to get the kernel ".config" file under version control without intermingling it with the kernel git repo. And RCS seemed like it might work. And it does: $ cd linux-5.4 $ mkdir RCS $ ci -l .config RCS/.config,v <-- .config enter description, terminated with single '.' or end of file: NOTE: This is NOT the log message! >> . initial revision: 1.1 done $ make nconfig # change localversion to 8 $ rcsdiff -u0 .config =================================================================== RCS file: RCS/.config,v retrieving revision 1.1 diff -u0 -r1.1 .config ... @@ -26 +26 @@ -CONFIG_LOCALVERSION=".localversion7" +CONFIG_LOCALVERSION=".localversion8" And for the real test: $ git status That doesn't even mention the RCS/ directory. NB: I've been using RCS for years for important files, such as system config files, and for notes, some of which have nothing to do with software. $ rpm -q rcs rcs-5.9.4-11.fc30.x86_64
Created attachment 1678187 [details] shell script to enable dyndbg for USB testing (based on commands from Mathias Nyman) # Enable dyndbg logging for USB testing. # # Based on commands from Mathias Nyman: # https://bugzilla.kernel.org/show_bug.cgi?id=203419#c21 # # Usage: # # sudo dyndbg-1.sh # # < connect the USB device > # # Attach to the bug report the output from: # # journalctl --no-hostname -k > /tmp/dmesg-1.txt # sudo cat /sys/kernel/debug/tracing/trace > /tmp/trace-1.txt This version adds the "m" and "f" flags to show modules and functions: -echo 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control -echo 'module usbcore =p' > /sys/kernel/debug/dynamic_debug/control +echo 'module xhci_hcd =pmf' > /sys/kernel/debug/dynamic_debug/control +echo 'module usbcore =pmf' > /sys/kernel/debug/dynamic_debug/control
Created attachment 1678188 [details] journalctl logs The patches didn't seem to help. I attached the results of journalctl --no-hostname -k | grep -v audit | grep -C1 -n 'usb 1-1.3' > dmesg-usb-dyndbg-`uname -r`-`date '+%Y%m%d-%H%M%S'`.txt I saved but did not attach the entire journalctl log without the grep, so I can do other searches without rebooting. I wasn't sure if the full log had anything that needed to be redacted. The bad logs are all similar to each other, and the good logs are all similar to each other. dmesg-usb-nodyndbg-5.4.10.localversion10-00164-g093d658a0-dirty-old-good-after-patch-20200412-000716.txt <- final good kernel from the bisection + the patch dmesg-usb-dyndbg-5.4.10.localversion9-00164-g093d658a0-old-good-before-patch-20200412-001250.txt <- final good kernel from the bisection, no patch dmesg-usb-dyndbg-5.5.15-200.fc31.x86_64-bad-nopatch-20200412-003043.txt <- Fedora 5.5.15 stable kernel built by Fedora, bad dmesg-usb-dyndbg-5.4.10.localversion8-00165-g7cbdf96cd-old-bad-nopatch-20200412-003754.txt <- the kernel with the 'usb: missing parentheses in USE_NEW_SCHEME' commit that broke the webcam dmesg-usb-dyndbg-5.4.11.localversion11-dirty-bad-patch-20200412-062831.txt <- 5.4.11 kernel built from 'git checkout v5.4.11' with the two usb patches. (I did not reapply the patches. 'git status' and 'git diff' show that git maintained them through the 'git bisect reset' and the checkout.) I still have the patched 5.4.11 kernel in my build area, and I can do checks on it if you want to be sure that I applied the patches correctly. I suppose that I can also put in some debug code.
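To compare a good log against a bad one, it helps to strip the parts that always differ. This is a sketch that assumes the format produced by the "grep -C1 -n" pipeline above (a line-number prefix followed by a "Mon DD HH:MM:SS" timestamp); the function name and the file names in the usage comment are made up for this example.

```shell
# Hypothetical helper: remove the "NNN:" / "NNN-" grep prefix and the
# journalctl timestamp so that two logs can be diffed line by line.
# Reads stdin, writes stdout.
normalize_log() {
    sed -E 's/^[0-9]+[:-]//; s/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} //'
}

# Example use (good.txt and bad.txt stand for two of the saved logs):
#   normalize_log < good.txt > good.norm
#   normalize_log < bad.txt  > bad.norm
#   diff -u good.norm bad.norm
```

The "--" separator lines that grep inserts between context groups are left as they are.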
Here are some of the sensor tests. I tried answering yes to everything in sensors-detect. $ sudo sensors-detect # sensors-detect revision $Revision$ # System: Sony Corporation VPCCB4Q1E [C60A58AK] (laptop) # Board: Sony Corporation VAIO # Kernel: 5.5.15-200.fc31.x86_64 x86_64 # Processor: Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (6/42/7) This program will help you determine which kernel modules you need to load to use lm_sensors most effectively. It is generally safe and recommended to accept the default answers to all questions, unless you know what you're doing. Some south bridges, CPUs or memory controllers contain embedded sensors. Do you want to scan for them? This is totally safe. (YES/no): YES Silicon Integrated Systems SIS5595... No VIA VT82C686 Integrated Sensors... No VIA VT8231 Integrated Sensors... No AMD K8 thermal sensors... No AMD Family 10h thermal sensors... No AMD Family 11h thermal sensors... No AMD Family 12h and 14h thermal sensors... No AMD Family 15h thermal sensors... No AMD Family 16h thermal sensors... No AMD Family 17h thermal sensors... No AMD Family 15h power sensors... No AMD Family 16h power sensors... No Intel digital thermal sensor... Success! (driver `coretemp') Intel AMB FB-DIMM thermal sensor... No Intel 5500/5520/X58 thermal sensor... No VIA C7 thermal sensor... No VIA Nano thermal sensor... No Some Super I/O chips contain embedded sensors. We have to write to standard I/O ports to probe them. This is usually safe. Do you want to scan for Super I/O sensors? (YES/no): YES Probing for Super-I/O at 0x2e/0x2f Trying family `National Semiconductor/ITE'... No Trying family `SMSC'... No Trying family `VIA/Winbond/Nuvoton/Fintek'... No Trying family `ITE'... No Probing for Super-I/O at 0x4e/0x4f Trying family `National Semiconductor/ITE'... No Trying family `SMSC'... No Trying family `VIA/Winbond/Nuvoton/Fintek'... No Trying family `ITE'... No Some hardware monitoring chips are accessible through the ISA I/O ports. 
We have to write to arbitrary I/O ports to probe them. This is usually safe though. Yes, you do have ISA I/O ports even if you do not have any ISA slots! Do you want to scan the ISA I/O ports? (YES/no): YES Probing for `National Semiconductor LM78' at 0x290... No Probing for `National Semiconductor LM79' at 0x290... No Probing for `Winbond W83781D' at 0x290... No Probing for `Winbond W83782D' at 0x290... No Lastly, we can probe the I2C/SMBus adapters for connected hardware monitoring devices. This is the most risky part, and while it works reasonably well on most systems, it has been reported to cause trouble on some systems. Do you want to probe the I2C/SMBus adapters now? (YES/no): YES Using driver `i2c-i801' for device 0000:00:1f.3: Intel Cougar Point (PCH) Module i2c-dev loaded successfully. Next adapter: i915 gmbus ssc (i2c-0) Do you want to scan it? (yes/NO/selectively): yes Next adapter: i915 gmbus vga (i2c-1) Do you want to scan it? (yes/NO/selectively): yes Next adapter: i915 gmbus panel (i2c-2) Do you want to scan it? (yes/NO/selectively): yes Client found at address 0x28 Probing for `National Semiconductor LM78'... No Probing for `National Semiconductor LM79'... No Probing for `National Semiconductor LM80'... No Probing for `National Semiconductor LM96080'... No Probing for `Winbond W83781D'... No Probing for `Winbond W83782D'... No Probing for `Nuvoton NCT7802Y'... No Probing for `Winbond W83627HF'... No Probing for `Winbond W83627EHF'... No Probing for `Winbond W83627DHG/W83667HG/W83677HG'... No Probing for `Asus AS99127F (rev.1)'... No Probing for `Asus AS99127F (rev.2)'... No Probing for `Asus ASB100 Bach'... No Probing for `Analog Devices ADM1029'... No Probing for `ITE IT8712F'... No Next adapter: i915 gmbus dpc (i2c-3) Do you want to scan it? (yes/NO/selectively): yes Next adapter: i915 gmbus dpb (i2c-4) Do you want to scan it? (yes/NO/selectively): yes Next adapter: i915 gmbus dpd (i2c-5) Do you want to scan it? 
(yes/NO/selectively): yes Next adapter: Radeon i2c bit bus 0x90 (i2c-6) Do you want to scan it? (yes/NO/selectively): yes Next adapter: Radeon i2c bit bus 0x91 (i2c-7) Do you want to scan it? (yes/NO/selectively): yes Next adapter: Radeon i2c bit bus 0x92 (i2c-8) Do you want to scan it? (yes/NO/selectively): yes Next adapter: Radeon i2c bit bus 0x93 (i2c-9) Do you want to scan it? (yes/NO/selectively): yes Next adapter: Radeon i2c bit bus 0x94 (i2c-10) Do you want to scan it? (yes/NO/selectively): yes Next adapter: Radeon i2c bit bus 0x95 (i2c-11) Do you want to scan it? (yes/NO/selectively): yes Next adapter: Radeon i2c bit bus 0x96 (i2c-12) Do you want to scan it? (yes/NO/selectively): yes Next adapter: Radeon i2c bit bus 0x97 (i2c-13) Do you want to scan it? (yes/NO/selectively): yes Next adapter: DPDDC-C (i2c-14) Do you want to scan it? (yes/NO/selectively): yes Next adapter: SMBus I801 adapter at e040 (i2c-15) Do you want to scan it? (YES/no/selectively): yes Client found at address 0x50 Probing for `Analog Devices ADM1033'... No Probing for `Analog Devices ADM1034'... No Probing for `SPD EEPROM'... Yes (confidence 8, not a hardware monitoring chip) Probing for `EDID EEPROM'... No Client found at address 0x52 Probing for `Analog Devices ADM1033'... No Probing for `Analog Devices ADM1034'... No Probing for `SPD EEPROM'... Yes (confidence 8, not a hardware monitoring chip) Now follows a summary of the probes I have just done. Just press ENTER to continue: Driver `coretemp': * Chip `Intel digital thermal sensor' (confidence: 9) Do you want to overwrite /etc/sysconfig/lm_sensors? (YES/no): YES Unloading i2c-dev... OK $ sudo cat /etc/sysconfig/lm_sensors # Generated by sensors-detect on Sat Apr 11 22:14:25 2020 # This file is sourced by /etc/init.d/lm_sensors and defines the modules to # be loaded/unloaded. 
# # The format of this file is a shell script that simply defines variables: # HWMON_MODULES for hardware monitoring driver modules, and optionally # BUS_MODULES for any required bus driver module (for example for I2C or SPI). HWMON_MODULES="coretemp" $ lspci | grep SATA 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Mobile SATA AHCI Controller (rev 04) >There is a separate program for monitoring HDD temperatures, and it uses SMART commands: $ rpm -qi hddtemp package hddtemp is not installed I am more worried about the temperature of the CPU than the SSD. >If you run "sensors" without any arguments, it should report some sensors data, depending on what kernel modules are loaded. $ sensors coretemp-isa-0000 Adapter: ISA adapter Package id 0: +54.0°C (high = +86.0°C, crit = +100.0°C) Core 0: +54.0°C (high = +86.0°C, crit = +100.0°C) Core 1: +52.0°C (high = +86.0°C, crit = +100.0°C) BAT0-acpi-0 Adapter: ACPI interface in0: N/A radeon-pci-0100 Adapter: PCI adapter temp1: N/A (crit = +120.0°C, hyst = +90.0°C) acpitz-acpi-0 Adapter: ACPI interface temp1: +54.0°C (crit = +96.0°C) temp2: +54.0°C (crit = +96.0°C) >Sometimes there are vendor-specific kernel modules, but they don't always work too well. $ find /lib/modules/`uname -r`/ -name \*sony\* /lib/modules/5.5.15-200.fc31.x86_64/kernel/drivers/hid/hid-sony.ko.xz <- support for the Sony PS3 BD Remote Control /lib/modules/5.5.15-200.fc31.x86_64/kernel/drivers/media/rc/ir-sony-decoder.ko.xz <- IR port and decoder for Sony IR Pulse/Space protocol /lib/modules/5.5.15-200.fc31.x86_64/kernel/drivers/media/i2c/sony-btf-mpx.ko.xz <- support for the internal mux of the Sony BTF-PG472Z video tuner /lib/modules/5.5.15-200.fc31.x86_64/kernel/drivers/platform/x86/sony-laptop.ko.xz I got this laptop on short notice in 2012 after my previous laptop (a 2008 Lenovo ThinkPad T61p) burned up building gcc while I was traveling. 
It had reasonable specs for the time and a good keyboard, and the shop let me test it with a Fedora live CD before buying it. It has some multimedia options that I have never used, including an IR port. >Some of them might already be loaded $ lsmod | grep sony sony_laptop 65536 0 rfkill 28672 7 bluetooth,cfg80211,sony_laptop video 53248 2 i915,sony_laptop
> Created attachment 1678188 [details] > journalctl logs Thanks. "xarchiver" makes it very easy to open that (dmesg-12apr20.tar.bz2) from Bugzilla: $ rpm -q xarchiver xarchiver-0.5.4.14-1.fc30.x86_64 > The patches didn't seem to help. That's disappointing, but you did a very nice job collecting those logs. > I attached the results of journalctl --no-hostname -k | grep -v audit | grep -C1 -n 'usb 1-1.3' > dmesg-usb-dyndbg-`uname -r`-`date '+%Y%m%d-%H%M%S'`.txt Great idea to put a timestamp in the file names. > I saved but did not attach the entire journalctl log without the grep, so I can do other searches without rebooting. I wasn't sure if the full log had anything that needed to be redacted. Thanks for saving the entire logs. Attaching them probably won't be necessary, but the "Code:" line in three of the attached snippets looks very peculiar. The line is the same in all three cases: $ grep -h -n 'Code:' *.txt # The "-h" option suppresses the file names, so that the lines all align for easy comparison. Could you look at the preceding log entries to see where they are coming from: $ grep -B10 -n 'Code:' *.txt "10" is a complete guess. Please adjust as you see fit. > The bad logs are all similar to each other, and the good logs are all similar to each other. 
Thanks for the annotated list: dmesg-usb-nodyndbg-5.4.10.localversion10-00164-g093d658a0-dirty-old-good-after-patch-20200412-000716.txt <- final good kernel from the bisection + the patch dmesg-usb-dyndbg-5.4.10.localversion9-00164-g093d658a0-old-good-before-patch-20200412-001250.txt <- final good kernel from the bisection, no patch dmesg-usb-dyndbg-5.5.15-200.fc31.x86_64-bad-nopatch-20200412-003043.txt <- Fedora 5.5.15 stable kernel built by Fedora, bad dmesg-usb-dyndbg-5.4.10.localversion8-00165-g7cbdf96cd-old-bad-nopatch-20200412-003754.txt <- the kernel with the 'usb: missing parentheses in USE_NEW_SCHEME' commit that broke the webcam dmesg-usb-dyndbg-5.4.11.localversion11-dirty-bad-patch-20200412-062831.txt <- 5.4.11 kernel built from 'git checkout v5.4.11' with the two usb patches. (I did not reapply the patches. 'git status' and 'git diff' show that git maintained them through the 'git bisect reset' and the checkout.) > I still have patched 5.4.11 kernel in my build area, and I can do checks on it if you want to be sure that I applied the patches correctly. > I suppose that I can also put in some debug code. I know very little about the USB protocol, but based on these lines, I would say that the driver is reading corrupt or unexpected data from the USB device: $ less dmesg-usb-dyndbg-5.5.15-200.fc31.x86_64-bad-nopatch-20200412-003043.txt ... 943:Apr 12 00:29:30 kernel: usbcore:usb_probe_device: usb 1-1.3: usb_probe_device 944:Apr 12 00:29:30 kernel: usbcore:usb_choose_configuration: usb 1-1.3: configuration #247 chosen from 1 choice 945:Apr 12 00:29:30 kernel: usb 1-1.3: can't set config #247, error -32 ... IMO, this should be taken upstream. The only further thing we could try is to enable dyndbg tracing from the kernel command-line, but I haven't figured out how to do that yet.
> IMO, this should be taken upstream. Here is a proposed bug summary: [BISECTED] usb 1-1.3: can't set config #247, error -32 Here is a list of kernel bugs under Component: USB, Product: Drivers: https://bugzilla.kernel.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&component=USB&order=bug_id%20DESC&product=Drivers&query_format=advanced
> ... I would say that the driver is reading corrupt or unexpected data from the USB device: $ less dmesg-usb-dyndbg-5.5.15-200.fc31.x86_64-bad-nopatch-20200412-003043.txt ... 943:Apr 12 00:29:30 kernel: usbcore:usb_probe_device: usb 1-1.3: usb_probe_device 944:Apr 12 00:29:30 kernel: usbcore:usb_choose_configuration: usb 1-1.3: configuration #247 chosen from 1 choice 945:Apr 12 00:29:30 kernel: usb 1-1.3: can't set config #247, error -32 ... Compare that with the "good" case (unpatched): $ less dmesg-usb-dyndbg-5.4.10.localversion9-00164-g093d658a0-old-good-before-patch-20200412-001250.txt ... 956:Apr 12 00:10:37 kernel: usbcore:usb_probe_device: usb 1-1.3: usb_probe_device 957:Apr 12 00:10:37 kernel: usbcore:usb_choose_configuration: usb 1-1.3: configuration #1 chosen from 1 choice 958:Apr 12 00:10:37 kernel: usbcore:usb_set_configuration: usb 1-1.3: adding 1-1.3:1.0 (config #1, interface 0) 959:Apr 12 00:10:37 kernel: usbcore:usb_set_configuration: usb 1-1.3: adding 1-1.3:1.1 (config #1, interface 1) ... BTW, "#247" could have significance: $ python3 -c 'print(hex(247), bin(247))' 0xf7 0b11110111
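To make the good/bad comparison repeatable across all of the saved logs, here is a small Python sketch that pulls the chosen configuration number out of the usb_choose_configuration lines and prints it in hex and binary. The regular expression and the function name are my own, based only on the log format quoted above, so treat it as an illustration and not a general journal parser.

```python
import re

# Matches lines such as:
#   "... usb 1-1.3: configuration #247 chosen from 1 choice"
CHOSEN_RE = re.compile(r"usb (\S+): configuration #(\d+) chosen from (\d+) choice")


def chosen_configs(lines):
    """Return (device, config_number) pairs found in kernel log lines."""
    found = []
    for line in lines:
        match = CHOSEN_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


if __name__ == "__main__":
    sample = [
        "944:Apr 12 00:29:30 kernel: usbcore:usb_choose_configuration: "
        "usb 1-1.3: configuration #247 chosen from 1 choice",
        "957:Apr 12 00:10:37 kernel: usbcore:usb_choose_configuration: "
        "usb 1-1.3: configuration #1 chosen from 1 choice",
    ]
    for device, number in chosen_configs(sample):
        # Show the value as decimal, hex and zero-padded 8-bit binary.
        print(device, number, hex(number), format(number, "#010b"))
```

Feeding it the lines from each dmesg-usb-*.txt file should make it obvious at a glance which boots picked #1 and which picked #247.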
> I suppose that I can also put in some debug code. Feel free. It's open source, so you can modify it any way you want. :-) Unfortunately, the USB 2.0 specification doesn't appear to be "free" -- I can't see a link to it here: USB 2.0 Specification https://usb.org/document-library/usb-20-specification
I extracted the "lsusb -v" output* for the Ricoh camera and found this warning: $ grep -C7 -n short lsusb-v-5.4.10-200-good-Ricoh-only.txt 414- bmControls 0x00040004 415- Auto-Exposure Priority 416- Privacy 417- VideoControl Interface Descriptor: 418- bLength 11 419- bDescriptorType 36 420- bDescriptorSubtype 5 (PROCESSING_UNIT) 421: Warning: Descriptor too short <<<<<<<<<<<<<<<<<<<<<<<<<<<< 422- bUnitID 2 423- bSourceID 1 424- wMaxMultiplier 0 425- bControlSize 2 426- bmControls 0x0000177f 427- Brightness 428- Contrast * lsusb-v-5.4.10-200-good.txt lsusb-v-5.4.10-200-good.txt Attachment 1676176 [details]
Created attachment 1678267 [details] lsusb-v-5.4.10-200-good-Ricoh-only.txt ("lsusb -v" output for the Ricoh USB camera) This is the "lsusb -v" output for the Ricoh USB camera extracted from: lsusb-v-5.4.10-200-good.txt dmesg-lsusb.tar.bz2 Attachment 1676176 [details] $ wc -l lsusb-v-5.4.10-200-good-Ricoh-only.txt 427 lsusb-v-5.4.10-200-good-Ricoh-only.txt $ grep -A10 -n 'Configuration Descriptor' lsusb-v-5.4.10-200-good-Ricoh-only.txt 17: Configuration Descriptor: 18- bLength 9 19- bDescriptorType 2 20- wTotalLength 0x0265 21- bNumInterfaces 2 22- bConfigurationValue 1 <<<< Bogus "#247" value is from here.* 23- iConfiguration 0 24- bmAttributes 0x80 25- (Bus Powered) 26- MaxPower 200mA 27- Interface Association: * per usb_choose_configuration() in drivers/usb/core/generic.c:
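For reference, the nine-byte header that lsusb prints as "Configuration Descriptor" can be decoded by hand. The sketch below is mine, not the kernel code in generic.c: it assumes the standard USB 2.0 header layout, rebuilds the "good" bytes from the lsusb values above, and uses a hypothetical "bad" header in which only the 120 and 247 come from the kernel log (the remaining bytes are simply copied from the good one).

```python
import struct

# Layout of the standard 9-byte USB configuration descriptor header
# (little-endian): bLength, bDescriptorType, wTotalLength, bNumInterfaces,
# bConfigurationValue, iConfiguration, bmAttributes, bMaxPower.
CONFIG_HEADER = struct.Struct("<BBHBBBBB")
FIELDS = ("bLength", "bDescriptorType", "wTotalLength", "bNumInterfaces",
          "bConfigurationValue", "iConfiguration", "bmAttributes", "bMaxPower")

# The kernel log in this report says "using maximum allowed: 32".
MAX_INTERFACES = 32


def parse_config_header(raw):
    """Decode the first 9 bytes of a configuration descriptor into a dict."""
    values = CONFIG_HEADER.unpack(raw[:CONFIG_HEADER.size])
    header = dict(zip(FIELDS, values))
    header["max_power_mA"] = header["bMaxPower"] * 2  # bMaxPower is in 2 mA units
    return header


def suspicious_fields(header):
    """Return human-readable complaints about implausible header values."""
    problems = []
    if header["bLength"] != 9:
        problems.append("bLength is %d, expected 9" % header["bLength"])
    if header["bDescriptorType"] != 2:
        problems.append("bDescriptorType is %d, expected 2"
                        % header["bDescriptorType"])
    if header["bNumInterfaces"] > MAX_INTERFACES:
        problems.append("bNumInterfaces is %d, more than %d"
                        % (header["bNumInterfaces"], MAX_INTERFACES))
    return problems


if __name__ == "__main__":
    # Bytes rebuilt by hand from the "good" lsusb -v values above.
    good = bytes([9, 2, 0x65, 0x02, 2, 1, 0, 0x80, 100])
    # Hypothetical: only 120 and 247 come from the kernel log; the other
    # bytes are copied from the good header for illustration.
    bad = bytes([9, 2, 0x65, 0x02, 120, 247, 0, 0x80, 100])
    for name, raw in (("good", good), ("bad", bad)):
        header = parse_config_header(raw)
        print(name, header["bConfigurationValue"], suspicious_fields(header))
```

The point is only that bNumInterfaces and bConfigurationValue sit next to each other in the same small header, so one bad read would plausibly corrupt both at once.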
Created attachment 1678293 [details] grep -B15 -n 'Code:' dmesg-all*.txt >"#247" could have significance: When I first had the error, I looked for 247 and F7 in the kernel bugzilla. I was thinking that the 247 came from reading into another data structure because it comes up every time. $ grep -h 'Code:' dmesg-all*.txt Apr 12 00:36:41 kernel: Code: 1f 80 00 00 00 00 e8 9b c2 ff ff 48 8d bd 38 ff ff ff be 3d 00 00 00 48 89 85 28 ff ff ff 48 89 85 38 ff ff ff e8 2c f6 ff ff <80> 38 69 0f 85 2b 02 00 00 80 78 01 70 0f 85 21 02 00 00 0f b6 58 Apr 12 00:10:37 kernel: Code: 1f 80 00 00 00 00 e8 9b c2 ff ff 48 8d bd 38 ff ff ff be 3d 00 00 00 48 89 85 28 ff ff ff 48 89 85 38 ff ff ff e8 2c f6 ff ff <80> 38 69 0f 85 2b 02 00 00 80 78 01 70 0f 85 21 02 00 00 0f b6 58 Apr 12 06:27:09 kernel: Code: 1f 80 00 00 00 00 e8 9b c2 ff ff 48 8d bd 38 ff ff ff be 3d 00 00 00 48 89 85 28 ff ff ff 48 89 85 38 ff ff ff e8 2c f6 ff ff <80> 38 69 0f 85 2b 02 00 00 80 78 01 70 0f 85 21 02 00 00 0f b6 58 Apr 12 00:29:30 kernel: Code: 1f 80 00 00 00 00 e8 9b c2 ff ff 48 8d bd 38 ff ff ff be 3d 00 00 00 48 89 85 28 ff ff ff 48 89 85 38 ff ff ff e8 2c f6 ff ff <80> 38 69 0f 85 2b 02 00 00 80 78 01 70 0f 85 21 02 00 00 0f b6 58 The attachment is the result of grep -B15 -n 'Code:' dmesg-all*.txt
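The same comparison can be scripted so that nobody has to eyeball four long hex dumps. This is a rough Python sketch with names of my own choosing: it collects the text after "Code:" from any number of log files, reports how many distinct dumps there are, and lists the bytes wrapped in angle brackets (the byte at the faulting instruction pointer).

```python
import re
import sys

# The byte at the faulting instruction pointer is printed as <xx>.
MARKED_BYTE_RE = re.compile(r"<([0-9a-f]{2})>")


def code_dumps(lines):
    """Return the text after 'Code:' for each line that contains one."""
    dumps = []
    for line in lines:
        _, found, rest = line.partition("Code:")
        if found:
            dumps.append(rest.strip())
    return dumps


def summarize(dumps):
    """Return (number of distinct dumps, set of bytes marked with <..>)."""
    marked = set()
    for dump in dumps:
        marked.update(MARKED_BYTE_RE.findall(dump))
    return len(set(dumps)), marked


if __name__ == "__main__":
    # Usage (file names are whatever logs you saved): script.py dmesg-all*.txt
    lines = []
    for name in sys.argv[1:]:
        with open(name, errors="replace") as handle:
            lines.extend(handle)
    distinct, marked = summarize(code_dumps(lines))
    print("distinct Code: dumps:", distinct, "marked bytes:", sorted(marked))
```

If it reports a single distinct dump across good and bad boots, as the grep output here suggests, the crash is the same in every case and is probably unrelated to the webcam failure.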
(In reply to William Bader from comment #137) > Created attachment 1678293 [details] > grep -B15 -n 'Code:' dmesg-all*.txt > > >"#247" could have significance: > > When I first had the error, I looked for 247 and F7 in the kernel bugzilla. I was thinking that the 247 came from reading into another data structure because it comes up every time. Good idea to search for it. Could you explain what you mean by "reading into another data structure"? > $ grep -h 'Code:' dmesg-all*.txt ... > The attachment is the result of grep -B15 -n 'Code:' dmesg-all*.txt Thanks. Your system is overheating: ... kernel: mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1) ("mce" probably means "machine check event" -- "man mcelog".) And nm-initrd-generator is crashing: ... kernel: nm-initrd-gener[353]: segfault at 0 ip 000055f1d5068d24 sp 00007ffe3bab86c0 error 4 in nm-initrd-generator[55f1d5063000+64000] journalctl can pick out errors and display them in bright red: $ journalctl --no-hostname -b -p err # The journalctl man page lists other priorities for the "-p" option. ABRT might have saved some crash dumps ("Problem Reporting").
> dmesg-usb-dyndbg-5.4.11.localversion11-dirty-bad-patch-20200412-062831.txt <- 5.4.11 kernel built from 'git checkout v5.4.11' with the two usb patches. (I did not reapply the patches. 'git status' and 'git diff' show that git maintained them through the 'git bisect reset' and the checkout.) Thanks for pointing that out. I may have misunderstood how "git bisect reset" works, because it left me with a detached HEAD: $ git branch * (no branch, bisect started on 9d61432ef) linux-5.4.y $ git bisect reset Previous HEAD position was 97d9e8620 bnx2x: Do not handle requests from VFs after parity HEAD is now at 9d61432ef Linux 5.4.11 $ git branch * (HEAD detached at 9d61432ef) linux-5.4.y So to get to the main branch: $ git checkout linux-5.4.y Previous HEAD position was 9d61432ef Linux 5.4.11 Switched to branch 'linux-5.4.y' Your branch is up to date with 'origin/linux-5.4.y'. $ git branch * linux-5.4.y And then checkout, apply the patches, and attempt to checkout a different commit: $ git checkout v5.4.11 Note: checking out 'v5.4.11'. You are in 'detached HEAD' state. ... (This has a longer informational note.) $ git apply 0001-xhci-testpatch-Don-t-clear-TT-buffer-on-ep0-protocol-1.patch $ git apply 0002-Fix-a-typo-in-prevoius-patch.patch Git lets you checkout with only an informational note: $ git checkout v5.4.10 M drivers/usb/host/xhci-ring.c M drivers/usb/host/xhci.c Previous HEAD position was 9d61432ef Linux 5.4.11 HEAD is now at 7a02c1932 Linux 5.4.10 $ git branch * (HEAD detached at v5.4.10) linux-5.4.y So it seems like there is some potential for mistakes when mixing checkouts and patches.
(In reply to Steve from comment #138) ... > And nm-initrd-generator is crashing: ... The man page says: "nm-initrd-generator scans the command line for options relevant to network configuration ..." It's not clear what "command line" that refers to, but if it were the _kernel_ command line, that would mean that nm-initrd-generator can't handle the dyndbg options on the _kernel_ command line. Later, the man page actually does refer to "the kernel command line". Now I have to check my logs ... :-)
(In reply to Steve from comment #138) > And nm-initrd-generator is crashing: Bug 1823217 - nm-initrd-gener[333]: segfault at 0 ...
>Could you explain what you mean by "reading into another data structure"? I was guessing that something passed a bad length because one of the first logs I posted had the lines below. "Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: config 247 has too many interfaces: 120, using maximum allowed: 32" "Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: config 247 descriptor has 1 excess byte, ignoring" >Your system is overheating I know. The cpu work from shutting down and then booting can make it overheat. It happened when it was new. That is why I have avoided doing kernel builds on it. >nm-initrd-generator is crashing I've given up trying to debug NetworkManager. >https://bugzilla.redhat.com/show_bug.cgi?id=1823217 Thanks for researching it. Does that mean that there is a bug in the nm debug code? Normal user mode programs shouldn't see any difference when dyndbg is enabled, right? I was wondering if dyndbg could make the usb code run slightly slower, which might possibly fix errors due to timing issues. >So it seems like there is some potential for mistakes when mixing checkouts and patches. I had expected that git checkout would overwrite all of the changed files and drop the patches. I was surprised when it kept the patches. Maybe 'git apply' does more than just running 'patch'. Is it worth starting fresh with 5.5 and applying the two usb patches?
>>Could you explain what you mean by "reading into another data structure"? >I was guessing that something passed a bad length because one of the first logs I posted had the lines below. >"Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: config 247 has too many interfaces: 120, using maximum allowed: 32" >"Mar 30 15:00:58 scslaptop37 kernel: usb 1-1.3: config 247 descriptor has 1 excess byte, ignoring" Thanks. Now I see. As can be seen in the "lsusb -v" output, the USB data structures use numerous lengths and counts. If any one of those is corrupt, everything else will be corrupt too. And that makes me wonder whether USB has any built-in support for integrity checking, such as checksums or hashes. That bogus "#247" value would never be interpreted if the integrity check failed. >>Your system is overheating >I know. The cpu work from shutting down and then booting can make it overheat. It happened when it was new. That is why I have avoided doing kernel builds on it. I can see why. >>nm-initrd-generator is crashing >I've given up trying to debug NetworkManager. >>https://bugzilla.redhat.com/show_bug.cgi?id=1823217 >Thanks for researching it. Does that mean that there is a bug in the nm debug code? Possibly. But the allowed kernel command-line syntax could be inadequately specified, so dyndbg could be using what seems to be legal syntax that "nm" isn't expecting. That would indicate an interface specification bug. >Normal user mode programs shouldn't see any difference when dyndbg is enabled, right? Presumably. >I was wondering if dyndbg could make the usb code run slightly slower, which might possibly fix errors due to timing issues. I was wondering about that too. >>So it seems like there is some potential for mistakes when mixing checkouts and patches. >I had expected that git checkout would overwrite all of the changed files and drop the patches. I was surprised when it kept the patches. Maybe 'git apply' does more than just running 'patch'. 
git tries not to trash any working files. While experimenting, I found that git, in some cases, refuses to checkout files that would overwrite working files (and git provides an informative error message saying so). >Is it worth starting fresh with 5.5 and applying the two usb patches? No. We need to turn this over to the USB experts upstream. :-) If they have a patch, we are now perfectly positioned to test it.
>HWMON_MODULES="coretemp" I ran "sensors-detect" on my laptop and, despite answering "yes" for every probe, got exactly the same result. Idle temps are 42C to 48C. "sensors" shows a "cpu_fan" that always seems to be at 2000 RPM or 2100 RPM, so I don't know if that speed is a credible value. That section of the output is headed "asus-isa-0000", so I believe it is coming from an asus-specific kernel module.
>While experimenting, I found that git, in some cases, refuses to checkout files that would overwrite working files (and git provides an informative error message saying so). Here is an example: $ git branch * linux-5.4.y $ git apply 0001-xhci-testpatch-Don-t-clear-TT-buffer-on-ep0-protocol-1.patch $ git apply 0002-Fix-a-typo-in-prevoius-patch.patch $ git checkout v5.4.11 error: Your local changes to the following files would be overwritten by checkout: drivers/usb/host/xhci-ring.c Please commit your changes or stash them before you switch branches. Aborting $ git stash --help ... "Use git stash when you want to record the current state of the working directory and the index, but want to go back to a clean working directory."
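Since git sometimes refuses the checkout and sometimes silently carries the patched files along, it may be worth checking for local changes before switching commits. Here is a minimal Python sketch, assuming the short "XY path" format printed by "git status --porcelain"; the function names are mine, and renamed files (shown as "old -> new") are not handled.

```python
import subprocess


def modified_paths(porcelain_output):
    """List tracked paths with changes, given `git status --porcelain` text.

    Each line is two status characters, a space, then the path.
    Untracked files ("??") are skipped because they are not patch changes.
    """
    paths = []
    for line in porcelain_output.splitlines():
        if len(line) < 4 or line.startswith("??"):
            continue
        paths.append(line[3:])
    return paths


def working_tree_changes(repo="."):
    """Run git in `repo` and return the tracked paths that differ from HEAD."""
    result = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return modified_paths(result.stdout)
```

Calling working_tree_changes() in the kernel tree before "git checkout" or "git bisect reset" should list files such as drivers/usb/host/xhci-ring.c while the test patches are still applied, and return an empty list once they have been stashed or committed.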
I submitted a report at https://bugzilla.kernel.org/show_bug.cgi?id=207219
(In reply to William Bader from comment #146) > I submitted a report at https://bugzilla.kernel.org/show_bug.cgi?id=207219 Thanks! Including the log entries for the "good" kernel was a good idea. You can also include a link to it at the top of the Fedora BZ page in the "Links" section. The pull-down menu has a "Linux Kernel" item that looks like the relevant choice. Interestingly, the "Links" menu is not visible in this bug report, even when logged in, so I had to look at one of my own bug reports to see the menu.
Replying to William in Comment 72: >>Could you go into more detail about that [VM configuration]? >The office is on coronavirus lockdown. I am on lockdown also, stranded far from the office. Thank goodness for the internet. >I showed this bug report to the person who manages the VMs, and he connected remotely and created a Fedora 31 VM with 8GB RAM, 44GB disk (11GB currently used by the OS and the kernel build), and one virtual cpu that shows as an Intel Skylake Processor. Wow! All you had to do is ask, and you got a "desktop" VM. :-) >Ten years ago, we had a computer room full of headless desktops and towers. We got a single big server and migrated everything to VMs on the big server. That's a lot more convenient to administer, but it seems like the old-school redundancy has some advantages. >We do daily backups of the important VMs, but it would still be a pain to lose one, so only a few people have access, and I am not on that list. OK. >It is better that way, so I don't get the blame if something breaks. That's why I try not to release shell scripts that must be run as root. :-) (Comment 64) >I don't know what tool he uses or how he installed Fedora. OK. I was mainly interested how it is done on "big iron". >He installed gcc, but I had to install make, flex, bison, and a few libraries. >Before I did an in-place update from Fedora 30 to 31 on my laptop, he made a Fedora 30 VM and tested the update, and I think that he has already set up a Fedora 31 VM for another project that needed an OS more recent than CentOS 7. Testing the F30->F31 system upgrade in a VM was a really good idea. In another bug report, the reporter had a boot failure with a kernel panic after a system upgrade. (Bug 1815102) >I have some VMs on my laptop under VirtualBox, but my laptop supports only 8GB RAM, so I can't do much in the VMs. My desktop system has 8GB RAM, but I usually run only one VM at a time. The few times I ran more than one, I got completely confused about which was which. 
:-) >I suppose that since the webcam is a hardware issue, if it doesn't work on the OS on the bare metal, it won't work inside a VM. VMs are great for testing, but testing on bare metal is sometimes the only option. Sometimes even seemingly similar hardware doesn't behave the same way. (Bug 1814810, Comment 14)
>You can also include a link to it at the top of the Fedora BZ page in the "Links" section. Thanks, I added the link. The kernel bugzilla doesn't have anyone entered in the mailing list. Are there any chances that someone will look into it? I could probably submit a patch for the call to the USE_NEW_SCHEME macro, but I suspect that it would not be accepted. I saw something about quirks. Is there a way to tell the kernel that webcams with USB 05ca:18c0 need special initialization? >Testing the F30->F31 system upgrade in a VM was a really good idea. In another bug report, the reporter had a boot failure with a kernel panic after a system upgrade. Our hardware person has another rule that we don't update any operating systems (other than expendable test VMs) until a month after a release so the serious problems are shaken out. Before doing an in-place Fedora upgrade, I make a backup of my laptop, and then I boot from the Live CD and check that hardware like the wifi, speakers and microphone work.
(In reply to William Bader from comment #149) > >You can also include a link to it at the top of the Fedora BZ page in the "Links" section. > > Thanks, I added the link. WFM. > The kernel bugzilla doesn't have anyone entered in the mailing list. Are there any chances that someone will look into it? If you are referring to the assignee, I believe that "Default virtual assignee for Drivers/USB" means bugs are posted to a mailing list. Anyway, having "[BISECTED]" in the bug summary should attract attention because it means that: 1. Maintainers won't have to pull teeth to get essential information. 2. You know what you are doing. :-) > I could probably submit a patch for the call to the USE_NEW_SCHEME macro, but I suspect that it would not be accepted. The USB specification is notoriously complicated, so I would suggest letting the USB maintainers figure out what to do. > I saw something about quirks. Is there a way to tell the kernel that webcams with USB 05ca:18c0 need special initialization? Again -- leave it to the maintainers. > >Testing the F30->F31 system upgrade in a VM was a really good idea. In another bug report, the reporter had a boot failure with a kernel panic after a system upgrade. > > Our hardware person has another rule that we don't update any operating systems (other than expendable test VMs) until a month after a release so the serious problems are shaken out. That's a good rule. And that is why my prime system is F30, not F31. But that didn't save me from a system lockup: Bug 1806747. > Before doing an in-place Fedora upgrade, I make a backup of my laptop, and then I boot from the Live CD and check that hardware like the wifi, speakers and microphone work. That's a very good idea.
(In reply to William Bader from comment #149) ... > The kernel bugzilla doesn't have anyone entered in the mailing list. Are there any chances that someone will look into it? ... Many Linux patches are posted on Patchwork. You can draw your own conclusions after looking at this list: Linux USB - Patchwork https://patchwork.kernel.org/project/linux-usb/list/
Replying to William in Comment 15: "... I have never been successful getting my laptop to boot from a pen drive." That could depend on the pen drive. I had one low-end, name-brand pen drive that was unusable with Linux. Now, for boot drives, I try to get drives for which the manufacturer reports read and write speeds. Based on such specs, the drives with larger capacities seem to be faster too. This is my current "premium" USB flash drive: Kingston 64GB DataTraveler Elite G2 Black Metal Casing Fast 180MB/s R, 70MB/W USB 3.1 Flash Drive with LED light indicator (DTEG2/64GB) I have also had good luck with these lower-end drives: SanDisk 16GB 2.0 Flash Cruzer Glide USB Drive SanDisk 32GB Ultra Fit USB 3.1 Flash Drive (This is small, so it can be left always plugged into a laptop. I use it as a grub2 boot drive for my laptop: "man grub2-mkrescue".)
If you just want a working camera, you could "git revert" the "bad" commit. For this test, I already had "v5.4.11" checked out. Since "git revert" creates new commits, start with a new branch: $ git branch test-1 $ git checkout test-1 $ git branch linux-5.4.y * test-1 Verify that HEAD is at v5.4.11 in our new branch: $ git log --oneline -n1 HEAD 9d61432ef (HEAD -> test-1, tag: v5.4.11) Linux 5.4.11 Verify that we have the right commit for the revert: $ git log --oneline -n1 7cbdf96cda1fbffb17ec26ea65e1fe63c9aed430 7cbdf96cd usb: missing parentheses in USE_NEW_SCHEME This will throw you into an editor. Just write and quit: $ git revert 7cbdf96cda1fbffb17ec26ea65e1fe63c9aed430 [test-1 a23bddd0e] Revert "usb: missing parentheses in USE_NEW_SCHEME" 1 file changed, 1 insertion(+), 1 deletion(-) Now we should have a "working" kernel: $ git log --oneline -n2 HEAD a23bddd0e (HEAD -> test-1) Revert "usb: missing parentheses in USE_NEW_SCHEME" 9d61432ef (tag: v5.4.11) Linux 5.4.11 $ git show -q HEAD commit a23bddd0e1e15f92d17ae16e787d768ac8e7b029 (HEAD -> test-1) Author: No Name <noemail> Date: Mon Apr 13 00:42:12 2020 -0700 Revert "usb: missing parentheses in USE_NEW_SCHEME" This reverts commit 7cbdf96cda1fbffb17ec26ea65e1fe63c9aed430.
(In reply to Steve from comment #153) ... > Since "git revert" creates new commits, start with a new branch: ... The "-n" option lets you undo the "bad" commit without creating a new commit -- it just modifies the "working tree and the index". That is similar to what "git apply" does with patches, except that "git apply" lets you control what combination of changes to files and to the index are made.
BTW, if you want to know more about USB at a technical level, this is a good book: "USB Complete: The Developer's Guide", Fifth Edition, by Jan Axelson (2015). (available at a well-known online seller) And, based on the online index, USB supports some "error-checking". (re my Comment 143) But even the error checking can have bugs: USB: core: fix check for duplicate endpoints https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3e4f8e21c4f27bcf30a48486b9dcc269512b79ff USB: fix problems with duplicate endpoint addresses https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a8fd1346254974c3a852338508e4a4cddbb35f1
Thanks for the additional information. The kernel people think that the problem was an earlier commit that caused an issue only after the commit that you helped me find by bisection. They asked me to build a bad kernel without that commit. https://bugzilla.kernel.org/show_bug.cgi?id=207219#c4 I used the procedure that you suggested yesterday in https://bugzilla.redhat.com/show_bug.cgi?id=1818952#c153 For now, when I need the webcam, I am booting from the distributed Fedora 5.4.10-200.fc31.x86_64 kernel, so my main goal is coming up with a solution that eventually gets into the official Fedora kernels so I don't have to keep rebooting or building kernels. Thanks for the pen drive information. I'll try it after the virus thing.
(In reply to William Bader from comment #156) > Thanks for the additional information. > The kernel people think that the problem was an earlier commit that caused an issue only after the commit that you helped me find by bisection. They asked me to build a bad kernel without that commit. > https://bugzilla.kernel.org/show_bug.cgi?id=207219#c4 > I used the procedure that you suggested yesterday in > https://bugzilla.redhat.com/show_bug.cgi?id=1818952#c153 I was quite surprised that you were asked to test a revert after having posted Comment 153. BTW, I noticed that you didn't create a shallow clone: $ git clone --branch linux-5.4.y https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4 As you must have discovered, commit bd0e6c9614b9 isn't in a repo with a commit history in the range from v5.3.y to v5.4.y. $ git describe bd0e6c9614b9 fatal: Not a valid object name bd0e6c9614b9 The earliest tag in the shallow clone is: $ fig-tags.sh | tail -1 24cb7d728 2019-09-30 10:35:53 -0700 v5.4-rc1 The commit is dated 2018-10-02, which is almost a year earlier: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=bd0e6c9614b95352eb31d0207df16dc156c527fa For the record, I use shallow clones to reduce download time and disk usage, but they are insufficient for some problems. Git will sometimes show a commit as "grafted", which is displayed when there is no history before a particular commit. It took me a long time to realize that when git shows a commit as "grafted", it is telling me that I shouldn't have been so cheap and should have downloaded the whole history instead. :-) > For now, when I need the webcam, I am booting from the distributed Fedora 5.4.10-200.fc31.x86_64 kernel, so my main goal is coming up with a solution that eventually gets into the official Fedora kernels so I don't have to keep rebooting or building kernels. 
Agreed, although booting 5.4.10 sounds like a better workaround than using the F31 Live image because it is faster and you don't have to install apps into a live environment. > Thanks for the pen drive information. I'll try it after the virus thing. You're welcome and good health.
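On the shallow clone point above: a quick way to tell whether a work tree has truncated history is to look for the ".git/shallow" file, which git keeps only in shallow repositories. The Python sketch below is an assumption-laden shortcut (it does not handle linked worktrees or a ".git" file pointing elsewhere); newer git versions also offer "git rev-parse --is-shallow-repository" for the same purpose.

```python
from pathlib import Path


def is_shallow_clone(repo):
    """Return True if the repository at `repo` has a .git/shallow file.

    Git records the boundary ("grafted") commits of a shallow clone in that
    file, so its presence means older history was not downloaded.
    """
    return (Path(repo) / ".git" / "shallow").is_file()


if __name__ == "__main__":
    # Check the current directory; run this from the top of the kernel tree.
    print("shallow clone" if is_shallow_clone(".") else "full history (or not a git tree)")
```

If it reports a shallow clone and an old commit is needed, "git fetch --unshallow" downloads the rest of the history without cloning again.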
(In reply to William Bader from comment #156) > I used the procedure that you suggested yesterday in https://bugzilla.redhat.com/show_bug.cgi?id=1818952#c153 BZ tip: You can type in the literal string, Comment 153, and BZ will automatically create a link. Test with the "Preview" tab. 2.7.1. Autolinkification https://bugzilla.redhat.com/docs/en/html/using/tips.html#autolinkification
There may be another workaround that doesn't require rebooting. This kernel parameter can be set at runtime:

/sys/module/usbcore/parameters/old_scheme_first

To test, use two terminal windows:

In the first, run this to monitor what happens:

$ journalctl --no-hostname -k -f

In the second, run:

# cat /sys/module/usbcore/parameters/old_scheme_first # Verify.*
N
# echo 1 > /sys/module/usbcore/parameters/old_scheme_first # Change.
# cat /sys/module/usbcore/parameters/old_scheme_first # Verify.
Y

Next, use Alan's method** to restart the controller:

# echo 0 >/sys/bus/usb/devices/1-1/bConfigurationValue
# echo 1 >/sys/bus/usb/devices/1-1/bConfigurationValue

And then see if your video devices are present:

# ls /dev/video*

NB: "old_scheme_first" is global, so changing it could cause something to break. However, it is not a permanent change, so rebooting should restore it to its original value.

The relevant kernel source code is in drivers/usb/core/hub.c.

Disclaimer: I tested the setting and restarting in a VM, but that may not reflect what happens on bare metal.

* The "cat" and "ls" commands don't need to be run as root, it is just more convenient that way.
** https://bugzilla.kernel.org/show_bug.cgi?id=207219#c1
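The verify/change/verify steps above can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name set_param is made up for illustration, and the parameter file is passed as an argument so the helper can be tried on a scratch file first. Pointing it at /sys/module/usbcore/parameters/old_scheme_first needs root.

```shell
# Hypothetical helper (not part of any package): write a value to a
# parameter file and print the contents before and after the write.
# $1 = parameter file, $2 = new value
set_param() {
    file=$1
    value=$2
    # Refuse early if the file is missing or not writable (e.g. not root).
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    echo "before: $(cat "$file")"
    echo "$value" > "$file"
    # On sysfs a boolean written as 1 reads back as Y; a scratch file
    # simply reads back whatever was written.
    echo "after:  $(cat "$file")"
}
```

As root, "set_param /sys/module/usbcore/parameters/old_scheme_first 1" would correspond to the three commands shown above. It does not restart the controller.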
>As you must have discovered, commit bd0e6c9614b9 isn't in a repo with a commit history in the range from v5.3.y to v5.4.y. Yes, I saw that it wasn't there, and I wasn't sure how to find it, and they might have asked me to revert other commits even further back, so I cloned everything. >booting 5.4.10 sounds like a better workaround I have the kernel from koji installed in /boot, so I don't need the Live CD. My laptop has an SSD, and I have very few services running, so shutdown and reboot is around a minute. The biggest pain is that I have 20 workspaces open under Mate Desktop, so I have a lot of applications to close and then reopen. >BZ will automatically create a link. Thanks, I'll try that next time. >Then see if your video devices are present: It didn't work. Maybe the webcam is permanently messed up after it boots. It would have been nice if it had worked. $ uname -r 5.5.15-200.fc31.x86_64 # cat /sys/module/usbcore/parameters/old_scheme_first N # echo 1 > /sys/module/usbcore/parameters/old_scheme_first # cat /sys/module/usbcore/parameters/old_scheme_first Y # echo 0 >/sys/bus/usb/devices/1-1/bConfigurationValue # echo 1 >/sys/bus/usb/devices/1-1/bConfigurationValue # ls /dev/video* ls: cannot access '/dev/video*': No such file or directory $ journalctl --no-hostname -k -f -- Logs begin at Sat 2019-07-27 03:29:49 WEST. -- ... 
Apr 15 05:22:58 kernel: usb 1-1.2: Product: Bluetooth USB Host Controller Apr 15 05:22:58 kernel: usb 1-1.2: Manufacturer: Atheros Communications Apr 15 05:22:58 kernel: usb 1-1.2: SerialNumber: Alaska Day 2006 Apr 15 05:22:58 kernel: usb 1-1.3: new full-speed USB device number 7 using ehci-pci Apr 15 05:22:59 kernel: usb 1-1.3: device not accepting address 7, error -32 Apr 15 05:22:59 kernel: usb 1-1.3: new full-speed USB device number 8 using ehci-pci Apr 15 05:22:59 kernel: usb 1-1.3: device not accepting address 8, error -32 Apr 15 05:22:59 kernel: usb 1-1-port3: attempt power cycle Apr 15 05:23:00 kernel: usb 1-1.3: new full-speed USB device number 9 using ehci-pci Apr 15 05:23:00 kernel: usb 1-1.3: device descriptor read/64, error -32 Apr 15 05:23:00 kernel: usb 1-1.3: device descriptor read/64, error -32 Apr 15 05:23:00 kernel: usb 1-1.3: new full-speed USB device number 10 using ehci-pci Apr 15 05:23:00 kernel: usb 1-1.3: device descriptor read/64, error -32 Apr 15 05:23:00 kernel: usb 1-1.3: device descriptor read/64, error -32 Apr 15 05:23:00 kernel: usb 1-1-port3: unable to enumerate USB device
> It didn't work. Maybe the webcam is permanently messed up after it boots. Thanks for testing that. Here is another variant: Append "usbcore.old_scheme_first=1" to the kernel command-line (without the quotes) from grub2. Press "ctrl-x" to boot and login. Verify: $ cat /proc/cmdline BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.5.16-200.fc31.x86_64 root=UUID=c5f30768-dfaf-4dc8-bb85-9b8f088fb16e ro usbcore.old_scheme_first=1 $ cat /sys/module/usbcore/parameters/old_scheme_first Y Test: $ ls /dev/video* I tested the command-line setting in a VM with 5.5.16-200.fc31.x86_64.
Here is a bit more about finding module parameters.

Module directories are listed here:

$ ls /sys/module/ | wc -l
181

Most of them have a "parameters" subdirectory:

$ ls -d /sys/module/*/parameters | wc -l
113

Even when "modinfo" won't show anything:

$ modinfo usbcore
modinfo: ERROR: Module usbcore not found.

Module parameters can still be listed:

$ grep '' /sys/module/usbcore/parameters/*
/sys/module/usbcore/parameters/authorized_default:-1
/sys/module/usbcore/parameters/autosuspend:2
/sys/module/usbcore/parameters/blinkenlights:N
/sys/module/usbcore/parameters/initial_descriptor_timeout:5000
/sys/module/usbcore/parameters/nousb:N
/sys/module/usbcore/parameters/old_scheme_first:N
/sys/module/usbcore/parameters/quirks:
/sys/module/usbcore/parameters/usbfs_memory_mb:16
/sys/module/usbcore/parameters/usbfs_snoop:N
/sys/module/usbcore/parameters/usbfs_snoop_max:65536
/sys/module/usbcore/parameters/use_both_schemes:Y

> ... I have 20 workspaces open under Mate Desktop, so I have a lot of applications to close and then reopen [after rebooting with a kernel that supports the webcam].

MATE has an option to "Automatically remember running applications when logging out". (Under System:Preferences:Personal)

However, it is somewhat inconsistent about what is and what is not restored. And the behavior seems to change between the first restart and later restarts.

Tested in an F31 MATE VM purpose-built for testing that feature.
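The grep '' trick works because grep prints the file name in front of each matching line when it is given several files, and an empty pattern matches every line. A sketch of the same listing as a function follows. The name list_params is hypothetical, and the sysfs root is an optional second argument (an assumption made here so the function can be tried against a scratch directory tree).

```shell
# Hypothetical helper: print "name=value" for each readable parameter
# of a module.
# $1 = module name, $2 = sysfs root (defaults to /sys)
list_params() {
    dir="${2:-/sys}/module/$1/parameters"
    [ -d "$dir" ] || { echo "no parameters directory for $1" >&2; return 1; }
    for f in "$dir"/*; do
        # Skip anything that is not a readable regular file
        # (some parameters are write-only or root-only).
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f" 2>/dev/null)"
    done
}
```

For example, "list_params usbcore" should print one line per parameter, with an empty value after the "=" for parameters such as "quirks" that are unset.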
It's not just your camera. I have a SanDisk USB flash drive installed as a grub2 boot device on my laptop, so it is present at boot-time. With the default "old_scheme_first" ("N"), there is an error: $ cat usb-sandisk-new-scheme-first-1.txt Apr 14 21:52:19 kernel: Command line: BOOT_IMAGE=(hd1,msdos6)/vmlinuz-5.5.16-100.fc30.x86_64 root=/dev/mapper/[removed] ro Apr 14 21:52:19 kernel: usb 2-1: new high-speed USB device number 2 using ehci-pci Apr 14 21:52:19 kernel: usb 2-1: device not accepting address 2, error -71 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Apr 14 21:52:19 kernel: usb 2-1: new high-speed USB device number 3 using ehci-pci Apr 14 21:52:19 kernel: usb 2-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00 Apr 14 21:52:19 kernel: usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 Apr 14 21:52:19 kernel: usb 2-1.2: new high-speed USB device number 4 using ehci-pci Apr 14 21:52:19 kernel: usb 2-1.2: New USB device found, idVendor=0781, idProduct=5583, bcdDevice= 1.00 Apr 14 21:52:19 kernel: usb 2-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 Apr 14 21:52:19 kernel: usb 2-1.2: Product: Ultra Fit Apr 14 21:52:19 kernel: usb 2-1.2: Manufacturer: SanDisk Apr 14 21:52:19 kernel: usb 2-1.2: SerialNumber: [removed] However, there is NO ERROR with "old_scheme_first" set to "Y" on the kernel command-line: $ cat usb-sandisk-old-scheme-first-1.txt Apr 14 21:59:06 kernel: Command line: BOOT_IMAGE=(hd1,msdos6)/vmlinuz-5.5.16-100.fc30.x86_64 root=/dev/mapper/[removed] ro usbcore.old_scheme_first=1 Apr 14 21:59:06 kernel: usb 2-1: new high-speed USB device number 2 using ehci-pci Apr 14 21:59:06 kernel: usb 2-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00 Apr 14 21:59:06 kernel: usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 Apr 14 21:59:06 kernel: usb 2-1.2: new high-speed USB device number 3 using ehci-pci Apr 14 21:59:06 kernel: usb 2-1.2: New USB device 
found, idVendor=0781, idProduct=5583, bcdDevice= 1.00 Apr 14 21:59:06 kernel: usb 2-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 Apr 14 21:59:06 kernel: usb 2-1.2: Product: Ultra Fit Apr 14 21:59:06 kernel: usb 2-1.2: Manufacturer: SanDisk Apr 14 21:59:06 kernel: usb 2-1.2: SerialNumber: [removed] Use this as a quick test after booting: $ journalctl --no-hostname -k -p err And to check a prior boot: $ journalctl --no-hostname -k -p err -b -1
(In reply to Steve from comment #163)

I forgot to document my command-line for collecting those log entries:

$ journalctl --no-hostname -k | egrep 'Command line|usb 2-1'
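The same filter can be written as a function that takes the device path as an argument. This is a sketch, and the name usb_log_filter is made up. It uses "grep -F" with two "-e" patterns instead of egrep, partly because recent GNU grep releases warn that egrep is obsolescent and partly so that the dot in a path such as 1-1.3 is matched literally.

```shell
# Hypothetical helper: keep the kernel command line and all lines for
# one USB device path. Reads log text on stdin.
# $1 = device path, e.g. 2-1
# The match is a plain prefix match, so "2-1" also keeps children such
# as "2-1.2" (and would also keep an unrelated "2-10").
usb_log_filter() {
    grep -F -e 'Command line' -e "usb $1"
}
```

Usage, assuming the journal is readable: journalctl --no-hostname -k | usb_log_filter 2-1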
(In reply to William Bader from comment #28)
> ... there is a second bug, possibly hardware, that doesn't reinitialize something during warm boots.

I think I am seeing something like that with a consistent kernel and without special command-line options:

... Command line: BOOT_IMAGE=(hd1,msdos6)/vmlinuz-5.5.16-100.fc30.x86_64 root=/dev/mapper/[removed] ro

Test procedure:

Configure a laptop with a SanDisk Ultra Fit USB flash drive as the grub2 boot device in a USB 2 port, and configure the BIOS to always boot from it.

$ poweroff

Power on.
...
$ journalctl --no-hostname -k -p err

Repeatedly run:

$ reboot
...
$ journalctl --no-hostname -k -p err

Sometimes I get this error, and sometimes I don't:

... kernel: usb 2-1: device not accepting address 2, error -71

More often I do.
I get the same inconsistent behavior on warm reboots with "usbcore.old_scheme_first=1" on the kernel command-line:

... kernel: Command line: BOOT_IMAGE=(hd1,msdos6)/vmlinuz-5.5.16-100.fc30.x86_64 root=/dev/mapper/[removed] ro usbcore.old_scheme_first=1

So I wonder if that option even works.
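One thing that can be ruled out mechanically is a typo on the command line. The sketch below only confirms that the option was passed to the kernel as a whole word; it says nothing about whether the USB core acted on it. The function name has_cmdline_param is hypothetical, and the file argument is an assumption added so the check can be run against a saved copy instead of /proc/cmdline.

```shell
# Hypothetical helper: succeed if $1 appears as a complete,
# whitespace-separated word on the kernel command line.
# $1 = parameter, e.g. usbcore.old_scheme_first=1
# $2 = file to read (defaults to /proc/cmdline)
# Note: quoted parameters that contain spaces are not handled.
has_cmdline_param() {
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -qxF -e "$1"
}
```

Usage: has_cmdline_param usbcore.old_scheme_first=1 && cat /sys/module/usbcore/parameters/old_scheme_first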
Even with cold boots I get inconsistent behavior with:

... Command line: BOOT_IMAGE=(hd1,msdos6)/vmlinuz-5.5.16-100.fc30.x86_64 root=/dev/mapper/[removed] ro

Procedure:

Repeatedly run:

$ poweroff

Power on.
...
$ journalctl --no-hostname -k -p err

The only explanation I can think of is that there is a race during USB initialization.
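To put a number on "sometimes", one option is to save the output of the journalctl check into a separate file after each boot and then count how many of those files contain the error. A sketch, assuming one file per boot in a single directory; the directory layout and the name tally_boots are made up for illustration.

```shell
# Hypothetical helper: count how many saved per-boot logs contain a
# given text. Prints "hits/total".
# $1 = directory with one log file per boot
# $2 = literal text to look for (defaults to the error seen above)
tally_boots() {
    dir=$1
    pattern=${2:-device not accepting address}
    total=0
    hits=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        total=$((total + 1))
        if grep -qF -e "$pattern" "$f"; then
            hits=$((hits + 1))
        fi
    done
    echo "$hits/$total"
}
```

After each boot, something like "journalctl --no-hostname -k -p err -b > boots/$(date +%s).txt" would add one file, and "tally_boots boots" would report the running ratio.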
>Here is another variant: That didn't work. I don't understand why. I think that I set it correctly. I did a shutdown and reboot: [before reboot with usbcore.old_scheme_first=1] $ uname -r 5.5.15-200.fc31.x86_64 $ ls /sys/module/ | wc -l 199 $ ls -d /sys/module/*/parameters | wc -l 121 $ grep '' /sys/module/usbcore/parameters/* /sys/module/usbcore/parameters/authorized_default:-1 /sys/module/usbcore/parameters/autosuspend:2 /sys/module/usbcore/parameters/blinkenlights:N /sys/module/usbcore/parameters/initial_descriptor_timeout:5000 /sys/module/usbcore/parameters/nousb:N /sys/module/usbcore/parameters/old_scheme_first:N <- /sys/module/usbcore/parameters/quirks: /sys/module/usbcore/parameters/usbfs_memory_mb:16 /sys/module/usbcore/parameters/usbfs_snoop:N /sys/module/usbcore/parameters/usbfs_snoop_max:65536 /sys/module/usbcore/parameters/use_both_schemes:Y [after reboot] $ cat /proc/cmdline BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.5.15-200.fc31.x86_64 root=UUID=01ea3428-c96d-4f4c-af30-2072ce724031 ro rhgb quiet elevator=noop usbcore.old_scheme_first=1 LANG=en_US.UTF-8 mitigations=off $ cat /sys/module/usbcore/parameters/old_scheme_first Y $ ls /dev/vid* ls: cannot access '/dev/vid*': No such file or directory $ grep '' /sys/module/usbcore/parameters/* /sys/module/usbcore/parameters/authorized_default:-1 /sys/module/usbcore/parameters/autosuspend:2 /sys/module/usbcore/parameters/blinkenlights:N /sys/module/usbcore/parameters/initial_descriptor_timeout:5000 /sys/module/usbcore/parameters/nousb:N /sys/module/usbcore/parameters/old_scheme_first:Y <- /sys/module/usbcore/parameters/quirks: /sys/module/usbcore/parameters/usbfs_memory_mb:16 /sys/module/usbcore/parameters/usbfs_snoop:N /sys/module/usbcore/parameters/usbfs_snoop_max:65536 /sys/module/usbcore/parameters/use_both_schemes:Y $ journalctl --no-hostname -k | egrep 'Command line|usb 1-1.3' Apr 15 15:16:39 kernel: Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.5.15-200.fc31.x86_64 
root=UUID=01ea3428-c96d-4f4c-af30-2072ce724031 ro rhgb quiet elevator=noop usbcore.old_scheme_first=1 LANG=en_US.UTF-8 mitigations=off Apr 15 15:16:39 kernel: usb 1-1.3: new high-speed USB device number 4 using ehci-pci Apr 15 15:16:39 kernel: usb 1-1.3: config 247 has too many interfaces: 120, using maximum allowed: 32 Apr 15 15:16:39 kernel: usb 1-1.3: config 247 descriptor has 1 excess byte, ignoring Apr 15 15:16:39 kernel: usb 1-1.3: config 247 has 0 interfaces, different from the descriptor's value: 120 Apr 15 15:16:39 kernel: usb 1-1.3: New USB device found, idVendor=05ca, idProduct=18c0, bcdDevice= 7.32 Apr 15 15:16:39 kernel: usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 Apr 15 15:16:39 kernel: usb 1-1.3: Product: USB2.0 Camera Apr 15 15:16:39 kernel: usb 1-1.3: Manufacturer: Ricoh Company Ltd. Apr 15 15:16:39 kernel: usb 1-1.3: can't set config #247, error -32 >MATE has an option to "Automatically remember running applications when logging out". (Under System:Preferences:Personal) I set System -> Preferences -> Startup Applications -> [Options tab] -> [x] Automatically remember running applications when logging out and then I left a bunch of xterms running in workspaces. When I clicked 'Shut Down...' on the menu and then the 'Shut Down' button in the dialog, it said that I had applications running. I said to shutdown anyway, and then it waited for a while and shutdown, and when it restarted, it didn't restart the xterms. Something I did once a long time ago, maybe testing out switching users, restarted the xterms but placed them all on workspace 1, so I've been cautious about that. I launch the xterms from an application launcher on the Mate panel that runs a shell script that runs xterm with a long list of flags.
>/sys/module/usbcore/parameters/old_scheme_first:Y <- Thanks for running that test. That all looks as expected. So, based on your tests and on my tests, setting old_scheme_first=1 has no effect. That's very disappointing. I guess we will have to wait for Alan to ask for another test ... >.... it said that I had applications running. I said to shutdown anyway, and then it waited for a while and shutdown, and when it restarted, it didn't restart the xterms. Try it a second time. When I first tried the feature, some apps did not restart, but on subsequent reboots they did restart. There may be an initialization bug in MATE. I also noticed that the behavior depends on the app and the state of the app (whether there are unsaved changes in Pluma, for example). >I launch the xterms from an application launcher on the Mate panel that runs a shell script that runs xterm with a long list of flags. Good idea. I do something like that to run various shell scripts, such as one that displays the output from "cal -3" in an xterm: You might also be able to create ".desktop" files. There are examples in ".config/mate-session/saved-session/".
For the record, Alan's revert* is in v5.7-rc3:

Merge tag 'usb-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.7-rc3&id=e9a61afb69f07b1c5880984d45e5cc232ec1bf6f

A Fedora build is estimated to be completed Mon, 27 Apr 2020 18:40:23 UTC:

Information for build kernel-5.7.0-0.rc3.1.fc33
https://koji.fedoraproject.org/koji/buildinfo?buildID=1498816

* https://bugzilla.kernel.org/show_bug.cgi?id=207219#c9
Thanks. >Information for build kernel-5.7.0-0.rc3.1.fc33 Can I use a Fedora 33 kernel with Fedora 31? I booted with 5.6.8-200.fc31.x86_64 this morning, and it found the webcam. It works maybe 1 out of 10 times. >MATE has an option to "Automatically remember running applications when logging out". (Under System:Preferences:Personal) >However, it is somewhat inconsistent about what is and what is not restored. And the behavior seems to change between the first restart and later restarts. I left that set, but it does not seem to work for my xterm windows. Maybe it works only for applications that do some kind of Mate-specific registration through dbus.
(In reply to William Bader from comment #171) > Thanks. > > >Information for build kernel-5.7.0-0.rc3.1.fc33 > > Can I use a Fedora 33 kernel with Fedora 31? Yes. It's not any different than building your own kernel and running it. In fact, I have just completed a build of 4.19.119, which is a "longterm" kernel (https://www.kernel.org/). It runs fine, if you start with a Fedora config file. Starting with the kernel's default config file is very informative, but basically it becomes an exercise in debugging the config. With an early version, after suspending, my laptop would resume briefly and then suspend again. Worse, after I powered it off, it wouldn't power on again. I finally removed the battery and recovered. That's as close to bricking my laptop as I want to get. :-) The fix was to enable certain kernel features in the config, because certain system processes require that kernel functionality. > I booted with 5.6.8-200.fc31.x86_64 this morning, and it found the webcam. > It works maybe 1 out of 10 times. That's not too reliable. > >MATE has an option to "Automatically remember running applications when logging out". (Under System:Preferences:Personal) > >However, it is somewhat inconsistent about what is and what is not restored. And the behavior seems to change between the first restart and later restarts. > > I left that set, but it does not seem to work for my xterm windows. Maybe it works only for applications that do some kind of Mate-specific registration through dbus. The xterm package contains xterm.desktop, which is all that I thought was required. This might give some clues about what the problem is: Desktop Application Autostart Specification https://specifications.freedesktop.org/autostart-spec/autostart-spec-latest.html
I installed kernel-5.7.0-0.rc3.1.fc33 and the webcam worked. What is strange is that I have been using 5.6.8, and it has been showing the webcam. uname says Linux laptop37 5.6.8-200.fc31.x86_64 #1 SMP Wed Apr 29 19:10:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux which is after the Apr 27 date of the change to 5.7 but it doesn't look like it was backported into 5.6.8 https://koji.fedoraproject.org/koji/buildinfo?buildID=1499538 I eventually managed to produce a log that captures a bad initialization. https://bugzilla.kernel.org/show_bug.cgi?id=207219#c20
(In reply to William Bader from comment #173) > I installed kernel-5.7.0-0.rc3.1.fc33 and the webcam worked. Congratulations. That took a little over a month from the time you filed your bug report. > What is strange is that I have been using 5.6.8, and it has been showing the webcam. > uname says > Linux laptop37 5.6.8-200.fc31.x86_64 #1 SMP Wed Apr 29 19:10:01 UTC 2020 > x86_64 x86_64 x86_64 GNU/Linux > which is after the Apr 27 date of the change to 5.7 but it doesn't look like it was backported into 5.6.8 > https://koji.fedoraproject.org/koji/buildinfo?buildID=1499538 Alan's revert* is not in 5.6.8, so there must be something else affecting the way the camera is initialized: $ git describe 3155f4f40811c5d7e3c686215051acf504e05565 fatal: 3155f4f40811c5d7e3c686215051acf504e05565 is neither a commit nor blob $ git log --oneline --no-decorate -n1 63c3d49741 Linux 5.6.8 > I eventually managed to produce a log that captures a bad initialization. > https://bugzilla.kernel.org/show_bug.cgi?id=207219#c20 Thanks for that link. Alan's analysis is very interesting, and it shows the power of tracing as a debugging tool. However, I would bet that if we had access to the firmware, we could explain that seemingly random data being sent by the camera: "09027602 78f7e4ff 029e5f02 4675e490 b197f0a3 f07b017a b0790012 28ef7404" * USB: hub: Revert commit bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices") https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3155f4f40811c5d7e3c686215051acf504e05565
I started browsing https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.6.8 and the revert is there: commit f8092b0e021762ab73656d6ec87a6c9e90aff4f4 Author: Alan Stern <stern.edu> Date: Wed Apr 22 16:13:08 2020 -0400 USB: hub: Revert commit bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices") commit 3155f4f40811c5d7e3c686215051acf504e05565 upstream. So that explains the mystery. >I would bet that if we had access to the firmware, we could explain that seemingly random data being sent by the camera: Is there any way to look at it? (Without risking bricking my laptop...) I remember in MSDOS days, you could do some interesting things reading and writing to some ports. The mate session issue looks like it has been happening for years: https://github.com/mate-desktop/mate-session-manager/issues/42 I wrote two of my own mate panel apps (a mail checker and a screen brightness controller (because for many versions of fedora the one that came with mate didn't work)). Working with mate is a pain because the source is divided between a lot of repositories that you have to build in the right order. For the last few versions of fedora, I've been able to rebuild my panel apps by installing mate-panel-devel, which is easier than before. Something I've been meaning to look at for years is changing GDM so it remembers that I always select mate. Every once in a while I forget to select it, and then when I close the login form, the screen blanks for longer than usual, and I worry that something went wrong, but it was only gnome starting instead of mate.
> I started browsing https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.6.8 and the revert is there: Thanks for checking that. > commit f8092b0e021762ab73656d6ec87a6c9e90aff4f4 ... > commit 3155f4f40811c5d7e3c686215051acf504e05565 upstream. The commit ID changed -- I wasn't expecting that. Now I'm not sure how to reliably check that a commit has been merged from mainline into stable. $ git log --oneline --no-decorate -n1 63c3d49741 Linux 5.6.8 $ git log --format=fuller --grep 'USB: hub: Revert commit bd0e6c9614b9' commit f8092b0e021762ab73656d6ec87a6c9e90aff4f4 Author: Alan Stern <...> AuthorDate: Wed Apr 22 16:13:08 2020 -0400 Commit: Greg Kroah-Hartman <...> CommitDate: Wed Apr 29 16:34:45 2020 +0200 USB: hub: Revert commit bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices") commit 3155f4f40811c5d7e3c686215051acf504e05565 upstream. ... Evidently, mainline is considered "upstream" from stable. This is going to require some mental adjustments. :-)
(In reply to William Bader from comment #171) > I booted with 5.6.8-200.fc31.x86_64 this morning, and it found the webcam. It works maybe 1 out of 10 times. Now that we know that Alan's revert is in 5.6.8, we can ask why it doesn't seem to be working reliably. Can you get another usbmon trace that Alan could look at?
(In reply to William Bader from comment #173) > I installed kernel-5.7.0-0.rc3.1.fc33 and the webcam worked. How reliable is the camera initialization with 5.7.0-0.rc3?
(In reply to Steve from comment #176)
> This is going to require some mental adjustments. :-)

Here are the "adjustments": :-)

The dates on the mainline and stable commits differ, which causes the commit IDs to differ:*

mainline: CommitDate: Thu Apr 23 15:22:41 2020 +0200
stable: CommitDate: Wed Apr 29 16:34:45 2020 +0200

Most of the stable repo commits include the upstream commit ID, so it is possible to search for the mainline commit ID in the stable repo:

$ git log --oneline --grep 3155f4f40811c5d7e3c686215051acf504e05565 <<< mainline commit ID (Comment 176 shows the matching commit ID.)
f8092b0e02 USB: hub: Revert commit bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices")

However, there are a few commits in the stable repo with a line like this:

[ no upstream commit ]

* The commit IDs are just sha1sum hashes on the commit with a short, null-terminated header:
https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
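The footnote can be checked by hand. The hashed data is a header of the form "TYPE SIZE" followed by a NUL byte and then the raw object body. For a commit, the body includes the parent ID and the committer line with its date, so a cherry-pick into stable necessarily gets a new ID even when the patch is identical. A sketch, assuming sha1sum from coreutils is available; the name git_object_id is made up.

```shell
# Hypothetical helper: compute a git object ID without using git.
# $1 = object type (blob, commit, ...)
# $2 = file holding the raw object body
git_object_id() {
    # Byte count of the body; tr strips padding some wc versions add.
    size=$(wc -c < "$2" | tr -d ' ')
    # Header "TYPE SIZE\0" followed by the body, hashed with SHA-1.
    { printf '%s %s\0' "$1" "$size"; cat "$2"; } | sha1sum | cut -d' ' -f1
}
```

In a stable clone, "git cat-file commit f8092b0e021762ab73656d6ec87a6c9e90aff4f4 > body.txt" followed by "git_object_id commit body.txt" should reproduce the full commit ID, since git cat-file prints the raw body that was hashed.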
(In reply to William Bader from comment #175) ... > >I would bet that if we had access to the firmware, we could explain that seemingly random data being sent by the camera: > > Is there any way to look at it? (Without risking bricking my laptop...) > I remember in MSDOS days, you could do some interesting things reading and writing to some ports. ... With a USB device, there is a protocol: Universal Serial Bus Device Class Specification for Device Firmware Upgrade Version 1.1 Aug 5, 2004 https://usb.org/sites/default/files/DFU_1.1.pdf A web search for "usb developers kit" found several companies that provide such kits. > The mate session issue looks like it has been happening for years: > https://github.com/mate-desktop/mate-session-manager/issues/42 Thanks for that link. It sounds like you are going to have to fix it yourself. :-) Since you seem to be mainly interested in getting xterm sessions reliably restored, fixing that problem might be easier than trying to solve the problem generally, which could involve getting all possible apps to support session-saving. There is an "xterm" component in BZ, so you could try opening a bug there and seeing what the maintainer says: https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&classification=Fedora&component=xterm
>Now that we know that Alan's revert is in 5.6.8, we can ask why it doesn't seem to be working reliably. I think that it is working reliably. I have been booting a lot of different kernels. I didn't realize that 5.6.8 had the fix. I wasn't looking for the webcam, but every once in a while I noticed it was there, so I thought that it was just random, but I think what really happened is that I had booted 5.6.8 instead of the stable kernel at the time, which was 5.6.7. I'll pay more attention to see that 5.6.8 works consistently. Last night 5.6.8 became the stable kernel, and dnfdragora installed some (but not all) of the related packages. $ rpm -qa | grep 'kernel.*5.6.7' | sort kernel-5.6.7-200.fc31.x86_64 kernel-core-5.6.7-200.fc31.x86_64 kernel-debug-devel-5.6.7-200.fc31.x86_64 kernel-devel-5.6.7-200.fc31.x86_64 kernel-headers-5.6.7-200.fc31.x86_64 kernel-modules-5.6.7-200.fc31.x86_64 kernel-tools-libs-5.6.7-200.fc31.x86_64 $ rpm -qa | grep 'kernel.*5.6.8' | sort kernel-5.6.8-200.fc31.x86_64 kernel-core-5.6.8-200.fc31.x86_64 kernel-debug-devel-5.6.8-200.fc31.x86_64 kernel-devel-5.6.8-200.fc31.x86_64 kernel-modules-5.6.8-200.fc31.x86_64 >There is an "xterm" component in BZ Thanks.
There seem to be some 5.6.7 packages still in "updates":

# dnf -q repoquery 'kernel*5.6.7*.x86_64' --repo=updates
kernel-cross-headers-0:5.6.7-200.fc31.x86_64
kernel-headers-0:5.6.7-200.fc31.x86_64
kernel-tools-0:5.6.7-200.fc31.x86_64
kernel-tools-libs-0:5.6.7-200.fc31.x86_64
kernel-tools-libs-devel-0:5.6.7-200.fc31.x86_64

See if this will remove the problematic packages:

# dnf remove kernel-core-5.6.7-200.fc31 --noautoremove

The "--noautoremove" option sometimes stops dnf from removing too many packages.
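The two rpm listings in the previous comment can also be compared mechanically to see which installed 5.6.7 packages have no 5.6.8 counterpart yet. A sketch that works on "rpm -qa" style names read from stdin; the name missing_at_new_version is hypothetical, and it assumes the version string appears in the package name exactly once.

```shell
# Hypothetical helper: read package names (as printed by "rpm -qa") on
# stdin and print those installed at version $1 that have no
# counterpart at version $2.
missing_at_new_version() {
    old=$1
    new=$2
    list=$(cat)
    echo "$list" | grep -F -e "-$old" | while read -r pkg; do
        base=${pkg%%-"$old"*}     # e.g. kernel-core
        rest=${pkg#*-"$old"}      # e.g. -200.fc31.x86_64
        # Print the old package only if the new-version name is absent.
        echo "$list" | grep -qxF -e "$base-$new$rest" || echo "$pkg"
    done
}
```

Usage: rpm -qa 'kernel*' | missing_at_new_version 5.6.7 5.6.8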
>dnf remove kernel-core-5.6.7-200.fc31 --noautoremove Thanks, that looks like it would work, but to be safe, I am not going to remove them until everything comes in for 5.6.8. $ sudo dnf remove kernel-core-5.6.7-200.fc31 --noautoremove Dependencies resolved. ================================================================================================================================================================================= Package Architecture Version Repository Size ================================================================================================================================================================================= Removing: kernel-core x86_64 5.6.7-200.fc31 @updates 72 M Removing dependent packages: kernel x86_64 5.6.7-200.fc31 @updates 0 kernel-modules x86_64 5.6.7-200.fc31 @updates 28 M kmod-VirtualBox-5.6.7-200.fc31.x86_64 x86_64 6.1.6-1.fc31 @@commandline 783 k Transaction Summary ================================================================================================================================================================================= Remove 4 Packages Freed space: 101 M Is this ok [y/N]: n Operation aborted.
(In reply to Steve from comment #165) ... > Sometimes I get this error, and sometimes I don't: > > ... kernel: usb 2-1: device not accepting address 2, error -71 > > More often I do. I retested with 5.6.8-100.fc30.x86_64 on my laptop with the USB grub2 boot drive, and never once got that error. Specifically, I ran several iterations of warm boots (~7) and several iterations of cold boots (~7) and checked the log each time with: $ journalctl --no-hostname -b -p err BTW, I used Xfce's saved-session feature to restart one instance of xfce4-terminal every time I logged in. That saved some time, because all I had to do was press the up-arrow in the shell to rerun the journalctl and reboot or poweroff commands from the shell's history. xfce4-terminal started every time. There was only a minor issue -- the terminal window would not restart maximized, it would only restart to the size of the desktop.
The kernel people didn't think that the USB drive issue was related, but maybe a lot of devices are only tested well under Windows.
https://bugzilla.kernel.org/show_bug.cgi?id=207219#c8

Back in the days of SCO Xenix 386, before vendors began shipping large numbers of PCs with Windows NT pre-installed, we had a lot of problems with motherboards and disk controller cards that didn't work dependably with a 32-bit OS. We eventually made lists of good and bad revision numbers so we knew which replacement parts to ask for.

I tried xfce4-terminal. It is close to what I want, but it didn't like the 10x20 font that I use with xterm.
(In reply to William Bader from comment #185) > The kernel people didn't think that the USB drive issue was related, but maybe a lot of devices are only tested well under Windows. > https://bugzilla.kernel.org/show_bug.cgi?id=207219#c8 I know that ... and that's why I ran my test. :-) > Back in the days of SCO Xenix 386, before vendors began shipping large numbers of PCs with Windows NT pre-installed, we had a lot of problems with motherboards and disk controller cards that didn't work dependably with a 32 bit OS. We eventually made lists of good and bad revision numbers so we knew what to replacement parts to ask for. It's completely unpredictable what USB controller you are going to get. I have USB flash drives that say brand X on the package, but lsusb says they are brand Y. > I tried xfce4-terminal. It is close to what I want, but it didn't like the 10x20 font that I use with xterm. What font do you use? This looks great, IMO: $ cat .Xresources ! .Xresources XTerm*foreground: green XTerm*background: black XTerm*faceName: DejaVu Sans Mono Book XTerm*faceSize: 12 XTerm*scrollBar: true XTerm*rightScrollBar: true BTW, I installed F32 Mate in a VM, and it seems to work quite well with the following test: Tile one mate-terminal and one xterm side-by-side on each of the four workspaces. Reboot. All are restored in their correct positions, but they aren't quite the right sizes, so there is still a bug that needs to be fixed. $ rpm -q mate-terminal xterm mate-terminal-1.24.0-2.fc32.x86_64 xterm-351-2.fc32.x86_64
I tested kernel 5.6.8 with every USB device that I have and found no errors like the one reported in Comment 163. The test procedure is simple. In a full-screen terminal window, run: $ journalctl --no-hostname -k -f Insert and remove whatever USB devices are available, trying both USB2 and USB3 ports. The only possibly-related anomaly I found was with an ASUS-branded Broadcom USB Bluetooth adapter: May 03 07:49:26 kernel: usb 3-11: new full-speed USB device number 20 using xhci_hcd May 03 07:49:42 kernel: usb 3-11: device descriptor read/64, error -110 May 03 07:49:43 kernel: usb 3-11: New USB device found, idVendor=0b05, idProduct=17cb, bcdDevice= 1.12 May 03 07:49:43 kernel: usb 3-11: New USB device strings: Mfr=1, Product=2, SerialNumber=3 May 03 07:49:43 kernel: usb 3-11: Product: BCM20702A0 May 03 07:49:43 kernel: usb 3-11: Manufacturer: Broadcom Corp May 03 07:49:43 kernel: usb 3-11: SerialNumber: [removed] == BTW, the window-sizing problem on session-restore appears to be related to "marco", the MATE Desktop window manager. The relevant directories are here: $ ls .config/marco/sessions/ .config/mate-session/saved-session/ Removing all session files is a simple way to get back to a default state. "marco" is the name of a component in BZ: https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&classification=Fedora&component=marco
I have a different webcam, but exactly the same problem. Kernel 5.6.8 fixed it. Thank you.

kernel-core-5.6.7-200.fc31.x86_64

May 2 13:44:50 desktop kernel: usb 1-6: new high-speed USB device number 3 using ehci-pci
May 2 13:44:50 desktop kernel: usb 1-6: config 247 has too many interfaces: 120, using maximum allowed: 32
May 2 13:44:50 desktop kernel: usb 1-6: config 247 descriptor has 1 excess byte, ignoring
May 2 13:44:50 desktop kernel: usb 1-6: config 247 has 0 interfaces, different from the descriptor's value: 120
May 2 13:44:50 desktop kernel: usb 1-6: New USB device found, idVendor=041e, idProduct=4087, bcdDevice=10.20
May 2 13:44:50 desktop kernel: usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
May 2 13:44:50 desktop kernel: usb 1-6: Product: VF0680Live!CamSocialize HD1080
May 2 13:44:50 desktop kernel: usb 1-6: Manufacturer: Creative Technology Ltd.
May 2 13:44:50 desktop kernel: usb 1-6: can't set config #247, error -32

kernel-core-5.6.8-200.fc31.x86_64

May 3 13:19:47 desktop kernel: usb 1-6: new high-speed USB device number 3 using ehci-pci
May 3 13:19:52 desktop kernel: usb 1-6: device descriptor read/all, error -110
May 3 13:19:52 desktop kernel: usb 1-6: new high-speed USB device number 4 using ehci-pci
May 3 13:19:53 desktop kernel: usb 1-6: New USB device found, idVendor=041e, idProduct=4087, bcdDevice=10.20
May 3 13:19:53 desktop kernel: usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
May 3 13:19:53 desktop kernel: usb 1-6: Product: VF0680 Live! Cam Socialize HD 1080
May 3 13:19:53 desktop kernel: usb 1-6: Manufacturer: Creative Technology Ltd.
May 3 13:19:53 desktop kernel: uvcvideo: Found UVC 1.00 device VF0680 Live! Cam Socialize HD 1080 (041e:4087)
May 3 13:19:53 desktop kernel: uvcvideo 1-6:1.0: Entity type for entity Extension 4 was not initialized!
May 3 13:19:53 desktop kernel: uvcvideo 1-6:1.0: Entity type for entity Extension 3 was not initialized!
May 3 13:19:53 desktop kernel: uvcvideo 1-6:1.0: Entity type for entity Processing 2 was not initialized!
May 3 13:19:53 desktop kernel: uvcvideo 1-6:1.0: Entity type for entity Camera 1 was not initialized!
May 3 13:19:53 desktop kernel: input: VF0680 Live! Cam Socialize HD 1 as /devices/pci0000:00/0000:00:12.2/usb1/1-6/1-6:1.0/input/input16
May 3 13:19:53 desktop kernel: usbcore: registered new interface driver uvcvideo
May 3 13:19:53 desktop kernel: USB Video Class driver (1.1.1)
May 3 13:19:53 desktop kernel: usb 1-6: Warning! Unlikely big volume range (=5632), cval->res is probably wrong.
May 3 13:19:53 desktop kernel: usb 1-6: [2] FU [Mic Capture Volume] ch = 2, val = -5632/0/1
May 3 13:19:53 desktop kernel: usbcore: registered new interface driver snd-usb-audio
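The two kernel logs above differ in one decisive way: under 5.6.7 enumeration ends with "can't set config", while under 5.6.8 the uvcvideo driver reports "Found UVC". A rough way to tell the two outcomes apart in a saved log is sketched below. The function name webcam_status and its return labels are hypothetical, and the matching relies only on the two message fragments quoted in this bug, so other failure modes would be reported as "unknown".

```python
def webcam_status(lines):
    """Classify a kernel log excerpt by the two outcomes seen in this report.

    Returns "config-failed" if the last relevant line is the enumeration
    failure, "uvc-bound" if it is the uvcvideo detection message, and
    "unknown" if neither message appears.
    """
    status = "unknown"
    for line in lines:
        if "can't set config" in line:
            status = "config-failed"
        elif "uvcvideo: Found UVC" in line:
            status = "uvc-bound"
    return status


# Lines copied from the 5.6.7 and 5.6.8 logs in this comment.
log_567 = [
    "May 2 13:44:50 desktop kernel: usb 1-6: config 247 has too many interfaces: 120, using maximum allowed: 32",
    "May 2 13:44:50 desktop kernel: usb 1-6: can't set config #247, error -32",
]
log_568 = [
    "May 3 13:19:52 desktop kernel: usb 1-6: device descriptor read/all, error -110",
    "May 3 13:19:53 desktop kernel: uvcvideo: Found UVC 1.00 device VF0680 Live! Cam Socialize HD 1080 (041e:4087)",
]

print("5.6.7:", webcam_status(log_567))
print("5.6.8:", webcam_status(log_568))
```

The last matching line wins, so a device that first fails and is then re-enumerated successfully within the same excerpt is reported as "uvc-bound".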
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.