Created attachment 818735 [details]
snapshot of booting

Description of problem:
Can't boot the live image of Fedora 19 x64 from usb.

Version-Release number of selected component (if applicable):

How reproducible:
Boot the live image of Fedora 19 x64 from usb.

Steps to Reproduce:
1.
2.
3.

Actual results:
Fedora won't boot

Expected results:
Fedora should boot.

Additional info:
[ionut@localhost ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T6400 @ 2.00GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 1200.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm
bogomips : 3990.06
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T6400 @ 2.00GHz
stepping : 10
microcode : 0xa0b
cpu MHz : 1200.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm
bogomips : 3990.06
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce 9300M GS] (rev a1)
02:00.0 System peripheral: JMicron Technology Corp. SD/MMC Host Controller
02:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller
02:00.3 System peripheral: JMicron Technology Corp. MS Host Controller
04:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection
07:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5906M Fast Ethernet PCI Express (rev 02)
Hi,
I've also tried to boot from dvd/cd, but I got nearly the same error (cpu#1 instead of cpu#0).
What I've tried:
dvd/cd: fedora 19 mate spin x64, fedora live x64, fedora dvd x64
usb: fedora 19 mate spin x64, fedora live x64, fedora dvd x64
None of the above was able to boot, so I have also tried Fedora-Live-MATE-Compiz-i686-19-1.iso. I got the same error, but with a trace too.
Note: most of the iso checksums were verified.
Created attachment 819999 [details] snapshot of booting for mate spin x64
Created attachment 820000 [details] snapshot of booting for mate spin i686
had similar problem with fedora 20 which doesnt boot unless use kernels less than 3.11.7-300.fc20.x86_64. will not boot with kernel-3.13.0-0.r0.git1.3.fc21.x86_64 either. boots ok with kernel-3.10.4-300.fc19.x86_64
Hi, does anyone have a proper kernel stack trace here? systemd-udev being stuck is normally just a symptom of something going wrong in kernel-land. Without a proper backtrace it is hard to speculate. collura: When you say 'doesnt boot unless use kernels less than 3.11.7-300.fc20.x86_64', is it also broken with 3.11.6? Thanks, Michele
Apart from the screen snapshots I took with my phone, I cannot provide more information as long as my laptop is not booting. Still, one of the snapshots (booting of f19 spin mate i686) contains some stack trace information, but I don't know whether it's useful or not.
Nope that does not help all too much unfortunately. Does it work with other Fedora versions/kernels?
Hi, Yes, I was able to boot f18 a long time ago. I don't remember whether it was the live image or the dvd, and I also installed it successfully. Then, about 2 months ago, I upgraded to F19 using FedUp without any issue. I also made a lot of updates with yum without any issue. At the beginning of November I created a bootable usb media with F19 in order to install it on a different laptop. I wanted to test it on my laptop first and, surprise, it wasn't booting. That's the whole story.
from comment#5
0) no, 3.11.6 doesn't boot either.
1) tried adding 'pci=noacpi' to the kernel line and that allows boot

i went back to check and the problem seems to start with kernel-3.11.0-0.rc0.git6.1.fc20.x86_64 and continues up to the most recent kernel (kernel-3.13.0-0.rc0.git5.1.fc21.x86_64).

however from checking kernel-3.11.0-0.rc3.git0.1.fc20.x86_64 the problem seems to happen right after something like:

'starting load/save screen backlight brightness of acpi_video0...
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
PGD 0
oops: 0010 [#1] SMP
modules linked in: i2c_piix4 shpchp edac_mce_amd serio_raw fam15h_power wmi acpi_cpufreq mperf video binfmt_misc radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core
CPU: 0 PID: 37 COMM: kworker/0:1 not tainted 3.11.0-0.rc3.git0.1.fc20.x86_64 #1
Hardware name: TOSHIBA Satellite L75D-A/Larne, BIOS 1.10 5/16/2013
workqueue: kacpi_notify acpi_os_execute_deferred'

so i tried adding the 'pci=noacpi' option to the kernel lines in grub, and i was then able to boot all the kernels between kernel-3.11.0-0.rc0.git6.1.fc20.x86_64 and kernel-3.13.0-0.rc0.git5.1.fc21.x86_64 on an fc20 system, as long as that option is set in the kernel line of the grub entry. :')
I'd need to have a full proper backtrace to try and see where the issue is
Can I have a full backtrace from when it fails?
same with kernel-3.11.10-300.fc20.x86_64. how to get the backtrace? abrt doesn't seem to pick it up. the system doesn't seem to boot far enough to make log entries when booting without 'pci=noacpi', unless i am misreading.
A picture with more lines so that we know what caused it would be a start. Also does the issue persist with 3.12.5? thanks, Michele
Created attachment 842112 [details] snapshot of booting for fedora 20 mate spin x64
Hi Michele, There is no way to provide more lines. The same lines are displayed over and over again. There is the same problem with fedora 20 mate spin x86_64. See the new snapshot. However I've successfully updated to Fedora 20 using fedup and have no problem with booting of 3.12.5. Booting problem occurs when using live images or installation dvd only. thanks, Ionut Radu.
Please note that the live image is not booting even though I use pci=noacpi, so my issue is different from collura's issue. Collura, can you please open another bug for your issue?
Hi Ionut, I see, interesting. It might very well be that this is just a kernel warning due to the time it takes to scan stuff on the DVD. IIRC anaconda collects the messages log somewhere in ramfs. If you could try and find that and copy it somewhere, we could try and confirm. My hunch is that this is likely harmless. regards, Michele
Created attachment 842427 [details]
snapshot of booting after ~20 minutes

Hi Michele, It's not a warning. 20 minutes later the same messages are still displayed. Nothing else happens. See the new snapshot. Why do you think it's just a warning? thanks, Ionut Radu.
Also, I've used a USB device for booting the fedora 20 live mate spin. It's much easier and less time consuming to use a usb device than to burn a dvd. I've used liveusb-creator to download and write the image to the usb. A few weeks ago I also tried Ubuntu 13.10 live with UNetbootin and got more or less the same error. I can boot the Fedora 18 and Ubuntu 12.04 live images only. No later live image boots for me. Thanks, Ionut Radu.
I've just retried booting Fedora-18-x86_64-Live-Desktop.iso and had no issue.
[liveuser@localhost ~]$ uname -a
Linux localhost 3.6.10-4.fc18.x86_64 #1 SMP Tue Dec 11 18:01:27 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Created attachment 842591 [details] stack trace 1
Created attachment 842592 [details] stack trace 2
Hi Michele, I've removed rhgb and quiet from kernel parameters when booting Fedora-Live-MATE-Compiz-x86_64-20-1.iso and got more stack trace information. Please see the 2 new snapshots. thanks, Ionut Radu.
Hi Michele,
I've managed to boot Fedora-Live-MATE-Compiz-x86_64-20-1.iso by adding the following parameters to the kernel:
pci=noacpi nouveau.modeset=0 rd.driver.blacklist=nouveau
However the kernel is not 3.12.5 as you stated but:
[liveuser@localhost ~]$ uname -a
Linux localhost 3.11.10-301.fc20.x86_64 #1 SMP Thu Dec 5 14:01:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
thanks, Ionut Radu.
I have also been able to boot Fedora-Live-MATE-Compiz-x86_64-20-1.iso by using nouveau.modeset=0 alone. So the problem is in the nouveau driver. thanks, Ionut Radu.
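When testing workarounds like this one, it helps to confirm which parameters the running kernel was actually booted with. The sketch below is only an illustration: `cmdline_has` is a hypothetical helper name, not part of any Fedora tool, and it assumes the parameters are separated by single spaces, as they are in /proc/cmdline.

```shell
#!/bin/sh
# cmdline_has CMDLINE PARAM
# Succeeds when PARAM appears as a whole space-separated word in CMDLINE.
# Hypothetical helper for illustration; assumes single-space separators.
cmdline_has() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Example call against the running kernel (left commented out here):
#   cmdline_has "$(cat /proc/cmdline)" nouveau.modeset=0 && echo "modeset disabled"
```

Matching whole words avoids a false positive when one parameter is a substring of another.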
nomodeset allows me to boot as well.
Hi Ionut,
thanks. I had misunderstood and thought it was only a warning but then the live system would proceed booting. So if disabling nouveau/acpi helps, it's somewhere in that driver or in the acpi layer (or a combination thereof).

So just to recap:
- 3.6.10-4.fc18.x86_64 Live iso -> OK
- 3.11.10-301.fc20.x86_64 Live iso -> NOT OK
- 3.11.10-301.fc20.x86_64 Live iso -> OK (with nouveau.modeset=0)
- 3.12.5-.... F20 upgraded system -> OK

Is the above picture correct or did I miss anything? If it is correct I guess the nouveau issue has been fixed in 3.12.x and the problem is that there are no Live images with 3.12 just yet. Can you confirm (or not) the recap above?
thanks, Michele
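Ordering kernel versions by eye in a recap like this is error-prone, because 3.6.10 is older than 3.11.10 even though a plain string comparison says the opposite. A minimal sketch of a version-aware comparison follows. It assumes GNU sort with the -V (version sort) option is available, and `kernel_older_than` is a hypothetical name chosen for this example.

```shell
#!/bin/sh
# kernel_older_than A B
# Succeeds when version string A sorts strictly before version string B.
# Assumption: GNU coreutils sort with -V (version sort) is installed.
kernel_older_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example call (left commented out here):
#   kernel_older_than "$(uname -r)" 3.12.5-302.fc20.x86_64 && echo "older than 3.12.5-302"
```

Pre-release strings such as rc builds may not sort the way the packaging tools order them, so this is only meant for rough bisection notes.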
Hi Michele, Your recap is correct. Only disabling nouveau is needed. I'll try to set up a live image with kernel 3.12.5 and see what happens. thanks, Ionut Radu.
Hi Michele,
I've crafted a Fedora 20 live image with kernel 3.12.5 and I see the bug is still present. I've used these instructions:
https://fedoraproject.org/wiki/How_to_create_and_use_Live_USB?rd=FedoraLiveCD/USBHowTo#Kernel_updates
I still need to use nouveau.modeset=0 to be able to boot, and the kernel is the right one:
[liveuser@localhost ~]$ uname -a
Linux localhost 3.12.5-302.fc20.x86_64 #1 SMP Tue Dec 17 20:42:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
I've tried to update xorg-x11-drv-nouveau but there is no update available:
[root@localhost liveuser]# yum update xorg-x11-drv-nouveau
Loaded plugins: langpacks
No packages marked for update
Bottom line is that the bug is still present.
thanks, Ionut Radu.
Hi Ionut, ok, that's a bit surprising. So the same 3.12.5-302 kernel needs nouveau.modeset=0 to work when used on a USB/LiveCD otherwise we get stuck with the messages from the BZ subject. The very same kernel when booted from the installed system/harddrive has no issues. I guess some odd udev/dracut interaction when used via USB. Or there are some other differences between the installed system and the livecd (nouveau, xorg, dracut, udev). I'd start verifying these other potential differences. Or see if we can get some full logs off the anaconda usb install (http://fedoraproject.org/wiki/Anaconda/Logging) hth, Michele
Hi Michele, I've had nouveau.modeset=0 and rd.driver.blacklist=nouveau in my grub config file for a long time, because nouveau was interfering with the nvidia driver on Fedora 18, so maybe that's why I had no issue with any kernel version on my installed system. I can't get the anaconda log, anaconda is not even started. thanks, Ionut Radu.
Kernel 3.12.5 is booting without these kernel parameters "nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off" on my installed system, but:
- I don't have nouveau driver installed
- I have nvidia driver installed instead
Hi Ionut, ah ok that explains it. So the issue was always present and the livecd vs installed was a red herring. I think the best course of action is to give 3.13 a try and see if it boots without the nouveau.modeset=0 setting. If that works excellent. If that does not work, we will need the full messages output to investigate further and maybe raise this upstream hth, Michele
Hi Michele,
There is no 3.13 kernel:
[root@localhost ~]# uname -a
Linux localhost 3.12.5-302.fc20.x86_64 #1 SMP Tue Dec 17 20:42:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# yum update kernel
Loaded plugins: langpacks
No packages marked for update
Anyway, if the boot stalls there is no way to provide more than snapshots of the screen.
thanks, Ionut.
Hi Michele,
Do you know which package contains the nouveau driver? I see that on my installed system I have:
[root@localhost ~]# rpm -qa | grep nouveau
xorg-x11-drv-nouveau-1.0.9-2.fc20.x86_64
Which contains:
[root@localhost ~]# rpm -qvl xorg-x11-drv-nouveau
-rwxr-xr-x 1 root root 211088 Jul 31 06:46 /usr/lib64/xorg/modules/drivers/nouveau_drv.so
-rw-r--r-- 1 root root 2538 Jul 31 06:46 /usr/share/man/man4/nouveau.4.gz
Is this the driver? If yes, then I was wrong when I stated that I don't have the nouveau driver installed.
thanks, Ionut.
Hi Ionut, yes 3.13 has not yet gone GA upstream and is not in Fedora updates. If you are comfortable trying out very experimental kernels you can try here: http://koji.fedoraproject.org/koji/buildinfo?buildID=487242 The nouveau kernel driver is in the kernel package itself. The xorg driver (which uses amongst others the kernel driver) is in the xorg-x11-drv-nouveau package. hth, Michele
Hi Michele,
I've managed to reproduce the issue on the installed system (kernel 3.12.5-302). Two steps were needed:
1) erased /usr/lib/modprobe.d/blacklist-nouveau.conf containing:
# RPM Fusion blacklist for nouveau driver - you need to run as root:
# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
# if nouveau is loaded despite this file.
blacklist nouveau
That file was added by me in the past when there were issues with the nvidia driver on fedora 18
2) removed nouveau.modeset=0 and rd.driver.blacklist=nouveau from kernel parameters
thanks, Ionut.
Actually, I see that /usr/lib/modprobe.d/blacklist-nouveau.conf is part of nvidia driver (so it wasn't added by me):
[ionut@localhost ~]$ rpm -qf /usr/lib/modprobe.d/blacklist-nouveau.conf
xorg-x11-drv-nvidia-331.20-6.fc20.x86_64
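Since a leftover blacklist entry hid the bug on the installed system, it can be useful to list every modprobe.d file that blacklists a given module before drawing conclusions. The following is a sketch under stated assumptions: `blacklisted_in` is a hypothetical helper, and it only looks at *.conf files directly inside the directory it is given.

```shell
#!/bin/sh
# blacklisted_in DIR MODULE
# Prints the *.conf files directly under DIR that contain an active
# (non-commented) "blacklist MODULE" line. Exits non-zero if none do.
# Hypothetical helper for illustration only.
blacklisted_in() {
    grep -lE "^[[:space:]]*blacklist[[:space:]]+$2([[:space:]]|\$)" "$1"/*.conf 2>/dev/null
}

# Example calls (left commented out here):
#   blacklisted_in /usr/lib/modprobe.d nouveau
#   blacklisted_in /etc/modprobe.d nouveau
```

Commented lines such as the header of blacklist-nouveau.conf are skipped because the pattern is anchored at the start of the line.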
Created attachment 844309 [details]
stack trace 1 on kernel 3.13

Hi Michele, I see the same issue on kernel 3.13. Added 2 snapshots. thanks, Ionut
Created attachment 844310 [details] stack trace 2 on kernel 3.13.0
Hi Ionut, yes the nvidia driver cannot coexist with the nouveau driver so it blacklists it. Meh, the screen captures are still too partial to open a proper bug report upstream. Given that you can reproduce it from the installed system, we could try and use https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes to get a proper crash file to analyze the issue. Could you give the above a shot? regards, Michele
I wonder if this one is related: https://bugzilla.redhat.com/show_bug.cgi?id=983189 At least a good chunk of the stack trace seems to be related.
Hi Michele, It looks like kdump is useless in my case. As far as I understand, it requires the kernel to boot normally before crashing, or at least to reach the init scripts (see Considerations: 2. Init scripts take care of pre-loading the capture kernel at system boot time.) Neither of these is happening. Also, the kernel doesn't trigger a panic. It just enters a deadlock. thanks, Ionut.
Regarding https://bugzilla.redhat.com/show_bug.cgi?id=983189, I see no resemblance between stack traces. Also that bug refers to abrt while in my case the booting doesn't get so far.
Hi Ionut,
well kdump can trigger on any oops just set /proc/sys/kernel/panic_on_oops to 1

In terms of stack trace in BZ 983189, your second screenshot shows a trace that is quite similar to:
[<ffffffff81320b91>] pci_device_probe+0x121/0x130
[<ffffffff813e5c27>] driver_probe_device+0x87/0x390
[<ffffffff813e6003>] __driver_attach+0x93/0xa0
[<ffffffff813e5f70>] ? __device_attach+0x40/0x40
[<ffffffff813e3c53>] bus_for_each_dev+0x63/0xa0
[<ffffffff813e56ae>] driver_attach+0x1e/0x20
[<ffffffff813e5238>] bus_add_driver+0x1e8/0x2b0
[<ffffffffa036e000>] ? 0xffffffffa036dfff
[<ffffffff813e6641>] driver_register+0x71/0x150
[<ffffffffa036e000>] ? 0xffffffffa036dfff
[<ffffffff8131f72b>] __pci_register_driver+0x4b/0x50
[<ffffffffa00278ba>] drm_pci_init+0x11a/0x130 [drm]
[<ffffffffa036e000>] ? 0xffffffffa036dfff
[<ffffffffa036e04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]

It is truncated so we are obviously not 100% sure, but it is worth keeping an eye on. If you could manage to get a dump, we might confirm or dispel this theory. But without a full log there is not much else that can be done. Except always trying later kernels and see if this is fixed.
hth, Michele
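When comparing traces between two bug reports like this, it can help to pull out only the frames that belong to a particular module. The sketch below assumes the trace is available as text with one frame per line, in the form the kernel log normally prints, and that module frames end with a tag such as [nouveau]. `frames_in_module` is a hypothetical name used only for this example.

```shell
#!/bin/sh
# frames_in_module MODULE
# Reads a kernel stack trace on stdin (one frame per line) and prints the
# symbol name of every frame whose last field is "[MODULE]".
# Hypothetical helper; assumes the "symbol+0xOFF/0xLEN [module]" layout.
frames_in_module() {
    awk -v tag="[$1]" '
        $NF == tag {
            sym = $(NF - 1)          # e.g. nouveau_drm_init+0x4d/0x1000
            sub(/\+0x.*/, "", sym)   # drop the offset/length suffix
            print sym
        }'
}

# Example call on a saved log (left commented out here):
#   frames_in_module nouveau < trace.txt
```

Running it over both traces and comparing the two outputs gives a rough idea of whether the same code path is involved.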
Hi Michele, How am I supposed to issue "set /proc/sys/kernel/panic_on_oops to 1" if the boot is not successful? thanks, Ionut.
Hi Ionut,
well if the oops happens before systemd does its /etc/sysctl.conf parsing we could try:
1) Just pass the kernel option oops=panic
2) - Blacklist nouveau
   - Boot normally until you get to the text console (either it happens via vesafb or pure text)
   - login and verify the sysctl
   - modprobe nouveau

1 is simpler (obviously). With 2 we might actually be able to scroll back via Shift-PageUP and see the whole stacktrace (unless there are other issues with modprobing the driver later on)
hth, Michele
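The "verify the sysctl" step in option 2 can be scripted so that it is easy to repeat before each modprobe attempt. This is a minimal sketch: `oops_panics` is a hypothetical helper name, and the optional file argument exists only so the check can be tried against a copy of the file.

```shell
#!/bin/sh
# oops_panics [FILE]
# Succeeds when the panic_on_oops setting read from FILE is 1.
# FILE defaults to /proc/sys/kernel/panic_on_oops.
# Hypothetical helper for illustration only.
oops_panics() {
    [ "$(cat "${1:-/proc/sys/kernel/panic_on_oops}" 2>/dev/null)" = "1" ]
}

# Example call before loading the driver (left commented out here):
#   oops_panics && echo "an oops will panic" || echo "an oops will NOT panic"
```

If the check fails, writing 1 to the same file as root is the runtime equivalent of booting with oops=panic.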
Hi Michele,
See below the answers:
1) even though kernel option oops=panic is used, init scripts are still not reached so the capture kernel is not loaded so kdump won't work
2) modprobe nouveau does not result in a kernel crash:
lsmod | grep nouveau
nouveau 952573 0
mxm_wmi 12865 1 nouveau
wmi 18804 2 mxm_wmi,nouveau
i2c_algo_bit 13257 1 nouveau
ttm 79787 1 nouveau
drm_kms_helper 50287 1 nouveau
drm 283349 3 ttm,drm_kms_helper,nouveau
i2c_core 38302 6 drm,i2c_i801,drm_kms_helper,i2c_algo_bit,nouveau,videodev
video 19104 1 nouveau
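Note that `lsmod | grep nouveau` also matches modules that merely list nouveau in their "used by" column, so the output alone does not prove the driver itself is loaded. A stricter check compares only the first column. The sketch below assumes lsmod-style text on standard input; `module_loaded` is a hypothetical helper name.

```shell
#!/bin/sh
# module_loaded MODULE
# Reads lsmod-style text on stdin and succeeds only when MODULE appears
# in the first column (the module name), not in the "used by" column.
# Hypothetical helper for illustration only.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example call (left commented out here):
#   lsmod | module_loaded nouveau && echo "nouveau is loaded"
```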
Created attachment 845295 [details]
nouveau messages for kernel 3.13.0

Hi Michele, It looks to me that nouveau driver tries to execute some code from NVIDIA card nvram and that's when the crash occurs. Please see the nouveau.jpg snapshot taken for kernel 3.13.0. thanks, Ionut.
Hi Michele, I've taken one more look at https://bugzilla.redhat.com/show_bug.cgi?id=983189 and noticed that the i915 driver is present too, so that one looks like an optimus laptop with both intel and nvidia graphics, while my laptop is about 5 years old and has nvidia graphics only. I don't think the two bugs are related. thanks, Ionut.
Hi Michele, I've managed to reproduce the kernel crash on "modprobe nouveau", but the kernel panic is not triggered with either the oops=panic kernel option or "echo 1 > /proc/sys/kernel/panic_on_oops". Indeed, when using oops=panic the content of /proc/sys/kernel/panic_on_oops is already 1 instead of the default 0. Do you have any other idea on how to force a kernel panic? Thanks, Ionut
Hi Ionut, yes 983189 is not necessarily the same thing but it had the exact same stack trace (up to where we can see it) so it is worth keeping an eye on. Odd that the crash happened but no kernel panic. Are you sure there is an oops? I don't see any crashes in the jpg you just uploaded? Michele
Hi Michele, In the jpg I've uploaded I've captured the moment before the crash. I've succeeded to obtain a vmcore, but saving of vmcore-dmesg failed. Please check: https://www.dropbox.com/sh/e77p700zr8g1v4z/y3ldY3npQB thanks, Ionut.
vmcore is for kernel 3.13.0. I've enforced kernel panic through sysrq (i.e pressed alt + sysrq + c). Hope this helps. I couldn't trigger a kernel panic another way. I've even set all /proc/sys/kernel/panic_* to 1 and still no panic was triggered. Thanks, Ionut.
Hi Ionut, I've downloaded the vmcore. Am downloading the corresponding rpm's so I can try and see what is up. It'll take a bit as I am on a slow connection currently. regards, Michele
Hi Michele, I forgot to mention that, while trying different ways of obtaining a dump, I used kernel-3.13.0-debug (3.13.0-0.rc6.git0.1.fc21.x86_64+debug), not the usual kernel. thanks, Ionut.
Created attachment 845414 [details] vmcore log Log from vmcore
Hi Ionut, great thanks. This is https://bugs.freedesktop.org/show_bug.cgi?id=72943 hth, Michele
Hi Michele, This guy is saying he can't boot any kernel above 3.2 while I can boot 3.6.10 from Fedora 18. Thanks, Ionut.
Thanks Michele, indeed it looks like it's the same bug. There is the same video card and slightly different vbios version. Regards, Ionut Radu.
Hi Ionut, thanks for your persistence here ;) At least we have pinpointed the exact place where we fail. Changing the title to make it reflect the situation. regards, Michele
Hi Michele, Thanks for your useful suggestions. At some point I thought that filing this bug was just a waste of time. Let's keep a reference to the original summary "BUG: soft lockup - CPU#0 stuck for 23s [systemd-udevd:194]" in order to help others who have the same problem and see this message. thanks, Ionut.
Issue was fixed. Thanks, Ionut.