Bug 1026073 - [NV98] hangs in nvbios_init on probe
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nouveau
Version: 20
Hardware/OS: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Ben Skeggs
QA Contact: Fedora Extras Quality Assurance
CC List: 12 users
Reported: 2013-11-03 07:15 EST by Ionut Radu
Modified: 2014-01-08 16:30 EST
Doc Type: Bug Fix
Last Closed: 2014-01-08 16:28:47 EST
Type: Bug


Attachments
snapshot of booting (2.58 MB, image/jpeg), 2013-11-03 07:15 EST, Ionut Radu
snapshot of booting for mate spin x64 (2.46 MB, image/jpeg), 2013-11-05 15:04 EST, Ionut Radu
snapshot of booting for mate spin i686 (2.48 MB, image/jpeg), 2013-11-05 15:05 EST, Ionut Radu
snapshot of booting for fedora 20 mate spin x64 (2.92 MB, image/jpeg), 2013-12-26 17:03 EST, Ionut Radu
snapshot of booting after ~20 minutes (3.47 MB, image/jpeg), 2013-12-27 13:18 EST, Ionut Radu
stack trace 1 (3.12 MB, image/jpeg), 2013-12-28 05:17 EST, Ionut Radu
stack trace 2 (3.37 MB, image/jpeg), 2013-12-28 05:18 EST, Ionut Radu
stack trace 1 on kernel 3.13 (2.67 MB, image/jpeg), 2014-01-01 18:19 EST, Ionut Radu
stack trace 2 on kernel 3.13.0 (2.36 MB, image/jpeg), 2014-01-01 18:20 EST, Ionut Radu
nouveau messages for kernel 3.13.0 (2.22 MB, image/jpeg), 2014-01-04 04:22 EST, Ionut Radu
vmcore log (5.91 KB, text/plain), 2014-01-04 11:50 EST, Ionut Radu


External Trackers
FreeDesktop.org 72943
Description Ionut Radu 2013-11-03 07:15:35 EST
Created attachment 818735
snapshot of booting

Description of problem:
Can't boot the Fedora 19 x64 live image from USB.

Version-Release number of selected component (if applicable):


How reproducible:
Boot the Fedora 19 x64 live image from USB.


Steps to Reproduce:
1.
2.
3.

Actual results:
Fedora doesn't boot.

Expected results:
Fedora should boot.

Additional info:
[ionut@localhost ~]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU     T6400  @ 2.00GHz
stepping        : 10
microcode       : 0xa0b
cpu MHz         : 1200.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm
bogomips        : 3990.06
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU     T6400  @ 2.00GHz
stepping        : 10
microcode       : 0xa0b
cpu MHz         : 1200.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm
bogomips        : 3990.06
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce 9300M GS] (rev a1)
02:00.0 System peripheral: JMicron Technology Corp. SD/MMC Host Controller
02:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller
02:00.3 System peripheral: JMicron Technology Corp. MS Host Controller
04:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection
07:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5906M Fast Ethernet PCI Express (rev 02)
Comment 1 Ionut Radu 2013-11-05 15:01:30 EST
Hi, 

I've also tried booting from DVD/CD, but I got nearly the same error (CPU#1 instead of CPU#0).

What I've tried:

DVD/CD: Fedora 19 MATE spin x64, Fedora Live x64, Fedora DVD x64
USB: Fedora 19 MATE spin x64, Fedora Live x64, Fedora DVD x64

None of the above booted, so I also tried Fedora-Live-MATE-Compiz-i686-19-1.iso. I got the same error, but this time with a trace too.

Note: I verified the checksums of most of the ISOs.
Comment 2 Ionut Radu 2013-11-05 15:04:10 EST
Created attachment 819999
snapshot of booting for mate spin x64
Comment 3 Ionut Radu 2013-11-05 15:05:26 EST
Created attachment 820000
snapshot of booting for mate spin i686
Comment 4 collura 2013-11-14 05:09:32 EST
I had a similar problem with Fedora 20, which doesn't boot unless I use kernels older than 3.11.7-300.fc20.x86_64. It will not boot with kernel-3.13.0-0.r0.git1.3.fc21.x86_64 either.

It boots OK with kernel-3.10.4-300.fc19.x86_64.
Comment 5 Michele Baldessari 2013-11-16 14:25:48 EST
Hi,

does anyone have a proper kernel stack trace here? systemd-udev being stuck
is normally just a symptom of something going wrong in kernel-land.
Without a proper backtrace it is hard to speculate.

collura@ieee.org: When you say 'doesnt boot unless use kernels less than 3.11.7-300.fc20.x86_64', is it also broken with 3.11.6?

Thanks,
Michele
Comment 6 Ionut Radu 2013-11-16 16:28:38 EST
Apart from the screen snapshots I took with my phone, I cannot provide more information as long as my laptop doesn't boot.
Still, one of the snapshots (booting the F19 MATE spin i686) contains some stack trace information, but I don't know whether it's useful.
Comment 7 Michele Baldessari 2013-11-16 16:38:24 EST
Nope, unfortunately that does not help much. Does it work with other Fedora versions/kernels?
Comment 8 Ionut Radu 2013-11-16 16:53:35 EST
Hi,

Yes, I was able to boot F18 a long time ago. I don't remember whether it was the live image or the DVD, and I also successfully installed it. Then, about 2 months ago, I upgraded to F19 using FedUp without any issue, and made a lot of updates with yum without any issue either. At the beginning of November I created a bootable USB stick with F19 in order to install it on a different laptop. I wanted to test it on my laptop first and, surprise, it didn't boot. That's the whole story.
Comment 9 collura 2013-11-19 05:43:48 EST
from comment#5 

0) No, 3.11.6 doesn't boot either.

1) Adding 'pci=noacpi' to the kernel line allows it to boot.

I went back to check, and the problem seems to start with
   kernel-3.11.0-0.rc0.git6.1.fc20.x86_64
and continues up to the most recent kernel (kernel-3.13.0-0.rc0.git5.1.fc21.x86_64).

However, from checking kernel-3.11.0-0.rc3.git0.1.fc20.x86_64, the problem seems to happen right after something like:
  'starting load/save screen backlight brightness of acpi_video0...
  BUG: unable to handle kernel NULL pointer dereference at     (null)
  IP: [<   (null)>]     (null)
  PGD 0
  Oops: 0010 [#1] SMP
  Modules linked in: i2c_piix4 shpchp edac_mce_amd serio_raw fam15h_power wmi acpi_cpufreq mperf video binfmt_misc radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core
  CPU: 0   PID: 37   Comm: kworker/0:1 Not tainted 3.11.0-0.rc3.git0.1.fc20.x86_64 #1
  Hardware name: TOSHIBA Satellite L75D-A/Larne, BIOS 1.10  5/16/2013
  Workqueue: kacpi_notify acpi_os_execute_deferred'

So I tried adding the 'pci=noacpi' option to the kernel lines in GRUB, and I was then able to boot all the kernels between
  kernel-3.11.0-0.rc0.git6.1.fc20.x86_64
  and
  kernel-3.13.0-0.rc0.git5.1.fc21.x86_64
on an FC20 system, as long as that option is set on the kernel line of the GRUB entry. :')
Comment 10 Michele Baldessari 2013-11-23 18:26:18 EST
I'd need a full, proper backtrace to try and see where the issue is.
Comment 11 Michele Baldessari 2013-12-01 06:18:19 EST
Can I have a full backtrace from when it fails?
Comment 12 collura 2013-12-05 09:09:05 EST
Same with kernel-3.11.10-300.fc20.x86_64.

How do I get the backtrace? abrt doesn't seem to pick it up, and the system doesn't seem to boot far enough to write log entries when booting without 'pci=noacpi', unless I'm misreading.
Comment 13 Michele Baldessari 2013-12-26 14:29:29 EST
A picture with more lines so that we know what caused it would be a start.

Also does the issue persist with 3.12.5?

thanks,
Michele
Comment 14 Ionut Radu 2013-12-26 17:03:32 EST
Created attachment 842112
snapshot of booting for fedora 20 mate spin x64
Comment 15 Ionut Radu 2013-12-26 17:07:26 EST
Hi Michele,

There is no way to provide more lines; the same lines are displayed over and over again.
The same problem occurs with the Fedora 20 MATE spin x86_64. See the new snapshot.
However, I've successfully upgraded to Fedora 20 using fedup and have no problem booting 3.12.5. The boot problem occurs only when using live images or the installation DVD.

thanks,
Ionut Radu.
Comment 16 Ionut Radu 2013-12-27 07:14:43 EST
Please note that the live image doesn't boot even though I use pci=noacpi, so my issue is different from collura's.
Collura, can you please open another bug for your issue?
Comment 17 Michele Baldessari 2013-12-27 12:44:04 EST
Hi Ionut,

I see, interesting. It may very well be just a kernel warning caused by the time it takes to scan stuff on the DVD.

IIRC anaconda collects the messages log somewhere in the ramfs. If you could find it and copy it somewhere, we could try to confirm. My hunch is that this is harmless.

regards,
Michele
Comment 18 Ionut Radu 2013-12-27 13:18:23 EST
Created attachment 842427
snapshot of booting after ~20 minutes


Hi Michele,

It's not a warning. 20 minutes later the same messages are still displayed and nothing else happens.
See the new snapshot.
Why do you think it's just a warning?

thanks,
Ionut Radu.
Comment 19 Ionut Radu 2013-12-27 13:30:04 EST
Also, I used a USB device to boot the Fedora 20 live MATE spin. It's much easier and less time-consuming than burning a DVD.
I used liveusb-creator to download and write the image to the USB stick.

A few weeks ago I also tried Ubuntu 13.10 live with UNetbootin and got more or less the same error.

I can boot only the Fedora 18 and Ubuntu 12.04 live images; no later live image boots for me.

Thanks,
Ionut Radu.
Comment 20 Ionut Radu 2013-12-27 16:29:57 EST

I've just retried booting Fedora-18-x86_64-Live-Desktop.iso and had no issue.

[liveuser@localhost ~]$ uname -a
Linux localhost 3.6.10-4.fc18.x86_64 #1 SMP Tue Dec 11 18:01:27 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Comment 21 Ionut Radu 2013-12-28 05:17:46 EST
Created attachment 842591
stack trace 1
Comment 22 Ionut Radu 2013-12-28 05:18:27 EST
Created attachment 842592
stack trace 2
Comment 23 Ionut Radu 2013-12-28 05:20:14 EST

Hi Michele,

I removed rhgb and quiet from the kernel parameters when booting
Fedora-Live-MATE-Compiz-x86_64-20-1.iso and got more stack trace information.
Please see the 2 new snapshots.

thanks,
Ionut Radu.
Comment 24 Ionut Radu 2013-12-28 06:19:40 EST

Hi Michele,

I managed to boot Fedora-Live-MATE-Compiz-x86_64-20-1.iso by adding the following kernel parameters:

pci=noacpi nouveau.modeset=0 rd.driver.blacklist=nouveau

However, the kernel is not 3.12.5 as you stated, but:
[liveuser@localhost ~]$ uname -a
Linux localhost 3.11.10-301.fc20.x86_64 #1 SMP Thu Dec 5 14:01:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


thanks,
Ionut Radu.
Comment 25 Ionut Radu 2013-12-28 06:30:04 EST

I have also been able to boot Fedora-Live-MATE-Compiz-x86_64-20-1.iso by using 

nouveau.modeset=0

alone. So the problem is in the nouveau driver.

thanks,
Ionut Radu.
Comment 26 Ionut Radu 2013-12-28 06:50:39 EST
nomodeset allows me to boot as well.
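Comments 25 and 26 report that either nouveau.modeset=0 or nomodeset on its own is enough to get the live image past the hang. When juggling several boot entries, it can help to check which workaround the running kernel actually received. Below is a minimal bash sketch, not part of any Fedora tooling; the function name nouveau_kms_disabled is hypothetical, and it deliberately recognizes only the two parameters reported as sufficient in this bug.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if a kernel command line
# contains one of the parameters that, per this bug report, were enough
# on their own to avoid the nouveau hang (nomodeset, nouveau.modeset=0).
# Without an argument it inspects the running kernel's /proc/cmdline.
nouveau_kms_disabled() {
  local cmdline="${1:-$(cat /proc/cmdline)}"
  local -a args
  local arg
  # Split on whitespace; read does not perform glob expansion.
  read -r -a args <<< "$cmdline"
  for arg in "${args[@]}"; do
    case "$arg" in
      nomodeset|nouveau.modeset=0) return 0 ;;
    esac
  done
  return 1
}
```

On the booted live system, `nouveau_kms_disabled && echo "nouveau KMS is disabled"` reports whether the workaround is in effect.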
Comment 27 Michele Baldessari 2013-12-28 13:33:08 EST
Hi Ionut,

thanks. I had misunderstood and thought it was only a warning, after which the live system would continue booting. So if disabling nouveau/ACPI helps, the problem is somewhere in that driver or in the ACPI layer (or a combination thereof).

So just to recap:
- 3.6.10-4.fc18.x86_64 Live iso -> OK
- 3.11.10-301.fc20.x86_64 Live iso -> NOT OK
- 3.11.10-301.fc20.x86_64 Live iso -> OK (with nouveau.modeset=0)
- 3.12.5-.... F20 upgraded system -> OK

Is the above picture correct or did I miss anything? If it is correct I guess
the nouveau issue has been fixed in 3.12.x and the problem is that there are no
Live images with 3.12 just yet.

Can you confirm (or not) the recap above?

thanks,
Michele
Comment 28 Ionut Radu 2013-12-29 04:40:47 EST
Hi Michele,

Your recap is correct. Only disabling nouveau is needed.
I'll try to set up a live image with kernel 3.12.5 and
see what happens.

thanks,
Ionut Radu.
Comment 29 Ionut Radu 2013-12-29 08:36:32 EST

Hi Michele,

I've built a Fedora 20 live image with kernel 3.12.5 and the bug
is still present.
I used these instructions: https://fedoraproject.org/wiki/How_to_create_and_use_Live_USB?rd=FedoraLiveCD/USBHowTo#Kernel_updates

I still need nouveau.modeset=0 to be able to boot, and the kernel is the right one:
[liveuser@localhost ~]$ uname -a
Linux localhost 3.12.5-302.fc20.x86_64 #1 SMP Tue Dec 17 20:42:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I've tried to update xorg-x11-drv-nouveau but there is no update available:
[root@localhost liveuser]# yum update xorg-x11-drv-nouveau
Loaded plugins: langpacks
No packages marked for update

Bottom line is that the bug is still present.

thanks,
Ionut Radu.
Comment 30 Michele Baldessari 2014-01-01 12:38:11 EST
Hi Ionut,

ok, that's a bit surprising. So the same 3.12.5-302 kernel needs nouveau.modeset=0 to work when booted from USB/LiveCD; otherwise we get stuck
with the messages from the BZ subject.
The very same kernel booted from the installed system/hard drive has
no issues.

I'd guess some odd udev/dracut interaction when booting from USB. Or there are some
other differences between the installed system and the LiveCD (nouveau, xorg,
dracut, udev).

I'd start by verifying these other potential differences, or see if we can get
some full logs from the anaconda USB install (http://fedoraproject.org/wiki/Anaconda/Logging).

hth,
Michele
Comment 31 Ionut Radu 2014-01-01 13:11:06 EST

Hi Michele,

I've had nouveau.modeset=0 and rd.driver.blacklist=nouveau in my GRUB config
file for a long time, because nouveau was interfering with the nvidia driver on Fedora 18, so maybe that's why I had no issue with any kernel version on my installed system.

I can't get the anaconda log; anaconda doesn't even start.

thanks,
Ionut Radu.
Comment 32 Ionut Radu 2014-01-01 14:21:45 EST

Kernel 3.12.5 boots on my installed system without the kernel parameters
"nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off", but:

- I don't have nouveau driver installed
- I have nvidia driver installed instead
Comment 33 Michele Baldessari 2014-01-01 14:40:35 EST
Hi Ionut,

ah ok, that explains it. So the issue was always present, and the LiveCD vs.
installed difference was a red herring.

I think the best course of action is to give 3.13 a try and see if it
boots without the nouveau.modeset=0 setting. If that works, excellent.
If it does not, we will need the full messages output to investigate
further and maybe raise this upstream.

hth,
Michele
Comment 34 Ionut Radu 2014-01-01 15:31:06 EST
Hi Michele,

There is no 3.13 kernel available:

[root@localhost ~]# uname -a 
Linux localhost 3.12.5-302.fc20.x86_64 #1 SMP Tue Dec 17 20:42:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# yum update kernel
Loaded plugins: langpacks
No packages marked for update


Anyway, if booting stalls there is no way to provide more than snapshots
of the screen.

thanks,
Ionut.
Comment 35 Ionut Radu 2014-01-01 15:42:26 EST

Hi Michele,

Do you know which package contains the nouveau driver?
I see that on my installed system I have:
[root@localhost ~]# rpm -qa | grep nouveau
xorg-x11-drv-nouveau-1.0.9-2.fc20.x86_64

which contains:

[root@localhost ~]# rpm -qvl xorg-x11-drv-nouveau
-rwxr-xr-x    1 root    root                   211088 Jul 31 06:46 /usr/lib64/xorg/modules/drivers/nouveau_drv.so
-rw-r--r--    1 root    root                     2538 Jul 31 06:46 /usr/share/man/man4/nouveau.4.gz

Is this the driver? If so, then I was wrong when I stated that I don't have
the nouveau driver installed.

thanks,
Ionut.
Comment 36 Michele Baldessari 2014-01-01 15:48:45 EST
Hi Ionut,

yes 3.13 has not yet gone GA upstream and is not in Fedora updates. If you are 
comfortable trying out very experimental kernels you can try here:
http://koji.fedoraproject.org/koji/buildinfo?buildID=487242

The nouveau kernel driver is in the kernel package itself. The xorg driver
(which uses amongst others the kernel driver) is in the xorg-x11-drv-nouveau
package.

hth,
Michele
Comment 37 Ionut Radu 2014-01-01 17:46:11 EST

Hi Michele,

I've managed to reproduce the issue on the installed system (kernel 3.12.5-302).

Two steps were needed:
1) Deleted /usr/lib/modprobe.d/blacklist-nouveau.conf, which contained:
# RPM Fusion blacklist for nouveau driver - you need to run as root:
# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
# if nouveau is loaded despite this file.
blacklist nouveau

I had added that file in the past, when there were issues with the nvidia driver
on Fedora 18.

2) Removed nouveau.modeset=0 and rd.driver.blacklist=nouveau from the kernel parameters.

thanks,
Ionut.
Comment 38 Ionut Radu 2014-01-01 18:00:10 EST

Actually, I see that /usr/lib/modprobe.d/blacklist-nouveau.conf is part of the nvidia driver package (so I didn't add it after all):

[ionut@localhost ~]$ rpm -qf /usr/lib/modprobe.d/blacklist-nouveau.conf
xorg-x11-drv-nvidia-331.20-6.fc20.x86_64
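Comments 37 and 38 show that a packaged modprobe.d file, not only the kernel parameters, was keeping nouveau from loading on the installed system. Below is a minimal bash sketch for listing such files. The function name find_nouveau_blacklists is hypothetical, and the default directory list is an assumption based on the usual modprobe.d locations. As the comment inside the file itself says, the initramfs must be regenerated with dracut for a change to take effect at early boot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every modprobe.d *.conf file that contains an
# active (uncommented) "blacklist nouveau" line. Directories may be passed
# as arguments; otherwise a typical set of modprobe.d paths is searched
# (assumption, adjust for your distribution).
find_nouveau_blacklists() {
  local -a dirs=("$@")
  if [ "${#dirs[@]}" -eq 0 ]; then
    dirs=(/etc/modprobe.d /run/modprobe.d /usr/lib/modprobe.d)
  fi
  local dir file
  for dir in "${dirs[@]}"; do
    for file in "$dir"/*.conf; do
      # Skip the literal pattern left behind when nothing matches.
      [ -f "$file" ] || continue
      # Commented-out lines such as "# blacklist nouveau" are ignored.
      if grep -qE '^[[:space:]]*blacklist[[:space:]]+nouveau([[:space:]]|$)' "$file"; then
        printf '%s\n' "$file"
      fi
    done
  done
}
```

Each reported path can then be passed to `rpm -qf`, as in Comment 38, to see which package owns it.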
Comment 39 Ionut Radu 2014-01-01 18:19:19 EST
Created attachment 844309
stack trace 1 on kernel 3.13


Hi Michele,

I see the same issue on kernel 3.13.
Added 2 snapshots.

thanks,
Ionut
Comment 40 Ionut Radu 2014-01-01 18:20:31 EST
Created attachment 844310
stack trace 2 on kernel 3.13.0
Comment 41 Michele Baldessari 2014-01-02 04:11:22 EST
Hi Ionut,

yes, the nvidia driver cannot coexist with the nouveau driver, so
it blacklists it.

Meh, the screen captures are still too partial to
open a proper bug report upstream.

Given that you can reproduce it on the installed system, we could
try using https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
to get a proper crash file to analyze the issue.

Could you give the above a shot?

regards,
Michele
Comment 42 Michele Baldessari 2014-01-02 04:18:17 EST
I wonder if this one is related: https://bugzilla.redhat.com/show_bug.cgi?id=983189

At least a good chunk of the stack trace seems to be related.
Comment 43 Ionut Radu 2014-01-02 10:37:16 EST

Hi Michele,

It looks like kdump is useless in my case. As far as I understand, it requires
the kernel to boot normally before crashing, or at least to reach the init scripts
(see Considerations: 2. Init scripts take care of pre-loading the capture kernel at system boot time.) Neither of these happens.

Also, the kernel doesn't panic; it just deadlocks.

thanks,
Ionut.
Comment 44 Ionut Radu 2014-01-02 10:50:31 EST
Regarding https://bugzilla.redhat.com/show_bug.cgi?id=983189, I see no resemblance between the stack traces. Also, that bug involves abrt, while in my case
booting doesn't get that far.
Comment 45 Michele Baldessari 2014-01-02 12:13:39 EST
Hi Ionut,

well, kdump can trigger on any oops: just set /proc/sys/kernel/panic_on_oops to 1.

As for the stack trace in BZ 983189, your second screenshot shows a trace that
is quite similar to:
 [<ffffffff81320b91>] pci_device_probe+0x121/0x130
 [<ffffffff813e5c27>] driver_probe_device+0x87/0x390
 [<ffffffff813e6003>] __driver_attach+0x93/0xa0
 [<ffffffff813e5f70>] ? __device_attach+0x40/0x40
 [<ffffffff813e3c53>] bus_for_each_dev+0x63/0xa0
 [<ffffffff813e56ae>] driver_attach+0x1e/0x20
 [<ffffffff813e5238>] bus_add_driver+0x1e8/0x2b0
 [<ffffffffa036e000>] ? 0xffffffffa036dfff
 [<ffffffff813e6641>] driver_register+0x71/0x150
 [<ffffffffa036e000>] ? 0xffffffffa036dfff
 [<ffffffff8131f72b>] __pci_register_driver+0x4b/0x50
 [<ffffffffa00278ba>] drm_pci_init+0x11a/0x130 [drm]
 [<ffffffffa036e000>] ? 0xffffffffa036dfff
 [<ffffffffa036e04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]

It is truncated so we are obviously not 100% sure, but it is worth keeping an
eye on.

If you manage to get a dump, we might confirm or dispel this theory.
Without a full log there is not much else that can be done, except
trying later kernels to see whether this gets fixed.

hth,
Michele
Comment 46 Ionut Radu 2014-01-02 12:34:37 EST
Hi Michele,

How am I supposed to "set /proc/sys/kernel/panic_on_oops to 1" if booting doesn't succeed?

thanks,
Ionut.
Comment 47 Michele Baldessari 2014-01-02 13:16:57 EST
Hi Ionut,

well, if the oops happens before systemd parses /etc/sysctl.conf, we
could try:
1) Just pass the kernel option oops=panic.

2) Alternatively:
- Blacklist nouveau
- Boot normally until you get to the text console (either via vesafb or plain text)
- Log in and verify the sysctl
- modprobe nouveau


1 is (obviously) simpler. With 2 we might actually be able to scroll back via
Shift-PageUp and see the whole stack trace (unless there are other issues with
modprobing the driver later on).

hth,
Michele
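Option 2 above can be scripted roughly as follows. This is a minimal bash sketch meant to be run as root from the text console after booting with nouveau blacklisted; both function names are hypothetical. Writing 1 to panic_on_oops has the same effect as the oops=panic boot option, so an oops during `modprobe nouveau` should escalate to a panic that kdump can capture, provided the capture kernel is loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given panic_on_oops file reads 1.
# The path is overridable so the check can be exercised without root.
panic_on_oops_enabled() {
  local path="${1:-/proc/sys/kernel/panic_on_oops}"
  [ -r "$path" ] && [ "$(cat "$path")" = "1" ]
}

# Hypothetical wrapper for option 2: make sure an oops becomes a panic,
# then load the driver under test. Must run as root; may crash the machine.
load_nouveau_with_panic_on_oops() {
  if ! panic_on_oops_enabled; then
    # Same effect as booting with oops=panic.
    echo 1 > /proc/sys/kernel/panic_on_oops || return 1
  fi
  modprobe nouveau
}
```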
Comment 48 Ionut Radu 2014-01-02 13:57:30 EST
Hi Michele,

See my answers below:

1) Even when the kernel option oops=panic is used, the init scripts are still not reached, so the capture kernel is not loaded and kdump won't work.

2) modprobe nouveau does not result in a kernel crash:

lsmod | grep nouveau
nouveau               952573  0 
mxm_wmi                12865  1 nouveau
wmi                    18804  2 mxm_wmi,nouveau
i2c_algo_bit           13257  1 nouveau
ttm                    79787  1 nouveau
drm_kms_helper         50287  1 nouveau
drm                   283349  3 ttm,drm_kms_helper,nouveau
i2c_core               38302  6 drm,i2c_i801,drm_kms_helper,i2c_algo_bit,nouveau,videodev
video                  19104  1 nouveau
Comment 49 Ionut Radu 2014-01-04 04:22:29 EST
Created attachment 845295
nouveau messages for kernel 3.13.0


Hi Michele,

It looks to me like the nouveau driver tries to execute some code from the NVIDIA
card's nvram, and that's when the crash occurs.
Please see the nouveau.jpg snapshot taken with kernel 3.13.0.

thanks,
Ionut.
Comment 50 Ionut Radu 2014-01-04 08:24:10 EST
Hi Michele,

I've taken another look at https://bugzilla.redhat.com/show_bug.cgi?id=983189
and noticed that the i915 driver is present too, so that
looks like an Optimus laptop with both Intel and
NVIDIA graphics, while my laptop is about 5 years old and
has NVIDIA graphics only. I don't think the two bugs
are related.

thanks,
Ionut.
Comment 51 Ionut Radu 2014-01-04 10:17:26 EST
Hi Michele,

I've managed to reproduce the kernel crash on "modprobe
nouveau", but no kernel panic is triggered with either the
oops=panic kernel option or "echo 1 > /proc/sys/kernel/panic_on_oops".
Indeed, when using oops=panic, /proc/sys/kernel/panic_on_oops already contains 1 instead of the default 0.

Do you have any other idea how to force a kernel panic?

Thanks,
Ionut
Comment 52 Michele Baldessari 2014-01-04 10:41:41 EST
Hi Ionut,

yes, 983189 is not necessarily the same thing, but it had exactly the same stack
trace (as far as we can see it), so it is worth keeping an eye on.

Odd that the crash happened but there was no kernel panic. Are you sure there is an oops?

I don't see any crash in the jpg you just uploaded.

Michele
Comment 53 Ionut Radu 2014-01-04 10:52:57 EST
Hi Michele,

The jpg I uploaded captures the moment just before the crash.
I've managed to obtain a vmcore, but saving vmcore-dmesg failed.

Please check:
https://www.dropbox.com/sh/e77p700zr8g1v4z/y3ldY3npQB

thanks,
Ionut.
Comment 54 Ionut Radu 2014-01-04 11:17:02 EST

The vmcore is for kernel 3.13.0.
I forced a kernel panic through SysRq (i.e. pressed Alt+SysRq+C). I couldn't trigger a kernel panic any other way; I even set all /proc/sys/kernel/panic_* to 1 and still no panic was triggered. Hope this helps.

Thanks,
Ionut.
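The Alt+SysRq+C key combination only works if kernel.sysrq allows it. Below is a minimal bash sketch of that check. The function name is hypothetical, and it assumes the kernel's documented bitmask semantics: a value of 1 enables every SysRq function, and otherwise the crash command 'c' is gated by the 0x8 "debugging dumps" bit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the keyboard combination Alt+SysRq+C
# would be honoured for a given kernel.sysrq value.
# Without an argument it reads the running kernel's /proc/sys/kernel/sysrq.
sysrq_crash_key_allowed() {
  local value="${1:-$(cat /proc/sys/kernel/sysrq)}"
  # 1 is special: it enables every SysRq function.
  [ "$value" -eq 1 ] && return 0
  # Any other value is a bitmask; assumed: 0x8 (debugging dumps) covers 'c'.
  (( value & 0x8 ))
}
```

The mask applies only to the keyboard. As root, `echo c > /proc/sysrq-trigger` triggers the same crash (and, with kdump armed, a vmcore) regardless of kernel.sysrq.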
Comment 55 Michele Baldessari 2014-01-04 11:18:00 EST
Hi Ionut,

I've downloaded the vmcore and am downloading the corresponding RPMs so I can
try and see what is up.

It'll take a while, as I'm on a slow connection at the moment.

regards,
Michele
Comment 56 Ionut Radu 2014-01-04 11:23:17 EST

Hi Michele,

I forgot to mention that, while trying different ways of obtaining a dump,
I used kernel-3.13.0-debug (3.13.0-0.rc6.git0.1.fc21.x86_64+debug), not the usual kernel.

thanks,
Ionut.
Comment 57 Ionut Radu 2014-01-04 11:50:53 EST
Created attachment 845414
vmcore log


Log from vmcore
Comment 58 Michele Baldessari 2014-01-04 12:01:28 EST
Hi Ionut,

great thanks. This is https://bugs.freedesktop.org/show_bug.cgi?id=72943

hth,
Michele
Comment 59 Ionut Radu 2014-01-04 12:14:04 EST
Hi Michele,

This guy says he can't boot any kernel above 3.2, while I can boot 3.6.10 from Fedora 18.

Thanks,
Ionut.
Comment 60 Ionut Radu 2014-01-04 12:59:34 EST
Thanks Michele, it does look like the same bug: same video card, slightly different VBIOS version.

Regards,
Ionut Radu.
Comment 61 Michele Baldessari 2014-01-04 15:15:59 EST
Hi Ionut,

thanks for your persistence here ;) At least we have pinpointed the exact
place where we fail.

Changing the title to make it reflect the situation.

regards,
Michele
Comment 62 Ionut Radu 2014-01-05 07:16:35 EST
Hi Michele,

Thanks for your useful suggestions. At some point I thought
that filing this bug was just a waste of time.

Let's keep a record of the original summary "BUG: soft lockup - CPU#0 stuck for 23s [systemd-udevd:194]" to help others who have the same problem and see this message.

thanks,
Ionut.
Comment 63 Ionut Radu 2014-01-08 16:26:38 EST
Issue was fixed.

Thanks,
Ionut.
