Bug 645940

Summary: Kernel will not boot, just sits at 'starting udev'
Product: [Fedora] Fedora
Reporter: Thomas Spear <Speeddymon>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 13
CC: airlied, ajax, bskeggs, dougsland, gansalmon, harald, itamar, jonathan, kernel-maint, madhu.chinakonda, tspshilt
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-06-28 11:17:08 UTC
Bug Blocks: 654280, 654286    

Description Thomas Spear 2010-10-23 04:12:35 UTC
Description of problem:
Just installed the latest kernel on my desktop from the Fedora repos, 2.6.34.7-61.fc13.x86_64, along with other updates that became available between last night and tonight. Rebooted and went to smoke a cigarette. Came back about 3 minutes later and the machine was sitting at the blue and white Fedora logo. Hit escape to see how far along it was and it was only at 'Starting udev:'

I let it sit there for about a minute before I rebooted. Same problem. So I went back to the kernel before that, 2.6.34.7-56.fc13.x86_64, and the same thing happened. So I went to the kernel before THAT, which happened to be the oldest one in my grub.conf: 2.6.34.6-54.fc13.x86_64

The .6 kernel booted.

The machine is a Dell Optiplex 780 with 1 PCI-E Geforce 8400 GS (PCI ID: 10de:0422) and 2 PCI Geforce 8400 GS (PCI ID: 10de:06e4)

In order to boot the machine with more than one video card (on any kernel, due to an X freeze), I have added intel_iommu=off to the kernel boot command line. This option has worked fine since I first installed Fedora on this machine 2 months ago.

Unfortunately right now, I have not had time to let it just sit for more than a few minutes, as I use this machine for work and I am on the clock. Will get more details including any log files you think might help, when I have more time.

Version-Release number of selected component (if applicable):

[tspear@tomcat ~]$ rpm -qa |grep udev
libgudev1-153-4.fc13.x86_64
udev-153-4.fc13.x86_64
system-config-printer-udev-1.2.4-1.fc13.x86_64
libudev-153-4.fc13.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Boot machine with any 2.6.34.7 kernel from the Fedora repos (2.6.34.7-56.fc13.x86_64 or 2.6.34.7-61.fc13.x86_64)
  
Actual results:
Machine sits at 'starting udev'

Expected results:
Machine should boot

Comment 1 Harald Hoyer 2010-10-26 10:26:42 UTC
Maybe the syntax for "intel_iommu=off" has changed? Don't know.

The hang you are experiencing is most likely due to a "modprobe" of a kernel module.

If you have the time, boot without "rhgb quiet" and add "modprobedebug" to the kernel command line.

Then you will see lots of modprobes slowly being done, which can be sped up by removing "sleep 5" in /sbin/start_udev.

Most likely your computer will hang on one modprobe. You can take a screenshot with a camera and attach it here.
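
As a rough sketch of the command-line change described above: the function below takes an existing kernel command line as a string, drops "rhgb" and "quiet", and appends "modprobedebug" if it is not already there. The name debug_cmdline is made up for illustration and is not part of any Fedora tool; it only prints the new line and does not touch grub.conf.

```shell
#!/bin/sh
# Rewrite a kernel command line for debugging a hang at "Starting udev":
# drop "rhgb" and "quiet", then append "modprobedebug" if it is missing.
# debug_cmdline is a hypothetical helper name, not an existing tool.
debug_cmdline() {
    out=""
    for arg in $1; do
        case "$arg" in
            rhgb|quiet) ;;                 # skip the splash/quiet options
            *) out="${out:+$out }$arg" ;;  # keep everything else, in order
        esac
    done
    case " $out " in
        *" modprobedebug "*) ;;            # already present, nothing to add
        *) out="${out:+$out }modprobedebug" ;;
    esac
    printf '%s\n' "$out"
}
```

For example, debug_cmdline "$(cat /proc/cmdline)" prints the line to paste into the kernel entry in grub.conf; other options such as intel_iommu=off are left untouched.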

Comment 2 Thomas Spear 2010-10-26 10:30:05 UTC
If I remember, I will do that before I leave this morning. If you don't hear back from me by 9am CDT, assume I've forgotten and will get to it on Friday night.

Thanks for the quick reply!

Comment 3 Thomas Spear 2010-10-26 13:36:02 UTC
Ok, I added modprobedebug, removed rhgb quiet, commented out sleep 5 and rebooted to the failing kernel. It booted just fine, albeit without the splash screen; I can live with that. I decided to restore rhgb quiet and remove modprobedebug but leave the sleep 5 commented out. When I rebooted, same issue as before.

I didn't do anything on either reboot with intel_iommu=off (it's still there). So something in rhgb quiet is causing it to not work, or something in modprobedebug is causing it to work... Any other ideas I will try on Friday when I get back into the office.

Comment 4 Thomas Spear 2010-11-18 05:27:42 UTC
I finally debugged this problem after upgrading to Fedora 14 and being forced into the newer kernel.

This is a problem with nouveau and multiple video cards, or maybe multiple video cards on different buses, since I run 2 PCI cards and 1 PCI Express card.

If I rdblacklist nouveau and run the nvidia driver with all 3 cards installed, then the machine boots fine, but runs slow (as reported in bug 654280). If I blacklist nvidia (or remove it), and run nouveau, then the machine hangs at Starting udev until I remove all but 1 of the add-in video cards. Currently I am running on the PCI Express card only.

Comment 5 Thomas Spear 2010-11-18 05:44:16 UTC
This blocks 2 other bugs of mine from being fixed at the moment, since I cannot test nouveau with 2 or 3 cards installed. Whatever debug information you need from me, please let me know how to get it. If I need to chkconfig udev off to get the machine to boot and then run it manually to get some logs, then I'll do it.

Right now I am down one monitor due to the slowness in bug 654280, and I cannot test whether konsole crashes with 2+ cards and nouveau in bug 654286.

Comment 6 Harald Hoyer 2010-11-18 11:53:29 UTC
Remember "rdblacklist" only prevents the loading in the initramfs.
Preventing a module from being loaded during "Starting udev" is done via:

# echo "blacklist <modulename>" >> /etc/modprobe.d/myblacklist.conf
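
Running that echo more than once leaves duplicate lines in the file. A minimal sketch that appends the entry only when it is missing could look like this; add_blacklist is a hypothetical helper name, the default path is the one from the command above, and writing under /etc needs root.

```shell
#!/bin/sh
# Append "blacklist <module>" to a modprobe.d file unless it is already there.
# add_blacklist is a hypothetical helper; pass a second argument to use a
# different file than the default from the comment above.
add_blacklist() {
    module=$1
    conf=${2:-/etc/modprobe.d/myblacklist.conf}
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "blacklist $module" "$conf" 2>/dev/null; then
        printf 'blacklist %s\n' "$module" >> "$conf"
    fi
}
```

Usage would be, for instance, add_blacklist nouveau (as root).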

Comment 7 Thomas Spear 2010-11-18 12:47:55 UTC
But that is just it, I had it blacklisted in both. Whenever I blacklist something, I always blacklist it with rdblacklist and in blacklist.conf.
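
To double-check that a module really is blacklisted in both places, something like the following sketch could be used. It assumes the dracut form rdblacklist=<module> on the kernel command line and plain "blacklist <module>" lines in *.conf files; check_blacklist is a hypothetical name, not an existing tool.

```shell
#!/bin/sh
# Report where a module is blacklisted: on the kernel command line
# (assumed form: rdblacklist=<module>, initramfs only) and in modprobe.d.
# check_blacklist is a hypothetical helper name.
check_blacklist() {
    module=$1
    cmdline=$2                       # e.g. "$(cat /proc/cmdline)"
    confdir=${3:-/etc/modprobe.d}
    case " $cmdline " in
        *" rdblacklist=$module "*) echo "initramfs: yes" ;;
        *) echo "initramfs: no" ;;
    esac
    # -x: whole-line match, -s: stay quiet if no *.conf file exists
    if grep -qsx "blacklist[[:space:]]\{1,\}$module" "$confdir"/*.conf; then
        echo "modprobe.d: yes"
    else
        echo "modprobe.d: no"
    fi
}
```

For example: check_blacklist nouveau "$(cat /proc/cmdline)"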

Comment 8 Thomas Spear 2010-11-18 22:10:36 UTC
Since I am reproducing this again, I'm going to test the modprobedebug again and see if I can get any output for you to play with.

Comment 9 Thomas Spear 2010-12-01 11:29:00 UTC
Still waiting for an opportunity to test this. Sorry for the lack of input

Comment 10 York Possemiers 2011-03-13 08:56:41 UTC
I had this issue too. 
2.6.35.11-83.fc14.x86_64
Using two Nvidia PCI-E cards, a Gigabyte 1GB 460GTX and an ECS 512MB 9500GT.
Running a GA-X58A-UD3R Rev2.0 board

Initially, I had the issue where it would halt at starting udev. It would get to initialising and mapping the IRQ for one of the cards (nvidia 0000:03:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16). This occurred every time I booted.

I decided to remove the adjacent PCI card (a TV tuner), and then sometimes I would be able to boot; other times it got a bit further than before, but it was obviously still stuck around the initialisation of the graphics cards. Note I had this issue in Fedora 13.

Just a note: I am using the NVIDIA proprietary drivers.

When it did boot, the second card wasn't recognised and the following error turned up in dmesg: NVRM: failed to copy vbios to system memory.

One of the things I noticed was that both graphics cards were being put on the same IRQ, and that the second graphics card was being assigned as the owner of the PCI bus (or something) by the VGA arbiter (vgaarb). (Something along the lines of vgaarb: transferring owner from PCI:0000:02:00.0 to PCI:0000:03:00.0)
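
A quick way to spot that kind of IRQ sharing is to filter the "PCI INT ... -> IRQ N" lines out of dmesg. This is only a sketch that assumes the message format quoted above; shared_irqs is a made-up name, and the function reads the log on standard input.

```shell
#!/bin/sh
# List IRQs that more than one PCI device was routed to, based on kernel
# messages of the form "<driver> 0000:03:00.0: PCI INT A -> ... -> IRQ 16".
# shared_irqs is a hypothetical helper name.
shared_irqs() {
    # Reduce each matching line to "<irq> <pci-address>"
    sed -n 's/.*\([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-9a-f]\): PCI INT .* -> IRQ \([0-9]\{1,\}\).*/\2 \1/p' |
    # Group addresses by IRQ and print only IRQs with two or more devices
    awk '{ n[$1]++; dev[$1] = dev[$1] " " $2 }
         END { for (irq in n) if (n[irq] > 1) print "IRQ " irq ":" dev[irq] }'
}
```

Typical use would be dmesg | shared_irqs, alongside dmesg | grep vgaarb to see the ownership messages. Shared IRQs are normal on many systems, so a hit here is a hint rather than proof of a conflict.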

Because of this I decided to play around in the BIOS, enabling and disabling features. I thought that this might show up conflicts, but unfortunately I wasn't able to come up with anything consistent.

Finally I decided to swap the graphics cards (thinking of the ownership thing) and it has now booted without an apparent hitch. I also enabled most of the features in the BIOS and added the tuner card back in. I now have 3 screens enabled and I don't seem to have the slowness issue described.

One final point on this is that under Windows, the graphics card was recognised and installed without issue, so it is not purely a hardware issue. I suspect it has to do with the VGA arbiter or management of the PCI bus, though I am hardly qualified to make such a judgement.

I know this is not very quantitative or exhaustive testing, but I hope that it sheds some light on the issue. If anything needs any clarification or some other testing is requested, please ask and I will do what I can to accommodate.

Comment 11 York Possemiers 2011-03-13 09:13:26 UTC
Actually, I should clarify, the swapping of the graphics cards may have done more than just reversing the order of the cards on the bus.

On the UD3R, the expansion card slots are as follows:

Top (processor etc)
pci-e 1x
pci-e 1x
pci-e 16-1
pci-e 8-1
pci-e 16-2
pci
pci-e 8-2

Previously the 460 was in 16-1 and the 9500 was tried in 16-2 and 8-2. The tuner card was in the pci slot.

The new configuration has the 9500 in 16-1 and the 460 in 8-1. While the 460 was in 16-1 the 9500 would not fit in 8-1 because the 460 is two slots wide. The motherboard manual states that if 16-1 and 8-1 are occupied, they will both be reduced to 8x mode, which implies some kind of sharing.

Comment 12 Bug Zapper 2011-05-31 10:47:12 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 Bug Zapper 2011-06-28 11:17:08 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.