Bug 493526

Summary: FC11 Beta install will not completely boot up with PAE kernel...
Product: [Fedora] Fedora Reporter: Shannon McMackin <smcmackin>
Component: xorg-x11-drv-intel Assignee: Kristian Høgsberg <krh>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhide CC: ajax, darrellpf, dcantrell, kernel-maint, kmcmartin, mcepl, mefoster, nathaniel, nsoranzo, quintela, smcmackin, sundaram, tmarikle, wwoods, xgl-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.29.3-153 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-05-20 14:05:32 UTC Type: ---
Bug Depends On:    
Bug Blocks: 446452    
Attachments:
Description | Flags
xorg.conf | none
Xorg.log file | none
Xorg.log that shows HAL is not presenting devices to the PAE kernel | none
Xorg.log file from -135 PAE kernel | none

Description Shannon McMackin 2009-04-02 05:27:33 UTC
Description of problem: After installing from the Fedora 11 Beta DVD, the machine will not boot.  This is a Lenovo T61 with Intel GM965 graphics.  I get through the progress bar on the splash screen and then it reverts to the text boot screen and then goes to a blank screen with a cursor in the upper left corner.

When booting with quiet removed from kernel line, the process hangs at Starting atd.

I've tried all the appends recommended on the wiki to no avail.  I can't get to a tty or any other source of output.


Version-Release number of selected component (if applicable): Rawhide 10.92


How reproducible: Happens every boot


Comment 1 Shannon McMackin 2009-04-02 20:56:41 UTC
After trying the LiveCD, I found that it's the PAE kernel that would not boot on my system.  The DVD installs the PAE kernel because it detects 4gb of RAM.  The LiveCD installs the generic kernel which boots fine.

Comment 2 Shannon McMackin 2009-04-03 01:28:55 UTC
Here's something I've managed to get out of /var/log/messages:

Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel: =======================================================
Apr  2 16:23:39 localhost kernel: [ INFO: possible circular locking dependency detected ]
Apr  2 16:23:39 localhost kernel: 2.6.29.1-37.rc1.fc11.i686.PAE #1
Apr  2 16:23:39 localhost kernel: -------------------------------------------------------
Apr  2 16:23:39 localhost kernel: Xorg/2735 is trying to acquire lock:
Apr  2 16:23:39 localhost kernel: (&mm->mmap_sem){----}, at: [<c0499e2e>] might_fault+0x48/0x85
Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel: but task is already holding lock:
Apr  2 16:23:39 localhost kernel: (&dev->struct_mutex){--..}, at: [<f825fbd4>] i915_gem_pwrite_ioctl+0x155/0x338 [i915]
Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel: which lock already depends on the new lock.
Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel: the existing dependency chain (in reverse order) is:
Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel: -> #1 (&dev->struct_mutex){--..}:
Apr  2 16:23:39 localhost kernel:       [<c0458fe4>] __lock_acquire+0x96e/0xacc
Apr  2 16:23:39 localhost kernel:       [<c045919d>] lock_acquire+0x5b/0x81
Apr  2 16:23:39 localhost kernel:       [<c0727dd9>] __mutex_lock_common+0xdd/0x338
Apr  2 16:23:39 localhost kernel:       [<c07280db>] mutex_lock_nested+0x33/0x3b
Apr  2 16:23:39 localhost kernel:       [<f82065e1>] drm_gem_mmap+0x36/0xfe [drm]
Apr  2 16:23:39 localhost kernel:       [<c04a0fb0>] mmap_region+0x266/0x3e5
Apr  2 16:23:39 localhost kernel:       [<c04a1384>] do_mmap_pgoff+0x255/0x2a5
Apr  2 16:23:39 localhost kernel:       [<c040c588>] sys_mmap2+0x5f/0x80
Apr  2 16:23:39 localhost kernel:       [<c040946b>] sysenter_do_call+0x12/0x3f
Apr  2 16:23:39 localhost kernel:       [<ffffffff>] 0xffffffff
Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel: -> #0 (&mm->mmap_sem){----}:
Apr  2 16:23:39 localhost kernel:       [<c0458eb1>] __lock_acquire+0x83b/0xacc
Apr  2 16:23:39 localhost kernel:       [<c045919d>] lock_acquire+0x5b/0x81
Apr  2 16:23:39 localhost kernel:       [<c0499e4b>] might_fault+0x65/0x85
Apr  2 16:23:39 localhost kernel:       [<f825fcd2>] i915_gem_pwrite_ioctl+0x253/0x338 [i915]
Apr  2 16:23:39 localhost kernel:       [<f8205748>] drm_ioctl+0x1ab/0x224 [drm]
Apr  2 16:23:39 localhost kernel:       [<c04be6ae>] vfs_ioctl+0x5c/0x76
Apr  2 16:23:39 localhost kernel:       [<c04bec4e>] do_vfs_ioctl+0x483/0x4bd
Apr  2 16:23:39 localhost kernel:       [<c04becce>] sys_ioctl+0x46/0x66
Apr  2 16:23:39 localhost kernel:       [<c040946b>] sysenter_do_call+0x12/0x3f
Apr  2 16:23:39 localhost kernel:       [<ffffffff>] 0xffffffff
Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel: other info that might help us debug this:
Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel: 1 lock held by Xorg/2735:
Apr  2 16:23:39 localhost kernel: #0:  (&dev->struct_mutex){--..}, at: [<f825fbd4>] i915_gem_pwrite_ioctl+0x155/0x338 [i915]
Apr  2 16:23:39 localhost kernel:
Apr  2 16:23:39 localhost kernel: stack backtrace:
Apr  2 16:23:39 localhost kernel: Pid: 2735, comm: Xorg Not tainted 2.6.29.1-37.rc1.fc11.i686.PAE #1
Apr  2 16:23:39 localhost kernel: Call Trace:
Apr  2 16:23:39 localhost kernel: [<c0726c76>] ? printk+0x14/0x16
Apr  2 16:23:39 localhost kernel: [<c0458461>] print_circular_bug_tail+0x5d/0x68
Apr  2 16:23:39 localhost kernel: [<c0458eb1>] __lock_acquire+0x83b/0xacc
Apr  2 16:23:39 localhost kernel: [<c0499e2e>] ? might_fault+0x48/0x85
Apr  2 16:23:39 localhost kernel: [<c045919d>] lock_acquire+0x5b/0x81
Apr  2 16:23:39 localhost kernel: [<c0499e2e>] ? might_fault+0x48/0x85
Apr  2 16:23:39 localhost kernel: [<c0499e4b>] might_fault+0x65/0x85
Apr  2 16:23:39 localhost kernel: [<c0499e2e>] ? might_fault+0x48/0x85
Apr  2 16:23:39 localhost kernel: [<f825fcd2>] i915_gem_pwrite_ioctl+0x253/0x338 [i915]
Apr  2 16:23:39 localhost kernel: [<c0576713>] ? copy_from_user+0x32/0x119
Apr  2 16:23:39 localhost kernel: [<f8205748>] drm_ioctl+0x1ab/0x224 [drm]
Apr  2 16:23:39 localhost kernel: [<f825fa7f>] ? i915_gem_pwrite_ioctl+0x0/0x338 [i915]
Apr  2 16:23:39 localhost kernel: [<c04be6ae>] vfs_ioctl+0x5c/0x76
Apr  2 16:23:39 localhost kernel: [<c04bec4e>] do_vfs_ioctl+0x483/0x4bd
Apr  2 16:23:39 localhost kernel: [<c04569ca>] ? lock_release_holdtime+0x2b/0x123
Apr  2 16:23:39 localhost kernel: [<c04748ef>] ? audit_filter_syscall+0xcc/0xed
Apr  2 16:23:39 localhost kernel: [<c0474903>] ? audit_filter_syscall+0xe0/0xed
Apr  2 16:23:39 localhost kernel: [<c0475ac2>] ? audit_syscall_entry+0x163/0x185
Apr  2 16:23:39 localhost kernel: [<c04becce>] sys_ioctl+0x46/0x66
Apr  2 16:23:39 localhost kernel: [<c040946b>] sysenter_do_call+0x12/0x3f
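
The lockdep report above describes two code paths that take the same pair of locks in opposite orders. As a minimal sketch of that reasoning (not the kernel's lockdep implementation), the two orderings can be written as edges of a directed graph and checked for a cycle. The function name find_cycle and the short lock labels are made up for this illustration; only the two orderings themselves are read off the trace.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle in a directed lock-order graph, or None if there is none."""
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    def visit(node, path):
        # Reaching a node that is already on the current path closes a cycle.
        if node in path:
            return path[path.index(node):] + [node]
        for nxt in graph.get(node, []):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in list(graph):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# Each edge is (lock already held, lock acquired next), as read off the trace.
observed = [
    ("mmap_sem", "struct_mutex"),  # chain #1: sys_mmap2 -> ... -> drm_gem_mmap
    ("struct_mutex", "mmap_sem"),  # chain #0: i915_gem_pwrite_ioctl -> might_fault
]
cycle = find_cycle(observed)
print(" -> ".join(cycle))
```

A cycle in this graph only means a deadlock is possible if the two paths race; it does not show that one happened on this boot.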

Comment 3 Shannon McMackin 2009-04-03 22:03:03 UTC
I installed all of today's updates as recommended in the fedora forum.  Still the same results.  Tried adding selinux 0 3 and running startx after login and it locks.  Then I tried adding selinux 0 3 nomodeset and still had the same hard lock.

Here's the entry from /var/log/messages:

Apr  3 17:51:04 localhost kernel: reserve_memtype: calling reserve_ram_pages_type for 0x32d99000 0x32d9a000 16

This was with nomodeset; the previous boot had no messages at all.  The above line repeats for thousands of entries.
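
To put a number on "thousands of entries", a rough sketch like the following could tally kernel messages by the function name that prefixes them. It assumes the syslog line layout seen in this report; the helper name summarize is made up, and the sample simply repeats the one line quoted above three times for illustration.

```python
import re
from collections import Counter

# Matches the "Mon DD HH:MM:SS host kernel: " prefix used in this report (an assumption
# about the syslog format; other configurations may differ).
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s*")


def summarize(lines):
    """Count kernel messages by the text before their first colon."""
    counts = Counter()
    for line in lines:
        body = PREFIX.sub("", line.strip())
        if body:  # skip the empty "kernel:" spacer lines
            counts[body.split(":", 1)[0]] += 1
    return counts


# Illustrative input: the quoted line, repeated. Real use would read /var/log/messages.
sample = [
    "Apr  3 17:51:04 localhost kernel: reserve_memtype: calling "
    "reserve_ram_pages_type for 0x32d99000 0x32d9a000 16",
] * 3
counts = summarize(sample)
print(counts.most_common(1))
```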

Comment 4 Matěj Cepl 2009-04-06 13:55:17 UTC
Thanks for the bug report.  We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue.

Please attach your X server config file (/etc/X11/xorg.conf, if available) and X server log file (/var/log/Xorg.*.log) to the bug report as individual uncompressed file attachments using the bugzilla file attachment link below.

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.

Comment 5 Shannon McMackin 2009-04-06 19:39:38 UTC
Created attachment 338385 [details]
xorg.conf

I created this xorg.conf to see if it would improve the condition.  Previous errors had no xorg.conf associated by default.

Comment 6 Shannon McMackin 2009-04-06 19:43:45 UTC
I would like to post a copy of the Xorg.log, but its size is 0, so it doesn't seem to ever get generated due to the hang.  I will attempt a boot with the PAE kernel to runlevel 3 and then run startx to see if that generates something.

Comment 7 Shannon McMackin 2009-04-06 19:54:45 UTC
Created attachment 338387 [details]
Xorg.log file

This file was generated by booting the PAE kernel to run-level 3 and then issuing startx from the command-line of a logged in user account.

Comment 8 Shannon McMackin 2009-04-07 00:25:24 UTC
As an update, I installed the newest kernel from koji, 2.6.29.1-52.fc11.i686-PAE, and it made no difference.  Same lockup..

Comment 9 Shannon McMackin 2009-04-09 04:18:33 UTC
I applied all of today's updates and have the same symptom with the latest -54 PAE kernel as well as the latest Intel driver package..

Comment 10 Shannon McMackin 2009-04-13 16:08:59 UTC
I reinstalled the x86 version from snap1 and installed the latest updates and the -68 PAE kernel and had the same problem...

Comment 11 Shannon McMackin 2009-04-13 16:09:56 UTC
Forgot to add that I've also been testing the x86_64 build and have not seen this issue with that kernel...

Comment 12 Shannon McMackin 2009-04-17 03:22:50 UTC
The -70 PAE and -85 PAE kernels still experience the same problem.

I'm also updating xorg with every update I find.

I've also confirmed this is happening on other Intel-based systems trying to use the PAE kernel.

Any update on the status?

Comment 13 Shannon McMackin 2009-04-22 01:06:55 UTC
I went to the -104 PAE kernel and still no luck.  I even implemented a basic xorg.conf file with the vesa driver and X wouldn't start.  I'm starting to think that this has more to do with the PAE kernel than it does with the intel driver.

Comment 14 Chuck Ebbert 2009-04-23 22:11:41 UTC
(In reply to comment #13)
> I went to the -104 PAE kernel and still no luck.  I even implemented a basic
> xorg.conf file with the vesa driver and X wouldn't start.  I'm starting to
> think that this has more to do with the PAE kernel than it does with the intel
> driver.  

Try the i586 kernel...

Comment 15 Shannon McMackin 2009-04-24 01:34:33 UTC
The i586 kernel works fine, and so does the x86_64 kernel.  This leads me to believe that there's something wrong with the PAE kernel, which is what I originally filed this bug against.  I would like to use the PAE kernel because I have 4gb of RAM and don't want to have to mix 32- and 64-bit libraries on the same system.

Comment 16 Shannon McMackin 2009-04-29 03:56:06 UTC
I reinstalled the preview release and applied updates.  Still the same issue with the -111 PAE kernel.  I can boot up in runlevel 3, but when I execute startx, it hangs.  Any Xorg.log file for the corresponding boot is a 0 size file.  Is there anything else I can provide that would help debug this issue?

Comment 17 Shannon McMackin 2009-04-29 22:30:57 UTC
Created attachment 341847 [details]
Xorg.log that shows HAL is not presenting devices to the PAE kernel

Comment 18 Shannon McMackin 2009-05-01 16:25:23 UTC
The previous post and attachment were produced by booting to runlevel 3 and executing startx from a user account.

Comment 19 Thomas 2009-05-01 18:55:34 UTC
I experienced the same issue with my intel i965 laptop.  Based on comments related to 4GB RAM and the PAE kernel, I decided to try to reinstall with 2GB RAM removed.  The install was successful.  I was surprised to see that the PAE kernel was installed again with only 2GB RAM in the system.

At any rate, after the successful install, I put the additional 2GB RAM back in and the system hung just before switching to graphical X mode.  Someone suggested (here: http://forums.fedoraforum.org/showthread.php?t=219710) that I restart the system with selinux 0 3 and compare the Xorg.0.log file with one from a boot that succeeded to see what might have caused the issue.

I commented that the 4GB test failed as it was starting RandR.  Log excerpt:

(==) intel(0): Backing store disabled
(==) intel(0): Silken mouse enabled
(II) intel(0): Initializing HW Cursor
(II) intel(0): Fixed memory allocation layout:
(II) intel(0): 0x00000000-0xffffffff: DRI memory manager (0 kB)
(II) intel(0): 0x00000000:            end of aperture
(II) intel(0): BO memory allocation layout:
(II) intel(0): 0x00000000:            start of memory manager
(II) intel(0): 0x00bd7000-0x00fbefff: front buffer (4000 kB) X tiled
(II) intel(0): 0x00bc7000-0x00bd0fff: HW cursors (40 kB)
(II) intel(0): 0x00000000:            end of memory manager
***4gb test fails here***
(II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
(II) intel(0): DPMS enabled
(==) intel(0): Intel XvMC decoder disabled
(II) intel(0): Set up textured video
(II) intel(0): direct rendering: DRI2 Enabled
(--) RandR disabled
(II) Initializing built-in extension Generic Event Extension
(II) Initializing built-in extension SHAPE

Following that, another suggestion was made to add a kernel parameter to grub's entry as follows: mem=3072M  That was successful and I have since bumped that up to 4095M, which is also successful.
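
For reference, the mem= values tried in this bug can be converted to bytes and compared against the 4 GiB boundary. This is only a sketch: parse_mem is a made-up helper, it handles just the K, M and G suffixes, and it is not the kernel's own parser.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
FOUR_GIB = 4 << 30


def parse_mem(cmdline):
    """Return the mem= limit in bytes from a kernel command line, or None if absent."""
    match = re.search(r"(?:^|\s)mem=(\d+)([KMG]?)(?=\s|$)", cmdline, re.IGNORECASE)
    if not match:
        return None
    return int(match.group(1)) * UNITS[match.group(2).upper()]


# Values that appear in this bug report.
for value in ("mem=3072M", "mem=4095M", "mem=4096M", "mem=3G"):
    limit = parse_mem(value)
    where = "below 4 GiB" if limit < FOUR_GIB else "not below 4 GiB"
    print(value, limit, where)
```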

Comment 20 Nathaniel McCallum 2009-05-03 01:09:05 UTC
This is a nasty bug and should be considered as a release blocker.  There are a *lot* of systems out there right now that will trigger this bug.  Further, it's not a bug that can be fixed by an update since people can potentially have broken installs.

Comment 21 Shannon McMackin 2009-05-03 15:17:18 UTC
I was able to bump up my mem= statement to 4096M and I can log into X.

Here's the catch now.  I want to run the PAE kernel so that I can make use of all 4gb of my RAM.  I think it has something to do with the mem= statement, but with it I can only use 3gb of RAM.  I have a T7300 core2-duo and with other PAE-enabled kernels I'm able to see all 4gb in /proc/meminfo.  If I log in to runlevel 3 I can see all 4gb in /proc/meminfo.
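
One possible explanation for the 3gb figure, offered as an assumption rather than something confirmed in this bug: on x86, mem= acts as an upper bound on the physical addresses the kernel will use, and on many 4 GB machines the firmware relocates part of the RAM above the 4 GiB mark to make room for PCI/MMIO space. The memory map below is HYPOTHETICAL and only illustrates the arithmetic; the real layout on a T61 may differ.

```python
GIB = 1 << 30

# HYPOTHETICAL RAM ranges (start, end) in physical address space for a 4 GiB machine.
ram_ranges = [
    (0, 3 * GIB),        # RAM below the PCI/MMIO hole
    (4 * GIB, 5 * GIB),  # the remaining 1 GiB, remapped above 4 GiB
]


def usable_ram(ranges, limit=None):
    """Total RAM in bytes, ignoring any part at or above the address `limit`."""
    total = 0
    for start, end in ranges:
        if limit is not None:
            end = min(end, limit)
        total += max(0, end - start)
    return total


print(usable_ram(ram_ranges) // GIB)           # no mem= limit: prints 4
print(usable_ram(ram_ranges, 4 * GIB) // GIB)  # mem=4096M: prints 3
```

Under these assumptions, mem=4096M hides exactly the memory that lives above 4 GiB, which would match both the successful boot and the missing gigabyte.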

Comment 22 Rahul Sundaram 2009-05-04 15:47:02 UTC
Adding it to the blocker list. Whoever is managing it, please review.

Comment 23 Shannon McMackin 2009-05-06 17:19:02 UTC
Trying the -129 PAE kernel and I still have the same condition.  mem=4096M allows me to boot, but I can only use 3gb of the 4gb of RAM in my system.

Comment 24 Chuck Ebbert 2009-05-07 03:37:12 UTC
*** Bug 496283 has been marked as a duplicate of this bug. ***

Comment 25 Shannon McMackin 2009-05-08 02:21:42 UTC
Trying the -132 PAE kernel yields the same result.  I had hopes when I saw krh scheduled the build, but it must've been for a different intel related issue.

Comment 26 Shannon McMackin 2009-05-09 05:41:34 UTC
Tried the -135 PAE kernel with the same results.  Posted the Xorg.log.old for any clues.  I only see some warnings about a vendor ID block.

Comment 27 Shannon McMackin 2009-05-09 05:42:44 UTC
Created attachment 343197 [details]
Xorg.log file from -135 PAE kernel

Comment 28 Jesse Keating 2009-05-11 17:05:42 UTC
Not really related to the real bug, but why are you using the 32bit os on this hardware?  You should be able to run the x86_64 system which will allow you full use of your ram and likely avoid this PAE bug.

Comment 29 Nathaniel McCallum 2009-05-11 17:29:37 UTC
For me, two reasons:
1. While running 64bit Fedora, my total resident memory usage is consistently over 2GB.  Running the same system on 32bit, with the same applications installed, results in a total resident memory size of < 800M.  Thus, my usable memory on 32bit is ~3.2GB while on 64bit it drops to around 1.8G.
2. I need to run 32bit for work related reasons, specifically compiling 32bit apps is far simpler on 32bit.  Yes, I know it is possible on x86_64, but it would waste precious time to set it up.

Additionally, I can think of a variety of reasons why people would run 32bit on a 64bit processor:
1. applications that won't run on 64bit
2. your friend hands you a 32bit cd and says "try this, it rocks!"
3. ignorance
4. working with data files saved in 32bit (rrd files for instance)
...

In short, the vast majority of the world is not running 64bit yet (Windows, Mac OS X or Linux).  And while, within a year or two this will probably change, there are many growing pains and it is not a sufficient "remedy" to merely suggest that people use 64bit.

Comment 30 Shannon McMackin 2009-05-11 17:47:07 UTC
Thanks Nate..

Most of what he said and also:

1. I'd like to avoid the bloat of installing 32- and 64-bit libs on the same system.
2. PAE does what I need and there's no reason the answer to a bug like this should be "just use x86_64".
3. Lack of 64-bit plugins for Firefox

Comment 31 Shannon McMackin 2009-05-13 05:34:40 UTC
The -142 PAE kernel still fails.

Comparing the logs, the attempt with no mem=4096M append stops at the point where RandR is initializing.  There's no clear entry to indicate why RandR isn't initializing.
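
The comparison described here can be sketched as follows: take the last line of the log from the hung boot and look up what follows it in the log from a working boot. The helper name first_missing_line is made up, and the sample lines are copied from the excerpt in comment 19. Exact string matching is an assumption; it will miss if addresses or sizes differ between the two boots.

```python
def first_missing_line(good_lines, failed_lines):
    """Return the line in the good log that follows the failed log's last line."""
    if not failed_lines:
        return good_lines[0] if good_lines else None
    last = failed_lines[-1]
    for i, line in enumerate(good_lines):
        if line == last and i + 1 < len(good_lines):
            return good_lines[i + 1]
    return None


good = [
    "(II) intel(0): 0x00bc7000-0x00bd0fff: HW cursors (40 kB)",
    "(II) intel(0): 0x00000000:            end of memory manager",
    "(II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.",
    "(II) intel(0): DPMS enabled",
]
failed = good[:2]  # the hung boot stops after "end of memory manager"
print(first_missing_line(good, failed))
```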

Comment 32 Shannon McMackin 2009-05-18 01:41:46 UTC
I know it's not proper etiquette to bump a bug, but I've seen no movement at all on this and the release date is 8 days away.  Is anyone working on this?  Does anyone care about the impact of PAE not working with what should be considered common hardware?

Comment 33 Will Woods 2009-05-19 21:48:04 UTC
The developers are hard at work on this bug. If I understand everything correctly, this patch has been proposed as a fix:
  http://kyle.fedorapeople.org/gem_dma32_on_pae.diff

It's now in the process of being built and tested. We still plan to have this fixed before releasing F11 if at all possible.

Comment 34 darrell pfeifer 2009-05-19 22:42:57 UTC
The problem with 4 gig and PAE kernels is also mentioned at

https://bugzilla.redhat.com/show_bug.cgi?id=488633

I was originally thinking the locking error was causing a hang, but booting with mem=3G was the workaround

Comment 35 Shannon McMackin 2009-05-20 01:46:23 UTC
Will, thanks for the comment.  I look forward to trying the patched kernel as soon as possible.

Darrell, I can add mem=4096M to my grub.conf, but the machine will then only see 3gb of RAM.  If I boot to runlevel 3 with no kernel append, /proc/meminfo will post all 4gb of RAM.

Comment 36 Kyle McMartin 2009-05-20 03:16:40 UTC
Please try the i686-PAE scratch build here, which may fix the issue:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1365129

thanks, Kyle

Comment 37 Shannon McMackin 2009-05-20 05:33:26 UTC
Works like a charm...

Well done and thank you very much...

[SMcMackin@localhost Desktop]$ cat /proc/meminfo
MemTotal:        4046100 kB
MemFree:         3193444 kB
Buffers:           31000 kB
Cached:           433188 kB
SwapCached:            0 kB
Active:           555044 kB
Inactive:         213360 kB
Active(anon):     471552 kB
Inactive(anon):       92 kB
Active(file):      83492 kB
Inactive(file):   213268 kB
Unevictable:           8 kB
Mlocked:               8 kB
HighTotal:       3209928 kB
HighFree:        2620244 kB
LowTotal:         836172 kB
LowFree:          573200 kB
SwapTotal:       4192956 kB
SwapFree:        4192956 kB
Dirty:                88 kB
Writeback:             0 kB
AnonPages:        304216 kB
Mapped:            82556 kB
Slab:              26040 kB
SReclaimable:      11220 kB
SUnreclaim:        14820 kB
PageTables:         7832 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6216004 kB
Committed_AS:    1221620 kB
VmallocTotal:     122880 kB
VmallocUsed:       30868 kB
VmallocChunk:      85600 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10232 kB
DirectMap2M:      899072 kB
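
As a quick sanity check on the output above, the fields can be parsed and the high and low memory totals added up. This is a sketch with a made-up helper name (parse_meminfo); the three sample values are copied from the listing.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            info[name.strip()] = int(rest.split()[0])
    return info


sample = """\
MemTotal:        4046100 kB
HighTotal:       3209928 kB
LowTotal:         836172 kB
"""
info = parse_meminfo(sample)
# On this 32-bit PAE kernel, high memory plus low memory accounts for all of MemTotal.
print(info["HighTotal"] + info["LowTotal"] == info["MemTotal"])  # True
```

On a live system the same function could be fed the contents of /proc/meminfo instead of the sample string.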

Comment 39 Kyle McMartin 2009-05-20 14:05:32 UTC
Cool, I'm very glad to hear it. I've tossed it into -153.

thanks for testing!
 kyle