Bug 493526
| Field | Value |
|---|---|
| Summary | FC11 Beta install will not completely boot up with PAE kernel... |
| Product | [Fedora] Fedora |
| Component | xorg-x11-drv-intel |
| Version | rawhide |
| Hardware | All |
| OS | Linux |
| Status | CLOSED RAWHIDE |
| Severity | medium |
| Priority | low |
| Reporter | Shannon McMackin <smcmackin> |
| Assignee | Kristian Høgsberg <krh> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | ajax, darrellpf, dcantrell, kernel-maint, kmcmartin, mcepl, mefoster, nathaniel, nsoranzo, quintela, smcmackin, sundaram, tmarikle, wwoods, xgl-maint |
| Fixed In Version | 2.6.29.3-153 |
| Doc Type | Bug Fix |
| Last Closed | 2009-05-20 14:05:32 UTC |
| Bug Blocks | 446452 |
Description
Shannon McMackin
2009-04-02 05:27:33 UTC
After trying the LiveCD, I found that it's the PAE kernel that will not boot on my system. The DVD installs the PAE kernel because it detects 4GB of RAM; the LiveCD installs the generic kernel, which boots fine. Here's what I managed to get out of /var/log/messages:

```
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel: =======================================================
Apr 2 16:23:39 localhost kernel: [ INFO: possible circular locking dependency detected ]
Apr 2 16:23:39 localhost kernel: 2.6.29.1-37.rc1.fc11.i686.PAE #1
Apr 2 16:23:39 localhost kernel: -------------------------------------------------------
Apr 2 16:23:39 localhost kernel: Xorg/2735 is trying to acquire lock:
Apr 2 16:23:39 localhost kernel: (&mm->mmap_sem){----}, at: [<c0499e2e>] might_fault+0x48/0x85
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel: but task is already holding lock:
Apr 2 16:23:39 localhost kernel: (&dev->struct_mutex){--..}, at: [<f825fbd4>] i915_gem_pwrite_ioctl+0x155/0x338 [i915]
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel: which lock already depends on the new lock.
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel: the existing dependency chain (in reverse order) is:
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel: -> #1 (&dev->struct_mutex){--..}:
Apr 2 16:23:39 localhost kernel: [<c0458fe4>] __lock_acquire+0x96e/0xacc
Apr 2 16:23:39 localhost kernel: [<c045919d>] lock_acquire+0x5b/0x81
Apr 2 16:23:39 localhost kernel: [<c0727dd9>] __mutex_lock_common+0xdd/0x338
Apr 2 16:23:39 localhost kernel: [<c07280db>] mutex_lock_nested+0x33/0x3b
Apr 2 16:23:39 localhost kernel: [<f82065e1>] drm_gem_mmap+0x36/0xfe [drm]
Apr 2 16:23:39 localhost kernel: [<c04a0fb0>] mmap_region+0x266/0x3e5
Apr 2 16:23:39 localhost kernel: [<c04a1384>] do_mmap_pgoff+0x255/0x2a5
Apr 2 16:23:39 localhost kernel: [<c040c588>] sys_mmap2+0x5f/0x80
Apr 2 16:23:39 localhost kernel: [<c040946b>] sysenter_do_call+0x12/0x3f
Apr 2 16:23:39 localhost kernel: [<ffffffff>] 0xffffffff
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel: -> #0 (&mm->mmap_sem){----}:
Apr 2 16:23:39 localhost kernel: [<c0458eb1>] __lock_acquire+0x83b/0xacc
Apr 2 16:23:39 localhost kernel: [<c045919d>] lock_acquire+0x5b/0x81
Apr 2 16:23:39 localhost kernel: [<c0499e4b>] might_fault+0x65/0x85
Apr 2 16:23:39 localhost kernel: [<f825fcd2>] i915_gem_pwrite_ioctl+0x253/0x338 [i915]
Apr 2 16:23:39 localhost kernel: [<f8205748>] drm_ioctl+0x1ab/0x224 [drm]
Apr 2 16:23:39 localhost kernel: [<c04be6ae>] vfs_ioctl+0x5c/0x76
Apr 2 16:23:39 localhost kernel: [<c04bec4e>] do_vfs_ioctl+0x483/0x4bd
Apr 2 16:23:39 localhost kernel: [<c04becce>] sys_ioctl+0x46/0x66
Apr 2 16:23:39 localhost kernel: [<c040946b>] sysenter_do_call+0x12/0x3f
Apr 2 16:23:39 localhost kernel: [<ffffffff>] 0xffffffff
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel: other info that might help us debug this:
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel: 1 lock held by Xorg/2735:
Apr 2 16:23:39 localhost kernel: #0: (&dev->struct_mutex){--..}, at: [<f825fbd4>] i915_gem_pwrite_ioctl+0x155/0x338 [i915]
Apr 2 16:23:39 localhost kernel:
Apr 2 16:23:39 localhost kernel: stack backtrace:
Apr 2 16:23:39 localhost kernel: Pid: 2735, comm: Xorg Not tainted 2.6.29.1-37.rc1.fc11.i686.PAE #1
Apr 2 16:23:39 localhost kernel: Call Trace:
Apr 2 16:23:39 localhost kernel: [<c0726c76>] ? printk+0x14/0x16
Apr 2 16:23:39 localhost kernel: [<c0458461>] print_circular_bug_tail+0x5d/0x68
Apr 2 16:23:39 localhost kernel: [<c0458eb1>] __lock_acquire+0x83b/0xacc
Apr 2 16:23:39 localhost kernel: [<c0499e2e>] ? might_fault+0x48/0x85
Apr 2 16:23:39 localhost kernel: [<c045919d>] lock_acquire+0x5b/0x81
Apr 2 16:23:39 localhost kernel: [<c0499e2e>] ? might_fault+0x48/0x85
Apr 2 16:23:39 localhost kernel: [<c0499e4b>] might_fault+0x65/0x85
Apr 2 16:23:39 localhost kernel: [<c0499e2e>] ? might_fault+0x48/0x85
Apr 2 16:23:39 localhost kernel: [<f825fcd2>] i915_gem_pwrite_ioctl+0x253/0x338 [i915]
Apr 2 16:23:39 localhost kernel: [<c0576713>] ? copy_from_user+0x32/0x119
Apr 2 16:23:39 localhost kernel: [<f8205748>] drm_ioctl+0x1ab/0x224 [drm]
Apr 2 16:23:39 localhost kernel: [<f825fa7f>] ? i915_gem_pwrite_ioctl+0x0/0x338 [i915]
Apr 2 16:23:39 localhost kernel: [<c04be6ae>] vfs_ioctl+0x5c/0x76
Apr 2 16:23:39 localhost kernel: [<c04bec4e>] do_vfs_ioctl+0x483/0x4bd
Apr 2 16:23:39 localhost kernel: [<c04569ca>] ? lock_release_holdtime+0x2b/0x123
Apr 2 16:23:39 localhost kernel: [<c04748ef>] ? audit_filter_syscall+0xcc/0xed
Apr 2 16:23:39 localhost kernel: [<c0474903>] ? audit_filter_syscall+0xe0/0xed
Apr 2 16:23:39 localhost kernel: [<c0475ac2>] ? audit_syscall_entry+0x163/0x185
Apr 2 16:23:39 localhost kernel: [<c04becce>] sys_ioctl+0x46/0x66
Apr 2 16:23:39 localhost kernel: [<c040946b>] sysenter_do_call+0x12/0x3f
```

I installed all of today's updates as recommended in the Fedora forum, with the same results. I also tried booting with `selinux=0 3` and running startx after logging in, and it locks up.
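The lockdep report above describes a lock-order inversion: on the mmap path (`sys_mmap2` → `drm_gem_mmap`), `struct_mutex` is taken while `mmap_sem` is held, and on the pwrite path (`i915_gem_pwrite_ioctl` → `might_fault`), `mmap_sem` may be taken while `struct_mutex` is held. The toy Python sketch below shows how a lockdep-style checker flags this: it records an edge "held → acquired" for each acquisition and reports when a new edge closes a cycle. It is not the kernel's lockdep implementation, and `LockOrderChecker` and its methods are hypothetical names. Later comments in this thread suggest this warning was not the cause of the hang.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order checker (hypothetical; not the kernel's lockdep).

    Acquiring lock `new` while holding lock `h` records the edge h -> new.
    If `new` can already reach `h` through recorded edges, the new edge
    closes a cycle, i.e. a potential ABBA deadlock.
    """

    def __init__(self):
        self.edges = defaultdict(set)  # lock name -> locks taken while it is held

    def _reachable(self, start, target):
        # Iterative depth-first search over the recorded dependency graph.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record acquiring `new` while holding `held`; return conflicting held locks."""
        conflicts = [h for h in held if self._reachable(new, h)]
        for h in held:
            self.edges[h].add(new)
        return conflicts


checker = LockOrderChecker()
# mmap path: mmap_sem is held when drm_gem_mmap takes struct_mutex.
print(checker.acquire(["mmap_sem"], "struct_mutex"))  # prints []
# pwrite path: struct_mutex is held when a user-copy fault may take mmap_sem.
print(checker.acquire(["struct_mutex"], "mmap_sem"))  # prints ['struct_mutex']
```

Each ordering is harmless on its own; only the combination of both paths produces the cycle lockdep warns about.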
Then I tried booting with `selinux=0 3 nomodeset` and still had the same hard lock. Here's the entry from /var/log/messages:

```
Apr 3 17:51:04 localhost kernel: reserve_memtype: calling reserve_ram_pages_type for 0x32d99000 0x32d9a000 16
```

This was with nomodeset; the previous boot logged no messages at all. The line above repeats thousands of times.

Thanks for the bug report. We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue. Please attach your X server config file (/etc/X11/xorg.conf, if available) and X server log file (/var/log/Xorg.*.log) to the bug report as individual uncompressed file attachments using the bugzilla file attachment link below. We will review this issue again once you've had a chance to attach this information. Thanks in advance.

Created attachment 338385 [details]
xorg.conf
I created this xorg.conf to see if it would improve the situation. The earlier failures occurred with no xorg.conf at all, which is the default.
I would like to post a copy of the Xorg.log, but its size is 0 bytes, so it never seems to get written because of the hang. I will try booting the PAE kernel to runlevel 3 and then running startx to see if that generates something.

Created attachment 338387 [details]
Xorg.log file
This file was generated by booting the PAE kernel to run-level 3 and then issuing startx from the command-line of a logged in user account.
As an update, I installed the newest kernel from koji, 2.6.29.1-52.fc11.i686-PAE, and it made no difference. Same lockup.

I applied all of today's updates and have the same symptom with the latest -54 PAE kernel, as well as with the latest Intel driver package.

I reinstalled the x86 version from snap1, installed the latest updates and the -68 PAE kernel, and had the same problem.

I forgot to add that I've also been testing the x86_64 build and have not seen this issue with that kernel.

The -70 PAE and -85 PAE kernels still have the same problem. I'm also updating Xorg with every update I find. I've also confirmed this is happening on other Intel-based systems trying to use the PAE kernel. Any update on the status?

I went to the -104 PAE kernel and still no luck. I even implemented a basic xorg.conf file with the vesa driver and X wouldn't start. I'm starting to think that this has more to do with the PAE kernel than it does with the intel driver.

(In reply to comment #13)
> I went to the -104 PAE kernel and still no luck. I even implemented a basic
> xorg.conf file with the vesa driver and X wouldn't start. I'm starting to
> think that this has more to do with the PAE kernel than it does with the intel
> driver.

Try the i586 kernel...

The i586 kernel works fine, and so does the x86_64 kernel. That leads me to believe there's something wrong with the PAE kernel, which is what I originally filed this bug against. I would like to use the PAE kernel because I have 4GB of RAM and don't want to mix 32- and 64-bit libraries on the same system.

I reinstalled the preview release and applied updates. Still the same issue with the -111 PAE kernel. I can boot into runlevel 3, but when I execute startx, it hangs. The Xorg.log file for the corresponding boot is always 0 bytes. Is there anything else I can provide that would help debug this issue?

Created attachment 341847 [details]
Xorg.log that shows HAL is not presenting devices to the PAE kernel
The previous comment and attachment were produced by booting to runlevel 3 and executing startx from a user account.

I experienced the same issue with my Intel i965 laptop. Based on comments about 4GB of RAM and the PAE kernel, I reinstalled with 2GB of RAM removed, and the install was successful. I was surprised to see that the PAE kernel was installed again with only 2GB of RAM in the system. At any rate, after the successful install, I put the additional 2GB of RAM back in, and the system hung just before switching to graphical X mode.

Someone suggested (here: http://forums.fedoraforum.org/showthread.php?t=219710) that I restart the system with `selinux=0 3` and compare the Xorg.0.log file with one from a successful boot to see what might have caused the issue. I noted that the 4GB test failed as it was starting RandR. Log excerpt:

```
(==) intel(0): Backing store disabled
(==) intel(0): Silken mouse enabled
(II) intel(0): Initializing HW Cursor
(II) intel(0): Fixed memory allocation layout:
(II) intel(0): 0x00000000-0xffffffff: DRI memory manager (0 kB)
(II) intel(0): 0x00000000: end of aperture
(II) intel(0): BO memory allocation layout:
(II) intel(0): 0x00000000: start of memory manager
(II) intel(0): 0x00bd7000-0x00fbefff: front buffer (4000 kB) X tiled
(II) intel(0): 0x00bc7000-0x00bd0fff: HW cursors (40 kB)
(II) intel(0): 0x00000000: end of memory manager
***4gb test fails here***
(II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
(II) intel(0): DPMS enabled
(==) intel(0): Intel XvMC decoder disabled
(II) intel(0): Set up textured video
(II) intel(0): direct rendering: DRI2 Enabled
(--) RandR disabled
(II) Initializing built-in extension Generic Event Extension
(II) Initializing built-in extension SHAPE
```

Following that, another suggestion was made to add a kernel parameter to the GRUB entry: `mem=3072M`. That was successful, and I have since bumped it up to `mem=4095M`, which also works.
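The `mem=` values tried in this thread (3072M, 4095M, 4096M, and later 3G) use size suffixes. The kernel parses `mem=` with its `memparse()` helper, where K, M and G multiply by 2^10, 2^20 and 2^30, so 3072M and 3G are the same size. Below is a simplified Python re-implementation of just those suffix rules, for illustration only; `parse_mem_param` is a hypothetical name, and unlike the kernel it accepts decimal numbers only.

```python
import re

# Shift amounts for the size suffixes used in this thread (K, M, G).
# Simplified: the kernel's memparse() also accepts hex/octal numbers and,
# in newer kernels, larger suffixes; this sketch handles decimal only.
_SHIFTS = {"k": 10, "m": 20, "g": 30}


def parse_mem_param(value: str) -> int:
    """Convert a mem= value such as '3072M' or '3G' into bytes."""
    match = re.fullmatch(r"(\d+)([KkMmGg]?)", value)
    if match is None:
        raise ValueError(f"unsupported mem= value: {value!r}")
    number, suffix = match.groups()
    return int(number) << _SHIFTS.get(suffix.lower(), 0)


GIB = 1 << 30
for value in ("3072M", "3G", "4095M", "4096M"):
    size = parse_mem_param(value)
    print(f"mem={value}: {size} bytes ({size / GIB:.3f} GiB)")
```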
This is a nasty bug and should be considered a release blocker. There are a *lot* of systems out there right now that will trigger it. Further, it's not a bug that can be fixed by an update, since people can end up with broken installs.

I was able to bump my mem= setting up to 4096M, and I can log into X. Here's the catch: I want to run the PAE kernel so I can make use of all 4GB of my RAM. I think it's due to the mem= setting, but I can only use 3GB of RAM. I have a T7300 Core 2 Duo, and with other PAE-enabled kernels I'm able to see all 4GB in /proc/meminfo. If I boot to runlevel 3 without the mem= option, I can also see all 4GB in /proc/meminfo.

Adding it to the blocker list. Whoever is managing it, please review.

Trying the -129 PAE kernel, I still have the same condition. mem=4096M allows me to boot, but I can only use 3GB of the 4GB of RAM in my system.

*** Bug 496283 has been marked as a duplicate of this bug. ***

Trying the -132 PAE kernel yields the same result. I had hopes when I saw krh scheduled a build, but it must have been for a different Intel-related issue.

Tried the -135 PAE kernel with the same results. I've posted the Xorg.log.old for any clues. I only see some warnings about a vendor ID block.

Created attachment 343197 [details]
Xorg.log file from -135 PAE kernel
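Several comments above report that mem=4096M lets X start but leaves only about 3GB usable. The kernel's parameter documentation describes `mem=` on x86 as a limit on the maximum physical address rather than a count of bytes, so on a machine where firmware places part of the RAM above 4 GiB (to leave room for PCI/MMIO devices below 4 GiB), a 4096M cap drops that remapped RAM. The Python sketch below illustrates the arithmetic with a made-up memory map; the reporter's actual map was not posted, so the 3 GiB + 1 GiB split and the `usable_below` helper are assumptions for illustration.

```python
from typing import List, Tuple

GIB = 1 << 30

# Hypothetical e820-style map of usable RAM on a machine with 4 GiB installed:
# firmware leaves a 1 GiB hole below 4 GiB for PCI/MMIO and remaps the
# displaced RAM above 4 GiB. Real maps differ from machine to machine.
EXAMPLE_MAP: List[Tuple[int, int]] = [
    (0x0000_0000, 0xC000_0000),      # 0 .. 3 GiB
    (0x1_0000_0000, 0x1_4000_0000),  # 4 GiB .. 5 GiB (remapped RAM)
]


def usable_below(ram_ranges, limit=None):
    """Bytes of RAM in half-open [start, end) ranges that lie below `limit`.

    Models mem= on x86 as a cap on the highest usable physical address;
    limit=None means no cap.
    """
    total = 0
    for start, end in ram_ranges:
        if limit is not None:
            end = min(end, limit)
        if end > start:
            total += end - start
    return total


print(usable_below(EXAMPLE_MAP) / GIB)              # no mem= option: prints 4.0
print(usable_below(EXAMPLE_MAP, 4096 << 20) / GIB)  # mem=4096M: prints 3.0
```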
Not really related to the real bug, but why are you using a 32-bit OS on this hardware? You should be able to run the x86_64 system, which will allow you full use of your RAM and likely avoid this PAE bug.

For me, two reasons:

1. While running 64-bit Fedora, my total resident memory usage is consistently over 2GB. Running the same system as 32-bit, with the same applications installed, results in a total resident memory size of < 800M. Thus, my usable memory on 32-bit is ~3.2GB, while on 64-bit it drops to around 1.8GB.
2. I need to run 32-bit for work-related reasons; specifically, compiling 32-bit apps is far simpler on 32-bit. Yes, I know it is possible on x86_64, but it would waste precious time to set it up.

Additionally, I can think of a variety of reasons why people would run 32-bit on a 64-bit processor:

1. applications that won't run on 64-bit
2. your friend hands you a 32-bit CD and says "try this, it rocks!"
3. ignorance
4. working with data files saved in 32-bit (rrd files, for instance), ...

In short, the vast majority of the world is not running 64-bit yet (Windows, Mac OS X or Linux). While this will probably change within a year or two, there are many growing pains, and merely suggesting that people use 64-bit is not a sufficient "remedy".

Thanks, Nate. Most of what he said, and also:

1. I'd like to avoid the bloat of installing 32- and 64-bit libs on the same system.
2. PAE does what I need, and the answer to a bug like this shouldn't just be "use x86_64".
3. Lack of 64-bit plugins for Firefox.

The -142 PAE kernel still fails. Comparing the logs, the attempt without the mem=4096M append stops at the point where RandR is initializing. There's no clear entry indicating why RandR isn't initializing.

I know it's not proper etiquette to bump a bug, but I've seen no movement at all on this and the release date is 8 days away. Is anyone working on this? Does anyone care about the impact of PAE not working with what should be considered common hardware?
The developers are hard at work on this bug. If I understand everything correctly, this patch has been proposed as a fix: http://kyle.fedorapeople.org/gem_dma32_on_pae.diff

It's now in the process of being built and tested. We still plan to have this fixed before releasing F11 if at all possible.

The problem with 4 gig and PAE kernels is also mentioned at https://bugzilla.redhat.com/show_bug.cgi?id=488633. I was originally thinking the locking error was causing a hang, but booting with mem=3G was the workaround.

Will, thanks for the comment. I look forward to trying the patched kernel as soon as possible. Darrell, I can add mem=4096M to my grub.conf, but the machine will then only see 3GB of RAM. If I boot to runlevel 3 with no kernel append, /proc/meminfo shows all 4GB of RAM.

Please try the i686-PAE scratch build here, which may fix the issue: http://koji.fedoraproject.org/koji/taskinfo?taskID=1365129

thanks, Kyle

Works like a charm... Well done and thank you very much...

```
[SMcMackin@localhost Desktop]$ cat /proc/meminfo
MemTotal: 4046100 kB
MemFree: 3193444 kB
Buffers: 31000 kB
Cached: 433188 kB
SwapCached: 0 kB
Active: 555044 kB
Inactive: 213360 kB
Active(anon): 471552 kB
Inactive(anon): 92 kB
Active(file): 83492 kB
Inactive(file): 213268 kB
Unevictable: 8 kB
Mlocked: 8 kB
HighTotal: 3209928 kB
HighFree: 2620244 kB
LowTotal: 836172 kB
LowFree: 573200 kB
SwapTotal: 4192956 kB
SwapFree: 4192956 kB
Dirty: 88 kB
Writeback: 0 kB
AnonPages: 304216 kB
Mapped: 82556 kB
Slab: 26040 kB
SReclaimable: 11220 kB
SUnreclaim: 14820 kB
PageTables: 7832 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 6216004 kB
Committed_AS: 1221620 kB
VmallocTotal: 122880 kB
VmallocUsed: 30868 kB
VmallocChunk: 85600 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 10232 kB
DirectMap2M: 899072 kB
```
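The /proc/meminfo output above can be sanity-checked: on a 32-bit kernel with highmem, MemTotal is the sum of LowTotal and HighTotal, and here 836172 kB + 3209928 kB = 4046100 kB (about 3.86 GiB), rather than the ~3GB seen with mem=4096M. The Python sketch below parses meminfo-style text and performs that check; `parse_meminfo` is a hypothetical helper, and `SAMPLE` is a subset of the output posted above.

```python
# A few lines copied from the /proc/meminfo output posted above.
SAMPLE = """\
MemTotal: 4046100 kB
MemFree: 3193444 kB
HighTotal: 3209928 kB
HighFree: 2620244 kB
LowTotal: 836172 kB
LowFree: 573200 kB
"""


def parse_meminfo(text):
    """Return a dict of field name -> numeric value from /proc/meminfo text."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip anything that is not a "Key: value" line
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # value in kB (or a count)
    return fields


info = parse_meminfo(SAMPLE)
# On a 32-bit highmem kernel, MemTotal is split into LowTotal + HighTotal.
assert info["MemTotal"] == info["LowTotal"] + info["HighTotal"]
print(f"MemTotal: {info['MemTotal'] / (1024 * 1024):.2f} GiB")  # prints MemTotal: 3.86 GiB
```

On a live system, passing `open("/proc/meminfo").read()` instead of `SAMPLE` would run the same check against the current kernel.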
Cool, I'm very glad to hear it. I've tossed it into -153. thanks for testing!

kyle
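The fix that went into -153 is the patch named gem_dma32_on_pae.diff. Judging only from its name, it appears to make GEM allocate buffer pages from the DMA32 zone (physical addresses below 4 GiB) on PAE kernels; that reading is an assumption, since the patch itself is not quoted in this report. The constraint behind it is plain arithmetic: hardware that can only generate 32-bit physical addresses cannot reach a page at or above 4 GiB, and only a PAE kernel hands out such pages on 32-bit x86. A non-PAE 32-bit kernel cannot address physical memory above 4 GiB at all, which is consistent with the i586 kernel working here. The Python sketch below expresses the check with a hypothetical helper; it is not kernel code.

```python
DMA32_LIMIT = 1 << 32  # first physical address a 32-bit DMA mask cannot reach
PAGE_SIZE = 4096       # x86 base page size in bytes


def page_is_dma32_addressable(pfn: int) -> bool:
    """True if the whole 4 KiB page with page frame number `pfn` lies below 4 GiB.

    Hypothetical helper illustrating the DMA32 constraint; not kernel code.
    """
    last_byte = (pfn + 1) * PAGE_SIZE - 1
    return last_byte < DMA32_LIMIT


# With PAE and RAM mapped above 4 GiB, the kernel may hand out
# page frames at or beyond pfn 0x100000 (physical address 4 GiB).
for pfn in (0x0, 0xFFFFF, 0x100000, 0x13FFFF):
    addr = pfn * PAGE_SIZE
    print(f"pfn {pfn:#x} (phys {addr:#x}): below 4 GiB = {page_is_dma32_addressable(pfn)}")
```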