Description of problem:
Using X on i965 graphics under dom0 often panics the kernel, which then spontaneously reboots the machine. Operations that can trigger this:
1) starting X (has happened in gdm)
2) running a GL app
3) typing in a terminal (!)

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/mm/pageattr-xen.c:309
invalid opcode: 0000 [1] SMP
last sysfs file: /class/drm/card0/dev
CPU 1
Modules linked in: nfs lockd fscache nfs_acl i915 drm netconsole bridge netloop netbk blktap blkbk autofs4 hidp rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_tables xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport intel_rng snd_hda_intel snd_hda_codec sr_mod cdrom snd_seq_dummy snd_seq_oss sg snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 e1000 i2c_core pcspkr serio_raw serial_core snd_timer snd soundcore shpchp snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ahci libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 3050, comm: Xorg Not tainted 2.6.18-1.2745.el5xen #1
RIP: e030:[<ffffffff8024f677>] [<ffffffff8024f677>] __change_page_attr+0x78c/0xa5e
RSP: e02b:ffff88006ec67dd8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88000102b000 RCX: 0000000000000023
RDX: 7fffffffffffffff RSI: 0000000000000067 RDI: ffff880000000000
RBP: 0000000000000000 R08: ffff880001418968 R09: 0000000000000001
R10: 0000000000007f00 R11: 8000000000000063 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff80201880
FS: 00002aaaac62afb0(0000) GS:ffffffff8058d080(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process Xorg (pid: 3050, threadinfo ffff88006ec66000, task ffff8800607d0860)
Stack: ffffffff8027549e 8000000000000063 000000000005e846 ffffffff8026073d
0000000080000000 ffffffff802601d0 0000000000000000 0000000000000000
ffff880000000000 0000000000000000
Call Trace:
[<ffffffff8027549e>] flush_kernel_map+0x0/0x11
[<ffffffff8026073d>] _spin_lock_irq+0x9/0x14
[<ffffffff802601d0>] __down_write_nested+0x35/0x9a
[<ffffffff80275b68>] change_page_attr_addr+0x91/0x11a
[<ffffffff80381250>] agp_generic_destroy_page+0x4e/0x7a
[<ffffffff80381130>] agp_free_memory+0x65/0x90
[<ffffffff80380347>] agp_release+0x9f/0x18a
[<ffffffff802124e1>] __fput+0xae/0x198
[<ffffffff80223669>] filp_close+0x5c/0x64
[<ffffffff8021d22a>] sys_close+0x88/0xa2
[<ffffffff8025c65d>] tracesys+0xa7/0xb2
Code: 0f 0b 68 bd 77 46 80 c2 35 01 48 ff c8 49 89 40 10 eb 0a 0f
RIP [<ffffffff8024f677>] __change_page_attr+0x78c/0xa5e
RSP <ffff88006ec67dd8>
<0>Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
2.6.18-1.2745.el5xen

How reproducible:
Often. Basically, X + Xen is unusable.

Steps to Reproduce:
1. Install tree.
2. Boot Xen domain0.
3. Start X.
4. Use X for a while, restart X, or just wait.
Fixing the subject, as Xen reboots on panic by default.
As a wild guess, agp_release isn't paranoid enough to nop away double-frees or frees of invalid regions.
A kernel built with most CONFIG_DEBUG_* options enabled yields:
Xorg: Corrupted page table at address a7f300
PGD 6706e067 PUD 695bf067 PMD 5d39c067 PTE 74992fff
Bad pagetable: 000f [1] SMP
last sysfs file: /class/drm/card0/dev
*** Bug 214287 has been marked as a duplicate of this bug. ***
As an interesting data point (though I have no idea what it means): if you boot into runlevel 5, this happens with the first X instance (if you're using GL). If you boot into runlevel 3 and run 'telinit 5', it doesn't crash until the second or third time X is started.
And here's one with DRI disabled:
swap_free: Bad swap offset entry 1000000000000
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/rmap.c:587
invalid opcode: 0000 [1] SMP
last sysfs file: /class/xen/blktap0/dev
CPU 0
Modules linked in: nfs lockd fscache nfs_acl xt_physdev x_tables netconsole bridge netloop netbk blktap blkbk autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport intel_rng snd_hda_intel snd_hda_codec sr_mod cdrom snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss sg snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 e1000 serial_core i2c_core shpchp pcspkr serio_raw dm_snapshot dm_zero dm_mirror dm_mod ahci libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 3806, comm: python Not tainted 2.6.18-1.2746.el5xen #1
RIP: e030:[<ffffffff8020adfa>] [<ffffffff8020adfa>] page_remove_rmap+0x13/0x2c
RSP: e02b:ffff880039dc9c40 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff880001a165b8 RCX: 030000001f1d1d1d
RDX: 0000000000000000 RSI: 0000000057ad6120 RDI: ffff880001a165b8
RBP: 000000001f1d1d00 R08: 0000000000074992 R09: 0000000000006400
R10: 0000003eec201000 R11: 0000000000000000 R12: 0000003eec201000
R13: ffff880041f94008 R14: ffff880038c93b00 R15: 0000003eec208000
FS: 00002aaaaaabdf40(0000) GS:ffffffff8058e000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process python (pid: 3806, threadinfo ffff880039dc8000, task ffff8800411f9100)
Stack: ffffffff80207972 0000000000000000 ffff880039dc9d18 ffffffffffffffff
0000000000000000 ffff880037d6d5d0 ffff880039dc9d20 000000000003931e
0000000000000000 0000000138c93b00
Call Trace:
[<ffffffff80207972>] unmap_vmas+0x793/0xae7
[<ffffffff80239cea>] exit_mmap+0x7d/0xf8
[<ffffffff8023c02a>] mmput+0x30/0x83
[<ffffffff80214fd5>] do_exit+0x288/0x89a
[<ffffffff80247aa3>] cpuset_exit+0x0/0x6b
[<ffffffff8022aaab>] get_signal_to_deliver+0x439/0x46c
[<ffffffff8025a182>] do_notify_resume+0x9c/0x7b4
[<ffffffff80280afb>] task_rq_lock+0x3f/0x71
[<ffffffff802458c4>] try_to_wake_up+0x365/0x376
[<ffffffff8028d912>] signal_wake_up+0x1e/0x2d
[<ffffffff8028e40b>] specific_send_sig_info+0xa4/0xaf
[<ffffffff8028e682>] force_sig_info+0xa9/0xb3
[<ffffffff80269537>] do_stack_segment+0x84/0x8b
[<ffffffff8025cace>] retint_signal+0x5d/0xb7
Code: 0f 0b 68 21 b3 46 80 c2 4b 02 8b 77 18 83 f6 01 83 e6 01 e9
RIP [<ffffffff8020adfa>] page_remove_rmap+0x13/0x2c
RSP <ffff880039dc9c40>
<1>Fixing recursive fault but reboot is needed!
Proposing as RC blocker. Xen dom0 + X + i965 = panic-o-rama.
Aha, we have an interesting data point. This is x86-64 specific - the i386 xen kernel is fine.
*** Bug 214650 has been marked as a duplicate of this bug. ***
This really shouldn't be assigned to me; I don't understand the agpgart code at all.
Should be fixed by adding the include/asm-x86_64/mach-xen/asm/agp.h file, as in bug 217715. *** This bug has been marked as a duplicate of 217715 ***