Description of problem:
Switching from X to tty1 and back again causes a hard freeze. The display freezes with corrupt symbols on it, and the machine becomes un-pingable.

Version-Release number of selected component (if applicable):

How reproducible:
Every time.

Steps to Reproduce:
1. Start X
2. Switch to tty1
3. Switch to tty7

Actual results:
crash

Expected results:
X screen

Additional info:
I see this too.

(II) DevInputMice: ps2EnableDataReporting: succeeded
(**) RADEON(0): RADEONSaveScreen(2)
(**) RADEON(0): RADEONLeaveVT
(**) RADEON(0): RADEONRestore
(**) RADEON(0): Ok, leaving now...
(**) RADEON(0): RADEONEnterVT

*** If unresolved symbols were reported above, they might not
*** be the reason for the server aborting.

Backtrace:
0: X(xf86SigHandler+0xa8) [0x100839e8]
1: [0x100374]
2: /usr/lib/xorg/modules/drivers/radeon_drv.so(RADEONEnterVT+0x7c) [0xe800e6c]
Program received signal SIGBUS, Bus error.
0x0e7fff58 in RADEONEnterVT (scrnIndex=0, flags=0) at /usr/include/xorg/compiler.h:1148
1148 __asm__ __volatile__(
(gdb) p info->MMIO
$4 = (unsigned char *) 0x30002000 <Address 0x30002000 out of bounds>
(gdb) p RADEONMMIO
$5 = (unsigned char *) 0x0
Add on a 'me too' for a Powerbook G4 5,4 (radeon 9600 mobility), using radeon driver, dri disabled, fbdev enabled. I also get a similar freeze when coming out of suspend (not 100% sure it's related but I would expect so...) Happy to do some testing if there are any ideas.
'RADEONMMIO == 0' above is a red herring -- it happened to be using the same register both for the address, and to load the result. But something very bizarre is happening in RADEONEnterVT().
The bizarreness seems to happen when something is logged -- with RADEONTRACE() or xf86DrvMsg(). The PCI config space gets screwed up, and you start to get bus errors when reading the device -- even with radeontool.

From a breakpoint in RADEONEnterVT() I can let X run freely to another breakpoint slightly later, and the PCI config space appears to remain sane. However, if I let it run through the call to ErrorF(), that's when strange things happen. When I _removed_ the RADEONTRACE() from the beginning of RADEONEnterVT(), the problem didn't occur there -- it happened later in RADEONWaitForIdleMMIO() instead.

If I just stop with gdb on the breakpoint and 'p xf86DrvMsg(0,6,"fish\n")' the device also gets screwed. This is what I see...

 00:10.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] (prog-if 00 [VGA])
-	Subsystem: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10]
+	Subsystem: Unknown device 3030:202c
 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
 	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
-	Latency: 255 (2000ns min), Cache Line Size 08
+	Latency: 82 (2000ns min), Cache Line Size 20
 	Interrupt: pin A routed to IRQ 48
 	Region 0: Memory at b8000000 (32-bit, prefetchable) [size=128M]
 	Region 1: I/O ports at f0000400 [size=256]
 	Region 2: Memory at b0000000 (32-bit, non-prefetchable) [size=64K]
 	Expansion ROM at f1000000 [disabled] [size=128K]
 	Capabilities: [58] AGP version 2.0
 		Status: RQ=80 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4
-		Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP- GART64- 64bit- FW- Rate=<none>
+		Command: RQ=49 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW+ Rate=x2,x4
 	Capabilities: [50] Power Management version 2
 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
-00: 02 10 50 4e 07 00 b0 02 00 00 00 03 08 ff 00 00
-10: 08 00 00 b8 01 04 00 00 00 00 00 b0 00 00 00 00
-20: 00 00 00 00 00 00 00 00 00 00 00 00 02 10 50 4e
-30: 00 00 00 f1 58 00 00 00 00 00 00 00 30 01 08 00
-40: 00 00 00 00 00 00 00 00 00 00 00 00 02 10 50 4e
+00: 02 10 50 4e 07 00 b0 02 00 00 00 03 20 52 00 00
+10: 08 00 00 28 01 29 3a 20 00 00 44 45 00 00 00 00
+20: 00 00 00 00 00 00 00 00 00 00 00 00 30 30 2c 20
+30: 00 00 30 29 58 00 00 00 00 00 00 00 4d 01 08 00
+40: 00 00 00 00 00 00 00 00 00 00 00 00 30 30 2c 20
 50: 01 00 02 06 00 00 00 00 02 50 20 00 17 02 00 4f
-60: 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+60: 16 03 00 30 00 00 00 00 00 00 00 00 00 00 00 00
 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-80: 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+80: 05 00 20 00 48 66 20 75 6e 72 00 00 00 00 00 00
 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
This persists when I use the ati-1-0-branch from CVS, when I build with gcc 3.2, and when I boot with 'video=radeonfb.noaccel=1' or even with 'video=ofonly'. As I said before, it doesn't matter whether I have 'usbfbdev' enabled or not either.
Tracing through it with gdb. I didn't spot the _precise_ moment when the latency and cache line size got changed, but it was deep in glibc code at the time. Then another time through the same code, that's when the rest got changed and it started to bus error. It was somewhere in here...

0x0f6994b8 in write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc 0xf6994b8 <write>: lwz r10,-29792(r2)
(gdb)
0x0f6994bc in write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc 0xf6994bc <write+4>: cmpwi r10,0
(gdb)
0x0f6994c0 in write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc 0xf6994c0 <write+8>: bne- 0xf6994d4 <__write_nocancel+16>
(gdb)
0x0f6994c4 in __write_nocancel () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc 0xf6994c4 <__write_nocancel>: li r0,4
(gdb)
0x0f6994c8 in __write_nocancel () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc 0xf6994c8 <__write_nocancel+4>: sc
(gdb)
0x0f63687c in _IO_new_file_write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc 0xf63687c <_IO_new_file_write+108>: cmpwi cr7,r3,0
(gdb)
0x0f636880 in _IO_new_file_write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc 0xf636880 <_IO_new_file_write+112>: add r29,r29,r3
(gdb)
0x0f636884 in _IO_new_file_write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc 0xf636884 <_IO_new_file_write+116>: bge+ cr7,0xf636850 <_IO_new_file_write+64>
(gdb)
0x0f636850 in _IO_new_file_write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc 0xf636850 <_IO_new_file_write+64>: subf. r31,r3,r31
(gdb) i reg
r0 0x0 0
r1 0x7fd0dd20 2144394528
r2 0x3002a210 805478928
r3 0xe 14
r4 0xf63687c 258173052
r5 0x20084482 537412738
r6 0xe 14
r7 0xf6994cc 258577612
r8 0xd432 54322
r9 0x0 0
r10 0x1032 4146
r11 0x0 0
r12 0xee0a8000 3993665536
r13 0x1021dc70 270654576
r14 0x10215f54 270622548
r15 0x10215e88 270622344
r16 0x10215f80 270622592
r17 0x10215e78 270622328
r18 0x10215e60 270622304
r19 0x10215ef4 270622452
r20 0x7fd0e260 2144395872
r21 0x0 0
r22 0x102167ac 270624684
r23 0x10215ee4 270622436
r24 0x0 0
r25 0xe 14
r26 0xe 14
r27 0xe 14
r28 0x10246480 270820480
r29 0x102225de 270673374
r30 0xf72dff4 259186676
r31 0xe 14
pc 0xf636850 258173008
cr 0x20084484 537412740
lr 0xf63687c 258173052
ctr 0xc001cc74 3221343348
xer 0x0 0
(gdb)

This is part of /proc/$$/maps...

10246000-107b1000 rwxp 10246000 00:00 0 [heap]
30000000-30002000 rw-p 30000000 00:00 0
30021000-30024000 rw-p 30021000 00:00 0
30024000-30044000 rw-s f0000000 00:0e 1577 /dev/mem
30044000-30153000 r-xp 00000000 03:04 263157 /usr/lib/libstdc++.so.6.0.8
30153000-30163000 ---p 0010f000 03:04 263157 /usr/lib/libstdc++.so.6.0.8
30163000-30166000 r--p 0010f000 03:04 263157 /usr/lib/libstdc++.so.6.0.8
30166000-30169000 rw-p 00112000 03:04 263157 /usr/lib/libstdc++.so.6.0.8
30169000-3016e000 rw-p 30169000 00:00 0
3016e000-301ee000 rw-s b0000000 00:0e 1577 /dev/mem
301ee000-341ee000 rw-s b8000000 00:0e 1577 /dev/mem
7fcfa000-7fd0f000 rw-p 7fcfa000 00:00 0 [stack]

Note that R2 is 0x3002a210 (in the /dev/mem map) and R2-29792 is 0x30022db0, which is in the previous anonymous mapping. It wasn't that instruction which did it, anyway -- it could well have been the system call. Will try again and be more careful around that area...
Same issue here, on a PowerBook Pismo 500. When switching from text mode to X, and also when attempting to put the computer into suspend mode, the screen turns strange colours, as if it were burning, and the computer stays on but is totally frozen.
It's definitely happening _on_ the write() syscall. This is the one where the cache line size and latency change:

(gdb)
0x0f6994c8 in ?? () from /lib/libc.so.6
2: x/i $pc 0xf6994c8: sc
(gdb) i reg
r0 0x4 4
r1 0x7f898d20 2139720992
r2 0x3002a210 805478928
r3 0x0 0
r4 0x102225d0 270673360
r5 0x10 16
r6 0x10246480 270820480
r7 0x7f7f7f7f 2139062143
r8 0x80000000 2147483648
r9 0x0 0
r10 0x0 0
r11 0xf636810 258172944
r12 0xffffffff 4294967295
r13 0x1021dc70 270654576
r14 0x10215f54 270622548
r15 0x10215e88 270622344
r16 0x10215f80 270622592
r17 0x10215e78 270622328
r18 0x10215e60 270622304
r19 0x10215ef4 270622452
r20 0x7f899260 2139722336
r21 0x0 0
r22 0x102167ac 270624684
r23 0x10215ee4 270622436
r24 0x0 0
r25 0x10 16
r26 0x10 16
r27 0x10 16
r28 0x10246480 270820480
r29 0x102225d0 270673360
r30 0xf72dff4 259186676
r31 0x10 16
pc 0xf6994c8 258577608
cr 0x20084482 537412738
lr 0xf63687c 258173052
ctr 0xf636810 258172944
xer 0x20000000 536870912
(gdb) si

Breakpoint 2, 0x0f6994cc in ?? () from /lib/libc.so.6
2: x/i $pc 0xf6994cc: bnslr+
And this is the one where the Radeon just turns to goo:

0x0f6994c8 in ?? () from /lib/libc.so.6
2: x/i $pc 0xf6994c8: sc
(gdb) i reg
r0 0x4 4
r1 0x7f898d20 2139720992
r2 0x3002a210 805478928
r3 0x0 0
r4 0x102225d0 270673360
r5 0xe 14
r6 0x10246480 270820480
r7 0x7f7f7f7f 2139062143
r8 0x8000 32768
r9 0x0 0
r10 0x0 0
r11 0xf636810 258172944
r12 0xffffffff 4294967295
r13 0x1021dc70 270654576
r14 0x10215f54 270622548
r15 0x10215e88 270622344
r16 0x10215f80 270622592
r17 0x10215e78 270622328
r18 0x10215e60 270622304
r19 0x10215ef4 270622452
r20 0x7f899260 2139722336
r21 0x0 0
r22 0x102167ac 270624684
r23 0x10215ee4 270622436
r24 0x0 0
r25 0xe 14
r26 0xe 14
r27 0xe 14
r28 0x10246480 270820480
r29 0x102225d0 270673360
r30 0xf72dff4 259186676
r31 0xe 14
pc 0xf6994c8 258577608
cr 0x20084482 537412738
lr 0xf63687c 258173052
ctr 0xf636810 258172944
xer 0x20000000 536870912
(gdb) si

Breakpoint 2, 0x0f6994cc in ?? () from /lib/libc.so.6
2: x/i $pc 0xf6994cc: bnslr+
(gdb)
In each case, r3 (the file descriptor argument to write()) is zero. According to /proc/$$/fd, file descriptor #0 is /proc/bus/pci/00/10.0
It goes like this...

[0f6994cc] write(2, "Ok, leaving now...\n", 19) = 19
[0f6994cc] write(0, "Ok, leaving now...\n", 19) = 19
[0f6994cc] write(2, "(WW) RADEON(0): MMIO is 0x3016e0"..., 57) = 57
[0f6994cc] write(0, "(WW) RADEON(0): MMIO is 0x3016e0"..., 57) = 57
[0f699544] lseek(0, 4, SEEK_SET) = 4
[0f6994cc] write(0, "\4\0\260\2", 4) = 4
[0f699544] lseek(0, 4, SEEK_SET) = 4
[0f6994cc] write(0, "\4\0\260\2", 4) = 4
[0f699544] lseek(0, 4, SEEK_SET) = 4
[0f6994cc] write(0, "\4\0\260\2", 4) = 4
[0f6993d4] close(0) = 0
[0f697918] stat64("/proc/bus/pci/10", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[0f698cb4] open("/proc/bus/pci/10/13.0", O_RDWR) = 0

Sticking a breakpoint in to catch the close(), the backtrace is this:

Breakpoint 1, 0x0f6993d4 in ?? () from /lib/libc.so.6
(gdb) bt
#0 0x0f6993d4 in ?? () from /lib/libc.so.6
#1 0x100bfc18 in xf86CloseSerial ()
#2 0x0e8794e4 in xf86MouseProtocolNameToID () from /usr/lib/xorg/modules/input/mouse_drv.so
#3 0x100388cc in DisableDevice ()
#4 0x100850a0 in xf86Wakeup ()
#5 0x1004a9c8 in WakeupHandler ()
#6 0x1019c9b4 in WaitForSomething ()
#7 0x10045a6c in Dispatch ()
#8 0x10027260 in main ()
No, wrong backtrace. This is the _offending_ close():

(gdb) bt
#0 0x0f6993d0 in ?? () from /lib/libc.so.6
#1 0x100c3aa4 in ATScancode ()
#2 0x100c3dec in ATScancode ()
#3 0x100c2740 in pciReadWord ()
#4 0x100a0418 in initPciBusState ()
#5 0x1009dd08 in DisablePciBusAccess ()
#6 0x1007a864 in xf86AccessLeave ()
#7 0x1008511c in xf86Wakeup ()
#8 0x1004a9c8 in WakeupHandler ()
#9 0x1019c9b4 in WaitForSomething ()
#10 0x10045a6c in Dispatch ()
#11 0x10027260 in main ()
It's not ATScancode -- it's linuxPciOpenFile(). And it doesn't happen in my own build from what's in CVS for FC-5. But what's in CVS for FC-5 is _older_ than the 1.0.1-9 package which is actually released -- where _is_ that?
I don't know why there's any difference between my build and the official one -- they should be built with the same compiler (I suspect a compiler bug). Neither do I know why the latest package isn't actually in CVS. Nevertheless, my build of 1.0.1-9 is working fine. It's available from http://david.woodhou.se/Xorg-1.0.1-9_FC5.ppc -- just drop it on top of /usr/bin/Xorg Still confused though.
(In reply to comment #15)
> Nevertheless, my build of 1.0.1-9 is working fine. It's available from
> http://david.woodhou.se/Xorg-1.0.1-9_FC5.ppc -- just drop it on top of /usr/bin/Xorg

Be sure to set the permissions correctly when you do that.

chown root.root /usr/bin/Xorg
chmod 4711 /usr/bin/Xorg
Setting a watchpoint to see when the offending variable gets changed, I see that it's in readKernelMapping() in lnx_KbdMap.c. Gdb isn't wonderfully helpful though ...

Watchpoint 11: *(int *) 270615684

Old value = 9
New value = 0
KbdGetMapping (pInfo=0x10783e60, pKeySyms=0x7fcbe4f0, pModMap=0x7fcbe4fc "") at lnx_KbdMap.c:306
306 kbe.kb_table = tbl[j];
3: x/i $pc 0x100be8f8 <KbdGetMapping+184>: nop
2: /d fd = 0
(gdb) p k
Variable "k" is not available.
(gdb) p j
No symbol "j" in current context.
I can confirm David's Xorg binary works for me as well; suspend and tty switching are working again (Thanks David!).
Created attachment 127207 [details]
debugging patch

This patch shows what's going on. The helpfully-named global array 'map' is declared in hw/xfree86/common/xf86Keymap.h as follows:

static KeySym map[NUM_KEYCODES * GLYPHS_PER_KEY] = { ...

The loops in readKernelMapping() go over the end of that array, as demonstrated by the debugging output produced by this patch...

readKernelMapping. map 0x10213488 is of size 0xf80 (i.e. ends 0x10214408)
NUM_CUSTOMKEYS 128, NUM_AT2LNX 248, NUM_KEYCODES 248, GLYPHS_PER_KEY 4
i is 0, j is 0, k is 0x10213498
i is 0, j is 1, k is 0x1021349c
i is 0, j is 2, k is 0x102134a0
i is 0, j is 3, k is 0x102134a4
i is 1, j is 0, k is 0x102134a8
....
i is 245, j is 3, k is 0x102143f4
i is 246, j is 0, k is 0x102143f8
i is 246, j is 1, k is 0x102143fc
i is 246, j is 2, k is 0x10214400
i is 246, j is 3, k is 0x10214404
i is 247, j is 0, k is 0x10214408
i is 247, j is 1, k is 0x1021440c
i is 247, j is 2, k is 0x10214410
i is 247, j is 3, k is 0x10214414

Those last four are off the end of the array (because we _know_ we started four from the start of the array). And it's that which was scribbling on the PCI routine's file descriptor.
Created attachment 127208 [details] Potential fix.
The patch above fixes the problem for me. The Xorg binary linked from above is still broken in some way -- the keyboard code is scribbling over _some_ random memory, but it's just not having such an immediate and dramatic effect.
*** Bug 176759 has been marked as a duplicate of this bug. ***
Switching hardware to 'all' since the buffer overflow isn't arch-specific. It's just coincidence that it happens to land in a place which causes a crash on our beehive powerpc builds, this week.
Hey guys, I'm pushing this fix to updates-testing so we can get wider testing out of it.
xorg-x11-server-1.0.1-9.fc5 has been pushed for fc5, which should resolve this issue. If these problems are still present in this version, then please make note of it in this bug report.