Bug 187083

Summary: buffer overflow causes memory corruption, crash on VT switch
Product: [Fedora] Fedora
Reporter: Aldy Hernandez <aldyh>
Component: xorg-x11
Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED ERRATA
QA Contact: David Lawrence <dkl>
Severity: medium
Docs Contact:
Priority: medium
Version: 5
CC: dwmw2, euphorbe, frank, jbarnes, matt, nobody+pnasrat, rstrode
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-04-21 00:06:23 UTC
Bug Depends On:    
Bug Blocks: 182226    
Attachments:
  Description        Flags
  debugging patch    none
  Potential fix.     none

Description Aldy Hernandez 2006-03-28 11:54:06 UTC
Description of problem:
Switching from X to tty1 and back again causes a hard freeze.  The display freezes
with corrupt symbols on it, and the machine becomes un-pingable.

Version-Release number of selected component (if applicable):


How reproducible:
Every time.

Steps to Reproduce:
1. Start X
2. Switch to tty1
3. Switch to tty7
  
Actual results:
crash

Expected results:
X screen

Additional info:

Comment 1 David Woodhouse 2006-03-28 12:26:59 UTC
I see this too.


(II) DevInputMice: ps2EnableDataReporting: succeeded
(**) RADEON(0): RADEONSaveScreen(2)
(**) RADEON(0): RADEONLeaveVT
(**) RADEON(0): RADEONRestore
(**) RADEON(0): Ok, leaving now...
(**) RADEON(0): RADEONEnterVT

   *** If unresolved symbols were reported above, they might not
   *** be the reason for the server aborting.

Backtrace:
0: X(xf86SigHandler+0xa8) [0x100839e8]
1: [0x100374]
2: /usr/lib/xorg/modules/drivers/radeon_drv.so(RADEONEnterVT+0x7c) [0xe800e6c]

Comment 2 David Woodhouse 2006-03-29 11:23:59 UTC
Program received signal SIGBUS, Bus error.
0x0e7fff58 in RADEONEnterVT (scrnIndex=0, flags=0)
    at /usr/include/xorg/compiler.h:1148
1148            __asm__ __volatile__(
(gdb) p info->MMIO
$4 = (unsigned char *) 0x30002000 <Address 0x30002000 out of bounds>
(gdb) p RADEONMMIO
$5 = (unsigned char *) 0x0


Comment 3 Matthew Hall 2006-03-29 16:48:57 UTC
Add on a 'me too' for a Powerbook G4 5,4 (radeon 9600 mobility), using radeon
driver, dri disabled, fbdev enabled.

I also get a similar freeze when coming out of suspend (not 100% sure it's
related but I would expect so...)

Happy to do some testing if there are any ideas.

Comment 4 David Woodhouse 2006-03-29 23:07:40 UTC
'RADEONMMIO == 0' above is a red herring -- it happened to be using the same
register both for the address, and to load the result. But something very
bizarre is happening in RADEONEnterVT().



Comment 5 David Woodhouse 2006-03-30 12:01:30 UTC
The bizarreness seems to happen when something is logged -- with RADEONTRACE()
or xf86DrvMsg(). The PCI config space gets screwed up, and you start to get bus
errors when reading the device -- even with radeontool.

From a breakpoint in RADEONEnterVT() I can let X run freely to another
breakpoint slightly later, and the PCI config space appears to remain sane.

However, if I let it run through the call to ErrorF(), that's when strange
things happen. When I _removed_ the RADEONTRACE() from the beginning of
RADEONEnterVT(), the problem didn't occur there -- it happened later in
RADEONWaitForIdleMMIO() instead.

If I just stop with gdb on the breakpoint and 'p xf86DrvMsg(0,6,"fish\n")' the
device also gets screwed.

This is what I see...

 00:10.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon
9600 M10] (prog-if 00 [VGA])
-       Subsystem: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10]
+       Subsystem: Unknown device 3030:202c
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
-       Latency: 255 (2000ns min), Cache Line Size 08
+       Latency: 82 (2000ns min), Cache Line Size 20
        Interrupt: pin A routed to IRQ 48
        Region 0: Memory at b8000000 (32-bit, prefetchable) [size=128M]
        Region 1: I/O ports at f0000400 [size=256]
        Region 2: Memory at b0000000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at f1000000 [disabled] [size=128K]
        Capabilities: [58] AGP version 2.0
                Status: RQ=80 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans-
64bit- FW+ AGP3- Rate=x1,x2,x4
-               Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP- GART64- 64bit- FW- Rate=<none>
+               Command: RQ=49 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW+ Rate=x2,x4
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
-00: 02 10 50 4e 07 00 b0 02 00 00 00 03 08 ff 00 00
-10: 08 00 00 b8 01 04 00 00 00 00 00 b0 00 00 00 00
-20: 00 00 00 00 00 00 00 00 00 00 00 00 02 10 50 4e
-30: 00 00 00 f1 58 00 00 00 00 00 00 00 30 01 08 00
-40: 00 00 00 00 00 00 00 00 00 00 00 00 02 10 50 4e
+00: 02 10 50 4e 07 00 b0 02 00 00 00 03 20 52 00 00
+10: 08 00 00 28 01 29 3a 20 00 00 44 45 00 00 00 00
+20: 00 00 00 00 00 00 00 00 00 00 00 00 30 30 2c 20
+30: 00 00 30 29 58 00 00 00 00 00 00 00 4d 01 08 00
+40: 00 00 00 00 00 00 00 00 00 00 00 00 30 30 2c 20
 50: 01 00 02 06 00 00 00 00 02 50 20 00 17 02 00 4f
-60: 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+60: 16 03 00 30 00 00 00 00 00 00 00 00 00 00 00 00
 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-80: 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+80: 05 00 20 00 48 66 20 75 6e 72 00 00 00 00 00 00
 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
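
One way to read the '+' rows is to print the bytes that changed as characters. The sketch below does that for the first 32 bytes of the dump only; the helper name changed_bytes_as_text is made up for illustration. Several of the new values turn out to be printable ASCII, which would fit the later finding (comments 11 and 12) that log text was being written to the device's config file.

```c
#include <ctype.h>
#include <stddef.h>

/* First 32 bytes of config space from the diff above:
 * "before" holds the '-' rows, "after" holds the '+' rows. */
static const unsigned char before[32] = {
    0x02, 0x10, 0x50, 0x4e, 0x07, 0x00, 0xb0, 0x02,
    0x00, 0x00, 0x00, 0x03, 0x08, 0xff, 0x00, 0x00,
    0x08, 0x00, 0x00, 0xb8, 0x01, 0x04, 0x00, 0x00,
    0x00, 0x00, 0x00, 0xb0, 0x00, 0x00, 0x00, 0x00
};
static const unsigned char after[32] = {
    0x02, 0x10, 0x50, 0x4e, 0x07, 0x00, 0xb0, 0x02,
    0x00, 0x00, 0x00, 0x03, 0x20, 0x52, 0x00, 0x00,
    0x08, 0x00, 0x00, 0x28, 0x01, 0x29, 0x3a, 0x20,
    0x00, 0x00, 0x44, 0x45, 0x00, 0x00, 0x00, 0x00
};

/* Hypothetical helper: copy every byte that differs between the two dumps
 * into out (NUL-terminated), writing '.' for non-printable values.
 * out_size must be at least 1. Returns the number of bytes copied. */
size_t changed_bytes_as_text(const unsigned char *old_cfg,
                             const unsigned char *new_cfg,
                             size_t len, char *out, size_t out_size)
{
    size_t n = 0;

    for (size_t i = 0; i < len && n + 1 < out_size; i++) {
        if (old_cfg[i] != new_cfg[i])
            out[n++] = isprint(new_cfg[i]) ? (char)new_cfg[i] : '.';
    }
    out[n] = '\0';
    return n;
}
```

Called on the two arrays above, this reports eight changed bytes, all of them printable, including '(', ')' and ':'.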


Comment 6 David Woodhouse 2006-03-30 16:59:47 UTC
This persists when I use the ati-1-0-branch from CVS. Also when I build with gcc
3.2, when I boot with 'video=radeonfb.noaccel=1' or even with 'video=ofonly'.
As I said before, it doesn't matter if I have 'UseFBDev' enabled or not either.

Comment 7 David Woodhouse 2006-03-30 18:31:44 UTC
Tracing through it with gdb. I didn't spot the _precise_ moment when the latency
and cache line size got changed, but it was deep in glibc code at the time. Then
another time through the same code, that's when the rest got changed and it
started to bus error.

It was somewhere in here...

0x0f6994b8 in write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc  0xf6994b8 <write>:  lwz     r10,-29792(r2)
(gdb)
0x0f6994bc in write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc  0xf6994bc <write+4>:        cmpwi   r10,0
(gdb)
0x0f6994c0 in write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc  0xf6994c0 <write+8>:        bne-    0xf6994d4 <__write_nocancel+16>
(gdb)
0x0f6994c4 in __write_nocancel () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc  0xf6994c4 <__write_nocancel>:       li      r0,4
(gdb)
0x0f6994c8 in __write_nocancel () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc  0xf6994c8 <__write_nocancel+4>:     sc
(gdb)
0x0f63687c in _IO_new_file_write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc  0xf63687c <_IO_new_file_write+108>: cmpwi   cr7,r3,0
(gdb)
0x0f636880 in _IO_new_file_write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc  0xf636880 <_IO_new_file_write+112>: add     r29,r29,r3
(gdb)
0x0f636884 in _IO_new_file_write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc  0xf636884 <_IO_new_file_write+116>:
    bge+    cr7,0xf636850 <_IO_new_file_write+64>
(gdb)
0x0f636850 in _IO_new_file_write () from /lib/libc.so.6
4: /x $r2 = 0x3002a210
2: x/i $pc  0xf636850 <_IO_new_file_write+64>:  subf.   r31,r3,r31
(gdb) i reg
r0             0x0      0
r1             0x7fd0dd20       2144394528
r2             0x3002a210       805478928
r3             0xe      14
r4             0xf63687c        258173052
r5             0x20084482       537412738
r6             0xe      14
r7             0xf6994cc        258577612
r8             0xd432   54322
r9             0x0      0
r10            0x1032   4146
r11            0x0      0
r12            0xee0a8000       3993665536
r13            0x1021dc70       270654576
r14            0x10215f54       270622548
r15            0x10215e88       270622344
r16            0x10215f80       270622592
r17            0x10215e78       270622328
r18            0x10215e60       270622304
r19            0x10215ef4       270622452
r20            0x7fd0e260       2144395872
r21            0x0      0
r22            0x102167ac       270624684
r23            0x10215ee4       270622436
r24            0x0      0
r25            0xe      14
r26            0xe      14
r27            0xe      14
r28            0x10246480       270820480
r29            0x102225de       270673374
r30            0xf72dff4        259186676
r31            0xe      14
pc             0xf636850        258173008
cr             0x20084484       537412740
lr             0xf63687c        258173052
ctr            0xc001cc74       3221343348
xer            0x0      0
(gdb)

This is part of /proc/$$/maps...

10246000-107b1000 rwxp 10246000 00:00 0          [heap]
30000000-30002000 rw-p 30000000 00:00 0
30021000-30024000 rw-p 30021000 00:00 0
30024000-30044000 rw-s f0000000 00:0e 1577       /dev/mem
30044000-30153000 r-xp 00000000 03:04 263157     /usr/lib/libstdc++.so.6.0.8
30153000-30163000 ---p 0010f000 03:04 263157     /usr/lib/libstdc++.so.6.0.8
30163000-30166000 r--p 0010f000 03:04 263157     /usr/lib/libstdc++.so.6.0.8
30166000-30169000 rw-p 00112000 03:04 263157     /usr/lib/libstdc++.so.6.0.8
30169000-3016e000 rw-p 30169000 00:00 0
3016e000-301ee000 rw-s b0000000 00:0e 1577       /dev/mem
301ee000-341ee000 rw-s b8000000 00:0e 1577       /dev/mem
7fcfa000-7fd0f000 rw-p 7fcfa000 00:00 0          [stack]

Note that R2 is 0x3002a210 (in the /dev/mem map) and R2-29792 is 0x30022db0,
which is in the previous anonymous mapping. It wasn't that instruction which did
it, anyway -- it could well have been the system call. Will try again and be
more careful around that area...
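
The address arithmetic above can be checked mechanically. The following is only a sketch: the table holds a subset of the maps excerpt, and the struct and function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of /proc/<pid>/maps, reduced to the fields used here. */
struct mapping {
    uint32_t start;       /* inclusive */
    uint32_t end;         /* exclusive */
    const char *backing;  /* file name, or "anonymous" */
};

/* Entries copied from the maps excerpt above (libstdc++ lines left out). */
static const struct mapping maps[] = {
    { 0x10246000u, 0x107b1000u, "[heap]"    },
    { 0x30000000u, 0x30002000u, "anonymous" },
    { 0x30021000u, 0x30024000u, "anonymous" },
    { 0x30024000u, 0x30044000u, "/dev/mem"  },
    { 0x3016e000u, 0x301ee000u, "/dev/mem"  },
    { 0x301ee000u, 0x341ee000u, "/dev/mem"  },
    { 0x7fcfa000u, 0x7fd0f000u, "[stack]"   },
};

/* Return the mapping that contains addr, or NULL if none of the
 * listed mappings covers it. */
const struct mapping *find_mapping(uint32_t addr)
{
    for (size_t i = 0; i < sizeof maps / sizeof maps[0]; i++) {
        if (addr >= maps[i].start && addr < maps[i].end)
            return &maps[i];
    }
    return NULL;
}
```

With r2 = 0x3002a210, find_mapping(r2) lands in the /dev/mem mapping at 30024000, while find_mapping(r2 - 29792), i.e. 0x30022db0, lands in the anonymous mapping at 30021000.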

Comment 8 pascal 2006-03-30 21:41:06 UTC
Same issue here, on a PowerBook Pismo 500.
When switching from text mode to X, and also
when attempting to suspend the computer,
the screen turns strange colors, as if it were
burning; the computer stays powered on but is
totally frozen.

Comment 9 David Woodhouse 2006-03-31 07:20:37 UTC
It's definitely happening _on_ the write() syscall. This is the one where the
cache line size and latency change:

(gdb)
0x0f6994c8 in ?? () from /lib/libc.so.6
2: x/i $pc  0xf6994c8:  sc
(gdb) i reg
r0             0x4      4
r1             0x7f898d20       2139720992
r2             0x3002a210       805478928
r3             0x0      0
r4             0x102225d0       270673360
r5             0x10     16
r6             0x10246480       270820480
r7             0x7f7f7f7f       2139062143
r8             0x80000000       2147483648
r9             0x0      0
r10            0x0      0
r11            0xf636810        258172944
r12            0xffffffff       4294967295
r13            0x1021dc70       270654576
r14            0x10215f54       270622548
r15            0x10215e88       270622344
r16            0x10215f80       270622592
r17            0x10215e78       270622328
r18            0x10215e60       270622304
r19            0x10215ef4       270622452
r20            0x7f899260       2139722336
r21            0x0      0
r22            0x102167ac       270624684
r23            0x10215ee4       270622436
r24            0x0      0
r25            0x10     16
r26            0x10     16
r27            0x10     16
r28            0x10246480       270820480
r29            0x102225d0       270673360
r30            0xf72dff4        259186676
r31            0x10     16
pc             0xf6994c8        258577608
cr             0x20084482       537412738
lr             0xf63687c        258173052
ctr            0xf636810        258172944
xer            0x20000000       536870912
(gdb) si

Breakpoint 2, 0x0f6994cc in ?? () from /lib/libc.so.6
2: x/i $pc  0xf6994cc:  bnslr+

Comment 10 David Woodhouse 2006-03-31 07:21:59 UTC
And this is the one where the Radeon just turns to goo:

0x0f6994c8 in ?? () from /lib/libc.so.6
2: x/i $pc  0xf6994c8:  sc
(gdb) i reg
r0             0x4      4
r1             0x7f898d20       2139720992
r2             0x3002a210       805478928
r3             0x0      0
r4             0x102225d0       270673360
r5             0xe      14
r6             0x10246480       270820480
r7             0x7f7f7f7f       2139062143
r8             0x8000   32768
r9             0x0      0
r10            0x0      0
r11            0xf636810        258172944
r12            0xffffffff       4294967295
r13            0x1021dc70       270654576
r14            0x10215f54       270622548
r15            0x10215e88       270622344
r16            0x10215f80       270622592
r17            0x10215e78       270622328
r18            0x10215e60       270622304
r19            0x10215ef4       270622452
r20            0x7f899260       2139722336
r21            0x0      0
r22            0x102167ac       270624684
r23            0x10215ee4       270622436
r24            0x0      0
r25            0xe      14
r26            0xe      14
r27            0xe      14
r28            0x10246480       270820480
r29            0x102225d0       270673360
r30            0xf72dff4        259186676
r31            0xe      14
pc             0xf6994c8        258577608
cr             0x20084482       537412738
lr             0xf63687c        258173052
ctr            0xf636810        258172944
xer            0x20000000       536870912
(gdb) si

Breakpoint 2, 0x0f6994cc in ?? () from /lib/libc.so.6
2: x/i $pc  0xf6994cc:  bnslr+
(gdb)



Comment 11 David Woodhouse 2006-03-31 07:25:27 UTC
In each case, r3 is zero. According to /proc/$$/fd, file descriptor #0 is
/proc/bus/pci/00/10.0
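
For readers less used to PowerPC: on 32-bit PowerPC Linux the system call number is passed in r0 and the arguments in r3, r4, r5 and so on, so the register dumps in comments 9 and 10 can be read directly as write(fd, buf, count). A small sketch of that decoding follows; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Subset of the PowerPC registers shown by gdb's "i reg". */
struct ppc_regs {
    uint32_t r0;  /* system call number */
    uint32_t r3;  /* first argument */
    uint32_t r4;  /* second argument */
    uint32_t r5;  /* third argument */
};

/* Decoded view of a write() system call. */
struct write_call {
    int fd;
    uint32_t buf;
    uint32_t count;
};

#define PPC_NR_WRITE 4u  /* matches the "li r0,4" before "sc" in comment 7 */

/* Fill *out and return 1 if regs describe a write() about to be issued
 * at the "sc" instruction; return 0 for any other system call. */
int decode_write(const struct ppc_regs *regs, struct write_call *out)
{
    if (regs->r0 != PPC_NR_WRITE)
        return 0;
    out->fd = (int)regs->r3;
    out->buf = regs->r4;
    out->count = regs->r5;
    return 1;
}
```

Fed the values from comment 9 (r0 = 4, r3 = 0, r4 = 0x102225d0, r5 = 0x10) this gives a 16-byte write to descriptor 0; the values from comment 10 give a 14-byte write to the same descriptor.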

Comment 12 David Woodhouse 2006-03-31 07:58:18 UTC
It goes like this...

[0f6994cc] write(2, "Ok, leaving now...\n", 19) = 19
[0f6994cc] write(0, "Ok, leaving now...\n", 19) = 19
[0f6994cc] write(2, "(WW) RADEON(0): MMIO is 0x3016e0"..., 57) = 57
[0f6994cc] write(0, "(WW) RADEON(0): MMIO is 0x3016e0"..., 57) = 57
[0f699544] lseek(0, 4, SEEK_SET)        = 4
[0f6994cc] write(0, "\4\0\260\2", 4)    = 4
[0f699544] lseek(0, 4, SEEK_SET)        = 4
[0f6994cc] write(0, "\4\0\260\2", 4)    = 4
[0f699544] lseek(0, 4, SEEK_SET)        = 4
[0f6994cc] write(0, "\4\0\260\2", 4)    = 4
[0f6993d4] close(0)                     = 0
[0f697918] stat64("/proc/bus/pci/10", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[0f698cb4] open("/proc/bus/pci/10/13.0", O_RDWR) = 0
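
The close(0) followed by open(...) = 0 at the end of that trace is ordinary POSIX behaviour: open() hands out the lowest-numbered descriptor that is not currently open, so a number freed by close() is reused by the next open(). A minimal sketch of that reuse, using /dev/null as a stand-in file (the function name is hypothetical, and a single-threaded process is assumed):

```c
#include <fcntl.h>
#include <unistd.h>

/* Open a file, close it, and open again. Returns 1 if the second open()
 * is given the same descriptor number as the first, 0 if not, and -1 if
 * either open() fails. Assumes no other thread opens files meanwhile. */
int fd_number_is_reused(void)
{
    int first = open("/dev/null", O_WRONLY);
    if (first < 0)
        return -1;
    close(first);

    int second = open("/dev/null", O_WRONLY);
    if (second < 0)
        return -1;

    int reused = (second == first);
    close(second);
    return reused;
}
```

Any code still holding the old number after such a close() ends up talking to whichever file was opened next.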

Sticking a breakpoint in to catch the close(), the backtrace is this:

Breakpoint 1, 0x0f6993d4 in ?? () from /lib/libc.so.6
(gdb) bt
#0  0x0f6993d4 in ?? () from /lib/libc.so.6
#1  0x100bfc18 in xf86CloseSerial ()
#2  0x0e8794e4 in xf86MouseProtocolNameToID ()
   from /usr/lib/xorg/modules/input/mouse_drv.so
#3  0x100388cc in DisableDevice ()
#4  0x100850a0 in xf86Wakeup ()
#5  0x1004a9c8 in WakeupHandler ()
#6  0x1019c9b4 in WaitForSomething ()
#7  0x10045a6c in Dispatch ()
#8  0x10027260 in main ()




Comment 13 David Woodhouse 2006-03-31 08:30:18 UTC
No, wrong backtrace. This is the _offending_ close():

(gdb) bt
#0  0x0f6993d0 in ?? () from /lib/libc.so.6
#1  0x100c3aa4 in ATScancode ()
#2  0x100c3dec in ATScancode ()
#3  0x100c2740 in pciReadWord ()
#4  0x100a0418 in initPciBusState ()
#5  0x1009dd08 in DisablePciBusAccess ()
#6  0x1007a864 in xf86AccessLeave ()
#7  0x1008511c in xf86Wakeup ()
#8  0x1004a9c8 in WakeupHandler ()
#9  0x1019c9b4 in WaitForSomething ()
#10 0x10045a6c in Dispatch ()
#11 0x10027260 in main ()


Comment 14 David Woodhouse 2006-03-31 09:33:00 UTC
It's not ATScancode -- it's linuxPciOpenFile(). And it doesn't happen in my own
build from what's in CVS for FC-5. But what's in CVS for FC-5 is _older_ than
the 1.0.1-9 package which is actually released -- where _is_ that?

Comment 15 David Woodhouse 2006-03-31 10:07:40 UTC
I don't know why there's any difference between my build and the official one --
they should be built with the same compiler (I suspect a compiler bug). Neither
do I know why the latest package isn't actually in CVS. 

Nevertheless, my build of 1.0.1-9 is working fine. It's available from
http://david.woodhou.se/Xorg-1.0.1-9_FC5.ppc -- just drop it on top of /usr/bin/Xorg

Still confused though.

Comment 16 Josh Boyer 2006-03-31 15:23:28 UTC
(In reply to comment #15)
> Nevertheless, my build of 1.0.1-9 is working fine. It's available from
> http://david.woodhou.se/Xorg-1.0.1-9_FC5.ppc -- just drop it on top of
> /usr/bin/Xorg

Be sure to set the permissions correctly when you do that.

chown root.root /usr/bin/Xorg
chmod 4711 /usr/bin/Xorg

Comment 17 David Woodhouse 2006-03-31 18:22:30 UTC
Setting a watchpoint to see when the offending variable gets changed, I see that
it's in readKernelMapping() in lnx_KbdMap.c. Gdb isn't wonderfully helpful though ...

Watchpoint 11: *(int *) 270615684

Old value = 9
New value = 0
KbdGetMapping (pInfo=0x10783e60, pKeySyms=0x7fcbe4f0, pModMap=0x7fcbe4fc "")
    at lnx_KbdMap.c:306
306           kbe.kb_table = tbl[j];
3: x/i $pc  0x100be8f8 <KbdGetMapping+184>:     nop
2: /d fd = 0
(gdb) p k
Variable "k" is not available.
(gdb) p j
No symbol "j" in current context.


Comment 18 Matthew Hall 2006-04-01 08:14:54 UTC
I can confirm David's Xorg binary works for me as well; suspend and tty switching
are working again (Thanks David!).

Comment 19 David Woodhouse 2006-04-02 16:05:02 UTC
Created attachment 127207 [details]
debugging patch

This patch shows what's going on. The helpfully-named global array 'map' is
declared in hw/xfree86/common/xf86Keymap.h as follows:

static KeySym map[NUM_KEYCODES * GLYPHS_PER_KEY] = {
...

The loops in readKernelMapping() go over the end of that array, as demonstrated
by the debugging output produced by this patch...

readKernelMapping. map 0x10213488 is of size 0xf80 (i.e. ends 0x10214408)
NUM_CUSTOMKEYS 128, NUM_AT2LNX 248, NUM_KEYCODES 248, GLYPHS_PER_KEY 4
i is 0, j is 0, k is 0x10213498
i is 0, j is 1, k is 0x1021349c
i is 0, j is 2, k is 0x102134a0
i is 0, j is 3, k is 0x102134a4
i is 1, j is 0, k is 0x102134a8
 ....
i is 245, j is 3, k is 0x102143f4
i is 246, j is 0, k is 0x102143f8
i is 246, j is 1, k is 0x102143fc
i is 246, j is 2, k is 0x10214400
i is 246, j is 3, k is 0x10214404
i is 247, j is 0, k is 0x10214408
i is 247, j is 1, k is 0x1021440c
i is 247, j is 2, k is 0x10214410
i is 247, j is 3, k is 0x10214414


Those last four are off the end of the array (because we _know_ we started four
from the start of the array). And it's that which was scribbling on the PCI
routine's file descriptor.
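
The arithmetic behind "those last four" can be reproduced with a simplified model of the loops. This is a sketch, not the X server source: the constants come from the debugging output above, the 4-byte entry size is inferred from the reported size 0xf80 divided by 248 * 4 entries, and the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Values reported by the debugging patch output above. */
#define NUM_KEYCODES   248
#define GLYPHS_PER_KEY 4
#define MAP_BASE       0x10213488u  /* address of map[0] in that run */
#define KEYSYM_SIZE    4u           /* bytes per entry (inferred) */
#define START_OFFSET   4            /* k starts four entries into map */

/* Walk k over NUM_KEYCODES * GLYPHS_PER_KEY slots, starting START_OFFSET
 * entries into the array, and count the slots that lie past its end.
 * If first_bad is non-NULL and at least one slot is out of bounds, it
 * receives the address of the first such slot. */
size_t count_overflowing_writes(uint32_t *first_bad)
{
    const size_t entries = (size_t)NUM_KEYCODES * GLYPHS_PER_KEY;
    size_t k = START_OFFSET;
    size_t overflow = 0;

    for (int i = 0; i < NUM_KEYCODES; i++) {
        for (int j = 0; j < GLYPHS_PER_KEY; j++, k++) {
            if (k >= entries) {
                if (overflow == 0 && first_bad != NULL)
                    *first_bad = MAP_BASE + (uint32_t)k * KEYSYM_SIZE;
                overflow++;
            }
        }
    }
    return overflow;
}
```

The model counts four out-of-bounds slots, the first at 0x10214408, which is the address printed for "i is 247, j is 0" and also the reported end of the array.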

Comment 20 David Woodhouse 2006-04-02 16:34:07 UTC
Created attachment 127208 [details]
Potential fix.

Comment 21 David Woodhouse 2006-04-02 17:12:43 UTC
The patch above fixes the problem for me.

The Xorg binary linked from above is still broken in some way -- the keyboard
code is scribbling over _some_ random memory, but it's just not having such an
immediate and dramatic effect.

Comment 22 Jesse Barnes 2006-04-02 17:44:10 UTC
*** Bug 176759 has been marked as a duplicate of this bug. ***

Comment 23 David Woodhouse 2006-04-02 20:22:01 UTC
Switching hardware to 'all' since the buffer overflow isn't arch-specific. It's
just coincidence that it happens to land in a place which causes a crash on our
beehive powerpc builds, this week.

Comment 24 Ray Strode [halfline] 2006-04-09 16:44:58 UTC
Hey guys, I'm pushing this fix to updates-testing so we can get wider testing
out of it.

Comment 25 Fedora Update System 2006-04-09 17:08:13 UTC
xorg-x11-server-1.0.1-9.fc5 has been pushed for fc5, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.

Comment 26 Fedora Update System 2006-04-13 13:41:26 UTC
xorg-x11-server-1.0.1-9.fc5 has been pushed for fc5, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.