478622 – memory corruption (mouse movement during file io causes corruption)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	xorg-x11-server
Sub Component:
Version:	10
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	X/OpenGL Maintenance List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-01-02 10:46 UTC by Manfred Spraul
Modified:	2009-07-28 17:15 UTC
CC List:	3 users
Fixed In Version:	Fedora 11
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-07-28 17:15:18 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
ioload load tester (2.14 KB, text/plain) 2009-01-02 15:39 UTC, Manfred Spraul
Xorg log (90.91 KB, text/plain) 2009-01-02 15:44 UTC, Manfred Spraul
hexdumps of the corrupted memory (28.90 KB, text/plain) 2009-01-12 19:32 UTC, Manfred Spraul

Description Manfred Spraul 2009-01-02 10:46:24 UTC

Description of problem:
Memory corruption, probably file system related

Version-Release number of selected component (if applicable):
Linux cores 2.6.27.9-159.fc10.x86_64 #1 SMP Tue Dec 16 14:47:52 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Not reliably reproducible, but the corruption happens 5-10 times per day.

Steps to Reproduce:
1. with gcc:
make -j4 bzImage frequently hangs with non-reproducible error messages.

2. with tar (happened once)
tar cvfz out.tar.gz <large directory, around 2.6 GB uncompressed, 1 GB compressed>
<copy, move files>
tar tfz out.tar.gz
The tar file contained one block filled with 0x00 instead of the real data.

3. diff --brief -r <dir 1> <dir 2>
dir 1 on a USB hard disk
dir 2 on a SATA drive
around 60 GB of data.

Two identical files were reported as different.
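
Because the mismatch is intermittent, it can help to compare each file pair more than once. The following is a minimal sketch in Python, not part of the original report: `file_digest` and `compare_trees` are hypothetical helper names, and it assumes both trees are mounted and readable. It hashes every file that exists in both trees several times and reports only the files that mismatched on at least one pass.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(dir1, dir2, passes=2):
    """Compare files present in both trees, `passes` times each.

    Returns a dict mapping relative path -> list of booleans, one per pass
    (True when the two digests matched on that pass). Only files with at
    least one mismatch are included.
    """
    suspicious = {}
    for root, _dirs, files in os.walk(dir1):
        for name in files:
            p1 = os.path.join(root, name)
            rel = os.path.relpath(p1, dir1)
            p2 = os.path.join(dir2, rel)
            if not os.path.isfile(p2):
                continue  # only compare files that exist on both sides
            results = [file_digest(p1) == file_digest(p2) for _ in range(passes)]
            if not all(results):
                suspicious[rel] = results
    return suspicious
```

A result that mixes True and False for the same file points at transient corruption rather than a real difference on disk. A result that is False on every pass is ambiguous: the files may truly differ, or a corrupted page may simply have been served again from the page cache.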
  
Actual results:

From make:
1) "The bug is not reproducible, so it is likely a hardware or OS problem."
2) (only once)
<<<
*** glibc detected *** make: free(): invalid next size (fast): 0x0000000002340000 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3c47677ec8]
/lib64/libc.so.6(cfree+0x76)[0x3c4767a486]
>>>

From diff: "binary files differ"
from tar: output of tar cvfz passes tar tfz

Expected results:
no corruptions, no differences

Additional info:
2 GB memory
AMD Phenom 9350e (2 GHz 4 core)
AMD 780 with "ATI Technologies Inc Radeon 3100 Graphics rev 0"
All filesystems ext3

Comment 1 Manfred Spraul 2009-01-02 15:39:45 UTC

Created attachment 328072
ioload load tester

Test app.
Causes crashes and/or corruptions.
3 parallel instances:

  ioload 800
  ioload 1200
  ioload 2500

Appears to be related to X: No corruptions from console, booted with initlevel=3.
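
For readers without access to the attachment, the general idea of a write-then-verify load test can be sketched as follows. This is NOT the attached ioload program, and the meaning of its numeric argument is not stated here; `write_pattern`, `verify_pattern`, and the pattern layout are assumptions made purely for illustration. The sketch writes a deterministic, never-zero pattern to a file and reads it back in 512-byte blocks, so a zero-filled or foreign block is detectable.

```python
import os
import struct

BLOCK = 512  # bytes per read; an assumed block size for this sketch


def expected_block(index):
    """Deterministic block content: a never-zero 32-bit marker word repeated.

    The top byte 0xA5 guarantees that a zeroed block never matches.
    """
    word = 0xA5000000 | (index & 0x00FFFFFF)
    return struct.pack("<I", word) * (BLOCK // 4)


def write_pattern(path, num_blocks):
    """Write num_blocks pattern blocks to path and flush them to disk."""
    with open(path, "wb") as f:
        for i in range(num_blocks):
            f.write(expected_block(i))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, num_blocks):
    """Read the file back block by block; return indices of mismatching blocks."""
    bad = []
    with open(path, "rb") as f:
        for i in range(num_blocks):
            if f.read(BLOCK) != expected_block(i):
                bad.append(i)
    return bad
```

Running `write_pattern` once and then `verify_pattern` in a loop, in several parallel processes and with mouse activity under X, would approximate the scenario described above. An empty list means no corruption was observed on that pass.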

Comment 2 Manfred Spraul 2009-01-02 15:44:00 UTC

Created attachment 328074
Xorg log

Xorg log.
the radeon driver is from xorg-x11-drv-ati-6.9.0-63.fc10.x86_64

Comment 3 Manfred Spraul 2009-01-02 16:11:55 UTC

I've tried to reassign the bug to the x driver:

No mouse movement, no crash

  ioload 400 and mouse movements - crash after a few seconds (< 5 seconds)

The mouse pointer is sometimes corrupted - instead of the "normal" pointer, just a rectangle (64x64 pixels?) with some colored pixels appears.
After moving, it is sooner or later replaced again with the normal pointer.

Comment 4 Peter Hutterer 2009-01-09 00:43:04 UTC

Verified, but reassigning to the server. You're not using the mouse driver but evdev, and 2.0.8 is so simple that I have reasonable doubt that this would be in the driver. The corrupted cursor (which I see even when not moving the mouse) indicates this as well.

Anyway, I don't even know how to start debugging this. valgrind doesn't say anything interesting.

ajax: any suggestions?

Comment 5 Manfred Spraul 2009-01-12 19:32:51 UTC

Created attachment 328773
hexdumps of the corrupted memory

What about the reverse approach?

I got a random memory corruption: sys_read() on the file in /tmp returned incorrect data. It was mostly 0x00, with some bytes occasionally set.

I've attached all log files that I have. Each syscall reads 512 bytes; the actual and expected contents are hexdumped.

The numbers appear to be big endian 32-bit values.
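
The kind of inspection described above can be sketched in Python. This is an illustrative reconstruction, not the tool that produced the attached hexdumps: `differing_blocks`, `as_be32`, and `nonzero_words` are hypothetical names, and the 512-byte block size is taken from the read size mentioned above. It locates the blocks where the actual data differs from the expected data and decodes a corrupted block as big-endian unsigned 32-bit words.

```python
import struct


def differing_blocks(actual, expected, block_size=512):
    """Yield (offset, actual_block, expected_block) for each block that differs."""
    for off in range(0, max(len(actual), len(expected)), block_size):
        a = actual[off:off + block_size]
        e = expected[off:off + block_size]
        if a != e:
            yield off, a, e


def as_be32(block):
    """Decode a block as big-endian unsigned 32-bit words (trailing bytes dropped)."""
    n = len(block) // 4
    return list(struct.unpack(">%dI" % n, block[:n * 4]))


def nonzero_words(block):
    """Return (word_index, value) for every non-zero big-endian 32-bit word."""
    return [(i, w) for i, w in enumerate(as_be32(block)) if w != 0]
```

Listing only the non-zero words of a mostly zeroed block makes it easier to see whether the stray values look like counters, addresses, or pixel data, which may hint at which component wrote them.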

Which components are capable of corrupting unrelated memory?
- someone does DMA to the wrong target address (use after free or something similar). My system has a Radeon HD 3100, i.e. on-board graphics with shared memory.

- an OS kernel component accesses the wrong target address

Who could use big-endian 32-bit numbers?

Who needs at least 512 bytes and zeros the memory?

Comment 6 Manfred Spraul 2009-07-28 17:15:18 UTC

The bug appears to be resolved: with Fedora 11, I do not get any cursor corruptions anymore.

xorg-x11-server-Xorg-1.6.1.901-1.fc11.x86_64
xorg-x11-server-utils-7.4-7.fc11.x86_64
xorg-x11-server-common-1.6.1.901-1.fc11.x86_64

xorg-x11-drv-ati-6.12.2-14.fc11.x86_64

mesa-libGLU-7.6-0.1.fc11.x86_64
mesa-libGL-7.6-0.1.fc11.x86_64
mesa-dri-drivers-7.6-0.1.fc11.x86_64
