Bug 478622 - memory corruption (mouse movement during file io causes corruption)
Product: Fedora
Classification: Fedora
Component: xorg-x11-server
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Assigned To: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-01-02 05:46 EST by Manfred Spraul
Modified: 2009-07-28 13:15 EDT (History)
CC List: 3 users

Fixed In Version: Fedora 11
Doc Type: Bug Fix
Last Closed: 2009-07-28 13:15:18 EDT

Attachments
- ioload load tester (2.14 KB, text/plain), 2009-01-02 10:39 EST, Manfred Spraul
- Xorg log (90.91 KB, text/plain), 2009-01-02 10:44 EST, Manfred Spraul
- hexdumps of the corrupted memory (28.90 KB, text/plain), 2009-01-12 14:32 EST, Manfred Spraul

Description Manfred Spraul 2009-01-02 05:46:24 EST
Description of problem:
Memory corruption, probably file system related

Version-Release number of selected component (if applicable):
Linux cores #1 SMP Tue Dec 16 14:47:52 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Not reliably reproducible, but the corruption happens 5-10 times a day.

Steps to Reproduce:
1. With gcc:
make -j4 bzImage frequently hangs with non-reproducible error messages.

2. With tar (happened once):
tar cvfz out.tar.gz <large directory, around 2.6 GB uncompressed, 1 GB compressed>
<copy, move files>
tar tfz out.tar.gz
The tar file contained one block filled with 0x00 instead of the real data.

3. diff --brief -r <dir 1> <dir 2>
dir 1 on a USB hard disk
dir 2 on a SATA drive
around 60 GB of data.

Two identical files were reported as different.

Actual results:

From make:
1) "The bug is not reproducible, so it is likely a hardware or OS problem."
2) (only once)
*** glibc detected *** make: free(): invalid next size (fast): 0x0000000002340000 ***
======= Backtrace: =========

From diff: "binary files differ"
From tar: output of tar cvfz passes tar tfz

Expected results:
no corruptions, no differences

Additional info:
2 GB memory
AMD Phenom 9350e (2 GHz 4 core)
AMD 780 with "ATI Technologies Inc Radeon 3100 Graphics rev 0"
All filesystems ext3
Comment 1 Manfred Spraul 2009-01-02 10:39:45 EST
Created attachment 328072
ioload load tester

Test application.
It causes crashes and/or corruption when run as 3 parallel instances:

  ioload 800
  ioload 1200
  ioload 2500

It appears to be related to X: no corruption when working from the console, booted into runlevel 3.
Comment 2 Manfred Spraul 2009-01-02 10:44:00 EST
Created attachment 328074
Xorg log

Xorg log.
The radeon driver is from xorg-x11-drv-ati-6.9.0-63.fc10.x86_64.
Comment 3 Manfred Spraul 2009-01-02 11:11:55 EST
I've tried to reassign the bug to the X driver:

No mouse movement, no crash

  ioload 400 with mouse movement: crash after a few seconds (< 5 seconds)

The mouse pointer is sometimes corrupted: instead of the "normal" pointer, just a rectangle (64x64 pixels?) with some colored pixels appears.
After further movement, it is sooner or later replaced by the normal pointer again.
Comment 4 Peter Hutterer 2009-01-08 19:43:04 EST
Verified, but reassigning to the server. You're not using the mouse driver but evdev, and evdev 2.0.8 is so simple that I doubt the bug is in the driver rather than in the server. The corrupted cursor (which I see even when not moving the mouse) points to the server as well.

Anyway, I don't even know how to start debugging this. Valgrind doesn't say anything interesting.

ajax: any suggestions?
Comment 5 Manfred Spraul 2009-01-12 14:32:51 EST
Created attachment 328773
hexdumps of the corrupted memory

What about the reverse approach?

I got a random memory corruption: sys_read() on the file in /tmp returned incorrect data, mostly 0x00 bytes, sometimes with a few bytes set.

I've attached all the log files I have. Each syscall reads 512 bytes; the actual and expected contents are hexdumped.
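The ioload source itself is not reproduced in this report, so the following is only a minimal sketch of the kind of check described above, under hypothetical assumptions: write a known pattern to a file, read it back with read() in 512-byte chunks, and hexdump the actual and expected content of every chunk that differs. The pattern and the names `expected_block`, `write_pattern`, `check_file` and `hexdump` are illustrative, not taken from ioload.

```python
import os
import tempfile

BLOCK = 512  # bytes per read() call, matching the 512-byte reads in the logs

def expected_block(index: int) -> bytes:
    """Hypothetical test pattern: block `index` holds bytes derived from its number."""
    return bytes((index * 7 + i) & 0xFF for i in range(BLOCK))

def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset-prefixed rows of hex pairs."""
    return "\n".join(
        f"{off:04x}: {data[off:off + width].hex(' ')}"
        for off in range(0, len(data), width)
    )

def write_pattern(path: str, nblocks: int) -> None:
    """Write nblocks consecutive pattern blocks to path."""
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(expected_block(i))

def check_file(path: str, nblocks: int) -> list[int]:
    """Read the file back one 512-byte read() at a time; hexdump and return mismatching blocks."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for i in range(nblocks):
            actual = os.read(fd, BLOCK)
            want = expected_block(i)
            if actual != want:
                bad.append(i)
                print(f"block {i} at offset {i * BLOCK}: mismatch")
                print("actual:\n" + hexdump(actual))
                print("expected:\n" + hexdump(want))
    finally:
        os.close(fd)
    return bad

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "pattern.dat")
        write_pattern(path, 4096)  # 2 MiB of test data
        print("mismatching blocks:", check_file(path, 4096))
```

Reading the data back right after writing it mostly exercises the page cache, which is where sys_read() copies from; whether ioload also forces data out of the cache is not stated in the report.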

The numbers appear to be big-endian 32-bit values.
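Because x86_64 is little-endian, integers stored by ordinary CPU code would normally appear little-endian in a hexdump, which is what makes this observation a useful clue. A minimal sketch of how one might test the hypothesis on a corrupted 512-byte chunk: decode it as 32-bit words in both byte orders and see which reading gives plausible, small values. The function names and the "smaller sum wins" heuristic are assumptions for illustration only.

```python
import struct

def words(block: bytes, big_endian: bool = True) -> list[int]:
    """Split a block into unsigned 32-bit words (a trailing partial word is ignored)."""
    n = len(block) // 4
    fmt = (">" if big_endian else "<") + f"{n}I"
    return list(struct.unpack(fmt, block[:n * 4]))

def looks_big_endian(block: bytes) -> bool:
    """Rough heuristic (an assumption, not from the report): small integers give
    much smaller totals when decoded in the byte order they were written in."""
    return sum(words(block, True)) < sum(words(block, False))

if __name__ == "__main__":
    sample = struct.pack(">4I", 1, 2, 3, 1000)  # stand-in for a corrupted chunk
    print(words(sample))             # [1, 2, 3, 1000]
    print(looks_big_endian(sample))  # True
```

An all-zero chunk sums to zero either way, so the heuristic returns False for it; only the non-zero corruption carries any byte-order information.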

Which components are capable of corrupting unrelated memory?
- A device doing DMA to the wrong target address (use-after-free or something similar). My system has a Radeon HD 3100, i.e. on-board graphics with shared memory.

- An OS kernel component accessing the wrong target address.

Who could use big-endian 32-bit numbers?

Who needs at least 512 bytes and zeros the memory?
Comment 6 Manfred Spraul 2009-07-28 13:15:18 EDT
The bug appears to be resolved: with Fedora 11, I no longer get any cursor corruption.



