Red Hat Bugzilla – Bug 478622
memory corruption (mouse movement during file io causes corruption)
Last modified: 2009-07-28 13:15:18 EDT
Description of problem:
Memory corruption, probably file system related
Version-Release number of selected component (if applicable):
Linux cores 126.96.36.199-159.fc10.x86_64 #1 SMP Tue Dec 16 14:47:52 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Not reliably reproducible, but the corruption happens 5-10 times per day.
Steps to Reproduce:
1. With gcc:
make -j4 bzImage frequently hangs with non-reproducible error messages.
2. With tar (happened once):
tar cvfz out.tar.gz <large directory, around 2.6 GB uncompressed, 1 GB compressed>
<copy, move files>
tar tfz out.tar.gz
The tar file contained one block filled with 0x00 instead of the real data.
3. diff --brief -r <dir 1> <dir 2>
<dir 1> is on a USB hard disk, <dir 2> on a SATA drive; around 60 GB of data in total.
Two identical files were reported as different.
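One way to locate the damaged region when two copies are supposed to be identical is to compare them block by block and note whether a differing block is entirely zero, as in the tar case above. The following is a minimal sketch, not a tool used in this report; the function name find_corrupt_blocks and the 512-byte default block size are assumptions (512 bytes is the size of a tar block).

```python
from typing import Iterator, Tuple

def find_corrupt_blocks(path_a: str, path_b: str,
                        block_size: int = 512) -> Iterator[Tuple[int, bool]]:
    """Hypothetical helper: yield (byte_offset, all_zero) for each differing block.

    all_zero is True when the block from path_b is non-empty and consists
    only of 0x00 bytes, the pattern seen in the corrupted tar archive.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # any(b) is False only if every byte in b is 0
                yield offset, bool(b) and not any(b)
            offset += block_size
```

Calling list(find_corrupt_blocks(good_copy, suspect_copy)) on a pair of files that diff reported as different would show whether the damage is a whole zero-filled block or a few scattered bytes.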
1) "The bug is not reproducible, so it is likely a hardware or OS problem."
2) (only once)
*** glibc detected *** make: free(): invalid next size (fast): 0x0000000002340000 ***
======= Backtrace: =========
From diff: "binary files differ"
From tar: the output of tar cvfz passes tar tfz.
No corruptions, no differences.
2 GB memory
AMD Phenom 9350e (2 GHz 4 core)
AMD 780 with "ATI Technologies Inc Radeon 3100 Graphics rev 0"
All filesystems ext3
Created attachment 328072
ioload load tester
Causes crashes and/or corruptions.
3 parallel instances:
Appears to be related to X: no corruptions from the console when booted into runlevel 3.
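The ioload attachment itself is not reproduced here. As a rough illustration of what such a load tester might do, the sketch below writes a known pattern to a file, forces it to disk with os.fsync, and reads it back in 512-byte blocks, reporting every block that does not match. The names write_and_verify and expected_block and the counter pattern are hypothetical. The read-back will normally be served from the page cache, which is acceptable here because the suspected corruption is in RAM rather than on disk.

```python
import os
import struct
from typing import List

BLOCK = 512  # bytes per read, matching the 512-byte reads described in this report

def expected_block(index: int, seed: int) -> bytes:
    """Hypothetical test pattern: 128 little-endian 32-bit counters per block."""
    base = (index * 128 + seed) & 0xFFFFFFFF
    return struct.pack("<128I", *(((base + i) & 0xFFFFFFFF) for i in range(128)))

def write_and_verify(path: str, n_blocks: int, seed: int = 1) -> List[int]:
    """Hypothetical tester (not the ioload attachment).

    Writes n_blocks of pattern data, reads it back and returns the
    indices of blocks whose content does not match the pattern.
    """
    with open(path, "wb") as f:
        for i in range(n_blocks):
            f.write(expected_block(i, seed))
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before reading it back
    bad = []
    with open(path, "rb") as f:
        for i in range(n_blocks):
            if f.read(BLOCK) != expected_block(i, seed):
                bad.append(i)
    return bad
```

Running three copies at once, each with its own output path and seed, would roughly approximate the 3 parallel instances mentioned above; a non-empty return value indicates corrupted data.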
Created attachment 328074
The radeon driver is from xorg-x11-drv-ati-6.9.0-63.fc10.x86_64.
I've tried to reassign the bug to the X driver:
No mouse movement: no crash.
ioload 400 with mouse movement: crash after a few seconds (< 5 seconds).
The mouse pointer is sometimes corrupted: instead of the normal pointer, a rectangle (64x64 pixels?) with some colored pixels appears.
After the mouse is moved, the normal pointer sooner or later returns.
Verified, but reassigning to the server. You're not using the mouse driver but evdev, and 2.0.8 is so simple that I have reasonable doubt that this would be in the server. The corrupted cursor (which I see even when not moving the mouse) indicates this as well.
Anyway, I don't even know how to start debugging this; valgrind doesn't report anything interesting.
ajax: any suggestions?
Created attachment 328773
hexdumps of the corrupted memory
What about the reverse approach?
I got a random memory corruption: sys_read() on the file in /tmp returned incorrect data, mostly 0x00, sometimes with a few bytes set.
I've attached all the log files I have. Each syscall reads 512 bytes; both the actual and the expected content are hexdumped.
The numbers appear to be big-endian 32-bit values.
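To check whether the corrupted data really consists of big-endian 32-bit values, the raw 512 bytes from a bad read can be decoded word by word. This is a minimal sketch assuming the dump is available as raw bytes; nonzero_be_words is a hypothetical helper. Replacing ">" with "<" in the format string gives the little-endian interpretation for comparison: small, plausible integers in one byte order and large, meaningless ones in the other usually indicate which order the writer used.

```python
import struct
from typing import List, Tuple

def nonzero_be_words(block: bytes) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (byte_offset, value) for each nonzero
    big-endian 32-bit word in block, ignoring a trailing partial word."""
    usable = len(block) - len(block) % 4
    words = struct.unpack(f">{usable // 4}I", block[:usable])
    return [(i * 4, w) for i, w in enumerate(words) if w != 0]
```

For a mostly zeroed 512-byte read, the result is a short list of offsets and values that can be compared against the surrounding hexdump.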
Which components are capable of corrupting unrelated memory?
- Something does DMA to the wrong target address (use after free or something similar). My system has a Radeon HD 3100, i.e. on-board graphics with shared memory.
- An OS kernel component accesses the wrong target address.
Who could use big-endian 32-bit numbers?
Who needs at least 512 bytes and zeros the memory?
The bug appears to be resolved: with Fedora 11, I do not get any cursor corruptions anymore.