Bug 56042

Summary: (IDE VIA) Sawfish fails, GNOME fails, disk errors
Product: [Retired] Red Hat Linux
Reporter: Philip Shearer <philip.shearer>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: David Lawrence <dkl>
Severity: high
Priority: medium    
Version: 7.1   
Target Milestone: ---   
Target Release: ---   
Hardware: other   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:39:16 UTC

Description Philip Shearer 2001-11-11 20:54:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.3-12 i686)

Description of problem:
/usr file system errors, causing Sawfish and GNOME to fail.
This seems to be related to bug 37292, but it is a file system
corruption problem.

It does NOT seem to be related to 41070, which is the
only other disk error bug report I saw.

kernel: CPU: AMD Duron(tm)
Red Hat 7.1, Linux kernels 2.4.2 and 2.4.3


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Remove the gmc and sawfish RPMs.
2. Reinstall them.
3. Run the GNOME desktop a number of times.
4. Stop and start the system a number of times.
5. After each startup, run:
    cd /usr/share
    ls -lR > ls-lR[n]
    diff ls-lR1 ls-lR[n]
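A minimal sketch of step 5 as a POSIX shell script, assuming the snapshots are kept outside the directory being listed. The function names take_snapshot and compare_with_first, and the default LOG_DIR, are hypothetical and not part of the original procedure:

```sh
#!/bin/sh
# Sketch of step 5: record a recursive listing of TARGET_DIR after each
# boot and diff it against the first one. TARGET_DIR/LOG_DIR defaults and
# the function names are hypothetical, chosen for illustration only.

TARGET_DIR=${TARGET_DIR:-/usr/share}
LOG_DIR=${LOG_DIR:-/var/tmp/lslogs}

# Write "ls -lR" of TARGET_DIR to the next free file ls-lR1, ls-lR2, ...
# in LOG_DIR, and print the name of the file written.
take_snapshot() {
    [ -d "$TARGET_DIR" ] || return 1
    mkdir -p "$LOG_DIR" || return 1
    n=1
    while [ -e "$LOG_DIR/ls-lR$n" ]; do
        n=$((n + 1))
    done
    (cd "$TARGET_DIR" && ls -lR) > "$LOG_DIR/ls-lR$n" || return 1
    echo "$LOG_DIR/ls-lR$n"
}

# Diff the first snapshot against the newest one.
# Exit status follows diff: 0 = identical, 1 = differences found.
compare_with_first() {
    n=1
    while [ -e "$LOG_DIR/ls-lR$((n + 1))" ]; do
        n=$((n + 1))
    done
    diff "$LOG_DIR/ls-lR1" "$LOG_DIR/ls-lR$n"
}
```

After each boot, source the script and call take_snapshot followed by compare_with_first. Writing the ls-lR files to a separate directory keeps them from appearing in later listings of /usr/share, so any growth in the diff comes from the files being watched.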



Actual Results:  After a few times, Sawfish fails to start.
GNOME will run with twm as the window manager, but eventually it fails as
well.

The differences between the first and last listings get larger.

In GNOME RPM:
        User Interface -> Desktop:
                gnome-applets
                sawfish

When first installed, they verify as OK; then they gradually become corrupted.

Eventually the /usr file system fails fsck, and fsck has to be run in
single-user mode to fix the error. BUT /usr does not fail until there are
a lot of errors.


Expected Results:  Would not expect any of this.

As far as I can tell, no file in any other subsystem is affected
by this creeping failure.


Additional info:

Here is a diff from my current system:

# diff ls-lR2 ls-lR5
        [snip]
        ---
        > total 7212
        44396c44399
        < ?-w-rw--wT    0 50135649 50135040 213642651262255593 Jan  1  1970 wm
        ---
        > -rw-r--r--    1 root     root      3213967 Nov 11 10:08 wm
        44398c44401
        < ?-wx-wxr-T    0 50135733 50135040 215331642868040307 Jan  7  1970 wm.jlc
        ---
        > -rw-r--r--    1 root     root      3214094 Nov 11 10:13 wm.jlc

I have recently restored GNOME and upgraded Sawfish for the second time,
so the file corruption has just started.

However, so far fsck does not report that anything is wrong. After
running GNOME a few more times, I expect it to do the same as last time:
eventually report that the disk is corrupt, so that I will have to run
fsck manually.
At the moment I have a number of chunks of an old version of Sawfish
and GNOME sitting in /usr/lost+found:
# ls -R /usr/lost+found
        /usr/lost+found:
        #146262  #146269  #178335  #178337  #178338
        /usr/lost+found/#146262:
        caution.png  important.png  note.png  tip.png  warning.png
        /usr/lost+found/#146269:
        caution.png  important.png  note.png  tip.png  warning.png

Machine's Vital statistics
--------------------------

from /var/log/messages:
kernel: Detected 797.176 MHz processor.
kernel: Calibrating delay loop... 1592.52 BogoMIPS
Memory: 221468k/229376k available (1246k kernel code, 5728k reserved, 93k data, 228k init, 0k highmem)
kernel: CPU: AMD Duron(tm) Processor stepping 01

uname -a
Linux xxxxx 2.4.3-12 #1 Fri Jun 8 13:35:30 EDT 2001 i686 unknown

df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda7               257673     79126    165243  33% /
/dev/hda1                19487      3407     15074  19% /boot
/dev/hda6               519836      5688    487740   2% /home
/dev/hda5              1765992   1258444    417840  76% /usr
/dev/hda9               257673     26067    218302  11% /var

fdisk
Disk /dev/hda: 128 heads, 63 sectors, 779 cylinders
Units = cylinders of 8064 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         5     20128+  83  Linux
/dev/hda2             6       779   3120768    5  Extended
/dev/hda5             6       450   1794208+  83  Linux
/dev/hda6           451       581    528160+  83  Linux
/dev/hda7           582       647    266080+  83  Linux
/dev/hda8           648       713    266080+  82  Linux swap
/dev/hda9           714       779    266080+  83  Linux

Comment 1 Havoc Pennington 2001-11-12 19:22:46 UTC
No way is GNOME supposed to be robust against filesystem corruption. ;-) 
I'm reassigning to kernel since that's the only thing that could be responsible,
but I'm guessing they will tell you your hardware has problems.

Comment 2 Arjan van de Ven 2001-11-12 19:28:51 UTC
Could you try the 2.4.9-12 kernel? It has a few patches to work around VIA IDE
hardware corruption :(

Comment 3 Philip Shearer 2001-11-13 20:03:30 UTC
I re-read the 37292 report and followed the comments to bug 27614.
My installation is reporting the same error in the messages file:
  11:49:24 x kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
  11:49:24 x kernel: hda: drive not ready for command
So it probably is related to bug 41070.
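For reference, the status byte in these messages is the ATA status register, and the names in braces are the bits that are set. A small POSIX shell decoder, assuming the standard ATA bit layout and flag names modelled on the kernel messages above (the function name decode_ide_status is hypothetical), shows how the value breaks down:

```sh
#!/bin/sh
# Decode an ATA/IDE status register value into flag names.
# Bit layout is the standard ATA one; the function name is hypothetical.
decode_ide_status() {
    s=$(( $1 ))                      # accepts hex such as 0x58
    if [ $((s & 0x80)) -ne 0 ]; then
        echo "Busy"                  # other bits are not valid while BSY is set
        return 0
    fi
    out=""
    [ $((s & 0x40)) -ne 0 ] && out="$out DriveReady"
    [ $((s & 0x20)) -ne 0 ] && out="$out DeviceFault"
    [ $((s & 0x10)) -ne 0 ] && out="$out SeekComplete"
    [ $((s & 0x08)) -ne 0 ] && out="$out DataRequest"
    [ $((s & 0x04)) -ne 0 ] && out="$out CorrectedError"
    [ $((s & 0x02)) -ne 0 ] && out="$out Index"
    [ $((s & 0x01)) -ne 0 ] && out="$out Error"
    echo "${out# }"                  # drop the leading space
}
```

With this sketch, decode_ide_status 0x58 prints "DriveReady SeekComplete DataRequest", and decode_ide_status 0x50 (the value later seen on hdb) prints "DriveReady SeekComplete"; the Error bit is clear in both.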

In bug 27614, arjanv suggests that it is a cable problem and that
        'you can always boot with "ide=nodma"'
I presume that is why you suggest I build a newer kernel with IDE patches.

I'll try the boot arguments, and I'll also rebuild the kernel (probably with
2.4.14, unless there is a reason you want me to use linux-2.4.9) and let
you know the results.



Comment 4 Philip Shearer 2001-11-20 16:54:52 UTC
Upgraded to RH 7.2 with ext3.
Built kernel 2.4.14 (with the loop.c bug fix and ext3 extensions).
Sawfish failed :-(

Output on the screen after running GNOME for the first time after the upgrade,
with kernel 2.4.14 installed:

  rep: received fatal signal: Segmentation fault
  struct debug_buf common:
  Backtrace in `fatal_signal_handler':
        <(null)+1076680776>
        <rep_symbols_init+52>
        <rep_init_from_dump+93>
        <rep_init+45>
        <main+85>
        <__libc_start_main+147>
        <XMapRaised+53>

  Lisp backtrace:

  cursor_addr value: 187
  Message: Successfully registered `OAFIID:nautilus_factory:bd1e1862-92d7-4391-963e-37583f0daef3'
  cursor_addr value: 9c30
  Message: Successfully registered `OAFIID:Bonobo_Moniker_std_Factory

--------------------------------------------------------------------
In the /var/log/messages file:
  Nov 19 17:08:04 xxx gnome-name-server[1695]: starting
  Nov 19 17:08:04 xxx gnome-name-server[1695]: name server starting
  Nov 19 17:08:04 xxx kernel: ide-floppy driver 0.97.sv
  Nov 19 17:08:04 xxx kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
  Nov 19 17:08:04 xxx kernel: hda: drive not ready for command
  Nov 19 17:08:04 xxx kernel: hdb: ATAPI 12X CD-ROM drive, 128kB Cache, DMA
  Nov 19 17:08:04 xxx kernel: Uniform CD-ROM driver Revision: 3.12
  Nov 19 17:08:40 xxx gconfd (root-1744): starting (version 1.0.4), pid 1744 user 'root'
---------------------------------------------------------------------
Reinstalling Sawfish a second time produced similar errors, but including 4 instances of
  hdb: status error: status=0x50 { DriveReady SeekComplete }
and then
  hdb: DMA disabled
  hdb: ATAPI reset complete
Also, when listing the /usr/share/sawfish subdirectory, there is now an entry in the
messages file:
Nov 20 17:12:18 xxx kernel: EXT3-fs error (device ide0(3,5)): ext3_readdir: bad entry in directory #3304: rec_len %% 4 != 0 - offset=0, inode=1077291218, rec_len=10466, name_len=54
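The ext3 complaint means the directory entry's record length is not a multiple of 4. A rough sketch of directory-entry length checks as a POSIX shell function, modelled on ext3's sanity checks and assuming the ext3 rule that an entry needs at least name_len + 8 bytes rounded up to a multiple of 4. The function name, its argument order, the check order, and the 4096-byte block size in the example are assumptions; the report does not state the block size of /usr:

```sh
#!/bin/sh
# Hypothetical checker modelled on ext3 directory-entry sanity checks.
# Arguments: rec_len name_len offset_in_block block_size
dirent_check() {
    rec_len=$1; name_len=$2; offset=$3; block_size=$4
    min_len=$(( (1 + 8 + 3) & ~3 ))          # smallest possible entry: 12 bytes
    need_len=$(( (name_len + 8 + 3) & ~3 ))  # 8-byte header + name, rounded up to 4
    if [ "$rec_len" -lt "$min_len" ]; then
        echo "rec_len is smaller than minimal"
    elif [ $((rec_len % 4)) -ne 0 ]; then
        echo "rec_len % 4 != 0"
    elif [ "$rec_len" -lt "$need_len" ]; then
        echo "rec_len is too small for name_len"
    elif [ $((offset + rec_len)) -gt "$block_size" ]; then
        echo "directory entry across blocks"
    else
        echo "ok"
    fi
}
```

With the values from the log, dirent_check 10466 54 0 4096 prints "rec_len % 4 != 0", since 10466 = 4 x 2616 + 2.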

---------------------------------------------------------------------
I do not have a kernel driver for the onboard modem or sound card.

Two websites that hold information on the motherboard:
http://www.sis.com/support/driver/linux.htm
http://www.sysopt.com/articles/sis730/index.html

I will try messing about with BIOS settings and see if that helps. In the
meantime, is there any more useful information I can provide? Do you have any
suggestions for what I can do to fix the problem?

Comment 5 Philip Shearer 2001-11-21 14:02:28 UTC
Changing BIOS settings did not make a difference.
When I removed the CD-ROM drive (MATSHITA CR-584), the disk errors on the
hard disk (Quantum Fireball TM3200A) stopped.

I checked the jumpers on both:
  CD-ROM is jumpered as the slave.
  Hard disk is jumpered as the master.

I replaced the IDE disk cable with a new one. With the CD-ROM attached,
the errors were still there.

I moved the CD-ROM to the second IDE channel (with another cable).

#########################################################################
THERE HAVE BEEN NO MORE HARD DISK ERRORS since I made this hardware change.
#########################################################################

However, since I made this change there is a kernel message in the messages
log which appears during the startup of GNOME (if there is no CD in the CD-ROM drive):
  kernel: cdrom: This disc doesn't have any tracks I recognize!
I wonder why the kernel is trying to probe a CD when there is none present.
Perhaps this is connected to the initial problem, as the hardware write problems
only seem to occur when Sawfish is starting.



Comment 6 Bugzilla owner 2004-09-30 15:39:16 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/