Red Hat Bugzilla – Bug 29508
[via] Oops on copy from Wolverine kernel
Last modified: 2007-04-18 12:31:46 EDT
This oops from a Wolverine default kernel happened during an attempt to
copy files between vfat directories. It looks very similar to the
problems I hit while trying to install (#29427 and #29472 in Bugzilla).
Below is the decoded oops; the whole log from reboot to the oops is attached.
ksymoops 2.4.0 on i686 2.4.1-0.1.9. Options used
-v /boot/vmlinux-2.4.1-0.1.9 (specified)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.1-0.1.9/ (default)
-m /boot/System.map-2.4.1-0.1.9 (default)
Error (expand_objects): cannot stat(/lib/ncr53c8xx.o) for ncr53c8xx
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
Error (pclose_local): find_objects pclose failed 0x100
Warning (compare_maps): ksyms_base symbol
__VERSIONED_SYMBOL(shmem_file_setup) not found in vmlinux. Ignoring
Warning (compare_maps): mismatch on symbol partition_name , ksyms_base
says c01b0600, vmlinux says c01524a0. Ignoring ksyms_base entry
Unable to handle kernel paging request at virtual address 08000004
*pde = 1d3a7067
Using defaults from ksymoops -t elf32-i386 -a i386
eax: c18a0000 ebx: 00000009 ecx: 000014e0 edx: 08000000
esi: 00040008 edi: 00002202 ebp: 0000000f esp: dd311e14
ds: 0018 es: 0018 ss: 0018
Process cp (pid: 791, stackpage=dd311000)
Stack: 000014e0 00000000 00000e00 cb2b0da0 00000c00 c013517b 00002202
00000200 cb2b0da0 00000c00 c0135517 cbd54420 00000000 c0135538
cb2b0f80 cb2ac000 dd311e6c 00000200 000016ae cb2b0d40 06b60600
Call Trace: [<c013517b>] [<c0135517>] [<c0135538>] [<c0134b0c>]
[<c0134b5c>] [<e086d5f0>] [<c0135cfa>]
[<e086d5f0>] [<e086efd5>] [<e086d5f0>] [<c0127ad9>] [<e086d731>]
[<e086d709>] [<c0132fe6>] [<c0109007>]
Code: 39 72 04 75 f5 0f b7 42 08 3b 44 24 20 75 eb 66 39 7a 0c 75
>>EIP; c0134336 <get_hash_table+66/90> <=====
Trace; c013517b <unmap_underlying_metadata+1b/60>
Trace; c0135517 <__block_prepare_write+117/300>
Trace; c0135538 <__block_prepare_write+138/300>
Trace; c0134b0c <balance_dirty_state+c/50>
Trace; c0134b5c <balance_dirty+c/40>
Trace; e086d5f0 <[cdrom]cdrom_ioctl+ab0/e20>
Trace; c0135cfa <cont_prepare_write+22a/370>
Trace; e086d5f0 <[cdrom]cdrom_ioctl+ab0/e20>
Trace; e086efd5 <[cdrom]cdrom_sysctl_info+5a5/5d0>
Trace; e086d5f0 <[cdrom]cdrom_ioctl+ab0/e20>
Trace; c0127ad9 <generic_file_write+3a9/5f0>
Trace; e086d731 <[cdrom]cdrom_ioctl+bf1/e20>
Trace; e086d709 <[cdrom]cdrom_ioctl+bc9/e20>
Trace; c0132fe6 <sys_write+96/d0>
Trace; c0109007 <system_call+33/38>
Code; c0134336 <get_hash_table+66/90>
Code; c0134336 <get_hash_table+66/90> <=====
0: 39 72 04 cmp %esi,0x4(%edx) <=====
Code; c0134339 <get_hash_table+69/90>
3: 75 f5 jne fffffffa <_EIP+0xfffffffa>
Code; c013433b <get_hash_table+6b/90>
5: 0f b7 42 08 movzwl 0x8(%edx),%eax
Code; c013433f <get_hash_table+6f/90>
9: 3b 44 24 20 cmp 0x20(%esp,1),%eax
Code; c0134343 <get_hash_table+73/90>
d: 75 eb jne fffffffa <_EIP+0xfffffffa>
Code; c0134345 <get_hash_table+75/90>
f: 66 39 7a 0c cmp %di,0xc(%edx)
Code; c0134349 <get_hash_table+79/90>
13: 75 00 jne 15 <_EIP+0x15> c013434b
2 warnings and 4 errors issued. Results may not be reliable.
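The faulting address can be cross-checked against the register dump. The Python sketch below (illustrative only; the helper name `effective_address` is hypothetical) applies the x86 AT&T addressing rule that a `disp(%base)` operand refers to base + disp to the trapping instruction `cmp %esi,0x4(%edx)`, using the `edx` value from the dump. It reproduces the address in the "Unable to handle kernel paging request" line, which suggests that `%edx` held a bad pointer when `get_hash_table` faulted.

```python
# Hypothetical helper: the effective address of an x86 AT&T-syntax
# "disp(%base)" memory operand is base + disp, wrapped to 32 bits.
def effective_address(base: int, disp: int) -> int:
    return (base + disp) & 0xFFFFFFFF

# Values copied from the oops above.
EDX = 0x08000000          # "edx: 08000000" in the register dump
DISP = 0x4                # displacement in "cmp %esi,0x4(%edx)"
FAULT_ADDR = 0x08000004   # "paging request at virtual address 08000004"

addr = effective_address(EDX, DISP)
print(f"{addr:08x}")       # 08000004
print(addr == FAULT_ADDR)  # True
```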
Created attachment 11081
log file leading to an oops
I am not sure that tying this bug to vfat is the right thing to do. It was
observed while copying between vfat file systems, but possibly only
because I cannot do much more with this minimal installation yet. The
decoded oops quoted above seems to implicate the CD and an IDE channel that was
not even used during the operation in question. The copy was from /dev/hdg to /dev/hde, and the CD is
This defect is considered MUST-FIX for the Florence Gold release.
It may be vfat, vm, or an interaction between them.
Could you see whether this happens with the latest kernel from rawhide?
I think that's currently 2.4.1-0.1.14.
I now know that this is NOT associated with vfat, as I suspected from the
very beginning (again, see #29427 and #29472). I can reproduce similar trouble
with the 2.2.19pre14 and 2.4.2-ac5 kernels, and also when copying from one
ext2 file system to another. I simply had the biggest block of files on vfat.
I cannot rule out broken hardware at this point.
I think I have found what triggers (as opposed to what causes) the described
behaviour. The 1005C Award BIOS has two "advanced" options:
System Performance Setting [Optimal, Normal]
USB Legacy Support [Auto, Enabled, Disabled]
If the first one is set to "Normal" and the second one to "Disabled", the
whole system becomes stable. I copied around 1.2 GB of files from various file
systems to a directory on ext2 without any ill effects, and successfully ran
'diff -r' between two directories of 475 MB each. If the BIOS options are set
any other way, expect spectacular blowups with corrupted file systems and
other nasty effects after the first oops.
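A repeatable version of the 'diff -r' check above could look like the Python sketch below (an assumption for illustration, not what was actually run here; the function name `compare_trees` is hypothetical). It walks the source tree with `os.walk` and compares each regular file byte for byte against its counterpart using `filecmp.cmp(..., shallow=False)`, so matching sizes and timestamps cannot hide corrupted contents. Device nodes, symlinks and files that exist only in the copy are not checked.

```python
import filecmp
import os


def compare_trees(left: str, right: str) -> list[str]:
    """Return relative paths of regular files under `left` that are missing
    from `right` or whose contents differ (byte-for-byte comparison)."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(left):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Skip symlinks, device nodes, FIFOs and sockets.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, left)
            dst = os.path.join(right, rel)
            # shallow=False forces a content comparison instead of trusting
            # matching os.stat() signatures (file type, size, mtime).
            if not os.path.isfile(dst) or not filecmp.cmp(src, dst, shallow=False):
                problems.append(rel)
    return sorted(problems)
```

Calling `compare_trees(source_dir, copy_dir)` on the original and the copied tree returns an empty list when every regular file was copied intact.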
It is difficult to tell what "System Performance Setting" actually does, since
it always shows "Optimal" regardless of what was saved last, yet the system's
behaviour depends on how it was set. How "USB Legacy Support" comes into the
picture I cannot even imagine.
I tried both 2.2.19pre and 2.4 kernels, and the picture does not change.
I still have to run more extensive tests, including a full installation
(cf. the other reports referenced above), but this looks like it.
I have no idea how to even start explaining all of that in the installation instructions.
Thanks for that information, that's useful.
This is with the Promise controller discussed in #29427, right?
Yes. The same "box from hell" all over again: K6 Athlon on an Asus A7V, Award 1005C
BIOS, PDC20265 Promise IDE controller, NCR 53c810 SCSI controller (although at
this moment I doubt that the latter has anything to do with it).
The kernels currently available from rawhide fix many corruption issues
with VIA chipsets in combination with Promise controllers.
Can you please try one of these kernels (2.4.2-0.1.25 or later) and verify
that they actually fix the problem? The fix basically corrects the BIOS
settings you mentioned.
I will close this bug; if you can reproduce the problem, please reopen it.
I am afraid I have bad news. I tried the latest kernel from
rawhide, i.e. "2.4.2-0.1.29 #1 Thu Mar 15 20:34:20 EST 2001 i686".
After switching the BIOS to its default factory settings (by now the board
has been updated to the latest BIOS version, 1007), removing close to 1 GB
of data with 'rm -rf' went without trouble. But an attempt to copy some
files to a target went awry after 192 MB from an ext2 file system had been
copied. I can see that this happened while copying the /dev/ttyU* nodes, as
only 46 of the 288 were found in the copy afterwards, and /dev/ttyU136 ended
up as lost+found/#6171. :-) In case you wonder, previous blowups happened
while copying regular files (there are no special nodes when the data come
from a vfat file system) or directories. Only the amount of data seems
to matter, and 192 MB is the record so far. With previous kernels this
was happening regularly in the 130-140 MB range, so some improvement
can be claimed. :-)
The failed copy was followed by an oops during an attempted shutdown.
Luckily the SysRq key (I do have it turned on) still worked, and it
was possible to remount all file systems read-only.
After that failure I switched the BIOS back to the "safe" settings, and
with those and the same kernel I was able to copy around 1 GB of stuff
from one disk to another without any incidents.
I am attaching my full log of errors from the last attempt. I am starting
to wonder whether this has anything to do with the "256 -> 255"
error/not-error that was discussed on the linux-kernel list very recently.
Note: I am afraid this particular test machine is going away any
hour now. I have already kept it much longer than I really should
Created attachment 13024
fragment of log files with errors triggered by 'cp'
We used 128 for the maxsector, not 256, so we're safe against that.
If you still have some time, I would appreciate the output of
"lspci -vxxx" for both the "safe" and the "failure" BIOS settings.
It is not that bad. :-) This box is still here, but it will likely
go pretty soon.
The SCSI controller definitely dislikes 'lspci -vxxx' and reacts with
ncr53c810-0: SCSI parity error detected: SCR1=65 DBC=50000000 SSTAT1=f
and is unhappy on reboot.
Also, with the "factory" BIOS settings I start collecting messages like
usb-uhci.c: interrupt, status 31, frame# 506
usb-uhci.c: interrupt, status 31, frame# 1357
usb-uhci.c: interrupt, status 31, frame# 1534
This does not happen with the "safe" settings, where "USB Legacy Support"
is turned off.
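If the "status" value in these messages is printed in hexadecimal (an assumption not confirmed in this report), it can be decoded against the USBSTS register bit layout from the UHCI 1.1 design guide. The Python sketch below does that; the bit names follow that specification, and the helper `decode_usbsts` is hypothetical. Under that assumption, 0x31 would mean a USB interrupt together with a host controller process error and a halted controller.

```python
# USBSTS bit layout from the UHCI 1.1 design guide (bits 6-15 reserved).
USBSTS_BITS = {
    0x01: "USB interrupt (USBINT)",
    0x02: "USB error interrupt",
    0x04: "resume detect",
    0x08: "host system error",
    0x10: "host controller process error",
    0x20: "host controller halted (HCHalted)",
}


def decode_usbsts(status: int) -> list[str]:
    """Return the names of the USBSTS bits set in `status` (hypothetical helper)."""
    return [name for bit, name in USBSTS_BITS.items() if status & bit]


# Assumes the log text "status 31" is hexadecimal, i.e. 0x31.
for flag in decode_usbsts(0x31):
    print(flag)
```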
With the "factory" BIOS settings I also had problems at shutdown: it
claimed that network file systems were busy, and the whole process got stuck
(same kernel from rawhide; there is no trouble of that sort with the
BIOS in the "safe" position). If you want to tell me that this
hardware/firmware is junk, I heartily agree.
The attached 'lspci.default' is the output with the BIOS at its defaults, and
'lspci.safe' is my "normal" setup ("System Performance" is "Normal"
and "USB Legacy Support" is off).
Created attachment 13043
an output from 'lspci -vxxx' with different BIOS settings
We have found SO many problems with VIA chipsets that we decided to turn off
IDE DMA for those machines. It is nearly impossible to fix the corruption, as it
is a chipset/motherboard bug, and the workarounds are board- and BIOS-version