Bug 26853 - rpm across NFS (Fisher to Fisher, SMP client) causes CPU LOCKUP error and stack dump
Status: CLOSED RAWHIDE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Michael K. Johnson
David Lawrence
Florence RC-2
Reported: 2001-02-09 15:25 EST by Panic
Modified: 2007-04-18 12:31 EDT
Doc Type: Bug Fix
Last Closed: 2001-02-14 16:11:43 EST


Description Panic 2001-02-09 15:25:07 EST
Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.2.16-22smp i686)


Situation:

client: SMP IBM Intellistation running stock Fisher
server: SMP Compaq ProLiant ML370 running stock Fisher

The client must be SMP -- with the uniprocessor kernel, this works
perfectly.  (Should this be filed under kernel or nfs?)  A test install from a
7.0 NFS server also works properly.  The error varies from run to run, and
sometimes lists CPU 0 as well.  This machine was itself installed via NFS, from
that same server.  The bug causes a hard lock of the machine (i.e., pull the
plug and start over).  Which package is being installed does not appear to
matter: I tried the -27 samba from rawhide and xmms from Guinness.  Running
rpm on a local package does not cause this problem.

Reproducible: Always
Steps to Reproduce:
1. From an SMP client, mount an NFS directory from a Fisher-based server.
2. run "rpm -Uvh" on an RPM package located on the NFS share
3. Enjoy. :)
        

Actual Results:  LOCKUP on CPU error and stack trace, sometimes with a
partial rpm installation.

Expected Results:  full installation of the target RPM package

*********

Here's what happens:

[root@asterix /root]mount chatak.support:/opt/fisher /mnt/tmp
[root@asterix /root]cd /mnt/tmp/RedHat/RPMS
[root@asterix /root]rpm -Uvh xmms-1.2.4-6.i386.rpm
Preparing...            #######################################[100%]
  1:xmms        NMI Watchdog detected LOCKUP on CPU1, registers:%)
CPU:    1
EIP:    0010:[<c01072bc>]
EFLAGS: 00000246
eax: 00000000   ebx: c7ff2000   ecx:00000032    edx: c7ff2000
esi: c0107290   edi: c7ff2000   ebp:ffffe000    esp: c7ff3fb0
ds: 0018        es: 0018        ss: 0018
Process swapper (pid: 0, stackpage=c7ff3000)
Stack:  c0107342 00000000 00000000 00000000 00000000 00000000 00000019
00000000
        c1223000 0000260e 0000c00f 00000000 00000000 0000000d 0000000e
00000000
        00000000 c00bcd80 00000000 c0181869
Call Trace: [<c0107342>] [<c0181869>]

Code: c3 8d 76 00 fb c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00
console shuts up...
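
The bracketed values in the dump above are raw kernel addresses, and they are the only part of the oops that a symbol resolver can work with. As a rough sketch (not part of the original report, and not how ksymoops itself is implemented), they could be pulled out of a saved copy of the oops like this. The function name oops_addresses and the OOPS sample string are hypothetical; the sample is the dump above with the stack lines left out.

```python
import re

# Oops text copied from the report above (register and stack lines omitted).
OOPS = """\
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU:    1
EIP:    0010:[<c01072bc>]
EFLAGS: 00000246
Process swapper (pid: 0, stackpage=c7ff3000)
Call Trace: [<c0107342>] [<c0181869>]
"""

# Addresses in a 2.4-era i386 oops are printed as [<8 hex digits>].
ADDR = re.compile(r"\[<([0-9a-f]{8})>\]")


def oops_addresses(text):
    """Return (eip, call_trace) as integers; eip is None if no EIP line is found."""
    eip = None
    trace = []
    in_trace = False
    for line in text.splitlines():
        stripped = line.strip()
        found = [int(a, 16) for a in ADDR.findall(stripped)]
        if stripped.startswith("EIP:"):
            eip = found[0] if found else None
            in_trace = False
        elif stripped.startswith("Call Trace:"):
            trace.extend(found)
            in_trace = True
        elif in_trace and stripped.startswith("[<"):
            # Long call traces wrap onto continuation lines of bare addresses.
            trace.extend(found)
        else:
            in_trace = False
    return eip, trace


eip, trace = oops_addresses(OOPS)
print(f"EIP   {eip:08x}")
for addr in trace:
    print(f"Trace {addr:08x}")
```

On their own these numbers say nothing about which function was running; they need to be matched against a symbol table for the exact kernel that produced them.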
Comment 1 Jeff Johnson 2001-02-09 15:35:18 EST
This is a kernel problem ...
Comment 2 Glen Foster 2001-02-09 18:10:39 EST
We (Red Hat) should really try to resolve this before next release.
Comment 3 Michael K. Johnson 2001-02-13 16:51:36 EST
What network card are you using?
Comment 4 Panic 2001-02-13 21:02:38 EST
00:10.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 21)
        Subsystem: Netgear FA310TX
        Flags: bus master, medium devsel, latency 32, IRQ 9
        I/O ports at 6800
        Memory at f5ffdf00 (32-bit, non-prefetchable)

Module is tulip, of course.
Comment 5 Michael K. Johnson 2001-02-13 21:26:56 EST
Thanks!

Now, could you run ksymoops on that oops?
Comment 6 Panic 2001-02-14 13:41:25 EST
In working on the ksymoops, I discovered a few things:

1) it doesn't happen all the time, but is quite frequent
2) another SMP machine with a 3c905C is not affected at all
3) it is not limited to NFS or rpm: so far I've crashed it cp'ing across NFS and
trying to ftp from the server, and the errors are all different.

Here's the best I could do with ksymoops (cat fullerror | ksymoops):

ksymoops 2.4.0 on i686 2.4.0-0.99.11smp.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.0-0.99.11smp/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
Error (regular_file): read_system_map stat /usr/src/linux/System.map failed
Warning (compare_maps): mismatch on symbol tulip_max_interrupt_work  , tulip
says c8882ec0, /lib/modules/2.4.0-0.99.11smp/kernel/drivers/net/tulip/tulip.o
says c88825c0.  Ignoring
/lib/modules/2.4.0-0.99.11smp/kernel/drivers/net/tulip/tulip.o entry
Warning (compare_maps): mismatch on symbol tulip_rx_copybreak  , tulip says
c8882ec4, /lib/modules/2.4.0-0.99.11smp/kernel/drivers/net/tulip/tulip.o says
c88825c4.  Ignoring
/lib/modules/2.4.0-0.99.11smp/kernel/drivers/net/tulip/tulip.o entry
Warning (compare_maps): mismatch on symbol aic7xxx_verbose  , aic7xxx says
c8840400, /lib/modules/2.4.0-0.99.11smp/kernel/drivers/scsi/aic7xxx.o says
c883dfc0.  Ignoring /lib/modules/2.4.0-0.99.11smp/kernel/drivers/scsi/aic7xxx.o
entry
Warning (compare_maps): mismatch on symbol sd  , sd_mod says c8821b98,
/lib/modules/2.4.0-0.99.11smp/kernel/drivers/scsi/sd_mod.o says c8821a40. 
Ignoring /lib/modules/2.4.0-0.99.11smp/kernel/drivers/scsi/sd_mod.o entry
Warning (compare_maps): mismatch on symbol proc_scsi  , scsi_mod says c881d0f8,
/lib/modules/2.4.0-0.99.11smp/kernel/drivers/scsi/scsi_mod.o says c881b9d8. 
Ignoring /lib/modules/2.4.0-0.99.11smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_logging_level  , scsi_mod says
c881d0f4, /lib/modules/2.4.0-0.99.11smp/kernel/drivers/scsi/scsi_mod.o says
c881b9d4.  Ignoring /lib/modules/2.4.0-0.99.11smp/kernel/drivers/scsi/scsi_mod.o
entry
 EIP:    0010:[<c01072bc>]
Using defaults from ksymoops -t elf32-i386 -a i386
 EFLAGS: 00000246
 eax: 00000000   ebx: c7ff2000   ecx:00000032    edx: c7ff2000
 esi: c0107290   edi: c7ff2000   ebp:ffffe000    esp: c7ff3fb0
 ds: 0018   es: 0018        ss: 0018
 Process swapper (pid: 0, stackpage=c7ff3000)
 Stack:  c0107342 00000000 00000000 00000000 00000000 00000000 00000019
 00000000
         c1223000 0000260e 0000c00f 00000000 00000000 0000000d 0000000e
 00000000
         00000000 c00bcd80 00000000 c0181869
 Call Trace: [<c0107342>] [<c0181869>]
 Code: c3 8d 76 00 fb c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00

>>EIP; c01072bc <enable_hlt+3c/e0>   <=====
Trace; c0107342 <enable_hlt+c2/e0>
Trace; c0181869 <secure_tcp_sequence_number+4549/4990>
Code;  c01072bc <enable_hlt+3c/e0>
00000000 <_EIP>:
Code;  c01072bc <enable_hlt+3c/e0>   <=====
   0:   c3                        ret       <=====
Code;  c01072bd <enable_hlt+3d/e0>
   1:   8d 76 00                  lea    0x0(%esi),%esi
Code;  c01072c0 <enable_hlt+40/e0>
   4:   fb                        sti    
Code;  c01072c1 <enable_hlt+41/e0>
   5:   c3                        ret    
Code;  c01072c2 <enable_hlt+42/e0>
   6:   8d b4 26 00 00 00 00      lea    0x0(%esi,1),%esi
Code;  c01072c9 <enable_hlt+49/e0>
   d:   8d bc 27 00 00 00 00      lea    0x0(%edi,1),%edi


7 warnings and 4 errors issued.  Results may not be reliable.

Comment 7 Michael K. Johnson 2001-02-14 13:59:02 EST
Ah, please try again with the -K and -L options.

Also, please update to the rawhide kernel and see if it is still
happening there.
Comment 8 Panic 2001-02-14 14:32:28 EST
Okay, ksymoops with -K and -L options.  No symbols available, so I'm guessing
it's not too useful.  I'll try the newer kernel -- stay tuned. :)

************************
ksymoops 2.4.0 on i686 2.4.0-0.99.11smp.  Options used
     -V (default)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.0-0.99.11smp/ (default)
     -m /usr/src/linux/System.map (default)

No modules in ksyms, skipping objects
Error (regular_file): read_system_map stat /usr/src/linux/System.map failed
Warning (merge_maps): no symbols in merged map
 EIP:    0010:[<c01072bc>]
Using defaults from ksymoops -t elf32-i386 -a i386
 EFLAGS: 00000246
 eax: 00000000   ebx: c7ff2000   ecx:00000032    edx: c7ff2000
 esi: c0107290   edi: c7ff2000   ebp:ffffe000    esp: c7ff3fb0
 ds: 0018        es: 0018        ss: 0018
 Process swapper (pid: 0, stackpage=c7ff3000)
 Stack:  c0107342 00000000 00000000 00000000 00000000 00000000 00000019
 00000000
         c1223000 0000260e 0000c00f 00000000 00000000 0000000d 0000000e
 00000000
         00000000 c00bcd80 00000000 c0181869
 Call Trace: [<c0107342>] [<c0181869>]
 Code: c3 8d 76 00 fb c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00

>>EIP; c01072bc No symbols available   <=====
Trace; c0107342 No symbols available
Trace; c0181869 No symbols available
Code;  c01072bc No symbols available
00000000 <_EIP>:
Code;  c01072bc No symbols available   <=====
   0:   c3                        ret       <=====
Code;  c01072bd No symbols available
   1:   8d 76 00                  lea    0x0(%esi),%esi
Code;  c01072c0 No symbols available
   4:   fb                        sti    
Code;  c01072c1 No symbols available
   5:   c3                        ret    
Code;  c01072c2 No symbols available
   6:   8d b4 26 00 00 00 00      lea    0x0(%esi,1),%esi
Code;  c01072c9 No symbols available
   d:   8d bc 27 00 00 00 00      lea    0x0(%edi,1),%edi


1 warning and 1 error issued.  Results may not be reliable.
*********************

Comment 9 Panic 2001-02-14 15:45:12 EST
The rawhide kernel does not boot; I'm cursed.  It fails trying to insmod the
aic7xxx driver from the initrd image.  Is this the Adaptec driver or Doug's
driver?  Here's the lspci output on the SCSI controller from the .99.11smp kernel:

00:03.0 SCSI storage controller Adaptec AHA-2940U/UW / AHA-39xx / AIC-7895 (rev 04)
	Subsystem: Adaptec AHA-2940U/2940UW Dual AHA-394xAU/AUW/AUWD AIC-7895B
	Flags: busmaster, medium devsel, latency 64, IRQ 14
	I/O ports at 6000 [disabled]
	Memory at f5ffe000 (32-bit, non-prefetchable)
	Capabilities: [dc] Power Management version 1

and another of these at I/O port 6400 and Memory address f5fff000.  SCSI adapter
BIOS information:

Adaptec AIC-7895 SCSI BIOS version 1.31

In a possibly interesting twist, I noticed that the .99.11 kernel also gets the
kmod error listed below, but does not have the "Loading aic7xxx module" line,
and instead shows scsi0 and scsi1 normally.

***************************

kmod failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
Loading aic7xxx module
NMI Watchdog detected LOCKUP on CPU0, registers:

<snip information that I now know is useless without being run through ksymoops>

ERROR: insmod exited abnormally!
****************************

ksymoops output with the rawhide kernel (.99.23) specified but running on the
.99.11 kernel, checking on the insmod failure:

Command: ksymoops -K -L -v /boot/vmlinux-2.4.0-0.99.23smp -o /lib/modules/2.4.0-0.99.23smp -m /boot/System.map-2.4.0-0.99.23smp

****************************

ksymoops 2.4.0 on i686 2.4.0-0.99.11smp.  Options used
     -v /boot/vmlinux-2.4.0-0.99.23smp (specified)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.0-0.99.23smp (specified)
     -m /boot/System.map-2.4.0-0.99.23smp (specified)

No modules in ksyms, skipping objects
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU:    1
EIP:    0010:[<c0207d82>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000083
eax: be406757   ebx: 0006f734     ecx: be3ff65c       edx: 00000000
esi: 00000340   edi: 00000000     ebp: c126de00       esp: c7da3dd8
ds: 0018 es: 0018 ss: 0018
Process insmod (pid: 12, stackpage=c7da3000)
Stack:  003a0f40 c0207e16 0006f734 00000000 c126de00 c881c8fb c126de00 c881ca54
        00000340 00000046 00000002 00000000 c7de5e20 c7da3e4c c126de00 c7da3e4c
        c7da3e52 c126de00 c7da3e52 c7ff8a86 c126de00 c7da3e52 c126de00 c12468c0
Call Trace: [<c0207e16>] [<c881c8fb>] [<c881ca54>] [<c881c86d>] [<c88374e0>]
[<c88374e0>] [<c01a84ce>]
        [<c8837480>] [<c88374e0>] [<c01a8542>] [<c88374e0>] [<c8837400>]
[<c881f93d>] [<c8837400>] [<c8837400>]
        [<c881c727>] [<c8837400>] [<c8801cd8>] [<c8837400>] [<c88281c0>]
[<c88281c0>] [<c881c000>] [<c881f7b7>]
        [<c8837400>] [<c8837400>] [<c0119cf5>] [<c8839930>] [<c8818000>]
[<c881c060>] [<c01091c7>]
Code: 29 c8 39 d8 72 f8 5b c3 8d b6 00 00 00 00 8b 44 24 04 eb 0a

>>EIP; c0207d82 <__rdtsc_delay+12/20>   <=====
Trace; c0207e16 <__udelay+36/40>
Trace; c881c8fb <END_OF_CODE+85106ab/???
Trace; c881ca54 <END_OF_CODE+8510804/???
Trace; c881c86d <END_OF_CODE+851061d/???
Trace; c88374e0 <END_OF_CODE+852b290/???
Trace; c88374e0 <END_OF_CODE+852b290/???
Trace; c01a84ce <pci_announce_device+1e/50>
Trace; c8837480 <END_OF_CODE+852b230/???
Trace; c88374e0 <END_OF_CODE+852b290/???
Trace; c01a8542 <pci_register_driver+42/60>
Trace; c88374e0 <END_OF_CODE+852b290/???
Trace; c8837400 <END_OF_CODE+852b1b0/???
Trace; c881f93d <END_OF_CODE+85136ed/???
Trace; c8837400 <END_OF_CODE+852b1b0/???
Trace; c8837400 <END_OF_CODE+852b1b0/???
Trace; c881c727 <END_OF_CODE+85104d7/???
Trace; c8837400 <END_OF_CODE+852b1b0/???
Trace; c8801cd8 <END_OF_CODE+84f5a88/???
Trace; c8837400 <END_OF_CODE+852b1b0/???
Trace; c88281c0 <END_OF_CODE+851bf70/???
Trace; c88281c0 <END_OF_CODE+851bf70/???
Trace; c881c000 <END_OF_CODE+850fdb0/???
Trace; c881f7b7 <END_OF_CODE+8513567/???
Trace; c8837400 <END_OF_CODE+852b1b0/???
Trace; c8837400 <END_OF_CODE+852b1b0/???
Trace; c0119cf5 <sys_init_module+545/630>
Trace; c8839930 <END_OF_CODE+852d6e0/???
Trace; c8818000 <END_OF_CODE+850bdb0/???
Trace; c881c060 <END_OF_CODE+850fe10/???
Trace; c01091c7 <system_call+33/38>
Code;  c0207d82 <__rdtsc_delay+12/20>
00000000 <_EIP>:
Code;  c0207d82 <__rdtsc_delay+12/20>   <=====
   0:   29 c8                     sub    %ecx,%eax   <=====
Code;  c0207d84 <__rdtsc_delay+14/20>
   2:   39 d8                     cmp    %ebx,%eax
Code;  c0207d86 <__rdtsc_delay+16/20>
   4:   72 f8                     jb     fffffffe <_EIP+0xfffffffe> c0207d80
<__rdtsc_delay+10/20>
Code;  c0207d88 <__rdtsc_delay+18/20>
   6:   5b                        pop    %ebx
Code;  c0207d89 <__rdtsc_delay+19/20>
   7:   c3                        ret
Code;  c0207d8a <__rdtsc_delay+1a/20>
   8:   8d b6 00 00 00 00         lea    0x0(%esi),%esi
Code;  c0207d90 <__loop_delay+0/30>
   e:   8b 44 24 04               mov    0x4(%esp,1),%eax
Code;  c0207d94 <__loop_delay+4/30>
  12:   eb 0a                     jmp    1e <_EIP+0x1e> c0207da0
<__loop_delay+10/30>

******************************
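
One detail that can be read straight off the register dumps is the interrupt flag. The sketch below (an illustration added here, not something from the thread) decodes the EFLAGS words of the two oopses using the standard x86 bit positions. The helper name decode_eflags is hypothetical, and only the low status and control bits are covered.

```python
# Architecturally defined x86 EFLAGS bits (low 12 bits only).
# Bit 1 is reserved and always reads as 1, so it is not listed.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF",
}


def decode_eflags(value):
    """Return the names of the flags set in value, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if value & (1 << bit)]


# EFLAGS values as printed in the two oopses in this report.
for label, raw in (("idle-loop oops", "00000246"), ("insmod oops", "00000083")):
    flags = decode_eflags(int(raw, 16))
    state = "enabled" if "IF" in flags else "disabled"
    print(f"{label}: EFLAGS {raw} -> {' '.join(flags)} (interrupts {state})")
```

The insmod oops was taken with IF clear, inside the __rdtsc_delay busy-wait called from __udelay. A CPU spinning with interrupts off is the kind of state the NMI watchdog exists to catch, although that observation alone does not identify which driver code was waiting or why.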

Encouraged by that success (no errors!) I tried the same command configuration
on the original error, and got this possibly useful output:

******************************

ksymoops 2.4.0 on i686 2.4.0-0.99.11smp.  Options used
     -v /boot/vmlinux-2.4.0-0.99.11smp (specified)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.0-0.99.11smp (specified)
     -m /boot/System.map-2.4.0-0.99.11smp (specified)

No modules in ksyms, skipping objects
 EIP:    0010:[<c01072bc>]
Using defaults from ksymoops -t elf32-i386 -a i386
 EFLAGS: 00000246
 eax: 00000000   ebx: c7ff2000   ecx:00000032    edx: c7ff2000
 esi: c0107290   edi: c7ff2000   ebp:ffffe000    esp: c7ff3fb0
 ds: 0018        es: 0018        ss: 0018
 Process swapper (pid: 0, stackpage=c7ff3000)
 Stack:  c0107342 00000000 00000000 00000000 00000000 00000000 00000019
 00000000
         c1223000 0000260e 0000c00f 00000000 00000000 0000000d 0000000e
 00000000
         00000000 c00bcd80 00000000 c0181869
 Call Trace: [<c0107342>] [<c0181869>]
 Code: c3 8d 76 00 fb c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00
>>EIP; c01072bc <default_idle+2c/40>   <=====
Trace; c0107342 <cpu_idle+52/70>
Trace; c0181869 <set_cursor+69/80>
Code;  c01072bc <default_idle+2c/40>
00000000 <_EIP>:
Code;  c01072bc <default_idle+2c/40>   <=====
   0:   c3                        ret       <=====
Code;  c01072bd <default_idle+2d/40>
   1:   8d 76 00                  lea    0x0(%esi),%esi
Code;  c01072c0 <default_idle+30/40>
   4:   fb                        sti
Code;  c01072c1 <default_idle+31/40>
   5:   c3                        ret
Code;  c01072c2 <default_idle+32/40>
   6:   8d b4 26 00 00 00 00      lea    0x0(%esi,1),%esi
Code;  c01072c9 <default_idle+39/40>
   d:   8d bc 27 00 00 00 00      lea    0x0(%edi,1),%edi
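
The three ksymoops runs in this thread label the same address c01072bc as enable_hlt+3c, "No symbols available" and default_idle+2c. A plausible reading is that a resolver of this kind picks the nearest symbol at or below the address, so the answer is only as good as the symbol table it is given. The sketch below illustrates that idea under stated assumptions: resolve is a hypothetical helper, and the start addresses are back-computed from the offsets ksymoops printed above (for example c01072bc - 0x3c = c0107280), not read from a real System.map.

```python
from bisect import bisect_right


def resolve(addr, symbols):
    """Map addr to 'name+0xoffset' using the nearest symbol at or below it."""
    table = sorted(symbols.items(), key=lambda kv: kv[1])
    starts = [start for _, start in table]
    i = bisect_right(starts, addr) - 1
    if i < 0:
        return "No symbols available"
    name, start = table[i]
    return f"{name}+0x{addr - start:x}"


# Assumption: the first run only saw the symbols listed in /proc/ksyms,
# because /usr/src/linux/System.map could not be read.
exported_only = {
    "enable_hlt": 0xC0107280,
    "secure_tcp_sequence_number": 0xC017D320,
}
# The run with -m /boot/System.map-2.4.0-0.99.11smp saw these as well.
full_map = dict(
    exported_only,
    default_idle=0xC0107290,
    cpu_idle=0xC01072F0,
    set_cursor=0xC0181800,
)

for addr in (0xC01072BC, 0xC0107342, 0xC0181869):
    print(f"{addr:08x}  ksyms-only: {resolve(addr, exported_only):<36} "
          f"full map: {resolve(addr, full_map)}")
```

With the sparse table every address falls back to whichever exported symbol happens to precede it, which would account for the misleading secure_tcp_sequence_number+4549 entry in the first run. With an empty table the lookup has nothing to return, matching the -K -L run.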


Comment 10 Michael K. Johnson 2001-02-14 16:11:39 EST
Matt, could you please try the 2.4.1-0.1.8 kernel in our
build tree?  Ask me for directions if you need to.
That might solve the booting problem with the aic7xxx
driver.
Comment 11 Panic 2001-02-23 12:13:45 EST
I ended up trying 2.4.1-0.1.14smp -- works like a charm on repeated tests, no
oops, no anything.  I'll mark it as resolved.  Thanks for your help.
