Bug 79078 - Machine lock up when using NFS
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: nfs-utils
Version: 7.3
Hardware: i586 Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
QA Contact: Ben Levenson
Reported: 2002-12-05 08:42 EST by Mark
Modified: 2015-01-04 17:02 EST
Last Closed: 2004-11-27 18:15:29 EST

Description Mark 2002-12-05 08:42:25 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0; PANCREDITv5.0; 
PANCREDIT5.0; .NET CLR 1.0.3705)

Description of problem:
I have two Linux boxes: a single-processor P133 on RH7.1 and a dual-processor 
P133 running RH7.3.  A job on the RH7.1 box mounts a directory on the RH7.3 SMP 
box, copies any changed files, then unmounts.  At some point, the RH7.3 SMP 
box locks up: no console, no ping, no ftp or telnet.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Mount a directory on the RH7.3 SMP box from the RH7.1 box
2. Use "cp -pu" to copy changed files
3. Wait for the target machine to die

Actual Results:  After a period of time (it could be 5 minutes, it could be 20), 
the SMP box ceases to respond to all external stimuli.

Expected Results:  The job should finish after an hour or so and all is well 
with the world.

Additional info:

There does not appear to be an error message.  I think the halt is so complete 
that Linux does not get a chance to write an error to the message log.
If the NFS services are disabled, the machine seems to stay up quite 
happily.  The copy job worked without modification for many months when the 
target machine was an RH6.2 single-processor box.
Comment 1 Arjan van de Ven 2002-12-05 08:50:35 EST
question: what exact kernel is running on the hanging box?
also can you try adding "nmi_watchdog=1" to the kernel commandline of the
hanging box (that's the vmlinuz line in /boot/grub/grub.conf). Doing this will
make the kernel try to detect deadlocks and print a backtrace if it detects one.
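A minimal sketch of that edit, assuming the SMP boot entry that the reporter posts in Comment 3 (the kernel image name and root device are taken from there). The option is simply appended to the existing kernel line. noapic is left out of this sketch because, as Comment 4 points out, it conflicts with the NMI watchdog:

```
# Sketch only: the entry from Comment 3 with nmi_watchdog=1 appended and noapic omitted
title Red Hat Linux (2.4.18-3smp)
        root (hd0,0)
        kernel /vmlinuz-2.4.18-3smp ro root=/dev/hda6 nmi_watchdog=1
        initrd /initrd-2.4.18-3smp.img
```

After a reboot, one way to check that the watchdog is actually running is to look at the NMI row in /proc/interrupts. The counts there should keep rising if it is active.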
Comment 2 Mark 2002-12-05 09:15:57 EST
Thanks for such a rapid response.  The machines are at home, I will provide the 
requested details tomorrow.
Comment 3 Mark 2002-12-06 08:04:52 EST
Here is my grub.conf
=========================================
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
password --md5 $1$cF\V1.BG$556CULbW1ITuEhGZ4.gLO1
title Red Hat Linux (2.4.18-3smp)
        root (hd0,0)
        kernel /vmlinuz-2.4.18-3smp ro root=/dev/hda6 noapic
        initrd /initrd-2.4.18-3smp.img
title Red Hat Linux-up (2.4.18-3)
        root (hd0,0)
        kernel /vmlinuz-2.4.18-3 ro root=/dev/hda6
        initrd /initrd-2.4.18-3.img
=========================================
I tried running with nmi_watchdog=1:
Dec  5 18:22:27 puntus kernel: Default MP configuration #6
Dec  5 18:22:27 puntus kernel: Processor #0 Pentium(tm) APIC version 16
Dec  5 18:22:27 puntus kernel: Processor #1 Pentium(tm) APIC version 16
Dec  5 18:22:27 puntus kernel: I/O APIC #2 Version 16 at 0xFEC00000.
Dec  5 18:22:27 puntus kernel: Processors: 2
Dec  5 18:22:27 puntus kernel: Kernel command line: ro root=/dev/hda6 noapic nmi_watchdog=1
Dec  5 18:22:27 puntus kernel: Initializing CPU#0
Dec  5 18:22:27 puntus kernel: Detected 133.610 MHz processor.
 ++++++++++++++++++++++
I was tailing the messages file when it crashed
gort.squirrelsoft:1005 for /backup (/backup)
Dec  5 18:28:32 puntus rpc.mountd: authenticated mount request from gort.squirrelsoft:1008 for /backup (/backup)
Dec  5 18:28:51 puntus rpc.mountd: authenticated mount request from gort.squirrelsoft:762 for /backup (/backup)
Dec  5 18:38:53 puntus su(pam_unix)[1702]: session opened for user news by (uid=0)
Dec  5 18:38:54 puntus su(pam_unix)[1702]: session closed for user news

+++++++++++++++++++++++++++++++++++++++

No trace was found (where would it be?)

I also ran in single processor mode but it still hung.
Comment 4 Arjan van de Ven 2002-12-06 08:26:44 EST
hmm
noapic and the NMI watchdog are mutually exclusive ;(
Comment 5 Mark 2002-12-06 08:32:43 EST
If I don't run noapic, I would normally get lost interrupt errors relating to the 
hard disks.  Will the NMI watchdog still stop that from happening, or am I stuck now?
Comment 6 Mark 2002-12-12 06:34:10 EST
I have given up on this one.  I have reverted to the old single processor box.  
Thanks for your time but let's not waste any more effort.  Cheers, Mark.
