Bug 161743

Summary:

System hard hang under disk I/O

Product:

[Fedora] Fedora

Reporter:

Craig McLean <craig>

Component:

kernel

Assignee:

Dave Jones <davej>

Status:

CLOSED DUPLICATE

QA Contact:

Brian Brock <bbrock>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

CC:

pfrields, teicher-fedora, wtogami

Target Milestone:

---

Target Release:

---

Hardware:

i686

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2005-07-08 00:58:48 UTC

Attachments:

Description	Flags
Combined output from dmesg, lsmod, rpm -qa and lspci	none
output from smartctl -a and hdparm -I	none
output from "strace -f -- find /etc | xargs grep youwontfindme"	none
gzipped output from "strace -f -- find /etc | xargs grep youwontfindme"	none

Description Craig McLean 2005-06-26 21:57:38 UTC

Description of problem:
Briefly, I am running FC4, kernel 2.6.11-1.1369_FC4, on a Toshiba Tecra M2. It's
a Pentium M 1500 MHz with 256 MB RAM.
Under reasonably heavy disk I/O (ls -lR /etc), the system locks up hard: no
capslock/numlock response, no magic SysRq, no oops or crash dump.
I've attached a bunch of potentially useful information, but without a crash
dump I have no clue where to start.
I have raised this under kernel because I can boot into SuSE using a 2.4.24
kernel and perform very heavy I/O without issue. I am loath to install a 2.4
kernel on the Fedora side because this will complicate things.

Version-Release number of selected component (if applicable):
2.6.11-1.1369_FC4

How reproducible:
Every time.

Steps to Reproduce:
1. Boot system to init 5, log in, gnome starts.
2. open terminal
3. 'ls -lR /etc'
  
Actual results:
Lockup

Expected results:


Additional info:
Attached lsmod, lspci, rpm -qa, dmesg.

Comment 1 Craig McLean 2005-06-26 21:57:38 UTC

Created attachment 115994 [details]
Combined output from dmesg, lsmod, rpm -qa and lspci

Comment 2 Craig McLean 2005-07-01 07:49:32 UTC

Now on Kernel 2.6.12-1.1400_FC5. Same fault.
Another simple (and more reliable) way to cause this problem is to issue 'find
/etc | xargs grep blahblahblah'. The system will grind for a while, then I get
the old flashing caps-lock.

Comment 3 Dan Carpenter 2005-07-01 08:28:10 UTC

None of those commands is strenuous for the disk drive.  They are just ordinary
operations.

Try this under SuSE.

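# Checksum every file under /usr 100 times, collecting all results in one list.
for i in $(seq 100) ; do
     find /usr/ -type f -exec md5sum {} \;
done > /tmp/list_md5
# Count identical "checksum  filename" lines across the 100 passes.
sort /tmp/list_md5 | uniq -c | sort | less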

It should all say 100.
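
Paging through the whole list by eye is slow, so the check can be narrowed to the lines that fail it. The following is only a sketch: report_mismatches is a hypothetical helper name, not part of any tool mentioned in this report, and it assumes the list was produced by the loop above.

```shell
# Hypothetical helper: print every "checksum  filename" line whose repeat
# count differs from the expected number of passes (100 in the loop above).
report_mismatches() {
    list_file=$1
    expected=${2:-100}
    sort "$list_file" | uniq -c | awk -v n="$expected" '$1 != n'
}

# Usage: no output means every file hashed identically on every pass.
# report_mismatches /tmp/list_md5 100
```

A file that returned a different checksum on even one pass shows up as two or more lines, each with a count below the expected value.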

What hard drive are you using?  Does smartctl -a say the hard drive is bad?  What
motherboard do you have?  What does `strace ls -lR /etc` say?  `hdparm`?

Comment 4 Craig McLean 2005-07-01 10:14:50 UTC

Fair enough, I assumed (I know, I know) that because the problem only occurred
while the disk was thrashing (it's a laptop, so it's easy to hear the heads
moving) it was an I/O+CPU combination causing the problem.
That script seems to run without error on either 2.4.24 or 2.6.12; I get 100 of
everything.
Anyhow, the disk is a TOSHIBA MK4025GAS, as stated in the dmesg attached to the
original report. smartctl -a says the disk is OK. The machine is a Toshiba Tecra
M2, as stated, so it's Toshiba's own motherboard.
The output of hdparm -I /dev/hda1 and smartctl -a /dev/hda1 is in a new
attachment. I'll attach an strace when I can get one, but the box will panic, so
I'll need to get to init S and write it to a USB drive or something.

Comment 5 Craig McLean 2005-07-01 10:15:42 UTC

Created attachment 116231 [details]
output from smartctl -a and hdparm -I

Comment 6 Craig McLean 2005-07-01 10:41:36 UTC

Created attachment 116232 [details]
output from "strace -f -- find /etc | xargs grep youwontfindme"

This is the strace output from the panicked system. This command hangs the
system and requires a power-cycle to clear. Interestingly, if "/" is remounted
with "-o sync" this problem seems not to occur.
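
For anyone repeating this experiment, it helps to confirm that the remount actually took effect before running the test. A minimal sketch follows, assuming the kernel exposes the active options in /proc/mounts; mount_options and is_sync_mounted are hypothetical helper names introduced only for illustration.

```shell
# Hypothetical helper: print the option list for a mount point from a
# mounts table in /proc/mounts format (device, mount point, type, options).
mount_options() {
    mount_point=$1
    table=${2:-/proc/mounts}
    # The last matching entry wins, since later mounts shadow earlier ones.
    awk -v mp="$mount_point" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' "$table"
}

# Hypothetical helper: succeed if the mount point carries the "sync" option.
is_sync_mounted() {
    mount_options "$1" "${2:-/proc/mounts}" | tr ',' '\n' | grep -qx 'sync'
}

# Usage (as root): remount / synchronously, then confirm the option is active.
# mount -o remount,sync /
# is_sync_mounted / && echo "/ is mounted sync"
```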

Comment 7 Craig McLean 2005-07-01 10:42:24 UTC

Created attachment 116233 [details]
gzipped output from "strace -f -- find /etc | xargs grep youwontfindme"

This is the strace output from the panicked system. This command hangs the
system and requires a power-cycle to clear. Interestingly, if "/" is remounted
with "-o sync" this problem seems not to occur.

Comment 8 Pete Zaitcev 2005-07-01 16:05:25 UTC

In case of blinking LEDs the first priority should be getting the oops trace.
Re-run the test with console in text mode.

Comment 9 Craig McLean 2005-07-04 17:40:25 UTC

Sitting on the console shows a panic/oops string (about 2 pages), but this is a
laptop with no RS-232 ports, only USB. LKCD requires me to rebuild an old kernel,
and netdump (I believe) won't support the ipw2100. Also, my nice
(point-and-shoot) digital camera won't focus properly on the TFT screen, so I
can only get pretty blurry pictures of it.
Can you suggest a way of getting the oops output from the machine?

Comment 10 Dave Jones 2005-07-08 00:58:48 UTC

I notice you have the nvidia module loaded.  Some versions of their driver
(maybe they still do, I haven't looked) created a /dev node in /etc, which when
read, would crash the system.  The symptoms you report are in line with what we
saw with earlier reports from users of similar setups.

Please reopen if you can reproduce without the nvidia driver present (and check
your /etc for device nodes).
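
One way to do that check without reading the nodes themselves is to ask find for character and block devices by type. This is a sketch only; list_device_nodes is a hypothetical helper name, and the default directory of /etc is an assumption taken from this report.

```shell
# Hypothetical helper: list character and block device nodes under a directory.
# find -type inspects file metadata only, so the nodes are never opened or read.
list_device_nodes() {
    find "${1:-/etc}" \( -type c -o -type b \) -print 2>/dev/null
}

# Usage: any output is a device node that does not belong under /etc.
# list_device_nodes /etc
```

No output means there are no device nodes under the directory, so the recursive grep or ls would have to be failing for some other reason.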


*** This bug has been marked as a duplicate of 73733 ***