Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 80532

Summary:	processes in D state around iput or vfs or something
Product:	[Retired] Red Hat Linux	Reporter:	Brown, Zach <zab>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED DUPLICATE	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	8.0	CC:	drjohnson1, mgalgoci, michael, ox23fgu02, shaver
Target Milestone:	---
Target Release:	---
Hardware:	athlon
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2006-02-21 18:50:38 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Brown, Zach 2002-12-27 19:19:08 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021018

Description of problem:
mgalgoci asked that I bugzilla this, but it may not be useful as this machine is
using the nvidia kernel modules (no, really, for OpenGL functionality).  The
machine is a dual Athlon 1800 running the SMP athlon kernel 2.4.18-18.8.0 (with
the nvidia poo relinked with their source rpm shim).

Linux kiyoko 2.4.18-18.8.0smp #1 SMP Wed Nov 13 22:22:40 EST 2002 i686 athlon i386 GNU/Linux

I make heavy use of UML in day-to-day development work.  UML still exhibits the
mysterious hang bug (missing sigio signals?) which can be hacked around by
having a script in the background which runs 'uml_mconsole version' on the umls
regularly.  This seems to stop them from being stuck.
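
A minimal sketch of such a background script, in Python. Only the command line
'uml_mconsole <umid> version' comes from this report; the function names
(poke, keepalive), the 5-second interval, and the rounds parameter are
hypothetical and chosen for illustration.

```python
import subprocess
import time


def poke(umid, run=subprocess.call):
    """Run 'uml_mconsole <umid> version' once and return its exit status."""
    return run(["uml_mconsole", umid, "version"])


def keepalive(umids, interval=5.0, rounds=None,
              run=subprocess.call, sleep=time.sleep):
    """Poke every UML in umids, then sleep, repeating forever or 'rounds' times.

    The interval is an assumption; the report only says "regularly".
    """
    done = 0
    while rounds is None or done < rounds:
        for umid in umids:
            poke(umid, run)
        done += 1
        sleep(interval)


# Example (runs until interrupted):
#     keepalive(["uml0"])
```

Note that uml_mconsole can itself end up in D state, as the traces in this
report show, in which case a loop like this one would block on that call.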

Sometimes killall or 'halt' in the UML console will push uml_mconsole and the
kernel tracing thread of UML into D state:

zab      31696  0.0  0.0  1756  444 pts/9    D    10:16   0:00 uml_mconsole uml0 version
zab       9035  0.0  8.1 115292 83972 ?      D    Dec26   0:00 /var/zab/l/linux-2.4.18-18.8.0-l5/linux (uml0) [(kernel thread)]

linux         D C03AD240     0  9035      1         31696 25904 (NOTLB)
Call Trace: [<c0136c49>] ___wait_on_page [kernel] 0x99 (0xdaf2decc))
[<c0136235>] truncate_list_pages [kernel] 0x215 (0xdaf2def4))
[<c01362ed>] truncate_inode_pages [kernel] 0x5d (0xdaf2df34))
[<c01637ee>] iput [kernel] 0x19e (0xdaf2df4c))
[<c0158a25>] vfs_unlink [kernel] 0x185 (0xdaf2df68))
[<c0158c99>] sys_unlink [kernel] 0x119 (0xdaf2df84))
[<c010945f>] system_call [kernel] 0x33 (0xdaf2dfc0))

uml_mconsole  D C03AD240     0 31696      1         31980  9035 (NOTLB)
Call Trace: [<c0107e32>] __down [kernel] 0x82 (0xdb289e78))
[<c0107fcc>] __down_failed [kernel] 0x8 (0xdb289e9c))
[<c015a906>] .text.lock.namei [kernel] 0x35 (0xdb289eac))
[<c0156b39>] link_path_walk [kernel] 0x459 (0xdb289ecc))
[<c01571d9>] path_lookup [kernel] 0x39 (0xdb289f0c))
[<c0157529>] __user_walk [kernel] 0x49 (0xdb289f1c))
[<c0152f2f>] vfs_stat [kernel] 0x1f (0xdb289f38))
[<c013561f>] do_munmap [kernel] 0x2cf (0xdb289f58))
[<c01535ab>] sys_stat64 [kernel] 0x1b (0xdb289f70))
[<c012adbe>] update_process_times [kernel] 0x3e (0xdb289f8c))
[<c011b6f0>] do_page_fault [kernel] 0x0 (0xdb289fb0))
[<c0109550>] error_code [kernel] 0x34 (0xdb289fb8))
[<c010945f>] system_call [kernel] 0x33 (0xdb289fc0))

while poking around at this, I noticed another D uml that has been around for
ages :)

zab      28907  0.0  0.0     0    0 ?        DW   Dec12   0:00 [linux]

linux         D C03AC880     0 28907      1         16544   850 (L-TLB)
Call Trace: [<c0136c49>] ___wait_on_page [kernel] 0x99 (0xdab49dc0))
[<c0136235>] truncate_list_pages [kernel] 0x215 (0xdab49de8))
[<c013ee7e>] lru_cache_del [kernel] 0xe (0xdab49e04))
[<c01362fa>] truncate_inode_pages [kernel] 0x6a (0xdab49e28))
[<c01637ee>] iput [kernel] 0x19e (0xdab49e40))
[<c0160ac4>] dput [kernel] 0xb4 (0xdab49e5c))
[<c014b4e4>] fput [kernel] 0xe4 (0xdab49e70))
[<c014967e>] filp_close [kernel] 0x8e (0xdab49e8c))
[<c01254ac>] close_files [kernel] 0x7c (0xdab49ea8))
[<c0124358>] put_files_struct [kernel] 0x28 (0xdab49ec8))
[<c0124c00>] do_exit [kernel] 0x130 (0xdab49ed8))
[<c012ba03>] sig_exit [kernel] 0xc3 (0xdab49ef4))
[<c012bc14>] dequeue_signal [kernel] 0x64 (0xdab49efc))
[<c010918f>] do_signal [kernel] 0x20f (0xdab49f14))
[<c0160a40>] dput [kernel] 0x30 (0xdab49f44))
[<c01f85e7>] sock_read [kernel] 0xa7 (0xdab49f50))
[<c014a2a9>] sys_read [kernel] 0x109 (0xdab49f94))
[<c0109498>] signal_return [kernel] 0x14 (0xdab49fc0))

/proc/$pid/mem can't be read for any of these guys; the open succeeds but the
read fails with ESRCH:

open("/proc/31696/mem", O_RDONLY|O_LARGEFILE) = 3
read(3, 0x804cc30, 4096)                = -1 ESRCH (No such process)

I'll work around the stuck umls for now; I'll be glad to provide any more
information.  I also won't shed a tear if this is deemed uninteresting due to the
presence of the nvidia drivers.

Version-Release number of selected component (if applicable):
smp-2.4.18-18.8.0.athlon

How reproducible:
Sometimes

Steps to Reproduce:
1.  run uml
2.  spin on 'uml_mconsole version' on the running uml
3.  'killall linux'  and see if the mconsole and tracing thread went 'D'
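
For step 3, one way to check for D-state processes without eyeballing ps
output is to read the state field from /proc/<pid>/stat. The following is a
rough sketch, not something used in this report; the helper names (parse_stat,
d_state_processes) are hypothetical.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on spaces.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc="/proc"):
    """Return a list of (pid, comm) for processes in uninterruptible sleep."""
    found = []
    try:
        entries = os.listdir(proc)
    except OSError:
        return found  # no procfs available at this path
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the line was unparsable
        if state == "D":
            found.append((pid, comm))
    return found


# Example:
#     for pid, comm in d_state_processes():
#         print(pid, comm)
```

Run after 'killall linux', this would list uml_mconsole and the UML tracing
thread if they got stuck.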

Comment 1 Michael Lee Yohe 2002-12-27 20:05:09 UTC

If you do not use the nVidia kernel module (i.e. a stock Red Hat setup) does
this problem still occur?

If it does not, then you've probably found the culprit (which would be a
duplicate of 78616 or the nVidia binary-only module bug equivalent).  I think
Mike Harris pointed out (or maybe Arjan) that the nVidia drivers are compiled
with an old egcs release of gcc (2.91?) and will create inherently bad joojoo
when used with the gcc 3.2 compiled kernel.

Comment 2 Brown, Zach 2002-12-27 20:21:30 UTC

> If you do not use the nVidia kernel module (i.e. a stock Red Hat setup) does
> this problem still occur?

unless I get a new video card I'm not interested in finding that out.

This bug report isn't driven by a desire to see RH work on fixing the bug.  It's
just a heads-up.  If the problem is easily attributable to the binary nvidia
drivers (compiler mismatches -- _awesome_) then the bug can be closed or marked
a clone of the catch-all-nvidia bug, I imagine.

Comment 3 Alan Cox 2002-12-30 14:19:43 UTC

UML sigio is definitely a problem on SMP. The latest and greatest UML seems to
have all that fixed, but that's rather newer than RH8 and needs some kernel
addons that Linus still wants cleaned up further.

The other stuff has still only been seen in the Nvidia module cases.

If you are playing seriously with UML look at building a 2.4.20 series tree with
the UML patched host kernel - you get a lot lot more performance.


*** This bug has been marked as a duplicate of 78616 ***

Comment 4 Red Hat Bugzilla 2006-02-21 18:50:38 UTC

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.