Red Hat Bugzilla – Bug 80532
processes in D state around iput or vfs or something
Last modified: 2007-04-18 12:49:20 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021018
Description of problem:
mgalgoci asked that I bugzilla this, but it may not be useful as this machine is
using the nvidia kernel modules (no, really, for OpenGL functionality). The
machine is a dual Athlon 1800 running the SMP athlon kernel 2.4.18-18.8.0 (with
the nvidia poo relinked with their source rpm shim).
Linux kiyoko 2.4.18-18.8.0smp #1 SMP Wed Nov 13 22:22:40 EST 2002 i686 athlon
I make heavy use of UML in day-to-day development work. UML still exhibits the
mysterious hang bug (missing sigio signals?) which can be hacked around by
having a script in the background which runs 'uml_mconsole version' on the umls
regularly. This seems to stop them from being stuck.
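A minimal sketch of such a background poker, in Python rather than whatever script is actually in use here. The `uml_mconsole <umid> version` invocation matches the `uml_mconsole uml0` process shown further down; the function names, the list of umids, and the 10 second interval are assumptions made for illustration.

```python
import subprocess
import time


def poke_umls(umids, run=subprocess.run):
    """Run 'uml_mconsole <umid> version' once per UML.

    Returns {umid: True/False}, where True means the command exited 0.
    'run' is injectable so the function can be exercised without UML.
    """
    results = {}
    for umid in umids:
        try:
            # Note: this blocks until uml_mconsole exits. If uml_mconsole
            # itself gets stuck in D state, this call will not return.
            proc = run(["uml_mconsole", umid, "version"],
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
            results[umid] = (proc.returncode == 0)
        except OSError:
            # uml_mconsole is missing or could not be executed.
            results[umid] = False
    return results


def keepalive(umids, interval=10.0):
    """Poke every UML forever, sleeping 'interval' seconds between rounds."""
    while True:
        poke_umls(umids)
        time.sleep(interval)
```

Calling `keepalive(["uml0"])` in a background process would approximate the workaround described above; the interval is a guess, not a tuned value.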
Sometimes killall or 'halt' in the UML console will push uml_mconsole and the
kernel tracing thread of the UML into D state:
zab 31696 0.0 0.0 1756 444 pts/9 D 10:16 0:00 uml_mconsole uml0
zab 9035 0.0 8.1 115292 83972 ? D Dec26 0:00 /var/zab/l/linux-2.4.18-18.8.0-l5/linux (uml0) [(kernel thread)]
linux D C03AD240 0 9035 1 31696 25904 (NOTLB)
Call Trace: [<c0136c49>] ___wait_on_page [kernel] 0x99 (0xdaf2decc))
[<c0136235>] truncate_list_pages [kernel] 0x215 (0xdaf2def4))
[<c01362ed>] truncate_inode_pages [kernel] 0x5d (0xdaf2df34))
[<c01637ee>] iput [kernel] 0x19e (0xdaf2df4c))
[<c0158a25>] vfs_unlink [kernel] 0x185 (0xdaf2df68))
[<c0158c99>] sys_unlink [kernel] 0x119 (0xdaf2df84))
[<c010945f>] system_call [kernel] 0x33 (0xdaf2dfc0))
uml_mconsole D C03AD240 0 31696 1 31980 9035 (NOTLB)
Call Trace: [<c0107e32>] __down [kernel] 0x82 (0xdb289e78))
[<c0107fcc>] __down_failed [kernel] 0x8 (0xdb289e9c))
[<c015a906>] .text.lock.namei [kernel] 0x35 (0xdb289eac))
[<c0156b39>] link_path_walk [kernel] 0x459 (0xdb289ecc))
[<c01571d9>] path_lookup [kernel] 0x39 (0xdb289f0c))
[<c0157529>] __user_walk [kernel] 0x49 (0xdb289f1c))
[<c0152f2f>] vfs_stat [kernel] 0x1f (0xdb289f38))
[<c013561f>] do_munmap [kernel] 0x2cf (0xdb289f58))
[<c01535ab>] sys_stat64 [kernel] 0x1b (0xdb289f70))
[<c012adbe>] update_process_times [kernel] 0x3e (0xdb289f8c))
[<c011b6f0>] do_page_fault [kernel] 0x0 (0xdb289fb0))
[<c0109550>] error_code [kernel] 0x34 (0xdb289fb8))
[<c010945f>] system_call [kernel] 0x33 (0xdb289fc0))
while poking around at this, I noticed another D uml that has been around for
zab 28907 0.0 0.0 0 0 ? DW Dec12 0:00 [linux]
linux D C03AC880 0 28907 1 16544 850 (L-TLB)
Call Trace: [<c0136c49>] ___wait_on_page [kernel] 0x99 (0xdab49dc0))
[<c0136235>] truncate_list_pages [kernel] 0x215 (0xdab49de8))
[<c013ee7e>] lru_cache_del [kernel] 0xe (0xdab49e04))
[<c01362fa>] truncate_inode_pages [kernel] 0x6a (0xdab49e28))
[<c01637ee>] iput [kernel] 0x19e (0xdab49e40))
[<c0160ac4>] dput [kernel] 0xb4 (0xdab49e5c))
[<c014b4e4>] fput [kernel] 0xe4 (0xdab49e70))
[<c014967e>] filp_close [kernel] 0x8e (0xdab49e8c))
[<c01254ac>] close_files [kernel] 0x7c (0xdab49ea8))
[<c0124358>] put_files_struct [kernel] 0x28 (0xdab49ec8))
[<c0124c00>] do_exit [kernel] 0x130 (0xdab49ed8))
[<c012ba03>] sig_exit [kernel] 0xc3 (0xdab49ef4))
[<c012bc14>] dequeue_signal [kernel] 0x64 (0xdab49efc))
[<c010918f>] do_signal [kernel] 0x20f (0xdab49f14))
[<c0160a40>] dput [kernel] 0x30 (0xdab49f44))
[<c01f85e7>] sock_read [kernel] 0xa7 (0xdab49f50))
[<c014a2a9>] sys_read [kernel] 0x109 (0xdab49f94))
[<c0109498>] signal_return [kernel] 0x14 (0xdab49fc0))
/proc/$pid/mem can be opened but not read for all of these guys:
open("/proc/31696/mem", O_RDONLY|O_LARGEFILE) = 3
read(3, 0x804cc30, 4096) = -1 ESRCH (No such process)
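The same probe can be sketched in Python to show which of the two calls fails. The function name `probe_mem` and its `proc_root` parameter are made up for illustration, and the errno a different kernel returns may not match the ESRCH seen in the strace output above.

```python
import errno
import os


def probe_mem(pid, proc_root="/proc", nbytes=4096):
    """Open <proc_root>/<pid>/mem and try one read, as in the strace above.

    Returns (stage, result): stage is "open" or "read", result is "ok" or
    the symbolic errno name (for example "ESRCH" or "ENOENT").
    """
    path = os.path.join(proc_root, str(pid), "mem")
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return ("open", errno.errorcode.get(e.errno, str(e.errno)))
    try:
        os.read(fd, nbytes)
    except OSError as e:
        return ("read", errno.errorcode.get(e.errno, str(e.errno)))
    finally:
        os.close(fd)
    return ("read", "ok")
```

For the stuck process above, `probe_mem(31696)` would be expected to report a failure at the "read" stage rather than at "open".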
I'll work around the stuck umls for now; I'll be glad to provide any more
information. I also won't shed a tear if this is deemed uninteresting due to the
presence of the nvidia drivers.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. run uml
2. spin on 'uml_mconsole version' on the running uml
3. 'killall linux' and see if the mconsole and tracing thread went 'D'
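For step 3, one way to check for D-state processes without eyeballing ps is to read the state field from /proc/<pid>/stat. This is only a sketch: the function names are hypothetical, it is Linux-specific, and it only looks at top-level /proc/<pid> entries.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Running `d_state_processes()` after the 'killall linux' would list any uml_mconsole or linux processes left in uninterruptible sleep.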
If you do not use the nVidia kernel module (i.e. a stock Red Hat setup) does
this problem still occur?
If it does not - then you've probably found the culprit (which would be a
duplicate of 78616 or the nVidia binary-only module bug equivalent). I think
Mike Harris pointed out (or maybe Arjan) that the nVidia drivers are compiled
with an old egcs release (2.91?) and will create inherently bad joojoo
when used with a kernel compiled with gcc 3.2.
> If you do not use the nVidia kernel module (i.e. a stock Red Hat setup) does
> this problem still occur?
Unless I get a new video card I'm not interested in finding that out.
This bug report isn't driven by a desire to see RH work on fixing the bug. It's
just a heads-up. If the problem is easily attributable to the binary nvidia
drivers (compiler mismatches -- _awesome_) then the bug can be closed or marked
a clone of the catch-all-nvidia bug, I imagine.
UML sigio is definitely a problem on SMP. The latest and greatest UML seems to
have all that fixed, but that's rather newer than RH8 and needs some kernel
add-ons that Linus still wants cleaned up further.
The other stuff has still only been seen in the Nvidia module cases.
If you are playing seriously with UML, look at building a 2.4.20 series tree with
the UML patched host kernel - you get a lot, lot more performance.
*** This bug has been marked as a duplicate of 78616 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.