Bug 80532 - processes in D state around iput or vfs or something
Status: CLOSED DUPLICATE of bug 78616
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 8.0
Hardware: athlon Linux
Priority: medium Severity: medium
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2002-12-27 14:19 EST by Brown, Zach
Modified: 2007-04-18 12:49 EDT (History)

Last Closed: 2006-02-21 13:50:38 EST
Description Brown, Zach 2002-12-27 14:19:08 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021018

Description of problem:
mgalgoci asked that I bugzilla this, but it may not be useful as this machine is
using the nvidia kernel modules (no, really, for opengl functionality).  The
machine is a dual athlon 1800 running smp athlon 2.4.18-18.8.0 (with the nvidia
poo relinked with their source rpm shim).

Linux kiyoko 2.4.18-18.8.0smp #1 SMP Wed Nov 13 22:22:40 EST 2002 i686 athlon i386 GNU/Linux

I make heavy use of UML in day-to-day development work.  UML still exhibits the
mysterious hang bug (missing sigio signals?) which can be hacked around by
having a script in the background which runs 'uml_mconsole version' on the umls
regularly.  This seems to stop them from being stuck.
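
The workaround described above can be sketched roughly as follows. This is not the reporter's actual script, which is not shown in the report. The function names, the 5 second interval, and the 10 second timeout are assumptions chosen for illustration; only the command `uml_mconsole <umid> version` comes from the report.

```python
import subprocess
import time


def poke_umls(names, run=subprocess.run):
    """Send 'version' to each UML's management console once.

    Returns a dict mapping each UML name to the command's return code,
    or None if the command could not be run or timed out.
    """
    results = {}
    for name in names:
        # Same command as in the report: uml_mconsole <umid> version
        cmd = ["uml_mconsole", name, "version"]
        try:
            # The timeout only helps with ordinary hangs. A child stuck in
            # D state cannot be killed, so this call may still block.
            proc = run(cmd, capture_output=True, timeout=10)
            results[name] = proc.returncode
        except (OSError, subprocess.TimeoutExpired):
            results[name] = None
    return results


def keepalive(names, interval=5.0, rounds=None,
              run=subprocess.run, sleep=time.sleep):
    """Poke every UML, sleep, and repeat (forever if rounds is None)."""
    done = 0
    while rounds is None or done < rounds:
        poke_umls(names, run=run)
        done += 1
        sleep(interval)
```

Running `keepalive(["uml0"])` in the background would approximate the hack; the `run` and `sleep` parameters exist only so the loop can be exercised without a real UML instance.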

Sometimes killall or 'halt' in the uml console will push the uml_mconsole and
kernel tracing thread of uml into D state:

zab      31696  0.0  0.0  1756  444 pts/9    D    10:16   0:00 uml_mconsole uml0 version
zab       9035  0.0  8.1 115292 83972 ?      D    Dec26   0:00 /var/zab/l/linux-2.4.18-18.8.0-l5/linux (uml0) [(kernel thread)]
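
For anyone trying to spot such processes without eyeballing ps output, here is a minimal sketch that scans procfs for tasks in D state. It assumes the Linux `/proc/<pid>/stat` layout of `pid (comm) state ...`; the function names and the `proc_root` parameter are hypothetical and exist only for this illustration.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first '(' and the last ')'.
    """
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the file was malformed
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling `d_state_processes()` on the affected host should list the same pids that ps marks with D, assuming the stat files are still readable for them.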

linux         D C03AD240     0  9035      1         31696 25904 (NOTLB)
Call Trace: [<c0136c49>] ___wait_on_page [kernel] 0x99 (0xdaf2decc))
[<c0136235>] truncate_list_pages [kernel] 0x215 (0xdaf2def4))
[<c01362ed>] truncate_inode_pages [kernel] 0x5d (0xdaf2df34))
[<c01637ee>] iput [kernel] 0x19e (0xdaf2df4c))
[<c0158a25>] vfs_unlink [kernel] 0x185 (0xdaf2df68))
[<c0158c99>] sys_unlink [kernel] 0x119 (0xdaf2df84))
[<c010945f>] system_call [kernel] 0x33 (0xdaf2dfc0))

uml_mconsole  D C03AD240     0 31696      1         31980  9035 (NOTLB)
Call Trace: [<c0107e32>] __down [kernel] 0x82 (0xdb289e78))
[<c0107fcc>] __down_failed [kernel] 0x8 (0xdb289e9c))
[<c015a906>] .text.lock.namei [kernel] 0x35 (0xdb289eac))
[<c0156b39>] link_path_walk [kernel] 0x459 (0xdb289ecc))
[<c01571d9>] path_lookup [kernel] 0x39 (0xdb289f0c))
[<c0157529>] __user_walk [kernel] 0x49 (0xdb289f1c))
[<c0152f2f>] vfs_stat [kernel] 0x1f (0xdb289f38))
[<c013561f>] do_munmap [kernel] 0x2cf (0xdb289f58))
[<c01535ab>] sys_stat64 [kernel] 0x1b (0xdb289f70))
[<c012adbe>] update_process_times [kernel] 0x3e (0xdb289f8c))
[<c011b6f0>] do_page_fault [kernel] 0x0 (0xdb289fb0))
[<c0109550>] error_code [kernel] 0x34 (0xdb289fb8))
[<c010945f>] system_call [kernel] 0x33 (0xdb289fc0))

while poking around at this, I noticed another D uml that has been around for
ages :)

zab      28907  0.0  0.0     0    0 ?        DW   Dec12   0:00 [linux]

linux         D C03AC880     0 28907      1         16544   850 (L-TLB)
Call Trace: [<c0136c49>] ___wait_on_page [kernel] 0x99 (0xdab49dc0))
[<c0136235>] truncate_list_pages [kernel] 0x215 (0xdab49de8))
[<c013ee7e>] lru_cache_del [kernel] 0xe (0xdab49e04))
[<c01362fa>] truncate_inode_pages [kernel] 0x6a (0xdab49e28))
[<c01637ee>] iput [kernel] 0x19e (0xdab49e40))
[<c0160ac4>] dput [kernel] 0xb4 (0xdab49e5c))
[<c014b4e4>] fput [kernel] 0xe4 (0xdab49e70))
[<c014967e>] filp_close [kernel] 0x8e (0xdab49e8c))
[<c01254ac>] close_files [kernel] 0x7c (0xdab49ea8))
[<c0124358>] put_files_struct [kernel] 0x28 (0xdab49ec8))
[<c0124c00>] do_exit [kernel] 0x130 (0xdab49ed8))
[<c012ba03>] sig_exit [kernel] 0xc3 (0xdab49ef4))
[<c012bc14>] dequeue_signal [kernel] 0x64 (0xdab49efc))
[<c010918f>] do_signal [kernel] 0x20f (0xdab49f14))
[<c0160a40>] dput [kernel] 0x30 (0xdab49f44))
[<c01f85e7>] sock_read [kernel] 0xa7 (0xdab49f50))
[<c014a2a9>] sys_read [kernel] 0x109 (0xdab49f94))
[<c0109498>] signal_return [kernel] 0x14 (0xdab49fc0))
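
The two `linux` traces block in the same place, which is easier to see mechanically. The sketch below pulls the symbol names out of traces in the format shown above and lists the frames two traces share. The regular expression is an assumption based only on the lines in this report, not a general parser for kernel traces, and the function names are hypothetical.

```python
import re

# Matches frames such as: [<c01637ee>] iput [kernel] 0x19e (0xdaf2df4c))
FRAME = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[([^\]]+)\]\s+(0x[0-9a-f]+)"
)


def frames(trace_text):
    """Return the symbol names in a trace, innermost frame first."""
    return [m.group(2) for m in FRAME.finditer(trace_text)]


def common_frames(trace_a, trace_b):
    """Return frames of trace_a that also appear in trace_b, in order."""
    in_b = set(frames(trace_b))
    return [name for name in frames(trace_a) if name in in_b]
```

Fed the traces for pids 9035 and 28907, `common_frames` should return the `___wait_on_page`, `truncate_list_pages`, `truncate_inode_pages`, `iput` chain that both tasks are stuck in.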

/proc/$pid/mem doesn't exist for all of these guys:

open("/proc/31696/mem", O_RDONLY|O_LARGEFILE) = 3
read(3, 0x804cc30, 4096)                = -1 ESRCH (No such process)
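
The strace excerpt above shows the open succeeding and the read failing. A small probe along these lines reports which of the two steps fails and with which errno name. This is a sketch: `probe_read` is a hypothetical helper, and on other kernels the errno returned for `/proc/<pid>/mem` may well differ from the ESRCH seen here.

```python
import errno
import os


def probe_read(path, size=4096):
    """Try to open and read `path`; report which step failed, if any.

    Returns ("ok", bytes_read), ("open", ERRNO_NAME) or ("read", ERRNO_NAME).
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return ("open", errno.errorcode.get(e.errno, str(e.errno)))
    try:
        data = os.read(fd, size)
        return ("ok", len(data))
    except OSError as e:
        return ("read", errno.errorcode.get(e.errno, str(e.errno)))
    finally:
        os.close(fd)
```

For the case in the report, `probe_read("/proc/31696/mem")` would be expected to return `("read", "ESRCH")`.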

I'll work around the stuck umls for now; I'll be glad to provide any more
information.  I also won't shed a tear if this is deemed uninteresting due to the
presence of the nvidia drivers.

Version-Release number of selected component (if applicable):
smp-2.4.18-18.8.0.athlon

How reproducible:
Sometimes

Steps to Reproduce:
1.  run uml
2.  spin on 'uml_mconsole version' on the running uml
3.  'killall linux'  and see if the mconsole and tracing thread went 'D'
Comment 1 Michael Lee Yohe 2002-12-27 15:05:09 EST
If you do not use the nVidia kernel module (i.e. a stock Red Hat setup) does
this problem still occur?

If it does not - then you've probably found the culprit (which would be a
duplicate of 78616 or the nVidia binary-only module bug equivalent).  I think
Mike Harris pointed out (or maybe Arjan) that the nVidia drivers are compiled
with a gcc of an old egcs release (2.91?) and will create inherently bad joojoo
when used with the gcc 3.2 compiled kernel.
Comment 2 Brown, Zach 2002-12-27 15:21:30 EST
> If you do not use the nVidia kernel module (i.e. a stock Red Hat setup) does
> this problem still occur?

unless I get a new video card I'm not interested in finding that out.

this bug report isn't driven by a desire to see RH work on fixing the bug.   It's
just a heads-up.  If the problem is easily attributable to the binary nvidia
drivers (compiler mismatches -- _awesome_) then the bug can be closed or marked
a clone of the catch-all-nvidia bug, I imagine.
Comment 3 Alan Cox 2002-12-30 09:19:43 EST
UML sigio is definitely a problem on SMP. The latest and greatest UML seems to
have all that fixed but that's rather newer than RH8 and needs some kernel add-ons
that Linus still wants cleaned up further.

The other stuff has still only been seen in the Nvidia module cases.

If you are playing seriously with UML look at building a 2.4.20 series tree with
the UML patched host kernel - you get a lot lot more performance.


*** This bug has been marked as a duplicate of 78616 ***
Comment 4 Red Hat Bugzilla 2006-02-21 13:50:38 EST
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
