From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021018

Description of problem:
mgalgoci asked that I bugzilla this, but it may not be useful as this machine is using the nvidia kernel modules (no, really, for opengl functionality). The machine is a dual athlon 1800 running smp athlon 2.4.18-18.8.0 (with the nvidia poo relinked with their source rpm shim).

Linux kiyoko 2.4.18-18.8.0smp #1 SMP Wed Nov 13 22:22:40 EST 2002 i686 athlon i386 GNU/Linux

I make heavy use of UML in day-to-day development work. UML still exhibits the mysterious hang bug (missing sigio signals?) which can be hacked around by having a script in the background which runs 'uml_mconsole version' on the umls regularly. This seems to stop them from being stuck.

Sometimes killall or 'halt' in the uml console will push the uml_mconsole and kernel tracing thread of uml into D state:

zab 31696 0.0 0.0 1756 444 pts/9 D 10:16 0:00 uml_mconsole uml0 version
zab 9035 0.0 8.1 115292 83972 ? D Dec26 0:00 /var/zab/l/linux-2.4.18-18.8.0-l5/linux (uml0) [(kernel thread)]

linux D C03AD240 0 9035 1 31696 25904 (NOTLB)
Call Trace:
[<c0136c49>] ___wait_on_page [kernel] 0x99 (0xdaf2decc))
[<c0136235>] truncate_list_pages [kernel] 0x215 (0xdaf2def4))
[<c01362ed>] truncate_inode_pages [kernel] 0x5d (0xdaf2df34))
[<c01637ee>] iput [kernel] 0x19e (0xdaf2df4c))
[<c0158a25>] vfs_unlink [kernel] 0x185 (0xdaf2df68))
[<c0158c99>] sys_unlink [kernel] 0x119 (0xdaf2df84))
[<c010945f>] system_call [kernel] 0x33 (0xdaf2dfc0))

uml_mconsole D C03AD240 0 31696 1 31980 9035 (NOTLB)
Call Trace:
[<c0107e32>] __down [kernel] 0x82 (0xdb289e78))
[<c0107fcc>] __down_failed [kernel] 0x8 (0xdb289e9c))
[<c015a906>] .text.lock.namei [kernel] 0x35 (0xdb289eac))
[<c0156b39>] link_path_walk [kernel] 0x459 (0xdb289ecc))
[<c01571d9>] path_lookup [kernel] 0x39 (0xdb289f0c))
[<c0157529>] __user_walk [kernel] 0x49 (0xdb289f1c))
[<c0152f2f>] vfs_stat [kernel] 0x1f (0xdb289f38))
[<c013561f>] do_munmap [kernel] 0x2cf (0xdb289f58))
[<c01535ab>] sys_stat64 [kernel] 0x1b (0xdb289f70))
[<c012adbe>] update_process_times [kernel] 0x3e (0xdb289f8c))
[<c011b6f0>] do_page_fault [kernel] 0x0 (0xdb289fb0))
[<c0109550>] error_code [kernel] 0x34 (0xdb289fb8))
[<c010945f>] system_call [kernel] 0x33 (0xdb289fc0))

While poking around at this, I noticed another D uml that has been around for ages :)

zab 28907 0.0 0.0 0 0 ? DW Dec12 0:00 [linux]

linux D C03AC880 0 28907 1 16544 850 (L-TLB)
Call Trace:
[<c0136c49>] ___wait_on_page [kernel] 0x99 (0xdab49dc0))
[<c0136235>] truncate_list_pages [kernel] 0x215 (0xdab49de8))
[<c013ee7e>] lru_cache_del [kernel] 0xe (0xdab49e04))
[<c01362fa>] truncate_inode_pages [kernel] 0x6a (0xdab49e28))
[<c01637ee>] iput [kernel] 0x19e (0xdab49e40))
[<c0160ac4>] dput [kernel] 0xb4 (0xdab49e5c))
[<c014b4e4>] fput [kernel] 0xe4 (0xdab49e70))
[<c014967e>] filp_close [kernel] 0x8e (0xdab49e8c))
[<c01254ac>] close_files [kernel] 0x7c (0xdab49ea8))
[<c0124358>] put_files_struct [kernel] 0x28 (0xdab49ec8))
[<c0124c00>] do_exit [kernel] 0x130 (0xdab49ed8))
[<c012ba03>] sig_exit [kernel] 0xc3 (0xdab49ef4))
[<c012bc14>] dequeue_signal [kernel] 0x64 (0xdab49efc))
[<c010918f>] do_signal [kernel] 0x20f (0xdab49f14))
[<c0160a40>] dput [kernel] 0x30 (0xdab49f44))
[<c01f85e7>] sock_read [kernel] 0xa7 (0xdab49f50))
[<c014a2a9>] sys_read [kernel] 0x109 (0xdab49f94))
[<c0109498>] signal_return [kernel] 0x14 (0xdab49fc0))

/proc/$pid/mem doesn't exist for all of these guys:

open("/proc/31696/mem", O_RDONLY|O_LARGEFILE) = 3
read(3, 0x804cc30, 4096) = -1 ESRCH (No such process)

I'll work around the stuck umls for now; I'll be glad to provide any more information. I also won't shed a tear if this is deemed uninteresting due to the presence of the nvidia drivers.

Version-Release number of selected component (if applicable):
smp-2.4.18-18.8.0.athlon

How reproducible:
Sometimes

Steps to Reproduce:
1. run uml
2. spin on 'uml_mconsole version' on the running uml
3. 'killall linux' and see if the mconsole and tracing thread went 'D'
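
For anyone wanting to reproduce the workaround described above, the background script that runs 'uml_mconsole version' on the umls regularly might look roughly like the sketch below. This is not the reporter's actual script. The function names (ping_umls, keepalive), the 10 second interval and the 5 second timeout are assumptions made for illustration; only the command form 'uml_mconsole <umid> version' comes from the ps output in the report. The sketch deliberately does not wait again after a timeout, since a uml_mconsole stuck in D state cannot be reaped and a second wait would block.

```python
import subprocess
import time


def ping_umls(umids, spawn=subprocess.Popen, timeout=5.0):
    """Run 'uml_mconsole <umid> version' once per instance.

    Returns a dict mapping each umid to True if the command exited 0
    within `timeout` seconds, and False otherwise.
    """
    results = {}
    for umid in umids:
        try:
            proc = spawn(["uml_mconsole", umid, "version"],
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
        except OSError:
            # uml_mconsole is not installed or could not be started.
            results[umid] = False
            continue
        try:
            results[umid] = (proc.wait(timeout=timeout) == 0)
        except subprocess.TimeoutExpired:
            # Best effort only: a process in D state ignores SIGKILL,
            # so we do not wait for it a second time.
            proc.kill()
            results[umid] = False
    return results


def keepalive(umids, interval=10.0, rounds=None,
              sleep=time.sleep, spawn=subprocess.Popen):
    """Ping every uml each `interval` seconds; loop forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        for umid, ok in ping_umls(umids, spawn=spawn).items():
            if not ok:
                print(f"{umid}: no reply from uml_mconsole")
        done += 1
        sleep(interval)
```

Calling keepalive(["uml0"]) would poke a single instance named uml0 until interrupted. The spawn and sleep parameters exist only so the loop can be exercised without a running uml.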
If you do not use the nVidia kernel module (i.e. a stock Red Hat setup), does this problem still occur? If it does not, then you've probably found the culprit (which would be a duplicate of 78616 or the nVidia binary-only module bug equivalent).

I think Mike Harris pointed out (or maybe Arjan) that the nVidia drivers are compiled with an old egcs release of gcc (2.91?) and will create inherently bad joojoo when used with the gcc 3.2 compiled kernel.
> If you do not use the nVidia kernel module (i.e. a stock Red Hat setup) does
> this problem still occur?

Unless I get a new video card I'm not interested in finding that out. This bug report isn't driven by a desire to see RH work on fixing the bug. It's just a heads-up.

If the problem is easily attributable to the binary nvidia drivers (compiler mismatches -- _awesome_) then the bug can be closed or marked a clone of the catch-all-nvidia bug, I imagine.
UML sigio is definitely a problem on SMP. The latest and greatest UML seems to have all that fixed, but that's rather newer than RH8 and needs some kernel addons Linus still wants cleaning up further. The other stuff has still only been seen in the Nvidia module cases.

If you are playing seriously with UML, look at building a 2.4.20 series tree with the UML patched host kernel - you get a lot, lot more performance.

*** This bug has been marked as a duplicate of 78616 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.