Description of problem:
Any call to ps or top while another process is dumping core will hang in rwsem_down_read_failed.

Jul 30 12:45:09 rgmldap30 kernel: ps D 00000000 0 9539 9538 9540 (NOTLB)
Jul 30 12:45:09 rgmldap30 kernel: Call Trace: [rwsem_down_read_failed+325/368] rwsem_down_read_failed [kernel] 0x145 (0xe3febee8)
Jul 30 12:45:09 rgmldap30 kernel: Call Trace: [<c022d3c5>] rwsem_down_read_failed [kernel] 0x145 (0xe3febee8)
Jul 30 12:45:09 rgmldap30 kernel: [stext_lock+15326/33696] stext_lock [kernel] 0x3bde (0xe3febf14)
Jul 30 12:45:09 rgmldap30 kernel: [<c023cbde>] stext_lock [kernel] 0x3bde (0xe3febf14)
Jul 30 12:45:09 rgmldap30 kernel: [__alloc_pages+15/160] __alloc_pages [kernel] 0xf (0xe3febf40)
Jul 30 12:45:09 rgmldap30 kernel: [<c013ebbf>] __alloc_pages [kernel] 0xf (0xe3febf40)
Jul 30 12:45:09 rgmldap30 kernel: [proc_info_read+76/256] proc_info_read [kernel] 0x4c (0xe3febf58)
Jul 30 12:45:09 rgmldap30 kernel: [<c01692fc>] proc_info_read [kernel] 0x4c (0xe3febf58)
Jul 30 12:45:09 rgmldap30 kernel: [sys_read+150/288] sys_read [kernel] 0x96 (0xe3febf7c)
Jul 30 12:45:09 rgmldap30 kernel: [<c0146c66>] sys_read [kernel] 0x96 (0xe3febf7c)
Jul 30 12:45:09 rgmldap30 kernel: [sys_open+149/224] sys_open [kernel] 0x95 (0xe3febfa4)
Jul 30 12:45:10 rgmldap30 kernel: [<c0146675>] sys_open [kernel] 0x95 (0xe3febfa4)
Jul 30 12:45:10 rgmldap30 kernel: [system_call+51/56] system_call [kernel] 0x33 (0xe3febfc0)
Jul 30 12:45:10 rgmldap30 kernel: [<c01073e3>] system_call [kernel] 0x33 (0xe3febfc0)

How reproducible:
Always

Steps to Reproduce:
0. ulimit -c unlimited
1. run a program which mmap()s a 2 GB file
2. kill -SIGSEGV $!
3. ps auxw

Additional info:
Workaround: set coresize to 0
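For step 1, the report does not include the test program itself. The following is a minimal sketch of what such a program might look like, not the reporter's actual code. The helper names map_fd and map_and_wait are hypothetical, and whether the mapped pages must also be touched to reproduce the hang is not stated in the report, so this sketch only creates the mapping and waits.

```c
#define _FILE_OFFSET_BITS 64   /* allow files >= 2 GB on 32-bit builds */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file behind fd read-only.
 * Returns MAP_FAILED on error, like mmap() itself. */
void *map_fd(int fd, size_t *len_out)
{
    struct stat st;

    assert(len_out != NULL);
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
        return MAP_FAILED;
    *len_out = (size_t)st.st_size;
    return mmap(NULL, *len_out, PROT_READ, MAP_PRIVATE, fd, 0);
}

/* Hypothetical helper: map the file at path, then sleep until a signal
 * arrives. With "ulimit -c unlimited", the default action for SIGSEGV
 * terminates the process and writes a core file. */
int map_and_wait(const char *path)
{
    size_t len = 0;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    p = map_fd(fd, &len);
    if (p == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    printf("pid %d mapped %zu bytes\n", (int)getpid(), len);
    fflush(stdout);
    pause();            /* wait here for kill -SIGSEGV */
    munmap(p, len);
    close(fd);
    return 0;
}
```

To use it, call map_and_wait() from main() with the path of a large file, start the program in the background so that $! holds its pid, and continue with steps 2 and 3.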
Created attachment 103188
coredeadlock patch

Changes read sem to write sem in coredump path. by bert.barbe
From: Arjan van de Ven
Read semaphores are *NOT* recursive. In the 2.4.9 era we used to have a boatload of issues with this semaphore being taken recursively; I thought we had all of them fixed, but either one came back or one missed the as2.1 branch...

From: Stephen Tweedie
And worse, they break sporadically and unpredictably. Unless another thread comes in with a down_write() between the two recursive down_read()s, everything _appears_ to be working fine.
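The failure mode described above can be illustrated with a toy user-space model. This is only a sketch, not the kernel's rwsem code: the toy_* names are hypothetical, and it assumes a semaphore that makes new readers queue behind any waiting writer. Under that assumption, a nested down_read() only blocks when a writer arrives between the two calls, which is why the bug shows up sporadically.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a read/write semaphore in which new readers queue
 * behind a waiting writer. Hypothetical, not the kernel's code. */
struct toy_rwsem {
    int  active_readers;   /* readers currently holding the lock */
    bool writer_waiting;   /* a writer is queued behind those readers */
};

/* Returns true if the read lock is granted, false if the caller would sleep. */
bool toy_down_read(struct toy_rwsem *s)
{
    if (s->writer_waiting)
        return false;      /* must wait behind the queued writer */
    s->active_readers++;
    return true;
}

/* Returns true if the write lock is granted at once; otherwise the
 * writer queues until every active reader has released the lock. */
bool toy_down_write(struct toy_rwsem *s)
{
    if (s->active_readers == 0)
        return true;
    s->writer_waiting = true;
    return false;
}

/* One task takes the read lock twice. Returns true if the inner
 * acquisition would block forever. */
bool recursive_read_deadlocks(bool writer_arrives_between)
{
    struct toy_rwsem s = { 0, false };
    bool outer = toy_down_read(&s);   /* outer down_read(): granted */

    assert(outer);
    if (writer_arrives_between)
        toy_down_write(&s);           /* queues behind the outer reader */

    /* Inner down_read(): if a writer is queued, this task sleeps behind
     * it, while the writer waits for the outer read lock that this same
     * task still holds. Neither side can make progress. */
    return !toy_down_read(&s);
}
```

In this model recursive_read_deadlocks(false) reports no deadlock and recursive_read_deadlocks(true) reports one, matching the observation that everything appears to work until a down_write() lands between the two down_read()s.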
Migrating discussion into bugzilla :)
I've tried the test program, and ps does hang, but it eventually returns once the core file is written. The 'rwsem_down_read_failed' message in and of itself is not a problem; it just means that we didn't immediately acquire the semaphore. Perhaps it is poorly named.
Greg, when you tested, didn't you see an actual deadlock, in other words, the coredump never finishing?
This has been open for more than two months. I'm closing it. If it's really an issue, please re-open.
Sorry, it took forever for these folks to get back to us and install the kernel patch (I think it just happened). In any event, I did not see the same behavior that you did with your testcase, either. Thus far, they are very happy with the hacked-together coredeadlock patch which is attached to this bugzilla...
REOPENED status has been deprecated. ASSIGNED with keyword of Reopened is preferred.