Bug 127615
Summary: | 2.6.7: cfq io scheduler paniced? | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Kaj J. Niemi <kajtzu> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | rawhide | CC: | wtogami |
Hardware: | All | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2004-07-29 09:21:06 UTC |
Description
Kaj J. Niemi
2004-07-10 23:54:51 UTC
Created attachment 101783 [details]
one of the two pdflushes vanished
There are two pdflush pseudo-processes running, one of which died according to
the panic message.
Created attachment 101784 [details]
spectacular load spike
The load started going up at the same time.
Created attachment 101785 [details]
meanwhile interface traffic dropped but did not stop completely
Created attachment 101786 [details]
rapid rise of tcp sessions established and in close wait
TCP sessions in TIME_WAIT are not shown, as they would throw off the graph.
Created attachment 101787 [details]
just the time wait sessions
TCP sessions in TIME_WAIT dropped to zero at the same time as the CLOSE_WAIT
sessions flatlined (about 00:20 local time in the graph).
Happened again on another, otherwise identical system. Unfortunately the console was dead, so there was nothing on it. The following got logged to syslog, though:

kernel: Debug: sleeping function called from invalid context at mm/mempool.c:197
kernel: in_atomic():0[expected: 0], irqs_disabled():1
kernel: [<0211f978>] __might_sleep+0x7d/0x87
kernel: [<0213f8f3>] mempool_alloc+0x6a/0x198
kernel: [<021441e6>] poison_obj+0x1d/0x3d
kernel: [<0211ff27>] autoremove_wake_function+0x0/0x2d
kernel: [<02145982>] cache_alloc_debugcheck_after+0xcf/0x103
kernel: [<0211ff27>] autoremove_wake_function+0x0/0x2d
kernel: [<0213f904>] mempool_alloc+0x7b/0x198
kernel: [<02220c34>] __cfq_get_queue+0x53/0x98
kernel: [<02220cc8>] cfq_get_queue+0x4f/0x86
kernel: [<02220f95>] cfq_set_request+0x20/0x63
kernel: [<02220f75>] cfq_set_request+0x0/0x63
kernel: [<02218107>] elv_set_request+0xa/0x17
kernel: [<02219c82>] get_request+0x18b/0x2b0
kernel: [<02219e24>] get_request_wait+0x7d/0xb9
kernel: [<0211ff27>] autoremove_wake_function+0x0/0x2d

Seems to be reproducible, though rather unreliably, just by installing a new kernel with "rpm -ivh kernel-*.rpm" with the 476, 478 and 481 kernels.

In order to be more certain and isolate the problem, can you try booting with the anticipatory or deadline elevator instead and see if it survives?

Would that be elevator=anticipatory and/or elevator=deadline?

Hmmm, I think anticipatory is actually "elevator=as".

Booting with elevator=deadline has kept the server up for 2+ days under simulated load (about 600 Java threads, net-snmp full table walks against it).

I installed 492, booted with elevator=deadline and will see what happens.

Btw, all the panicking systems are Supermicro 6013P-T systems. A lot of companies also OEM these and sell them as their own.

With elevator=deadline the uptimes are now around 12+ days.

If this issue is not resolved with the latest rawhide kernels, you can help by bringing this report to the attention of upstream lkml and the CFQ author Jens Axboe.
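
For anyone repeating the elevator= test, it may help to confirm which elevator the running kernel was actually booted with, since a misspelled value such as elevator=anticipatory would not select the anticipatory scheduler. The following is a minimal sketch, not part of the original report: it reads the boot command line from /proc/cmdline and extracts the elevator= value. The function names and the KNOWN_ELEVATORS set are hypothetical helpers for illustration; the set lists the values 2.6-era kernels are understood to accept, and returning the last occurrence when the option is repeated is an assumption.

```python
# Values that 2.6-era kernels are understood to accept for elevator=
# (assumption for illustration; check the kernel's own documentation).
KNOWN_ELEVATORS = {"as", "deadline", "cfq", "noop"}


def elevator_from_cmdline(cmdline):
    """Return the value of the elevator= option in a kernel command line.

    Returns None if the option is absent. If it appears more than once,
    the last occurrence is returned (an assumption, see lead-in).
    """
    value = None
    for token in cmdline.split():
        if token.startswith("elevator="):
            value = token.split("=", 1)[1]
    return value


def read_boot_elevator(path="/proc/cmdline"):
    """Read the boot command line of the running kernel and extract elevator=."""
    with open(path) as f:
        return elevator_from_cmdline(f.read())


def is_known_elevator(value):
    """True if the value is one of the names listed in KNOWN_ELEVATORS."""
    return value in KNOWN_ELEVATORS
```

On a box booted as described above, read_boot_elevator() would be expected to return "deadline"; a result of None means no elevator= option was passed and the kernel's default scheduler is in use.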