Bug 235164
| Summary: | [PATCH] System hang caused by endless loop in create_buffers() | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Ryutaro Hayashi <ryutaro.hayashi> | ||||
| Component: | mm | Assignee: | Nalin Dahyabhai <nalin> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 3.8 | ||||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | | Doc Type: | Bug Fix | ||||
| Doc Text: | | Story Points: | --- | ||||
| Clone Of: | | Environment: | | ||||
| Last Closed: | 2007-10-19 18:37:38 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | | Category: | --- | ||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | attachment 151643: The patch file for slab_usable_pages() | ||||||
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet these criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.
Description of problem:

Machine: SunFire X4200
CPU    : Dual-Core AMD Opteron(tm) Processor 2220 SE x 2
OS     : RHEL3U8
Memory : 32GB

Our customer ran into a system hang caused by an endless loop in create_buffers(). Fortunately, the system has an NMI switch, so we were able to get a crash dump after the hang, and we analyzed the dump.

The issue happens when the system is short of memory. kswapd tries to free pages through do_try_to_free_pages_kswapd(), which calls rebalance_dirty_zone(). If there are no free pages from which to allocate the buffer_heads needed to write back dirty pages, the system enters an endless loop in create_buffers(). The flow is shown below:

    ------------------------------------------------------------
    kswapd()
    --> do_try_to_free_pages_kswapd()
        --> rebalance_dirty_zone()
            max_loop = zone->inactive_dirty_pages;
            if (max_loop < BATCH_WORK_AMOUNT)
                max_loop = BATCH_WORK_AMOUNT;
            lru_lock(zone);
            while (max_loop-- && !list_empty(&zone->inactive_dirty_list)) {
                --> launder_page()
                    ...
                    --> create_buffers()
                        ...
                        try_again:  <---------------------------+
                        bh = get_unused_buffer_head(async);     |
                        if (!bh)                                |
                            goto no_grow;  -----------+         |
                        ...                           |         |
                        no_grow:  <-------------------+         |
                        free_more_memory();                     |
                        goto try_again;  -----------------------+
            }
            lru_unlock(zone);
    ------------------------------------------------------------

This issue was basically fixed in launder_page() in RHEL3U6 (I'm not sure of the bug ID). Since that fix, launder_page() does not call writepage(), and therefore swap_writepage(), unless the page already has buffers or slab_usable_pages() reports free pages for new buffer_heads. The issue appeared to be gone.

RHEL3U5:

    if ((gfp_mask & __GFP_FS) && writepage) {
        ClearPageDirty(page);
        SetPageLaunder(page);
        lru_unlock(zone);
        writepage(page);
        ...

RHEL3U6 or later:

    if ((gfp_mask & __GFP_FS) && writepage &&
        (page->buffers || slab_usable_pages(zone))) {
                          ^^^^^^^^^^^^^^^^^^^^^^^ check for free pages for buffer_head
        ClearPageDirty(page);
        SetPageLaunder(page);
        lru_unlock(zone);
        writepage(page);
        ...

However, the issue still happens because slab_usable_pages() checks the wrong zone for free pages.
ROOT CAUSE
==========

slab_usable_pages() checks free_pages of the DMA zone, but it should check free pages of the Normal zone. The slow path selects the zonelist using the macro ZONE_NORMAL (1) as the index into node_zonelists[], and node_zonelists[1] is the "DMA" zone. The index must be zero to point at the Normal zone, since node_zonelists[0] is the Normal zone.

Because of this, launder_page() concludes that free pages exist (the DMA zone has free pages, but they cannot be used for buffer_head) and starts swapping out even though there is no free page for buffer_head in the Normal zone. The result is an infinite loop in create_buffers() and a system hang.

If slab_usable_pages() checked the free pages of the Normal zone correctly, swap_writepage() would never be called when there are no free pages, and kswapd would continue trying to free pages. The system hang would therefore not happen under this condition.

Here is the code that checks free_pages of the DMA zone by mistake (mm/vmscan.c in RHEL3U8, lines 292-330):

    static int slab_usable_pages(zone_t * inzone)
    {
        pg_data_t *pgdat;
        zonelist_t *zonelist;
        zone_t **zone;

        /* fast path to prevent looking at other zones */
    #if defined(CONFIG_IA64) || !defined(CONFIG_HIGHMEM)
        if (inzone->free_pages)
            return 1;
    #else
        if (inzone->zone_pgdat->node_zones[ZONE_NORMAL].free_pages)
            return 1;
    #endif
        if (inzone - inzone->zone_pgdat->node_zones <= ZONE_NORMAL &&
            inzone->free_pages)
            return 1;

        /* slow path */
        for_each_pgdat(pgdat) {
            zonelist = pgdat->node_zonelists +
    #if defined(CONFIG_IA64)
                ZONE_HIGHMEM;
    #else
                ZONE_NORMAL;    <--- This is wrong. Should be zero.
    #endif
            zone = zonelist->zones;
            if (*zone) {
                for (;;) {
                    zone_t *z = *(zone++);
                    if (!z)
                        break;
                    if (z->free_pages)
                        return 1;
                }
            }
        }
        return 0;
    }

include/linux/mmzone.h (lines 136-139):

    #define ZONE_DMA        0
    #define ZONE_NORMAL     1
    #define ZONE_HIGHMEM    2
    #define MAX_NR_ZONES    3

Finally, here is the result of the dump analysis:

    crash> bt
    PID: 11  TASK: 10037f34000  CPU: 1  COMMAND: "kswapd"
     #0 [1081fffed90] disk_dump at ffffffffa00a6ee4
     #1 [1081fffee28] .text.lock.sched at ffffffff80122283
     #2 [1081fffee80] try_crashdump at ffffffff80124c9f
     #3 [1081fffee90] die_nmi at ffffffff80112605
     #4 [1081fffeeb0] default_do_nmi at ffffffff80112705
     #5 [1081fffef50] nmi at ffffffff80110f44  <------------- NMI switch pressed.
        [exception RIP: .text.lock.sched+23]
        RIP: ffffffff80122283  RSP: 0000010037f35c98  RFLAGS: 00000086
        RAX: 0000000000000001  RBX: 0000010037f30000  RCX: 0000000000002580
        RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000010037f30000
        RBP: 0000010037f35cc8   R8: 000000000000000b   R9: 000001000002b8b0
        R10: 0000000000000000  R11: 000001000002ac80  R12: ffffffff805f1800
        R13: 0000000000000000  R14: 0000010037f35c98  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    --- <exception stack> ---
     #6 [10037f35c98] .text.lock.sched at ffffffff80122283
     #7 [10037f35cd0] __wake_up at ffffffff801206ef
     #8 [10037f35d20] free_more_memory at ffffffff801627bd
     #9 [10037f35d30] create_buffers at ffffffff8016329b  <------- loop in create_buffers
    #10 [10037f35d60] create_empty_buffers at ffffffff80163478
    #11 [10037f35d80] brw_page at ffffffff801650ad
    #12 [10037f35dd0] rw_swap_page_base at ffffffff80155d67
    #13 [10037f35e50] rw_swap_page at ffffffff80155dba
    #14 [10037f35e60] swap_writepage at ffffffff801579a4
    #15 [10037f35e70] launder_page at ffffffff80150a70
    #16 [10037f35eb0] rebalance_dirty_zone at ffffffff801540c0
    #17 [10037f35ef0] do_try_to_free_pages_kswapd at ffffffff80154834
    #18 [10037f35f20] kswapd at ffffffff80154e52
    #19 [10037f35f50] kernel_thread at ffffffff80110d11

    ### page->buffers is NULL (this page has no buffer_heads)
    crash> struct page.buffers 10012efc1e8
      buffers = 0x0

    ### The number of unused buffer heads is zero.
    crash> rd nr_unused_buffer_heads
    ffffffff805210b8:  0000000000000000                    ........

    ### There are no free pages except in the DMA zone.
    crash> kmem -f
    NODE
      0
    ZONE  NAME        SIZE   FREE      MEM_MAP  START_PADDR  START_MAPNR
      0   DMA         4096   2459  10001000040            0       161320
                             ^^^^
    ZONE  NAME        SIZE   FREE      MEM_MAP  START_PADDR  START_MAPNR
      1   Normal   4321279      0  10001068040      1000000       165416
    ZONE  NAME        SIZE   FREE      MEM_MAP  START_PADDR  START_MAPNR
      2   HighMem        0      0            0            0            0
    --------------------------------------------------------------------------
    NODE
      1
    ZONE  NAME        SIZE   FREE      MEM_MAP  START_PADDR  START_MAPNR
      0   DMA            0      0            0            0            0
    ZONE  NAME        SIZE   FREE      MEM_MAP  START_PADDR  START_MAPNR
      1   Normal   4194303      0  10420083030    420000000    170358430
    ZONE  NAME        SIZE   FREE      MEM_MAP  START_PADDR  START_MAPNR
      2   HighMem        0      0            0            0            0

    nr_free_pages: 2459  (verified)

As shown above, only the DMA zone has free pages, yet because slab_usable_pages() counts them, launder_page() calls swap_writepage(), which causes the system hang.

Version-Release number of selected component (if applicable):

How reproducible:
The customer has a test program that reproduces this issue, but we cannot obtain it for license reasons.

Steps to Reproduce:
1. Run the memory test program and wait one or two days.

Actual results:

Expected results:

Additional info:
The patch for this issue is attached.