Bug 235164

Summary:

[PATCH] System hang caused by endless loop in create_buffers()

Product:

Red Hat Enterprise Linux 3

Reporter:

Ryutaro Hayashi <ryutaro.hayashi>

Component:

Assignee:

Nalin Dahyabhai <nalin>

Status:

CLOSED WONTFIX

QA Contact:

Brian Brock <bbrock>

Severity:

high

Docs Contact:

Priority:

medium

Version:

3.8

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Last Closed:

2007-10-19 18:37:38 UTC

Attachments:

Description	Flags
The patch file for slab_usable_pages()	none

Description Ryutaro Hayashi 2007-04-04 05:41:26 UTC

Description of problem:

Machine: SunFire X4200
CPU    : Dual-Core AMD Opteron(tm) Processor 2220 SE X 2
OS     : RHEL3U8
Memory : 32GB

Our customer ran into a system hang caused by an endless loop in create_buffers().
Fortunately, the system has an NMI switch, so we were able to capture a crash dump
after the hang and analyze it.

The issue occurs when the system is short of memory. kswapd tries to free
pages via do_try_to_free_pages_kswapd(), which calls rebalance_dirty_zone().
However, if no free pages are available for the buffer_heads needed to write
back dirty pages, the system enters an endless loop in create_buffers().

The flow is shown below:

------------------------------------------------------------

  kswapd()
  --> do_try_to_free_pages_kswapd()
      --> rebalance_dirty_zone()
          max_loop = zone->inactive_dirty_pages;
          if (max_loop < BATCH_WORK_AMOUNT)
          	max_loop = BATCH_WORK_AMOUNT;
          lru_lock(zone);
          while (max_loop-- && !list_empty(&zone->inactive_dirty_list))
          {
            --> launder_page()
                ...
                  --> create_buffers()
                      ...
                      try_again:                       <-----------+
                        bh = get_unused_buffer_head(async);        |
                        if (!bh)                                   |
                        	goto no_grow;          --------+   |
                        ...                                    |   |
                      no_grow:                         <-------+   |
                        free_more_memory()                         |
                        goto try_again;                ------------+
          }
          lru_unlock(zone);


------------------------------------------------------------
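The retry loop in the flow above can be modelled in user space. The following is a minimal sketch, not kernel code: struct sim_state and every sim_* function are hypothetical stand-ins, and the retry limit exists only so that the model terminates (the flow above shows no such limit in create_buffers()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; not the RHEL3 definitions. */
struct sim_state {
    long free_normal_pages;    /* pages available for new buffer_heads */
    long unused_buffer_heads;  /* models nr_unused_buffer_heads */
    long reclaim_per_call;     /* pages one free_more_memory() call recovers */
};

/* Models get_unused_buffer_head(): succeeds only if a head or a page exists. */
static int sim_get_unused_buffer_head(struct sim_state *s)
{
    if (s->unused_buffer_heads > 0) {
        s->unused_buffer_heads--;
        return 1;
    }
    if (s->free_normal_pages > 0) {
        s->free_normal_pages--;
        return 1;
    }
    return 0;
}

/* Models free_more_memory(): recovers reclaim_per_call pages, possibly zero. */
static void sim_free_more_memory(struct sim_state *s)
{
    s->free_normal_pages += s->reclaim_per_call;
}

/*
 * Models the try_again/no_grow loop. Returns the number of retries needed,
 * or -1 once max_retries is exhausted (a limit added only for this model).
 */
static long sim_create_buffers(struct sim_state *s, long max_retries)
{
    long retries = 0;

    assert(max_retries > 0);
    while (!sim_get_unused_buffer_head(s)) {
        if (retries == max_retries)
            return -1;
        sim_free_more_memory(s);
        retries++;
    }
    return retries;
}
```

With reclaim_per_call set to zero, which corresponds to kswapd itself being stuck in the loop, sim_create_buffers() always runs into the retry limit. Any non-zero reclaim lets it finish.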

This issue was largely addressed in launder_page() in RHEL3U6;
I am not sure of the Bug ID. After that fix, launder_page() does not
call swap_writepage() if there are no unused buffer_heads or free pages,
so the issue appeared to be resolved.

RHEL3U5:
          if ((gfp_mask & __GFP_FS) && writepage) {
                  ClearPageDirty(page);
                  SetPageLaunder(page);
                  lru_unlock(zone);

                  writepage(page);

RHEL3U6 or later:

           if ((gfp_mask & __GFP_FS) && writepage &&
                           (page->buffers || slab_usable_pages(zone))) {
                                             ^^^^^^^^^^^^^^^^^^^^^^^ check free pages for buffer_head
                   ClearPageDirty(page);
                   SetPageLaunder(page);
                   lru_unlock(zone);

                   writepage(page);
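
The RHEL3U6 condition above can be read as a small predicate. This is a sketch for illustration only: sim_should_writepage() and SIM_GFP_FS are hypothetical names, and the flag value is not the kernel's __GFP_FS.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_GFP_FS 0x1u  /* hypothetical flag bit, not the kernel's __GFP_FS value */

/*
 * Returns true when the modelled launder_page() would call writepage().
 * page_buffers models page->buffers; slab_usable models the result of
 * slab_usable_pages(zone).
 */
static bool sim_should_writepage(unsigned int gfp_mask, bool has_writepage,
                                 const void *page_buffers, bool slab_usable)
{
    return (gfp_mask & SIM_GFP_FS) && has_writepage &&
           (page_buffers != NULL || slab_usable);
}
```

For a page without buffers, the decision rests entirely on slab_usable_pages(), so a wrong answer from that function leads straight to writepage().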


However, the issue still occurs because slab_usable_pages() checks the wrong
zone for free pages.


ROOT CAUSE
==========

slab_usable_pages() checks free_pages of the DMA zone, but it should check
the free pages of the Normal zone. The slow path uses the macro ZONE_NORMAL (1)
as the index into node_zonelists[], and node_zonelists[1] is the "DMA" zone.
The index must be zero to point to the Normal zone, since node_zonelists[0]
is the Normal zone.

As a result, launder_page() believes there are free pages because the
DMA zone has free pages, which cannot be used for buffer_heads. It then
starts to swap out even though the Normal zone has no free page for a
buffer_head, which results in the infinite loop in create_buffers() and
the system hang.

If slab_usable_pages() checked the free pages of the Normal zone correctly,
swap_writepage() would not be called when there is no free page, and kswapd
would continue to free pages. The system hang would not happen in this condition.

Here is the code that checks free_pages of the DMA zone by mistake.

mm/vmscan.c(RHEL3U8):

static int slab_usable_pages(zone_t * inzone)
{
        pg_data_t *pgdat;
        zonelist_t *zonelist;
        zone_t **zone;

        /* fast path to prevent looking at other zones */
#if defined(CONFIG_IA64) || !defined(CONFIG_HIGHMEM)
        if (inzone->free_pages)
                return 1;
#else
        if (inzone->zone_pgdat->node_zones[ZONE_NORMAL].free_pages)
                return 1;
#endif
        if (inzone - inzone->zone_pgdat->node_zones <= ZONE_NORMAL &&
            inzone->free_pages)
                return 1;

        /* slow path */
        for_each_pgdat(pgdat) {
                zonelist = pgdat->node_zonelists +
#if defined(CONFIG_IA64)
                        ZONE_HIGHMEM;
#else
                        ZONE_NORMAL;  /* <--- This is wrong. Should be zero. */
#endif
                zone = zonelist->zones;
                if (*zone) {
                        for (;;) {
                                zone_t *z = *(zone++);
                                if (!z)
                                        break;
                                if (z->free_pages)
                                        return 1;
                        }
                }
        }
        return 0;
}

include/linux/mmzone.h:

#define ZONE_DMA                0
#define ZONE_NORMAL             1
#define ZONE_HIGHMEM            2
#define MAX_NR_ZONES            3
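
The effect of the index can be illustrated with a simplified user-space model of the slow path. This sketch is not the attached patch and not kernel code. All sim_* names are hypothetical, and the zonelist layout is an assumption taken from this report (list 0 resolves to the Normal zone, list 1 to the DMA zone); real zonelists may contain additional fallback zones.

```c
#include <stddef.h>

#define ZONE_DMA     0
#define ZONE_NORMAL  1
#define ZONE_HIGHMEM 2
#define MAX_NR_ZONES 3

/* Simplified stand-ins for zone_t, zonelist_t and pg_data_t. */
struct sim_zone {
    unsigned long free_pages;
};

struct sim_zonelist {
    struct sim_zone *zones[MAX_NR_ZONES + 1];  /* NULL-terminated */
};

struct sim_pgdat {
    struct sim_zone node_zones[MAX_NR_ZONES];
    struct sim_zonelist node_zonelists[2];
};

/* ASSUMPTION (from the report): list 0 resolves to Normal, list 1 to DMA. */
static void sim_build_zonelists(struct sim_pgdat *p)
{
    p->node_zonelists[0].zones[0] = &p->node_zones[ZONE_NORMAL];
    p->node_zonelists[0].zones[1] = NULL;
    p->node_zonelists[1].zones[0] = &p->node_zones[ZONE_DMA];
    p->node_zonelists[1].zones[1] = NULL;
}

/*
 * Models the slow path of slab_usable_pages(), with the zonelist index
 * passed in so both variants can be compared:
 *   list_index == ZONE_NORMAL (1) is the RHEL3U8 behaviour,
 *   list_index == 0 is the index the report says is correct.
 */
static int sim_slow_path(struct sim_pgdat *nodes, size_t nr_nodes,
                         int list_index)
{
    for (size_t n = 0; n < nr_nodes; n++) {
        struct sim_zone **zone = nodes[n].node_zonelists[list_index].zones;

        for (; *zone; zone++) {
            if ((*zone)->free_pages)
                return 1;
        }
    }
    return 0;
}
```

When only a DMA zone has free pages, this model returns 1 for index ZONE_NORMAL and 0 for index zero, which is the difference the report describes.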



Finally, here is the result of the dump analysis.


crash> bt
PID: 11     TASK: 10037f34000       CPU: 1   COMMAND: "kswapd"
 #0 [1081fffed90] disk_dump at ffffffffa00a6ee4
 #1 [1081fffee28] .text.lock.sched at ffffffff80122283
 #2 [1081fffee80] try_crashdump at ffffffff80124c9f
 #3 [1081fffee90] die_nmi at ffffffff80112605
 #4 [1081fffeeb0] default_do_nmi at ffffffff80112705
 #5 [1081fffef50] nmi at ffffffff80110f44    <------------- NMI switch pressed.
    [exception RIP: .text.lock.sched+23]
    RIP: ffffffff80122283  RSP: 0000010037f35c98  RFLAGS: 00000086
    RAX: 0000000000000001  RBX: 0000010037f30000  RCX: 0000000000002580
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000010037f30000
    RBP: 0000010037f35cc8   R8: 000000000000000b   R9: 000001000002b8b0
    R10: 0000000000000000  R11: 000001000002ac80  R12: ffffffff805f1800
    R13: 0000000000000000  R14: 0000010037f35c98  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <exception stack> ---
 #6 [10037f35c98] .text.lock.sched at ffffffff80122283
 #7 [10037f35cd0] __wake_up at ffffffff801206ef
 #8 [10037f35d20] free_more_memory at ffffffff801627bd
 #9 [10037f35d30] create_buffers at ffffffff8016329b    <------- loop in create_buffers
#10 [10037f35d60] create_empty_buffers at ffffffff80163478
#11 [10037f35d80] brw_page at ffffffff801650ad
#12 [10037f35dd0] rw_swap_page_base at ffffffff80155d67
#13 [10037f35e50] rw_swap_page at ffffffff80155dba
#14 [10037f35e60] swap_writepage at ffffffff801579a4
#15 [10037f35e70] launder_page at ffffffff80150a70
#16 [10037f35eb0] rebalance_dirty_zone at ffffffff801540c0
#17 [10037f35ef0] do_try_to_free_pages_kswapd at ffffffff80154834
#18 [10037f35f20] kswapd at ffffffff80154e52
#19 [10037f35f50] kernel_thread at ffffffff80110d11

### page->buffers is NULL (this page does not have a buffer_head)
crash> struct page.buffers 10012efc1e8
  buffers = 0x0

### The number of unused buffer_heads is zero.
crash> rd nr_unused_buffer_heads
ffffffff805210b8:  0000000000000000                    ........

### There are no free pages except in the DMA zone.
crash> kmem -f
NODE
  0
ZONE  NAME        SIZE    FREE      MEM_MAP       START_PADDR  START_MAPNR
  0   DMA         4096    2459    10001000040          0          161320  
                          ^^^^
ZONE  NAME        SIZE    FREE      MEM_MAP       START_PADDR  START_MAPNR
  1   Normal    4321279       0    10001068040       1000000       165416  

ZONE  NAME        SIZE    FREE      MEM_MAP       START_PADDR  START_MAPNR
  2   HighMem        0       0         0               0            0     

--------------------------------------------------------------------------

NODE
  1
ZONE  NAME        SIZE    FREE      MEM_MAP       START_PADDR  START_MAPNR
  0   DMA            0       0         0               0            0     

ZONE  NAME        SIZE    FREE      MEM_MAP       START_PADDR  START_MAPNR
  1   Normal    4194303       0    10420083030      420000000    170358430 

ZONE  NAME        SIZE    FREE      MEM_MAP       START_PADDR  START_MAPNR
  2   HighMem        0       0         0               0            0     

nr_free_pages: 2459  (verified)

Since the DMA zone has free pages, as shown above, launder_page() calls
swap_writepage(), and this causes the system hang.


Version-Release number of selected component (if applicable):


How reproducible:

The customer has a test program that can reproduce this issue, but we cannot
obtain it for license reasons.

Steps to Reproduce:
1. Run the memory test program and wait one or two days.
  
Actual results:


Expected results:


Additional info:

The patch for this issue is attached.

Comment 1 Ryutaro Hayashi 2007-04-04 05:41:26 UTC

Created attachment 151643
The patch file for slab_usable_pages()

Comment 2 RHEL Program Management 2007-10-19 18:37:38 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet that criteria, it is now being closed.
 
For more information of the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.