Bug 839529 - Stack gets close to overflowing when running a stress test
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.1
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
QA Contact: Boris Ranto
Docs Contact:
Depends On:
Blocks:
Reported: 2012-07-12 04:44 EDT by Boris Ranto
Modified: 2014-06-18 01:43 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 08:18:13 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

None
Description Boris Ranto 2012-07-12 04:44:31 EDT
Description of problem:
Stack gets close to overflowing while running a stress test.

Version-Release number of selected component (if applicable):
kernel-3.3.0-0.14.el7.x86_64

How reproducible:
Always, provided the test runs long enough.

Steps to Reproduce:
1. Run the xfs/longterm-stress test case on a big LVM device (the size is needed for the test):
TEST_PARAM_TEST_DEV=/dev/mapper/some_lvm_device make
2. Turn on stack tracer
echo 1 > /proc/sys/kernel/stack_tracer_enabled
3. Wait for 1+ days
4. Collect the stack info
cat /sys/kernel/debug/tracing/stack_trace
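
For reference, a minimal sketch that combines steps 2 and 4 and samples the tracer periodically while the test runs (it assumes CONFIG_STACK_TRACER is built in, debugfs is mounted at /sys/kernel/debug, and /tmp/stack-depth.log is just an example path):

#!/bin/bash
# Sketch: enable the ftrace stack tracer and append the current
# worst-case stack usage to a log once an hour while the stress
# test runs elsewhere.
TRACING=/sys/kernel/debug/tracing
LOG=/tmp/stack-depth.log   # example path, adjust as needed

echo 1 > /proc/sys/kernel/stack_tracer_enabled

while sleep 3600; do
    {
        date
        echo "stack_max_size: $(cat $TRACING/stack_max_size)"
        cat "$TRACING/stack_trace"
        echo
    } >> "$LOG"
done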
  
Actual results:
The stack gets close to overflowing.

Expected results:
The stack does not get close to overflowing.

Additional info:

Dave Chinner suggested that this might have already been fixed upstream by the following commit:
$ git log -n 1 f660378
commit f6603783f9f099bf7a83b3f6c689bbbf74f0e96e
Author: Jens Axboe <jaxboe@fusionio.com>
Date:   Fri Apr 15 15:49:07 2011 +0200

    block: only force kblockd unplugging from the schedule() path

    For the explicit unplugging, we'd prefer to kick things off
    immediately and not pay the penalty of the latency to switch
    to kblockd. So let blk_finish_plug() do the run inline, while
    the implicit-on-schedule-out unplug will punt to kblockd.

    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
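
One way to check whether that commit is already present in the kernel tree being tested (a sketch; it assumes a local clone of the relevant kernel git repository with the tested kernel checked out as HEAD):

# Sketch: is the suggested fix already in the tree under test?
git describe --contains f6603783f9f099bf7a83b3f6c689bbbf74f0e96e
git merge-base --is-ancestor f6603783f9f099bf7a83b3f6c689bbbf74f0e96e HEAD \
    && echo "commit is present" || echo "commit is not present"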

The stack_trace output after a few hours:
        Depth    Size   Location    (62 entries)
        -----    ----   --------
  0)     6088      48   __module_address+0x45/0xe0
  1)     6040      32   __module_text_address+0x16/0x60
  2)     6008      32   is_module_text_address+0x27/0x40
  3)     5976      32   __kernel_text_address+0x58/0x80
  4)     5944     112   print_context_stack+0x8f/0xf0
  5)     5832     144   dump_trace+0x1af/0x2d0
  6)     5688      32   save_stack_trace+0x2f/0x50
  7)     5656      64   set_track+0x61/0x1b0
  8)     5592      48   alloc_debug_processing+0x83/0x109
  9)     5544     224   __slab_alloc+0x450/0x4fd
 10)     5320      80   kmem_cache_alloc+0x215/0x240
 11)     5240      16   mempool_alloc_slab+0x15/0x20
 12)     5224     128   mempool_alloc+0x68/0x180
 13)     5096      16   scsi_sg_alloc+0x50/0x60
 14)     5080      96   __sg_alloc_table+0x68/0x150
 15)     4984      48   scsi_alloc_sgtable+0x28/0x60
 16)     4936      32   scsi_init_sgtable+0x21/0x70
 17)     4904      80   scsi_init_io+0x3d/0x150
 18)     4824      32   scsi_setup_fs_cmnd+0x63/0xa0
 19)     4792     128   sd_prep_fn+0x15f/0xdf0 [sd_mod]
 20)     4664      80   blk_peek_request+0x128/0x2d0
 21)     4584      96   scsi_request_fn+0x5b/0x470
 22)     4488      16   __blk_run_queue+0x1e/0x20
 23)     4472      48   cfq_insert_request+0x3ea/0x600
 24)     4424      48   __elv_add_request+0x1d1/0x350
 25)     4376     112   blk_flush_plug_list+0x17b/0x230
 26)     4264      16   schedule+0x3a/0x60
 27)     4248     208   schedule_timeout+0x31c/0x4d0
 28)     4040     128   wait_for_common+0x12a/0x180
 29)     3912      16   wait_for_completion+0x1d/0x20
 30)     3896      48   xfs_buf_iowait+0x2b/0x240 [xfs]
 31)     3848      32   _xfs_buf_read+0x58/0x70 [xfs]
 32)     3816      64   xfs_buf_read+0x114/0x1e0 [xfs]
 33)     3752      80   xfs_trans_read_buf+0x397/0x7d0 [xfs]
 34)     3672      96   xfs_btree_read_buf_block.constprop.23+0x5b/0xc0 [xfs]
 35)     3576      96   xfs_btree_lookup_get_block+0x7c/0xf0 [xfs]
 36)     3480     144   xfs_btree_lookup+0xbf/0x4b0 [xfs]
 37)     3336      16   xfs_alloc_lookup_eq+0x1b/0x20 [xfs]
 38)     3320     112   xfs_alloc_fixup_trees+0x26b/0x360 [xfs]
 39)     3208     192   xfs_alloc_ag_vextent_near+0x6e1/0xe00 [xfs]
 40)     3016      32   xfs_alloc_ag_vextent+0xd5/0x120 [xfs]
 41)     2984     128   xfs_alloc_vextent+0x608/0xa40 [xfs]
 42)     2856     224   xfs_bmap_btalloc+0x287/0x720 [xfs]
 43)     2632      16   xfs_bmap_alloc+0xe/0x10 [xfs]
 44)     2616      80   xfs_bmapi_allocate+0xce/0x2c0 [xfs]
 45)     2536     320   xfs_bmapi_write+0x4aa/0x770 [xfs]
 46)     2216     192   xfs_iomap_write_allocate+0x139/0x360 [xfs]
 47)     2024     112   xfs_map_blocks+0x31a/0x410 [xfs]
 48)     1912     192   xfs_vm_writepage+0x192/0x5f0 [xfs]
 49)     1720      32   __writepage+0x1a/0x50
 50)     1688     352   write_cache_pages+0x270/0x630
 51)     1336      96   generic_writepages+0x4d/0x70
 52)     1240      48   xfs_vm_writepages+0x4d/0x60 [xfs]
 53)     1192      16   do_writepages+0x22/0x50
 54)     1176      96   writeback_single_inode+0x103/0x4d0
 55)     1080     176   writeback_sb_inodes+0x1d8/0x2d0
 56)      904      80   __writeback_inodes_wb+0x9f/0xd0
 57)      824     160   wb_writeback+0x323/0x4f0
 58)      664     192   wb_do_writeback+0x1c7/0x2f0
 59)      472     112   bdi_writeback_thread+0x93/0x480
 60)      360     176   kthread+0xb7/0xc0
 61)      184     184   kernel_thread_helper+0x4/0x10

The kernel messages reporting stack usage:
[98638.040772] xfs_fsr used greatest stack depth: 1344 bytes left
[105079.107764] xfs_fsr used greatest stack depth: 1312 bytes left
[114012.916144] flush-253:4 used greatest stack depth: 1192 bytes left
[120585.050453] do_IRQ: 8.38 No irq handler for vector (irq -1)
[122721.646227] xfs_fsr used greatest stack depth: 640 bytes left
[142696.386288] flush-253:4 used greatest stack depth: 360 bytes left
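
These messages come from the kernel's own stack-usage check on task exit (CONFIG_DEBUG_STACK_USAGE); a simple way to collect them during the run, as a sketch:

# Sketch: pull the stack-depth warnings out of the kernel log.
dmesg | grep 'used greatest stack depth'
# Or watch for new ones as they appear (requires a dmesg with --follow):
dmesg --follow | grep --line-buffered 'used greatest stack depth'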
Comment 2 Eric Sandeen 2012-11-14 12:17:30 EST
so 

[142696.386288] flush-253:4 used greatest stack depth: 360 bytes left

looks the worst, with 360 bytes left.  But I think about 300-400 bytes of the usage is due to the stack tracing itself, from alloc_debug_processing() on down:

      /* Only entered in the debug case */
      if (kmem_cache_debug(s) && !alloc_debug_processing(s, page, freelist, addr))
              goto new_slab;  /* Slab failed checks. Next slab needed */
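
A rough way to add up how much of the reported depth belongs to that debug/tracing path, working from a saved copy of the stack_trace output above (a sketch; stack_trace.txt is a hypothetical file name, and the cut-off frame follows the estimate above):

# Sketch: sum the Size column for every frame from the deepest one
# down to (and including) alloc_debug_processing.
awk '$1 ~ /\)$/ { sum += $3; if ($4 ~ /^alloc_debug_processing/) { print sum " bytes"; exit } }' stack_trace.txt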


FWIW, the commit referenced above is from 2011 and is already in the RHEL7 kernels.  And it looks more like a culprit than a fix, no?

-Eric
Comment 3 Dave Chinner 2012-11-14 18:50:00 EST
(In reply to comment #2)
> FWIW, the commit referenced above is 2011 and is already in the RHEL7
> kernels.  And, it looks more like a culprit than a fix, no?

Well, when I pointed out that commit, I didn't know what kernel was being tested.
IIRC, the commit referenced went into the upstream kernels in 2.6.39, and it prevents schedule() from running the queue directly (i.e. it does it via kblockd).
I don't think the above code path is actually possible in 3.3...

I also pointed out that the major problem was fixed in 3.4 by the XFS allocation work queues, and then I proceeded to ignore it because I haven't been seeing problems on current upstream kernels....

-Dave.
Comment 4 Boris Ranto 2013-02-27 07:08:40 EST
I've retested with a recent RHEL 7 build (kernel-3.7.0-0.34.el7) on an xfs-formatted LVM device. With the stack tracer turned on, I could actually hit a general protection fault. (The call trace did not say much, though; maybe the stack actually overflowed. longterm is the name of the binary that produces the stress on the fs.)

[37149.000414] general protection fault: 0000 [#1] SMP 
[37149.005434] Modules linked in: lockd nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sg kvm_amd kvm ibmpex ibmaem ipmi_msghandler ses amd64_edac_mod k10temp edac_mce_amd enclosure edac_core shpchp serio_raw pcspkr i2c_piix4 microcode xfs dm_service_time sr_mod radeon cdrom ixgbe sd_mod ata_generic crc_t10dif pata_acpi i2c_algo_bit drm_kms_helper dca mdio lpfc ttm pata_serverworks scsi_transport_fc ptp e1000e drm aacraid bnx2 libata scsi_tgt pps_core i2c_core dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod
[37149.075988] CPU 8 
[37149.077841] Pid: 3353, comm: longterm Not tainted 3.7.0-0.34.el7.x86_64 #1 IBM IBM System x3755 -[7163AC1]-/System Planar
[37149.088977] RIP: 0010:[<ffffffff810bfd68>]  [<ffffffff810bfd68>] module_put+0x28/0x90
[37149.096830] RSP: 0018:ffff881ff35f7e88  EFLAGS: 00010286
[37149.102129] RAX: 180f00008000c7f7 RBX: 0000000000000100 RCX: 0000000000000009
[37149.109250] RDX: 0000000000000009 RSI: 0000000000000001 RDI: ffffffff81180ef5
[37149.116377] RBP: ffff881ff35f7ea8 R08: ffff8817ffc14a00 R09: 0000000000000000
[37149.123497] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff81180ef5
[37149.130618] R13: ffff881ff1dbcbf0 R14: 0000000000000000 R15: ffff881ff1dbcbf0
[37149.137741] FS:  00007f627d3d3740(0000) GS:ffff8817ffc00000(0000) knlGS:0000000000000000
[37149.145811] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[37149.151553] CR2: 00007f626cb61002 CR3: 00000000018c3000 CR4: 00000000000007e0
[37149.158673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37149.165792] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[37149.172914] Process longterm (pid: 3353, threadinfo ffff881ff35f6000, task ffff881ff1dbcbf0)
[37149.181335] Stack:
[37149.183352]  0000000000000100 0000000000000100 0000000000000d19 ffff881ff1dbcbf0
[37149.190834]  ffff881ff35f7f38 ffffffff81065ee9 ffffffff815fbe40 00007f627cd8e800
[37149.198318]  00000001815fbe40 00000000000000e7 000000000000003c ffff881ff1dbd138
[37149.205802] Call Trace:
[37149.208262]  [<ffffffff81065ee9>] do_exit+0x1d9/0x8d0
[37149.213314]  [<ffffffff815fbe40>] ? ftrace_call+0x5/0x2f
[37149.218626]  [<ffffffff8106690f>] do_group_exit+0x3f/0xa0
[37149.224024]  [<ffffffff81066987>] sys_exit_group+0x17/0x20
[37149.229512]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[37149.235515] Code: 1f 40 00 e8 8b c0 53 00 55 48 89 e5 48 83 ec 20 48 85 ff 4c 89 65 f0 48 89 5d e8 49 89 fc 4c 89 6d f8 74 1c 48 8b 87 30 02 00 00 <65> 48 ff 40 08 4c 8b 6d 08 66 66 66 66 90 41 83 3c 24 02 74 3d 
[37149.255931] RIP  [<ffffffff810bfd68>] module_put+0x28/0x90
[37149.261435]  RSP <ffff881ff35f7e88>
[37149.280843] ---[ end trace 83f7658b7b2b4354 ]---
[37149.285483] Fixing recursive fault but reboot is needed!


Afterwards, I tried to collect the information from the stack tracer and I got:
[root@ibm-x3755-01 longterm-stress]# cat /sys/kernel/debug/tracing/stack_trace
        Depth    Size   Location    (66 entries)
        -----    ----   --------
  0)     7456      16   mem_cgroup_bad_page_check+0x21/0x30
  1)     7440      96   prep_new_page+0x83/0x240
  2)     7344     256   get_page_from_freelist+0x37c/0x730
  3)     7088     272   __alloc_pages_nodemask+0x18f/0x980
  4)     6816      80   alloc_pages_current+0xb0/0x120
  5)     6736      64   new_slab+0x265/0x310
  6)     6672     256   __slab_alloc+0x358/0x525
  7)     6416      80   kmem_cache_alloc+0xff/0x130
  8)     6336      48   scsi_pool_alloc_command+0x33/0x80
  9)     6288      48   scsi_host_alloc_command+0x28/0x90
 10)     6240      64   __scsi_get_command+0x29/0xf0
 11)     6176      64   scsi_get_command+0x45/0xc0
 12)     6112      32   scsi_get_cmd_from_req+0x5a/0x70
 13)     6080      48   scsi_setup_blk_pc_cmnd+0x4b/0x120
 14)     6032     112   sd_prep_fn+0xcf/0xbc0 [sd_mod]
 15)     5920      48   blk_peek_request+0x113/0x220
 16)     5872     112   scsi_request_fn+0x4f/0x520
 17)     5760      48   blk_insert_cloned_request+0xd1/0x100
 18)     5712      32   dm_dispatch_request+0x3e/0x70 [dm_mod]
 19)     5680      96   dm_request_fn+0x13d/0x240 [dm_mod]
 20)     5584      64   queue_unplugged+0x58/0xe0
 21)     5520     112   blk_flush_plug_list+0x161/0x210
 22)     5408      32   blk_finish_plug+0x18/0x50
 23)     5376     176   _xfs_buf_ioapply+0x29a/0x320 [xfs]
 24)     5200      48   xfs_buf_iorequest+0x4a/0xa0 [xfs]
 25)     5152      32   xlog_bdstrat+0x23/0x60 [xfs]
 26)     5120      80   xlog_sync+0x226/0x390 [xfs]
 27)     5040      48   xlog_state_release_iclog+0x98/0xd0 [xfs]
 28)     4992     208   xlog_write+0x55d/0x720 [xfs]
 29)     4784     224   xlog_cil_push+0x29e/0x3c0 [xfs]
 30)     4560     128   xlog_cil_force_lsn+0xf7/0x160 [xfs]
 31)     4432     128   _xfs_log_force+0x69/0x2a0 [xfs]
 32)     4304      48   xfs_log_force+0x2a/0x90 [xfs]
 33)     4256      48   xfs_buf_trylock+0xb3/0xc0 [xfs]
 34)     4208      80   _xfs_buf_find+0x12b/0x250 [xfs]
 35)     4128      80   xfs_buf_get_map+0x35/0x180 [xfs]
 36)     4048      80   xfs_trans_get_buf_map+0x101/0x170 [xfs]
 37)     3968      64   xfs_btree_get_buf_block.constprop.24+0x5a/0x90 [xfs]
 38)     3904     208   xfs_btree_split+0x12b/0x6c0 [xfs]
 39)     3696      96   xfs_btree_make_block_unfull+0xde/0x1d0 [xfs]
 40)     3600     208   xfs_btree_insrec+0x396/0x590 [xfs]
 41)     3392     128   xfs_btree_insert+0x6b/0x190 [xfs]
 42)     3264     176   xfs_free_ag_extent+0x384/0x740 [xfs]
 43)     3088     224   xfs_alloc_fix_freelist+0x176/0x4e0 [xfs]
 44)     2864      96   xfs_alloc_vextent+0x1ab/0x600 [xfs]
 45)     2768     224   xfs_bmap_btalloc+0x28e/0x6f0 [xfs]
 46)     2544      16   xfs_bmap_alloc+0xe/0x10 [xfs]
 47)     2528      64   __xfs_bmapi_allocate+0xe8/0x300 [xfs]
 48)     2464      64   xfs_bmapi_allocate+0xa5/0xb0 [xfs]
 49)     2400     384   xfs_bmapi_write+0x4c1/0x780 [xfs]
 50)     2016     224   xfs_da_grow_inode_int+0xbb/0x340 [xfs]
 51)     1792      64   xfs_da_grow_inode+0x4d/0xb0 [xfs]
 52)     1728     128   xfs_dir2_leafn_split+0x40/0x370 [xfs]
 53)     1600     112   xfs_da_split+0x13e/0x370 [xfs]
 54)     1488      48   xfs_dir2_node_addname+0x126/0x130 [xfs]
 55)     1440     224   xfs_dir_createname+0x195/0x1a0 [xfs]
 56)     1216     208   xfs_create+0x4d8/0x5f0 [xfs]
 57)     1008     112   xfs_vn_mknod+0xa6/0x1b0 [xfs]
 58)      896      16   xfs_vn_create+0x13/0x20 [xfs]
 59)      880      64   vfs_create+0xb5/0x120
 60)      816     224   do_last+0x987/0xe20
 61)      592     160   path_openat+0xbc/0x4d0
 62)      432     192   do_filp_open+0x41/0xa0
 63)      240      96   do_sys_open+0xf4/0x1e0
 64)      144      16   sys_open+0x21/0x30
 65)      128     128   system_call_fastpath+0x16/0x1b
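
To see which functions consume the most stack in a trace like this, the per-frame Size column can be sorted (a sketch; stack_trace.txt is a hypothetical saved copy of the output above):

# Sketch: rank frames by their individual stack footprint.
awk '$1 ~ /\)$/ { print $3, $4 }' stack_trace.txt | sort -rn | head -15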
Comment 7 Ludek Smid 2014-06-13 08:18:13 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
