Bug 985950
Summary: multipath generates Call Trace reporting out of memory when trying to add 1024 LUNs (2 paths each) on i386 system

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 6 |
| Component | device-mapper-multipath |
| Version | 6.5 |
| Hardware | i386 |
| OS | Unspecified |
| Status | CLOSED DUPLICATE |
| Severity | medium |
| Priority | medium |
| Reporter | Bruno Goncalves <bgoncalv> |
| Assignee | Ben Marzinski <bmarzins> |
| QA Contact | Bruno Goncalves <bgoncalv> |
| CC | acathrow, agk, bmarzins, dwysocha, heinzm, msnitzer, prajnoha, prockai, zkabelac |
| Target Milestone | rc |
| Target Release | --- |
| Doc Type | Bug Fix |
| Story Points | --- |
| Type | Bug |
| Regression | --- |
| Category | --- |
| Last Closed | 2014-06-16 14:37:45 UTC |
> Do you know if this is a regression?

It is not a regression; the same issue is reproducible on RHEL-6.4 (2.6.32-358.el6.i686, device-mapper-multipath-0.4.9-64.el6.i686). Still reproducible on 2.6.32-431.el6.i686:

```
Call Trace:
 [<c085cc75>] ? panic+0x6e/0x122
 [<c04ed0da>] ? out_of_memory+0x1ba/0x1e0
 [<c04f6d1f>] ? drain_pages+0x2f/0x70
 [<c04f773d>] ? __alloc_pages_nodemask+0x7fd/0x810
 [<c04f7764>] ? __get_free_pages+0x14/0x30
 [<c0459335>] ? copy_process+0xe5/0x10c0
 [<c05b447c>] ? security_file_alloc+0xc/0x10
 [<c053816d>] ? get_empty_filp+0xfd/0x1a0
 [<c045a396>] ? do_fork+0x86/0x3e0
 [<c0540179>] ? do_pipe_flags+0xb9/0x120
 [<c060ff23>] ? copy_to_user+0x33/0x110
 [<c04081e3>] ? sys_clone+0x33/0x40
 [<c04099bf>] ? sysenter_do_call+0x12/0x28
```

Would it be possible for me to get on this system and poke around?

Actually, this is beginning to look a lot like bug 708916. It certainly isn't multipathd causing this: it never cracked 25M of memory usage, and the system I'm on has 12 GB. Moreover, there's loads of HighMem left when the OOM killer starts.

Here's when the oom-killer starts:

```
DMA: 19*4kB 1*8kB 0*16kB 3*32kB 4*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3508kB
Normal: 3734*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 14944kB
HighMem: 2254*4kB 1487*8kB 1398*16kB 873*32kB 461*64kB 271*128kB 191*256kB 146*512kB 95*1024kB 44*2048kB 2610*4096kB = 11137008kB
```

Here's what the system looks like when it's just booted up:

```
DMA: 2*4kB 3*8kB 3*16kB 4*32kB 4*64kB 2*128kB 3*256kB 0*512kB 2*1024kB 2*2048kB 0*4096kB = 7632kB
Normal: 469*4kB 206*8kB 67*16kB 15*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 3*2048kB 128*4096kB = 539348kB
HighMem: 203*4kB 174*8kB 84*16kB 40*32kB 29*64kB 5*128kB 12*256kB 9*512kB 7*1024kB 5*2048kB 2809*4096kB = 11538076kB
```

So, when the OOM-killer triggers, roughly half of the no-load DMA memory is left, 96% of the no-load HighMem, and only 2% of the no-load Normal memory.
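The per-zone totals in those dumps are just the sum of block count times block size over the eleven buddy-allocator orders (4kB through 4096kB blocks). A small sketch reproducing the arithmetic for the Normal zone, with the counts copied from the dumps above:

```python
# Recompute the free-memory totals from the buddy-allocator dumps.
# Each "N*SkB" field means N free blocks of S kilobytes; order k
# blocks are (4 << k) kB, for k = 0..10.

def zone_total_kb(counts, base_kb=4):
    """Sum of count * block size over buddy orders 0..10."""
    return sum(n * (base_kb << order) for order, n in enumerate(counts))

# Normal-zone counts at OOM time vs. just after boot (from the report).
oom_normal  = [3734, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
boot_normal = [469, 206, 67, 15, 2, 1, 2, 2, 2, 3, 128]

print(zone_total_kb(oom_normal))   # 14944 kB, matching the dump
print(zone_total_kb(boot_normal))  # 539348 kB, matching the dump
```

The ratio 14944/539348 is about 2.8%, which is the "only 2% of the no-load Normal memory" observation above.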
Looking at where most of that memory got used, we see:

```
Normal free:14968kB min:3724kB low:4652kB high:5584kB active_anon:0kB inactive_anon:0kB active_file:48kB inactive_file:84kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:883912kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:23996kB slab_unreclaimable:528504kB kernel_stack:40080kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
```

So of the total 883912kB of present Normal memory, 528504kB is in unreclaimable slabs. The reason we are hitting this with 12GB of RAM is that this is an i686 system, so the kernel doesn't get to address all of that memory directly.

I've taken multipathd out of the equation: I disabled multipathing and started up 1024 iSCSI devices. After doing this, the amount of Normal memory on the system was 214400KB, and the top 5 slab users were:

```
dentry           159588  159819 132  29 1 : tunables 120 60 8 : slabdata  5511  5511 0
sysfs_dir_cache  329941  330400  76  50 1 : tunables 120 60 8 : slabdata  6608  6608 0
inode_cache      109458  109549 352  11 1 : tunables  54 27 8 : slabdata  9959  9959 0
size-32         1331044 1333287  32 113 1 : tunables 120 60 8 : slabdata 11799 11799 0
size-512         148929  153160 512   8 1 : tunables  54 27 8 : slabdata 19145 19145 0
```

Then I used dmsetup to create a multipath device on each of these iscsi devices and eventually got the OOM-killer.
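Ranking caches by footprint like this can be done mechanically from /proc/slabinfo: memory per cache is approximately num_slabs × pagesperslab × page size. A sketch, assuming the slabinfo 2.1 field layout used by this kernel (name, active_objs, num_objs, objsize, objperslab, pagesperslab, then a slabdata triple ending the line); the sample lines are copied from the comment above:

```python
# Rank slab caches by approximate memory footprint, parsed from
# /proc/slabinfo-style lines (slabinfo version 2.1 layout assumed).
PAGE_KB = 4

sample = """\
dentry 159588 159819 132 29 1 : tunables 120 60 8 : slabdata 5511 5511 0
sysfs_dir_cache 329941 330400 76 50 1 : tunables 120 60 8 : slabdata 6608 6608 0
inode_cache 109458 109549 352 11 1 : tunables 54 27 8 : slabdata 9959 9959 0
size-32 1331044 1333287 32 113 1 : tunables 120 60 8 : slabdata 11799 11799 0
size-512 148929 153160 512 8 1 : tunables 54 27 8 : slabdata 19145 19145 0
""".splitlines()

def slab_kb(line):
    """Approximate kB held by one cache: num_slabs * pages_per_slab * 4."""
    f = line.split()
    pages_per_slab = int(f[5])
    num_slabs = int(f[-2])  # middle field of the trailing slabdata triple
    return num_slabs * pages_per_slab * PAGE_KB

for line in sorted(sample, key=slab_kb, reverse=True):
    print(f"{line.split()[0]:16s} {slab_kb(line):>7d} kB")
```

On a live system, `slabtop` gives the same kind of ranking interactively; here, size-512 tops the list at 76580 kB.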
Shortly before that, the Normal memory was down to 17292KB and the top 5 slab users were:

```
dm_rq_target_io   66592   66592  248  16 1 : tunables 120 60 8 : slabdata  4162  4162 0
sysfs_dir_cache  350600  350600   76  50 1 : tunables 120 60 8 : slabdata  7012  7012 0
size-32         1340961 1340971   32 113 1 : tunables 120 60 8 : slabdata 11867 11867 0
size-512         150009  153136  512   8 1 : tunables  54 27 8 : slabdata 19142 19142 0
biovec-256        66610   66610 3072   2 2 : tunables  24 12 8 : slabdata 33305 33305 0
```

For reference, here's the biovec-256 data from before I started creating multipath devices:

```
biovec-256 50 50 3072 2 2 : tunables 24 12 8 : slabdata 25 25 0
```

So, the increase in biovec-256 accounts for 266240KB of the memory difference between when there were just iSCSI devices and when the OOM-killer started, while creating multipath devices.

*** This bug has been marked as a duplicate of bug 708916 ***
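That 266240KB figure follows directly from the slabdata above: biovec-256 uses 2 pages (8 kB) per slab, and the slab count grew from 25 to 33305. Checking the arithmetic:

```python
# biovec-256 growth between "iSCSI only" and "OOM while creating
# multipath devices", from the slabinfo lines quoted above.
PAGES_PER_SLAB = 2   # pagesperslab field for biovec-256
PAGE_KB = 4

slabs_before = 25     # iSCSI devices only
slabs_after = 33305   # after creating the multipath devices

growth_kb = (slabs_after - slabs_before) * PAGES_PER_SLAB * PAGE_KB
print(growth_kb)  # 266240
```

That is roughly 260MB of low-zone memory pinned by biovecs alone, which together with the other unreclaimable slabs is enough to exhaust the 883912kB Normal zone available on this i686 box.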
Created attachment 775344 [details]: console.log

Description of problem:
When trying to log in via iSCSI to 1024 LUNs, a Call Trace is generated by the system when running on i386; it does not occur on x86_64.

```
mpath-iosched.s invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=-17, oom_score_adj=-1000
mpath-iosched.s cpuset=/ mems_allowed=0
Pid: 6365, comm: mpath-iosched.s Not tainted 2.6.32-358.el6.i686 #1
Call Trace:
 [<c04e7e64>] ? dump_header+0x84/0x190
 [<c04e8208>] ? oom_kill_process+0x68/0x280
 [<c04e80e7>] ? oom_badness+0x37/0xf0
 [<c04e8788>] ? out_of_memory+0xc8/0x1e0
 [<c04f50ed>] ? __alloc_pages_nodemask+0x7fd/0x810
 [<c04f5114>] ? __get_free_pages+0x14/0x30
 [<c04566b3>] ? copy_process+0xb3/0x10c0
 [<c05ac2fc>] ? security_file_alloc+0xc/0x10
 [<c053286d>] ? get_empty_filp+0xfd/0x1a0
 [<c045773a>] ? do_fork+0x7a/0x3c0
 [<c053a429>] ? do_pipe_flags+0xb9/0x120
 [<c046cf20>] ? sigprocmask+0x60/0x100
 [<c0607433>] ? copy_to_user+0x33/0x110
 [<c04081e3>] ? sys_clone+0x33/0x40
 [<c04099bf>] ? sysenter_do_call+0x12/0x28
```

Version-Release number of selected component (if applicable):

```
rpm -q device-mapper-multipath
device-mapper-multipath-0.4.9-68.el6.i686
2.6.32-381.el6.i686
```

How reproducible: 100%

Steps to Reproduce:
1. Configure iscsid to support 1024 LUNs:

```
service iscsid stop
modprobe iscsi_tcp max_lun=1024
echo 1024 > /sys/module/scsi_mod/parameters/max_report_luns
```

2. Connect to a NetApp target that has 1024 LUNs mapped to it:

```
iscsiadm -m discovery -I default -t st -p <target>
iscsiadm -m node -l
```

3. Wait for the devices to be created.
4. The Call Trace will appear on the console while waiting for the devices to be created.

Note: the system has 8GB of memory.