Bug 1480200 - oom & vmcore seen on gluster-block setup
Summary: oom & vmcore seen on gluster-block setup
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tcmu-runner
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Prasanna Kumar Kalever
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-08-10 11:51 UTC by krishnaram Karthick
Modified: 2017-10-31 15:05 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-31 15:05:50 UTC
Embargoed:



Description krishnaram Karthick 2017-08-10 11:51:46 UTC
Description of problem:

While investigating https://bugzilla.redhat.com/show_bug.cgi?id=1476285, a vmcore was found on the node.

At the moment it is not clear what led to this issue. I'll try to see whether it is reproducible, but it would be great if dev could analyze the vmcore to understand what caused this crash.
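(For reference, a minimal sketch of how the vmcore could be opened for analysis, assuming kdump captured it under /var/crash and the matching kernel-debuginfo for 3.10.0-693.el7 is available; the dump directory name below is illustrative, not the actual path.)

# install debug symbols matching the crashed kernel
debuginfo-install kernel-3.10.0-693.el7.x86_64

# open the dump with the crash utility (dump directory name is illustrative)
crash /usr/lib/debug/lib/modules/3.10.0-693.el7.x86_64/vmlinux /var/crash/<host>-2017-08-10-<time>/vmcore

# inside crash: panic backtrace, kernel ring buffer, task list
crash> bt
crash> log
crash> ps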

[66195.286379] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[68164.451859] node invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=985
[68164.451865] node cpuset=docker-f2079fc686a8b230b46ea4d5013152773f8041bd2d077821bdf29b0cf772bc40.scope mems_allowed=0
[68164.451867] CPU: 26 PID: 60417 Comm: node Not tainted 3.10.0-693.el7.x86_64 #1
[68164.451869] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[68164.451870]  ffff880a0b2e0000 0000000054b18afa ffff880bacf47cb8 ffffffff816a3d91
[68164.451873]  ffff880bacf47d48 ffffffff8169f186 ffff8809cde32080 0000000000000001
[68164.451875]  0000000000000000 0000000000000000 ffff880bacf47cf8 0000000000000046
[68164.451877] Call Trace:
[68164.451887]  [<ffffffff816a3d91>] dump_stack+0x19/0x1b
[68164.451889]  [<ffffffff8169f186>] dump_header+0x90/0x229
[68164.451895]  [<ffffffff81185ee6>] ? find_lock_task_mm+0x56/0xc0
[68164.451900]  [<ffffffff811f1578>] ? try_get_mem_cgroup_from_mm+0x28/0x60
[68164.451902]  [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
[68164.451908]  [<ffffffff812b7e0c>] ? selinux_capable+0x1c/0x40
[68164.451909]  [<ffffffff811f5296>] mem_cgroup_oom_synchronize+0x546/0x570
[68164.451911]  [<ffffffff811f4710>] ? mem_cgroup_charge_common+0xc0/0xc0
[68164.451913]  [<ffffffff81186c24>] pagefault_out_of_memory+0x14/0x90
[68164.451917]  [<ffffffff8169d54e>] mm_fault_error+0x68/0x12b
[68164.451921]  [<ffffffff816b01f1>] __do_page_fault+0x391/0x450
[68164.451924]  [<ffffffff816b02e5>] do_page_fault+0x35/0x90
[68164.451926]  [<ffffffff816ac508>] page_fault+0x28/0x30
[68164.451929] Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc8938bd2_7b8d_11e7_8464_005056a56b97.slice/docker-f2079fc686a8b230b46ea4d5013152773f8041bd2d077821bdf29b0cf772bc40.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc8938bd2_7b8d_11e7_8464_005056a56b97.slice/docker-f2079fc686a8b230b46ea4d5013152773f8041bd2d077821bdf29b0cf772bc40.scope
[68164.451932] memory: usage 753664kB, limit 753664kB, failcnt 127033726
[68164.451933] memory+swap: usage 753664kB, limit 1507328kB, failcnt 0
[68164.451934] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[68164.451935] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc8938bd2_7b8d_11e7_8464_005056a56b97.slice/docker-f2079fc686a8b230b46ea4d5013152773f8041bd2d077821bdf29b0cf772bc40.scope: cache:2868KB rss:750796KB rss_huge:28672KB mapped_file:196KB swap:0KB inactive_anon:0KB active_anon:750796KB inactive_file:1404KB active_file:1156KB unevictable:0KB
[68164.451994] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[68164.452305] [60392] 1000080000 60392     2907       59      11        0           985 run.sh
[68164.452308] [60417] 1000080000 60417   487163   187348     808        0           985 node
[68164.452332] [15016] 1000080000 15016     2906       56      11        0           985 readiness.sh
[68164.452335] [15022] 1000080000 15022    37594      229      42        0           985 curl
[68164.452337] Memory cgroup out of memory: Kill process 60570 (node) score 1981 or sacrifice child
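(Note: this first event is a memory-cgroup OOM rather than a system-wide one; mem_cgroup_oom_synchronize() is in the trace, and usage 753664kB has hit the cgroup limit of 753664kB (736MiB) with failcnt 127033726. A hedged sketch of how the limit and usage could be confirmed on the node, using the cgroup v1 memory controller paths of the 3.10 kernel; the scope name is taken from the log above.)

CG=/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc8938bd2_7b8d_11e7_8464_005056a56b97.slice/docker-f2079fc686a8b230b46ea4d5013152773f8041bd2d077821bdf29b0cf772bc40.scope
cat $CG/memory.limit_in_bytes   # hard limit; 753664kB == 771751936 bytes
cat $CG/memory.usage_in_bytes   # current usage
cat $CG/memory.failcnt          # times the limit was hit (127033726 in the log)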
=================
[69455.539136] Modules linked in: fuse dm_round_robin iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_pscsi target_core_file target_core_iblock scsi_transport_iscsi dm_multipath target_core_user target_core_mod uio xt_nat xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat ip_tables xt_statistic veth xt_recent vport_vxlan vxlan ip6_udp_tunnel udp_tunnel xt_comment xt_mark openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 nf_nat_ipv4 xt_addrtype nf_nat br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio vmw_vsock_vmci_transport vsock ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack sb_edac edac_core coretemp iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd
[69455.539163]  ppdev vmw_balloon pcspkr joydev sg parport_pc parport vmw_vmci i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod cdrom ata_generic pata_acpi vmwgfx sd_mod crc_t10dif crct10dif_generic drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci drm ata_piix crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw vmxnet3 vmw_pvscsi i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: xt_conntrack]
[69455.539181] CPU: 27 PID: 45638 Comm: exe Tainted: G    B          ------------   3.10.0-693.el7.x86_64 #1
[69455.539182] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[69455.539183]  ffffea00265ff840 0000000031f8a47a ffff880257fffba0 ffffffff816a3d91
[69455.539185]  ffff880257fffbc8 ffffffff8169f40f 0000000000001000 0000000000004da5
[69455.539186]  ffffea00265ff840 ffff880257fffcd8 ffffffff8118c808 ffffea002662a700
[69455.539188] Call Trace:
[69455.539191]  [<ffffffff816a3d91>] dump_stack+0x19/0x1b
[69455.539192]  [<ffffffff8169f40f>] bad_page.part.75+0xdc/0xf9
[69455.539195]  [<ffffffff8118c808>] get_page_from_freelist+0x868/0x9e0
[69455.539197]  [<ffffffff8118caf6>] __alloc_pages_nodemask+0x176/0x420
[69455.539198]  [<ffffffff8118caf6>] ? __alloc_pages_nodemask+0x176/0x420
[69455.539200]  [<ffffffff811d1108>] alloc_pages_current+0x98/0x110
[69455.539202]  [<ffffffff8106c7d7>] pte_alloc_one+0x17/0x40
[69455.539204]  [<ffffffff811aeaf3>] __pte_alloc+0x23/0x170
[69455.539206]  [<ffffffff811b2697>] handle_mm_fault+0xe17/0xfa0
[69455.539208]  [<ffffffff811b8c6e>] ? do_mmap_pgoff+0x31e/0x3e0
[69455.539210]  [<ffffffff816affb4>] __do_page_fault+0x154/0x450
[69455.539213]  [<ffffffff816b02e5>] do_page_fault+0x35/0x90
[69455.539215]  [<ffffffff816ac508>] page_fault+0x28/0x30
[69455.539216] BUG: Bad page state in process exe  pfn:6c3be4
[69455.539982] page:ffffea001b0ef900 count:-1 mapcount:0 mapping:          (null) index:0x0
[69455.540761] page flags: 0x2fffff00000000()
[69455.541544] page dumped because: nonzero _count
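(The second excerpt is a bad-page-state report: a page came off the free list with a nonzero _count (count:-1). The 'B' in "Tainted: G B" is TAINT_BAD_PAGE, bit 5 of the kernel taint mask. A quick, hedged check on the affected node:)

cat /proc/sys/kernel/tainted   # a value with bit 5 set (e.g. 32) confirms TAINT_BAD_PAGE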


Version-Release number of selected component (if applicable):
rpm -qa | grep 'gluster'
glusterfs-fuse-3.8.4-35.el7rhgs.x86_64
glusterfs-server-3.8.4-35.el7rhgs.x86_64
gluster-block-0.2.1-6.el7rhgs.x86_64
glusterfs-libs-3.8.4-35.el7rhgs.x86_64
glusterfs-3.8.4-35.el7rhgs.x86_64
glusterfs-api-3.8.4-35.el7rhgs.x86_64
glusterfs-cli-3.8.4-35.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-35.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-35.el7rhgs.x86_64
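(Since the component is tcmu-runner, the versions of the iSCSI/LIO block stack on the node would also be useful; a hedged example of collecting them, with package names assumed to match the RHGS 3.3 / RHEL 7 block stack.)

rpm -qa | grep -E 'tcmu-runner|libtcmu|targetcli|python-rtslib|gluster-block'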

How reproducible:
Yet to determine

Steps to Reproduce:
1.
2.
3.

Actual results:
A kernel crash (vmcore) was seen. This takes down any application pods running on the node.

Expected results:
No crashes should be seen

Additional info:
The core file will be attached.

Comment 8 Prasanna Kumar Kalever 2017-10-31 15:05:50 UTC
We did not hit this again, hence closing this for now. Please feel free to reopen if you hit this in the future.

