Bug 1490718 - Engine reach out of memory & fails - maybe related to ovsdb-server as its seen last in messages log
Summary: Engine reach out of memory & fails - maybe related to ovsdb-server as its see...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
unspecified
high vote
Target Milestone: ---
: ---
Assignee: bugs@ovirt.org
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-09-12 06:26 UTC by Avihai
Modified: 2017-09-13 15:08 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-13 06:10:18 UTC
oVirt Team: Integration


Attachments (Terms of Use)
engine , vdsm , messages logs (1.14 MB, application/x-gzip)
2017-09-12 06:30 UTC, Avihai
no flags Details

Description Avihai 2017-09-12 06:26:03 UTC
Description of problem:
Running automation tests (qcow2_v3 storage team test)
Engine reach out of memory & fails - looks related to ovsdb-server as it seen to fail last about 17 sec's before the oom-killer occurs.

Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00005|timeval|WARN|Unreasonably long 1344ms poll interval (0ms user, 34ms system)
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00006|timeval|WARN|faults: 52 minor, 29 major
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00007|timeval|WARN|disk: 5280 reads, 0 writes
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00008|timeval|WARN|context switches: 35 voluntary, 1 involuntary
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00009|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=0bc93006:
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00010|coverage|INFO|hmap_pathological          0.0/sec     0.000/sec        0.0000/sec   total: 1
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00011|coverage|INFO|hmap_expand                0.4/sec     0.400/sec        0.3964/sec   total: 67445
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00012|coverage|INFO|lockfile_lock              0.0/sec     0.000/sec        0.0000/sec   total: 1
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00013|coverage|INFO|poll_create_node           2.0/sec     2.000/sec        1.9819/sec   total: 335146
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00014|coverage|INFO|poll_zero_timeout          0.0/sec     0.000/sec        0.0000/sec   total: 2
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00015|coverage|INFO|seq_change                 0.0/sec     0.000/sec        0.0000/sec   total: 3
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00016|coverage|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 3
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00017|coverage|INFO|util_xalloc               18.0/sec    18.000/sec       17.8375/sec   total: 3024390
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00018|coverage|INFO|42 events never hit
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00005|timeval|WARN|Unreasonably long 1245ms poll interval (0ms user, 39ms system)
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00006|timeval|WARN|faults: 68 minor, 20 major
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00007|timeval|WARN|disk: 3496 reads, 0 writes
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00008|timeval|WARN|context switches: 28 voluntary, 2 involuntary
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00009|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=e5632541:
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00010|coverage|INFO|hmap_expand                0.4/sec     0.400/sec        0.3958/sec   total: 67518
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00011|coverage|INFO|lockfile_lock              0.0/sec     0.000/sec        0.0000/sec   total: 1
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00012|coverage|INFO|poll_create_node           2.0/sec     2.000/sec        1.9792/sec   total: 335151
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00013|coverage|INFO|poll_zero_timeout          0.0/sec     0.000/sec        0.0000/sec   total: 2
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00014|coverage|INFO|seq_change                 0.0/sec     0.000/sec        0.0000/sec   total: 3
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00015|coverage|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 3
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00016|coverage|INFO|util_xalloc               17.6/sec    17.600/sec       17.4167/sec   total: 2958603
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00017|coverage|INFO|43 events never hit
Sep 12 08:08:48 storage-ge-04 kernel: java invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Sep 12 08:08:48 storage-ge-04 kernel: java cpuset=/ mems_allowed=0
Sep 12 08:08:48 storage-ge-04 kernel: CPU: 1 PID: 17554 Comm: java Not tainted 3.10.0-693.2.1.el7.x86_64 #1
Sep 12 08:08:48 storage-ge-04 kernel: Hardware name: Red Hat RHEV Hypervisor, BIOS 1.10.2-3.el7 04/01/2014
Sep 12 08:08:48 storage-ge-04 kernel: ffff88010862bf40 0000000077d7a63f ffff880090e6b960 ffffffff816a3db1
Sep 12 08:08:48 storage-ge-04 kernel: ffff880090e6b9f0 ffffffff8169f1a6 ffff880090e6b9f8 ffffffff812b7e6b
Sep 12 08:08:48 storage-ge-04 kernel: 0000000000000001 ffff880090e6b998 048404e300000206 fbfeefff00000000
Sep 12 08:08:48 storage-ge-04 kernel: Call Trace:
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff8169f1a6>] dump_header+0x90/0x229
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff812b7e6b>] ? cred_has_capability+0x6b/0x120
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
:


Version-Release number of selected component (if applicable):
Engine:
ovirt-engine-tools-4.2.0-0.0.master.20170907100709.git14accac.el7.centos.noarch

VDSM:
4.20.3-19.gite470a71


How reproducible:
Happened a few times(3 out of 6 times) so far on 1 stand which is rhel7.4.
each time in a different place in the test plan so there is no constant test that cause this issue.

Steps to Reproduce:
1.Run automation qcow2_v3 test plan , in this reporoduction , after TestCase18215 finishes successfully .

TO clarify in other times this occured it happened in other tests (18338/9) so there is NO clear scenario here.

Automation log just before the crash:
08:06:58 2017-09-12 08:06:58,326 INFO ================================================================================
08:06:58 2017-09-12 08:06:58,326 INFO       Test Setup   1: Create data-center dc_TestCase18215_REST_GLUS_1208065820 with cluster cl_TestCase18215_REST_GLUS_1208065820 and host host_mixed_1
08:06:58 2017-09-12 08:06:58,326 INFO       Test Step   2: Add data-center dc_TestCase18215_REST_GLUS_1208065820
08:06:58 2017-09-12 08:06:58,660 INFO       Test Step   3: Add cluster cl_TestCase18215_REST_GLUS_1208065820
08:06:59 2017-09-12 08:06:59,388 INFO       Test Step   4: Move host host_mixed_1 to cluster cl_TestCase18215_REST_GLUS_1208065820
08:06:59 2017-09-12 08:06:59,390 INFO       Test Setup   5: Switch host host_mixed_1 to different cluster cl_TestCase18215_REST_GLUS_1208065820 with {'activate': True}
08:06:59 2017-09-12 08:06:59,509 INFO       Test Setup   6: Deactivate the host host_mixed_1. with {'expected_status': 'maintenance'}
08:07:10 2017-09-12 08:07:10,122 INFO       Test Setup   7: Update properties of host host_mixed_1 (provided in parameters) with {'cluster': 'cl_TestCase18215_REST_GLUS_1208065820'}
08:07:11 2017-09-12 08:07:11,753 INFO       Test Setup   8: Activate host host_mixed_1 (set status to UP) with {'wait': True}
08:08:22 2017-09-12 08:08:22,517 INFO 003: storage/rhevmtests.storage.storage_qcow2_v3.test_qcow2_v3.TestCase18215_REST_GLUSTERFS.test_upgrade_dc
08:08:22 2017-09-12 08:08:22,517 INFO   API: REST
08:08:22 2017-09-12 08:08:22,517 INFO   STORAGE: GLUSTERFS
08:08:22 2017-09-12 08:08:22,517 INFO       Test Setup   9: Upgrading cluster cl_TestCase18215_REST_GLUS_1208065820 from version 4.0 to version 4.1 
08:08:22 2017-09-12 08:08:22,519 INFO       Test Step  10: Update cluster cl_TestCase18215_REST_GLUS_1208065820 with {'version': '4.1'}
08:08:23 2017-09-12 08:08:22,976 INFO       Test Setup  11: Upgrading data center dc_TestCase18215_REST_GLUS_1208065820 from version 4.0 to version 4.1 
08:08:25 2017-09-12 08:08:25,726 INFO Result: PASSED
08:08:25 .2017-09-12 08:08:26,485 INFO       Test Teardown  12: Data center's dc_TestCase18215_REST_GLUS_1208065820 master domain is upgrade_4_0_to_4_1_TestCase18215_REST_GLUSTERFS0
08:08:37 2017-09-12 08:08:37,181 INFO       Test Teardown  13: Clean data-center dc_TestCase18215_REST_GLUS_1208065820 and remove it
08:08:49 2017-09-12 08:08:49,156 INFO       Test Step  14: Remove data-center dc_TestCase18215_REST_GLUS_1208065820
08:08:49 E2017-09-12 08:08:49,484 INFO ================================================================================
08:08:49 2017-09-12 08:08:49,484 INFO 004: storage/rhevmtests.storage.storage_qcow2_v3.test_qcow2_v3.TestCase18303_REST_ISCSI.test_verify_new_domain_version
08:08:49 2017-09-12 08:08:49,484 INFO   API: REST
08:08:49 2017-09-12 08:08:49,484 INFO   STORAGE: ISCSI
08:08:49 2017-09-12 08:08:49,628 ERROR Result: FAILED


Actual results:
Out of memory occured & engine failed - engine did not restart its still down , you can connect to it & check it out at :
https://storage-ge-04.scl.lab.tlv.redhat.com/ovirt-engine/ 


Expected results:


Additional info:


Automation log before the crash:
08:06:58 2017-09-12 08:06:58,326 INFO ================================================================================
08:06:58 2017-09-12 08:06:58,326 INFO       Test Setup   1: Create data-center dc_TestCase18215_REST_GLUS_1208065820 with cluster cl_TestCase18215_REST_GLUS_1208065820 and host host_mixed_1
08:06:58 2017-09-12 08:06:58,326 INFO       Test Step   2: Add data-center dc_TestCase18215_REST_GLUS_1208065820
08:06:58 2017-09-12 08:06:58,660 INFO       Test Step   3: Add cluster cl_TestCase18215_REST_GLUS_1208065820
08:06:59 2017-09-12 08:06:59,388 INFO       Test Step   4: Move host host_mixed_1 to cluster cl_TestCase18215_REST_GLUS_1208065820
08:06:59 2017-09-12 08:06:59,390 INFO       Test Setup   5: Switch host host_mixed_1 to different cluster cl_TestCase18215_REST_GLUS_1208065820 with {'activate': True}
08:06:59 2017-09-12 08:06:59,509 INFO       Test Setup   6: Deactivate the host host_mixed_1. with {'expected_status': 'maintenance'}
08:07:10 2017-09-12 08:07:10,122 INFO       Test Setup   7: Update properties of host host_mixed_1 (provided in parameters) with {'cluster': 'cl_TestCase18215_REST_GLUS_1208065820'}
08:07:11 2017-09-12 08:07:11,753 INFO       Test Setup   8: Activate host host_mixed_1 (set status to UP) with {'wait': True}
08:08:22 2017-09-12 08:08:22,517 INFO 003: storage/rhevmtests.storage.storage_qcow2_v3.test_qcow2_v3.TestCase18215_REST_GLUSTERFS.test_upgrade_dc
08:08:22 2017-09-12 08:08:22,517 INFO   API: REST
08:08:22 2017-09-12 08:08:22,517 INFO   STORAGE: GLUSTERFS
08:08:22 2017-09-12 08:08:22,517 INFO       Test Setup   9: Upgrading cluster cl_TestCase18215_REST_GLUS_1208065820 from version 4.0 to version 4.1 
08:08:22 2017-09-12 08:08:22,519 INFO       Test Step  10: Update cluster cl_TestCase18215_REST_GLUS_1208065820 with {'version': '4.1'}
08:08:23 2017-09-12 08:08:22,976 INFO       Test Setup  11: Upgrading data center dc_TestCase18215_REST_GLUS_1208065820 from version 4.0 to version 4.1 
08:08:25 2017-09-12 08:08:25,726 INFO Result: PASSED
08:08:25 .2017-09-12 08:08:26,485 INFO       Test Teardown  12: Data center's dc_TestCase18215_REST_GLUS_1208065820 master domain is upgrade_4_0_to_4_1_TestCase18215_REST_GLUSTERFS0
08:08:37 2017-09-12 08:08:37,181 INFO       Test Teardown  13: Clean data-center dc_TestCase18215_REST_GLUS_1208065820 and remove it
08:08:49 2017-09-12 08:08:49,156 INFO       Test Step  14: Remove data-center dc_TestCase18215_REST_GLUS_1208065820
08:08:49 E2017-09-12 08:08:49,484 INFO ================================================================================
08:08:49 2017-09-12 08:08:49,484 INFO 004: storage/rhevmtests.storage.storage_qcow2_v3.test_qcow2_v3.TestCase18303_REST_ISCSI.test_verify_new_domain_version
08:08:49 2017-09-12 08:08:49,484 INFO   API: REST
08:08:49 2017-09-12 08:08:49,484 INFO   STORAGE: ISCSI
08:08:49 2017-09-12 08:08:49,628 ERROR Result: FAILED

Last Error seen in engine log:
2017-09-12 08:07:15,434+03 ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [6804fd55] Can not run fence action on host 'host_mixed_1', no suitable proxy
 host was found.


[root@storage-ge-04 ~]# cat /var/log/messages | grep oom-killer -A 500
Sep 12 08:08:48 storage-ge-04 kernel: java invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Sep 12 08:08:48 storage-ge-04 kernel: java cpuset=/ mems_allowed=0
Sep 12 08:08:48 storage-ge-04 kernel: CPU: 1 PID: 17554 Comm: java Not tainted 3.10.0-693.2.1.el7.x86_64 #1
Sep 12 08:08:48 storage-ge-04 kernel: Hardware name: Red Hat RHEV Hypervisor, BIOS 1.10.2-3.el7 04/01/2014
Sep 12 08:08:48 storage-ge-04 kernel: ffff88010862bf40 0000000077d7a63f ffff880090e6b960 ffffffff816a3db1
Sep 12 08:08:48 storage-ge-04 kernel: ffff880090e6b9f0 ffffffff8169f1a6 ffff880090e6b9f8 ffffffff812b7e6b
Sep 12 08:08:48 storage-ge-04 kernel: 0000000000000001 ffff880090e6b998 048404e300000206 fbfeefff00000000
Sep 12 08:08:48 storage-ge-04 kernel: Call Trace:
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff8169f1a6>] dump_header+0x90/0x229
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff812b7e6b>] ? cred_has_capability+0x6b/0x120
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff812b803c>] ? selinux_capable+0x1c/0x40
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff81186bd6>] out_of_memory+0x4b6/0x4f0
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff8169fcaa>] __alloc_pages_slowpath+0x5d6/0x724
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811d4135>] alloc_pages_vma+0xb5/0x200
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff81184f55>] ? filemap_fault+0x215/0x410
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811c453d>] read_swap_cache_async+0xed/0x160
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811f53a6>] ? mem_cgroup_update_page_stat+0x16/0x50
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811c4658>] swapin_readahead+0xa8/0x110
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811b235b>] handle_mm_fault+0xadb/0xfa0
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816afff4>] __do_page_fault+0x154/0x450
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816b03d6>] trace_do_page_fault+0x56/0x150
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816afa6a>] do_async_page_fault+0x1a/0xd0
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816ac578>] async_page_fault+0x28/0x30
Sep 12 08:08:48 storage-ge-04 kernel: Mem-Info:
Sep 12 08:08:48 storage-ge-04 kernel: active_anon:648247 inactive_anon:247008 isolated_anon:0#012 active_file:82 inactive_file:353 isolated_file:23#012 unevictable:4008 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:7461 slab_unreclaimable:12399#012 mapped:7321 shmem:12332 pagetables:9625 bounce:0#012 free:21588 free_pcp:0 free_cma:0
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 DMA free:15336kB min:276kB low:344kB high:412kB active_anon:116kB inactive_anon:120kB active_file:0kB inactive_file:36kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:60kB kernel_stack:0kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:54 all_unreclaimable? yes
Sep 12 08:08:48 storage-ge-04 kernel: lowmem_reserve[]: 0 2814 3769 3769
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 DMA32 free:54004kB min:50260kB low:62824kB high:75388kB active_anon:2152864kB inactive_anon:547512kB active_file:260kB inactive_file:708kB unevictable:11960kB isolated(anon):0kB isolated(file):84kB present:3129200kB managed:2884216kB mlocked:11960kB dirty:0kB writeback:0kB mapped:24360kB shmem:35620kB slab_reclaimable:19256kB slab_unreclaimable:33544kB kernel_stack:5728kB pagetables:27456kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1461 all_unreclaimable? yes
Sep 12 08:08:48 storage-ge-04 kernel: lowmem_reserve[]: 0 0 954 954
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 Normal free:17012kB min:17044kB low:21304kB high:25564kB active_anon:440008kB inactive_anon:440400kB active_file:68kB inactive_file:668kB unevictable:4072kB isolated(anon):0kB isolated(file):8kB present:1048576kB managed:977588kB mlocked:4072kB dirty:0kB writeback:0kB mapped:4924kB shmem:13708kB slab_reclaimable:10588kB slab_unreclaimable:15992kB kernel_stack:2752kB pagetables:11012kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1046 all_unreclaimable? yes
Sep 12 08:08:48 storage-ge-04 kernel: lowmem_reserve[]: 0 0 0 0
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 DMA: 1*4kB (U) 1*8kB (M) 2*16kB (U) 2*32kB (U) 2*64kB (UM) 2*128kB (UM) 0*256kB 1*512kB (M) 2*1024kB (UM) 0*2048kB 3*4096kB (EM) = 15340kB
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 DMA32: 9*4kB (UE) 419*8kB (UEM) 610*16kB (UEM) 358*32kB (UEM) 178*64kB (UEM) 73*128kB (UEM) 28*256kB (UEM) 3*512kB (EM) 0*1024kB 0*2048kB 0*4096kB = 54044kB
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 Normal: 258*4kB (UEM) 232*8kB (UEM) 150*16kB (UEM) 129*32kB (UEM) 52*64kB (UE) 22*128kB (UEM) 4*256kB (EM) 1*512kB (E) 0*1024kB 0*2048kB 0*4096kB = 17096kB
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 12 08:08:48 storage-ge-04 kernel: 16465 total pagecache pages
Sep 12 08:08:48 storage-ge-04 kernel: 629 pages in swap cache
Sep 12 08:08:48 storage-ge-04 kernel: Swap cache stats: add 2302549, delete 2301920, find 2325594/2492277
Sep 12 08:08:48 storage-ge-04 kernel: Free swap  = 0kB
Sep 12 08:08:48 storage-ge-04 kernel: Total swap = 2097148kB
Sep 12 08:08:48 storage-ge-04 kernel: 1048442 pages RAM
Sep 12 08:08:48 storage-ge-04 kernel: 0 pages HighMem/MovableOnly
Sep 12 08:08:48 storage-ge-04 kernel: 79014 pages reserved
Sep 12 08:08:48 storage-ge-04 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Sep 12 08:08:48 storage-ge-04 kernel: [  453]     0   453     9316      234      20       60             0 systemd-journal
Sep 12 08:08:48 storage-ge-04 kernel: [  477]     0   477    32389      222      31      413             0 lvmetad
Sep 12 08:08:48 storage-ge-04 kernel: [  487]     0   487    11663      230      23      480         -1000 systemd-udevd
Sep 12 08:08:48 storage-ge-04 kernel: [  583]     0   583    58236      277      46      162             0 rsyslogd
Sep 12 08:08:48 storage-ge-04 kernel: [  584]     0   584     6051      209      16       33             0 systemd-logind
Sep 12 08:08:48 storage-ge-04 kernel: [  585]     0   585     6075      200      15      138             0 smartd
Sep 12 08:08:48 storage-ge-04 kernel: [  586]     0   586     3073      125      12     1463             0 haveged
Sep 12 08:08:48 storage-ge-04 kernel: [  589]     0   589     6247      167      17       67             0 qemu-ga
Sep 12 08:08:48 storage-ge-04 kernel: [  590]     0   590    10961       17      26      107             0 rpc.idmapd
Sep 12 08:08:48 storage-ge-04 kernel: [  591]   999   591   134246      309      58     1963             0 polkitd
Sep 12 08:08:48 storage-ge-04 kernel: [  595]     0   595    54793      223      62      424             0 abrtd
Sep 12 08:08:48 storage-ge-04 kernel: [  596]     0   596    54211      207      58      341             0 abrt-watch-log
Sep 12 08:08:48 storage-ge-04 kernel: [  600]    81   600    25137      303      20       71          -900 dbus-daemon
Sep 12 08:08:48 storage-ge-04 kernel: [  608]   997   608    28910      226      28       88             0 chronyd
Sep 12 08:08:48 storage-ge-04 kernel: [  609]     0   609    48761      232      38      133             0 gssproxy
Sep 12 08:08:48 storage-ge-04 kernel: [  621]   998   621     2133      131      10       30             0 lsmd
Sep 12 08:08:48 storage-ge-04 kernel: [  622]     0   622     5402      209      15       41             0 irqbalance
Sep 12 08:08:48 storage-ge-04 kernel: [  634]     0   634    84645      749      88     6394             0 firewalld
Sep 12 08:08:48 storage-ge-04 kernel: [  648]     0   648     6464      131      17       47             0 atd
Sep 12 08:08:48 storage-ge-04 kernel: [  650]     0   650    31559      233      21      136             0 crond
Sep 12 08:08:48 storage-ge-04 kernel: [  663]     0   663    27511      152       9       31             0 agetty
Sep 12 08:08:48 storage-ge-04 kernel: [  679]     0   679   135860      668      85      292             0 NetworkManager
Sep 12 08:08:48 storage-ge-04 kernel: [  734]     0   734    13727      327      29      222             0 ovsdb-server
Sep 12 08:08:48 storage-ge-04 kernel: [  813]     0   813    14238     3946      34        0             0 ovs-vswitchd
Sep 12 08:08:48 storage-ge-04 kernel: [  852]     0   852    13659       22      29      175             0 monitor
Sep 12 08:08:48 storage-ge-04 kernel: [  853]     0   853    13711      365      30      188             0 ovsdb-server
Sep 12 08:08:48 storage-ge-04 kernel: [  866]     0   866    13659       22      27      174             0 monitor
Sep 12 08:08:48 storage-ge-04 kernel: [  867]     0   867    13694      378      28      179             0 ovsdb-server
Sep 12 08:08:48 storage-ge-04 kernel: [  878]     0   878    13708       13      27      186             0 monitor
Sep 12 08:08:48 storage-ge-04 kernel: [  879]     0   879    13740       62      28      212             0 ovn-northd
Sep 12 08:08:48 storage-ge-04 kernel: [  908]     0   908    28343      479      58     3064             0 dhclient
Sep 12 08:08:48 storage-ge-04 kernel: [ 1123]     0  1123   140604      573      92     3112             0 tuned
Sep 12 08:08:48 storage-ge-04 kernel: [ 1132]    29  1132    11115      160      26      213             0 rpc.statd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1133]     0  1133    26499      541      56      218         -1000 sshd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1134]     0  1134    28911      102      13       30             0 rhsmcertd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1158]    32  1158    16239      131      36      132             0 rpcbind
Sep 12 08:08:48 storage-ge-04 kernel: [ 1159]     0  1159    26980      100       8       35             0 rhnsd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1187]     0  1187    10110       43      23      182             0 rpc.mountd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1442]     0  1442    22907      185      43      244             0 master
Sep 12 08:08:48 storage-ge-04 kernel: [ 1450]    89  1450    22979      446      44      239             0 qmgr
Sep 12 08:08:48 storage-ge-04 kernel: [ 7600]     0  7600    97535     1702      78     3036             0 python
Sep 12 08:08:48 storage-ge-04 kernel: [13195]    26 13195    59626      786      56      237         -1000 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13196]    26 13196    48265      121      44      238             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13198]    26 13198    59694     3694      65      240             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13199]    26 13199    59666     3450      65      223             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13200]    26 13200    59626      161      48      247             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13201]    26 13201    59760      242      50      265             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13202]    26 13202    48950      226      45      230             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17008]   108 17008    61007      808      73     1708             0 ovirt-fence-kdu
Sep 12 08:08:48 storage-ge-04 kernel: [17010]    26 17010    60481      903      68      341             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17027]   108 17027    52694      510      57     2748             0 ovirt-engine.py
Sep 12 08:08:48 storage-ge-04 kernel: [17078]   108 17078    51208      499      55     1734             0 ovirt-engine-dw
Sep 12 08:08:48 storage-ge-04 kernel: [17086]   108 17086  1123795   252488    1153   230071             0 java
Sep 12 08:08:48 storage-ge-04 kernel: [17174]   108 17174   905539    95566     321    20114             0 java
Sep 12 08:08:48 storage-ge-04 kernel: [17197]     0 17197    82076      705      81     3199             0 ovirt-imageio-p
Sep 12 08:08:48 storage-ge-04 kernel: [17207]    26 17207    60025     1127      65      335             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17208]    26 17208    61080     3753      68      430             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17209]    26 17209    61151     4396      69      349             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17211]    26 17211    60155     3606      66      339             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17212]    26 17212    60105     1487      66      326             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17241]   996 17241    26499      495      54      246             0 sshd
Sep 12 08:08:48 storage-ge-04 kernel: [17264]   108 17264    71430      672      87     4323             0 ovirt-websocket
Sep 12 08:08:48 storage-ge-04 kernel: [17324]     0 17324    62517      837     128      390             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [17435]    26 17435   107113    27568     159    23466             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17438]    26 17438   110618    27922     166    26193             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17440]    26 17440   113012    33018     170    23850             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17441]    26 17441    62947     6199      72      743             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17446]    26 17446    60066      835      65      402             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17474]    26 17474   116097    40022     176    18848             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17496]    26 17496   113236    39587     171    17601             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17497]    26 17497   114548    38477     174    18611             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17498]    26 17498   109137    36115     163    17018             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17499]    26 17499   110564    34909     166    19364             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17500]    26 17500   113074    39139     171    17887             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17501]    26 17501   116483    44902     178    14205             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [23742]    26 23742   116889    37325     177    21865             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [21470]    48 21470    63121      764     129      373             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [21588]    26 21588    97739    32013     140     8003             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [22499]    48 22499    63104      745     129      369             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [24651]    26 24651    88466    28322     123     2332             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [25675]    26 25675    84849    27002     116      798             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [27309]    26 27309    84492    26144     115      919             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [28419]    26 28419    83911    25608     113      484             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [28420]    26 28420    84152    24315     114     1801             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [29076]    48 29076    63078      729     129      358             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [29077]    48 29077    63113      742     129      350             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 4300]    89  4300    22933      627      45        0             0 pickup
Sep 12 08:08:48 storage-ge-04 kernel: [ 5477]    48  5477    63078      712     129      348             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5478]    48  5478    63113      725     129      348             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5479]    48  5479    63113      765     129      348             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5510]    48  5510    63078      763     129      348             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5511]    48  5511    63078      712     129      348             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5512]    48  5512    63078      710     129      348             0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 6172]    26  6172    61572     2768      70      143             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6173]    26  6173    61424     2533      70      143             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6174]    26  6174    62253     3363      72      143             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6175]    26  6175    62120     3066      71      143             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6176]    26  6176    62287     2935      72      143             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6177]    26  6177    60757     1363      68      143             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6178]    26  6178    60758     1419      68      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6179]    26  6179    60757     1472      68      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6180]    26  6180    60757     1390      68      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6181]    26  6181    60758     1386      68      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6182]    26  6182    61067     2023      69      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6183]    26  6183    60758     1378      68      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6184]    26  6184    64462     5552      76      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6185]    26  6185    60758     1398      68      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6186]    26  6186    63402     3938      74      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6187]    26  6187    60758     1452      68      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6188]    26  6188    61636     2817      70      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6189]    26  6189    60758     1464      68      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6190]    26  6190    60758     1427      68      142             0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: Out of memory: Kill process 17086 (java) score 323 or sacrifice child
Sep 12 08:08:48 storage-ge-04 kernel: Killed process 17086 (java) total-vm:4495180kB, anon-rss:1009952kB, file-rss:0kB, shmem-rss:0kB
Sep 12 08:08:48 storage-ge-04 dhclient[908]: DHCPREQUEST on eth0 to 10.35.28.1 port 67 (xid=0x7a373217)
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00019|timeval|WARN|Unreasonably long 1031ms poll interval (0ms user, 24ms system)
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00020|timeval|WARN|faults: 71 minor, 12 major
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00021|timeval|WARN|disk: 1712 reads, 0 writes
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00022|timeval|WARN|context switches: 19 voluntary, 2 involuntary
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00023|coverage|INFO|Skipping details of duplicate event coverage for hash=0bc93006
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00018|timeval|WARN|Unreasonably long 4243ms poll interval (0ms user, 175ms system)
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00019|timeval|WARN|faults: 63 minor, 28 major
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00020|timeval|WARN|disk: 5256 reads, 0 writes
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00021|timeval|WARN|context switches: 41 voluntary, 13 involuntary
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00022|coverage|INFO|Skipping details of duplicate event coverage for hash=e5632541
Sep 12 08:08:49 storage-ge-04 journal: 2017-09-12 08:08:49,887+0300 ovirt-engine: ERROR run:555 Error: process terminated with status code -9
Sep 12 08:08:50 storage-ge-04 systemd: ovirt-engine.service: main process exited, code=exited, status=1/FAILURE
Sep 12 08:08:50 storage-ge-04 systemd: Unit ovirt-engine.service entered failed state.
Sep 12 08:08:50 storage-ge-04 systemd: ovirt-engine.service failed.

Comment 1 Avihai 2017-09-12 06:28:51 UTC
Engine storage-ge-04.scl.lab.tlv.redhat.com is currently down & I'm freezing it for your further debug/ log extraction.

Please notify me if you do not need it anymore/ at all.

Comment 2 Avihai 2017-09-12 06:30:09 UTC
Created attachment 1324713 [details]
engine , vdsm  , messages logs

Comment 3 Avihai 2017-09-12 12:53:23 UTC
Hi Guys(Rnd), is anyone looking at this issue?

The stand is still in freeze for your debug .

If you do not need it please notify me.

Comment 4 Michal Skrivanek 2017-09-12 13:54:02 UTC
the host seem to have only 4G, that's not enough to run an engine

Comment 5 Avihai 2017-09-12 14:39:21 UTC
According to https://www.ovirt.org/documentation/install-guide/chap-System_Requirements/ 4G  is the minimum amount but it should still is supported ,right ?

Comment 6 Michal Skrivanek 2017-09-13 05:59:22 UTC
No, read the caveat about dwh and other processes
Moving to integration to consider changes, but from my perspective this is NOTABUG

Comment 7 Avihai 2017-09-13 06:10:18 UTC
(In reply to Michal Skrivanek from comment #6)
> No, read the caveat about dwh and other processes
> Moving to integration to consider changes, but from my perspective this is
> NOTABUG

I checked and this engines (which is a VM on rhevm-3.qa.lab.tlv.redhat.com) memory & I noticed it was defined with 2G although checking the memory via bash commands 'free' & '/proc/meminfo' yielded 4G memory .

1) Closing the bug as 'NOTABUG'

2) I increase the memory to 4G & restarted the engine/VM .
I do not have a dwh install , should I still expect OOM ?

Comment 8 Yaniv Kaul 2017-09-13 15:08:06 UTC
(In reply to Avihai from comment #7)
> (In reply to Michal Skrivanek from comment #6)
> > No, read the caveat about dwh and other processes
> > Moving to integration to consider changes, but from my perspective this is
> > NOTABUG
> 
> I checked and this engines (which is a VM on rhevm-3.qa.lab.tlv.redhat.com)
> memory & I noticed it was defined with 2G although checking the memory via
> bash commands 'free' & '/proc/meminfo' yielded 4G memory .
> 
> 1) Closing the bug as 'NOTABUG'
> 
> 2) I increase the memory to 4G & restarted the engine/VM .
> I do not have a dwh install , should I still expect OOM ?

You always have DWH installed - though perhap not in full mode. Otherwise, the dashboard won't be with data.


Note You need to log in before you can comment on or make changes to this bug.