Description of problem:
Running automation tests (the storage team's qcow2_v3 test plan), the engine runs out of memory and fails. This looks related to ovsdb-server, as it is seen failing until about 17 seconds before the oom-killer fires.

Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00005|timeval|WARN|Unreasonably long 1344ms poll interval (0ms user, 34ms system)
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00006|timeval|WARN|faults: 52 minor, 29 major
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00007|timeval|WARN|disk: 5280 reads, 0 writes
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00008|timeval|WARN|context switches: 35 voluntary, 1 involuntary
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00009|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=0bc93006:
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00010|coverage|INFO|hmap_pathological 0.0/sec 0.000/sec 0.0000/sec total: 1
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00011|coverage|INFO|hmap_expand 0.4/sec 0.400/sec 0.3964/sec total: 67445
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00012|coverage|INFO|lockfile_lock 0.0/sec 0.000/sec 0.0000/sec total: 1
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00013|coverage|INFO|poll_create_node 2.0/sec 2.000/sec 1.9819/sec total: 335146
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00014|coverage|INFO|poll_zero_timeout 0.0/sec 0.000/sec 0.0000/sec total: 2
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00015|coverage|INFO|seq_change 0.0/sec 0.000/sec 0.0000/sec total: 3
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00016|coverage|INFO|pstream_open 0.0/sec 0.000/sec 0.0000/sec total: 3
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00017|coverage|INFO|util_xalloc 18.0/sec 18.000/sec 17.8375/sec total: 3024390
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00018|coverage|INFO|42 events never hit
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00005|timeval|WARN|Unreasonably long 1245ms poll interval (0ms user, 39ms system)
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00006|timeval|WARN|faults: 68 minor, 20 major
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00007|timeval|WARN|disk: 3496 reads, 0 writes
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00008|timeval|WARN|context switches: 28 voluntary, 2 involuntary
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00009|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=e5632541:
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00010|coverage|INFO|hmap_expand 0.4/sec 0.400/sec 0.3958/sec total: 67518
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00011|coverage|INFO|lockfile_lock 0.0/sec 0.000/sec 0.0000/sec total: 1
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00012|coverage|INFO|poll_create_node 2.0/sec 2.000/sec 1.9792/sec total: 335151
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00013|coverage|INFO|poll_zero_timeout 0.0/sec 0.000/sec 0.0000/sec total: 2
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00014|coverage|INFO|seq_change 0.0/sec 0.000/sec 0.0000/sec total: 3
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00015|coverage|INFO|pstream_open 0.0/sec 0.000/sec 0.0000/sec total: 3
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00016|coverage|INFO|util_xalloc 17.6/sec 17.600/sec 17.4167/sec total: 2958603
Sep 12 08:08:31 storage-ge-04 ovsdb-server: ovs|00017|coverage|INFO|43 events never hit
Sep 12 08:08:48 storage-ge-04 kernel: java invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Sep 12 08:08:48 storage-ge-04 kernel: java cpuset=/ mems_allowed=0
Sep 12 08:08:48 storage-ge-04 kernel: CPU: 1 PID: 17554 Comm: java Not tainted 3.10.0-693.2.1.el7.x86_64 #1
Sep 12 08:08:48 storage-ge-04 kernel: Hardware name: Red Hat RHEV Hypervisor, BIOS 1.10.2-3.el7 04/01/2014
Sep 12 08:08:48 storage-ge-04 kernel: ffff88010862bf40 0000000077d7a63f ffff880090e6b960 ffffffff816a3db1
Sep 12 08:08:48 storage-ge-04 kernel: ffff880090e6b9f0 ffffffff8169f1a6 ffff880090e6b9f8 ffffffff812b7e6b
Sep 12 08:08:48 storage-ge-04 kernel: 0000000000000001 ffff880090e6b998 048404e300000206 fbfeefff00000000
Sep 12 08:08:48 storage-ge-04 kernel: Call Trace:
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff8169f1a6>] dump_header+0x90/0x229
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff812b7e6b>] ? cred_has_capability+0x6b/0x120
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0

Version-Release number of selected component (if applicable):
Engine: ovirt-engine-tools-4.2.0-0.0.master.20170907100709.git14accac.el7.centos.noarch
VDSM: 4.20.3-19.gite470a71

How reproducible:
Happened a few times (3 out of 6 runs) so far, on one stand, which is RHEL 7.4. Each time it happened in a different place in the test plan, so there is no single test that consistently causes this issue.

Steps to Reproduce:
1. Run the automation qcow2_v3 test plan. In this reproduction, the failure occurred after TestCase18215 finished successfully. To clarify: the other times this occurred, it happened in other tests (18338/9), so there is NO clear scenario here.
Automation log just before the crash:

08:06:58 2017-09-12 08:06:58,326 INFO ================================================================================
08:06:58 2017-09-12 08:06:58,326 INFO Test Setup 1: Create data-center dc_TestCase18215_REST_GLUS_1208065820 with cluster cl_TestCase18215_REST_GLUS_1208065820 and host host_mixed_1
08:06:58 2017-09-12 08:06:58,326 INFO Test Step 2: Add data-center dc_TestCase18215_REST_GLUS_1208065820
08:06:58 2017-09-12 08:06:58,660 INFO Test Step 3: Add cluster cl_TestCase18215_REST_GLUS_1208065820
08:06:59 2017-09-12 08:06:59,388 INFO Test Step 4: Move host host_mixed_1 to cluster cl_TestCase18215_REST_GLUS_1208065820
08:06:59 2017-09-12 08:06:59,390 INFO Test Setup 5: Switch host host_mixed_1 to different cluster cl_TestCase18215_REST_GLUS_1208065820 with {'activate': True}
08:06:59 2017-09-12 08:06:59,509 INFO Test Setup 6: Deactivate the host host_mixed_1. with {'expected_status': 'maintenance'}
08:07:10 2017-09-12 08:07:10,122 INFO Test Setup 7: Update properties of host host_mixed_1 (provided in parameters) with {'cluster': 'cl_TestCase18215_REST_GLUS_1208065820'}
08:07:11 2017-09-12 08:07:11,753 INFO Test Setup 8: Activate host host_mixed_1 (set status to UP) with {'wait': True}
08:08:22 2017-09-12 08:08:22,517 INFO 003: storage/rhevmtests.storage.storage_qcow2_v3.test_qcow2_v3.TestCase18215_REST_GLUSTERFS.test_upgrade_dc
08:08:22 2017-09-12 08:08:22,517 INFO API: REST
08:08:22 2017-09-12 08:08:22,517 INFO STORAGE: GLUSTERFS
08:08:22 2017-09-12 08:08:22,517 INFO Test Setup 9: Upgrading cluster cl_TestCase18215_REST_GLUS_1208065820 from version 4.0 to version 4.1
08:08:22 2017-09-12 08:08:22,519 INFO Test Step 10: Update cluster cl_TestCase18215_REST_GLUS_1208065820 with {'version': '4.1'}
08:08:23 2017-09-12 08:08:22,976 INFO Test Setup 11: Upgrading data center dc_TestCase18215_REST_GLUS_1208065820 from version 4.0 to version 4.1
08:08:25 2017-09-12 08:08:25,726 INFO Result: PASSED
08:08:25 .2017-09-12 08:08:26,485 INFO Test Teardown 12: Data center's dc_TestCase18215_REST_GLUS_1208065820 master domain is upgrade_4_0_to_4_1_TestCase18215_REST_GLUSTERFS0
08:08:37 2017-09-12 08:08:37,181 INFO Test Teardown 13: Clean data-center dc_TestCase18215_REST_GLUS_1208065820 and remove it
08:08:49 2017-09-12 08:08:49,156 INFO Test Step 14: Remove data-center dc_TestCase18215_REST_GLUS_1208065820
08:08:49 E2017-09-12 08:08:49,484 INFO ================================================================================
08:08:49 2017-09-12 08:08:49,484 INFO 004: storage/rhevmtests.storage.storage_qcow2_v3.test_qcow2_v3.TestCase18303_REST_ISCSI.test_verify_new_domain_version
08:08:49 2017-09-12 08:08:49,484 INFO API: REST
08:08:49 2017-09-12 08:08:49,484 INFO STORAGE: ISCSI
08:08:49 2017-09-12 08:08:49,628 ERROR Result: FAILED

Actual results:
Out of memory occurred and the engine failed. The engine did not restart and is still down; you can connect to it and check it out at:
https://storage-ge-04.scl.lab.tlv.redhat.com/ovirt-engine/

Expected results:

Additional info:
Last error seen in engine log:

2017-09-12 08:07:15,434+03 ERROR
[org.ovirt.engine.core.bll.pm.FenceProxyLocator] (EE-ManagedThreadFactory-engineScheduled-Thread-94) [6804fd55] Can not run fence action on host 'host_mixed_1', no suitable proxy host was found.

[root@storage-ge-04 ~]# cat /var/log/messages | grep oom-killer -A 500
Sep 12 08:08:48 storage-ge-04 kernel: java invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Sep 12 08:08:48 storage-ge-04 kernel: java cpuset=/ mems_allowed=0
Sep 12 08:08:48 storage-ge-04 kernel: CPU: 1 PID: 17554 Comm: java Not tainted 3.10.0-693.2.1.el7.x86_64 #1
Sep 12 08:08:48 storage-ge-04 kernel: Hardware name: Red Hat RHEV Hypervisor, BIOS 1.10.2-3.el7 04/01/2014
Sep 12 08:08:48 storage-ge-04 kernel: ffff88010862bf40 0000000077d7a63f ffff880090e6b960 ffffffff816a3db1
Sep 12 08:08:48 storage-ge-04 kernel: ffff880090e6b9f0 ffffffff8169f1a6 ffff880090e6b9f8 ffffffff812b7e6b
Sep 12 08:08:48 storage-ge-04 kernel: 0000000000000001 ffff880090e6b998 048404e300000206 fbfeefff00000000
Sep 12 08:08:48 storage-ge-04 kernel: Call Trace:
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff8169f1a6>] dump_header+0x90/0x229
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff812b7e6b>] ? cred_has_capability+0x6b/0x120
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff812b803c>] ? selinux_capable+0x1c/0x40
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff81186bd6>] out_of_memory+0x4b6/0x4f0
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff8169fcaa>] __alloc_pages_slowpath+0x5d6/0x724
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811d4135>] alloc_pages_vma+0xb5/0x200
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff81184f55>] ? filemap_fault+0x215/0x410
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811c453d>] read_swap_cache_async+0xed/0x160
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811f53a6>] ? mem_cgroup_update_page_stat+0x16/0x50
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811c4658>] swapin_readahead+0xa8/0x110
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff811b235b>] handle_mm_fault+0xadb/0xfa0
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816afff4>] __do_page_fault+0x154/0x450
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816b03d6>] trace_do_page_fault+0x56/0x150
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816afa6a>] do_async_page_fault+0x1a/0xd0
Sep 12 08:08:48 storage-ge-04 kernel: [<ffffffff816ac578>] async_page_fault+0x28/0x30
Sep 12 08:08:48 storage-ge-04 kernel: Mem-Info:
Sep 12 08:08:48 storage-ge-04 kernel: active_anon:648247 inactive_anon:247008 isolated_anon:0#012 active_file:82 inactive_file:353 isolated_file:23#012 unevictable:4008 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:7461 slab_unreclaimable:12399#012 mapped:7321 shmem:12332 pagetables:9625 bounce:0#012 free:21588 free_pcp:0 free_cma:0
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 DMA free:15336kB min:276kB low:344kB high:412kB active_anon:116kB inactive_anon:120kB active_file:0kB inactive_file:36kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:60kB kernel_stack:0kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:54 all_unreclaimable? yes
Sep 12 08:08:48 storage-ge-04 kernel: lowmem_reserve[]: 0 2814 3769 3769
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 DMA32 free:54004kB min:50260kB low:62824kB high:75388kB active_anon:2152864kB inactive_anon:547512kB active_file:260kB inactive_file:708kB unevictable:11960kB isolated(anon):0kB isolated(file):84kB present:3129200kB managed:2884216kB mlocked:11960kB dirty:0kB writeback:0kB mapped:24360kB shmem:35620kB slab_reclaimable:19256kB slab_unreclaimable:33544kB kernel_stack:5728kB pagetables:27456kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1461 all_unreclaimable? yes
Sep 12 08:08:48 storage-ge-04 kernel: lowmem_reserve[]: 0 0 954 954
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 Normal free:17012kB min:17044kB low:21304kB high:25564kB active_anon:440008kB inactive_anon:440400kB active_file:68kB inactive_file:668kB unevictable:4072kB isolated(anon):0kB isolated(file):8kB present:1048576kB managed:977588kB mlocked:4072kB dirty:0kB writeback:0kB mapped:4924kB shmem:13708kB slab_reclaimable:10588kB slab_unreclaimable:15992kB kernel_stack:2752kB pagetables:11012kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1046 all_unreclaimable? yes
Sep 12 08:08:48 storage-ge-04 kernel: lowmem_reserve[]: 0 0 0 0
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 DMA: 1*4kB (U) 1*8kB (M) 2*16kB (U) 2*32kB (U) 2*64kB (UM) 2*128kB (UM) 0*256kB 1*512kB (M) 2*1024kB (UM) 0*2048kB 3*4096kB (EM) = 15340kB
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 DMA32: 9*4kB (UE) 419*8kB (UEM) 610*16kB (UEM) 358*32kB (UEM) 178*64kB (UEM) 73*128kB (UEM) 28*256kB (UEM) 3*512kB (EM) 0*1024kB 0*2048kB 0*4096kB = 54044kB
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 Normal: 258*4kB (UEM) 232*8kB (UEM) 150*16kB (UEM) 129*32kB (UEM) 52*64kB (UE) 22*128kB (UEM) 4*256kB (EM) 1*512kB (E) 0*1024kB 0*2048kB 0*4096kB = 17096kB
Sep 12 08:08:48 storage-ge-04 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 12 08:08:48 storage-ge-04 kernel: 16465 total pagecache pages
Sep 12 08:08:48 storage-ge-04 kernel: 629 pages in swap cache
Sep 12 08:08:48 storage-ge-04 kernel: Swap cache stats: add 2302549, delete 2301920, find 2325594/2492277
Sep 12 08:08:48 storage-ge-04 kernel: Free swap = 0kB
Sep 12 08:08:48 storage-ge-04 kernel: Total swap = 2097148kB
Sep 12 08:08:48 storage-ge-04 kernel: 1048442 pages RAM
Sep 12 08:08:48 storage-ge-04 kernel: 0 pages HighMem/MovableOnly
Sep 12 08:08:48 storage-ge-04 kernel: 79014 pages reserved
Sep 12 08:08:48 storage-ge-04 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Sep 12 08:08:48 storage-ge-04 kernel: [ 453] 0 453 9316 234 20 60 0 systemd-journal
Sep 12 08:08:48 storage-ge-04 kernel: [ 477] 0 477 32389 222 31 413 0 lvmetad
Sep 12 08:08:48 storage-ge-04 kernel: [ 487] 0 487 11663 230 23 480 -1000 systemd-udevd
Sep 12 08:08:48 storage-ge-04 kernel: [ 583] 0 583 58236 277 46 162 0 rsyslogd
Sep 12 08:08:48 storage-ge-04 kernel: [ 584] 0 584 6051 209 16 33 0 systemd-logind
Sep 12 08:08:48 storage-ge-04 kernel: [ 585] 0 585 6075 200 15 138 0 smartd
Sep 12 08:08:48 storage-ge-04 kernel: [ 586] 0 586 3073 125 12 1463 0 haveged
Sep 12 08:08:48 storage-ge-04 kernel: [ 589] 0 589 6247 167 17 67 0 qemu-ga
Sep 12 08:08:48 storage-ge-04 kernel: [ 590] 0 590 10961 17 26 107 0 rpc.idmapd
Sep 12 08:08:48 storage-ge-04 kernel: [ 591] 999 591 134246 309 58 1963 0 polkitd
Sep 12 08:08:48 storage-ge-04 kernel: [ 595] 0 595 54793 223 62 424 0 abrtd
Sep 12 08:08:48 storage-ge-04 kernel: [ 596] 0 596 54211 207 58 341 0 abrt-watch-log
Sep 12 08:08:48 storage-ge-04 kernel: [ 600] 81 600 25137 303 20 71 -900 dbus-daemon
Sep 12 08:08:48 storage-ge-04 kernel: [ 608] 997 608 28910 226 28 88 0 chronyd
Sep 12 08:08:48 storage-ge-04 kernel: [ 609] 0 609 48761 232 38 133 0 gssproxy
Sep 12 08:08:48 storage-ge-04 kernel: [ 621] 998 621 2133 131 10 30 0 lsmd
Sep 12 08:08:48 storage-ge-04 kernel: [ 622] 0 622 5402 209 15 41 0 irqbalance
Sep 12 08:08:48 storage-ge-04 kernel: [ 634] 0 634 84645 749 88 6394 0 firewalld
Sep 12 08:08:48 storage-ge-04 kernel: [ 648] 0 648 6464 131 17 47 0 atd
Sep 12 08:08:48 storage-ge-04 kernel: [ 650] 0 650 31559 233 21 136 0 crond
Sep 12 08:08:48 storage-ge-04 kernel: [ 663] 0 663 27511 152 9 31 0 agetty
Sep 12 08:08:48 storage-ge-04 kernel: [ 679] 0 679 135860 668 85 292 0 NetworkManager
Sep 12 08:08:48 storage-ge-04 kernel: [ 734] 0 734 13727 327 29 222 0 ovsdb-server
Sep 12 08:08:48 storage-ge-04 kernel: [ 813] 0 813 14238 3946 34 0 0 ovs-vswitchd
Sep 12 08:08:48 storage-ge-04 kernel: [ 852] 0 852 13659 22 29 175 0 monitor
Sep 12 08:08:48 storage-ge-04 kernel: [ 853] 0 853 13711 365 30 188 0 ovsdb-server
Sep 12 08:08:48 storage-ge-04 kernel: [ 866] 0 866 13659 22 27 174 0 monitor
Sep 12 08:08:48 storage-ge-04 kernel: [ 867] 0 867 13694 378 28 179 0 ovsdb-server
Sep 12 08:08:48 storage-ge-04 kernel: [ 878] 0 878 13708 13 27 186 0 monitor
Sep 12 08:08:48 storage-ge-04 kernel: [ 879] 0 879 13740 62 28 212 0 ovn-northd
Sep 12 08:08:48 storage-ge-04 kernel: [ 908] 0 908 28343 479 58 3064 0 dhclient
Sep 12 08:08:48 storage-ge-04 kernel: [ 1123] 0 1123 140604 573 92 3112 0 tuned
Sep 12 08:08:48 storage-ge-04 kernel: [ 1132] 29 1132 11115 160 26 213 0 rpc.statd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1133] 0 1133 26499 541 56 218 -1000 sshd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1134] 0 1134 28911 102 13 30 0 rhsmcertd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1158] 32 1158 16239 131 36 132 0 rpcbind
Sep 12 08:08:48 storage-ge-04 kernel: [ 1159] 0 1159 26980 100 8 35 0 rhnsd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1187] 0 1187 10110 43 23 182 0 rpc.mountd
Sep 12 08:08:48 storage-ge-04 kernel: [ 1442] 0 1442 22907 185 43 244 0 master
Sep 12 08:08:48 storage-ge-04 kernel: [ 1450] 89 1450 22979 446 44 239 0 qmgr
Sep 12 08:08:48 storage-ge-04 kernel: [ 7600] 0 7600 97535 1702 78 3036 0 python
Sep 12 08:08:48 storage-ge-04 kernel: [13195] 26 13195 59626 786 56 237 -1000 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13196] 26 13196 48265 121 44 238 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13198] 26 13198 59694 3694 65 240 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13199] 26 13199 59666 3450 65 223 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13200] 26 13200 59626 161 48 247 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13201] 26 13201 59760 242 50 265 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [13202] 26 13202 48950 226 45 230 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17008] 108 17008 61007 808 73 1708 0 ovirt-fence-kdu
Sep 12 08:08:48 storage-ge-04 kernel: [17010] 26 17010 60481 903 68 341 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17027] 108 17027 52694 510 57 2748 0 ovirt-engine.py
Sep 12 08:08:48 storage-ge-04 kernel: [17078] 108 17078 51208 499 55 1734 0 ovirt-engine-dw
Sep 12 08:08:48 storage-ge-04 kernel: [17086] 108 17086 1123795 252488 1153 230071 0 java
Sep 12 08:08:48 storage-ge-04 kernel: [17174] 108 17174 905539 95566 321 20114 0 java
Sep 12 08:08:48 storage-ge-04 kernel: [17197] 0 17197 82076 705 81 3199 0 ovirt-imageio-p
Sep 12 08:08:48 storage-ge-04 kernel: [17207] 26 17207 60025 1127 65 335 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17208] 26 17208 61080 3753 68 430 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17209] 26 17209 61151 4396 69 349 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17211] 26 17211 60155 3606 66 339 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17212] 26 17212 60105 1487 66 326 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17241] 996 17241 26499 495 54 246 0 sshd
Sep 12 08:08:48 storage-ge-04 kernel: [17264] 108 17264 71430 672 87 4323 0 ovirt-websocket
Sep 12 08:08:48 storage-ge-04 kernel: [17324] 0 17324 62517 837 128 390 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [17435] 26 17435 107113 27568 159 23466 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17438] 26 17438 110618 27922 166 26193 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17440] 26 17440 113012 33018 170 23850 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17441] 26 17441 62947 6199 72 743 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17446] 26 17446 60066 835 65 402 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17474] 26 17474 116097 40022 176 18848 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17496] 26 17496 113236 39587 171 17601 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17497] 26 17497 114548 38477 174 18611 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17498] 26 17498 109137 36115 163 17018 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17499] 26 17499 110564 34909 166 19364 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17500] 26 17500 113074 39139 171 17887 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [17501] 26 17501 116483 44902 178 14205 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [23742] 26 23742 116889 37325 177 21865 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [21470] 48 21470 63121 764 129 373 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [21588] 26 21588 97739 32013 140 8003 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [22499] 48 22499 63104 745 129 369 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [24651] 26 24651 88466 28322 123 2332 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [25675] 26 25675 84849 27002 116 798 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [27309] 26 27309 84492 26144 115 919 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [28419] 26 28419 83911 25608 113 484 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [28420] 26 28420 84152 24315 114 1801 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [29076] 48 29076 63078 729 129 358 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [29077] 48 29077 63113 742 129 350 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 4300] 89 4300 22933 627 45 0 0 pickup
Sep 12 08:08:48 storage-ge-04 kernel: [ 5477] 48 5477 63078 712 129 348 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5478] 48 5478 63113 725 129 348 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5479] 48 5479 63113 765 129 348 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5510] 48 5510 63078 763 129 348 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5511] 48 5511 63078 712 129 348 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 5512] 48 5512 63078 710 129 348 0 httpd
Sep 12 08:08:48 storage-ge-04 kernel: [ 6172] 26 6172 61572 2768 70 143 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6173] 26 6173 61424 2533 70 143 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6174] 26 6174 62253 3363 72 143 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6175] 26 6175 62120 3066 71 143 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6176] 26 6176 62287 2935 72 143 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6177] 26 6177 60757 1363 68 143 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6178] 26 6178 60758 1419 68 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6179] 26 6179 60757 1472 68 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6180] 26 6180 60757 1390 68 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6181] 26 6181 60758 1386 68 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6182] 26 6182 61067 2023 69 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6183] 26 6183 60758 1378 68 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6184] 26 6184 64462 5552 76 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6185] 26 6185 60758 1398 68 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6186] 26 6186 63402 3938 74 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6187] 26 6187 60758 1452 68 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6188] 26 6188 61636 2817 70 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6189] 26 6189 60758 1464 68 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: [ 6190] 26 6190 60758 1427 68 142 0 postgres
Sep 12 08:08:48 storage-ge-04 kernel: Out of memory: Kill process 17086 (java) score 323 or sacrifice child
Sep 12 08:08:48 storage-ge-04 kernel: Killed process 17086 (java) total-vm:4495180kB, anon-rss:1009952kB, file-rss:0kB, shmem-rss:0kB
Sep 12 08:08:48 storage-ge-04 dhclient[908]: DHCPREQUEST on eth0 to 10.35.28.1 port 67 (xid=0x7a373217)
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00019|timeval|WARN|Unreasonably long 1031ms poll interval (0ms user, 24ms system)
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00020|timeval|WARN|faults: 71 minor, 12 major
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00021|timeval|WARN|disk: 1712 reads, 0 writes
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00022|timeval|WARN|context switches: 19 voluntary, 2 involuntary
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00023|coverage|INFO|Skipping details of duplicate event coverage for hash=0bc93006
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00018|timeval|WARN|Unreasonably long 4243ms poll interval (0ms user, 175ms system)
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00019|timeval|WARN|faults: 63 minor, 28 major
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00020|timeval|WARN|disk: 5256 reads, 0 writes
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00021|timeval|WARN|context switches: 41 voluntary, 13 involuntary
Sep 12 08:08:48 storage-ge-04 ovsdb-server: ovs|00022|coverage|INFO|Skipping details of duplicate event coverage for hash=e5632541
Sep 12 08:08:49 storage-ge-04 journal: 2017-09-12 08:08:49,887+0300 ovirt-engine: ERROR run:555 Error: process terminated with status code -9
Sep 12 08:08:50 storage-ge-04 systemd: ovirt-engine.service: main process exited, code=exited, status=1/FAILURE
Sep 12 08:08:50 storage-ge-04 systemd: Unit ovirt-engine.service entered failed state.
Sep 12 08:08:50 storage-ge-04 systemd: ovirt-engine.service failed.
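For reference, the kernel's "Killed process" line above can be checked mechanically instead of by eye. A minimal sketch (the sed pattern and the field names it emits are illustrative, not part of any oVirt tooling; the sample line is copied from the dump above):

```shell
# Extract the OOM victim's PID, name, and memory footprint from the
# kernel's "Killed process ..." line in /var/log/messages.
line='Sep 12 08:08:48 storage-ge-04 kernel: Killed process 17086 (java) total-vm:4495180kB, anon-rss:1009952kB, file-rss:0kB, shmem-rss:0kB'
out=$(printf '%s\n' "$line" | sed -n \
  's/.*Killed process \([0-9]*\) (\([^)]*\)) total-vm:\([0-9]*\)kB, anon-rss:\([0-9]*\)kB.*/pid=\1 name=\2 total_vm_kb=\3 anon_rss_kb=\4/p')
echo "$out"
```

Run against the dump above, this prints `pid=17086 name=java total_vm_kb=4495180 anon_rss_kb=1009952`, i.e. the engine JVM with roughly 4.3 GiB of virtual memory and about 1 GiB of anonymous RSS at kill time, consistent with the `[17086] ... java` row in the process table.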
Engine storage-ge-04.scl.lab.tlv.redhat.com is currently down, and I am freezing it for your further debugging and log extraction. Please notify me if you do not need it anymore, or at all.
Created attachment 1324713 [details]
engine, vdsm, and messages logs
Hi guys (R&D), is anyone looking at this issue? The stand is still frozen for your debugging. If you do not need it, please notify me.
The host seems to have only 4 GB; that's not enough to run an engine.
According to https://www.ovirt.org/documentation/install-guide/chap-System_Requirements/, 4 GB is the minimum amount, but it should still be supported, right?
No, read the caveat about DWH and other processes.

Moving to integration to consider changes, but from my perspective this is NOTABUG.
(In reply to Michal Skrivanek from comment #6)
> No, read the caveat about dwh and other processes
> Moving to integration to consider changes, but from my perspective this is
> NOTABUG

I checked this engine's memory (it is a VM on rhevm-3.qa.lab.tlv.redhat.com) and noticed it was defined with 2 GB, although checking the memory via the bash command 'free' and via '/proc/meminfo' yielded 4 GB.

1) Closing the bug as 'NOTABUG'.

2) I increased the memory to 4 GB and restarted the engine/VM.
I do not have DWH installed; should I still expect an OOM?
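The defined-versus-seen memory check described above can be scripted. A sketch to run on the engine VM (the 3.5 GB threshold is an illustrative guard value, not an official requirement):

```shell
# Compare what the guest OS actually sees against the expected memory size.
grep MemTotal /proc/meminfo   # kernel's view of usable RAM, in kB
# Pull the number out and warn if the guest sees noticeably less than 4 GB:
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
if [ "$mem_kb" -ge 3500000 ]; then
    echo "memory OK: ${mem_kb} kB"
else
    echo "memory lower than expected: ${mem_kb} kB"
fi
```

`free -h` shows the same total plus swap; with the 2 GB swap seen in the OOM dump fully consumed ("Free swap = 0kB"), a guest that only sees 2 GB of RAM would explain the kill.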
(In reply to Avihai from comment #7)
> (In reply to Michal Skrivanek from comment #6)
> > No, read the caveat about dwh and other processes
> > Moving to integration to consider changes, but from my perspective this is
> > NOTABUG
>
> I checked this engine's memory (a VM on rhevm-3.qa.lab.tlv.redhat.com) and
> noticed it was defined with 2G, although checking the memory via the bash
> commands 'free' and '/proc/meminfo' yielded 4G.
>
> 1) Closing the bug as 'NOTABUG'
>
> 2) I increased the memory to 4G and restarted the engine/VM.
> I do not have a dwh install, should I still expect OOM?

You always have DWH installed, though perhaps not in full mode. Otherwise, the dashboard would have no data.