Created attachment 995030 [details]
engine log

Description of problem:
After a few hours of running automated tests, the engine service suddenly died and we get error code 503. Nothing appears in engine.log, but /var/log/messages shows the output below:

Feb 25 07:49:53 jenkins-vm-33 kernel: Out of memory: Kill process 16014 (java) score 245 or sacrifice child
Feb 25 07:49:53 jenkins-vm-33 kernel: Killed process 16014, UID 108, (java) total-vm:2908156kB, anon-rss:971900kB, file-rss:880kB
Feb 25 07:50:01 jenkins-vm-33 2015-02-25 07:50:00,890 ovirt-engine: ERROR run:532 Error: process terminated with status code -9

Version-Release number of selected component (if applicable):
el6.6 vt13.11

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 995031 [details]
messages
from dmesg output: Mem-Info: Node 0 DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 Node 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 61 CPU 1: hi: 186, btch: 31 usd: 59 Node 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 172 CPU 1: hi: 186, btch: 31 usd: 161 active_anon:723263 inactive_anon:184266 isolated_anon:0 active_file:1485 inactive_file:2271 isolated_file:0 unevictable:6116 dirty:78 writeback:6 unstable:0 free:21254 slab_reclaimable:4135 slab_unreclaimable:16500 mapped:3515 shmem:2100 pagetables:7636 bounce:0 Node 0 DMA free:15724kB min:248kB low:308kB high:372kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15320kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 3512 4017 4017 Node 0 DMA32 free:60896kB min:58868kB low:73584kB high:88300kB active_anon:2683416kB inactive_anon:527328kB active_file:4088kB inactive_file:6980kB unevictable:22272kB isolated(anon):0kB isolated(file):0kB present:3596496kB mlocked:5920kB dirty:212kB writeback:24kB mapped:8828kB shmem:6092kB slab_reclaimable:7624kB slab_unreclaimable:40096kB kernel_stack:2232kB pagetables:18812kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:23360 all_unreclaimable? yes lowmem_reserve[]: 0 0 505 505 Node 0 Normal free:8396kB min:8464kB low:10580kB high:12696kB active_anon:209636kB inactive_anon:209736kB active_file:1852kB inactive_file:2104kB unevictable:2192kB isolated(anon):0kB isolated(file):0kB present:517120kB mlocked:2192kB dirty:100kB writeback:0kB mapped:5232kB shmem:2308kB slab_reclaimable:8916kB slab_unreclaimable:25904kB kernel_stack:1216kB pagetables:11732kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:10540 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 3*4kB 2*8kB 1*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15724kB Node 0 DMA32: 1414*4kB 1065*8kB 632*16kB 284*32kB 106*64kB 42*128kB 24*256kB 12*512kB 1*1024kB 1*2048kB 0*4096kB = 60896kB Node 0 Normal: 437*4kB 189*8kB 47*16kB 35*32kB 5*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 8396kB 10897 total pagecache pages 4396 pages in swap cache Swap cache stats: add 1858415, delete 1854019, find 1525435/1649361 Free swap = 0kB Total swap = 2097148kB 1048575 pages RAM 68326 pages reserved 34277 pages shared 950558 pages non-shared [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name [ 399] 0 399 2849 64 0 -17 -1000 udevd [ 945] 0 945 2280 93 1 0 0 dhclient [ 997] 0 997 62272 218 0 0 0 rsyslogd [ 1027] 0 1027 2707 122 0 0 0 irqbalance [ 1048] 32 1048 4744 134 0 0 0 rpcbind [ 1068] 29 1068 5837 164 0 0 0 rpc.statd [ 1199] 81 1199 24343 149 0 0 0 dbus-daemon [ 1244] 0 1244 1020 124 1 0 0 acpid [ 1254] 68 1254 9441 265 0 0 0 hald [ 1255] 0 1255 5099 132 0 0 0 hald-runner [ 1287] 0 1287 5629 119 0 0 0 hald-addon-inpu [ 1298] 68 1298 4501 146 1 0 0 hald-addon-acpi [ 1318] 0 1318 96533 184 0 0 0 automount [ 1341] 0 1341 16554 86 0 -17 -1000 sshd [ 1387] 497 1387 10366 155 0 0 0 nrpe [ 1464] 0 1464 20214 222 0 0 0 master [ 1473] 89 1473 20278 207 0 0 0 qmgr [ 1489] 0 1489 28660 141 0 0 0 abrtd [ 1499] 0 1499 29214 163 0 0 0 crond [ 1512] 0 1512 5276 69 0 0 0 atd [ 1521] 0 1521 27083 109 0 0 0 rhsmcertd [ 1540] 0 1540 45806 8479 0 0 0 puppet [ 1561] 0 1561 1016 122 0 0 0 mingetty [ 1563] 0 1563 1016 122 0 0 0 mingetty [ 1565] 0 1565 1016 122 1 0 0 mingetty [ 1567] 0 1567 1016 122 0 0 0 mingetty [ 1569] 0 1569 1016 122 0 0 0 mingetty [ 1571] 0 1571 1016 122 1 0 0 mingetty [ 1580] 0 1580 3079 98 0 -17 -1000 udevd [ 1916] 0 1916 27267 250 1 0 0 sshd [ 2005] 0 2005 2848 46 0 -17 -1000 udevd [ 2025] 0 2025 26514 152 0 0 0 bash [ 2047] 0 2047 635456 11247 0 0 0 java [ 2075] 0 2075 7237 5699 0 0 
-17 iscsiuio [ 2081] 0 2081 1232 114 1 0 0 iscsid [ 2082] 0 2082 1358 845 1 0 -17 iscsid [ 3706] 26 3706 54311 345 0 -17 -1000 postmaster [ 3708] 26 3708 44722 91 0 -17 -1000 postmaster [ 3710] 26 3710 54376 531 0 -17 -1000 postmaster [ 3711] 26 3711 54311 122 1 -17 -1000 postmaster [ 3712] 26 3712 54411 206 0 -17 -1000 postmaster [ 3713] 26 3713 44792 159 0 -17 -1000 postmaster [14382] 0 14382 26300 30 1 0 0 rpc.rquotad [14387] 0 14387 5417 80 1 0 0 rpc.mountd [14429] 0 14429 6291 74 0 0 0 rpc.idmapd [14472] 108 14472 55859 579 0 0 0 ovirt-fence-kdu [14475] 26 14475 55387 649 0 -17 -1000 postmaster [14560] 0 14560 50195 309 0 0 0 httpd [14566] 48 14566 50395 434 0 0 0 httpd [14567] 48 14567 50300 637 0 0 0 httpd [14568] 48 14568 50331 680 1 0 0 httpd [14569] 48 14569 50294 393 0 0 0 httpd [14570] 48 14570 50300 618 1 0 0 httpd [14571] 48 14571 50287 643 1 0 0 httpd [14572] 48 14572 50294 630 1 0 0 httpd [14573] 48 14573 50294 628 1 0 0 httpd [14603] 108 14603 67024 298 0 0 0 ovirt-websocket [16003] 108 16003 47844 82 1 0 0 ovirt-engine.py [16014] 108 16014 724983 266534 1 0 0 java [16107] 26 16107 83475 27670 1 -17 -1000 postmaster [16228] 26 16228 84160 28633 1 -17 -1000 postmaster [16451] 26 16451 83077 27161 1 -17 -1000 postmaster [16890] 26 16890 80665 25667 1 -17 -1000 postmaster [17116] 26 17116 84503 28078 1 -17 -1000 postmaster [18546] 26 18546 81736 27379 1 -17 -1000 postmaster [18547] 26 18547 82215 27142 1 -17 -1000 postmaster [18548] 26 18548 82869 27290 1 -17 -1000 postmaster [18666] 48 18666 50311 659 1 0 0 httpd [27424] 48 27424 50307 644 1 0 0 httpd [ 668] 26 668 83157 28224 1 -17 -1000 postmaster [ 669] 26 669 80970 25352 1 -17 -1000 postmaster [ 670] 26 670 82476 27899 1 -17 -1000 postmaster [21828] 48 21828 50307 633 0 0 0 httpd [21829] 48 21829 50307 407 1 0 0 httpd [21830] 48 21830 50309 412 0 0 0 httpd [21834] 48 21834 50307 417 0 0 0 httpd [22088] 48 22088 50310 406 0 0 0 httpd [22089] 48 22089 50310 411 0 0 0 httpd [22090] 48 22090 50294 
394 0 0 0 httpd [23268] 48 23268 50310 641 0 0 0 httpd [23269] 48 23269 50311 672 0 0 0 httpd [23270] 48 23270 50307 647 1 0 0 httpd [ 3491] 0 3491 26526 13 0 0 0 sh [ 3517] 0 3517 90295 107 0 0 0 python [ 3619] 0 3619 678353 302219 0 0 0 python [ 4374] 0 4374 24993 76 0 0 0 sshd [22611] 26 22611 75483 19908 1 -17 -1000 postmaster [ 379] 38 379 7683 45 0 0 0 ntpd [10769] 89 10769 20234 29 0 0 0 pickup [16765] 0 16765 25235 43 1 0 0 tail [17142] 497 17142 10884 411 0 0 0 nrpe [17144] 497 17144 10884 411 0 0 0 nrpe [17146] 497 17146 10884 411 1 0 0 nrpe [17147] 497 17147 10885 143 1 0 0 nrpe [17148] 497 17148 10885 143 0 0 0 nrpe [17149] 497 17149 33299 1277 0 0 0 check_linux_sta [17150] 497 17150 33299 1298 0 0 0 check_linux_sta [17151] 497 17151 10885 143 1 0 0 nrpe [17152] 497 17152 33299 1269 0 0 0 check_linux_sta Out of memory: Kill process 3619 (python) score 307 or sacrifice child Killed process 3619, UID 0, (python) total-vm:2713412kB, anon-rss:1208320kB, file-rss:556kB SELinux: initialized (dev 0:17, type nfs), uses genfs_contexts SELinux: initialized (dev 0:18, type nfs), uses genfs_contexts SELinux: initialized (dev 0:17, type nfs), uses genfs_contexts SELinux: initialized (dev 0:18, type nfs), uses genfs_contexts SELinux: initialized (dev 0:17, type nfs), uses genfs_contexts SELinux: initialized (dev 0:18, type nfs), uses genfs_contexts SELinux: initialized (dev 0:17, type nfs), uses genfs_contexts SELinux: initialized (dev 0:18, type nfs), uses genfs_contexts puppet invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0 puppet cpuset=/ mems_allowed=0 Pid: 2312, comm: puppet Not tainted 2.6.32-504.8.1.el6.x86_64 #1 Call Trace: [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0 [<ffffffff81127300>] ? dump_header+0x90/0x1b0 [<ffffffff810d5161>] ? cpuset_mems_allowed_intersects+0x21/0x30 [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0 [<ffffffff811276c1>] ? select_bad_process+0xe1/0x120 [<ffffffff81127bc0>] ? 
out_of_memory+0x220/0x3c0 [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0 [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110 [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90 [<ffffffff811240de>] ? find_get_page+0x1e/0xa0 [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500 [<ffffffff8114eae4>] ? __do_fault+0x54/0x530 [<ffffffff810b33c6>] ? futex_wait+0x1e6/0x310 [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00 [<ffffffff810a2dd0>] ? hrtimer_wakeup+0x0/0x30 [<ffffffff810a3da4>] ? hrtimer_start_range_ns+0x14/0x20 [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300 [<ffffffff81063bf3>] ? perf_event_task_sched_out+0x33/0x70 [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480 [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320 [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff81041e98>] ? pvclock_clocksource_read+0x58/0xd0 [<ffffffff8152ffde>] ? do_page_fault+0x3e/0xa0 [<ffffffff8152d395>] ? page_fault+0x25/0x30 Mem-Info: Node 0 DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 Node 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 Node 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 active_anon:801151 inactive_anon:109094 isolated_anon:0 active_file:34 inactive_file:58 isolated_file:0 unevictable:6116 dirty:0 writeback:0 unstable:0 free:21264 slab_reclaimable:4154 slab_unreclaimable:16711 mapped:2203 shmem:3091 pagetables:8270 bounce:0 Node 0 DMA free:15724kB min:248kB low:308kB high:372kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15320kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes lowmem_reserve[]: 0 3512 4017 4017 Node 0 DMA32 free:60872kB min:58868kB low:73584kB high:88300kB active_anon:2991752kB inactive_anon:223432kB active_file:116kB inactive_file:208kB unevictable:22272kB isolated(anon):0kB isolated(file):0kB present:3596496kB mlocked:5920kB dirty:0kB writeback:0kB mapped:2972kB shmem:7960kB slab_reclaimable:7764kB slab_unreclaimable:41044kB kernel_stack:2368kB pagetables:22016kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:960 all_unreclaimable? no lowmem_reserve[]: 0 0 505 505 Node 0 Normal free:8460kB min:8464kB low:10580kB high:12696kB active_anon:212852kB inactive_anon:212944kB active_file:20kB inactive_file:0kB unevictable:2192kB isolated(anon):0kB isolated(file):0kB present:517120kB mlocked:2192kB dirty:0kB writeback:0kB mapped:5840kB shmem:4404kB slab_reclaimable:8852kB slab_unreclaimable:25800kB kernel_stack:1216kB pagetables:11064kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:143 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 3*4kB 2*8kB 1*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15724kB Node 0 DMA32: 1290*4kB 996*8kB 632*16kB 246*32kB 106*64kB 44*128kB 24*256kB 12*512kB 3*1024kB 1*2048kB 0*4096kB = 60936kB Node 0 Normal: 261*4kB 179*8kB 70*16kB 22*32kB 7*64kB 3*128kB 3*256kB 1*512kB 2*1024kB 0*2048kB 0*4096kB = 8460kB 13679 total pagecache pages 9835 pages in swap cache Swap cache stats: add 5819923, delete 5810088, find 5522866/6015792 Free swap = 0kB Total swap = 2097148kB 1048575 pages RAM 68326 pages reserved 37837 pages shared 952554 pages non-shared [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name [ 399] 0 399 2849 64 0 -17 -1000 udevd [ 945] 0 945 2280 88 1 0 0 dhclient [ 997] 0 997 62272 221 0 0 0 rsyslogd [ 1027] 0 1027 2707 115 0 0 0 irqbalance [ 1048] 32 1048 4744 134 0 0 0 rpcbind [ 1068] 29 1068 5837 165 0 0 0 rpc.statd [ 1199] 81 1199 24343 148 0 0 0 dbus-daemon [ 1244] 0 1244 1020 124 1 0 0 acpid [ 1254] 68 1254 9441 264 0 0 0 
hald [ 1255] 0 1255 5099 131 0 0 0 hald-runner [ 1287] 0 1287 5629 119 0 0 0 hald-addon-inpu [ 1298] 68 1298 4501 146 1 0 0 hald-addon-acpi [ 1318] 0 1318 96533 184 0 0 0 automount [ 1341] 0 1341 16554 113 1 -17 -1000 sshd [ 1387] 497 1387 10366 153 0 0 0 nrpe [ 1464] 0 1464 20214 221 0 0 0 master [ 1473] 89 1473 20278 206 0 0 0 qmgr [ 1489] 0 1489 28660 140 0 0 0 abrtd [ 1499] 0 1499 29214 162 0 0 0 crond [ 1512] 0 1512 5276 68 0 0 0 atd [ 1521] 0 1521 27083 107 0 0 0 rhsmcertd [ 1540] 0 1540 45806 2960 0 0 0 puppet [ 1561] 0 1561 1016 121 0 0 0 mingetty [ 1563] 0 1563 1016 121 0 0 0 mingetty [ 1565] 0 1565 1016 121 1 0 0 mingetty [ 1567] 0 1567 1016 121 0 0 0 mingetty [ 1569] 0 1569 1016 121 0 0 0 mingetty [ 1571] 0 1571 1016 121 1 0 0 mingetty [ 1580] 0 1580 3079 98 1 -17 -1000 udevd [ 1916] 0 1916 27267 247 1 0 0 sshd [ 2005] 0 2005 2848 46 0 -17 -1000 udevd [ 2025] 0 2025 26514 151 0 0 0 bash [ 2047] 0 2047 635456 7158 0 0 0 java [ 2075] 0 2075 7237 5699 0 0 -17 iscsiuio [ 2081] 0 2081 1232 114 0 0 0 iscsid [ 2082] 0 2082 1358 845 0 0 -17 iscsid [ 3706] 26 3706 54311 310 0 -17 -1000 postmaster [ 3708] 26 3708 44722 88 0 -17 -1000 postmaster [ 3710] 26 3710 54376 892 0 -17 -1000 postmaster [ 3711] 26 3711 54311 114 1 -17 -1000 postmaster [ 3712] 26 3712 54409 190 1 -17 -1000 postmaster [ 3713] 26 3713 44792 149 1 -17 -1000 postmaster [14382] 0 14382 26300 30 1 0 0 rpc.rquotad [14387] 0 14387 5417 80 1 0 0 rpc.mountd [14429] 0 14429 6291 73 1 0 0 rpc.idmapd [14472] 108 14472 55859 430 0 0 0 ovirt-fence-kdu [14475] 26 14475 55387 439 0 -17 -1000 postmaster [14560] 0 14560 50195 254 0 0 0 httpd [14566] 48 14566 50395 424 0 0 0 httpd [14567] 48 14567 50300 364 1 0 0 httpd [14568] 48 14568 50331 407 0 0 0 httpd [14569] 48 14569 50294 388 0 0 0 httpd [14570] 48 14570 50286 389 1 0 0 httpd [14571] 48 14571 50287 392 1 0 0 httpd [14572] 48 14572 50294 412 1 0 0 httpd [14573] 48 14573 50294 389 0 0 0 httpd [14603] 108 14603 67024 226 0 0 0 ovirt-websocket [16003] 108 
16003 47844 80 1 0 0 ovirt-engine.py [16014] 108 16014 727039 243195 1 0 0 java [16107] 26 16107 85429 29032 0 -17 -1000 postmaster [16228] 26 16228 84776 27820 0 -17 -1000 postmaster [16451] 26 16451 84613 25970 1 -17 -1000 postmaster [16890] 26 16890 84369 27955 1 -17 -1000 postmaster [17116] 26 17116 85549 28879 1 -17 -1000 postmaster [18546] 26 18546 85363 27447 1 -17 -1000 postmaster [18547] 26 18547 83189 27153 1 -17 -1000 postmaster [18548] 26 18548 81105 23263 0 -17 -1000 postmaster [18666] 48 18666 50311 398 0 0 0 httpd [27424] 48 27424 50307 403 0 0 0 httpd [ 668] 26 668 83509 24314 0 -17 -1000 postmaster [ 669] 26 669 81226 25677 0 -17 -1000 postmaster [ 670] 26 670 83970 26284 0 -17 -1000 postmaster [21828] 48 21828 50307 429 0 0 0 httpd [21829] 48 21829 50307 365 0 0 0 httpd [21830] 48 21830 50309 363 1 0 0 httpd [21834] 48 21834 50307 429 0 0 0 httpd [22088] 48 22088 50310 432 0 0 0 httpd [22089] 48 22089 50310 410 0 0 0 httpd [22090] 48 22090 50294 402 0 0 0 httpd [23268] 48 23268 50310 401 0 0 0 httpd [23269] 48 23269 50311 402 0 0 0 httpd [23270] 48 23270 50307 427 1 0 0 httpd [22611] 26 22611 81449 24325 1 -17 -1000 postmaster [16765] 0 16765 25236 12 1 0 0 tail [ 9388] 0 9388 25236 13 1 0 0 tail [15425] 0 15425 25236 13 0 0 0 tail [14722] 26 14722 80939 24366 1 -17 -1000 postmaster [14723] 26 14723 79952 23725 0 -17 -1000 postmaster [14724] 26 14724 80298 20724 1 -17 -1000 postmaster [14725] 26 14725 79149 23090 0 -17 -1000 postmaster [14726] 26 14726 82188 25619 1 -17 -1000 postmaster [20504] 0 20504 26526 1 0 0 0 sh [20531] 0 20531 90296 1 0 0 0 python [20598] 0 20598 469821 174785 1 0 0 python [21440] 0 21440 24993 62 0 0 0 sshd [ 3119] 38 3119 7683 32 0 0 0 ntpd [12810] 26 12810 73247 17433 0 -17 -1000 postmaster [12811] 26 12811 73708 18315 1 -17 -1000 postmaster [21721] 89 21721 20234 17 0 0 0 pickup [23941] 0 23941 25235 22 1 0 0 tail [24301] 0 24301 24994 245 1 0 0 sshd [24311] 0 24311 27084 84 0 0 0 bash [24388] 26 24388 56410 1940 0 -17 
-1000 postmaster [24389] 26 24389 57066 2694 1 -17 -1000 postmaster [24390] 26 24390 60104 5576 1 -17 -1000 postmaster [24391] 26 24391 56361 1936 0 -17 -1000 postmaster [24392] 26 24392 57351 3156 1 -17 -1000 postmaster [24393] 26 24393 56962 2642 0 -17 -1000 postmaster Out of memory: Kill process 16014 (java) score 245 or sacrifice child Killed process 16014, UID 108, (java) total-vm:2908156kB, anon-rss:971900kB, file-rss:880kB
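For reference, the resident memory of the killed process can be read straight off the "Killed process" line that the kernel logs. A minimal sketch (the sample line is copied from the log above):

```shell
# Extract the anonymous RSS (in kB) of the OOM-killed process from the
# "Killed process" line recorded in /var/log/messages.
line='Killed process 16014, UID 108, (java) total-vm:2908156kB, anon-rss:971900kB, file-rss:880kB'
echo "$line" | sed -n 's/.*anon-rss:\([0-9]*\)kB.*/\1/p'   # prints 971900
```

Here the engine JVM was holding roughly 950MB of anonymous RSS when it was killed, with swap already fully exhausted (Free swap = 0kB).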
What version was it tested with?
Version-Release number of selected component: el6.6 vt13.11
How reproducible is this?
It reproduced when running a bunch of tests - around 100 cases.
Please share a heap dump of the server from when the service was killed.
Liran, it seems the system did not take a heap dump when the service was killed:

[root@jenkins-vm-33 ~]# ls -l /var/log/ovirt-engine/dump/
total 0
Can you take a snapshot regularly so we'll have it before the crash?
Hi Liran, Are you talking about a snapshot of the VM that hosts the engine? If so, why do you need it?
No, I'm talking about taking snapshots of the java process. It can be done using the jmap tool. Gil/Eldad are doing it regularly.
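For reference, a heap snapshot of a running JVM can be taken with jmap from the JDK. A minimal wrapper sketch, assuming the dump directory from the engine's -XX:HeapDumpPath setting is writable (the helper name is hypothetical):

```shell
# dump_engine_heap (hypothetical helper) - take a binary heap snapshot of a
# running Java process with jmap; assumes a JDK is installed on the PATH.
dump_engine_heap() {
    pid="$1"
    out="/var/log/ovirt-engine/dump/heap-$(date +%Y%m%d-%H%M).hprof"
    # -dump:live triggers a full GC first, so only reachable objects are dumped
    jmap -dump:live,format=b,file="$out" "$pid"
}
```

Usage would be e.g. `dump_engine_heap 16014`, using the engine's java pid from the process list above; the resulting .hprof file can be opened in Eclipse MAT or jvisualvm.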
IMHO the BZ title should be changed - there is nothing wrong with the OOM killer killing a process so that the kernel survives. The BZ title should probably be: ovirt-engine is using a lot of memory...
Gil - I can't access the server. Can you provide the full URL?
Gil, it seems the file is in an incorrect format. Can you retake it?
Hi Gil, Only one of those dumps works (the 140MB one; the other is probably malformed). It reflects memory usage of 600MB. While there is a large object, 70MB in size, that reflects data being sent to the client (parsed to XML), it still doesn't explain the root cause. How much memory was allocated to the engine? How much memory does the machine have?
(In reply to Liran Zelkha from comment #19) > Hi Gil, > > Only one of those dumps work (the 140MB one. The other is probably > malformed). > It reflects memory usage of 600MB. > While there is a large object, 70MB in size, that reflects data being sent > to the client (parsed to XML) - it still doesn't explain the root cause. > How much memory was allocated to the engine? Seems to be the default 1GB ovirt 1594 24.9 23.0 2895816 902660 ? Sl 11:37 2:45 ovirt-engine -server -XX:+TieredCompilation -Xms1g -Xmx1g -XX:PermSize=256m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.awt.headless=true -Djsse.enableSNIExtension=false -Djava.security.krb5.conf=/etc/ovirt-engine/krb5.conf -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/ovirt-engine/dump -Djava.util.logging.manager=org.jboss.logmanager -Dlogging.configuration=file:///var/lib/ovirt-engine/jboss_runtime/config/ovirt-engine-logging.properties -Dorg.jboss.resolver.warning=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djboss.modules.write-indexes=false -Djboss.server.default.config=ovirt-engine -Djboss.home.dir=/usr/share/jbossas -Djboss.server.base.dir=/usr/share/ovirt-engine -Djboss.server.data.dir=/var/lib/ovirt-engine -Djboss.server.log.dir=/var/log/ovirt-engine -Djboss.server.config.dir=/var/lib/ovirt-engine/jboss_runtime/config -Djboss.server.temp.dir=/var/lib/ovirt-engine/jboss_runtime/tmp -Djboss.controller.temp.dir=/var/lib/ovirt-engine/jboss_runtime/tmp -jar /usr/share/jbossas/jboss-modules.jar -mp /var/lib/ovirt-engine/jboss_runtime/modules/00-ovirt-engine-modules:/var/lib/ovirt-engine/jboss_runtime/modules/01-ovirt-engine-extension-aaa-ldap-modules:/var/lib/ovirt-engine/jboss_runtime/modules/02-jbossas-modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -c ovirt-engine.xml >What's the amount of the machine memory? 
# free -m
             total       used       free     shared    buffers     cached
Mem:          3829       1853       1975         16         24        417
-/+ buffers/cache:        1411       2417
Swap:         2047          0       2047
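A back-of-the-envelope accounting (a sketch; the native overhead figure is a hypothetical estimate, not measured) shows why the default heap is tight on this machine once PostgreSQL, httpd and the other services are added on top:

```shell
# Rough engine JVM footprint, using the flags from the ps output above.
heap_mb=1024     # -Xmx1g
perm_mb=256      # -XX:MaxPermSize=256m
native_mb=300    # hypothetical allowance for thread stacks, code cache, NIO buffers
jvm_mb=$((heap_mb + perm_mb + native_mb))
echo "estimated engine JVM footprint: ${jvm_mb} MB of 3829 MB total RAM"
```

That leaves well under 2.5GB for the dozen postmaster workers (each holding ~100MB RSS in the log above), httpd, puppet and the test tooling, which matches the exhausted-swap state in the dmesg output.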
So probably not enough. I know that on Yuri's environment we're using at least 2GB. Can you increase max heap size?
(In reply to Liran Zelkha from comment #21) > So probably not enough. I know that on Yuri's environment we're using at > least 2GB. Can you increase max heap size? I can do it, but currently all customers are automatically configured with 1GB by default. We didn't see any problems when running this environment with 3.4, and this environment is not doing any scale testing (like Yuri does). I have good reason to suspect that this is a regression introduced in 3.5. Can you point out what this large object is and why it is growing so large?
It's not a single object. I see VM, snapshots, files, capabilities. But since it's 600MB and your heap is 1GB I'm not sure it's related. Can you take several snapshots (say one every 30 minutes) so I can see the overall memory behavior?
I looked at all the files from March-1st. All take ~220MB of RAM. I can't see any reason why this number would grow. Could it be related to a specific test?
Might be. Raz, did you see this issue again in your environments?
AFAIK we haven't seen it (Raz?), but we've increased the max heap size:
> ovirt-engine -server -XX:+TieredCompilation -Xms2g -Xmx4g -XX:PermSize=256m -XX:MaxPermSize=256m...
So we can either:
1) set the value back to the default and see if it fails
2) keep the max/min heap size we have now, and take snapshots every 30 minutes as suggested before
Can that work?
Yes, Carlos is right, we haven't seen this again after the heap size increase.
Let's do both - decrease the memory size, and have a cron job that takes a snapshot every 30 minutes. Can that work?
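A crontab entry along these lines would do it (a sketch only; the file path is hypothetical, jmap requires a JDK to be installed, and the pgrep pattern assumes it matches only the engine JVM's command line as shown in the ps output above):

```
# /etc/cron.d/engine-heap-snapshot (hypothetical file) - snapshot every 30 min.
# Note: % must be escaped as \% inside a crontab command field.
*/30 * * * * root jmap -dump:live,format=b,file=/var/log/ovirt-engine/dump/heap-$(date +\%Y\%m\%d-\%H\%M).hprof $(pgrep -f 'jboss-modules.jar.*ovirt-engine')
```

Old .hprof files should be pruned periodically as well, since each snapshot of a 1-2GB heap is large.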