Description of problem:
Several VMs went through cold reboot treatment within a short time of each other, but they were never started again.

1. Reboot:
2018-10-22 03:40:55,311+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-14) [] VM 'af567438-41b6-47a4-bf6a-e8099b08792f'(<removed>) moved from 'Up' --> 'RebootInProgress'

2. Destroy:
2018-10-22 03:40:58,488+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-5) [] START, DestroyVDSCommand(HostName = <removed>, DestroyVmVDSCommandParameters:{hostId='b662414b-d7b7-4a93-8093-5a2c3086e7a1', vmId='af567438-41b6-47a4-bf6a-e8099b08792f', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 36ad9ff9
2018-10-22 03:40:58,496+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-5) [] VM 'af567438-41b6-47a4-bf6a-e8099b08792f'(<removed>) moved from 'RebootInProgress' --> 'Down'

3. Down:
2018-10-22 03:40:58,570+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-5) [] EVENT_ID: VM_DOWN(61), VM <removed> is down. Exit message: Down as a part of the reboot process

4. Cold reboot:
2018-10-22 03:40:58,571+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-5) [] add VM 'af567438-41b6-47a4-bf6a-e8099b08792f'(<removed>) to cold reboot treatment
2018-10-22 03:40:58,579+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-5) [] EVENT_ID: COLD_REBOOT_VM_DOWN(9,611), VM <removed> is down as a part of cold reboot process
2018-10-22 03:40:58,579+02 INFO [org.ovirt.engine.core.bll.VdsEventListener] (ForkJoinPool-1-worker-5) [] VM is down as a part of cold reboot process. Attempting to restart. VM Name '<removed>', VM Id 'af567438-41b6-47a4-bf6a-e8099b08792f

5. UpdateVm:
2018-10-22 03:40:58,619+02 INFO [org.ovirt.engine.core.bll.ProcessDownVmCommand] (EE-ManagedThreadFactory-engine-Thread-389) [62524a71] Running command: ProcessDownVmCommand internal: true.
2018-10-22 03:40:58,868+02 INFO [org.ovirt.engine.core.bll.UpdateVmCommand] (EE-ManagedThreadFactory-engine-Thread-389) [651462f2] Running command: UpdateVmCommand internal: true. Entities affected : ID: af567438-41b6-47a4-bf6a-e8099b08792f Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2018-10-22 03:40:58,952+02 INFO [org.ovirt.engine.core.bll.UpdateGraphicsDeviceCommand] (EE-ManagedThreadFactory-engine-Thread-389) [551097ef] Running command: UpdateGraphicsDeviceCommand internal: true. Entities affected : ID: af567438-41b6-47a4-bf6a-e8099b08792f Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2018-10-22 03:40:58,972+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-389) [551097ef] EVENT_ID: SYSTEM_UPDATE_VM(253), VM <removed> configuration was updated by system.
2018-10-22 03:40:58,977+02 INFO [org.ovirt.engine.core.bll.UpdateVmCommand] (EE-ManagedThreadFactory-engine-Thread-389) [551097ef] Lock freed to object 'EngineLock:{exclusiveLocks='[<removed>=VM_NAME]', sharedLocks='[af567438-41b6-47a4-bf6a-e8099b08792f=VM]'}'

6. Nothing happens. Later the user attempts to start the VM manually and it gets stuck:
2018-10-22 05:45:57,537+02 INFO [org.ovirt.engine.core.bll.RunVmCommand] (default task-101) [a91813db-5ad2-4d6e-aa65-64b990422981] Running command: RunVmCommand internal: false. Entities affected : ID: af567438-41b6-47a4-bf6a-e8099b08792f Type: VMAction group RUN_VM with role type USER

                job_id                | action_type |       description       | status  |            correlation_id
--------------------------------------+-------------+-------------------------+---------+--------------------------------------
 291f6a6d-6030-4f99-858e-b4781ff75021 | RunVm       | Launching VM <removed>  | STARTED | a91813db-5ad2-4d6e-aa65-64b990422981

task-101 is doing nothing:

"default task-101" #467 prio=5 os_prio=0 tid=0x00000000090d9800 nid=0x208c waiting on condition [0x00007f5872d63000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000006c7755630> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Version-Release number of selected component (if applicable):
rhvm-4.2.6.4-0.1.el7ev.noarch

How reproducible:
Unknown at this point; this is the initial investigation.
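For context, the "default task-101" stack above is the signature of an idle thread-pool worker parked in LinkedBlockingQueue.take() while waiting for new work, not of a thread blocked in the middle of RunVmCommand. A minimal standalone sketch (plain JDK code, unrelated to the engine's own pools) reproduces the same WAITING state:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IdleWorkerDemo {
    public static void main(String[] args) throws Exception {
        // One core worker, empty unbounded queue -- the worker has nothing to do.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.prestartCoreThread();   // start the worker without submitting a task
        Thread.sleep(200);           // give it time to park in LinkedBlockingQueue.take()

        // Find the pool's worker thread and print its state.
        Thread worker = Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith("pool-"))
                .findFirst().orElseThrow(IllegalStateException::new);
        System.out.println(worker.getState());  // WAITING, same as "default task-101"
        pool.shutdownNow();
    }
}
```

In other words, the stack trace alone shows the HTTP worker already returned to the pool; whatever RunVm was waiting on is not visible in that thread.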
Created attachment 1496370 [details]
Logs from the 21st, from the engine
Verified on:
ovirt-engine-4.3.0-0.5.alpha1.el7.noarch
vdsm-4.30.4-1.el7ev.x86_64

Steps:
1. Run a number of VMs.
2. Change a VM property that takes effect on the next run (ballooning, for example).
3. Reboot the VMs over SSH (cold reboot) at the same time.
# Rebooting at the same time is meant to create a load peak on the engine.

Result:
After running the above steps in a loop, the issue did not reproduce.
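The "reboot at the same time" idea in step 3 can be sketched with a latch that releases all reboot commands in one burst. This is only an illustrative harness: vmCount is an assumed value, and the actual ssh reboot call is indicated in a comment rather than executed:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class SimultaneousReboot {
    public static void main(String[] args) throws Exception {
        int vmCount = 10;                         // hypothetical number of test VMs
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(vmCount);
        AtomicInteger fired = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(vmCount);

        for (int i = 0; i < vmCount; i++) {
            pool.submit(() -> {
                try {
                    start.await();                // all threads block here...
                    // a real harness would run: ssh root@<vm-address> reboot
                    fired.incrementAndGet();
                } catch (InterruptedException ignored) {
                } finally {
                    done.countDown();
                }
            });
        }

        start.countDown();                        // ...and are released together
        done.await();
        System.out.println(fired.get());          // 10 near-simultaneous "reboots"
        pool.shutdown();
    }
}
```

The point of the latch is that all reboots hit the engine within milliseconds of each other, approximating the peak the verification steps describe.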
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085