Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1647384

Summary: [downstream clone - 4.2.8] During cold reboot treatment, RunVm did not run for some VMs
Product: Red Hat Enterprise Virtualization Manager
Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: ovirt-engine
Assignee: Ravi Nori <rnori>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: urgent
Priority: urgent
Version: 4.2.6
CC: ahadas, bcholler, bzlotnik, gveitmic, mgoldboi, michal.skrivanek, mkalinin, mperina, ratamir, Rhev-m-bugs, rhodain, rnori, sfroemer, sigbjorn, tnisan
Target Milestone: ovirt-4.2.8
Keywords: ZStream
Flags: lsvaty: testing_plan_complete-
Hardware: x86_64
OS: Linux
Fixed In Version: ovirt-engine-4.2.8.1
Doc Type: If docs needed, set a value
Clone Of: 1641430
Last Closed: 2019-01-22 12:44:51 UTC
oVirt Team: Infra
Bug Depends On: 1641430

Description RHV bug bot 2018-11-07 11:38:09 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1641430 +++
======================================================================

Description of problem:

Several VMs went through cold reboot treatment within a short time of each other, but they were never started again.

1. Reboot

2018-10-22 03:40:55,311+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-14) [] VM 'af567438-41b6-47a4-bf6a-e8099b08792f'(<removed>) moved from 'Up' --> 'RebootInProgress'

2. Destroy

2018-10-22 03:40:58,488+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-5) [] START, DestroyVDSCommand(HostName = <removed>, DestroyVmVDSCommandParameters:{hostId='b662414b-d7b7-4a93-8093-5a2c3086e7a1', vmId='af567438-41b6-47a4-bf6a-e8099b08792f', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 36ad9ff9

2018-10-22 03:40:58,496+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-5) [] VM 'af567438-41b6-47a4-bf6a-e8099b08792f'(<removed>) moved from 'RebootInProgress' --> 'Down'

3. Down:

2018-10-22 03:40:58,570+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-5) [] EVENT_ID: VM_DOWN(61), VM <removed> is down. Exit message: Down as a part of the reboot process

4. Cold Reboot:

2018-10-22 03:40:58,571+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-5) [] add VM 'af567438-41b6-47a4-bf6a-e8099b08792f'(<removed>) to cold reboot treatment

2018-10-22 03:40:58,579+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-5) [] EVENT_ID: COLD_REBOOT_VM_DOWN(9,611), VM <removed> is down as a part of cold reboot process

2018-10-22 03:40:58,579+02 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (ForkJoinPool-1-worker-5) [] VM is down as a part of cold reboot process. Attempting to restart. VM Name '<removed>', VM Id 'af567438-41b6-47a4-bf6a-e8099b08792f
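
(For illustration only: a minimal sketch of the hand-off the log line above implies, using hypothetical class and method names rather than actual ovirt-engine code. The point is that once the engine logs "Attempting to restart", a restart task must actually be issued and executed; in this bug the subsequent UpdateVm runs but RunVm never does.)

import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; ColdRebootHandler/restartVm are illustrative names, not ovirt-engine APIs.
public class ColdRebootHandler {
    private final ExecutorService engineThreadPool = Executors.newFixedThreadPool(4);

    public void onVmDownAsPartOfColdReboot(UUID vmId) {
        // Mirrors the log line above: the engine announces the restart attempt...
        System.out.println("VM is down as a part of cold reboot process. Attempting to restart. VM Id " + vmId);
        // ...and must then actually hand the VM off to a RunVm-style task. If this
        // hand-off is skipped, or the submitted task never executes, the VM stays
        // Down with no further log activity, which is the behaviour reported here.
        engineThreadPool.submit(() -> restartVm(vmId));
    }

    private void restartVm(UUID vmId) {
        System.out.println("RunVm for " + vmId);
    }
}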

5. UpdateVm

2018-10-22 03:40:58,619+02 INFO  [org.ovirt.engine.core.bll.ProcessDownVmCommand] (EE-ManagedThreadFactory-engine-Thread-389) [62524a71] Running command: ProcessDownVmCommand internal: true.

2018-10-22 03:40:58,868+02 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (EE-ManagedThreadFactory-engine-Thread-389) [651462f2] Running command: UpdateVmCommand internal: true. Entities affected :  ID: af567438-41b6-47a4-bf6a-e8099b08792f Type: VMAction group EDIT_VM_PROPERTIES with role type USER

2018-10-22 03:40:58,952+02 INFO  [org.ovirt.engine.core.bll.UpdateGraphicsDeviceCommand] (EE-ManagedThreadFactory-engine-Thread-389) [551097ef] Running command: UpdateGraphicsDeviceCommand internal: true. Entities affected :  ID: af567438-41b6-47a4-bf6a-e8099b08792f Type: VMAction group EDIT_VM_PROPERTIES with role type USER

2018-10-22 03:40:58,972+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-389) [551097ef] EVENT_ID: SYSTEM_UPDATE_VM(253), VM <removed> configuration was updated by system.

2018-10-22 03:40:58,977+02 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (EE-ManagedThreadFactory-engine-Thread-389) [551097ef] Lock freed to object 'EngineLock:{exclusiveLocks='[<removed>=VM_NAME]', sharedLocks='[af567438-41b6-47a4-bf6a-e8099b08792f=VM]'}'

6. Nothing happens; later, the user attempts to start the VM manually, and the attempt gets stuck:

2018-10-22 05:45:57,537+02 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-101) [a91813db-5ad2-4d6e-aa65-64b990422981] Running command: RunVmCommand internal: false. Entities affected :  ID: af567438-41b6-47a4-bf6a-e8099b08792f Type: VMAction group RUN_VM with role type USER


                job_id                | action_type |       description       | status  |            correlation_id            
--------------------------------------+-------------+-------------------------+---------+--------------------------------------
 291f6a6d-6030-4f99-858e-b4781ff75021 | RunVm       | Launching VM <removed> | STARTED | a91813db-5ad2-4d6e-aa65-64b990422981
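
(As an aside, the row above comes from the engine database. A small JDBC helper along these lines can list RunVm jobs stuck in STARTED; the table name "job", the connection URL, and the credentials are assumptions based on the output above, not taken from this bug.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Lists RunVm jobs still in STARTED state. Assumes the PostgreSQL JDBC driver is on
// the classpath and that the columns match the query output shown above.
public class StuckRunVmJobs {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/engine"; // placeholder connection details
        try (Connection conn = DriverManager.getConnection(url, "engine", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT job_id, action_type, description, status, correlation_id "
                   + "FROM job WHERE action_type = ? AND status = ?")) {
            ps.setString(1, "RunVm");
            ps.setString(2, "STARTED");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s | %s | %s%n",
                            rs.getString("job_id"),
                            rs.getString("description"),
                            rs.getString("correlation_id"));
                }
            }
        }
    }
}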


task-101 is doing nothing:

"default task-101" #467 prio=5 os_prio=0 tid=0x00000000090d9800 nid=0x208c waiting on condition [0x00007f5872d63000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006c7755630> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
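
(The stack above is simply an idle thread-pool worker parked in LinkedBlockingQueue.take(): the thread has already finished whatever it was running and is waiting for new work, not blocked inside RunVmCommand. A standalone snippet that reproduces the same stack shape, for comparison:)

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Reproduces the stack shape seen above: a pool worker that has finished its task
// parks in LinkedBlockingQueue.take() inside ThreadPoolExecutor.getTask().
public class IdleWorkerStack {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> { /* short-lived task; the worker then goes idle */ });
        Thread.sleep(500); // let the worker finish and park on the queue

        Thread.getAllStackTraces().forEach((thread, stack) -> {
            if (thread.getName().startsWith("pool-")) {
                System.out.println(thread.getName() + " state=" + thread.getState());
                for (StackTraceElement frame : stack) {
                    System.out.println("    at " + frame);
                }
            }
        });
        pool.shutdown();
    }
}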


Version-Release number of selected component (if applicable):
rhvm-4.2.6.4-0.1.el7ev.noarch

How reproducible:
Unknown at this point; this is an initial investigation.

(Originally by Germano Veit Michel)

Comment 15 RHV bug bot 2018-11-07 11:39:30 UTC
Created attachment 1496370 [details]
Engine logs from the 21st

(Originally by Roman Hodain)

Comment 23 Nikolai Sednev 2018-11-19 16:32:52 UTC
Did not reproduce on latest 4.2.8.
ovirt-hosted-engine-setup-2.2.33-0.0.master.20181105145335.git470d6c6.el7.noarch
ovirt-hosted-engine-ha-2.2.19-0.0.master.20181002122327.20181002122322.gitb449616.el7.noarch
ovirt-engine-appliance-4.2-20181118.1.el7.noarch

In order to reproduce, I created 6 different VMs from a VM pool and rebooted and powered them off one by one; while doing this, I changed the VM pool's settings and applied them, and afterwards all 6 VMs started just fine. I repeated the same steps for several iterations and did not hit the issue.

Moving to verified. Please reopen if you are still hitting the issue.

Comment 25 Raz Tamir 2018-11-28 15:30:26 UTC
QE verification bot: the bug was verified upstream

Comment 27 errata-xmlrpc 2019-01-22 12:44:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0121

Comment 28 Daniel Gur 2019-08-28 13:12:31 UTC
sync2jira

Comment 29 Daniel Gur 2019-08-28 13:16:43 UTC
sync2jira