Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1641430

Summary: During cold reboot treatment, RunVm did not run for some VMs
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Ravi Nori <rnori>
Status: CLOSED ERRATA
QA Contact: Liran Rotenberg <lrotenbe>
Severity: urgent
Priority: urgent
Docs Contact:
Version: 4.2.6
CC: ahadas, bcholler, bjarolim, bzlotnik, gveitmic, mavital, mgoldboi, michal.skrivanek, mkalinin, mperina, Rhev-m-bugs, rhodain, rnori, sfroemer, sigbjorn, tnisan
Target Milestone: ovirt-4.3.0
Keywords: ZStream
Target Release: 4.3.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-4.3.0_alpha
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1647384 (view as bug list)
Environment:
Last Closed: 2019-05-08 12:38:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1647384
Attachments:
Description: Logs from the 21st from engine (Flags: none)

Description Germano Veit Michel 2018-10-22 05:16:56 UTC
Description of problem:

Several VMs went through cold reboot treatment within a short time of each other, but they were never started again.

1. Reboot

2018-10-22 03:40:55,311+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-14) [] VM 'af567438-41b6-47a4-bf6a-e8099b08792f'(<removed>) moved from 'Up' --> 'RebootInProgress'

2. Destroy

2018-10-22 03:40:58,488+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-5) [] START, DestroyVDSCommand(HostName = <removed>, DestroyVmVDSCommandParameters:{hostId='b662414b-d7b7-4a93-8093-5a2c3086e7a1', vmId='af567438-41b6-47a4-bf6a-e8099b08792f', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 36ad9ff9

2018-10-22 03:40:58,496+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-5) [] VM 'af567438-41b6-47a4-bf6a-e8099b08792f'(<removed>) moved from 'RebootInProgress' --> 'Down'

3. Down:

2018-10-22 03:40:58,570+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-5) [] EVENT_ID: VM_DOWN(61), VM <removed> is down. Exit message: Down as a part of the reboot process

4. Cold Reboot:

2018-10-22 03:40:58,571+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-5) [] add VM 'af567438-41b6-47a4-bf6a-e8099b08792f'(<removed>) to cold reboot treatment

2018-10-22 03:40:58,579+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-5) [] EVENT_ID: COLD_REBOOT_VM_DOWN(9,611), VM <removed> is down as a part of cold reboot process

2018-10-22 03:40:58,579+02 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (ForkJoinPool-1-worker-5) [] VM is down as a part of cold reboot process. Attempting to restart. VM Name '<removed>', VM Id 'af567438-41b6-47a4-bf6a-e8099b08792f

5. UpdateVm

2018-10-22 03:40:58,619+02 INFO  [org.ovirt.engine.core.bll.ProcessDownVmCommand] (EE-ManagedThreadFactory-engine-Thread-389) [62524a71] Running command: ProcessDownVmCommand internal: true.

2018-10-22 03:40:58,868+02 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (EE-ManagedThreadFactory-engine-Thread-389) [651462f2] Running command: UpdateVmCommand internal: true. Entities affected :  ID: af567438-41b6-47a4-bf6a-e8099b08792f Type: VMAction group EDIT_VM_PROPERTIES with role type USER

2018-10-22 03:40:58,952+02 INFO  [org.ovirt.engine.core.bll.UpdateGraphicsDeviceCommand] (EE-ManagedThreadFactory-engine-Thread-389) [551097ef] Running command: UpdateGraphicsDeviceCommand internal: true. Entities affected :  ID: af567438-41b6-47a4-bf6a-e8099b08792f Type: VMAction group EDIT_VM_PROPERTIES with role type USER

2018-10-22 03:40:58,972+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-389) [551097ef] EVENT_ID: SYSTEM_UPDATE_VM(253), VM <removed> configuration was updated by system.

2018-10-22 03:40:58,977+02 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (EE-ManagedThreadFactory-engine-Thread-389) [551097ef] Lock freed to object 'EngineLock:{exclusiveLocks='[<removed>=VM_NAME]', sharedLocks='[af567438-41b6-47a4-bf6a-e8099b08792f=VM]'}'

6. Nothing happens. Later, the user attempts to start the VM manually, and the command gets stuck:

2018-10-22 05:45:57,537+02 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-101) [a91813db-5ad2-4d6e-aa65-64b990422981] Running command: RunVmCommand internal: false. Entities affected :  ID: af567438-41b6-47a4-bf6a-e8099b08792f Type: VMAction group RUN_VM with role type USER


                job_id                | action_type |       description       | status  |            correlation_id            
--------------------------------------+-------------+-------------------------+---------+--------------------------------------
 291f6a6d-6030-4f99-858e-b4781ff75021 | RunVm       | Launching VM <removed> | STARTED | a91813db-5ad2-4d6e-aa65-64b990422981


task-101 is doing nothing:

"default task-101" #467 prio=5 os_prio=0 tid=0x00000000090d9800 nid=0x208c waiting on condition [0x00007f5872d63000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006c7755630> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
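Note that this stack trace is not a thread blocked inside RunVmCommand: the frames LinkedBlockingQueue.take -> ThreadPoolExecutor.getTask show an idle pool worker parked while waiting for its next task, meaning the RunVm work was never handed to it. The following minimal, self-contained sketch (not engine code) reproduces that exact WAITING-on-take state:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class IdleWorkerDemo {
    public static void main(String[] args) throws Exception {
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            try {
                // Parks until a task arrives -- the same frame seen in the dump:
                // java.util.concurrent.LinkedBlockingQueue.take(...)
                queue.take().run();
            } catch (InterruptedException ignored) {
                // expected on shutdown below
            }
        }, "simulated-idle-worker");
        worker.start();
        Thread.sleep(200); // give the worker time to park on take()
        // With an empty queue the worker sits in WAITING, exactly what the
        // thread dump above shows for "default task-101".
        System.out.println(worker.getState());
        worker.interrupt();
        worker.join();
    }
}
```

The takeaway is that a WAITING state in these frames is normal for a pooled thread with no work; the anomaly in this bug is that the job row stays STARTED while no thread is actually executing it.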

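The sequence in steps 1-6 can be condensed into a toy model of the intended flow (hypothetical names; the real ProcessDownVmCommand/RunVmCommand paths are far more involved): when a VM goes Down with the cold-reboot flag, the engine applies pending next-run configuration and must then submit a restart. If that final handoff is lost, the VM silently stays Down, which is the symptom reported here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical toy model of the cold-reboot flow from steps 1-6; class and
// method names are illustrative, not taken from the ovirt-engine source.
public class ColdRebootSketch {
    static final List<String> log = new CopyOnWriteArrayList<>();

    static void onVmDown(String vmId, boolean coldReboot, ExecutorService pool) {
        log.add("ProcessDownVm " + vmId); // step 5: apply next-run config (UpdateVm)
        if (coldReboot) {
            // The restart must actually be submitted; if this handoff is lost
            // (the subject of this bug), the VM simply stays Down.
            pool.submit(() -> log.add("RunVm " + vmId));
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        onVmDown("af567438", true, pool);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(log);
    }
}
```

In the healthy case both entries appear; in the buggy case the walkthrough shows everything up to and including the UpdateVm, but no RunVm.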

Version-Release number of selected component (if applicable):
rhvm-4.2.6.4-0.1.el7ev.noarch

How reproducible:
Unknown at this point; this is an initial investigation.

Comment 14 Roman Hodain 2018-10-22 11:42:50 UTC
Created attachment 1496370 [details]
Logs from the 21st from engine

Comment 23 Liran Rotenberg 2019-01-07 08:59:01 UTC
Verified on:
ovirt-engine-4.3.0-0.5.alpha1.el7.noarch
vdsm-4.30.4-1.el7ev.x86_64

Steps:
1. Run a number of VMs.
2. Change a VM property that takes effect on next run (ballooning, for example).
3. Reboot the VMs over SSH (cold reboot) at the same time.

# The idea of rebooting all VMs at the same time is to create a load peak on the engine.

Result:
After doing the above steps in a loop, the issue did not reproduce.

Comment 25 errata-xmlrpc 2019-05-08 12:38:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085