Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1139261

Summary: Add disk takes 3 hours when stoping and starting ovirt-engine
Product: [Retired] oVirt Reporter: Kevin Alon Goldblatt <kgoldbla>
Component: ovirt-engine-coreAssignee: Ravi Nori <rnori>
Status: CLOSED WORKSFORME QA Contact: Petr Kubica <pkubica>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.5CC: acanan, amureini, bazulay, bugs, ecohen, gklein, iheim, kgoldbla, mgoldboi, oourfali, rbalakri, rnori, yeylon
Target Milestone: ---   
Target Release: 3.5.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-06 13:06:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine vdsm and server logs none

Description Kevin Alon Goldblatt 2014-09-08 13:37:17 UTC
Created attachment 935346 [details]
engine vdsm and server logs

Description of problem:
Stoping a and then starting the ovirt-engine during and add disk to VM operation takes 3 hours to complete

Version-Release number of selected component (if applicable):
ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch
vdsm-4.16.2-1.gite8cba75.el6.x86_64

How reproducible:
Not always

Steps to Reproduce:
1. Create VM with 1 preallocated ISCI disk of 1gb on a block domain
2. Once the createVolume Task is initiated by vdsm I stop the enigine, wait 60 seconds then start the engine again.
3.Once the engine comes up I stop the engine again, wait 80 seconds and start the engine again. The asynchroneous Task in the vdsm is continuously reported as 'Running'. This continues for 3 hours and then the task is finally completed 

Performing a similar create VM and add disk without restarting the engine on the same storage domain takes about 40 seconds

Actual results:
The add disk operation takes 3 hours to return as completed

Expected results:
The operation should complete within a minute or 2 at most

Additional info:
FROM THE ENGINE LOG >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>.

[root@kevin-rhevm ~]# less  /var/log/ovirt-engine/engine.log-20140908.gz |grep 5519a22c
2014-09-07 18:02:52,194 INFO  [org.ovirt.engine.core.bll.AddDiskCommand] (ajp--127.0.0.1-8702-3) [5519a22c] Lock Acquired to object EngineLock [exclusiveLocks= key: ebf1ede1-12d6-4940-91d5-474a54d14100 value: VM_DISK_BOOT
2014-09-07 18:02:52,902 INFO  [org.ovirt.engine.core.bll.AddDiskCommand] (ajp--127.0.0.1-8702-3) [5519a22c] Running command: AddDiskCommand internal: false. Entities affected :  ID: ebf1ede1-12d6-4940-91d5-474a54d14100 Type: VMAction group CONFIGURE_VM_STORAGE with role type USER,  ID: b240d926-3617-4790-8d77-bca4e52776f4 Type: StorageAction group CREATE_DISK with role type USER
2014-09-07 18:02:53,665 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-3) [2d4d736] Correlation ID: 5519a22c, Job ID: a24a440f-3f44-4c14-978d-29dcddc3873a, Call Stack: null, Custom Event ID: -1, Message: Add-Disk operation of vm6_Disk1 was initiated on VM vm6 by admin.
2014-09-07 18:05:05,116 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (org.ovirt.thread.pool-8-thread-21) [5519a22c] Adding task 1c880df0-46f9-4077-bf1a-50837fb1bd5f (Parent Command AddDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..
2014-09-07 18:05:05,920 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-8-thread-21) [5519a22c] BaseAsyncTask::startPollingTask: Starting to poll task 1c880df0-46f9-4077-bf1a-50837fb1bd5f.
2014-09-07 18:05:05,931 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (org.ovirt.thread.pool-8-thread-21) [5519a22c] Discovered 1 tasks on Storage Pool dc_1, 1 added to manager.
2014-09-07 18:07:11,350 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (org.ovirt.thread.pool-8-thread-21) [5519a22c] Adding task 1c880df0-46f9-4077-bf1a-50837fb1bd5f (Parent Command AddDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..
2014-09-07 18:07:11,890 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-8-thread-21) [5519a22c] BaseAsyncTask::startPollingTask: Starting to poll task 1c880df0-46f9-4077-bf1a-50837fb1bd5f.
2014-09-07 18:07:11,891 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (org.ovirt.thread.pool-8-thread-21) [5519a22c] Discovered 1 tasks on Storage Pool dc_1, 1 added to manager.
2014-09-07 20:59:15,728 INFO  [org.ovirt.engine.core.bll.AddDiskCommand] (org.ovirt.thread.pool-8-thread-4) [5519a22c] Ending command successfully: org.ovirt.engine.core.bll.AddDiskCommand
2014-09-07 20:59:19,722 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-4) Correlation ID: 5519a22c, Call Stack: null, Custom Event ID: -1, Message: The disk vm6_Disk1 was successfully added to VM vm6.




FROM THE VDSM LOG >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
c880df0-46f9-4077-bf1a-50837fb1bd5f::DEBUG::2014-09-07 18:02:57,428::task::905::Storage.TaskManager.Task::(_runJobs) Task=`1c880df0-46f9-4077-bf1a-50837fb1bd5f`::Task.run: running job 0: createVolume: <bound method StoragePool.createVolume of <storage.sp.StoragePool object at 0x7f54f80e1850>> (args: ('b240d926-3617-4790-8d77-bca4e52776f4', '7500162a-0b1e-4931-b85d-4a3784d93903', 2097152, 5, 1, 2, 'a66fb59e-040c-4451-92bc-047cd4571358', '', '00000000-0000-0000-0000-000000000000', '00000000-0000-0000-0000-000000000000') kwargs: {})
1c880df0-46f9-4077-bf1a-50837fb1bd5f::DEBUG::2014-09-07 18:02:57,428::task::331::Storage.TaskManager.Task::(run) Task=`1c880df0-46f9-4077-bf1a-50837fb1bd5f`::Job.run: running createVolume: <bound method StoragePool.createVolume of <storage.sp.StoragePool object at 0x7f54f80e1850>> (args: ('b240d926-3617-4790-8d77-bca4e52776f4', '7500162a-0b1e-4931-b85d-4a3784d93903', 2097152, 5, 1, 2, 'a66fb59e-040c-4451-92bc-047cd4571358', '', '00000000-0000-0000-0000-000000000000', '00000000-0000-0000-0000-000000000000') kwargs: {}) callback None
Thread-135::DEBUG::2014-09-07 18:05:08,741::task::1191::Storage.TaskManager.Task::(prepare) Task=`fb5c40dc-5b02-4499-ae88-b7b47976d94b`::finished: {'allTasksInfo': {'1c880df0-46f9-4077-bf1a-50837fb1bd5f': {'verb': 'createVolume', 'id': '1c880df0-46f9-4077-bf1a-50837fb1bd5f'}}}
Thread-137::DEBUG::2014-09-07 18:07:15,069::task::1191::Storage.TaskManager.Task::(prepare) Task=`17ecb732-6e6b-4238-86fe-35ea48cc1d19`::finished: {'allTasksInfo': {'1c880df0-46f9-4077-bf1a-50837fb1bd5f': {'verb': 'createVolume', 'id': '1c880df0-46f9-4077-bf1a-50837fb1bd5f'}}}
Thread-147::DEBUG::2014-09-07 18:26:53,258::task::1191::Storage.TaskManager.Task::(prepare) Task=`29358927-d19a-4de5-bd35-54b9e1717953`::finished: {'tasks': {'1c880df0-46f9-4077-bf1a-50837fb1bd5f': {'code': 0, 'tag': u'spm', 'state': 'running', 'verb': 'createVolume', 'result': '', 'message': 'running job 1 of 1', 'id': '1c880df0-46f9-4077-bf1a-50837fb1bd5f'}}}
Thread-197::DEBUG::2014-09-07 20:59:09,066::task::1191::Storage.TaskManager.Task::(prepare) Task=`6662097f-3e98-430a-b646-4012c761790d`::finished: {'allTasksInfo': {'1c880df0-46f9-4077-bf1a-50837fb1bd5f': {'verb': 'createVolume', 'id': '1c880df0-46f9-4077-bf1a-50837fb1bd5f'}}}

Comment 1 Allon Mureinik 2014-09-08 17:29:33 UTC
If the task is running uninterrupted this is an infra issue of when/how the task is polled.
Oved - can your guys take lead on this?

Comment 2 Oved Ourfali 2014-09-09 06:12:06 UTC
Sure. Assigned it to Ravi.

Comment 3 Ravi Nori 2014-09-09 15:31:45 UTC
I was unable to reproduce this, works on my end. Please check if this is reproducible.

From the logs it looks like the host went into Non operation status at 2014-09-07 18:15:21 and came back up at 2014-09-07 20:59:05,775. So it works as expected. Please retest

Comment 4 Kevin Alon Goldblatt 2014-09-18 15:30:20 UTC
The problem here is:
Why did the host become non operational as a result of stopping and then restarting the engine for a short while during the Add Volume operation. Is this the expected behavior. If the engine goes down for 60 seconds, your host enters the non operational state for 3 hours, then comes up and resumes the operation?

Comment 5 Ravi Nori 2014-09-19 14:37:28 UTC
The host going into non-operational is unrelated to server restart.

The expected behavior I mentioned above is regarding the Add Volume Operation resuming after host comes back up.

Comment 6 Oved Ourfali 2014-09-22 13:18:45 UTC
Kevin - are you able to reproduce it? If not we'll have to close it as WORKSFORME.

Comment 7 Oved Ourfali 2014-10-06 13:06:07 UTC
Please reopen if the issue reproduces.

Comment 8 Aharon Canan 2014-10-27 11:41:45 UTC
Hi Ravi, 

Can you please answer comment #4?

Comment 9 Ravi Nori 2014-11-10 13:17:47 UTC
Hi Aharon, I already answered in Kevin's question in #4 in comment #5. I think the host going into non-operational state is unrelated to anything happening on the engine side. So it is not the expected behavior that the host goes into non-operation state