Bug 1105211

Summary: Executing multiple "template.delete" commands in parallel with "vm.delete" commands creates a race condition that causes the Blank template to be removed from the Data Center
Product: Red Hat Enterprise Virtualization Manager
Reporter: Ori Gofen <ogofen>
Component: ovirt-engine
Assignee: Shahar Havivi <shavivi>
Status: CLOSED CURRENTRELEASE
QA Contact: Lukas Svaty <lsvaty>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.4.0
CC: acanan, gklein, iheim, lpeer, michal.skrivanek, mlipchuk, ofrenkel, rbalakri, Rhev-m-bugs, shavivi, sherold, tnisan, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Fixed In Version: org.ovirt.engine-root-3.5.0-13
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1130887
Environment:
Last Closed: 2015-02-17 08:26:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1113256, 1118249
Bug Blocks: 1130887, 1156162
Attachments: images and logs (flags: none)

Description Ori Gofen 2014-06-05 15:26:11 UTC
Created attachment 902576
images and logs

Description of problem:

Running multiple "vm.delete" operations in parallel with multiple "template.delete" and "disk.remove" operations somehow causes a race condition that ends with the Blank template being removed.

In the middle of the procedure, while the async_task table still has tasks waiting for their turn to be sent to vdsm, we restart the engine.

When the UI comes back, the Blank template has been deleted (see the attached image).

vm_static does not report the existence of any template:

engine=# SELECT * FROM vm_static;
 vm_guid | vm_name | mem_size_mb | vmt_guid | os | description | vds_group_id
 | creation_date | num_of_monitors | is_initialized | is_auto_suspend
 | num_of_sockets | cpu_per_socket | usb_policy | time_zone | is_stateless
 | fail_back | _create_date | _update_date | dedicated_vm_for_vds | auto_startup
 | vm_type | nice_level | default_boot_sequence | default_display_type | priority
 | iso_path | origin | initrd_url | kernel_url | kernel_params | migration_support
 | userdefined_properties | predefined_properties | min_allocated_mem | entity_type
 | child_count | template_status | quota_id | allow_console_reconnect | cpu_pinning
 | is_smartcard_enabled | host_cpu_flags | db_generation | is_delete_protected
 | is_disabled | is_run_and_pause | created_by_user_id | tunnel_migration
 | free_text_comment | single_qxl_pci | cpu_shares | vnc_keyboard_layout
 | instance_type_id | image_type_id | sso_method | original_template_id
 | original_template_name | migration_downtime | template_version_number
 | template_version_name
(0 rows)
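
A narrower check than SELECT * may be easier to read when verifying this. The sketch below is only an illustration: it uses an in-memory SQLite table as a stand-in for the engine's PostgreSQL database, it keeps just the vm_guid, vm_name and entity_type columns from the output above, and it assumes that the Blank template is stored with an all-zero GUID and an entity_type of 'TEMPLATE'. Neither assumption is confirmed in this report.

```python
import sqlite3

# Assumption: the Blank template row uses an all-zero GUID.
BLANK_TEMPLATE_ID = "00000000-0000-0000-0000-000000000000"
# Assumption: template rows are marked with this entity_type value.
TEMPLATE_ENTITY_TYPE = "TEMPLATE"


def blank_template_present(conn):
    """Return True if vm_static still holds a row for the Blank template."""
    row = conn.execute(
        "SELECT COUNT(*) FROM vm_static WHERE vm_guid = ? AND entity_type = ?",
        (BLANK_TEMPLATE_ID, TEMPLATE_ENTITY_TYPE),
    ).fetchone()
    return row[0] > 0


def make_stand_in_db():
    """Create a reduced, hypothetical copy of vm_static in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vm_static ("
        "vm_guid TEXT PRIMARY KEY, vm_name TEXT, entity_type TEXT)"
    )
    return conn


if __name__ == "__main__":
    conn = make_stand_in_db()
    # An empty table corresponds to the (0 rows) state reported above.
    print("Blank template present:", blank_template_present(conn))
```

Against a real engine database the same SELECT would be run through psql or a PostgreSQL driver instead of sqlite3.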


Version-Release number of selected component (if applicable):

rhevm-3.4.0-0.21.el6ev.noarch
vdsm-4.14.7-3.el6ev.x86_64

How reproducible:
36.36% (4 out of 11)

Steps to Reproduce:
1. Create 8 VMs with disks on NFS and iSCSI.
2. Make templates from all of them.
3. Copy all templates to a block/file domain.
4. Create 8 VMs from the templates.
5. Remove all the VMs by selecting all of them.
6. Remove all disks by selecting all of them.
7. Remove all templates by selecting all of them (if selecting the Blank template greys out the Remove button, select all templates except the Blank template).
8. Wait a minute and restart the engine.

Actual results:

When the engine comes back, the Blank template is missing.

Expected results:

Removing the Blank template should not be possible.
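
The expected behavior can be sketched roughly as follows. This is a minimal illustration and not the engine's code: TemplateStore and delete_all_in_parallel are hypothetical names, the store is an in-memory set, and the all-zero GUID for the Blank template is an assumption. The point is only that the delete path rejects the Blank template itself, so no ordering of parallel requests can remove it.

```python
import threading
import uuid

# Assumption: the Blank template is identified by an all-zero GUID.
BLANK_TEMPLATE_ID = uuid.UUID(int=0)


class TemplateStore:
    """Hypothetical in-memory stand-in for the engine's template records."""

    def __init__(self, template_ids):
        self._templates = set(template_ids)
        self._lock = threading.Lock()

    def delete(self, template_id):
        """Delete a template, refusing unconditionally for the Blank template."""
        if template_id == BLANK_TEMPLATE_ID:
            raise PermissionError("the Blank template cannot be removed")
        with self._lock:
            self._templates.discard(template_id)

    def exists(self, template_id):
        with self._lock:
            return template_id in self._templates


def delete_all_in_parallel(store, template_ids):
    """Issue one delete per template from its own thread; return rejected IDs."""
    rejected = []
    rejected_lock = threading.Lock()

    def worker(template_id):
        try:
            store.delete(template_id)
        except PermissionError:
            with rejected_lock:
                rejected.append(template_id)

    threads = [
        threading.Thread(target=worker, args=(template_id,))
        for template_id in template_ids
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return rejected
```

With this kind of guard, selecting every template including Blank and removing them in parallel deletes the user templates and reports the Blank template as rejected.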

Additional info:

Comment 1 Shahar Havivi 2014-06-17 10:52:18 UTC
Ori,
Can you attach the vdsm and engine logs from the relevant time?

Comment 2 Ori Gofen 2014-06-17 15:27:44 UTC
Shahar,
They are updated as far as I know,
Unfortunately I don't have anything else to offer right now,
I'll reproduce and attach new logs soon.

Comment 4 Michal Skrivanek 2014-07-08 08:33:39 UTC
after discussion with Oved there's a bigger infra issue to be addressed, possibly in 3.5; we also need a workaround in 3.4.z as the consequence might be really bad

Comment 5 Oved Ourfali 2014-07-08 09:03:46 UTC
(In reply to Michal Skrivanek from comment #4)
> after discussion with Oved there's a bigger infra issue to be addressed,
> possibly in 3.5; we also need a workaround in 3.4.z as the consequence might
> be really bad

Hi

You're right.
However, the z-stream fix shouldn't be the same one as the 3.5.0 one.
We've already suggested a z-stream fix to Shahar.
I think that the right way here is to split to two bugs:
One on infra, for 3.5.0, that will handle this.
The other one on virt, for 3.4.Z, that will do the workaround.

Michal - thoughts on that?

Comment 6 Michal Skrivanek 2014-07-08 15:06:36 UTC
For the workaround: it's still an infra bug, but it may be easier or more suitable for someone from virt to fix it. That's up to Omer, using his best judgment based on the complexity of the implementation.

Comment 8 Oved Ourfali 2014-07-28 08:20:22 UTC
Hi Michal

We've provided the infra for this one, as part of handling Bug 1118249.
Would be best if you sync through it and do the relevant logic for this bug.
Changing to virt.
Please consult Ravi if you have questions regarding the infra work done here.

Comment 9 Eyal Edri 2014-08-04 11:09:12 UTC
These bugs are candidates for z-stream, but are not ready yet.
They were not included in the 3.4.2 bug tracker [1] for critical bugs by GSS,
and are out of scope for the 3.4.2 build.
Moving to 3.4.3.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1123858

Comment 11 Eyal Edri 2014-09-28 11:29:34 UTC
This bug was moved to MODIFIED before the vt4 build date, thus moving to ON_QA.
If you believe this bug isn't in vt4, please report to rhev-integ.

Comment 12 Lukas Svaty 2014-10-02 13:19:22 UTC
Tested this 10 times and was unable to reproduce, so it seems to be fixed. If you have better verification steps than those in the Description, please provide them and I'll re-verify.

Comment 13 Omer Frenkel 2015-02-17 08:26:24 UTC
RHEV-M 3.5.0 has been released