Created attachment 902576 (images and logs)

Description of problem:
Running multiple "vm.delete" operations in parallel with multiple "template.delete" and "disk.remove" operations somehow causes a race condition that ends with the Blank template being removed.
In the middle of the procedure, while the async_task table still holds tasks waiting for their turn to be sent to vdsm, we restart the engine. When the UI comes back, the Blank template has been deleted (see the attached image).

vm_static does not report the existence of any template; the query returns no rows at all:

engine=# SELECT * FROM vm_static;
 vm_guid | vm_name | mem_size_mb | vmt_guid | os | description | vds_group_id | creation_date | num_of_monitors | is_initialized | is_auto_suspend | num_of_sockets | cpu_per_socket | usb_policy | time_zone | is_stateless | fail_back | _create_date | _update_date | dedicated_vm_for_vds | auto_startup | vm_type | nice_level | default_boot_sequence | default_display_type | priority | iso_path | origin | initrd_url | kernel_url | kernel_params | migration_support | userdefined_properties | predefined_properties | min_allocated_mem | entity_type | child_count | template_status | quota_id | allow_console_reconnect | cpu_pinning | is_smartcard_enabled | host_cpu_flags | db_generation | is_delete_protected | is_disabled | is_run_and_pause | created_by_user_id | tunnel_migration | free_text_comment | single_qxl_pci | cpu_shares | vnc_keyboard_layout | instance_type_id | image_type_id | sso_method | original_template_id | original_template_name | migration_downtime | template_version_number | template_version_name
(0 rows)

Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.21.el6ev.noarch
vdsm-4.14.7-3.el6ev.x86_64

How reproducible:
36.36% (4 out of 11)

Steps to Reproduce:
1. Create 8 VMs with disks on NFS and iSCSI storage.
2. Make templates from all of them.
3. Copy all templates to a block/file domain.
4. Create 8 VMs from the templates.
5. Remove all the VMs by selecting all of them.
6. Remove all disks by selecting all of them.
7. Remove all templates by selecting all of them (if selecting the Blank template greys out the Remove button, select all templates except the Blank template).
8. Wait a minute and restart the engine.

Actual results:
When the engine comes back, the Blank template is missing.

Expected results:
Removing the Blank template should not be possible.

Additional info:
Ori, can you attach the vdsm and engine logs from the relevant time?
Shahar, they are up to date as far as I know. Unfortunately, I don't have anything else to offer right now; I'll reproduce the issue and attach new logs soon.
After a discussion with Oved: there's a bigger infra issue to be addressed, possibly in 3.5. We also need a workaround in 3.4.z, as the consequences might be really bad.
(In reply to Michal Skrivanek from comment #4)
> after discussion with Oved there's a bigger infra issue to be addressed,
> possibly in 3.5; we also need a workaround in 3.4.z as the consequence might
> be really bad

Hi,

You're right. However, the z-stream fix shouldn't be the same as the 3.5.0 one. We've already suggested a z-stream fix to Shahar.
I think the right way here is to split this into two bugs:
One on infra, for 3.5.0, that will handle this.
The other on virt, for 3.4.z, that will implement the workaround.

Michal, thoughts on that?
As for the workaround: it's still an infra bug, but it may be easier or more suitable for someone from virt to fix it. That's up to Omer, based on his judgment of how complex the implementation is.
Hi Michal,

We've provided the infra for this one as part of handling Bug 1118249. It would be best if you go over it and implement the relevant logic for this bug. Changing to virt. Please consult Ravi if you have questions about the infra work done here.
These bugs are candidates for z-stream, but are not ready yet. GSS did not include them in the 3.4.2 tracker bug for critical bugs [1], so they are out of scope for the 3.4.2 build. Moving to 3.4.3.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1123858
This bug was moved to MODIFIED before the vt4 build date, so it is moving to ON_QA. If you believe this bug isn't in vt4, please report to rhev-integ.
Tested this 10 times and was unable to reproduce; it seems to be fixed. If you have better verification steps than those in the Description, please provide them and I'll re-verify.
RHEV-M 3.5.0 has been released