Bug 1755632
| Summary: | Retry interval is not set in VMCheckTransformed | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Fabien Dupont <fdupont> | ||||
| Component: | V2V | Assignee: | Fabien Dupont <fdupont> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Shveta <sshveta> | ||||
| Severity: | urgent | Docs Contact: | Red Hat CloudForms Documentation <cloudforms-docs> | ||||
| Priority: | urgent | ||||||
| Version: | 5.10.8 | CC: | bthurber, dagur, dmetzger, gekis, istein, mfeifer, mshriver, rjones, simaishi, smallamp, tuado | ||||
| Target Milestone: | GA | Keywords: | ZStream | ||||
| Target Release: | 5.10.12 | Flags: | mfeifer:
mirror+
|
||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | 5.10.12.0 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-11-06 08:58:33 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | Bug | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | V2V | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1755030 | ||||||
| Attachments: |
|
||||||
|
Description
Fabien Dupont
2019-09-25 21:51:30 UTC
*** Bug 1755460 has been marked as a duplicate of this bug. *** This bug was reproduced by v2v migration of a RHEL-7.7 VM, with 1 TB 66% usage. (See also bug 1755460, that was marked as duplicate, on this current bug). Transport method: VDDK. The disk transfer failed after 94% of it was copied. CFME-5.11.0.24 RHV-4.3.5.4-0.1.el7 Logs: https://drive.google.com/drive/u/0/folders/1gX4wwdi79pBz356Q6xbr3pQMP6Lu4dC5 journal.log analysis by Richard W.M. Jones 2019-10-02 15:37:39 UTC: " This is the same as the cezdata bug, because in the journal we can see that CFME is logging in and sending a kill signal to virt-v2v (or perhaps to a related process): Sep 25 02:04:49 leopard03.qa.lab.tlv.redhat.com sshd[36961]: Accepted publickey for root from 10.35.161.44 port 44844 ssh2: RSA SHA256:g5Bu+bCcBH/JcIKTqU/bwBNOk ydRI04Stgrxk2CygOY Sep 25 02:04:50 leopard03.qa.lab.tlv.redhat.com systemd-logind[3281]: New sessio n 1977 of user root. Sep 25 02:04:50 leopard03.qa.lab.tlv.redhat.com systemd[1]: Started Session 1977 of user root. Sep 25 02:04:50 leopard03.qa.lab.tlv.redhat.com sshd[36961]: pam_unix(sshd:session): session opened for user root by (uid=0) Sep 25 02:04:50 leopard03.qa.lab.tlv.redhat.com sudo[36981]: root : TTY=unknown ; PWD=/root ; USER=root ; COMMAND=/bin/kill -s TERM 14115 and we can see from the wrapper log that virt-v2v is reported dead about a second later: 2019-09-25 02:04:51,424:INFO: virt-v2v terminated with return code 15 (virt-v2v-wrapper:1914) " evm.log analysis by Nenad Peric, Oct 03 2019: " you have a retry time of 25 -12 seconds => 13 seconds here's the error which finally kills it (evm.log): [----] I, [2019-09-24T19:04:28.727377 #6882:2ae53265c5c4] INFO -- : Q-task_id([r9_service_template_transformation_plan_task_28]) MIQ(MiqAeEngine.deliver) Requeuing {:object_type=>"ServiceTemplateTransformationPlanTask", :object_id=>28, :namespace=>"Transformation/StateMachines", :class_name=>"VMTransformation", :instance_name=>"Transformation", :automate_message=>"create", :attrs=>{"request"=>"transformation_plan", :service_action=>"Provision"}, :user_id=>1, :miq_group_id=>2, :tenant_id=>1, :miq_task_id=>nil, :state=>"State11", :ae_fsm_started=>nil, :ae_state_started=>"2019-09-24 19:45:55 UTC", :ae_state_retries=>1501, :ae_state_data=>"--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess\nae_state_progress: \nvm_shutdown_in_progress: true\n", :ae_state_previous=>"---\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transformation\":\n ae_state: State11\n ae_state_retries: 1501\n ae_state_started: 2019-09-24 19:45:55 UTC\n ae_state_max_retries: 0\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transform\":\n ae_state: State5\n ae_state_retries: 1501\n ae_state_started: 2019-09-24 19:46:00 UTC\n ae_state_max_retries: 1500\n"} for object [ServiceTemplateTransformationPlanTask.28] with state [State11] to Automate for delivery in [0] seconds The important part is ae_state: State11\n ae_state_retries: 1501\n ae_state_started: 2019-09-24 19:45:55 UTC\n ae_state_max_retries: 0\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transform\":\n ae_state: State5\n ae_state_retries: 1501\n ae_state_started: 2019-09-24 19:46:00 UTC\n ae_state_max_retries: 1500\n"} yes, evm.log the point is these retries were much further apart in the past. In your log I can see that: 1st: 2019-09-24T19:04:18.634285 2nd: 2019-09-24T19:04:25.176280 meaning you have around 7 seconds between retries.. Which is 10 times faster (almost) than presumed 60 seconds. It hit this part: ae_state_retries: 1501\n ae_state_started: 2019-09-24 19:46:00 UTC\n ae_state_max_retries: 1500\n" that is it hit the retry number 1501, and the max is 1500 It then initiates: virtv2v_graceful_kill and finally: [----] I, [2019-09-24T19:04:49.536282 #6882:2ae541ca1b78] INFO -- : MIQ(ServiceTemplateTransformationPlanTask#get_conversion_state) InfraConversionJob get_conversion_state to update_options: {:virtv2v_pid=>14115, :virtv2v_disks=>[{:path= >"[HP-iscsi-1] vm_1tb/vm_1tb.vmdk", :size=>1099511627776, :percent=>94.02, :weight=>100.0}]} [----] I, [2019-09-24T19:04:49.574309 #6882:2ae541ca1b78] INFO -- : MIQ(ServiceTemplateTransformationPlanTask#kill_virtv2v) Killing virt-v2v (PID: 14115) with TERM signal. so we see here that percent is 94.02 but the next command at that time is Killing virt-v2v (PID: 14115) with TERM signal. In essence it can be fixed manually in CFME UI here: see CFME_UI_fix.png " Created attachment 1622259 [details]
CFME_UI_fix.png
Hi Fabien , Just to confirm , To verify this BZ we need to check that retries are 1 min apart right ? (Without making any change to automate method ) Thanks, Shveta Exactly. EVm Logs show retries are minimum 1 min apart :
[----] I, [2019-10-08T14:10:39.860569 #16378:1040f50] INFO -- : Q-task_id([r1_service_template_transformation_plan_task_1]) MIQ(MiqQueue.put) Message id: [1534], id: [], Zone: [default], Role: [automate], Server: [], MiqTask id: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [100], State: [ready], Deliver On: [2019-10-08 18:11:39 UTC], Data: [], Args: [{:object_type=>"ServiceTemplateTransformationPlanTask", :object_id=>1, :namespace=>"Transformation/StateMachines", :class_name=>"VMTransformation", :instance_name=>"Transformation", :automate_message=>"create", :attrs=>{"request"=>"transformation_plan", :service_action=>"Provision"}, :user_id=>1, :miq_group_id=>2, :tenant_id=>1, :miq_task_id=>nil, :state=>"State11", :ae_fsm_started=>nil, :ae_state_started=>"2019-10-08 18:10:34 UTC", :ae_state_retries=>1, :ae_state_data=>"--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess\nae_state_progress: \nvm_shutdown_in_progress: true\n", :ae_state_previous=>"---\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transformation\":\n ae_state: State11\n ae_state_retries: 1\n ae_state_started: 2019-10-08 18:10:34 UTC\n ae_state_max_retries: 0\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transform\":\n ae_state: State5\n ae_state_retries: 1\n ae_state_started: 2019-10-08 18:10:37 UTC\n ae_state_max_retries: 1500\n"}]
[----] I, [2019-10-08T14:11:49.202259 #16386:1040f50] INFO -- : Q-task_id([r1_service_template_transformation_plan_task_1]) MIQ(MiqAeEngine.deliver) Requeuing {:object_type=>"ServiceTemplateTransformationPlanTask", :object_id=>1, :namespace=>"Transformation/StateMachines", :class_name=>"VMTransformation", :instance_name=>"Transformation", :automate_message=>"create", :attrs=>{"request"=>"transformation_plan", :service_action=>"Provision"}, :user_id=>1, :miq_group_id=>2, :tenant_id=>1, :miq_task_id=>nil, :state=>"State11", :ae_fsm_started=>nil, :ae_state_started=>"2019-10-08 18:10:34 UTC", :ae_state_retries=>2, :ae_state_data=>"--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess\nae_state_progress: \nvm_shutdown_in_progress: true\n", :ae_state_previous=>"---\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transformation\":\n ae_state: State11\n ae_state_retries: 2\n ae_state_started: 2019-10-08 18:10:34 UTC\n ae_state_max_retries: 0\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transform\":\n ae_state: State5\n ae_state_retries: 2\n ae_state_started: 2019-10-08 18:10:37 UTC\n ae_state_max_retries: 1500\n"} for object [ServiceTemplateTransformationPlanTask.1] with state [State11] to Automate for delivery in [60] seconds
[----] I, [2019-10-08T14:12:57.822583 #16378:1040f50] INFO -- : Q-task_id([r1_service_template_transformation_plan_task_1]) MIQ(MiqAeEngine.deliver) Requeuing {:object_type=>"ServiceTemplateTransformationPlanTask", :object_id=>1, :namespace=>"Transformation/StateMachines", :class_name=>"VMTransformation", :instance_name=>"Transformation", :automate_message=>"create", :attrs=>{"request"=>"transformation_plan", :service_action=>"Provision"}, :user_id=>1, :miq_group_id=>2, :tenant_id=>1, :miq_task_id=>nil, :state=>"State11", :ae_fsm_started=>nil, :ae_state_started=>"2019-10-08 18:10:34 UTC", :ae_state_retries=>3, :ae_state_data=>"--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess\nae_state_progress: \nvm_shutdown_in_progress: true\n", :ae_state_previous=>"---\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transformation\":\n ae_state: State11\n ae_state_retries: 3\n ae_state_started: 2019-10-08 18:10:34 UTC\n ae_state_max_retries: 0\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transform\":\n ae_state: State5\n ae_state_retries: 3\n ae_state_started: 2019-10-08 18:10:37 UTC\n ae_state_max_retries: 1500\n"} for object [ServiceTemplateTransformationPlanTask.1] with state [State11] to Automate for delivery in [60] seconds
[----] I, [2019-10-08T14:14:02.625858 #16386:1040f50] INFO -- : Q-task_id([r1_service_template_transformation_plan_task_1]) MIQ(MiqAeEngine.deliver) Requeuing {:object_type=>"ServiceTemplateTransformationPlanTask", :object_id=>1, :namespace=>"Transformation/StateMachines", :class_name=>"VMTransformation", :instance_name=>"Transformation", :automate_message=>"create", :attrs=>{"request"=>"transformation_plan", :service_action=>"Provision"}, :user_id=>1, :miq_group_id=>2, :tenant_id=>1, :miq_task_id=>nil, :state=>"State11", :ae_fsm_started=>nil, :ae_state_started=>"2019-10-08 18:10:34 UTC", :ae_state_retries=>4, :ae_state_data=>"--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess\nae_state_progress: \nvm_shutdown_in_progress: true\n", :ae_state_previous=>"---\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transformation\":\n ae_state: State11\n ae_state_retries: 4\n ae_state_started: 2019-10-08 18:10:34 UTC\n ae_state_max_retries: 0\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transform\":\n ae_state: State5\n ae_state_retries: 4\n ae_state_started: 2019-10-08 18:10:37 UTC\n ae_state_max_retries: 1500\n"} for object [ServiceTemplateTransformationPlanTask.1] with state [State11] to Automate for delivery in [60] seconds
Please check if they need to be exactly 1 min apart or this looks good ?
There can be some slight delay for the retry to be scheduled, so it's normal that it's not exactly 1 minute. What is important is that it says "[State11] to Automate for delivery in [60] seconds". Verified in 5.10.12.0.20191007204014_0d55c84 Hi Fabien ,
I am seeing 'Automate for delivery in [0] seconds'
in build 5.11.0.27.20191001204742_d9cf38b
This bug needs to be cloned for 5.11 ?
[----] I, [2019-10-23T13:55:43.169808 #13785:2b1af97445d4] INFO -- : Q-task_id([r11_service_template_transformation_plan_task_11]) MIQ(MiqAeEngine.deliver) Requeuing {:object_type=>"ServiceTemplateTransformationPlanTask", :object_id=>11, :namespace=>"Transformation/StateMachines", :class_name=>"VMTransformation", :instance_name=>"Transformation", :automate_message=>"create", :attrs=>{"request"=>"transformation_plan", :service_action=>"Provision"}, :user_id=>1, :miq_group_id=>2, :tenant_id=>1, :miq_task_id=>nil, :state=>"State11", :ae_fsm_started=>nil, :ae_state_started=>"2019-10-23 17:55:16 UTC", :ae_state_retries=>3, :ae_state_data=>"--- !ruby/hash:ActiveSupport::HashWithIndifferentAccess\nae_state_progress: \nvm_shutdown_in_progress: true\n", :ae_state_previous=>"---\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transformation\":\n ae_state: State11\n ae_state_retries: 3\n ae_state_started: 2019-10-23 17:55:16 UTC\n ae_state_max_retries: 0\n\"/ManageIQ/Transformation/StateMachines/VMTransformation/Transform\":\n ae_state: State5\n ae_state_retries: 3\n ae_state_started: 2019-10-23 17:55:21 UTC\n ae_state_max_retries: 1500\n"} for object [ServiceTemplateTransformationPlanTask.11] with state [State11] to Automate for delivery in [0] seconds
It's already fixed upstream, and the associated BZ are targeted for 5.11.1 because they remove Automate from the equation. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3268 *** Bug 1771043 has been marked as a duplicate of this bug. *** *** Bug 1755030 has been marked as a duplicate of this bug. *** |