Description of problem: We are using OSP6 with CF 3.2 with Ceph as a backend storage. Creating a snapshot in this particular storage configuration takes about 20 minutes. This triggers a timeout in CloudForms. Since the snapshot creation fails, the SmartState does not complete, also. Version-Release number of selected component (if applicable): cfme-5.4.1.0-1.el6cf.x86_64 How reproducible: always Steps to Reproduce: 1. create an instance in OSP and make sure that snapshot creation takes more than 5 minutes 2. run a SmartState Analysis task 3. task will fail Actual results: [----] I, [2015-08-17T15:48:36.779019 #37612:fc1ea8] INFO -- : MIQ(MiqQueue.get_via_drb) Message id: [49000001671049], MiqWorker id: [49000000001775], Zone: [default], Role: [smartstate], Server: [], Ident: [generic], Target id: [], Insta nce id: [49000000000023], Task id: [], Command: [Job.signal], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: [:abort, "job timed out after 346.399386357 seconds of inactivity. Inactivity threshold [3000 seconds]", "error"], Dequeued in: [5.638601999] seconds [----] I, [2015-08-17T15:48:36.779148 #37612:fc1ea8] INFO -- : MIQ(MiqQueue.deliver) Message id: [49000001671049], Delivering... [----] I, [2015-08-17T15:48:36.791850 #37612:fc1ea8] INFO -- : MIQ(Event.raise_evm_event): Event Raised [vm_scan_abort] [----] I, [2015-08-17T15:48:36.860858 #37612:fc1ea8] INFO -- : MIQ(Event.raise_evm_event): Alert for Event [vm_scan_abort] [----] I, [2015-08-17T15:48:36.861076 #37612:fc1ea8] INFO -- : MIQ(MiqAlert.evaluate_alerts) [vm_scan_abort] Target: VmOpenstack Name: [cfme167], Id: [49000000000374] [----] E, [2015-08-17T15:48:36.867684 #37612:fc1ea8] ERROR -- : action-abort: job aborting, job timed out after 346.399386357 seconds of inactivity. Inactivity threshold [3000 seconds] [----] I, [2015-08-17T15:48:36.878189 #37612:fc1ea8] INFO -- : action-finished: job finished, job timed out after 346.399386357 seconds of inactivity. Inactivity threshold [3000 seconds] Expected results: Additional info:
changed to use correct timeout value for fleecing job on openstack provider https://github.com/ManageIQ/manageiq/pull/4126
HiHo, I applied the provided fix to my test appliance. Now I see a different error message: Failed to create evm snapshot with EMS. Error: []: [execution expired] I'll upload the full logs. Regards, Christian
*** Bug 1259024 has been marked as a duplicate of this bug. ***
Good to go. Verified and working fine in 5.5.0.8-beta1.4.20151027164951_4ab7fea. This was verified through the commit change but in terms of actually verifying that it handles slow snapshots, we cannot test that at the moment. QE is investigating ways to freeze the snapshot process to force the issue but we are not there yet, scanning works, and do not view this as a blocker to release. Procedure followed to timeout scenario: ======================================= Disabled the smart proxy role and performed a SSA. Got timeout exception after 3012.133582419 seconds of inactivity. Please find the log snippet below. evm.log snippet: ================ [----] I, [2015-11-02T07:09:34.858221 #3065:41f988] INFO -- : MIQ(MiqQueue.put) Message id: [4136], id: [], Zone: [default], Role: [automate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [20], State: [ready], Deliver On: [], Data: [], Args: [{:object_type=>"ManageIQ::Providers::Openstack::CloudManager::Vm", :object_id=>121, :attrs=>{:event_type=>"vm_scan_abort", "VmOrTemplate::vm"=>121, :vm_id=>121, :host=>nil, "MiqEvent::miq_event"=>303, :miq_event_id=>303, "EventStream::event_stream"=>303, :event_stream_id=>303}, :instance_name=>"Event", :user_id=>1, :miq_group_id=>1, :tenant_id=>1, :automate_message=>nil}] [----] E, [2015-11-02T07:09:34.858518 #3065:41f988] ERROR -- : MIQ(VmScan#process_abort) job aborting, job timed out after 3012.133582419 seconds of inactivity. Inactivity threshold [3000 seconds] [----] I, [2015-11-02T07:09:34.947919 #3065:41f988] INFO -- : MIQ(VmScan#process_finished) job finished, job timed out after 3012.133582419 seconds of inactivity. Inactivity threshold [3000 seconds] [----] I, [2015-11-02T07:09:34.993475 #3065:41f988] INFO -- : MIQ(VmScan#dispatch_finish) Dispatch Status is 'finished'
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:2551