Bug 1254564 - SmartState times out if snapshot creation takes too long
Status: CLOSED ERRATA
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: SmartState Analysis
Version: 5.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.5.0
Assigned To: Hui Song
QA Contact: Ramesh A
Keywords: ZStream
Duplicates: 1259024
Depends On:
Blocks: 1259809
Reported: 2015-08-18 08:01 EDT by Christian Jung
Modified: 2015-12-08 08:27 EST

See Also:
Fixed In Version: 5.5.0.1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1259809
Environment:
Last Closed: 2015-12-08 08:27:43 EST
Type: Bug


Attachments: None
Description Christian Jung 2015-08-18 08:01:28 EDT
Description of problem:
We are using OSP6 with CF 3.2 and Ceph as the backend storage. Creating a snapshot in this storage configuration takes about 20 minutes.

This triggers a timeout in CloudForms. Because snapshot creation fails, the SmartState Analysis does not complete either.


Version-Release number of selected component (if applicable):
cfme-5.4.1.0-1.el6cf.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create an instance in OSP and make sure that snapshot creation takes more than 5 minutes.
2. Run a SmartState Analysis task.
3. The task fails.

Actual results:
[----] I, [2015-08-17T15:48:36.779019 #37612:fc1ea8] INFO -- : MIQ(MiqQueue.get_via_drb) Message id: [49000001671049], MiqWorker id: [49000000001775], Zone: [default], Role: [smartstate], Server: [], Ident: [generic], Target id: [], Instance id: [49000000000023], Task id: [], Command: [Job.signal], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: [:abort, "job timed out after 346.399386357 seconds of inactivity. Inactivity threshold [3000 seconds]", "error"], Dequeued in: [5.638601999] seconds
[----] I, [2015-08-17T15:48:36.779148 #37612:fc1ea8] INFO -- : MIQ(MiqQueue.deliver) Message id: [49000001671049], Delivering...
[----] I, [2015-08-17T15:48:36.791850 #37612:fc1ea8] INFO -- : MIQ(Event.raise_evm_event): Event Raised [vm_scan_abort]
[----] I, [2015-08-17T15:48:36.860858 #37612:fc1ea8] INFO -- : MIQ(Event.raise_evm_event): Alert for Event [vm_scan_abort]
[----] I, [2015-08-17T15:48:36.861076 #37612:fc1ea8] INFO -- : MIQ(MiqAlert.evaluate_alerts) [vm_scan_abort] Target: VmOpenstack Name: [cfme167], Id: [49000000000374]
[----] E, [2015-08-17T15:48:36.867684 #37612:fc1ea8] ERROR -- : action-abort: job aborting, job timed out after 346.399386357 seconds of inactivity. Inactivity threshold [3000 seconds]
[----] I, [2015-08-17T15:48:36.878189 #37612:fc1ea8] INFO -- : action-finished: job finished, job timed out after 346.399386357 seconds of inactivity. Inactivity threshold [3000 seconds]
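
The abort lines above report roughly 346 seconds of inactivity against a stated threshold of 3000 seconds. A minimal sketch for flagging such lines when scanning evm.log follows. The function name `check_abort_message` is hypothetical, and the regular expression is derived only from the message text quoted in this report, so other message formats are not covered.

```python
import re

# Matches the timeout text as it appears in the log lines quoted in this report.
# \s+ tolerates the one-or-two-space variation seen between the two sentences.
ABORT_RE = re.compile(
    r"job timed out after (?P<elapsed>\d+(?:\.\d+)?) seconds of inactivity\.\s+"
    r"Inactivity threshold \[(?P<threshold>\d+)\s+seconds\]"
)


def check_abort_message(line):
    """Return (elapsed, threshold, premature) for a timeout line, else None.

    premature is True when the job was aborted before the threshold
    printed in the same message had actually elapsed.
    """
    match = ABORT_RE.search(line)
    if match is None:
        return None
    elapsed = float(match.group("elapsed"))
    threshold = float(match.group("threshold"))
    return elapsed, threshold, elapsed < threshold


sample = (
    "action-abort: job aborting, job timed out after 346.399386357 seconds "
    "of inactivity. Inactivity threshold [3000 seconds]"
)
print(check_abort_message(sample))
```

For the sample line the third element of the result is `True`, meaning the abort happened before the printed threshold was reached.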

Expected results:


Additional info:
Comment 5 Rich Oliveri 2015-09-03 10:55:55 EDT
Changed to use the correct timeout value for the fleecing job on the OpenStack provider.

https://github.com/ManageIQ/manageiq/pull/4126
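
As an illustration only (this is not the content of the pull request, and ManageIQ itself is written in Ruby), the idea of choosing an inactivity threshold per job type can be sketched as below. All names are hypothetical. The 3000-second value is the threshold printed in the log in this report; the 300-second default is an assumption based on the "more than 5 minutes" reproduction step.

```python
import time

# Hypothetical values: 3000 s is the threshold printed in the log in this
# report; 300 s is an assumed default taken from the "5 minutes" repro step.
DEFAULT_TIMEOUT = 300
TIMEOUTS = {"openstack_fleece": 3000}


def timeout_for(job_type):
    """Return the inactivity threshold (seconds) for a job type."""
    return TIMEOUTS.get(job_type, DEFAULT_TIMEOUT)


def is_timed_out(job_type, last_activity, now=None):
    """True when the job has been inactive longer than its own threshold."""
    now = time.time() if now is None else now
    return (now - last_activity) > timeout_for(job_type)


# 346 s of inactivity: over the assumed default, under the fleecing threshold.
print(is_timed_out("generic", 0, now=346.4))
print(is_timed_out("openstack_fleece", 0, now=346.4))
```

With these assumed values, a job that has been idle for 346 seconds is aborted under the default threshold but kept alive under the longer fleecing threshold.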
Comment 6 Christian Jung 2015-09-09 11:46:49 EDT
HiHo,
I applied the provided fix to my test appliance. Now I see a different error message:
Failed to create evm snapshot with EMS. Error: []: [execution expired] 

I'll upload the full logs.

Regards,
Christian
Comment 10 Rich Oliveri 2015-10-26 12:26:16 EDT
*** Bug 1259024 has been marked as a duplicate of this bug. ***
Comment 11 Ramesh A 2015-11-02 11:31:01 EST
Good to go.  Verified and working fine in 5.5.0.8-beta1.4.20151027164951_4ab7fea.

The fix was verified by reviewing the commit change. We cannot currently verify that it handles slow snapshots: QE is investigating ways to freeze the snapshot process to force the issue, but we are not there yet. Scanning works, and we do not view this as a blocker to release.

Procedure followed to reproduce the timeout scenario:
=====================================================
Disabled the smart proxy role and performed an SSA. Got a timeout exception after 3012.133582419 seconds of inactivity. Please find the log snippet below.


evm.log snippet:
================
[----] I, [2015-11-02T07:09:34.858221 #3065:41f988]  INFO -- : MIQ(MiqQueue.put) Message id: [4136],  id: [], Zone: [default], Role: [automate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [20], State: [ready], Deliver On: [], Data: [], Args: [{:object_type=>"ManageIQ::Providers::Openstack::CloudManager::Vm", :object_id=>121, :attrs=>{:event_type=>"vm_scan_abort", "VmOrTemplate::vm"=>121, :vm_id=>121, :host=>nil, "MiqEvent::miq_event"=>303, :miq_event_id=>303, "EventStream::event_stream"=>303, :event_stream_id=>303}, :instance_name=>"Event", :user_id=>1, :miq_group_id=>1, :tenant_id=>1, :automate_message=>nil}]
[----] E, [2015-11-02T07:09:34.858518 #3065:41f988] ERROR -- : MIQ(VmScan#process_abort) job aborting, job timed out after 3012.133582419 seconds of inactivity.  Inactivity threshold [3000 seconds]
[----] I, [2015-11-02T07:09:34.947919 #3065:41f988]  INFO -- : MIQ(VmScan#process_finished) job finished, job timed out after 3012.133582419 seconds of inactivity.  Inactivity threshold [3000 seconds]
[----] I, [2015-11-02T07:09:34.993475 #3065:41f988]  INFO -- : MIQ(VmScan#dispatch_finish) Dispatch Status is 'finished'
Comment 13 errata-xmlrpc 2015-12-08 08:27:43 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551
