Bug 1254564 - SmartState times out if snapshot creation takes too long
Summary: SmartState times out if snapshot creation takes too long
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: SmartState Analysis
Version: 5.4.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: GA
: 5.5.0
Assignee: Hui Song
QA Contact: Ramesh A
URL:
Whiteboard:
: 1259024 (view as bug list)
Depends On:
Blocks: 1259809
TreeView+ depends on / blocked
 
Reported: 2015-08-18 12:01 UTC by Christian Jung
Modified: 2015-12-08 13:27 UTC (History)
9 users (show)

Fixed In Version: 5.5.0.1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1259809 (view as bug list)
Environment:
Last Closed: 2015-12-08 13:27:43 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:2551 0 normal SHIPPED_LIVE Moderate: CFME 5.5.0 bug fixes and enhancement update 2015-12-08 17:58:09 UTC

Description Christian Jung 2015-08-18 12:01:28 UTC
Description of problem:
We are using OSP6 with CF 3.2 with Ceph as a backend storage. Creating a snapshot in this particular storage configuration takes about 20 minutes.

This triggers a timeout in CloudForms. Since the snapshot creation fails, the SmartState does not complete, also.


Version-Release number of selected component (if applicable):
cfme-5.4.1.0-1.el6cf.x86_64

How reproducible:
always

Steps to Reproduce:
1. create an instance in OSP and make sure that snapshot creation takes more than 5 minutes
2. run a SmartState Analysis task
3. task will fail

Actual results:
[----] I, [2015-08-17T15:48:36.779019 #37612:fc1ea8] INFO -- : MIQ(MiqQueue.get_via_drb) Message id: [49000001671049], MiqWorker id: [49000000001775], Zone: [default], Role: [smartstate], Server: [], Ident: [generic], Target id: [], Insta
nce id: [49000000000023], Task id: [], Command: [Job.signal], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: [:abort, "job timed out after 346.399386357 seconds of inactivity. Inactivity threshold [3000
seconds]", "error"], Dequeued in: [5.638601999] seconds
[----] I, [2015-08-17T15:48:36.779148 #37612:fc1ea8] INFO -- : MIQ(MiqQueue.deliver) Message id: [49000001671049], Delivering...
[----] I, [2015-08-17T15:48:36.791850 #37612:fc1ea8] INFO -- : MIQ(Event.raise_evm_event): Event Raised [vm_scan_abort]
[----] I, [2015-08-17T15:48:36.860858 #37612:fc1ea8] INFO -- : MIQ(Event.raise_evm_event): Alert for Event [vm_scan_abort]
[----] I, [2015-08-17T15:48:36.861076 #37612:fc1ea8] INFO -- : MIQ(MiqAlert.evaluate_alerts) [vm_scan_abort] Target: VmOpenstack Name: [cfme167], Id: [49000000000374]
[----] E, [2015-08-17T15:48:36.867684 #37612:fc1ea8] ERROR -- : action-abort: job aborting, job timed out after 346.399386357 seconds of inactivity. Inactivity threshold [3000 seconds]
[----] I, [2015-08-17T15:48:36.878189 #37612:fc1ea8] INFO -- : action-finished: job finished, job timed out after 346.399386357 seconds of inactivity. Inactivity threshold [3000 seconds]

Expected results:


Additional info:

Comment 5 Rich Oliveri 2015-09-03 14:55:55 UTC
changed to use correct timeout value for fleecing job on openstack provider

https://github.com/ManageIQ/manageiq/pull/4126

Comment 6 Christian Jung 2015-09-09 15:46:49 UTC
HiHo,
I applied the provided fix to my test appliance. Now I see a different error message:
Failed to create evm snapshot with EMS. Error: []: [execution expired] 

I'll upload the full logs.

Regards,
Christian

Comment 10 Rich Oliveri 2015-10-26 16:26:16 UTC
*** Bug 1259024 has been marked as a duplicate of this bug. ***

Comment 11 Ramesh A 2015-11-02 16:31:01 UTC
Good to go.  Verified and working fine in 5.5.0.8-beta1.4.20151027164951_4ab7fea.

This was verified through the commit change but in terms of actually verifying that it handles slow snapshots, we cannot test that at the moment.  QE is investigating ways to freeze the snapshot process to force the issue but we are not there yet, scanning works, and do not view this as a blocker to release. 

Procedure followed to timeout scenario:
=======================================
Disabled the smart proxy role and performed a SSA.  Got timeout exception after 3012.133582419 seconds of inactivity.  Please find the log snippet below.


evm.log snippet:
================
[----] I, [2015-11-02T07:09:34.858221 #3065:41f988]  INFO -- : MIQ(MiqQueue.put) Message id: [4136],  id: [], Zone: [default], Role: [automate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [20], State: [ready], Deliver On: [], Data: [], Args: [{:object_type=>"ManageIQ::Providers::Openstack::CloudManager::Vm", :object_id=>121, :attrs=>{:event_type=>"vm_scan_abort", "VmOrTemplate::vm"=>121, :vm_id=>121, :host=>nil, "MiqEvent::miq_event"=>303, :miq_event_id=>303, "EventStream::event_stream"=>303, :event_stream_id=>303}, :instance_name=>"Event", :user_id=>1, :miq_group_id=>1, :tenant_id=>1, :automate_message=>nil}]
[----] E, [2015-11-02T07:09:34.858518 #3065:41f988] ERROR -- : MIQ(VmScan#process_abort) job aborting, job timed out after 3012.133582419 seconds of inactivity.  Inactivity threshold [3000 seconds]
[----] I, [2015-11-02T07:09:34.947919 #3065:41f988]  INFO -- : MIQ(VmScan#process_finished) job finished, job timed out after 3012.133582419 seconds of inactivity.  Inactivity threshold [3000 seconds]
[----] I, [2015-11-02T07:09:34.993475 #3065:41f988]  INFO -- : MIQ(VmScan#dispatch_finish) Dispatch Status is 'finished'

Comment 13 errata-xmlrpc 2015-12-08 13:27:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551


Note You need to log in before you can comment on or make changes to this bug.