Bug 1256821 - CFME Snapshot Request to OpenStack fails for SSL error leaving Snapshot File orphaned in OpenStack environment
Status: CLOSED EOL
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: SmartState Analysis
Version: 5.4.0
Hardware: x86_64 Linux
Priority: high, Severity: high
Target Milestone: GA
Target Release: cfme-future
Assigned To: Tzu-Mainn Chen
QA Contact: Satyajit Bulage
Whiteboard: openstack:smartstate
Depends On:
Blocks: 1290164
Reported: 2015-08-25 10:09 EDT by Thomas Hennessy
Modified: 2017-07-17 14:09 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1290164
Environment:
Last Closed: 2017-07-17 14:09:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Thomas Hennessy 2015-08-25 10:09:29 EDT
Description of problem: Request for snapshot creation from CFME to OpenStack provider fails in CFME but creates a snapshot in OpenStack that is never deleted.


Version-Release number of selected component (if applicable): 5.4.1.0


How reproducible: Reported in customer environment. All VM scan attempts result in this error.


Steps to Reproduce:
1.
2.
3.

Actual results: The message from which the snapshot request is initiated times out after 10 minutes (600 seconds), long after fog.log has reported the error. The snapshot has been created in OpenStack.


Expected results: The request to create a snapshot should return in less than 10 minutes and provide information about the status of the request.


Additional info:
full extract of vm scan sequence follows:
=====
[----] I, [2015-08-25T11:22:36.036921 #32936:67beac]  INFO -- : MIQ(MiqQueue.get_via_drb) Message id: [40000092261174], MiqWorker id: [40000000323821], Zone: [default], Role: [smartproxy], Server: [e5e712f6-bb6e-11e4-8fcf-005056af2400], Ident: [generic], Target id: [], Instance id: [40000000378199], Task id: [c6a5bf60-4b0a-11e5-8700-005056af2400], Command: [Job.signal], Timeout: [600], Priority: [20], State: [dequeue], Deliver On: [], Data: [], Args: ["start"], Dequeued in: [5.529006752] seconds
[----] I, [2015-08-25T11:22:36.037250 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(MiqQueue.deliver)    Message id: [40000092261174], Delivering...
[----] I, [2015-08-25T11:22:36.048967 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) action-call_snapshot: Enter
[----] I, [2015-08-25T11:22:36.056187 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(Event.raise_evm_event): Event Raised [vm_scan_start]
[----] I, [2015-08-25T11:22:36.183036 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(Event.raise_evm_event): Alert for Event [vm_scan_start]
[----] I, [2015-08-25T11:22:36.183302 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(MiqAlert.evaluate_alerts) [vm_scan_start] Target: VmOpenstack Name: [ISLEGENDS01], Id: [40000000006790]
[----] I, [2015-08-25T11:22:36.198889 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(scan-action-call_snapshot_create) Creating snapshot, description: [Snapshot for scan job: c6a5bf60-4b0a-11e5-8700-005056af2400, EVM Server build: 20150717083323_6ed7e1c  Server Time: 2015-08-25T09:22:36Z]
[----] E, [2015-08-25T11:22:36.253397 #32936:67beac] ERROR -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) <Fog> excon.error     #<Excon::Errors::SocketError: SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: unknown protocol (OpenSSL::SSL::SSLError)>
[----] E, [2015-08-25T11:22:37.550945 #32936:67beac] ERROR -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) <Fog> excon.error     #<Excon::Errors::SocketError: SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: unknown protocol (OpenSSL::SSL::SSLError)>
[----] E, [2015-08-25T11:22:39.213883 #32936:67beac] ERROR -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) <Fog> excon.error     #<Excon::Errors::NotFound: Expected([200, 203]) <=> Actual(404 Not Found)
[----] E, [2015-08-25T11:22:42.576866 #32936:67beac] ERROR -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) <Fog> excon.error     #<Excon::Errors::SocketError: SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: unknown protocol (OpenSSL::SSL::SSLError)>
[----] E, [2015-08-25T11:32:36.044520 #32936:67beac] ERROR -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(scan-call_snapshot_create Failed to create evm snapshot with EMS. Error: []: [execution expired]
[----] I, [2015-08-25T11:32:36.057110 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(Event.raise_evm_event): Event Raised [vm_scan_abort]
[----] I, [2015-08-25T11:32:36.096615 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(Event.raise_evm_event): Alert for Event [vm_scan_abort]
[----] I, [2015-08-25T11:32:36.096914 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(MiqAlert.evaluate_alerts) [vm_scan_abort] Target: VmOpenstack Name: [ISLEGENDS01], Id: [40000000006790]
[----] E, [2015-08-25T11:32:36.109229 #32936:67beac] ERROR -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) action-abort: job aborting, Failed to create evm snapshot with EMS. Error: []: [execution expired]
[----] I, [2015-08-25T11:32:36.119572 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) action-finished: job finished, Failed to create evm snapshot with EMS. Error: []: [execution expired]
[----] I, [2015-08-25T11:32:36.124056 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) dispatch_finish: Dispatch Status is 'finished'
[----] I, [2015-08-25T11:32:36.128566 #32936:67beac]  INFO -- : Q-task_id([c6a5bf60-4b0a-11e5-8700-005056af2400]) MIQ(MiqQueue.delivered)  Message id: [40000092261174], State: [ok], Delivered in [600.091313667] seconds  
========

Notice that the Fog errors occur within the first few seconds, but the error is never returned to the requesting process, which eventually times out once the timeout value (600 seconds) has expired.
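
To make the gap concrete, the timestamps in the extract above can be subtracted directly. This is only a small arithmetic sketch using three timestamps copied from the log (the dequeue, the last Fog error, and the "execution expired" line); the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"

# Timestamps copied from the log extract above.
dequeued = datetime.strptime("2015-08-25T11:22:36.036921", FMT)
last_fog_error = datetime.strptime("2015-08-25T11:22:42.576866", FMT)
aborted = datetime.strptime("2015-08-25T11:32:36.044520", FMT)

# Seconds from dequeue until the last Fog error was logged.
error_delay = (last_fog_error - dequeued).total_seconds()
# Seconds the worker then sat idle until the timeout fired.
idle_wait = (aborted - last_fog_error).total_seconds()
# Seconds from dequeue to abort, close to the 600 second message timeout.
total = (aborted - dequeued).total_seconds()

print(f"last Fog error after {error_delay:.1f}s, "
      f"idle for {idle_wait:.1f}s, total {total:.1f}s")
```

By this calculation the last Fog error is logged roughly 6.5 seconds after dequeue, and the worker then waits about 593 more seconds before the job aborts.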

So several errors:
1- SSL error in creating snapshot
2- failure of the Fog component to return the error, or anything at all, to the requesting process, so the requesting process hangs inert for 10 minutes until it times out, adversely impacting other potentially successful activity.
3- failure to generate a fail-safe snapshot delete message to prevent the provider storage areas from being littered with snapshot files.
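
Points 2 and 3 could be addressed together along the following lines. This is a minimal sketch, not CFME or Fog code: `create_scan_snapshot`, `SnapshotRequestError`, and the `client` interface (`create_snapshot`, `list_snapshots`, `delete_snapshot`) are hypothetical names, and `FakeClient` is an in-memory stand-in that mimics a provider which creates the snapshot even though the call fails.

```python
class SnapshotRequestError(Exception):
    """Raised as soon as the provider call fails (hypothetical name)."""


def create_scan_snapshot(client, server_id, description):
    """Create a snapshot; on failure, attempt cleanup and raise at once.

    `client` is a hypothetical provider wrapper, not the Fog API.
    """
    try:
        return client.create_snapshot(server_id, description)
    except Exception as err:
        # Fail-safe: the provider may have created the snapshot even though
        # the call failed, so look for it by description and delete it.
        removed = []
        try:
            for snap in client.list_snapshots(server_id):
                if snap["description"] == description:
                    client.delete_snapshot(snap["id"])
                    removed.append(snap["id"])
        except Exception:
            # Cleanup is best effort; the original error is still reported.
            pass
        # Report the failure to the caller immediately instead of waiting
        # for the queue message timeout.
        raise SnapshotRequestError(
            f"snapshot request failed: {err}; cleaned up: {removed}"
        ) from err


class FakeClient:
    """In-memory stand-in: the call fails but the snapshot still exists."""

    def __init__(self):
        self.snapshots = {}

    def create_snapshot(self, server_id, description):
        self.snapshots["snap-1"] = {"id": "snap-1", "description": description}
        raise ConnectionError("SSL handshake failed")

    def list_snapshots(self, server_id):
        return list(self.snapshots.values())

    def delete_snapshot(self, snapshot_id):
        del self.snapshots[snapshot_id]


client = FakeClient()
try:
    create_scan_snapshot(client, "vm-1", "Snapshot for scan job: demo")
except SnapshotRequestError as err:
    print(err)
```

One limitation of this approach: if the provider creates the snapshot asynchronously, an immediate lookup could miss a snapshot that only appears later.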
Comment 2 Thomas Hennessy 2015-08-25 10:27:35 EDT
This case appears to be a duplicate of BZ 1222642; however, it was not recognized in that case that the snapshot was actually created. This creates a much larger exposure for the customer and, in my humble opinion (IMHO), should result in a higher severity based on the adverse impact on the customer environment.
Comment 3 Edu Alcaniz 2016-11-16 09:30:00 EST
Could you give an update on this bug?
Comment 4 Satoe Imaishi 2016-11-16 09:53:10 EST
No update I can provide from the build side, as this hasn't been fixed yet. Based on the flags set on this bug, it's not targeted for cfme 5.7.
Comment 5 Satoe Imaishi 2016-11-16 09:54:03 EST
Oops... didn't mean to clear all needinfo.
Comment 6 Rich Oliveri 2016-11-17 14:30:44 EST
From a CFME perspective, there doesn't seem to be much we can do to address this issue. The API call to create the snapshot fails, but the snapshot does indeed get created. Due to the failure of the request, CFME assumes the snapshot wasn't created. Given the asynchronous nature of snapshot creation, there's no way to determine that the snapshot does get created at some point in the future.

At its core, this appears to be an issue on the OpenStack side - when a snapshot creation request fails, the snapshot shouldn't be created.

For us to address this in CFME, we would have to periodically sweep all snapshots, and somehow determine if any of them were orphaned SSA snapshots, deleting any found. However, this would not address the core issue of the failure.
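
The sweep described above might be sketched as follows. This is an illustration under stated assumptions, not CFME code: it assumes the snapshot description is retrievable from the provider and still starts with the "Snapshot for scan job:" text seen in the log extract, that the set of currently running scan job IDs is known, and that each snapshot record carries a creation time. The function name, the dictionary layout, and the one-hour grace period are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Description prefix taken from the log extract in this report.
SSA_PREFIX = "Snapshot for scan job:"


def find_orphaned_ssa_snapshots(snapshots, active_job_ids, now,
                                min_age=timedelta(hours=1)):
    """Return SSA snapshots whose scan job is no longer active.

    Each snapshot is a dict with "id", "description" and "created_at"
    (a hypothetical layout chosen for this sketch).
    """
    orphans = []
    for snap in snapshots:
        desc = snap.get("description") or ""
        if not desc.startswith(SSA_PREFIX):
            continue  # not created by a scan job, never touch it
        # The job ID sits between the prefix and the first comma.
        job_id = desc[len(SSA_PREFIX):].split(",", 1)[0].strip()
        if job_id in active_job_ids:
            continue  # scan still running, snapshot is in use
        if now - snap["created_at"] < min_age:
            continue  # too recent, the job may just be starting
        orphans.append(snap)
    return orphans


now = datetime(2015, 8, 25, 12, 0, tzinfo=timezone.utc)
snapshots = [
    {"id": "snap-old",
     "description": "Snapshot for scan job: job-a, EVM Server build: x",
     "created_at": now - timedelta(hours=3)},
    {"id": "snap-active",
     "description": "Snapshot for scan job: job-b, EVM Server build: x",
     "created_at": now - timedelta(hours=3)},
    {"id": "snap-new",
     "description": "Snapshot for scan job: job-c, EVM Server build: x",
     "created_at": now - timedelta(minutes=5)},
    {"id": "snap-user",
     "description": "nightly backup",
     "created_at": now - timedelta(days=2)},
]
orphans = find_orphaned_ssa_snapshots(snapshots, {"job-b"}, now)
print([s["id"] for s in orphans])
```

Matching on the description prefix keeps user-created snapshots out of the sweep, and the grace period avoids deleting a snapshot for a job that has only just been queued. As noted above, this would clean up the leftovers without addressing the underlying failure.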
Comment 10 Tzu-Mainn Chen 2017-02-21 11:05:46 EST
Re-assigning for further investigation
Comment 11 Marianne Feifer 2017-06-06 17:52:30 EDT
As this BZ is on a CF version that is EOL and the SF case is closed, can this be closed?
