Bug 1488967 - Need to verify that SSA works with Azure Managed Storage [NEEDINFO]
Summary: Need to verify that SSA works with Azure Managed Storage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: SmartState Analysis
Version: 5.8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: GA
Target Release: 5.8.2
Assignee: Jerry Keselman
QA Contact: Satyajit Bulage
URL:
Whiteboard:
Depends On: 1475540
Blocks: 1503797
Reported: 2017-09-06 14:14 UTC by Satoe Imaishi
Modified: 2017-10-24 00:41 UTC (History)

Fixed In Version: 5.8.2.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1475540
Environment:
Last Closed: 2017-10-24 00:41:44 UTC
Category: ---
Cloudforms Team: CFME Core
Target Upstream Version:
tachoi: needinfo? (djoo)




Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2017:3005 | normal | SHIPPED_LIVE | Important: Red Hat CloudForms security, bug fix, and enhancement update | 2017-10-24 04:15:49 UTC

Comment 2 Jerry Keselman 2017-09-06 15:48:03 UTC
https://github.com/ManageIQ/manageiq-gems-pending/pull/267 pulls the changes for manageiq-smartstate into manageiq-gems-pending for the Fine release.

Comment 3 CFME Bot 2017-09-06 18:01:17 UTC
New commit detected on ManageIQ/manageiq/fine:
https://github.com/ManageIQ/manageiq/commit/35a1dfa5ff2dfaa79568c9bff42ec6055d77988d

commit 35a1dfa5ff2dfaa79568c9bff42ec6055d77988d
Author:     Richard Oliveri <oliveri.richard.github@gmail.com>
AuthorDate: Mon Aug 28 16:35:43 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Wed Sep 6 13:54:49 2017 -0400

    Merge pull request #15865 from jerryk55/snapshot_azure_for_ssa
    
    Create Snapshot for Azure
    (cherry picked from commit ab36e54eadec6fdfdfdb31367c4d2a69df584b67)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 app/models/vm_scan.rb | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comment 4 CFME Bot 2017-09-06 18:04:09 UTC
New commit detected on ManageIQ/manageiq-providers-azure/fine:
https://github.com/ManageIQ/manageiq-providers-azure/commit/2dd69ef21edfc14356aca44bcbc6a254052e2d5a

commit 2dd69ef21edfc14356aca44bcbc6a254052e2d5a
Author:     Greg Blomquist <blomquisg@gmail.com>
AuthorDate: Fri Sep 1 10:15:38 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Wed Sep 6 13:57:57 2017 -0400

    Merge pull request #117 from jerryk55/managed_disk_snapshot_support
    
    Add Snapshot Code for Azure Managed Disks
    (cherry picked from commit 4492103ec7bfd198e184e144221cb2b398d023b6)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 .../manageiq/providers/azure/cloud_manager.rb      | 54 ++++++++++++++++++++++
 .../manageiq/providers/azure/cloud_manager/vm.rb   |  5 --
 .../azure/cloud_manager/vm_or_template_shared.rb   |  5 ++
 .../vm_or_template_shared/scanning.rb              | 20 ++++++--
 4 files changed, 76 insertions(+), 8 deletions(-)

Comment 5 CFME Bot 2017-09-06 18:13:55 UTC
New commit detected on ManageIQ/manageiq-gems-pending/fine:
https://github.com/ManageIQ/manageiq-gems-pending/commit/db8dadd367cd34e7d7acc1d95f5637e6c1edb224

commit db8dadd367cd34e7d7acc1d95f5637e6c1edb224
Author:     Bronagh Sorota <bsorota@redhat.com>
AuthorDate: Tue Sep 5 10:01:05 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Wed Sep 6 14:05:06 2017 -0400

    Merge pull request #120 from djberg96/gemspec
    
    Bump azure-armrest dependency to 0.8.2
    (cherry picked from commit 91afa868336b337fd2d73bd8de8e44c3b20c4d4f)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 manageiq-gems-pending.gemspec | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comment 6 CFME Bot 2017-09-06 19:08:12 UTC
New commit detected on ManageIQ/manageiq-providers-azure/fine:
https://github.com/ManageIQ/manageiq-providers-azure/commit/8154df65c75616bdeda9f13d4972edbeaacbcfaa

commit 8154df65c75616bdeda9f13d4972edbeaacbcfaa
Author:     Bronagh Sorota <bsorota@redhat.com>
AuthorDate: Tue Sep 5 10:01:05 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Wed Sep 6 15:01:28 2017 -0400

    Merge pull request #120 from djberg96/gemspec
    
    Bump azure-armrest dependency to 0.8.2
    (cherry picked from commit 91afa868336b337fd2d73bd8de8e44c3b20c4d4f)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 manageiq-providers-azure.gemspec | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comment 8 Jerry Keselman 2017-09-11 21:45:20 UTC
Unfortunately, the hot fix failed because a pre-existing PR, created back in April, was not back-ported to Fine.  We need https://github.com/ManageIQ/manageiq-providers-azure/pull/62 back-ported, and then a new hot fix that includes that PR will need to be created.

Comment 9 CFME Bot 2017-09-12 15:23:14 UTC
New commit detected on ManageIQ/manageiq-providers-azure/fine:
https://github.com/ManageIQ/manageiq-providers-azure/commit/b75fb9b42252ac693710b7314b7e0d99462943fb

commit b75fb9b42252ac693710b7314b7e0d99462943fb
Author:     Bronagh Sorota <bsorota@redhat.com>
AuthorDate: Fri Apr 28 16:02:04 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Tue Sep 12 11:20:59 2017 -0400

    Merge pull request #62 from djberg96/instance_location
    
    [REFACTOR] Set instance location to region location instead of using uid value
    (cherry picked from commit 04e380b10436073bbc291dc75124bd5e031e5f0d)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 app/models/manageiq/providers/azure/cloud_manager/refresh_parser.rb  | 2 +-
 spec/models/manageiq/providers/azure/cloud_manager/refresher_spec.rb | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

Comment 14 CFME Bot 2017-09-14 20:38:11 UTC
New commit detected on ManageIQ/manageiq-providers-azure/fine:
https://github.com/ManageIQ/manageiq-providers-azure/commit/26ddc4fb61cfdaa1fcb697c47b23da2216ed3b02

commit 26ddc4fb61cfdaa1fcb697c47b23da2216ed3b02
Author:     Bronagh Sorota <bsorota@redhat.com>
AuthorDate: Thu Sep 14 16:18:51 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Thu Sep 14 16:25:18 2017 -0400

    Merge pull request #125 from jerryk55/wait_for_snapshot_success
    
    Wait for SSA Snapshot Success
    (cherry picked from commit 9217377b32db9ad7251b30cba442ff8a0e0fb305)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967
    https://bugzilla.redhat.com/show_bug.cgi?id=1491310

 app/models/manageiq/providers/azure/cloud_manager.rb | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comment 16 CFME Bot 2017-09-15 21:18:20 UTC
New commit detected on ManageIQ/manageiq-providers-azure/fine:
https://github.com/ManageIQ/manageiq-providers-azure/commit/6160a41e26d3e5c9bb64c3bddc314aa895684d78

commit 6160a41e26d3e5c9bb64c3bddc314aa895684d78
Author:     Bronagh Sorota <bsorota@redhat.com>
AuthorDate: Fri Sep 15 17:05:25 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Fri Sep 15 17:11:55 2017 -0400

    Merge pull request #126 from roliveri/snapshots
    
    Changes to wait for snapshot completion.
    (cherry picked from commit 2ecc27f6a81c6c9477e7aed2fd2979cfa322195a)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1491310
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 app/models/manageiq/providers/azure/cloud_manager.rb | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comment 19 Jerry Keselman 2017-09-19 11:39:49 UTC
We are continuing to work on this issue.  It is not clear why the "needinfo" flag is set.
A resolution is available for SSA on unmanaged storage; it will require a Fine-only PR because https://github.com/ManageIQ/manageiq-providers-azure/pull/115 was not back-ported to Fine.  We are in contact with Microsoft to work through the Managed Storage SSA issues seen in the field, which do not occur on our systems.

Comment 20 Jerry Keselman 2017-09-20 19:57:09 UTC
Current status:
1) Two PRs for azure-armrest: https://github.com/ManageIQ/azure-armrest/pull/308, which is merged, and https://github.com/ManageIQ/azure-armrest/pull/309, which has been reworked by Dan Berger.  I am testing the reworked version right now and it is successful.

2) Once the above is tested and merged, Dan has to release the gem, and we have to update manageiq-providers-azure.gemspec to require the new gem version, put in a PR for that, and back-port it to Fine.

3) Two PRs need to be added that will essentially undo the code put in on the assumption that https://github.com/ManageIQ/manageiq-providers-azure/pull/115 had been back-ported to Fine (it wasn't).  PRs https://github.com/ManageIQ/manageiq-providers-azure/pull/122 and https://github.com/ManageIQ/manageiq-smartstate/pull/26, which were part of the last hot fix, made that incorrect assumption.  Code needs to be added to the provider repo and the gems-pending repo to fix this, for Fine only.  The change basically undoes a dereference of a ResourceGroup object to get its name, since Fine uses the String name value instead of the object.  We can't reuse the PR from 2) for the gem version, since that one needs to go upstream and then be back-ported to Fine.

4) We have encountered a timeout issue: the jobs run long enough that the SmartProxyWorker is killed while it is busy performing the scan.  There is code already in the MiqServer::ServerSmartProxy module that extends this timeout for OpenStack and SCVMM because of the length of those jobs.  I am going to add something similar for Azure and test it.  I've been testing by commenting out the code that kills the worker, but obviously that's not the best option.  So a new PR against the main ManageIQ repo is needed, and it has to be back-ported as well.
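
A minimal sketch of the String-versus-object mismatch described in item 3 above. This is not the code from the PRs mentioned there (the Fine fix simply uses the String directly); it only illustrates the difference by accepting either form. The names FakeResourceGroup and resource_group_name are hypothetical and do not exist in ManageIQ.

```ruby
# Hypothetical stand-in for the upstream ResourceGroup model.
# Only the #name reader matters for this illustration.
FakeResourceGroup = Struct.new(:name)

# Return the resource group name whether the caller holds a plain String
# (as on Fine) or an object that exposes #name (as upstream).
def resource_group_name(resource_group)
  case resource_group
  when nil    then nil
  when String then resource_group
  else             resource_group.name
  end
end

puts resource_group_name("my-rg")
puts resource_group_name(FakeResourceGroup.new("my-rg"))
```

Both calls print the same name, which is the behavior the Fine-only change has to preserve when the object dereference is removed.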

Currently the code on the lab appliance that David Joo gave me access to is in relatively good working condition.  SSA works on non-managed disks.  SSA works on most managed disks.  We are seeing an issue on some (but not all) Windows instances where the root drive letter is inaccessible; we do not believe this is related to any of the Azure issues we are looking at.

Comment 21 Jerry Keselman 2017-09-20 19:57:58 UTC
Oh, and as a final note: we are aiming to finish the above by end of business Friday (EST), fingers crossed.

Comment 23 Jerry Keselman 2017-09-22 03:37:32 UTC
https://github.com/ManageIQ/manageiq-smartstate/pull/29 has been added to increase the MiqDiskCache usage for AzureManagedDisk by doubling the number of Hash entries to 200 and quadrupling the size of the minimum read buffers to 512 blocks.  This has been shown to speed up SSA in tests on the slow appliance in Australia that exhibits these problems.
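
A rough back-of-the-envelope sketch of what those two numbers mean. The old values (100 entries, 128 blocks) follow from "doubling" and "quadrupling" above. Everything else is an assumption for illustration: that each cache miss costs one remote read of min_read_blocks contiguous blocks, that each cache entry holds one such buffer, and the 1,048,576-block scan size. The helper names are hypothetical and are not MiqDiskCache APIs.

```ruby
# Settings before and after the change (old values implied by the comment).
OLD_CACHE = { entries: 100, min_read_blocks: 128 }.freeze
NEW_CACHE = { entries: 200, min_read_blocks: 512 }.freeze

# Assumed model: a purely sequential scan triggers one remote read per
# min_read_blocks-sized chunk (ceiling division).
def remote_reads(total_blocks, min_read_blocks)
  (total_blocks + min_read_blocks - 1) / min_read_blocks
end

# Assumed model: each cache entry holds exactly one minimum-size read buffer.
def max_cached_blocks(settings)
  settings[:entries] * settings[:min_read_blocks]
end

total_blocks = 1_048_576 # illustrative scan size, not a measured value
[OLD_CACHE, NEW_CACHE].each do |s|
  puts "entries=#{s[:entries]} min_read=#{s[:min_read_blocks]} " \
       "remote_reads=#{remote_reads(total_blocks, s[:min_read_blocks])} " \
       "max_cached_blocks=#{max_cached_blocks(s)}"
end
```

Under these assumptions the sequential scan needs a quarter as many remote reads, and the cache can hold eight times as many blocks, which is also why worker memory comes up alongside this change.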

A new PR against the main ManageIQ repo will be added to address the two timeout issues.  We may also need to either modify the default config for the SmartProxyWorker or document the config changes needed when Azure Managed Disk SSA is performed.

Comment 24 Jerry Keselman 2017-09-22 13:07:10 UTC
https://github.com/ManageIQ/manageiq/pull/16016 has been added to address three different timeout values as well as the SmartProxyWorker default memory size.  This addresses the second paragraph of Comment 23 above.
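
A minimal sketch of the per-provider timeout pattern described in item 4 of Comment 20, where long-running scans for certain providers get a longer timeout. This is not the code in the PR above. The constant names, the provider keys, and both timeout values are hypothetical placeholders, not ManageIQ defaults.

```ruby
# Illustrative values only; these are not the product's settings.
DEFAULT_TIMEOUT_SECS  = 20 * 60
EXTENDED_TIMEOUT_SECS = 60 * 60

# Hypothetical keys for providers whose scans are known to run long.
LONG_RUNNING_PROVIDERS = %w[openstack scvmm azure].freeze

# Pick the scan timeout based on the provider type (String or Symbol).
def scan_timeout_for(provider)
  if LONG_RUNNING_PROVIDERS.include?(provider.to_s.downcase)
    EXTENDED_TIMEOUT_SECS
  else
    DEFAULT_TIMEOUT_SECS
  end
end

puts scan_timeout_for("Azure")
puts scan_timeout_for(:vmware)
```

Keeping the provider list in one place means adding Azure is a one-line change next to the existing OpenStack and SCVMM cases, rather than disabling the worker kill logic altogether.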

Comment 25 Jerry Keselman 2017-09-22 13:13:22 UTC
At this point we are simply awaiting PR reviews, merges, gem updates and distributions, etc.

Comment 26 Jerry Keselman 2017-09-22 15:02:52 UTC
The two PRs for 1) in Comment 20 above have been merged and pushed out as azure-armrest gem version 0.8.3.  
PR https://github.com/ManageIQ/manageiq-providers-azure/pull/131 has been added to allow the provider to use the new gem version.  This will need to be merged and back-ported as well.

Comment 27 CFME Bot 2017-09-22 19:43:19 UTC
New commit detected on ManageIQ/manageiq-gems-pending/fine:
https://github.com/ManageIQ/manageiq-gems-pending/commit/8d1985026be5c40cc4a8191a96b5010ea290f408

commit 8d1985026be5c40cc4a8191a96b5010ea290f408
Author:     Richard Oliveri <oliveri.richard.github@gmail.com>
AuthorDate: Fri Sep 22 11:42:44 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Fri Sep 22 15:42:31 2017 -0400

    Merge pull request #29 from jerryk55/increase_azure_managed_disk_caching
    
    Increase Disk Caching for Azure Managed Disks
    (cherry picked from commit 9e2ddac41dbb2c79d7c3541230dc71fd2b0271d6)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 lib/gems/pending/MiqVm/miq_azure_vm.rb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comment 28 CFME Bot 2017-09-22 19:51:18 UTC
New commit detected on ManageIQ/manageiq/fine:
https://github.com/ManageIQ/manageiq/commit/8770490a135c6ee0c93872873f3a941f8cfa1198

commit 8770490a135c6ee0c93872873f3a941f8cfa1198
Author:     Richard Oliveri <oliveri.richard.github@gmail.com>
AuthorDate: Fri Sep 22 12:09:43 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Fri Sep 22 15:49:10 2017 -0400

    Merge pull request #16016 from jerryk55/increase_msg_timeout_for_azure_ssa
    
    Increase Timeouts and Worker Memory for Azure SSA
    (cherry picked from commit a3c9213171c1283eebe243c6355a334c4f21e085)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 app/models/job.rb                           | 3 +++
 app/models/miq_server/server_smart_proxy.rb | 3 +++
 config/settings.yml                         | 4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

Comment 29 CFME Bot 2017-09-22 19:53:20 UTC
New commit detected on ManageIQ/manageiq-providers-azure/fine:
https://github.com/ManageIQ/manageiq-providers-azure/commit/14a84745745a29b52a6e6bba5ff62feb346d2842

commit 14a84745745a29b52a6e6bba5ff62feb346d2842
Author:     Daniel Berger <djberg96@gmail.com>
AuthorDate: Fri Sep 22 10:59:20 2017 -0600
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Fri Sep 22 15:51:15 2017 -0400

    Merge pull request #131 from jerryk55/use_azure-armrest_0.8.3
    
    Use azure-armrest gem version 0.8.3
    (cherry picked from commit 2cde3c9291bf1e3e24b13a9dadc9652b31b343e3)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 manageiq-providers-azure.gemspec | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comment 30 CFME Bot 2017-09-22 19:58:23 UTC
New commit detected on ManageIQ/manageiq-gems-pending/fine:
https://github.com/ManageIQ/manageiq-gems-pending/commit/ab51254ea46e039ddf68e15171f7bdb0d18a93f7

commit ab51254ea46e039ddf68e15171f7bdb0d18a93f7
Author:     Daniel Berger <djberg96@gmail.com>
AuthorDate: Fri Sep 22 10:59:20 2017 -0600
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Fri Sep 22 15:53:12 2017 -0400

    Merge pull request #131 from jerryk55/use_azure-armrest_0.8.3
    
    Use azure-armrest gem version 0.8.3
    (cherry picked from commit 2cde3c9291bf1e3e24b13a9dadc9652b31b343e3)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967

 manageiq-gems-pending.gemspec | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comment 34 Jerry Keselman 2017-09-26 11:27:00 UTC
Run SSA on an Azure instance with Managed Disk.  It should succeed.

Comment 37 Satyajit Bulage 2017-10-03 18:04:53 UTC
Performed SSA on an Azure Managed Disk instance; it gathered all required information without any errors.

Verified Version: 5.8.2.1.20170925150507_8770490

Comment 40 Jerry Keselman 2017-10-15 22:50:51 UTC
David,

A couple of things.  
1) It is our understanding that all Managed Disks created after June 9 of this year are encrypted, and that the encryption occurs under the covers: there should be nothing different about reading an encrypted disk and an unencrypted disk.  See the following Microsoft site for more info:
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/faq-for-disks#managed-disks-and-storage-service-encryption-sse
We have only tested against Managed Disks created for our testing after that date.


2) Certain specific instances may not end up being parsed correctly due to other issues that have nothing to do with whether they are Managed or not.  We have not yet been able to drill down to the specific cases, but we are seeing this issue with certain Windows builds.  Again, this has nothing to do with whether the disk is managed or not.
  
3) Can you test against other instances and/or provide more feedback as to whether
this occurred for only one specific instance or an entire class / set of instances?

4) Yes, logs would be good.

5) Taeho - we support SSA on Encrypted Managed Disks.  See above link for more info.

Comment 41 Jerry Keselman 2017-10-16 12:31:23 UTC
I'm pasting info from David Joo from an offline email that explains why some of the customer's instances are failing to run SSA successfully.

"Just had a meeting with the customer regarding the Encryption.

Now I believe, I understand why SSA didn't work at the customer site.

Apparently, customer's OS images are encrypted with 2 things;

1. Azure's Storage Account Level encryption <- which came in July this year

2. File level encryption using Azure's encryption extension;
for Windows -> Bitlocker, Linux DM-crypt, and keys are available in Azure key vault.
https://docs.microsoft.com/en-us/azure/security/azure-security-disk-encryption

Almost all images will be encrypted with both layers of encryption."

David is correct - we do not support file level encryption.

Comment 46 Jerry Keselman 2017-10-18 21:13:35 UTC
Failures of Metrics and Refresh have nothing to do with SmartState.
Since the SSA failure message is the same as the Metrics and Refresh errors, I will also add that the SSA failure has nothing to do with enabling support for SSA on Azure Managed Disk VMs.  If there are issues like these, a new BZ should be opened and assigned to the relevant team.

Comment 53 errata-xmlrpc 2017-10-24 00:41:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3005

