Bug 1415544 - Testing of Cloudforms 4.2.x appliance as guest on standard RHEL 7.3 w/KVM [NEEDINFO]
Summary: Testing of Cloudforms 4.2.x appliance as guest on standard RHEL 7.3 w/KVM
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: GA
: cfme-future
Assignee: Gregg Tanzillo
QA Contact: Dave Johnson
URL:
Whiteboard:
Depends On: 1414377 1420536 1420919 1421729
Blocks: 1305654
 
Reported: 2017-01-23 02:31 UTC by arkady kanevsky
Modified: 2017-06-08 14:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1420536 1421729 (view as bug list)
Environment:
Last Closed: 2017-06-08 14:04:27 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
arkady_kanevsky: needinfo? (dajohnso)



Description arkady kanevsky 2017-01-23 02:31:39 UTC
Description of problem:
The request is to support CF as KVM VMs on top of a SAH RHEL 7.3 node.
This is part of the Dell Red Hat joint OpenStack solution.
For the minimum 10-node POC, I expect that CF will be a single VM (of appropriate size) with a Ceph block device attached to it (that is what we are testing now).
For larger offerings, CF will be split into multiple VMs on the SAH: one VM for the PostgreSQL DB, and other VMs that are clients of it and collect data from the nodes under management (VMs in the overcloud and infrastructure overcloud nodes). Again, CF VM sizes and the number of VMs are subject to RH CF team recommendations.


Version-Release number of selected component (if applicable):
CF 4.3

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 arkady kanevsky 2017-01-23 02:38:32 UTC
Sean,
let you drive it.

Comment 3 arkady kanevsky 2017-01-23 20:48:30 UTC
Thomas.
Adding you for visibility.

As I stated, we need CF outside of OpenStack to manage it, to avoid the security issues that happen when you try to manage the OpenStack infrastructure from a user VM in OpenStack.

We will test it as part of JS but we need support for it just for our joint solution.

Comment 4 Dave Johnson 2017-01-26 23:58:05 UTC
During bug triage you mentioned you were going to take this one and determine severity/priority.

Comment 5 Sean Merrow 2017-02-03 21:05:13 UTC
Here is an update on where we are with regard to a support exception. This BZ can be used for any QE testing, etc.

Support Exception: https://tools.apps.cee.redhat.com/dashboard/#/support-exceptions/id/222
 
- The Cloudforms team has no intentions of supporting CF appliance on KVM as a long term offering.
- The Cloudforms PM team is on-board with accepting a request for support exception for the Jetstream 6.0.1 solution.
- However, the PM needs to get both the QE team and Support team on-board with the exception. Otherwise stated, the ball is rolling, but it still needs to weave its way through the process.
- PM insists that Red Hat QE must do initial testing for performance, scale and general smoke tests.
- Dell will own any further testing, including use-cases, etc.
- Colin Devine and I explained your timeline and that this is needed ASAP. They seemed okay with that, but said that any bugs found during testing could cause a delay.
- For the Jetstream 10.0 (formerly 7.0) release, the architecture will need to include a RHV environment to host the Cloudforms appliances and OSPd. The gap here is that currently, OSPd is only supported (according to our documentation) on RHEV 3.x. I am looking into whether or not it is supported on RHV 4.x. Assuming we can get support for RHV 4.x for OSPd, then it will serve as the common platform for both.
- The support exception will be for the Cloudforms 4.2 appliance. Cloudforms 4.5 is targeted for late April and if you decide to upgrade to that version, a separate support exception will be required.

Comment 6 michael_rasoulian 2017-02-03 21:54:07 UTC
We will need the support exception for CloudForms version 4.1.  That is the version we have documented our process with and which we are currently in the process of validating.

Comment 7 Manisha Tripathy 2017-02-08 18:00:11 UTC
We have come up with a test plan for CloudForms 4.1. It mainly includes the following test cases:

Attach Ceph volume on SAH node.
Launch CloudForms VM with 4.1 image and test network connectivity.
Set up CF DB with the Ceph volume.
Attach infrastructure provider (undercloud).
Attach cloud provider (overcloud).
Turn on Capacity & Utilization. Note DB size initially and daily after this point.
Launch a new VM from CF.
Launch a new VM in OpenStack Horizon/CLI and ensure it gets populated in CF.
Access a VM console from within CF.
From within CF, create a new cloud volume and attach to an instance.
Add VMs and monitor CF performance and DB growth.

Currently, testing is in progress and we are monitoring DB growth for 20 VMs.
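
The "note DB size initially and daily" step in the plan above could be tracked with a small helper like the one below. This is only a sketch: the sample sizes are hypothetical placeholders, not measurements from this test run, and how each daily size is collected (for example with PostgreSQL's pg_database_size() on the CF database) is an assumption, not something stated in this bug.

```python
from datetime import date


def daily_growth(samples):
    """Return (day, delta_bytes) pairs between consecutive size samples."""
    ordered = sorted(samples)
    return [
        (later_day, later_size - earlier_size)
        for (_, earlier_size), (later_day, later_size) in zip(ordered, ordered[1:])
    ]


# Hypothetical daily DB sizes in bytes (placeholders, not real measurements).
samples = [
    (date(2017, 2, 8), 1_000_000_000),
    (date(2017, 2, 9), 1_150_000_000),
    (date(2017, 2, 10), 1_320_000_000),
]

for day, delta in daily_growth(samples):
    print(day.isoformat(), f"+{delta / 1e6:.0f} MB")
```

Recording one sample per day alongside the VM count at that time would make it easier to relate DB growth to the number of managed VMs.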

Comment 8 arkady kanevsky 2017-02-08 18:05:45 UTC
Let's update the CF version to 4.2.

If time permits, let's have a script that creates more VMs through CF and OpenStack for load testing, and let it run for a few days.
Maybe do some small busy work in the launched VMs.

Dave,
what kind of tools do you have for CF testing, especially load testing?
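
As a starting point for the load script suggested in this comment, the sketch below only builds the OpenStack-side commands and prints them (a dry run); it does not launch anything. The image name, flavor name, network ID and VM name prefix are hypothetical placeholders, and creating VMs through CF itself is not covered here.

```python
def build_create_commands(prefix, count, image, flavor, net_id):
    """Build `openstack server create` argument lists without running them."""
    return [
        [
            "openstack", "server", "create",
            "--image", image,
            "--flavor", flavor,
            "--nic", f"net-id={net_id}",
            f"{prefix}-{index}",
        ]
        for index in range(1, count + 1)
    ]


# Placeholder values; replace with the image, flavor and network of the test cloud.
commands = build_create_commands("load_vm", 3, "cirros", "m1.tiny", "NETWORK_ID")
for argv in commands:
    print(" ".join(argv))
```

Each argument list could later be passed to subprocess.run() once the values have been checked against the target overcloud.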

Comment 9 Manisha Tripathy 2017-02-13 18:25:28 UTC
We tested CloudForms 4.1 according to the test plan in Comment 7 and monitored performance for 3 days. We launched 30 VMs (10 from CloudForms and the others from OpenStack Horizon) and generated reports.

Host CPU usage per vm Report 

Host Name	Activity Sample - Timestamp (Day/Time)	Asset Name	CPU - Usage Rate for Collected Intervals (%)	CPU - Usage Rate for Collected Intervals (MHz)	CPU - Total Available - from VM Analysis (MHz)
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/12/17 00:00:00 UTC	cirros_new-6	2.0%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/11/17 00:00:00 UTC	cirros_new-9	1.0%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/12/17 00:00:00 UTC	cirros_new-9	1.0%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/11/17 00:00:00 UTC	cirros_test_vm_cf	1.0%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/12/17 00:00:00 UTC	cirros_test_vm_cf	1.0%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/11/17 00:00:00 UTC	oss-dell-infra.manishaexample.com	0.2%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/12/17 00:00:00 UTC	oss-dell-infra.manishaexample.com	0.2%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/11/17 00:00:00 UTC	oss-dell-openshift-node-qq6620vz.manishaexample.com	2.4%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/12/17 00:00:00 UTC	oss-dell-openshift-node-qq6620vz.manishaexample.com	2.4%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/11/17 00:00:00 UTC	test_cf_vm-2	0.9%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/12/17 00:00:00 UTC	test_cf_vm-2	1.0%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/11/17 00:00:00 UTC	test_rhel_vms-1	0.8%	0 MHz	0 MHz
9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	02/12/17 00:00:00 UTC	test_rhel_vms-1	1.1%	0 MHz	0 MHz
b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	02/12/17 00:00:00 UTC	oss-dell-openshift-master-0.manishaexample.com	6.9%	0 MHz	0 MHz
b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	02/12/17 00:00:00 UTC	oss-dell-openshift-master-1.manishaexample.com	7.8%	0 MHz	0 MHz
b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	02/11/17 00:00:00 UTC	oss-dell-openshift-node-t58i64ih.manishaexample.com	2.3%	0 MHz	0 MHz
b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	02/11/17 00:00:00 UTC	test_cf_vm-3	1.0%	0 MHz	0 MHz
b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	02/12/17 00:00:00 UTC	test_cf_vm-3	1.6%	0 MHz	0 MHz
b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	02/11/17 00:00:00 UTC	test_cirros_1	2.1%	0 MHz	0 MHz
b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	02/11/17 00:00:00 UTC	test_cirros_1	2.1%	0 MHz	0 MHz
c103101b-ddde-428a-a229-3050425336cf (NovaCompute)	02/11/17 00:00:00 UTC	cirros_new-2	1.2%	0 MHz	0 MHz
c103101b-ddde-428a-a229-3050425336cf (NovaCompute)	02/12/17 00:00:00 UTC	cirros_new-2	1.3%	0 MHz	0 MHz
c103101b-ddde-428a-a229-3050425336cf (NovaCompute)	02/11/17 00:00:00 UTC	cirros_new-5	1.0%	0 MHz	0 MHz
c103101b-ddde-428a-a229-3050425336cf (NovaCompute)	02/12/17 00:00:00 UTC	cirros_new-5	1.0%	0 MHz	0 MHz
c103101b-ddde-428a-a229-3050425336cf (NovaCompute)	02/11/17 00:00:00 UTC	cirros_new-8	1.0%	0 MHz	0 MHz
c103101b-ddde-428a-a229-3050425336cf (NovaCompute)	02/12/17 00:00:00 UTC	cirros_new-8	1.0%	0 MHz	0 MHz
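
For comparing runs, the per-VM rows of a report like the one above can be averaged per compute host. The sketch below assumes the report is exported as tab-separated text with the same column order as shown here (host in the first column, CPU usage percentage in the fourth); the three sample rows are copied from the report.

```python
from collections import defaultdict


def mean_cpu_by_host(report_text):
    """Average the CPU usage percentage (4th column) per host (1st column)."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in report_text.strip().splitlines():
        fields = line.split("\t")
        host, usage = fields[0], fields[3]
        bucket = totals[host]
        bucket[0] += float(usage.rstrip("%"))
        bucket[1] += 1
    return {host: total / count for host, (total, count) in totals.items()}


HOST_A = "9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)"
HOST_B = "c103101b-ddde-428a-a229-3050425336cf (NovaCompute)"

# Three rows taken from the report above, joined with tabs.
sample = "\n".join(
    "\t".join(row)
    for row in [
        (HOST_A, "02/12/17 00:00:00 UTC", "cirros_new-6", "2.0%", "0 MHz", "0 MHz"),
        (HOST_A, "02/11/17 00:00:00 UTC", "cirros_new-9", "1.0%", "0 MHz", "0 MHz"),
        (HOST_B, "02/11/17 00:00:00 UTC", "cirros_new-2", "1.2%", "0 MHz", "0 MHz"),
    ]
)

for host, mean in mean_cpu_by_host(sample).items():
    print(f"{host}: {mean:.2f}%")
```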

Top Memory Consumers (weekly) Report

Asset Name	Cluster Name	Host Name	Memory - Used for Collected Intervals (MB) (Avg)
test_rhel_vms-2	overcloud-Compute-f76siwdsnrs6	c103101b-ddde-428a-a229-3050425336cf (NovaCompute)	3.8 GB
test_rhel_vms-1	overcloud-Compute-f76siwdsnrs6	9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	3.7 GB
RH72-MIKE	overcloud-Compute-f76siwdsnrs6	9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	3.4 GB
oss-dell-openshift-master-1.manishaexample.com	overcloud-Compute-f76siwdsnrs6	b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	3.1 GB
oss-dell-openshift-master-0.manishaexample.com	overcloud-Compute-f76siwdsnrs6	b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	3.1 GB
oss-dell-openshift-node-t58i64ih.manishaexample.com	overcloud-Compute-f76siwdsnrs6	b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)	2.8 GB
oss-dell-openshift-node-qq6620vz.manishaexample.com	overcloud-Compute-f76siwdsnrs6	9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	2.6 GB
oss-dell-openshift-node-b0j38jf4.manishaexample.com	overcloud-Compute-f76siwdsnrs6	c103101b-ddde-428a-a229-3050425336cf (NovaCompute)	2.3 GB
oss-dell-infra.manishaexample.com	overcloud-Compute-f76siwdsnrs6	9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)	1.9 GB

Comment 10 arkady kanevsky 2017-02-14 02:35:13 UTC
Changed the CF version in the title to CF-4.1.
CF-4.2 does not work with the default OSP9 configuration, which is what the DellEMC & Red Hat joint solution JS-6.x uses.

Comment 11 Manisha Tripathy 2017-02-14 19:47:40 UTC
evm.log for CF4.1 is listed below.

[----] I, [2017-02-14T03:35:29.399264 #14475:f8398c]  INFO -- : <AutomationEngine> MiqAeEvent.build_evm_event >> event=<"containergroup_failedsync"> inputs=<{:ext_management_systems=>#<ManageIQ::Providers::OpenshiftEnterprise::ContainerManager id: 1000000000006, name: "OssDellOpenshift", created_on: "2017-02-10 22:12:10", updated_on: "2017-02-14 08:35:20", guid: "f7de9aa6-efdd-11e6-b77a-5254000a7981", zone_id: 1000000000001, type: "ManageIQ::Providers::OpenshiftEnterprise::Containe...", api_version: nil, uid_ems: nil, host_default_vnc_port_start: nil, host_default_vnc_port_end: nil, provider_region: nil, last_refresh_error: nil, last_refresh_date: "2017-02-14 08:35:20", provider_id: nil, realm: nil, tenant_id: 1000000000001, project: nil, parent_ems_id: nil, subscription: nil>, :ems_event=>#<EmsEvent id: 1000000046946, event_type: "POD_FAILEDSYNC", message: "Error syncing pod, skipping: Disk \"8105e753-62d2-4...", timestamp: "2017-02-14 08:35:16", host_name: nil, host_id: nil, vm_name: nil, vm_location: nil, vm_or_template_id: nil, dest_host_name: nil, dest_host_id: nil, dest_vm_name: nil, dest_vm_location: nil, dest_vm_or_template_id: nil, source: "KUBERNETES", chain_id: nil, ems_id: 1000000000006, is_task: nil, full_data: {:timestamp=>"2017-02-14T08:35:16Z", :kind=>"Pod", :name=>"docker-registry-2-er3ju", :namespace=>"default", :reason=>"FailedSync", :message=>"Error syncing pod, skipping: Disk \"8105e753-62d2-4004-b6fb-4c793c451fe8\" is attached to a different compute: \"37e88826-2b9a-4b72-854c-8c647a70fdf9\", should be detached before proceeding", :uid=>"b044e167-ee60-11e6-a9b7-fa163e3df321", :container_group_name=>"docker-registry-2-er3ju", :container_namespace=>"default", :event_type=>"POD_FAILEDSYNC"}, created_on: "2017-02-14 08:35:23", username: nil, ems_cluster_id: nil, ems_cluster_name: nil, ems_cluster_uid: nil, dest_ems_cluster_id: nil, dest_ems_cluster_name: nil, dest_ems_cluster_uid: nil, availability_zone_id: nil, container_node_id: nil, container_node_name: nil, container_group_id: 1000000000001, container_group_name: "docker-registry-2-er3ju", container_namespace: "default", type: "EmsEvent", target_type: nil, target_id: nil, container_id: nil, container_name: nil, container_replicator_id: nil, container_replicator_name: nil, middleware_server_id: nil, middleware_server_name: nil, middleware_deployment_id: nil, middleware_deployment_name: nil>, "MiqEvent::miq_event"=>1000000046947, :miq_event_id=>1000000046947, "EventStream::event_stream"=>1000000046947, :event_stream_id=>1000000046947}>
[----] I, [2017-02-14T03:35:29.430791 #14475:f8398c]  INFO -- : MIQ(MiqQueue.put) Message id: [1000000257427],  id: [], Zone: [default], Role: [automate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [20], State: [ready], Deliver On: [], Data: [], Args: [{:object_type=>"ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup", :object_id=>1000000000001, :attrs=>{:event_type=>"containergroup_failedsync", "ExtManagementSystem::ext_management_system"=>1000000000006, :ext_management_system_id=>1000000000006, "EventStream::event_stream"=>1000000046947, :event_stream_id=>1000000046947, "MiqEvent::miq_event"=>1000000046947, :miq_event_id=>1000000046947}, :instance_name=>"Event", :user_id=>1000000000001, :miq_group_id=>1000000000001, :tenant_id=>1000000000001, :automate_message=>nil}]
[----] I, [2017-02-14T03:35:29.431264 #14475:f8398c]  INFO -- : <AutomationEngine> Followed  Relationship [miqaedb:/System/event_handlers/event_action_policy?target=container_group&policy_event=containergroup_failedsync&param=#create]
[----] I, [2017-02-14T03:35:29.431853 #14475:f8398c]  INFO -- : <AutomationEngine> Followed  Relationship [miqaedb:/System/Event/EmsEvent/KUBERNETES/POD_FAILEDSYNC#create]
[----] I, [2017-02-14T03:35:29.432868 #14475:f8398c]  INFO -- : MIQ(MiqQueue#delivered) Message id: [1000000257425], State: [ok], Delivered in [0.438147292] seconds
[----] I, [2017-02-14T03:35:29.622084 #6340:f8398c]  INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [1000000257423], MiqWorker id: [1000000000192], Zone: [default], Role: [], Server: [06b01a7e-eede-11e6-9c5f-5254000a7981], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Session.check_session_timeout], Timeout: [600], Priority: [90], State: [dequeue], Deliver On: [], Data: [], Args: [], Dequeued in: [6.59117433] seconds
[----] I, [2017-02-14T03:35:29.627029 #6340:f8398c]  INFO -- : MIQ(MiqQueue#deliver) Message id: [1000000257423], Delivering...
[----] I, [2017-02-14T03:35:29.641783 #6340:f8398c]  INFO -- : MIQ(MiqQueue#delivered) Message id: [1000000257423], State: [ok], Delivered in [0.019238571] seconds
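
To follow queue latency over a multi-day run, the "Delivered in" times can be pulled out of evm.log entries such as the ones above. This is a rough sketch: the regular expression is written against the MiqQueue#delivered lines shown in this comment only, and it assumes each such entry sits on a single line.

```python
import re

# Matches lines like:
#   MIQ(MiqQueue#delivered) Message id: [123], State: [ok], Delivered in [0.4] seconds
DELIVERED = re.compile(
    r"MIQ\(MiqQueue#delivered\) Message id: \[(\d+)\], "
    r"State: \[(\w+)\], Delivered in \[([\d.]+)\] seconds"
)


def delivery_times(log_lines):
    """Return (message_id, state, seconds) for each MiqQueue#delivered line."""
    results = []
    for line in log_lines:
        match = DELIVERED.search(line)
        if match:
            message_id, state, seconds = match.groups()
            results.append((int(message_id), state, float(seconds)))
    return results


# Two lines copied from the log excerpt above; only the first should match.
sample = [
    "[----] I, [2017-02-14T03:35:29.432868 #14475:f8398c]  INFO -- : "
    "MIQ(MiqQueue#delivered) Message id: [1000000257425], State: [ok], "
    "Delivered in [0.438147292] seconds",
    "[----] I, [2017-02-14T03:35:29.627029 #6340:f8398c]  INFO -- : "
    "MIQ(MiqQueue#deliver) Message id: [1000000257423], Delivering...",
]

for message_id, state, seconds in delivery_times(sample):
    print(message_id, state, f"{seconds:.3f}s")
```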
                                                                            

Test results of chargeback in CloudForms 4.1 are included below.


Date Range	VM Name	CPU Total Cost	Storage Allocated Cost	Storage Allocated	Memory Allocated over Time Period	CPU Total

02/13/2017	cirros_test_vm_cf	$24.00	$24.00	40 GB	4 GB	2 MHz
02/13/2017	cirros_test_vms-1	$24.00	$24.00	40 GB	4 GB	2 MHz
02/13/2017	cirros_test_vms-2	$24.00	$24.00	40 GB	4 GB	2 MHz
02/13/2017	ospdell-infra.manishaexample.com	$24.00	$24.00	40 GB	4 GB	2 MHz
02/13/2017	oss-dell-infra.manishaexample.com	$24.00	$24.00	65 GB	4 GB	2 MHz
02/13/2017	oss-dell-openshift-master-0.manishaexample.com	$24.00	$24.00	65 GB	4 GB	2 MHz
02/13/2017	oss-dell-openshift-master-1.manishaexample.com	$24.00	$24.00	75 GB	4 GB	2 MHz
02/13/2017	test_cf_vm-3	$24.00	$24.00	40 GB	4 GB	2 MHz
02/13/2017	test_cirros_1	$24.00	$24.00	45 GB	2 GB	1 MHz
02/13/2017	test_rhel_vm-1	$24.00	$24.00	80 GB	8 GB	4 MHz

Please let us know if any specific logs or reports are required in addition to this and Comment 9.

