Description of problem:
The request is to support CloudForms (CF) as KVM VMs on top of the SAH RHEL 7.3 node, as part of the Dell and Red Hat joint OpenStack solution. For a minimum 10-node POC, I expect CF to be a single VM (of appropriate size) with a Ceph block device attached to it (that is what we are testing now). For a larger offering, CF will be split into multiple VMs on the SAH: one VM for the PostgreSQL DB, and other VMs as clients of it that collect data from the nodes under management (VMs in the overcloud and infrastructure overcloud nodes). Again, the CF VM sizes and number of VMs are subject to the Red Hat CF team's recommendations.

Version-Release number of selected component (if applicable):
CF 4.3

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Sean, I'll let you drive it.
Thomas, adding you for visibility. As I stated, we need CF outside of OpenStack to manage it, to avoid the security issues that arise when you try to manage the OpenStack infrastructure from a user VM inside OpenStack. We will test it as part of JS, but we need support for it just for our joint solution.
During bug triage you mentioned you were going to take this one and determine severity/priority.
Here is an update on where we are with regard to a support exception. This BZ can be used for any QE testing, etc.

Support Exception: https://tools.apps.cee.redhat.com/dashboard/#/support-exceptions/id/222

- The Cloudforms team has no intention of supporting the CF appliance on KVM as a long-term offering.
- The Cloudforms PM team is on board with accepting a request for a support exception for the Jetstream 6.0.1 solution.
- However, the PM needs to get both the QE team and the Support team on board with the exception. Otherwise stated, the ball is rolling, but it still needs to weave its way through the process.
- PM insists that Red Hat QE must do initial testing for performance, scale, and general smoke tests.
- Dell will own any further testing, including use cases, etc.
- Colin Devine and I explained your timeline and that this is needed ASAP. They seemed okay with that, but expressed that if any bugs are found during testing, it could cause a delay.
- For the Jetstream 10.0 (formerly 7.0) release, the architecture will need to include a RHV environment to host the Cloudforms appliances and OSPd. The gap here is that currently, OSPd is only supported (according to our documentation) on RHEV 3.x. I am looking into whether or not it is supported on RHV 4.x. Assuming we can get support for RHV 4.x for OSPd, then it will serve as the common platform for both.
- The support exception will be for the Cloudforms 4.2 appliance. Cloudforms 4.5 is targeted for late April, and if you decide to upgrade to that version, a separate support exception will be required.
We will need the support exception for CloudForms version 4.1. That is the version we have documented our process with and which we are currently in the process of validating.
We have come up with a test plan for CloudForms 4.1; it mainly includes the following test cases:

1. Attach a Ceph volume on the SAH node.
2. Launch the CloudForms VM with the 4.1 image and test network connectivity.
3. Set up the CF DB on the Ceph volume.
4. Attach the infrastructure provider (undercloud).
5. Attach the cloud provider (overcloud).
6. Turn on Capacity & Utilization. Note the DB size initially and daily after this point.
7. Launch a new VM from CF.
8. Launch a new VM in OpenStack Horizon/CLI and ensure it gets populated in CF.
9. Access a VM console from within CF.
10. From within CF, create a new cloud volume and attach it to an instance.
11. Add VMs and monitor CF performance and DB growth.

Currently testing is in progress and we are monitoring DB growth for 20 VMs.
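For the DB-growth monitoring mentioned above, a minimal tracking sketch. It assumes the appliance database is the usual CF name vmdb_production and that daily size samples are collected on the appliance with something like `psql -c "SELECT pg_database_size('vmdb_production');"`; the growth arithmetic below is generic and not taken from this environment:

```python
def growth_deltas(samples_bytes):
    """Day-over-day DB growth (bytes) from an ordered list of size samples."""
    return [b - a for a, b in zip(samples_bytes, samples_bytes[1:])]

def avg_daily_growth_mb(samples_bytes):
    """Average daily growth in MB across all samples; 0.0 if fewer than 2 samples."""
    deltas = growth_deltas(samples_bytes)
    if not deltas:
        return 0.0
    return sum(deltas) / len(deltas) / (1024 * 1024)

# Hypothetical daily samples of pg_database_size('vmdb_production') output
samples = [2_500_000_000, 2_620_000_000, 2_741_000_000, 2_860_000_000]
print(growth_deltas(samples))                  # day-over-day deltas in bytes
print(round(avg_daily_growth_mb(samples), 1))  # average daily growth in MB
```

A steadily increasing average here is expected once Capacity & Utilization is on; the interesting signal is whether the daily delta flattens out or keeps accelerating as VMs are added.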
Let's update the CF version to 4.2. If time permits, let's have a script that creates more VMs through CF and OpenStack for load testing, and let it run for a few days. Maybe do some small busy work in the launched VMs. Dave, what kind of tools do you have for CF testing, especially load testing?
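For the load-testing script idea, a minimal sketch that generates `openstack server create` command lines for a batch of VMs. The flavor, image, and network names are placeholders, not values from this environment, and the commands would need to run on a node with OpenStack CLI credentials sourced:

```python
import subprocess  # used only if you uncomment the launch call below

def make_create_cmd(name, flavor="m1.small", image="cirros", network="private"):
    """Build one 'openstack server create' invocation as an argv list."""
    return ["openstack", "server", "create",
            "--flavor", flavor, "--image", image,
            "--network", network, "--wait", name]

def batch_cmds(prefix, count):
    """Command lines for 'count' VMs named <prefix>-1 .. <prefix>-<count>."""
    return [make_create_cmd(f"{prefix}-{i}") for i in range(1, count + 1)]

for cmd in batch_cmds("loadtest", 3):
    print(" ".join(cmd))
    # To actually launch: subprocess.run(cmd, check=True)
```

The CF side of the load could be driven similarly through provisioning requests, and the whole thing left looping under cron for a few days while DB growth is recorded.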
We tested CloudForms 4.1 according to the test plan in Comment 7 and monitored performance for 3 days. We launched 30 VMs (10 from CloudForms, the rest from OpenStack Horizon) and generated reports.

Host CPU usage per VM report
(All samples at 00:00:00 UTC. CPU - Usage Rate for Collected Intervals (MHz) and CPU - Total Available - from VM Analysis (MHz) were reported as 0 MHz for every row.)

Hosts:
  A = 9851c3aa-a197-450e-a0b9-c57c6555da1d (NovaCompute)
  B = b25d9263-76e1-4a54-8d24-1eb9cbc9e7dc (NovaCompute)
  C = c103101b-ddde-428a-a229-3050425336cf (NovaCompute)

Host | Date     | Asset Name                                          | CPU Usage (%)
A    | 02/12/17 | cirros_new-6                                        | 2.0
A    | 02/11/17 | cirros_new-9                                        | 1.0
A    | 02/12/17 | cirros_new-9                                        | 1.0
A    | 02/11/17 | cirros_test_vm_cf                                   | 1.0
A    | 02/12/17 | cirros_test_vm_cf                                   | 1.0
A    | 02/11/17 | oss-dell-infra.manishaexample.com                   | 0.2
A    | 02/12/17 | oss-dell-infra.manishaexample.com                   | 0.2
A    | 02/11/17 | oss-dell-openshift-node-qq6620vz.manishaexample.com | 2.4
A    | 02/12/17 | oss-dell-openshift-node-qq6620vz.manishaexample.com | 2.4
A    | 02/11/17 | test_cf_vm-2                                        | 0.9
A    | 02/12/17 | test_cf_vm-2                                        | 1.0
A    | 02/11/17 | test_rhel_vms-1                                     | 0.8
A    | 02/12/17 | test_rhel_vms-1                                     | 1.1
B    | 02/12/17 | oss-dell-openshift-master-0.manishaexample.com      | 6.9
B    | 02/12/17 | oss-dell-openshift-master-1.manishaexample.com      | 7.8
B    | 02/11/17 | oss-dell-openshift-node-t58i64ih.manishaexample.com | 2.3
B    | 02/11/17 | test_cf_vm-3                                        | 1.0
B    | 02/12/17 | test_cf_vm-3                                        | 1.6
B    | 02/11/17 | test_cirros_1                                       | 2.1
B    | 02/11/17 | test_cirros_1                                       | 2.1
C    | 02/11/17 | cirros_new-2                                        | 1.2
C    | 02/12/17 | cirros_new-2                                        | 1.3
C    | 02/11/17 | cirros_new-5                                        | 1.0
C    | 02/12/17 | cirros_new-5                                        | 1.0
C    | 02/11/17 | cirros_new-8                                        | 1.0
C    | 02/12/17 | cirros_new-8                                        | 1.0

Top Memory Consumers (weekly) report
(Cluster Name was overcloud-Compute-f76siwdsnrs6 for all rows; hosts as in the legend above.)

Asset Name                                          | Host | Memory Used for Collected Intervals (Avg)
test_rhel_vms-2                                     | C    | 3.8 GB
test_rhel_vms-1                                     | A    | 3.7 GB
RH72-MIKE                                           | A    | 3.4 GB
oss-dell-openshift-master-1.manishaexample.com      | B    | 3.1 GB
oss-dell-openshift-master-0.manishaexample.com      | B    | 3.1 GB
oss-dell-openshift-node-t58i64ih.manishaexample.com | B    | 2.8 GB
oss-dell-openshift-node-qq6620vz.manishaexample.com | A    | 2.6 GB
oss-dell-openshift-node-b0j38jf4.manishaexample.com | C    | 2.3 GB
oss-dell-infra.manishaexample.com                   | A    | 1.9 GB
Changed the CF version in the title to CF 4.1. CF 4.2 does not work with the default OSP9 configuration, which is what is used by the DellEMC & Red Hat joint solution JS-6.x.
The evm.log for CF 4.1 is listed below.

[----] I, [2017-02-14T03:35:29.399264 #14475:f8398c]  INFO -- : <AutomationEngine> MiqAeEvent.build_evm_event >> event=<"containergroup_failedsync"> inputs=<{:ext_management_systems=>#<ManageIQ::Providers::OpenshiftEnterprise::ContainerManager id: 1000000000006, name: "OssDellOpenshift", created_on: "2017-02-10 22:12:10", updated_on: "2017-02-14 08:35:20", guid: "f7de9aa6-efdd-11e6-b77a-5254000a7981", zone_id: 1000000000001, type: "ManageIQ::Providers::OpenshiftEnterprise::Containe...", api_version: nil, uid_ems: nil, host_default_vnc_port_start: nil, host_default_vnc_port_end: nil, provider_region: nil, last_refresh_error: nil, last_refresh_date: "2017-02-14 08:35:20", provider_id: nil, realm: nil, tenant_id: 1000000000001, project: nil, parent_ems_id: nil, subscription: nil>, :ems_event=>#<EmsEvent id: 1000000046946, event_type: "POD_FAILEDSYNC", message: "Error syncing pod, skipping: Disk \"8105e753-62d2-4...", timestamp: "2017-02-14 08:35:16", host_name: nil, host_id: nil, vm_name: nil, vm_location: nil, vm_or_template_id: nil, dest_host_name: nil, dest_host_id: nil, dest_vm_name: nil, dest_vm_location: nil, dest_vm_or_template_id: nil, source: "KUBERNETES", chain_id: nil, ems_id: 1000000000006, is_task: nil, full_data: {:timestamp=>"2017-02-14T08:35:16Z", :kind=>"Pod", :name=>"docker-registry-2-er3ju", :namespace=>"default", :reason=>"FailedSync", :message=>"Error syncing pod, skipping: Disk \"8105e753-62d2-4004-b6fb-4c793c451fe8\" is attached to a different compute: \"37e88826-2b9a-4b72-854c-8c647a70fdf9\", should be detached before proceeding", :uid=>"b044e167-ee60-11e6-a9b7-fa163e3df321", :container_group_name=>"docker-registry-2-er3ju", :container_namespace=>"default", :event_type=>"POD_FAILEDSYNC"}, created_on: "2017-02-14 08:35:23", username: nil, ems_cluster_id: nil, ems_cluster_name: nil, ems_cluster_uid: nil, dest_ems_cluster_id: nil, dest_ems_cluster_name: nil, dest_ems_cluster_uid: nil, availability_zone_id: nil, container_node_id: nil, container_node_name: nil, container_group_id: 1000000000001, container_group_name: "docker-registry-2-er3ju", container_namespace: "default", type: "EmsEvent", target_type: nil, target_id: nil, container_id: nil, container_name: nil, container_replicator_id: nil, container_replicator_name: nil, middleware_server_id: nil, middleware_server_name: nil, middleware_deployment_id: nil, middleware_deployment_name: nil>, "MiqEvent::miq_event"=>1000000046947, :miq_event_id=>1000000046947, "EventStream::event_stream"=>1000000046947, :event_stream_id=>1000000046947}>

[----] I, [2017-02-14T03:35:29.430791 #14475:f8398c]  INFO -- : MIQ(MiqQueue.put) Message id: [1000000257427], id: [], Zone: [default], Role: [automate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [20], State: [ready], Deliver On: [], Data: [], Args: [{:object_type=>"ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup", :object_id=>1000000000001, :attrs=>{:event_type=>"containergroup_failedsync", "ExtManagementSystem::ext_management_system"=>1000000000006, :ext_management_system_id=>1000000000006, "EventStream::event_stream"=>1000000046947, :event_stream_id=>1000000046947, "MiqEvent::miq_event"=>1000000046947, :miq_event_id=>1000000046947}, :instance_name=>"Event", :user_id=>1000000000001, :miq_group_id=>1000000000001, :tenant_id=>1000000000001, :automate_message=>nil}]

[----] I, [2017-02-14T03:35:29.431264 #14475:f8398c]  INFO -- : <AutomationEngine> Followed Relationship [miqaedb:/System/event_handlers/event_action_policy?target=container_group&policy_event=containergroup_failedsync&param=#create]

[----] I, [2017-02-14T03:35:29.431853 #14475:f8398c]  INFO -- : <AutomationEngine> Followed Relationship [miqaedb:/System/Event/EmsEvent/KUBERNETES/POD_FAILEDSYNC#create]

[----] I, [2017-02-14T03:35:29.432868 #14475:f8398c]  INFO -- : MIQ(MiqQueue#delivered) Message id: [1000000257425], State: [ok], Delivered in [0.438147292] seconds

[----] I, [2017-02-14T03:35:29.622084 #6340:f8398c]  INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [1000000257423], MiqWorker id: [1000000000192], Zone: [default], Role: [], Server: [06b01a7e-eede-11e6-9c5f-5254000a7981], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Session.check_session_timeout], Timeout: [600], Priority: [90], State: [dequeue], Deliver On: [], Data: [], Args: [], Dequeued in: [6.59117433] seconds

[----] I, [2017-02-14T03:35:29.627029 #6340:f8398c]  INFO -- : MIQ(MiqQueue#deliver) Message id: [1000000257423], Delivering...

[----] I, [2017-02-14T03:35:29.641783 #6340:f8398c]  INFO -- : MIQ(MiqQueue#delivered) Message id: [1000000257423], State: [ok], Delivered in [0.019238571] seconds

Test results of chargeback in CloudForms 4.1 are included below.

Date Range  | VM Name                                        | CPU Total Cost | Storage Allocated Cost | Storage Allocated | Memory Allocated over Time Period | CPU Total
02/13/2017  | cirros_test_vm_cf                              | $24.00         | $24.00                 | 40 GB             | 4 GB                              | 2 MHz
02/13/2017  | cirros_test_vms-1                              | $24.00         | $24.00                 | 40 GB             | 4 GB                              | 2 MHz
02/13/2017  | cirros_test_vms-2                              | $24.00         | $24.00                 | 40 GB             | 4 GB                              | 2 MHz
02/13/2017  | ospdell-infra.manishaexample.com               | $24.00         | $24.00                 | 40 GB             | 4 GB                              | 2 MHz
02/13/2017  | oss-dell-infra.manishaexample.com              | $24.00         | $24.00                 | 65 GB             | 4 GB                              | 2 MHz
02/13/2017  | oss-dell-openshift-master-0.manishaexample.com | $24.00         | $24.00                 | 65 GB             | 4 GB                              | 2 MHz
02/13/2017  | oss-dell-openshift-master-1.manishaexample.com | $24.00         | $24.00                 | 75 GB             | 4 GB                              | 2 MHz
02/13/2017  | test_cf_vm-3                                   | $24.00         | $24.00                 | 40 GB             | 4 GB                              | 2 MHz
02/13/2017  | test_cirros_1                                  | $24.00         | $24.00                 | 45 GB             | 2 GB                              | 1 MHz
02/13/2017  | test_rhel_vm-1                                 | $24.00         | $24.00                 | 80 GB             | 8 GB                              | 4 MHz

Please let us know if any specific logs/reports are required in addition to this and Comment 9.
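For context on the flat $24.00 figures in the chargeback results, this is consistent with a fixed, allocation-based rate applied over one 24-hour report interval. A minimal sketch of that arithmetic, assuming a hypothetical rate of $1.00/hour (the actual rate schedule configured in this environment is not shown here):

```python
def flat_rate_cost(hours, hourly_rate):
    """Chargeback cost for a fixed (allocation-based) rate over a period."""
    return hours * hourly_rate

# Assumed example: $1.00/hour over one 24-hour chargeback interval
print(f"${flat_rate_cost(24, 1.00):.2f}")  # -> $24.00
```

Under a fixed rate the cost is independent of the allocated GB/MHz, which would explain why every VM shows the same cost despite different allocations.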
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days