Bug 1267136 - [Docs] [Deployment] Incorporate content from Red Hat IT using RHEV
Status: CLOSED DUPLICATE of bug 1271437
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: Documentation
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-3.6.5
Target Release: 3.6.1
Assigned To: Julie
Depends On:
Blocks: 1271437
Reported: 2015-09-29 02:18 EDT by Julie
Modified: 2016-02-09 01:36 EST
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-09 01:36:31 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Docs
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None

Description Julie 2015-09-29 02:18:28 EDT
Review the content below and incorporate suitable material into the deployment and planning guide.

Discussions from the email thread:  

There's no official update at this point, but below is an unofficial update.


While some of the numbers have changed, I don't think many of the
lessons have really changed.

These days we're at ~130 Hypervisors and ~3000 VMs in our RHEV
infrastructure.  The hypervisors are larger than before and RHEV 3 is
massively more efficient than RHEV 2.  (Brian also improved the
Memory/CPU balance).

Most of our non-virtual capacity is for
* Memory R/W intensive applications
  - Things like big databases
  - Many *can* be virtualized, but they make cluster maintenance harder
* Services needed to cold-start the DC
  - Whatever you need to be able to start and log into RHEV.


Between Zoli and me, these would be our main bullet points:

* We still prefer RHEL + packages over the RHEV-H appliance
  - Easier to hook into Configuration Management
* A ratio of 1:32  CPU:RAM is working out about right for us
  - RAM is cheap and was initially our big bottleneck
* We're currently targeting between 50 and 70% utilization, CPU & Memory
  - But we expect to be under 50% for a while after new purchases
  - Much higher than 80% and you start seeing contention during peaks.
* Try to avoid having too many disks on the same storage domain
  - We currently target under 100 disks per storage domain
  - Various disk/VM operations become slow past this point
* Dedicated NIC for management
  - Generally low traffic but if flooded it can result in fencing
* Dedicated NIC for migration
  - Avoids flooding the other NICs
  - The faster you can migrate VMs, the faster you finish maintenance
* Dedicated NIC for storage
* Tag the VLANs to VM data NIC(s)
  - Most VLANs don't generate enough traffic to warrant a dedicated NIC
* Split off the heavy-traffic VLANs, but still tag them
  - RHEV defines tagged/untagged at the DC level, so switching is complex
* Jumbo frames are a good thing
* Have 2 separate Production RHEV instances
  - Use separate storage controllers, core switches, etc
  - Occasionally you do trigger bugs in prod that you didn't find before
  - Can avoid the need to cold-boot your DC services after major
    failures
* Gather data about how your cluster is doing
  - Nobody likes RCAs, even less so without data
* VM Migrations are essential for RHEV maintenance
  - If you can't evacuate your Hypervisors you can't perform maintenance
  - Some applications are very memory read/write intensive and make the
    VM difficult to migrate
  - VMs with 32+ GB of RAM are our rule of thumb for closer inspection
  - Ask application owners/architects whether the load can be split
    into a higher number of smaller VMs
  - VM migration under load should be added to test plans for any
    application being moved into RHEV
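
Several of the rules of thumb above are simple threshold checks, so they can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, status labels, and input shapes are hypothetical and are not part of RHEV or any of its APIs. The thresholds (50-70% target, 80% contention, under 100 disks per storage domain, 32+ GB VMs) come from the bullets above. The 1:32 CPU:RAM ratio is left out because the thread does not state its units.

```python
from typing import Dict, List

# Thresholds quoted from the bullet points; all names are illustrative.
UTIL_TARGET_LOW = 0.50      # lower edge of the 50-70% target band
UTIL_TARGET_HIGH = 0.70     # upper edge of the target band
UTIL_CONTENTION = 0.80      # above this, contention shows up during peaks
MAX_DISKS_PER_STORAGE_DOMAIN = 100
LARGE_VM_RAM_GB = 32        # VMs at or above this get a closer look


def utilization_status(used: float, capacity: float) -> str:
    """Classify CPU or memory utilization against the target band."""
    ratio = used / capacity
    if ratio > UTIL_CONTENTION:
        return "contention-risk"
    if ratio > UTIL_TARGET_HIGH:
        return "above-target"
    if ratio < UTIL_TARGET_LOW:
        return "below-target"
    return "on-target"


def storage_domain_ok(disk_count: int) -> bool:
    """True while the storage domain holds fewer than 100 disks."""
    return disk_count < MAX_DISKS_PER_STORAGE_DOMAIN


def vms_needing_migration_review(vm_ram_gb: Dict[str, float]) -> List[str]:
    """Return names of VMs large enough to warrant a migration review."""
    return sorted(
        name for name, ram in vm_ram_gb.items() if ram >= LARGE_VM_RAM_GB
    )


if __name__ == "__main__":
    # Hypothetical inventory, made up for the example.
    print(utilization_status(used=620, capacity=1000))
    print(storage_domain_ok(120))
    print(vms_needing_migration_review({"db1": 64, "web1": 8, "app1": 32}))
```

A check like this is only a first filter. The thread's advice is still to test VM migration under load for each application before moving it into RHEV.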


And from the "Parting Tips" I'd reiterate:
* Available memory
  - If you do mess up capacity planning, CPU contention is far less
    painful than the OOM killer (although harder to spot)
* Plan for Growth
  - especially if purchasing is slow
* Setting Quotas
  - makes it possible to explain your costs and demonstrate the value
    brought by those costs
* Network Speed
  - as RHEV grows, RHEV-H starts acting like an access layer switch
* Keep up with RHEV
  - 3.5 was a big improvement in the UI
  - I personally keep seeing batches of my RFEs closed out with each
    new release.
* Use Red Hat Support
  - the SBRs are fantastic
  - File RFEs, they really do get attention.
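
The "Plan for Growth" tip can be made concrete with a little arithmetic. The sketch below assumes linear growth in used capacity and a known purchasing lead time. Both are simplifying assumptions for illustration, as are the function names and the example numbers; only the 70% upper target comes from the thread.

```python
import math
from typing import Optional


def months_until_threshold(used: float, capacity: float,
                           monthly_growth: float,
                           threshold: float = 0.70) -> Optional[int]:
    """Months until usage reaches threshold * capacity, assuming linear growth.

    Returns 0 if usage is already at or past the threshold, and None if
    there is no growth (the threshold is never reached).
    """
    headroom = threshold * capacity - used
    if headroom <= 0:
        return 0
    if monthly_growth <= 0:
        return None
    return math.ceil(headroom / monthly_growth)


def should_order_now(used: float, capacity: float, monthly_growth: float,
                     lead_time_months: int, threshold: float = 0.70) -> bool:
    """True if the threshold will be reached within the purchasing lead time."""
    months = months_until_threshold(used, capacity, monthly_growth, threshold)
    return months is not None and months <= lead_time_months


if __name__ == "__main__":
    # Hypothetical cluster: 600 of 1000 units in use, growing 30 units a month.
    print(months_until_threshold(600, 1000, 30))
    print(should_order_now(600, 1000, 30, lead_time_months=6))
```

The slower purchasing is, the larger `lead_time_months` becomes and the earlier an order has to go in, which is the point of the tip.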
Comment 4 Julie 2016-02-09 01:36:31 EST

*** This bug has been marked as a duplicate of bug 1271437 ***
