Bug 506370

Summary: dom0 should get a higher priority than domU's by default
Product: Red Hat Enterprise Linux 5 Reporter: Bill McGonigle <bill-bugzilla.redhat.com>
Component: xen Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED CANTFIX QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 5.3 CC: riel, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-06-16 22:21:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Bill McGonigle 2009-06-16 22:09:45 UTC
Description of problem:

  dom0 can be starved of resources, which causes domUs to stop functioning

Version-Release number of selected component (if applicable):

  xen-3.0.3-80.el5_3.2
  kernel-xen-2.6.18-128.1.10.el5

How reproducible:

  always

Steps to Reproduce:

1. in dom0, attach to iSCSI 'disks' provided by a domU
2. use these disks in another domU under heavy load

  gory details here:
   https://www.redhat.com/archives/rhelv5-list/2009-June/msg00044.html
   http://lists.centos.org/pipermail/centos-virt/2009-June/001021.html 
   
Actual results:

  iSCSI dies, and all guests stop responding (whether or not they use iSCSI) until iSCSI is killed.
  
Expected results:

  nothing falls over

Additional info:

  this is apparently due to dom0 being starved of resources.  Luke on the centos-virt list had seen this before and recommended this setting:

    xm sched-credit -d 0 -w 60000

  which solves the problem. 60000 may be overkill, but due to dom0's unique role, it should be able to preempt domUs to avoid these kinds of deadlocks.  His wise aphorism: "if the dom0 is unhappy, everyone is unhappy."
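
  To get a rough sense of how large a weight of 60000 is, the sketch below computes proportional CPU shares under full contention. It is illustrative arithmetic only and does not talk to Xen. It assumes the credit scheduler's default weight of 256 per domain and a hypothetical host with three guests; the names `cpu_shares`, `guest1`, etc. are made up for this example. It ignores caps, VCPU counts, and the fact that weights only matter when CPUs are contended.

```python
# Illustrative arithmetic only; this does not query or configure Xen.
DEFAULT_WEIGHT = 256  # assumed credit-scheduler default weight per domain


def cpu_shares(weights):
    """Return each domain's fraction of CPU time under full contention,
    taken as proportional to its scheduler weight (simplified model)."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical host: dom0 plus three guests, all at the default weight.
guests = {"guest1": DEFAULT_WEIGHT, "guest2": DEFAULT_WEIGHT, "guest3": DEFAULT_WEIGHT}
default = cpu_shares({"dom0": DEFAULT_WEIGHT, **guests})

# Same host after raising dom0's weight to 60000.
boosted = cpu_shares({"dom0": 60000, **guests})

print(f"dom0 share at default weight: {default['dom0']:.1%}")
print(f"dom0 share at weight 60000:   {boosted['dom0']:.1%}")
```

  In this simplified model dom0 goes from an equal quarter share to nearly all of the contended CPU time, which is why a smaller weight might be enough to keep dom0 responsive.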

  There may be situations where one wants dom0 to be starved in the scheduler, but that's not a good default.

Comment 1 Rik van Riel 2009-06-16 22:21:44 UTC
In fact, there are situations where you want dom0 and the guests to have about equal priority.

One case is where dom0 simply forwards network packets into the guests.  If the system gets a lot of network traffic, dom0 will stay busy and the guests do not get to take network packets off their queues, leading to excessive packet loss.

Giving dom0 and the guests about equal priority solves this issue.

I have to agree with you that the current default is not ideal for some situations, but changing it will cause regressions for other workloads.

I do not believe that changing the default at this time in the RHEL 5 release cycle would be a good idea.