Description of problem:

The number of Xen domains that can be started is determined in part by the number of available dynamic IRQs and the number of IRQs used by each guest. This is limited by the compile-time constant NR_DYNIRQS:

#define NR_DYNIRQS 256

When this number is exceeded, find_unbound_irq() will fail and panic the system:

static int find_unbound_irq(void)
{
	int irq;

	/* Only allocate from dynirq range */
	for (irq = DYNIRQ_BASE; irq < NR_IRQS; irq++)
		if (irq_bindcount[irq] == 0)
			break;

	if (irq == NR_IRQS)
		panic("No available IRQ to bind to: increase NR_IRQS!\n");

	return irq;
}

With typical guests needing a minimum of two interrupts, this places an upper bound on the number of guests that can be created.

Version-Release number of selected component (if applicable):
2.6.18-86.el5xen

How reproducible:
100%

Steps to Reproduce:
1. Boot a Xen dom0
2. Configure a large number of guests
3. Start booting guests one at a time

Actual results:
Eventually (assuming sufficient memory / I/O resources are available) the dom0 guest will panic:

Kernel panic - not syncing: No available IRQ to bind to: increase NR_IRQS!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Expected results:
No panic.

Additional info:
Upstream discussion: http://lists.xensource.com/archives/html/xen-devel/2006-12/msg00311.html
OK. I briefly took a look at this. Upstream Xen has since changed it so that if you run out of IRQs, you don't panic; this was put into xen-unstable c/s 12790. I think we should definitely take that patch.

In the thread mentioned in Comment #2, Keir said that it would be nice to allocate IRQs dynamically, make it a config option, or have a boot option that you could pass to increase the number. I think allocating the IRQs dynamically is going to be a non-starter, since it would likely require changes to the IRQ code that Xen shares with the bare-metal kernel. So that leaves us with 2 options:

1. Have a boot-time option that allows you to increase the number of IRQs at boot time.
2. Just increase NR_DYNIRQS.

I like option 2, since it is a better user experience, but we can consider 1 as well.

Chris Lalancette
adding to RHEL5.2 release notes updates:

<quote>
dom0 has a system-wide IRQ (interrupt request line) limit of 256, which is consumed as follows:

  * 3 per physical CPU
  * 1 per guest device (i.e. NIC or block device)

When the IRQ limit is reached, the system will crash. As such, check your IRQ consumption to make sure that the number of guests you create (and their respective block devices) do not exhaust the IRQ limit.
</quote>

please advise if any further revisions are required. thanks!
----- Additional Comments From krister.com 2008-04-28 11:09 EDT -------

Should there be something added to let the user know how to check their used Dynamic IRQs? I worry that without this, the user might not know how to determine the number of Dynamic IRQs they have used. I ran this in dom0 on a blade with 2 guests running:

[root@host ~]# grep Dynamic-irq /proc/interrupts | wc -l
30

This event sent from IssueTracker by jkachuck
issue 173656
thanks, appending to note:

<quote>
To determine how many IRQs you are currently consuming, run the command:

  grep Dynamic-irq /proc/interrupts | wc -l
</quote>

please advise if any further revisions are required. thanks!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes. This Release Note is currently located in the Known Issues section.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
I have a test image that has the fix for the panic as well as an increase in the number of IRQs (256 more). Unfortunately, the increase breaks the kernel ABI, and some further work is needed to see if that can be overcome. While this is being looked at, could you please test this kernel to see if it solves the crash issue, and what number of guests can be started with this change? The image is people.redhat.com/bburns/kernel-xen-2.6.18-103.el5IRQFIX.x86_64.rpm Thanks.
in kernel-2.6.18-113.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
Putting this back to assigned. Testing of the pre-beta kernels has shown that the fix for the crash when exhausting the IRQs was not effective. It added the error-path logic, but there was a flaw in the implementation: unsigned variables were being compared for < 0. Upstream has fixed this, and it's a small incremental change to incorporate the fix.
Created attachment 321335 [details]
Posted patch

Patch to fix checking for negative IRQ return values.
in kernel-2.6.18-121.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,9 +1,2 @@
 (all architectures)
-dom0 has a system-wide IRQ (interrupt request line) limit of 256, which is consumed as follows:
-
- * 3 per physical CPU.
- * 1 per guest device (i.e. NIC or block device)
-
-When the IRQ limit is reached, the system will crash. As such, check your IRQ consumption to make sure that the number of guests you create (and their respective block devices) do not exhaust the IRQ limit.
-
-To determine how many IRQs you are currently consuming, run the command grep Dynamic-irq /proc/interrupts | wc -l.
+When the Dynamic IRQs available for guest virtual machines were exhausted, the domain 0 kernel would crash. This patch fixed the crash condition and also greatly increased the number of available IRQs for x86_64 platforms.
With much help from rharper, I finally have a small, ramdisk-based guest config suitable for creating many guest instances on my RHEL5.3 beta (2.6.18-121.el5xen #1 SMP Mon Oct 27 22:03:03 EDT 2008 x86_64) system:

[root@elm3c13 xen]# cat /xen/disk2/etc-xen-share/test1
name = "test1"
maxmem = 64
memory = 64
vcpus = 1
kernel = "/etc/xen/vmlinuz-autobench-xen"
root = "/dev/xvda"
extra = "console=xvc0"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "preserve"
vif = [ '' ]
disk = [ 'tap:aio:/etc/xen/initrd-1.1-i386.img,xvda,r' ]

When I tried to create a bunch of guests based upon this config, I ran into 'Error: (12, 'Cannot allocate memory')' messages at the 89th guest, well before IRQs were exhausted ('grep Dynamic-irq /proc/interrupts | wc -l' reports only 202, and I had 26 even before creating the first guest). I also saw some '(XEN) Cannot handle page request order 0!' messages on the console while these failures were occurring.

The system has plenty of free memory (MemTotal: 33554432 kB; MemFree: 32273780 kB), so this error is confusing. Am I doing something wrong?
(In reply to comment #27) > my RHEL5.3 beta > (2.6.18-121.el5xen #1 SMP Mon Oct 27 22:03:03 EDT 2008 x86_64) system: Er, I meant to say snap1.
Oh, a couple of other data points I forgot to mention. My first attempt to work around this issue was to reduce maxmem and memory from 64 to 32. But the system failed in exactly the same way, and still at the 89th guest. Then I thought I might perhaps consume IRQs more quickly by allocating more CPUs per guest. But bumping vcpus from 1 to 4 caused the system to hit the 'Cannot allocate memory' failure even earlier -- at the 68th rather than the 89th guest instance.
Yes, it's unlikely that you will be able to exhaust the IRQs since the patch increased them by quite a large margin. It is assumed that with the IRQ limit out of the way the next limitation would be hit. Please file a separate bug report with the details. I think for verification of this bug, getting past the 70 or so guests that used to fail is sufficient. Thanks for the testing!
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1 @@
-(all architectures)
-When the Dynamic IRQs available for guests virtual machines were exhausted the domain 0 kernel would crash. This patch fixed the crash condition and also greatly increased the number of availble IRQs for x86_64 platforms.
+When the Dynamic IRQs available for guest virtual machines were exhausted, the dom0 kernel would crash. In this update, the crash condition has been fixed, and the number of available IRQs has been increased, which resolves this issue.
For the sake of completeness, I tried my test scenario on a plain vanilla RHEL5.2 Xen installation. As I'd hoped, I saw the following:

Kernel panic - not syncing: No available IRQ to bind to: increase NR_IRQS!
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

This happens at the 116th guest, which attempts to use the 257th IRQ.

I then repeated the experiment, but first installed the -119 kernel -- that was the last test kernel provided prior to moving my testing to RHEL5.3. With that setup, I observed that I can allocate at least 256 guests with at least 512 IRQs without crashing. If desired, I can rerun that setup to the next power of 2 to see what happens.

In both of the RHEL5.2 cases, I see the following errors starting at the 100th guest:

tap tap-312-51712: 2 getting info
blk_tap: Error initialising /dev/xen/blktap - No more devices
blk_tap: Error initialising /dev/xen/blktap - No more devices
<last msg repeats about 8 times per guest-creation attempt>

The guests get created, but eventually get marked 'crashed'.

Bottom line is that it seems that we used to be able to get 99 usable guests with RHEL5.2, whereas with RHEL5.3, we can only get 88, at least based upon this particular guest config. Not saying that's a problem -- just providing an FYI.

Unless there are requests for further tests, I think this bug can be closed. I'll open a separate bug to track the 'Cannot allocate memory' issue.
Thanks for the testing. Please do open the new BZ to track the memory issue. I think with the existing testing and my forcing IRQ exhaustion via a kernel hack I am confident this issue is all set.
(In reply to comment #33) > In both of the RHEL5.2 cases, I see the following errors starting at the 100th > guest: > > tap tap-312-51712: 2 getting info > blk_tap: Error initialising /dev/xen/blktap - No more devices > blk_tap: Error initialising /dev/xen/blktap - No more devices > <last msg repeats about 8 times per guest-creation attempt> If I remember correctly, you are running into blktap limitations here. There is a hard-coded 100 disk limit currently in blktap, so you get the "No more devices" message when you try to add more disks and it doesn't find any more room in the array. You'll probably have better luck using LVM backed guests, since there is no such limitation there. If we really want to support more blktap disks (and we probably do), we should open up another bug to up that limit in blktap (but this will have to be for later releases). Chris Lalancette
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html