=Comment: #0================================================= Monte R. Knutson <mknutson.com> -

1. Feature Overview:
Feature Id: [201257]
a. Name of Feature: Xen high performance multinode support
b. Feature Description
Further improvements to the Xen hypervisor to become consistent with the base-level kernel, i.e. number of CPUs (256) and amount of memory (6TB). Customers are looking for improvements to the limitations that currently exist with Xen.

Additional Comments: Approve
Additional Comments: We are again requesting that the RHEL5 Xen hypervisor support up to 255 logical CPU threads and 6TB of memory. We know there is an issue at around 126 CPUs that needs to be addressed.

2. Feature Details:
Sponsor: xSeries
Architectures: x86, x86_64
Arch Specificity: Both
Affects Core Kernel: Yes
Delivery Mechanism: Direct from community
Category: Kernel
Request Type: Kernel - Enhancement from Upstream
d. Upstream Acceptance: In Progress
Sponsor Priority: 1
f. Severity: High
IBM Confidential: no
Code Contribution: 3rd party code
g. Component Version Target: Xen Enhancements

3. Business Case
Our flagship multi-node product, the x3950 M2, will support up to 96 CPUs by the end of 2008. We expect this number to reach at least 256 logical threads in 2009/2010. With 16GB DIMMs coming out, we expect the maximum memory to be as high as 6TB.

4. Primary contact at Red Hat:
John Jarvis, jjarvis

5. Primary contacts at Partner:
Project Management Contact: Monte Knutson, mknutson.com, 877-894-1495
Technical contact(s): Kevin Stansell, kstansel.com; Chris McDermott, mcdermoc.com
IBM Manager: Julio Alvarez, julioa.com

=Comment: #3================================================= Monte R. Knutson <mknutson.com> -
IBM will test as soon as the OS and hardware are available. Please let IBM know when a complete OS image can be verified. Please see the Roadmap file delivered at the last QBR for information on hardware availability. We will need an OS sample at that time in order to test.
IBM is signed up to test and provide feedback.
Created attachment 328529 [details]
Patch to allow 256 physical CPUs

Here's a preliminary patch to allow the Xen hypervisor to see 256 CPUs (although dom0 is still necessarily limited to 32 vcpus, as are all other domains). I've done some light testing with this patch so far, and things seem to be working well. Of course, the biggest machine I have here is a machine with 8 cores, so it's really only a smoke test. I'm building test packages at the moment; I'll update again to ask for some testing once that package is finished building.

Chris Lalancette
Oh, and just for my own notes, this is basically a backport of xen-unstable c/s 18520, with some RHEL specific tweaks. Chris Lalancette
Created attachment 328534 [details]
Patch to allow 256 physical CPUs

Whoops. The previous patch broke the build on i386. This updated patch should fix that. Again, this is a backport of xen-unstable c/s 18520 and 18521. Test packages (assuming the build completes now) will be coming soon.

Chris Lalancette
I've uploaded a test kernel that contains this fix (along with several others) to this location: http://people.redhat.com/clalance/virttest Could the original reporter try out the test kernels there, and report back if it fixes the problem? Thanks, Chris Lalancette
Chris, Can you please help us get this tested per our conversation earlier this week? Thanks, John
Adjusting the CPU count to the level IBM is confirmed to be able to test.
(In reply to comment #13)
> Chris,
>
> Can you please help us get this tested per our conversation earlier this week?
>
> Thanks,
> John

FYI, I have begun to test kernel-xen-2.6.18-131.el5virttest9.x86_64.rpm on our 8-node 128-CPU Hermes system. Apologies for the delay, but I lost quite a bit of time just getting the system back up as an 8-node RHEL5.2 system. I do finally have that config operational again, and was able to install clalance's test kernel. I am still running the system through its paces, but I can confirm that it at least boots up with the full complement of 128 CPUs, which is an incremental improvement from our previous limit of 126 CPUs.

Also FYI, Chris McDermott has managed to borrow some time on an 8-node 192-CPU Athena that is located in Kirkland, WA, but has run into issues with getting even a non-Xen kernel to boot with all 8 nodes. Once he gets past those hurdles, he'll turn the system over to me to try out clalance's kernel with 192 CPUs.
(In reply to comment #14)
> FYI, I have begun to test kernel-xen-2.6.18-131.el5virttest9.x86_64.rpm on our
> 8-node 128-CPU Hermes system. Apologies for the delay, but I lost quite a bit
> of time just getting the system back up as an 8-node RHEL5.2 system. I do
> finally have that config operational again, and was able to install clalance's
> test kernel. I am still running the system through its paces, but I can
> confirm that it at least boots up with the full complement of 128 CPUs, which
> is an incremental improvement from our previous limit of 126 CPUs.
>
> Also FYI, Chris McDermott has managed to borrow some time on an 8-node 192-CPU
> Athena that is located in Kirkland, WA, but has run into issues with getting
> even a non-Xen kernel to boot with all 8 nodes. Once he gets past those
> hurdles, he'll turn the system over to me to try out clalance's kernel with 192
> CPUs.

Excellent news! Thanks for starting the testing, it will be really useful. 128 cpus is a good start. When you get time, can you do some basic testing of starting up various guests, and doing other basic operations on them? Specifically, I am looking for testing on:

32-bit PV guests*
64-bit PV guests
32-bit HVM guests
64-bit HVM guests

*there is a known problem in the virttest9 kernel for 32-bit PV guests, so I'm pretty sure it won't work for you. I'll be putting out virttest10 tomorrow or Friday to correct it.

Chris Lalancette
Any word on further testing? The latest virttest kernels here: http://people.redhat.com/clalance/virttest should have all of the fixes you need to run both 32-bit and 64-bit PV and HVM guest with a lot of processors. I would still like to see some basic testing on all of the combinations in comment #15, along with the results of some save/restore testing on the various guests. Chris Lalancette
(In reply to comment #18)
> Any word on further testing? The latest virttest kernels here:
>
> http://people.redhat.com/clalance/virttest
>
> should have all of the fixes you need to run both 32-bit and 64-bit PV and HVM
> guest with a lot of processors. I would still like to see some basic testing
> on all of the combinations in comment #15, along with the results of some
> save/restore testing on the various guests.

On the -131 test kernel, I have thus far only checked off the two highest items on my list: 1) dom0 stress testing; and 2) re-verified that the many-guests (bz43871) fix is still working.

I did not hit the blktap and/or 'out of memory' limitations I'd run into on RH5.2, but I need to recheck my notes and do some additional testing to identify where we top out on RH5.3.

I'm currently chasing some other issues, but will upgrade to the -133 test kernel and try some of your requested test scenarios as soon as I get some free cycles.
(In reply to comment #16)
> Also FYI, Chris McDermott has managed to borrow some time on an 8-node 192-CPU
> Athena that is located in Kirkland, WA, but has run into issues with getting
> even a non-Xen kernel to boot with all 8 nodes. Once he gets past those
> hurdles, he'll turn the system over to me to try out clalance's kernel with 192
> CPUs.

Current status on this front... Chris had reached the point of being able to boot the 8-node 192-cpu system with bare metal RH5.3, but was still seeing stability issues keeping the system up and merged as an 8-node. He also thinks there were some problems with IRQ and/or mmio resource management. Before he could finish working through those issues, the Kirkland team needed to take the system back for some unrelated BIOS testing (they are in debug mode as they cannot boot 8-nodes even with Windows right now). After he gets the system back, Chris will resume bare metal RH5.3 testing with the new BIOS. Then when the system is stable with bare metal RH5.3, I'll try the clalance test kernel there.
(In reply to comment #18)
> On the -131 test kernel, I have thus far only checked off the two highest items
> on my list: 1) dom0 stress testing; and 2) re-verified that the many-guests
> (bz43871) fix is still working.

I think you are missing a digit in the above BZ number; what was the number again?

> I did not hit the blktap and/or 'out of memory' limitations I'd run into on
> RH5.2, but I need to recheck my notes and do some additional testing to
> identify where we top out on RH5.3.

Just to be clear, the blktap limitation is the "100 total tapdisks" in the machine limitation (BZ 452650), right? And what was the problem with the "out of memory" limitation, and/or is there a BZ number for it?

> I'm currently chasing some other issues, but will upgrade to the -133 test
> kernel and try some of your requested test scenarios as soon as I get some free
> cycles.

OK, great, that will be a huge help. Thanks!

Chris Lalancette
(In reply to comment #21)
> > On the -131 test kernel, I have thus far only checked off the two highest items
> > on my list: 1) dom0 stress testing; and 2) re-verified that the many-guests
> > (bz43871) fix is still working.
>
> I think you are missing a number in the above BZ number; what was the number
> again?

The bug to which I alluded -- bz43871 -- refers to IBM's BZ database. It is apparently mirrored to Red Hat as RIT176656, but I don't have permissions to look at that. That RIT is somehow also linked on the Red Hat side to BZ442736, which is where most of the investigation comments are documented.

> > I did not hit the blktap and/or 'out of memory' limitations I'd run into on
> > RH5.2, but I need to recheck my notes and do some additional testing to
> > identify where we top out on RH5.3.
>
> Just to be clear, the blktap limitation is the "100 total tapdisks" in the
> machine limitation (BZ 452650), right?

Yes.

> And what was the problem with the "out
> of memory" limitation, and/or is there a BZ number for it?

The symptom is mentioned here: https://bugzilla.redhat.com/show_bug.cgi?id=442736#c27 But there is no separate BZ yet to track it. I'll open one if I hit it again.

> > I'm currently chasing some other issues, but will upgrade to the -133 test
> > kernel and try some of your requested test scenarios as soon as I get some free
> > cycles.

FYI, I'm going to continue this testing on a fresh RHEL5.3 install. Up to now, I've been installing your test kernels into a pre-GA RHEL5.3 release (circa snap6, IIRC).
I was finally able to get access again to the 8-node x3950 M2 system and ran a quick boot test with the latest Xen kernel from http://people.redhat.com/clalance/virttest. The results looked promising. I will leave the more rigorous testing to James, but at least the H/V booted and detected all of the CPUs. I have inlined the 'xm info' output below and will attach the complete xen boot log as soon as I can capture it.

host                   : localhost.localdomain
release                : 2.6.18-134.el5virttest12xen
version                : #1 SMP Thu Mar 12 05:42:22 EDT 2009
machine                : x86_64
nr_cpus                : 192
nr_nodes               : 8
sockets_per_node       : 4
cores_per_socket       : 6
threads_per_core       : 1
cpu_mhz                : 2132
hw_caps                : bfebfbff:20100800:00000000:00000140:000ce33d:00000000:00000001
total_memory           : 38910
free_memory            : 33994
node_to_cpu            : node0:0-23
                         node1:24-47
                         node2:48-71
                         node3:72-95
                         node4:96-119
                         node5:120-143
                         node6:144-167
                         node7:168-191
xen_major              : 3
xen_minor              : 1
xen_extra              : .2-134.el5virtt
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
cc_compile_by          : mockbuild
cc_compile_domain      : redhat.com
cc_compile_date        : Thu Mar 12 05:21:59 EDT 2009
xend_config_format     : 2
This enhancement request was evaluated by the full Red Hat Enterprise Linux team for inclusion in a Red Hat Enterprise Linux minor release. As a result of this evaluation, Red Hat has tentatively approved inclusion of this feature in the next Red Hat Enterprise Linux Update minor release. While it is a goal to include this enhancement in the next minor release of Red Hat Enterprise Linux, the enhancement is not yet committed for inclusion in the next minor release pending the next phase of actual code integration and successful Red Hat and partner testing.
------- Comment From jmtt.com 2009-03-25 17:03 EDT-------
Chris McD handed the 8-node 192-CPU Athena to me for some additional testing with the clalance -134 kernel. There are some obstacles to overcome due to the machine being outside of our normal lab infrastructure, but here are some preliminary results:

. As Chris McD noted, the clalance -134 kernel boots all 192 CPUs. I verified that this is an improvement over the standard -128 xen kernel, which spontaneously reboots during boot attempts.
. I was able to rerun the many-guests test case successfully -- these were ram-based 64-bit PV guests as described in https://bugzilla.redhat.com/show_bug.cgi?id=442736#c27. As expected, I hit the known blktap device limit at the 101st guest (BZ 452650).
. The previously reported "out of memory" issue is no longer being seen.

That's all I have thus far for results. I will next be trying to collect clalance's requested guest scenarios:

32-bit PV guests
64-bit PV guests
32-bit HVM guests
64-bit HVM guests

We have only about a week's time of availability on the 192 CPU machine, and that week must be shared across several large system activities besides this RH5.3 testing. Anything that we can't get done on the 192-CPU box will get moved to our 128-CPU system, so it would be good to know if there is any prioritization and/or combos of guests that are most important to test at 192 CPUs.
The most important things for me would be to make sure that 64-bit PV guests work with up to 32 cpus (which it seems like you've already done). Secondary to that, I would also like to see how 32-bit PV guests operate with 32 cpus on this setup. Priority number 3 would be seeing 64-bit HVM guests in action (with as many CPUs as will work), and finally 32-bit HVM guest testing would be good (again, with as many CPUs as will work). Chris Lalancette
------- Comment From jmtt.com 2009-03-25 17:20 EDT-------
(In reply to comment #24)
> I have inlined the 'xm info' output below and will
> attach the complete xen boot log as soon as I can capture it.

This refers to the fact that 'xm dmesg' output is currently truncated such that we can only see the end of the boot sequence. True, we know the result was a success. Nevertheless, it would be nice to capture the entire trace to have as a reference for comparison should something go south on us later on.

We normally capture this info via the serial port, but we have no serial port access to the remotely located 192-CPU system. We had unsuccessfully been attempting to expand the printk ring buffer with the log_buf_len=<size> boot string that works with the bare metal kernel, but I've been advised that the ring buffer is hard-wired in the xen kernel. I sent clalance an email requesting a new kernel built with those parameters enlarged. If said kernel can be provided within the week that we have left on the 192-CPU machine, we'll capture and attach the full boot log here for posterity. I view that as a nice-to-have, however, not a show stopper.
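For reference, the bare-metal workaround mentioned above amounts to appending log_buf_len= to the kernel line of the grub boot entry. A sketch of a legacy grub.conf stanza along these lines (the kernel version string and paths are illustrative only; note that, as stated above, the Xen hypervisor's own console ring buffer is hard-wired at build time, so this parameter only enlarges the kernel's printk buffer):

```
title Red Hat Enterprise Linux Server (2.6.18-134.el5, illustrative)
        root (hd0,0)
        # enlarge the printk ring buffer to 1MB; log_buf_len takes a
        # byte count that must be a power of 2
        kernel /vmlinuz-2.6.18-134.el5 ro root=/dev/VolGroup00/LogVol00 log_buf_len=1048576
        initrd /initrd-2.6.18-134.el5.img
```

On a Xen boot entry, the dom0 kernel and its options go on a `module` line beneath the hypervisor's `kernel /xen.gz` line, which is why the same trick does not reach the hypervisor's buffer.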
------- Comment From jmtt.com 2009-03-28 00:57 EDT-------
(In reply to comment #28)
> The most important things for me would be to make sure that 64-bit PV guests
> work with up to 32 cpus (which it seems like you've already done).

Actually, my original 64-bit PV guests were 4 CPUs each, so I adjusted that config file to 32 CPUs. Just for grins, I also tried for 33 CPUs, but got an "Error: (22, 'Invalid argument')" in response.

> Secondary
> to that, I would also like to see how 32-bit PV guests operate with 32 cpus on
> this setup.

Alongside 6 of these 32-cpu, 64-bit ramdisk guests, I also installed and created two 32-CPU rhel4.7 32-bit PV guests. All the guests stayed up and seemed operational, though building a kernel was pretty slow (the system is short on disk space, so I created the guest images with sparse files). 'xm vcpu-list' showed a pretty uniform distribution of VCPUs across the physical CPUs, with dedicated CPUs assigned to dom0.

> Priority number 3 would be seeing 64-bit HVM guests in action
> (with as many CPUs as will work), and finally 32-bit HVM guest testing would be
> good (again, with as many CPUs as will work).

The system is going to be shifted to some of the other large system tasks for now, but hopefully, I'll have time to try some HVM guests before we lose it for good.
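For context, the VCPU bump described above is a one-line change in the guest's config file under /etc/xen. A minimal sketch of such a PV guest config (names and paths are illustrative, not the actual test files):

```
name = "pv64-ramguest"
vcpus = 32        # 32 is the per-domain VCPU ceiling here; 33 fails
                  # with "Error: (22, 'Invalid argument')" as above
memory = 1024
kernel = "/boot/vmlinuz-2.6.18-guest"    # illustrative path
ramdisk = "/boot/initrd-guest.img"       # illustrative path
root = "/dev/ram0"
```

The guest is then started with 'xm create <file>', which is where the EINVAL error above surfaces when the vcpus value exceeds the limit.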
Sorry for the delay. Thanks for the testing, that tells us a lot. So, at least the two cases I was most worried about seem to work, which is good. I would still like to see some HVM testing, just to make sure things are still sane there. I've done some of that testing on my own, but of course the biggest machine I have locally only has 8 cores :). Anyway, keep us updated on the testing coverage. Thanks, Chris Lalancette
in kernel-2.6.18-140.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
------- Comment From lcm.com 2009-04-20 20:02 EDT-------
(In reply to comment #31)
> in kernel-2.6.18-140.el5
> You can download this test kernel from http://people.redhat.com/dzickus/el5

Our 192 CPU system is currently not available for testing. We already verified the Xen patches with Chris Lalancette's test kernel, and I'm hoping we will have access to the 192 CPU system eventually in order to test the official RHEL5.4 kernel.
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~ RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner! If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
~~ Attention Partners - RHEL 5.4 Snapshot 1 Released! ~~ RHEL 5.4 Snapshot 1 has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular request. Please test and report back your results here, at your earliest convenience. The RHEL 5.4 exception freeze is quickly approaching. If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. Further questions can be directed to your Red Hat Partner Manager or other appropriate customer representative.
------- Comment From jmtt.com 2009-07-16 20:59 EDT-------
(In reply to comment #36)
> ~~ Attention - RHEL 5.4 Beta Released! ~~

I don't currently have access to a 192-cpu system, but FYI, the RH5.4 beta (-155) xen kernel boots OK on our 128-cpu 8-node x460 system.

I confirmed that virt-manager plays well with xen alongside kvm.

But when I tried to create 100 ramdisk-based, 4-cpu guests, I encountered some problems. Guest creation seemed very slow, and around the 28th guest, the system hung -- all active windows, including the serial port and remote console, became unresponsive; and when I checked in the lab, I also noticed the keyboards (I had both a PS/2 and a USB keyboard attached) were dead. I'll investigate this further after loading the recently released snap2 drop.
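As a sketch of what the many-guests scenario above involves (all names and paths here are hypothetical, not the actual test harness), the per-guest config files for N ramdisk-based, 4-vcpu guests could be generated like this before looping over them with 'xm create':

```python
# Hypothetical sketch: generate Xen config files for N ramdisk-based,
# 4-vcpu PV guests of the kind described in the comment above.
import os

def make_guest_config(index, vcpus=4, mem_mb=256):
    """Return the text of a minimal Xen guest config for guest <index>."""
    return (
        'name = "ramguest%d"\n' % index
        + 'vcpus = %d\n' % vcpus
        + 'memory = %d\n' % mem_mb
        + 'kernel = "/boot/vmlinuz-guest"\n'          # hypothetical path
        + 'ramdisk = "/boot/initrd-ramguest.img"\n'   # hypothetical path
        + 'root = "/dev/ram0"\n'
    )

def write_guest_configs(n, outdir):
    """Write ramguest1.cfg .. ramguestN.cfg into outdir; return their paths."""
    os.makedirs(outdir, exist_ok=True)
    paths = []
    for i in range(1, n + 1):
        path = os.path.join(outdir, "ramguest%d.cfg" % i)
        with open(path, "w") as f:
            f.write(make_guest_config(i))
        paths.append(path)
    return paths
```

Each generated file would then be started with 'xm create <file>'; the hang reported above occurred around the 28th such creation.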
(In reply to comment #35)
> I don't currently have access to a 192-cpu system, but FYI, the RH5.4 beta
> (-155) xen kernel boots OK on our 128-cpu 8-node x460 system.
>
> I confirmed that virt-manager plays well with xen along side kvm.
>
> But when I tried to create 100 ramdisk-based, 4-cpu guests, I encountered some
> problems. Guest creation seemed very slow, and around the 28th guest, the
> system hung -- all active windows, including the serial port and remote console
> became unresponsive; and when I checked in the lab, I also noticed the
> keyboards (I had both a ps2 and usb attached) were dead. I'll investigate this
> further after loading the recently released snap2 drop.

OK, thanks for testing. Just to be clear, when you say you created 100 ramdisk-based guests, you were doing this on RHEL-5.4 Xen, not KVM, correct? Either way, if you can reproduce the issue with snap2, please open another bugzilla about it, and include details of the guest configuration, host configuration, etc.

Thanks,
Chris Lalancette
------- Comment From jmtt.com 2009-07-17 12:21 EDT-------
(In reply to comment #42)
> OK, thanks for testing. Just to be clear, when you say you created 100
> ramdisk-based guests, you were doing this on RHEL-5.4 Xen, not KVM, correct?

Correct -- these were Xen guests.

> Either way, if you can reproduce the issue with snap2, please open another
> bugzilla about it, and include details of the guest configuration, host
> configuration, etc.

OK, will do.
Confirmed the patch is in -160.el5 kernel. Adding SanityOnly.
~~ Attention Partners - RHEL 5.4 Snapshot 5 Released! ~~ RHEL 5.4 Snapshot 5 is the FINAL snapshot to be release before RC. It has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular issue. Please test and report back your results here, at your earliest convenience. If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. If it is urgent, escalate the issue to your partner manager as soon as possible. There is /very/ little time left to get additional code into 5.4 before GA. Partners, after you have verified, do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. Further questions can be directed to your Red Hat Partner Manager or other appropriate customer representative.
IBM, can you please verify the 192 cpu support?
(In reply to comment #40)
> IBM, can you please verify the 192 cpu support?

We no longer have access to a 192 CPU system (8-node x3950 M2). The system we had remote access to was dismantled and shipped to a customer. The largest configuration we can test at this point in time is 128 CPUs (8-node x3950 M1). James will have some 128-CPU results to post soon.
------- Comment From jmtt.com 2009-08-07 21:34 EDT-------
(In reply to comment #47)
> (In reply to comment #40)
> > IBM, can you please verify the 192 cpu support?
>
> We no longer have access to a 192 CPU system (8-node x3950 M2). The system we
> had remote access to was dismantled and shipped to a customer. The largest
> configuration we can test at this point in time is 128 CPUs (8-node x3950 M1).
> James will have some 128 results to post soon.

In general, RHEL5.4 xen support on our 128-cpu system seems to have regressed from RHEL5.3. Performance is very sluggish, and it's difficult to create and run guests. BZ55206/RIT325453 and BZ55360 (cannot create/boot a RHEL4.7 32-bit FV guest -- not yet mirrored) are the two worst quantifiable regressions that I've noticed thus far. The only thing that does not seem to have regressed thus far is the dom0 stress test, which still passes.
IBM, please file specific bugs per each issue encountered as soon as possible so we can look to fix them for RHEL 5.5. Thanks for your testing feedback.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
------- Comment From prem.karat.ibm.com 2011-02-22 05:08 EDT-------
(In reply to comment #50)
> An advisory has been issued which should help the problem
> described in this bug report. This report is therefore being
> closed with a resolution of ERRATA. For more information
> on the solution and/or where to find the updated files,
> please follow the link below. You may reopen this bug report
> if the solution does not work for you.
>
> http://rhn.redhat.com/errata/RHSA-2009-1243.html

*** Reviewed as part of clean-up activity ***

Closing this one out as per the last comment.

Cheers,
Prem