Bug 477206 - [LTC 5.4 FEAT] Xen support for 192 CPUs [201257]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: alpha
Target Release: 5.4
Assigned To: Chris Lalancette
QA Contact: Martin Jenner
Keywords: FutureFeature, OtherQA
Blocks: 445204 460955 483784 488638 488641
Reported: 2008-12-19 12:02 EST by IBM Bug Proxy
Modified: 2011-12-02 14:05 EST
CC List: 10 users

Doc Type: Enhancement
Last Closed: 2009-09-02 04:19:07 EDT

Attachments
Patch to allow 256 physical CPUs (25.71 KB, patch)
2009-01-09 04:39 EST, Chris Lalancette
Patch to allow 256 physical CPUs (27.55 KB, patch)
2009-01-09 05:34 EST, Chris Lalancette

Description IBM Bug Proxy 2008-12-19 12:02:36 EST
=Comment: #0=================================================
Monte R. Knutson <mknutson@us.ibm.com> -  
1. Feature Overview:
Feature Id:	[201257]
a. Name of Feature:	Xen high performance multinode support
b. Feature Description
Further improvements to the Xen hypervisor to bring it in line with the limits of the base kernel, i.e. number of CPUs (256) and amount of memory (6TB).
 
 Customers are looking for improvements to the limitations that currently exist with Xen.

Additional Comments:	Approve

Additional Comments:	We are again requesting that the RHEL5 Xen hypervisor support up to 255 logical
CPU threads and 6TB of memory.  We know there is an issue at around 126 CPUs that needs to be addressed.

2. Feature Details:
Sponsor:	xSeries
Architectures:
x86
x86_64

Arch Specificity: Both
Affects Core Kernel: Yes
Delivery Mechanism: Direct from community
Category:	Kernel
Request Type:	Kernel - Enhancement from Upstream
d. Upstream Acceptance:	In Progress
Sponsor Priority	1
f. Severity: High
IBM Confidential:	no
Code Contribution:	3rd party code
g. Component Version Target:	Xen Enhancements

3. Business Case
Our flagship multi-node product, x3950 M2, will support up to 96 CPUs by the end of 2008.  We expect this number to reach at least 256 logical threads in 2009/2010.  With 16GB DIMMs coming out, we expect the max memory to be as high as 6TB.

4. Primary contact at Red Hat: 
John Jarvis
jjarvis@redhat.com

5. Primary contacts at Partner:
Project Management Contact:
Monte Knutson, mknutson@us.ibm.com, 877-894-1495

Technical contact(s):
Kevin Stansell, kstansel@us.ibm.com
Chris McDermott, mcdermoc@us.ibm.com

IBM Manager:
Julio Alvarez, julioa@us.ibm.com
=Comment: #3=================================================
Monte R. Knutson <mknutson@us.ibm.com> - 
IBM will test as soon as the OS and hardware are available.  Please let IBM know
when a complete OS image can be verified.  Please see the Roadmap file delivered at
the last QBR for information on hardware availability.  We will need an OS sample at that time in order to test.
Comment 1 John Jarvis 2008-12-19 12:44:15 EST
IBM is signed up to test and provide feedback.
Comment 2 Chris Lalancette 2009-01-09 04:39:24 EST
Created attachment 328529 [details]
Patch to allow 256 physical CPUs

Here's a preliminary patch to allow the Xen hypervisor to see 256 cpus (although dom0 is still necessarily limited to 32 vcpus, as are all other domains).  I've done some light testing with this patch so far, and things seem to be working well.  Of course, the biggest machine I have here is a machine with 8 cores, so it's really only a smoke test.  I'm building test packages at the moment; I'll update again to ask for some testing once that package is finished building.

Chris Lalancette
Comment 3 Chris Lalancette 2009-01-09 04:53:26 EST
Oh, and just for my own notes, this is basically a backport of xen-unstable c/s 18520, with some RHEL specific tweaks.

Chris Lalancette
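
For anyone trying the resulting test packages, a quick sanity check that the hypervisor actually sees all of the physical CPUs would be something like the following, run from dom0 (a sketch using the standard xm tooling; the field names match the 'xm info' output pasted in comment 22):

    xm info | grep -E 'nr_cpus|nr_nodes'

nr_cpus should report the full physical count even though dom0 itself, like every other domain, remains capped at 32 vcpus.
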
Comment 4 Chris Lalancette 2009-01-09 05:34:51 EST
Created attachment 328534 [details]
Patch to allow 256 physical CPUs

Whoops.  The previous patch broke the build on i386.  This updated patch should fix that.  Again, this is a backport of xen-unstable c/s 18520 and 18521.  Test packages (assuming the build completes now) will be coming soon.

Chris Lalancette
Comment 5 Chris Lalancette 2009-01-15 09:51:07 EST
I've uploaded a test kernel that contains this fix (along with several others)
to this location:

http://people.redhat.com/clalance/virttest

Could the original reporter try out the test kernels there, and report back if
it fixes the problem?

Thanks,
Chris Lalancette
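
A typical install sequence for one of these test kernels, using the package name reported later in comment 14 (a sketch; the exact file name depends on the build being downloaded):

    rpm -ivh kernel-xen-2.6.18-131.el5virttest9.x86_64.rpm

Installing with 'rpm -ivh' rather than upgrading keeps the stock kernel in place, so the machine can fall back to it from the grub menu if the test kernel misbehaves.
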
Comment 6 John Jarvis 2009-02-06 15:43:58 EST
Chris,

Can you please help us get this tested per our conversation earlier this week?

Thanks,
John
Comment 13 John Jarvis 2009-02-18 14:07:55 EST
Adjusting the CPU count to the level IBM is confirmed to be able to test.
Comment 14 IBM Bug Proxy 2009-02-25 04:10:56 EST
(In reply to comment #13)
> Chris,
>
> Can you please help us get this tested per our conversation earlier this week?
>
> Thanks,
> John
>

FYI, I have begun to test kernel-xen-2.6.18-131.el5virttest9.x86_64.rpm on our 8-node 128-CPU Hermes system.  Apologies for the delay, but I lost quite a bit of time just getting the system back up as an 8-node RHEL5.2 system.  I do finally have that config operational again, and was able to install clalance's test kernel.  I am still running the system through its paces, but I can confirm that it at least boots up with the full complement of 128 CPUs, which is an incremental improvement from our previous limit of 126 CPUs.

Also FYI, Chris McDermott has managed to borrow some time on an 8-node 192-CPU Athena that is located in Kirkland, WA, but has run into issues with getting even a non-Xen kernel to boot with all 8 nodes.  Once he gets past those hurdles, he'll turn the system over to me to try out clalance's kernel with 192 CPUs.
Comment 15 Chris Lalancette 2009-02-25 08:17:11 EST
(In reply to comment #14)
> (In reply to comment #13)
> > Chris,
> >
> > Can you please help us get this tested per our conversation earlier this week?
> >
> > Thanks,
> > John
> >
> 
> FYI, I have begun to test kernel-xen-2.6.18-131.el5virttest9.x86_64.rpm on our
> 8-node 128-CPU Hermes system.  Apologies for the delay, but I lost quite a bit
> of time just getting the system back up as an 8-node RHEL5.2 system.  I do
> finally have that config operational again, and was able to install clalance's
> test kernel.  I am still running the system through its paces, but I can
> confirm that it at least boots up with the full complement of 128 CPUs, which
> is an incremental improvement from our previous limit of 126 CPUs.
> 
> Also FYI, Chris McDermott has managed to borrow some time on an 8-node 192-CPU
> Athena that is located in Kirkland, WA, but has run into issues with getting
> even a non-Xen kernel to boot with all 8 nodes.  Once he gets past those
> hurdles, he'll turn the system over to me to try out clalance's kernel with 192
> CPUs.

Excellent news!  Thanks for starting the testing, it will be really useful.  128 cpus is a good start.  When you get time, can you do some basic testing of starting up various guests, and doing other basic operations on them?  Specifically, I am looking for testing on:

32-bit PV guests*
64-bit PV guests
32-bit HVM guests
64-bit HVM guests

*there is a known problem in the virttest9 kernel for 32-bit PV guests, so I'm pretty sure it won't work for you.  I'll be putting out virttest10 tomorrow or Friday to correct it.

Chris Lalancette
Comment 17 Chris Lalancette 2009-03-11 13:05:28 EDT
Any word on further testing?  The latest virttest kernels here:

http://people.redhat.com/clalance/virttest

should have all of the fixes you need to run both 32-bit and 64-bit PV and HVM guests with a lot of processors.  I would still like to see some basic testing on all of the combinations in comment #15, along with the results of some save/restore testing on the various guests.

Chris Lalancette
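
For reference, the save/restore testing requested above is the basic xm suspend-to-file cycle (a sketch; the guest name 'guest1' is illustrative):

    xm save guest1 /var/tmp/guest1.chkpt
    xm restore /var/tmp/guest1.chkpt

'xm save' suspends the domain and writes its state to the file, and 'xm restore' recreates the domain from that file; running the pair against each guest variant in the matrix above would give the requested coverage.
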
Comment 18 IBM Bug Proxy 2009-03-11 14:51:24 EDT
(In reply to comment #18)
> Any word on further testing?  The latest virttest kernels here:
>
> http://people.redhat.com/clalance/virttest
>
> should have all of the fixes you need to run both 32-bit and 64-bit PV and HVM
> guests with a lot of processors. I would still like to see some basic testing
> on all of the combinations in comment #15, along with the results of some
> save/restore testing on the various guests.

On the -131 test kernel, I have thus far only checked off the two highest items on my list: 1) dom0 stress testing; and 2) re-verified that the many-guests (bz43871) fix is still working.

I did not hit the blktap and/or 'out of memory' limitations I'd run into on RH5.2, but I need to recheck my notes and do some additional testing to identify where we top out on RH5.3.

I'm currently chasing some other issues, but will upgrade to the -133 test kernel and try some of your requested test scenarios as soon as I get some free cycles.
Comment 19 IBM Bug Proxy 2009-03-11 15:10:48 EDT
(In reply to comment #16)
>
> Also FYI, Chris McDermott has managed to borrow some time on an 8-node 192-CPU
> Athena that is located in Kirkland, WA, but has run into issues with getting
> even a non-Xen kernel to boot with all 8 nodes.  Once he gets past those
> hurdles, he'll turn the system over to me to try out clalance's kernel with 192
> CPUs.
>
Current status on this front...

Chris had reached the point of being able to boot the 8-node 192-cpu system with bare metal RH5.3, but was still seeing stability issues keeping the system up and merged as an 8-node. He also thinks there were some problems with IRQ and/or mmio resource management.

Before he could finish working through those issues, the Kirkland team needed to take the system back for some unrelated BIOS testing (they are in debug mode as they cannot boot 8-nodes even with Windows right now).  After he gets the system back, Chris will resume bare metal RH5.3 testing with the new BIOS.  Then when the system is stable with bare metal RH5.3, I'll try the clalance test kernel there.
Comment 20 Chris Lalancette 2009-03-11 15:46:19 EDT
(In reply to comment #18)
> (In reply to comment #18)
> > Any word on further testing?  The latest virttest kernels here:
> >
> > http://people.redhat.com/clalance/virttest
> >
> > should have all of the fixes you need to run both 32-bit and 64-bit PV and HVM
> > guests with a lot of processors. I would still like to see some basic testing
> > on all of the combinations in comment #15, along with the results of some
> > save/restore testing on the various guests.
> 
> On the -131 test kernel, I have thus far only checked off the two highest items
> on my list: 1) dom0 stress testing; and 2) re-verified that the many-guests
> (bz43871) fix is still working.

I think you are missing a digit in the above BZ number; what was the number again?

> 
> I did not hit the blktap and/or 'out of memory' limitations I'd run into on
> RH5.2, but I need to recheck my notes and do some additional testing to
> identify where we top out on RH5.3.

Just to be clear, the blktap limitation is the "100 total tapdisks" in the machine limitation (BZ 452650), right?  And what was the problem with the "out of memory" limitation, and/or is there a BZ number for it?

> 
> I'm currently chasing some other issues, but will upgrade to the -133 test
> kernel and try some of your requested test scenarios as soon as I get some free
> cycles.  

OK, great, that will be a huge help.

Thanks!
Chris Lalancette
Comment 21 IBM Bug Proxy 2009-03-13 14:00:45 EDT
(In reply to comment #21)
> (In reply to comment #18)
> >
> > On the -131 test kernel, I have thus far only checked off the two highest items
> > on my list: 1) dom0 stress testing; and 2) re-verified that the many-guests
> > (bz43871) fix is still working.
>
> I think you are missing a digit in the above BZ number; what was the number
> again?

The bug to which I alluded -- bz43871 -- refers to IBM's BZ database.  It is apparently mirrored to Red Hat as RIT176656, but I don't have permissions to look at that.  That RIT is somehow also linked on the Red Hat side to BZ442736, which is where most of the investigation comments are documented.

> > I did not hit the blktap and/or 'out of memory' limitations I'd run into on
> > RH5.2, but I need to recheck my notes and do some additional testing to
> > identify where we top out on RH5.3.
>
> Just to be clear, the blktap limitation is the "100 total tapdisks" in the
> machine limitation (BZ 452650), right?

Yes.

> And what was the problem with the "out
> of memory" limitation, and/or is there a BZ number for it?

The symptom is mentioned here: https://bugzilla.redhat.com/show_bug.cgi?id=442736#c27

But there is no separate BZ yet to track it.  I'll open one if I hit it again.

> > I'm currently chasing some other issues, but will upgrade to the -133 test
> > kernel and try some of your requested test scenarios as soon as I get some free
> > cycles.

FYI, I'm going to continue this testing on a fresh RHEL5.3 install.  Up to now, I've been installing your test kernels into a pre-GA RHEL5.3 release (circa snap6, IIRC).
Comment 22 IBM Bug Proxy 2009-03-17 19:10:56 EDT
I was finally able to get access again to the 8-node x3950 M2 system and ran a quick boot test with the latest Xen kernel from http://people.redhat.com/clalance/virttest. The results looked promising. I will leave the more rigorous testing to James, but at least the H/V booted and detected all of the CPUs. I have inlined the 'xm info' output below and will attach the complete xen boot log as soon as I can capture it.

host                   : localhost.localdomain
release                : 2.6.18-134.el5virttest12xen
version                : #1 SMP Thu Mar 12 05:42:22 EDT 2009
machine                : x86_64
nr_cpus                : 192
nr_nodes               : 8
sockets_per_node       : 4
cores_per_socket       : 6
threads_per_core       : 1
cpu_mhz                : 2132
hw_caps                : bfebfbff:20100800:00000000:00000140:000ce33d:00000000:00000001
total_memory           : 38910
free_memory            : 33994
node_to_cpu            : node0:0-23
                         node1:24-47
                         node2:48-71
                         node3:72-95
                         node4:96-119
                         node5:120-143
                         node6:144-167
                         node7:168-191
xen_major              : 3
xen_minor              : 1
xen_extra              : .2-134.el5virtt
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
cc_compile_by          : mockbuild
cc_compile_domain      : redhat.com
cc_compile_date        : Thu Mar 12 05:21:59 EDT 2009
xend_config_format     : 2
Comment 23 John Jarvis 2009-03-24 14:59:40 EDT
This enhancement request was evaluated by the full Red Hat Enterprise Linux team for inclusion in a Red Hat Enterprise Linux minor release. As a result of this evaluation, Red Hat has tentatively approved inclusion of this feature in the next Red Hat Enterprise Linux Update minor release. While it is a goal to include this enhancement in the next minor release of Red Hat Enterprise Linux, the enhancement is not yet committed for inclusion in the next minor release pending the next phase of actual code integration and successful Red Hat and partner testing.
Comment 24 IBM Bug Proxy 2009-03-25 17:10:49 EDT
------- Comment From jmtt@us.ibm.com 2009-03-25 17:03 EDT-------
Chris McD handed the 8-node 192-CPU Athena to me for some additional testing with the clalance -134 kernel.  There are some obstacles to overcome due to the machine being outside of our normal lab infrastructure, but here are some preliminary results:

As Chris McD noted, the clalance -134 kernel boots all 192 CPUs.  I verified that this is an improvement over the standard -128 xen kernel, which spontaneously reboots during boot attempts.

I was able to rerun the many-guests test case successfully -- these were ram-based 64-bit PV guests as described in https://bugzilla.redhat.com/show_bug.cgi?id=442736#c27.  As expected, I hit the known blktap device limit at the 101st guest (BZ 452650).  The previously reported "out of memory" issue is no longer being seen.

That's all I have thus far for results.  I will next be trying to collect clalance's requested guest scenarios:

32-bit PV guests*
64-bit PV guests
32-bit HVM guests
64-bit HVM guests

We have only about a week's time of availability on the 192 CPU machine, and that week must be shared across several large system activities besides this RH5.3 testing. Anything that we can't get done on the 192-CPU box will get moved to our 128-CPU system, so it would be good to know if there is any prioritization and/or combos of guests that are most important to test at 192 CPUs.
Comment 25 Chris Lalancette 2009-03-25 17:27:38 EDT
The most important thing for me would be to make sure that 64-bit PV guests work with up to 32 cpus (which it seems like you've already done).  Secondary to that, I would also like to see how 32-bit PV guests operate with 32 cpus on this setup.  Priority number 3 would be seeing 64-bit HVM guests in action (with as many CPUs as will work), and finally 32-bit HVM guest testing would be good (again, with as many CPUs as will work).

Chris Lalancette
Comment 26 IBM Bug Proxy 2009-03-25 17:30:24 EDT
------- Comment From jmtt@us.ibm.com 2009-03-25 17:20 EDT-------
(In reply to comment #24)
> I have inlined the 'xm info' output below and will
> attach the complete xen boot log as soon as I can capture it.

This refers to the fact that 'xm dmesg' output is currently truncated such that we can only see the end of the boot sequence.  True, we know the result was a success.  Nevertheless, it would be nice to capture the entire trace to have as a reference for comparison should something go south on us later on.  We normally capture this info via the serial port, but we have no serial port access to the remotely located 192-CPU system.

We had unsuccessfully been attempting to expand the printk ring buffer with the log_buf_len=size bootstring that works with the bare metal kernel.  But I've been advised that the ring buffer is hard-wired in the xen kernel.  I sent clalance an email requesting a new kernel built with those parameters enlarged.  If said kernel can be provided within the week that we have left on the 192-CPU machine, we'll capture and attach the full boot log here for posterity.  I view that to be a nice-to-have, however, not a show stopper.
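
For reference, the bare-metal bootstring mentioned above is just an extra parameter on the kernel line in grub.conf (a sketch; the kernel version and root device are illustrative):

    kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 log_buf_len=2M

As noted, this only enlarges the printk ring buffer of a bare-metal (or dom0) Linux kernel; the hypervisor's own console ring in this Xen kernel is sized at compile time, which is why a specially built kernel was requested instead.
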
Comment 27 IBM Bug Proxy 2009-03-28 01:00:15 EDT
------- Comment From jmtt@us.ibm.com 2009-03-28 00:57 EDT-------
(In reply to comment #28)
> The most important thing for me would be to make sure that 64-bit PV guests
> work with up to 32 cpus (which it seems like you've already done).

Actually, my original 64-bit PV guests were 4 CPUs each, so I adjusted that config file to 32 CPUs.  Just for grins, I also tried for 33 CPUs, but got an "Error: (22, 'Invalid argument')" in response.

> Secondary
> to that, I would also like to see how 32-bit PV guests operate with 32 cpus on
> this setup.

Alongside six of these 32-CPU, 64-bit ramdisk guests, I also installed and created two 32-CPU rhel4.7 32-bit PV guests.  All the guests stayed up and seemed operational, though building a kernel was pretty slow (the system is short on disk space, so I created the guest images with sparse files).  'xm vcpu-list' showed a pretty uniform distribution of VCPUs across the physical CPUs, with dedicated CPUs assigned to dom0.

> Priority number 3 would be seeing 64-bit HVM guests in action
> (with as many CPUs as will work), and finally 32-bit HVM guest testing would be
> good (again, with as many CPUs as will work).

The system is going to be shifted to some of the other large system tasks for now, but hopefully, I'll have time to try some HVM guests before we lose it for good.
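
For reference, the config adjustment described above is a one-line change in the guest's xm config file (a sketch; the name and memory values are illustrative):

    name   = "pv64-ramdisk-1"
    memory = 1024
    vcpus  = 32    # was 4; 33 fails with "Error: (22, 'Invalid argument')"

    xm create pv64-ramdisk-1.cfg
    xm vcpu-list pv64-ramdisk-1

The 32-vcpu ceiling matches the per-domain limit noted in comment 2, and 'xm vcpu-list' is the command used above to check how the VCPUs spread across the physical CPUs.
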
Comment 28 Chris Lalancette 2009-04-08 04:19:43 EDT
Sorry for the delay.  Thanks for the testing, that tells us a lot.  So, at least the two cases I was most worried about seem to work, which is good.  I would still like to see some HVM testing, just to make sure things are still sane there.  I've done some of that testing on my own, but of course the biggest machine I have locally only has 8 cores :).  Anyway, keep us updated on the testing coverage.

Thanks,
Chris Lalancette
Comment 29 Don Zickus 2009-04-20 13:11:25 EDT
in kernel-2.6.18-140.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 31 IBM Bug Proxy 2009-04-20 20:10:37 EDT
------- Comment From lcm@us.ibm.com 2009-04-20 20:02 EDT-------
(In reply to comment #31)
> in kernel-2.6.18-140.el5
> You can download this test kernel from http://people.redhat.com/dzickus/el5
>

Our 192 CPU system is currently not available for testing. We have already verified the Xen patches with Chris Lalancette's test kernel, and I'm hoping we will have access to the 192 CPU system eventually in order to test the official RHEL 5.4 kernel.
Comment 32 Chris Ward 2009-06-14 19:19:26 EDT
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~

RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should
be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner!

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!
Comment 33 Chris Ward 2009-07-03 14:18:48 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.
Comment 34 Chris Ward 2009-07-10 15:09:09 EDT
~~ Attention Partners - RHEL 5.4 Snapshot 1 Released! ~~

RHEL 5.4 Snapshot 1 has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular request. Please test and report back your results here, at your earliest convenience. The RHEL 5.4 exception freeze is quickly approaching.

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. 

Further questions can be directed to your Red Hat Partner Manager or other appropriate customer representative.
Comment 35 IBM Bug Proxy 2009-07-16 21:00:49 EDT
------- Comment From jmtt@us.ibm.com 2009-07-16 20:59 EDT-------
(In reply to comment #36)
> ~~ Attention - RHEL 5.4 Beta Released! ~~
>
I don't currently have access to a 192-cpu system, but FYI, the RH5.4 beta (-155) xen kernel boots OK on our 128-cpu 8-node x460 system.

I confirmed that virt-manager plays well with Xen alongside KVM.

But when I tried to create 100 ramdisk-based, 4-cpu guests, I encountered some problems.  Guest creation seemed very slow, and around the 28th guest the system hung -- all active windows, including the serial port and remote console, became unresponsive; and when I checked in the lab, I also noticed that the keyboards (I had both a PS/2 and a USB attached) were dead.  I'll investigate this further after loading the recently released snap2 drop.
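
For reference, a many-guests run like this is typically driven by a simple loop over pre-generated config files (a sketch; the config file naming is illustrative):

    for i in $(seq 1 100); do xm create /etc/xen/ramguest-${i}.cfg; done

Watching 'xm list' from a second terminal shows how far the loop gets; in the run described above it stalled around the 28th guest.
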
Comment 36 Chris Lalancette 2009-07-17 02:46:44 EDT
(In reply to comment #35)
> I don't currently have access to a 192-cpu system, but FYI, the RH5.4 beta
> (-155) xen kernel boots OK on our 128-cpu 8-node x460 system.
> 
> I confirmed that virt-manager plays well with Xen alongside KVM.
> 
> But when I tried to create 100 ramdisk-based, 4-cpu guests, I encountered some
> problems.  Guest creation seemed very slow, and around the 28th guest the
> system hung -- all active windows, including the serial port and remote console,
> became unresponsive; and when I checked in the lab, I also noticed that the
> keyboards (I had both a PS/2 and a USB attached) were dead.  I'll investigate this
> further after loading the recently released snap2 drop.  

OK, thanks for testing.  Just to be clear, when you say you created 100 ramdisk-based guests, you were doing this on RHEL-5.4 Xen, not KVM, correct?

Either way, if you can reproduce the issue with snap2, please open another bugzilla about it, and include details of the guest configuration, host configuration, etc.

Thanks,
Chris Lalancette
Comment 37 IBM Bug Proxy 2009-07-17 12:31:52 EDT
------- Comment From jmtt@us.ibm.com 2009-07-17 12:21 EDT-------
(In reply to comment #42)
> OK, thanks for testing.  Just to be clear, when you say you created 100
> ramdisk-based guests, you were doing this on RHEL-5.4 Xen, not KVM, correct?

Correct -- these were Xen guests.

> Either way, if you can reproduce the issue with snap2, please open another
> bugzilla about it, and include details of the guest configuration, host
> configuration, etc.

OK, will do.
Comment 38 Jan Tluka 2009-07-31 10:05:24 EDT
Confirmed the patch is in -160.el5 kernel. Adding SanityOnly.
Comment 39 Chris Ward 2009-08-03 11:45:44 EDT
~~ Attention Partners - RHEL 5.4 Snapshot 5 Released! ~~

RHEL 5.4 Snapshot 5 is the FINAL snapshot to be released before RC. It has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular issue. Please test and report back your results here, at your earliest convenience.

If you encounter any issues while testing Beta, please describe the 
issues you have encountered and set the bug into NEED_INFO. If you 
encounter new issues, please clone this bug to open a new issue and 
request it be reviewed for inclusion in RHEL 5.4 or a later update, if it 
is not of urgent severity. If it is urgent, escalate the issue to your partner manager as soon as possible. There is /very/ little time left to get additional code into 5.4 before GA.

Partners, after you have verified, do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. 

Further questions can be directed to your Red Hat Partner Manager or other 
appropriate customer representative.
Comment 40 John Jarvis 2009-08-04 15:25:04 EDT
IBM, can you please verify the 192 cpu support?
Comment 41 Chris McDermott 2009-08-04 17:01:27 EDT
(In reply to comment #40)
> IBM, can you please verify the 192 cpu support?  

We no longer have access to a 192 CPU system (8-node x3950 M2). The system we had remote access to was dismantled and shipped to a customer. The largest configuration we can test at this point in time is 128 CPUs (8-node x3950 M1). James will have some 128-CPU results to post soon.
Comment 42 IBM Bug Proxy 2009-08-07 21:40:51 EDT
------- Comment From jmtt@us.ibm.com 2009-08-07 21:34 EDT-------
(In reply to comment #47)
> (In reply to comment #40)
> > IBM, can you please verify the 192 cpu support?
> We no longer have access to a 192 CPU system (8-node x3950 M2). The system we
> had remote access to was dismantled and shipped to a customer. The largest
> configuration we can test at this point in time is 128 CPUs (8-node x3950 M1).
> James will have some 128-CPU results to post soon.

In general, RHEL5.4 Xen support on our 128-cpu system seems to have regressed from RHEL5.3.  Performance is very sluggish, and it's difficult to create and run guests.  BZ55206/RIT325453 and BZ55360 (cannot create/boot a RHEL4.7 32-bit FV guest -- not yet mirrored) are the two worst quantifiable regressions that I've noticed thus far.  The only thing that seems not to have regressed is the dom0 stress test, which still passes.
Comment 43 Chris Ward 2009-08-10 07:13:38 EDT
IBM, please file specific bugs per each issue encountered as soon as possible so we can look to fix them for RHEL 5.5. Thanks for your testing feedback.
Comment 45 errata-xmlrpc 2009-09-02 04:19:07 EDT
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
Comment 46 IBM Bug Proxy 2011-02-22 05:12:56 EST
------- Comment From prem.karat@linux.vnet.ibm.com 2011-02-22 05:08 EDT-------
(In reply to comment #50)
> An advisory has been issued which should help the problem
> described in this bug report. This report is therefore being
> closed with a resolution of ERRATA. For more information
> on therefore solution and/or where to find the updated files,
> please follow the link below. You may reopen this bug report
> if the solution does not work for you.
>
> http://rhn.redhat.com/errata/RHSA-2009-1243.html

***Reviewed as a part of clean up activity********

Closing this one out as per the last comment

Cheers,
Prem
