Bug 607400
Summary: | UV support: kexec command: extend for large cpu count and memory
---|---
Product: | Red Hat Enterprise Linux 6
Component: | kexec-tools
Version: | 6.0
Status: | CLOSED ERRATA
Severity: | high
Priority: | high
Reporter: | George Beshers <gbeshers>
Assignee: | Cong Wang <amwang>
QA Contact: | Chao Ye <cye>
CC: | cpw, cye, dwa, gbeshers, martinez, phan, qcai, rkhan, syeghiay, tee
Target Milestone: | rc
Target Release: | 6.1
Hardware: | All
OS: | Linux
Fixed In Version: | kexec-tools-2_0_0-172_el6
Doc Type: | Bug Fix
Last Closed: | 2011-05-19 14:15:15 UTC
Bug Depends On: | 619426, 650298
Bug Blocks: | 580566, 645474

Description
George Beshers
2010-06-24 02:14:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

George, per your comment in the description, please verify and update this BZ accordingly. Thanks!

George, please either provide the patch as an attachment or give me the upstream commit IDs. Please don't inline the patch in BZ; it is unusable that way. Also, have you tested it?

Amerigo,
Sorry, our internal bug system doesn't have the attachment capability, so I did a cut-and-paste. We found another problem in the kernel with kdump. I am planning on testing this on a 5 TB system tomorrow (7/21).
George

Amerigo,
I ran across a couple of completely different bugs while testing this. Also, we are making a large (1024-core) system available to Red Hat on Tuesdays. It did not happen this last Tuesday because of a problem booting the system.
George

commit 4b4b2a533e218e287ab4aed25678434ad938309e
Author: Cliff Wickman <cpw>
Date: Wed Jun 16 08:36:09 2010 -0500
    kexec: extend for large cpu count and memory
-----------
commit 26ed909df48ea3db3f7395713a9c68c94d091032
Author: Cliff Wickman <cpw>
Date: Thu Jun 17 11:37:06 2010 -0500
    kexec: Unusable memory range type
-----------

Are the above two commits all that we need? It seems I am still missing some other commit?

Hi Amerigo,
I believe that those are the only two patches we need, although to actually do a dump we can't really dump a full 5 TB. Our suggestion is to set the debug level to 31, which should provide a great deal of useful information if there is a problem in the field with RHEL 6. In any case, SGI is making a large system available to Red Hat this evening until early Wednesday morning. I am hoping to find time in that period to test kdump.
George

George,
Okay, we already use '-d 31' by default now.
I am waiting for your testing result. Thanks!

I built a test package: https://brewweb.devel.redhat.com/taskinfo?taskID=2674836

The makedumpfile command worked with our modified kexec based on 2.0.1. However, the modified kexec did not work. I am currently on my third patch to try to fix the problem.
George

To clarify the situation: I asked another SGI engineer for help with this patch. The patch does work, but against a later version of kexec-tools from upstream. It was my mistake to pass the patch along without personally testing it. I worked this last weekend to try to fix the patch.
George

Thank you for testing the package, and for the clarification. So, which patch(es) from upstream kexec-tools are missing other than the two patches listed in comment #11 above?

I have requested help from another SGI engineer with this and will be careful to test the patched rpm on the 1024-core, 5 TB machine that we make available to Red Hat on a weekly basis.
George

George, I believe Linda's question in comment #20 is still outstanding. Could you please update this BZ with the specific patches the upstream version has vs. RH's? Thanks!

We finally found the problem with kexec-tools and the e820 table: it manifested itself as memory corruption in the running kernel. I am currently cleaning up the patchset; the last patch is upstream.
George

Created attachment 483023
Tar bz2 file of patches and a series file.

Up to a few comment cleanups, this is what was built:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3164707
I have verified that this works on a number of UV systems. The file is a bzip2 tar file of a quilt patches directory.
George

OK, I finally got the tarball. One question: are all these patches upstream? I do appreciate that the attached patches are against the latest RHEL-6 kexec-tools; this would save me much time handling conflicts. Anyway, I will try to see if this is true. :) Thanks.

Hi Amerigo,
I added Cliff Wickman to the CC list.
He indicated that they all were, and I found most of them. A few had been partially applied and, IIRC, there was one I was unsure about because some of the code had been rewritten and moved. Let me know when you are ready to test and I will grab a big system.
George

Thanks, George. I see some problems:

1. Not all commits match the descriptions in your patchset. For example, kexec_segs_ranges is described as "Backport of commit 563ee341d950f2fae0ba6608d70c19eb647ff943 and commit 7b325f8528d230e50a0c3841a3ac587dea2200e2 just for crashdump-x86_64.c which doesn't exist upstream", but neither of those commits matches that patch.

2. For 100823.kcore_header_patch, we probably need to backport my patch instead:

commit 1100580b05e3fdfe648d9be8617d962b11f4b88b
Author: Amerigo Wang <amwang>
Date: Thu Mar 3 00:10:43 2011 +0800
    get the backup area dynamically

Anyway, I will build a kexec-tools package with all of your patches except 100823.kcore_header_patch, plus the backport of 1100580b05e3fdfe648d9be8617d962b11f4b88b, for you to test.

George, please help to test this one: https://brewweb.devel.redhat.com/taskinfo?taskID=3181998
Thanks!

Hmm, please use this one instead: https://brewweb.devel.redhat.com/taskinfo?taskID=3182054

Hi Amerigo,
Interestingly enough, the x86_64 rpm fails as built, but if I rebuild the source rpm on the system I am testing (I was trying to locate the problem), then it does work. Possibly a problem with the Brew root?
George

Oh, maybe. I made the srpm locally and sent it to Brew to build. Anyway, I have taken all the patches. Please try https://brewweb.devel.redhat.com/buildinfo?buildID=159954 to see if this rpm is okay. Thanks.

Seems to be OK, but I haven't tested on a 2-rack system yet.
George

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below.
You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-0736.html