Bug 607650
| Field | Value |
|---|---|
| Summary | KVM uses wrong permissions for large guest pages |
| Product | Red Hat Enterprise Linux 6 |
| Reporter | Martin Banas <mbanas> |
| Component | kernel |
| Assignee | Karen Noel <knoel> |
| Status | CLOSED CURRENTRELEASE |
| QA Contact | Virtualization Bugs <virt-bugs> |
| Severity | high |
| Docs Contact | |
| Priority | high |
| Version | 6.0 |
| CC | aarcange, amit.shah, bmarson, ddumas, dmalcolm, ebenes, emcnabb, ghacker, hdegoede, jarod, jbrier, jclift, jcm, jokajak, jpirko, jstodola, justin, kchamart, knoel, lihuang, liko, llim, lwang, lwoodman, maier, mbanas, michen, mishu, msauton, mtosatti, mvadkert, pholica, qcai, qwan, riel, rjones, shalli.vcgfdt, shuang, sushil.singh, syeghiay, tao, tburke |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | Bug Fix |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| | 615225 (view as bug list) |
| Environment | |
| Last Closed | 2010-11-11 15:44:09 UTC |
| Type | --- |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 582286, 599016, 615225 |
| Attachments | |
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

I have no clue; let's debug it using all the steps below:
1) use updates=http://akozumpl.fedorapeople.org/bz607650.img. It should give us a tiny bit more information.
2) use the nokill option on the kernel command line.
3) do not reboot the machine.
4) call me over after a crash so we can take a look at the installation computer while it's still running.
5) try to secure the files in /mnt/sysimage/root. They should contain logs that will tell us what package is currently being installed---maybe it's one of them.
6) BTW, is it possible that one of the QE tools running (or the QE engineer himself) sent anaconda the SIGUSR1 signal?
Thanks. Ales

Hi Ales, I'm going to use the updates.img today; I'll tell you when I have something. Yes, yesterday before calling you we sent the SIGUSR1 signal to anaconda to get tb_ logs :))

Created attachment 426773 [details]
traceback screenshot
I got a traceback while using updates.img. Can you provide another updates.img?
Thanks
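(Aside on the signal discussion above: reporting which signal killed a process, as later added to anaconda in this thread, comes down to decoding the child's wait status. A minimal, hypothetical Python sketch — illustrative only, not anaconda's actual code:)

```python
import os
import signal

# Fork a child that dies from SIGABRT (signal 6), the same signal raised
# by the abort() in glibc seen later in this bug, then decode the wait
# status in the parent, init-style.
pid = os.fork()
if pid == 0:
    os.abort()  # child: terminate itself via SIGABRT

_, status = os.waitpid(pid, 0)
if os.WIFSIGNALED(status):
    print("child died after receiving signal %d" % os.WTERMSIG(status))
```

On Linux this prints `child died after receiving signal 6`, mirroring the "Anaconda died after receiving signal 6" reports below.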
New update posted at the same location.

We've seen this kind of crash so far in:
* enablefilesystems
* installpackages
* right at the start of anaconda after udevadm is called

If anaconda is always the same (that is, top of the current beta branch), I can reproduce this when I use kernel+images from:
* releng 0622.1 (kernel 2.6.32-37)
* releng 0621.0 (kernel 2.6.32-37)
* releng 0617.0 (kernel 2.6.32-36)

I have not been able to reproduce this with:
* releng 0603.1 (kernel 2.6.32-33)
* nightly 0610.n (kernel 2.6.32-33)

Once again: the stage2 anaconda version is always the same and we haven't changed anything in stage1 since:

commit c58efd1e9d0971ad1ddd155be4fc930006af7a5c
Author: Chris Lumens <clumens>
Date: Fri May 28 15:57:11 2010 -0400

Also reproducible on:
* nightly 0628.n.3 (kernel 2.6.32-37)

I discovered yesterday that the crashes happen because a routine deep down in Python C code (possibly even glibc) calls abort() in the anaconda process upon seeing a corrupted memory structure.

Note: I don't know of anyone (the Brno RTT or myself) who has seen this on anything other than an x86_64 qemu-kvm virtual machine. Also see bug 609071.

All the kvm virtual machines passed a 1-pass memtest without any errors. So did the kvm host.

(In reply to comment #6)
> We've seen this kind of crash so far in:
> * enablefilesystems
> * installpackages
> * right at the start of anaconda after udevadm is called

It might not be the same kind of crash; however, Ales' patch to add a dump on anaconda crash might also help debugging bug 525804 and is thus a good thing.

Is there any swap in use, especially on the host? Do you run KSM? Thanks!

Hi Andrea,

The host:
[root@cobra03 ~]# cat /proc/swaps
Filename    Type       Size     Used   Priority
/dev/dm-1   partition  6160376  13100  -1
[root@cobra03 ~]# cat /sys/kernel/mm/ksm/pages_sharing
246676

IOW: we didn't set anything special; it's a rhel6 machine with qemu-kvm version 0.12.1.2.
The guest: In anaconda, we don't use swap from the start, but it is used once disks are mounted. At the moment most of the crashes appear, the disks are mounted. I have seen it crash before that, though.

*** Bug 609071 has been marked as a duplicate of this bug. ***

For now I'd like you to test this kernel (ideally both as host and guest, but you can start testing it on the host), to know if the problem goes away with it (as long as it wasn't reproducible with transparent hugepage disabled). If swapping wasn't happening (at least on the host), I doubt it will help though. http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2560476

(In reply to comment #16)
> For now I'd like you to test this kernel (ideally both as host and guest, but
> you can start testing it on the host), to know if the problem goes away
> with it (as long as it wasn't reproducible with
> transparent hugepage disabled).

Andrea, I didn't try with hugepage disabled. How can it be set, and where---host or guest? Thanks. Ales

Oh wait, it's been moved to the kernel component; sorry for changing the status. Also, in that case, I guess the kernel QE can take care of any additional testing.

You tried with transparent hugepage disabled because you tested it on older kernels and it worked. Transparent hugepage has been enabled only on more recent kernels. Can you just load the kernel at the above link on the host and see if it happens again? Hopefully it doesn't take too much time to be sure it's not reproducible anymore!

Hi Andrea, our kvm server is a production machine used by my entire team and I can't ad hoc install a new (experimental) kernel on it and reset it. Ales

Andrea, to be clear: the various kernel versions mentioned in comment #7 were tested on the guest side. No changes were made on the host side during these tests. So to be sure: do you believe that enabling transparent huge page support inside the guest can cause problems on the host side? Or was there a misunderstanding that we were testing with different kernels on the host side? Regards, Hans

I just tried and I am unable to reproduce this with kernel 2.6.32-35.el6.

This could also be useful: while testing the 0630.n.0 compose I found that the best way to reproduce this is using the text installer.

*** Bug 606700 has been marked as a duplicate of this bug. ***

To comment #22 and comment #23: then we can test that kernel in the guest. But if this was only the guest, I have a hard time seeing how this could be related to that patch. Still, if that patch is hiding a race condition in the VM that doesn't only trigger with the KSM copy, it's worth testing. THP enabled triggers stuff in the VM that wouldn't normally trigger without THP, but those triggers are unrelated to the THP support. So it's worth testing the rpm in the guest, but I'm not optimistic. If EPT is enabled on the host and there are no transparent hugepages on the host, it's unlikely to be a bug in the host. And I assume you don't get any kernel error in the guest (kernel logs) or you would have mentioned it already, and you only get that python error in userland.

The tmp.tar.gz didn't include /tmp/updates/iutil.py. The error in this bug says TRANSLATION_UPDATE_DIR isn't defined; that is a much saner error than what is for example posted here: https://bugzilla.redhat.com/attachment.cgi?id=427650

Looking at the logs, it seems there is no kernel error in the guest, just these userland failures.

Andrea, the logs attached in the bug description are a bit misleading:

09:39:22,327 DEBUG : X server has signalled a successful start.
09:39:22,329 ERROR : Error running /usr/bin/metacity: Interrupted system call

Those two lines only appear because the QA engineer who first encountered the problem sent anaconda USR1 in an attempt to extract a traceback (what he really meant was USR2, thus the error lines instead). Normally, anaconda crashes without any error displayed at all; only the anaconda init process says that the termination was abnormal. I added code into more recent anaconda that also displays what signal killed it (if any) and, more importantly, coredumps. Upon inspecting the coredump I discovered that anaconda is abort()ed by a glibc routine handling a malloc(). If you are interested in seeing the dump, then get in touch with me (the core dump is 84 MB and you'll need the right debug symbols; I can help you with that or give you access to my testing machine). Ales

Thanks for sharing the link. Does the new kernel need to be tested on both the host and the guest?

If you have cat /sys/module/kvm_intel/parameters/ept == Y on the host, then it's hard to tell how hugepages in the guest could trigger bugs in KVM. If EPT is off, then yes, you should try with the latest rhel6 KVM on the host too. If EPT is on, it should be enough to test taskID=2569345 on the guest, considering the bug doesn't trigger with transparent hugepage off.

Good news: I did several installs with kernel-2.6.32-41.el6transhuge on both the kvm host and the guest, and I am fairly confident the problem is gone!

Hello, I hit the bug again: Anaconda died after receiving signal 6. I was installing RHEL6.0-20100707.4, which has kernel-2.6.32-44. The host was RHEL5.5 with kernel 2.6.18-194.

Same thing here: host fedora 12, tested compose the same as in comment #39.

So I've been told on irc that the host crashed too, and that ept is N. So this sounds like a bug in kvm in RHEL 5.5 in dealing with shadow pagetables changing size. Likely a host kvm mmu bug.

Can you verify that all systems where this happened (either corruption in guest or rhel5 host crash) were ept=0 systems?

(In reply to comment #43)
> Can you verify that all systems where this happened (either corruption in guest
> or rhel5 host crash) were ept=0 systems?

Martin, you can check that by doing:
cat /sys/module/kvm_intel/parameters/ept
Thanks.
Ales

Likely 372f84cecff2af0c5a14ebaef9563b1a2e2acfdb in kvm.git. Even more likely, 3be2264b.

No, 3be2264b is only suspect if the host uses large pages as well. How much memory and how many cpus do you assign to a guest to reproduce this?

I always use 1 CPU and 700-1000MB of RAM.

For those testing on RHEL 6 beta hosts, please try https://brewweb.devel.redhat.com/taskinfo?taskID=2587538. Others, place your orders here.

Hi Ales, there's no such file on a RHEL5 host: /sys/module/kvm_intel/parameters/ept

I also reproduced it on my notebook with Fedora 12, where ept=n. I always assign 1 CPU and 1GB of memory to the guest.

*** Bug 613320 has been marked as a duplicate of this bug. ***
*** Bug 610261 has been marked as a duplicate of this bug. ***
*** Bug 610255 has been marked as a duplicate of this bug. ***

Avi, do you want testing done on F13 too? This problem occurs in RHEL 6 beta 2 in KVM on my F13 desktop, so it's easy to test.

Yes. Do you need me to build a test kernel (please say no)?

Thanks Avi (and Andrea). Just tried your kernel from comment #52 on the F13 system here, using this ISO on KVM locally, and it still crashes out Anaconda: ftp://ftp.redhat.com/pub/redhat/rhel/beta/6Server-beta2/x86_64/iso/RHEL6.0-20100622.1-Server-x86_64-DVD1.iso

$ cat /sys/module/kvm_intel/parameters/ept
N
$ rpm -qa | grep kvm
qemu-kvm-0.12.3-8.fc13.x86_64
$

The VM in question was allocated 1024MB ram and 2 virtual CPUs. Physically, it's running on a dual-core box (E3300) with 4GB ram. I can do a screencast of the whole process if you want, using something like RecordMyDesktop (it's pretty simple).

Created attachment 431813 [details]
Screenshot of the crashed-out anaconda.
I now have a reliable reproducer (on the F13 host). Hacked khugepaged/scan_sleep_millisecs = 1 in the guest initrd; the crash is immediate. Yay!

bad: 2.6.33.6-147.fc13.x86_64

Awesome news from comment #62! And you can set it to 0 too, so it won't even schedule outside of cond_resched. In addition to being able to reproduce fast, this is a great hint, as it means we only have to focus on the change from 4k to 2M on the same guest virtual address.

NOTE: see mm/huge_memory.c:collapse_huge_page() and search for pmdp_clear_flush_notify. I am only setting the pmd (regular pmd) to NULL and then flushing the tlb (the tlb flush for the pmd isn't run with invlpg as there was some errata; I'm doing a safer cr3 overwrite inside pmdp_clear_flush_notify with IPIs on all cpus with the active_mm), and I'm not touching the pte! Yet writing zero in the pmd must drop all ptes too. I guess that is what may be going wrong. Later (after writing zero in the regular pmd and writing self to cr3 with IPIs on the relevant cpus, not necessarily current as it runs from a kernel thread) I simply write again on the pmdp with the new huge pmd value with PSE set. So the only thing that can be going wrong is that writing zero in the pmd and flushing the tlb must get rid of all underlying 4k sptes too.

good: kvm.git next (cb7eaecb3389c7fa2490ea1bee8f10cfa5df30d4)
bad: 2.6.35-rc5+ (2f7989e)
good: kvm.git 2b2e379
indeterminate: kvm.git 83e2e42 (probable unrelated kernel issue)
bad: kvm.git cda5dcb

06f334e2b509b4c9f6c4cec7e0e56444a2730922 is the first good commit

commit 06f334e2b509b4c9f6c4cec7e0e56444a2730922
Author: Xiao Guangrong <xiaoguangrong.com>
Date: Wed Jun 30 16:02:45 2010 +0800

    KVM: MMU: fix conflict access permissions in direct sp

    In no-direct mapping, we mark the sp as 'direct' when we map the
    guest's larger page, but its access is encoded from the upper
    page-struct entries and does not include the last mapping, which
    will cause an access conflict.
    For example, we have this mapping:

            [W]
          / PDE1 -> |---|
      P[W]          |   | LPA
          \ PDE2 -> |---|
            [R]

    P has two children, PDE1 and PDE2, and both map the same large page
    (LPA). P's access is WR, PDE1's access is WR, and PDE2's access is RO
    (considering only read-write permissions here).

    When the guest accesses through PDE1, we create a direct sp for LPA;
    the sp's access comes from P and is W, so we mark the ptes W in this
    sp. Then the guest accesses through PDE2; we find LPA's existing
    shadow page (the same one) and mark the ptes RO. So if the guest
    accesses through PDE1 again, an incorrect #PF occurs.

    Fixed by encoding the last mapping's access into the direct shadow
    page.

    Signed-off-by: Xiao Guangrong <xiaoguangrong.com>
    Signed-off-by: Marcelo Tosatti <mtosatti>

Bisect log (inverted, good=bad and vice versa):

# bad: [8dea5648467102184c65d61cf2be6e0fbfa41060] KVM: VMX: fix tlb flush with invalid root
# good: [83e2e428db2c9f40c52f3f7764feec974e322183] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
git bisect start 'HEAD' '83e2e42' 'arch/x86/kvm'
# good: [a63e16c655f9e68d49d6fae4275ffda16b1888b2] KVM: Prevent internal slots from being COWed
git bisect good a63e16c655f9e68d49d6fae4275ffda16b1888b2
# good: [a0a7ccde2fe285f4cbb71eeab3c9b7f7bb68231a] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
git bisect good a0a7ccde2fe285f4cbb71eeab3c9b7f7bb68231a
# bad: [372f84cecff2af0c5a14ebaef9563b1a2e2acfdb] KVM: MMU: fix forgot to flush all vcpu's tlb
git bisect bad 372f84cecff2af0c5a14ebaef9563b1a2e2acfdb
# bad: [06f334e2b509b4c9f6c4cec7e0e56444a2730922] KVM: MMU: fix conflict access permissions in direct sp
git bisect bad 06f334e2b509b4c9f6c4cec7e0e56444a2730922
# good: [52403eac7dadaef462954e0a680149d5d8536fac] KVM: MMU: fix writable sync sp mapping
git bisect good 52403eac7dadaef462954e0a680149d5d8536fac

Andrea, does THP support PROT_READ pages? That is, will it set a pmd with the writeable bit clear?

Patch fixes upstream, so looking good.
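The access conflict described in the commit message above can be modeled outside the kernel. The sketch below is a toy Python model, not KVM code (the name `lookup_direct_sp` and the dict-based cache are invented for illustration): a direct-shadow-page cache keyed only by the guest frame lets a read-only PDE reuse a writable sp, while keying on (frame, access) — the essence of the fix — keeps them separate.

```python
def lookup_direct_sp(cache, gfn, access, access_in_key):
    """Find or create a direct shadow page for guest frame gfn.

    access_in_key=False models the buggy lookup (keyed by gfn only);
    access_in_key=True models the fix, which encodes the last mapping's
    access permissions into the shadow page's identity."""
    key = (gfn, access) if access_in_key else gfn
    if key not in cache:
        cache[key] = {"gfn": gfn, "access": access}
    return cache[key]

LPA = 42  # guest frame of the large page mapped by both PDE1 and PDE2

# Buggy behaviour: PDE2 (read-only) finds PDE1's writable sp and shares it.
cache = {}
buggy1 = lookup_direct_sp(cache, LPA, "W", access_in_key=False)
buggy2 = lookup_direct_sp(cache, LPA, "R", access_in_key=False)
assert buggy2 is buggy1 and buggy2["access"] == "W"  # RO mapping got W

# Fixed behaviour: each permission set gets its own direct sp.
cache = {}
sp1 = lookup_direct_sp(cache, LPA, "W", access_in_key=True)
sp2 = lookup_direct_sp(cache, LPA, "R", access_in_key=True)
assert sp1 is not sp2
assert sp1["access"] == "W" and sp2["access"] == "R"
```

In the buggy variant the second lookup silently inherits the first mapping's permissions, which is exactly the wrong-permission / spurious-#PF / corruption pattern discussed in this bug.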
Yes, THP supports wrprotected shared anon pages with only PROT_READ set. It's identical to regular anon pages, but huge. But shared anon hugepages are only generated by fork, or if they're set up with mprotect/mmap without PROT_WRITE.

Ok, that explains how the patch fixes the issue (could also be kvm picking up an existing kernel mapping for the new user mapping).

https://brewweb.devel.redhat.com/taskinfo?taskID=2602160
Doesn't end for some reason.

To comment #74: I guess khugepaged was only needed to create more anon hugepages, so that fork would more frequently generate the readonly anon hugepages. Otherwise khugepaged would only work on writable ptes and create writable huge pmds. Not sure I understand how kvm could pick a kernel-mapping huge pmd for the user mapping; even if both sptes point to the same host physical address, they should always be at different guest virtual addresses. fork would instead generate the same guest virtual address. But they would all be readonly if they all point to the same host physical address.

There are multiple failure modes for this:
1. guest kernel maps lowmem using huge pages
2. guest kernel touches page
3. kvm instantiates direct map for huge page with kernel mode access, since the guest huge page is mapped to host small pages
4. guest kernel maps same page to userspace using huge page
5. guest userspace touches page
6. kvm uses existing direct map from step 3 instead of generating a new one
7. guest userspace retries touching the page, #PF because it has kernel permissions

The alternative scenario is a read-only map at step 3 reused for a rw mapping in step 6 (leading to #PF), or a rw mapping at step 3 reused for a ro mapping in step 6 (corruption, what fun).

Oh, direct maps are not indexed by virtual address but by guest physical address.
(And indirect maps are not indexed by virtual address either; instead they are indexed by the guest physical address of the page table used to map.)

I could have translated the source to optimized machine code by hand faster, but the build is complete: https://brewweb.devel.redhat.com/taskinfo?taskID=2602160 Please test (RHEL 6 host).

Good news. That new kernel build in brew seems a lot better. On my Fedora 13 workstation, the RHEL 6 beta 2 DVD now installs without issue. Also installed the kernel on a RHEL 6 beta 2 host itself (with EPT=N), and then installed the RHEL 6 beta 2 DVD multiple times in that. No problems at all.

On the Fedora side of things, any idea how long until the fix makes its way to the public? Wondering if we should put this kernel on a testing page (i.e. someone.fedorapeople.org/kernels/) for people to get early access to test it? Specifically thinking of the guy from BZ #610911 here: https://bugzilla.redhat.com/show_bug.cgi?id=610911

I put an F13 kernel on http://people.redhat.com/akivity/. However I get a silly welcome page instead of a directory listing; perhaps it's a cache thing. Also: shell.devel.redhat.com:~akivity/kernel-2.6.33.6-147.avi.fc13.x86_64.rpm

Thanks Avi. The people.redhat.com server gave me the welcome page greeting too, but shell.devel.redhat.com worked better. Tried the package locally here, but couldn't get X running without matching -devel & -headers packages (to recompile nVidia drivers locally), so not able to test it. Would you be able to put the matching -devel & -headers packages on shell.devel.redhat.com? I'll then test it here, and copy the packages externally for the guy to test with.

-devel and -headers now on shell.devel.

Who's going to backport this to rhel5? Do we need a new bug for that?

BZ #615225 is a clone of this, for RHEL 5: https://bugzilla.redhat.com/show_bug.cgi?id=615225

Thanks Avi.
They're available publicly here for people now: http://justinclift.fedorapeople.org/bz610911/ And updated BZ #610911 to point Scott at them.

*** Bug 616454 has been marked as a duplicate of this bug. ***

1. Tried to install a rhel6 guest 8 times with host kernel kernel-2.6.32-44.2.el6 and transparent hugepage on; new crash was found during guest installation.

steps:
1) Install guest with tree RHEL6.0-20100622.1:
# /usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 4G -smp 4 -uuid `uuidgen` -monitor stdio -rtc base=localtime -usbdevice tablet -drive file=test.qcow2,if=none,format=qcow2,werror=stop,rerror=stop,id=drive-virtio0-0-0,boot=on,cache=none -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virtio0-0-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=20:20:20:56:42:19 -cpu qemu64,+x2apic -vnc :10 -boot n
2) At the anaconda install wizard, select 'Basic-Server' and customize it by adding "Desktop".

2. Loaded the guest with transparent hugepage on for 1h; no crash was found.

for ((;;))
do
    dd if=/dev/urandom of=/test bs=1M count=6000
    rm -rf /test
done

num=$(grep processor /proc/cpuinfo | tail -n1 | awk '{print $NF}')
for cpu in $(seq 0 $num)
do
    taskset -c $cpu yes >/dev/null &
done

(In reply to comment #89)
> 1. Tried to install rhel6 guest for 8 times with host kernel as
> kernel-2.6.32-44.2.el6 and transparent hugepage is on, new crash was found
> during guest installation

Where is the crash info?

*** Bug 612525 has been marked as a duplicate of this bug. ***

(In reply to comment #89)
> 1.
> Tried to install rhel6 guest for 8 times with host kernel as
> kernel-2.6.32-44.2.el6 and transparent hugepage is on, new crash was found
> during guest installation
>
> steps:
> 1) Install guest with tree RHEL6.0-20100622.1:
> # /usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 4G -smp 4 -uuid `uuidgen` -monitor stdio -rtc base=localtime -usbdevice tablet -drive file=test.qcow2,if=none,format=qcow2,werror=stop,rerror=stop,id=drive-virtio0-0-0,boot=on,cache=none -device virtio-blk-pci,drive=drive-virtio0-0-0,id=virtio0-0-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=20:20:20:56:42:19 -cpu qemu64,+x2apic -vnc :10 -boot n
> 2) at the anaconda install wizard, select 'Basic-Server' and customize it by
> adding "Desktop"
>
> 2. load guest with transparent hugepage on for 1h, no crash was found
> for ((;;))
> do
>     dd if=/dev/urandom of=/test bs=1M count=6000
>     rm -rf /test
> done
>
> num=$(grep processor /proc/cpuinfo | tail -n1 | awk '{print $NF}')
> for cpu in $(seq 0 $num)
> do
>     taskset -c $cpu yes >/dev/null &
> done

Sorry, in the first scenario it should be "no crash was found".

Whew. +1

Patch(es) available on kernel-2.6.32-52.el6

*** Bug 612853 has been marked as a duplicate of this bug. ***
*** Bug 618227 has been marked as a duplicate of this bug. ***

I've got a rhel6 kvm guest that does a lot of mock builds. It hadn't managed to get through populating a mock chroot while running 2.6.32-52.el6 until just now, after I set transparent hugepages to 'never'. So methinks there's still a buglet somewhere with transparent hugepage support in guests.

(In reply to comment #108)
> I've got a rhel6 kvm guest that does a lot of mock builds. It hadn't managed to
> get through populating a mock chroot while running 2.6.32-52.el6 until just
> now, after I set transparent hugepages to 'never'. So methinks there's still a
> buglet somewhere with transparent hugepage support in guests.
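(The knobs used throughout this thread are plain sysfs toggles. The paths below are the upstream ones and are shown as a hedged sketch; note that RHEL 6 kernels exposed THP under /sys/kernel/mm/redhat_transparent_hugepage instead, so adjust for the kernel at hand. All commands need root.)

```shell
# Disable transparent hugepages in the guest (the workaround mentioned above):
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Make khugepaged scan continuously -- the trick used earlier in this
# thread to turn the crash into an immediate reproducer:
echo 1 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs

# Check whether EPT is enabled on an Intel host; if the file is missing,
# the kvm_intel module is not loaded or the kernel is too old (e.g. RHEL 5):
cat /sys/module/kvm_intel/parameters/ept
```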
What's on your host? To fix the bug, 2.6.32-52.el6 needs to be on the host, not the guest. (If the host is Fedora, try http://people.redhat.com/akivity/kernel-2.6.33.6-147.avi.fc13.x86_64.rpm as the host kernel).

(In reply to comment #110)
> What's on your host? To fix the bug, 2.6.32-52.el6 needs to be on the host,
> not the guest. (If the host is Fedora, try
> http://people.redhat.com/akivity/kernel-2.6.33.6-147.avi.fc13.x86_64.rpm as the
> host kernel).

Ah, didn't realize that. The host is indeed Fedora, but Fedora 12, kernel 2.6.32.16-141.fc12.x86_64. Can you post the patch you've added to that F13 kernel somewhere? For this particular system, I'd prefer to just patch atop the latest F12 kernel for now.

Never mind, found it.

Patch added to 2.6.32.16-153.fc12. Chuck is adding another few things to the f12 tree, will then tag and build for us laggards not on f13 (or 14) yet. ;)

*** Bug 596517 has been marked as a duplicate of this bug. ***

Local 2.6.32.16-153.fc12 build on my Fedora 12 host with a rhel6 guest, transparent hugepages re-enabled in the guest, and things do indeed finally seem to be stable; made it through multiple mock builds last night without incident.

*** Bug 617204 has been marked as a duplicate of this bug. ***
*** Bug 610227 has been marked as a duplicate of this bug. ***
*** Bug 615102 has been marked as a duplicate of this bug. ***
*** Bug 619017 has been marked as a duplicate of this bug. ***
*** Bug 613917 has been marked as a duplicate of this bug. ***
*** Bug 612627 has been marked as a duplicate of this bug.
***

I think I'm hitting this on RHEV; can someone help me confirm? Should we be expecting this on this version of RHEL 6: http://download.devel.redhat.com/rel-eng/RHEL6.0-RC-4/6.0/Server/x86_64/os/

Hypervisor is:
[root@rhevh-4 ~]# uname -a
Linux rhevh-4.gsslab.rdu.redhat.com 2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:17:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@rhevh-4 ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 5.5-2.2 (4.2)

qemu-kvm process:
9419 ? Sl 43:17 /usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate 2010-10-23T13:15:46 -name rhel6 -smp 1,cores=1 -k en-us -m 1024 -boot nc -net nic,vlan=1,macaddr=00:1a:4a:0a:39:0d,model=virtio -net tap,vlan=1,ifname=virtio_12_1,script=no -drive file=/rhev/data-center/b2252e5b-70b9-428c-bd5e-474008b44982/7f888454-f103-4af3-b3ea-29e027c9d638/images/619fecbc-0b63-4fa8-834c-a741953f1865/ce4d01a4-04cc-498c-970b-41d200451226,media=disk,if=virtio,cache=off,serial=a8-834c-a741953f1865,boot=on,format=raw,werror=stop -pidfile /var/vdsm/50b70f81-35b9-4df6-a23d-5628d983ee83.pid -soundhw ac97 -spice sslpassword=,sslciphersuite=DEFAULT,sslcert=/var/vdsm/ts/certs/vdsmcert.pem,sslkey=/var/vdsm/ts/keys/vdsmkey.pem,ssldhfile=/var/vdsm/ts/keys/dh.pem,sslcafile=/var/vdsm/ts/certs/cacert.pem,host=0,secure-channels=main+inputs,ic=on,sport=5888,port=5912 -qxl 1 -cpu qemu64,+sse2,+cx16,+ssse3,+sse4.1 -M rhel5.5.0 -notify all -balloon none -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=5.5-2.2-4.2,serial=FF282989-953E-36C5-80A6-7CB9E0653068_00:1a:64:21:74:6a,uuid=50b70f81-35b9-4df6-a23d-5628d983ee83 -vmchannel di:0200,unix:/var/vdsm/50b70f81-35b9-4df6-a23d-5628d983ee83.guest.socket,server -monitor unix:/var/vdsm/50b70f81-35b9-4df6-a23d-5628d983ee83.monitor.socket,server

I'm attaching a screenshot of the error from the guest console and a tarball of /tmp from inside the guest.

Created attachment 455260 [details]
screenshot of rhel6 guest console showing python exception
Created attachment 455261 [details]
/tmp from inside rhel6 rc4/rc2 guest after python exception
That does look similar to the bug as it first cropped up on RHEL 6 beta 2 hosts with RHEL 6 beta 2 guests. As a thought, that server is running an old kernel from the RHEL 5.5 series:

Linux rhevh-4.gsslab.rdu.redhat.com 2.6.18-194.3.1.el5 #1 SMP ...

The latest is 2.6.18-194.17.1:

Linux localhost.localdomain 2.6.18-194.17.1.el5 #1 SMP ...

Looking at the release date of the older kernel, it was from before this bug was known about and fixed. Are you able to update the host server's packages?

*** Bug 629671 has been marked as a duplicate of this bug. ***

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
Created attachment 426588 [details]
logs from /tmp

Description of problem:
Installation is failing from time to time (we can't reproduce it reliably) in KVM at the time packages are being installed. Sometimes it crashes just after the package group is selected, sometimes later. I can't tell whether it's happening just in KVM; it may be the same on bare metal.

Version-Release number of selected component (if applicable):
RHEL6.0-20100622.1, x86_64
anaconda-13.21.50-9.el6

How reproducible:
Sometimes, without any good log files.

Steps to Reproduce:
1. Start RHEL6 installation
2. Proceed to stage2 (select any installation source)
3. Leave all options default, proceed to package selection
4. Just click on next and wait to see if anaconda crashes

Actual results:
Installer crashed.

Expected results:
anaconda should be able to finish installation every time.

Additional info: