Bug 483279
Summary: | Starting 5th domU causes dom0 to reboot on RHEL 5.3 AP i386 | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Thomas Cameron <tcameron>
Component: | kernel-xen | Assignee: | Chris Lalancette <clalance>
Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | high | Docs Contact: |
Priority: | low | |
Version: | 5.3 | CC: | clalance, rjones, sputhenp, syeghiay, xen-maint
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | i686 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2009-04-10 07:33:58 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |

Attachments:
Created attachment 330493 [details]
xml file for the guests - all are identically configured
I think this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=454285 although in this case, the dom0 actually rebooted instead of just throwing errors in /var/log/messages. I changed "kernel /xen.gz-2.6.18-128.el5" to "kernel /xen.gz-2.6.18-128.el5 dom0_mem=1024MB" in grub.conf and I was able to boot up 6 total domUs with no issues.

Hm, I don't really think it's the same, though. While I (and plenty of others) have run into the "Memory squeeze in netback driver" many times before, it's never caused a reboot before. My guess is that there are 2 separate issues here, and the dom0_mem=1024M is working around both of them somehow. I have 2 requests for testing, if you can:

1) Try out the kernel at http://people.redhat.com/clalance/virttest; it has a patch for the Memory squeeze problem that may or may not help.
2) Either get a serial console output, or a core-dump (via kdump) when the dom0 crashes. That way we can at least see the stack trace that is causing the crash.

Thanks,
Chris Lalancette

It appears that kdump is not available for dom0.

http://kbase.redhat.com/faq/docs/DOC-10126

I will have to pick up a serial cable next time I'm at Fry's, I don't have one now.

I installed the test kernel and removed the "dom0_mem=1024MB" entry from grub.conf. When I started the 4th domU I got:

    Feb 2 14:37:11 molly kernel: xenbr0: topology change detected, propagating
    Feb 2 14:37:11 molly kernel: xenbr0: port 5(vif3.0) entering forwarding state
    Feb 2 14:37:29 molly kernel: device vif4.0 entered promiscuous mode
    Feb 2 14:37:30 molly kernel: xen_net: Memory squeeze in netback driver.
    Feb 2 14:37:30 molly last message repeated 2 times
    Feb 2 14:37:31 molly kernel: blkback: ring-ref 8, event-channel 8, protocol 1 (x86_32-abi)
    Feb 2 14:37:31 molly kernel: xen_net: Memory squeeze in netback driver.
    Feb 2 14:37:38 molly last message repeated 7 times
    Feb 2 14:37:40 molly kernel: xenbr0: topology change detected, propagating
    Feb 2 14:37:40 molly kernel: xenbr0: port 6(vif4.0) entering forwarding state
    Feb 2 14:37:40 molly kernel: printk: 1 messages suppressed.
    Feb 2 14:37:40 molly kernel: xen_net: Memory squeeze in netback driver.
    Feb 2 14:37:47 molly kernel: printk: 548 messages suppressed.
    Feb 2 14:37:47 molly kernel: xen_net: Memory squeeze in netback driver.
    Feb 2 14:37:51 molly kernel: printk: 50 messages suppressed.
    Feb 2 14:37:51 molly kernel: xen_net: Memory squeeze in netback driver.
    Feb 2 14:38:00 molly kernel: printk: 68 messages suppressed.
    Feb 2 14:38:00 molly kernel: xen_net: Memory squeeze in netback driver.

I was able to start the 5th and 6th domUs successfully, though.

I wanted to try exactly what I had done before - kickstarting a 5th domU. Oddly, when I went through the virt-manager interface and tried to kickstart host5 again, it would not allow me to choose a bridged network. None showed up in the drop-down menu.

I shut down my guests, rebooted dom0 and tried again. This time, starting the kickstart of the 5th domU caused dom0 to reboot. dom0 rebooted while I was still in the GUI to create the domU; I had not even started the installation yet.

After reboot, I still do not have the ability to kickstart a guest on the shared physical network. There is no shared network to choose in virt-manager's installation GUI.

I went ahead and kickstarted the 5th domU on the default 192.168.122 network and it did come up. No instances of the "Memory squeeze in netback driver" error during the kickstart of the 5th domU. Once the 5th domU was built, I was able to start the saved 6th domU with no issues.

I am not sure if the test kernel helped. The first time I tried to use it the system did reboot. The second time it seemed to work.
I'd like to try to rebuild all of the guests again but they really need to be on the shared physical network, not the 192.168.122 network.

dom0 *does* support kdump, since 5.1. That kbase article is just wrong. You'll need to add the "crashkernel" parameter to the hypervisor line (the line that has xen.gz on it), and then you'll need to make sure you start the kdump service. After that, it should work.

It doesn't seem like the test kernel made a difference for you, so for now, let's try to stick with the base 5.3 kernel and see what we can figure out there. Hopefully you'll be able to get a successful core dump; once that happens, I can at least look at the trace.

Chris Lalancette

I've rebooted with the crashkernel section in grub.conf but I am still having problems:

    [root@molly ~]# cat /boot/grub/grub.conf
    # grub.conf generated by anaconda
    #
    # Note that you do not have to rerun grub after making changes to this file
    # NOTICE: You have a /boot partition. This means that
    # all kernel and initrd paths are relative to /boot/, eg.
    # root (hd0,0)
    # kernel /vmlinuz-version ro root=/dev/sda2
    # initrd /initrd-version.img
    #boot=/dev/sda
    default=0
    timeout=5
    splashimage=(hd0,0)/grub/splash.xpm.gz
    hiddenmenu
    title Red Hat Enterprise Linux Server (2.6.18-128.el5xen)
            root (hd0,0)
            kernel /xen.gz-2.6.18-128.el5 crashkernel=128M@16M
            module /vmlinuz-2.6.18-128.el5xen ro root=LABEL=/
            module /initrd-2.6.18-128.el5xen.img
    [root@molly ~]# service kdump propagate
    Using existing keys...
    /root/.ssh/kdump_id_rsa.pub has been added to ~kdump/.ssh/authorized_keys2 on 172.31.100.1
    [root@molly ~]# service kdump restart
    Stopping kdump: [ OK ]
    No kdump kernel image found. [WARNING]
    Tried to locate /boot/vmlinuz-2.6.18-128.el5PAE
    Starting kdump: [FAILED]
    [root@molly ~]# uname -a
    Linux molly.tc.redhat.com 2.6.18-128.el5xen #1 SMP Wed Dec 17 12:22:24 EST 2008 i686 i686 i386 GNU/Linux

Not sure why the kdump service is looking for vmlinuz-2.6.18-128.el5PAE when that is not the kernel I am running.
Thoughts?

Sorry, this is probably important as well:

    [root@molly ~]# grep -v "^#" /etc/kdump.conf
    net kdump.100.1

Oh, right. Yes, you can't kexec *into* a Xen kernel, so the kdump service falls back into the default kernel, which would be PAE. So you need to install the PAE kernel as well, then it can use that.

Chris Lalancette

ok, got it to reboot again kickstarting the 5th domU. What I did was nuke all of the guests and start fresh, installing to the 192.168.122 network. For some reason I can no longer build domUs with the bridged network, the drop-down is blank.

Anyway, I set up netconsole and got it logging on my workstation. When I brought up the 5th domU for kickstart, I got this in the log:

    Feb 2 12:45:51 172.31.100.3 BUG: unable to handle kernel paging request
    Feb 2 12:45:51 172.31.100.3 at virtual address e541f000
    Feb 2 12:45:51 172.31.100.3 printing eip:
    Feb 2 12:45:51 172.31.100.3 c04540e9
    Feb 2 12:45:51 172.31.100.3 29b44000 -> *pde = 00000000:b7873001
    Feb 2 12:45:51 172.31.100.3 28273000 -> *pme = 00000000:3c121067
    Feb 2 12:45:51 172.31.100.3 00121000 -> *pte = 00000000:00000000
    Feb 2 12:45:51 172.31.100.3 Oops: 0002 [#1]
    Feb 2 12:45:51 172.31.100.3 SMP
    Feb 2 12:45:51 172.31.100.3
    Feb 2 12:45:51 172.31.100.3 last sysfs file: /class/net/lo/type
    Feb 2 12:45:51 172.31.100.3 Modules linked in:
    Feb 2 12:45:51 172.31.100.3 xt_physdev
    Feb 2 12:45:51 172.31.100.3 netloop
    Feb 2 12:45:51 172.31.100.3 netbk
    Feb 2 12:45:51 172.31.100.3 blktap
    Feb 2 12:45:51 172.31.100.3 blkbk
    Feb 2 12:45:51 172.31.100.3 ipt_MASQUERADE
    Feb 2 12:45:51 172.31.100.3 iptable_nat
    Feb 2 12:45:51 172.31.100.3 ip_nat
    Feb 2 12:45:51 172.31.100.3 xt_state
    Feb 2 12:45:51 172.31.100.3 ip_conntrack
    Feb 2 12:45:51 172.31.100.3 nfnetlink
    Feb 2 12:45:51 172.31.100.3 ipt_REJECT
    Feb 2 12:45:51 172.31.100.3 xt_tcpudp
    Feb 2 12:45:51 172.31.100.3 iptable_filter
    Feb 2 12:45:51 172.31.100.3 ip_tables
    Feb 2 12:45:51 172.31.100.3 x_tables
    Feb 2 12:45:51 172.31.100.3 bridge
    Feb 2 12:45:51 172.31.100.3 netconsole
    Feb 2 12:45:51 172.31.100.3 autofs4
    Feb 2 12:45:51 172.31.100.3 hidp
    Feb 2 12:45:51 172.31.100.3 rfcomm
    Feb 2 12:45:51 172.31.100.3 l2cap
    Feb 2 12:45:51 172.31.100.3 bluetooth
    Feb 2 12:45:51 172.31.100.3 sunrpc
    Feb 2 12:45:51 172.31.100.3 xfrm_nalgo
    Feb 2 12:45:51 172.31.100.3 crypto_api
    Feb 2 12:45:51 172.31.100.3 dm_multipath
    Feb 2 12:45:51 172.31.100.3 scsi_dh
    Feb 2 12:45:51 172.31.100.3 video
    Feb 2 12:45:51 172.31.100.3 hwmon
    Feb 2 12:45:51 172.31.100.3 backlight
    Feb 2 12:45:51 172.31.100.3 sbs
    Feb 2 12:45:51 172.31.100.3 i2c_ec
    Feb 2 12:45:51 172.31.100.3 button
    Feb 2 12:45:51 172.31.100.3 battery
    Feb 2 12:45:51 172.31.100.3 asus_acpi
    Feb 2 12:45:51 172.31.100.3 ac
    Feb 2 12:45:51 172.31.100.3 lp
    Feb 2 12:45:51 172.31.100.3 sg
    Feb 2 12:45:51 172.31.100.3 parport_pc
    Feb 2 12:45:51 172.31.100.3 i2c_i801
    Feb 2 12:45:51 172.31.100.3 parport
    Feb 2 12:45:51 172.31.100.3 snd_intel8x0
    Feb 2 12:45:51 172.31.100.3 snd_ac97_codec
    Feb 2 12:45:51 172.31.100.3 ac97_bus
    Feb 2 12:45:51 172.31.100.3 snd_seq_dummy
    Feb 2 12:45:51 172.31.100.3 snd_seq_oss
    Feb 2 12:45:51 172.31.100.3 snd_seq_midi_event
    Feb 2 12:45:51 172.31.100.3 snd_seq
    Feb 2 12:45:51 172.31.100.3 snd_seq_device
    Feb 2 12:45:51 172.31.100.3 snd_pcm_oss
    Feb 2 12:45:51 172.31.100.3 ide_cd
    Feb 2 12:45:51 172.31.100.3 i2c_core
    Feb 2 12:45:51 172.31.100.3 serio_raw
    Feb 2 12:45:51 172.31.100.3 snd_mixer_oss
    Feb 2 12:45:51 172.31.100.3
    Feb 2 12:45:51 172.31.100.3 [<c041e393>]
    Feb 2 12:45:51 172.31.100.3 [<c0410a4b>]
    Feb 2 12:45:51 172.31.100.3 [<c061088c>]
    Feb 2 12:45:51 172.31.100.3 [<c0453ece>]
    Feb 2 12:45:51 172.31.100.3 pte_alloc_one+0x11/0x29
    Feb 2 12:45:51 172.31.100.3 [<c041e393>]
    Feb 2 12:45:51 172.31.100.3 [<c0405413>]

The dom0 machine then rebooted.

Ug, that's unfortunate, most of the interesting pieces of the stack trace got truncated. I might be able to squeeze a little bit of information out of this by seeing exactly what eip c04540e9 is; I'll try that tomorrow.
In the meantime, if you get a chance to get a serial cable and get a full dump, that would be best.

Thanks!
Chris Lalancette

I had a similar problem yesterday, starting 4 guests (on starting the 4th guest, the host hard rebooted).

Host is RHEL 5.3 Xen x86_64:

    Linux intel-mb 2.6.18-128.el5xen #1 SMP Wed Dec 17 12:01:40 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

Guests were RHEL 5, RHEL 4, 32 and 64 bits, all PV. It was while I was starting up the install of the fourth one (RHEL 4 32 bit) that the reboot happened.

This looked easily reproducible so ask me if you need more information.

Richard - can you capture a dump? I've been travelling heavily so this has been on my backburner. I still don't have a serial cable to capture anything on the console and kdump isn't working for me.

Unfortunately I cannot reproduce this now, even starting and stopping lots more domains than before. If it reoccurs I'll try to capture a crashdump.

I'm pretty sure this is the same as 479754, so I'm going to close this as a dup. If it turns out to be different, we can re-open.

Chris Lalancette

*** This bug has been marked as a duplicate of bug 479754 ***
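The kdump advice in the comments above (put "crashkernel" on the hypervisor line, i.e. the line that names xen.gz) can be checked mechanically. The following is only a rough sketch: the function name is made up for illustration, and it assumes a GRUB legacy grub.conf laid out like the one pasted in this thread, where the hypervisor is the "kernel" line and the dom0 kernel and initrd are "module" lines.

```python
from typing import List


def hypervisor_lines_missing_crashkernel(grub_conf: str) -> List[str]:
    """Return the xen.gz 'kernel' lines that carry no crashkernel= argument."""
    missing = []
    for raw in grub_conf.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # skip comments such as the anaconda header
        tokens = line.split()
        # Hypervisor line: 'kernel' followed by a path containing xen.gz.
        if len(tokens) >= 2 and tokens[0] == "kernel" and "xen.gz" in tokens[1]:
            if not any(t.startswith("crashkernel=") for t in tokens[2:]):
                missing.append(line)
    return missing


# Stanza copied from the grub.conf shown in this report.
SAMPLE = """\
title Red Hat Enterprise Linux Server (2.6.18-128.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-128.el5 crashkernel=128M@16M
        module /vmlinuz-2.6.18-128.el5xen ro root=LABEL=/
        module /initrd-2.6.18-128.el5xen.img
"""

print(hypervisor_lines_missing_crashkernel(SAMPLE))  # prints []
```

An empty list only says the parameter is present. As noted in the thread, the kdump service also needs a non-Xen kernel (PAE on this host) installed before it will start.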
Created attachment 330492 [details]
sosreport from dom0

Description of problem:
I am trying to set up 6 RHEL 4.6 i386 domUs on a RHEL 5.3 i386 dom0. I am using kickstart to build them. Four domUs built and run successfully, but when I start the 5th domU, dom0 reboots every time. The only thing odd in /var/log/messages is "kernel: xen_net: Memory squeeze in netback driver" several hundred times (suppressed).

The dom0 machine is a Dell Optiplex GX280 with an Intel 2.8GHz processor, 4GB memory and a 250GB SATA drive. I am installing the domUs to 4GB LVM slices /dev/mapper/XenVol-host1, /dev/mapper/XenVol-host2 and so on on dom0. Each domU is set up with 1 virtual CPU and 384MB memory. This should leave plenty of memory for dom0 - even with the memory lost to PCI, dom0 sees 3.2GB memory, so 5 guests (384*5) should still leave over a gig of memory for dom0.

Version-Release number of selected component (if applicable):
xen-3.0.3-80.el5

How reproducible:
Install 4 domUs then start a 5th.

Steps to Reproduce:
1. See above

Actual results:
dom0 reboots spontaneously

Expected results:
5th guest installs and runs.

Additional info:
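The memory estimate in the description can be checked with a short calculation. This is a back-of-the-envelope sketch only: the helper name is hypothetical, 1 GB is taken as 1024 MB, and hypervisor or per-guest overhead is ignored because the report does not quantify it.

```python
def dom0_headroom_mb(dom0_visible_mb: int, guest_mb: int, guest_count: int) -> int:
    """Memory left for dom0 after guest_count guests of guest_mb each."""
    return dom0_visible_mb - guest_mb * guest_count


# 3.2 GB as seen by dom0, rounded to whole MB (assumes 1 GB = 1024 MB).
DOM0_VISIBLE_MB = round(3.2 * 1024)

for count in (5, 6):
    left = dom0_headroom_mb(DOM0_VISIBLE_MB, 384, count)
    print(count, "guests ->", left, "MB left for dom0")
```

Under these assumptions, 5 guests leave about 1357 MB and 6 guests about 973 MB, which matches the "over a gig" estimate for 5 guests.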