Bug 199944
Summary: | XenU has Kernel panic (xennet?) | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Russell McOrmond <russell>
Component: | xen | Assignee: | Herbert Xu <herbert.xu>
Status: | CLOSED NEXTRELEASE | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Priority: | medium
Version: | 5 | Hardware: | athlon
OS: | Linux | Doc Type: | Bug Fix
Last Closed: | 2007-04-18 11:32:21 UTC | |
CC: | bench, bstein, bugs-redhat, christophe, cpaul, jacob, jan.roehrich, jussi.siponen, katzj, managed, ozone, quentin, quintela, spam, ssnodgra, xen-maint | |
Description
Russell McOrmond
2006-07-24 15:14:30 UTC
I don't know if it is related, but I've experienced problems with programs that were listening on a port no longer accepting connections. Restarting the program allows it to listen again. The program is still running, but the listen() seems to be disconnected. It's not any specific program (it happens with sendmail, Apache, OpenLDAP, Cyrus-IMAPD).

I have taken one of my two Xen machines and backed out to 2.6.17-1.2145_FC5xen0 (and xenU in the user domains) to see if this will help. Since the problem is so intermittent, it is hard to diagnose.

I'm seeing the same problem; although I haven't yet managed to obtain a kernel trace, I'm suffering the same symptoms. The specifics:

* running 2.6.17-1-2157_FC5xenU in DomUs, all running Debian
* running 2.6.17-1-2157_FC5xen0 in Dom0, under FC5
* one domain has iptables set up; the others weren't using it
* this domain is the one which crashes and ends up in a Zombie state - xend.log says this:

[2006-07-29 00:41:53 xend.XendDomainInfo] ERROR (XendDomainInfo:1577) VM (VMNAME) restarting too fast (14.881988 seconds since the last restart). Refusing to restart to avoid loops.

I've downgraded the xen0 kernels to a locally-compiled version (2.6.16.3) with no iptables modules. I'll see how things go; then upgrade to the FC5xenU kernels without iptables and see how that goes.

I attempted to back up to an earlier version of the XenU kernels, as well as ensure that the various TLS/nosegneg issues are dealt with: http://www.flora.ca/status/322

kernel-xenU-2.6.17-1.2145_FC5

This did not fix the problem, and the intermittent problems where XenUs turn into Zombies continue.

I'm not sure that these are useful, but the output of 'xm console' when the XenU switches to a Zombie state is as follows.
----
Fedora Core release 5 (Bordeaux)
Kernel 2.6.17-1.2145_FC5xenU on an i686

newdelhi login: BUG: unable to handle kernel NULL pointer dereference at virtual address 000000ba
printing eip:
e10d21a0
*pde = ma 10fdd067 pa 1021c067
*pte = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
Modules linked in: ipv6 autofs4 xennet ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables dm_mirror dm_mod
CPU: 0
EIP: 0061:[<e10d21a0>] Not tainted VLI
EFLAGS: 00010046 (2.6.17-1.2145_FC5xenU #1)
EIP is at network_tx_buf_gc+0xb7/0x1aa [xennet]
eax: 00000027 ebx: 0000000d ecx: decc0cfc edx: 00000000
esi: 00000001 edi: decc0400 ebp: 0000002a esp: c064dedc
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, threadinfo=c064c000 task=c05ef800)
Stack: <0>decc0cfc 00000000 00000000 00000004 decc0000 00128782 00128783 0012877c
       00000000 decc0488 decc0400 decc0000 e10d30ea df4a0cc0 00000000 00000000
       00000108 c0439ed9 00000108 decc0000 c064df88 c064df88 00000108 c0640800
Call Trace:
 <e10d30ea> netif_int+0x24/0x66 [xennet]
 <c0439ed9> handle_IRQ_event+0x42/0x85
 <c0439fa9> __do_IRQ+0x8d/0xdc
 <c040662a> do_IRQ+0x1a/0x25
 <c0518f4c> evtchn_do_upcall+0x66/0x9f
 <c0404d49> hypervisor_callback+0x3d/0x48
 <c0407a2f> safe_halt+0x79/0x9c
 <c0402bde> xen_idle+0x46/0x4e
 <c0402cfd> cpu_idle+0x94/0xad
 <c0651772> start_kernel+0x346/0x34c
Code: b4 9f 00 09 00 00 50 e8 5a 75 44 df c7 84 9f 00 09 00 00 00 00 00 00 8b 87 f4 00 00 00 89 84 9f f4 00 00 00 89 9f f4 00 00 00 90 <ff> 8d 90 00 00 00 0f 94 c0 83 c4 10 84 c0 74 62 bb 00 e0 ff ff
EIP: [<e10d21a0>] network_tx_buf_gc+0xb7/0x1aa [xennet] SS:ESP 0069:c064dedc
<0>Kernel panic - not syncing: Fatal exception in interrupt
 <c0418394> panic+0x3c/0x188
 <c04057a0> die+0x246/0x27b
 <c040ee98> do_page_fault+0x0/0x70f
 <c040f4a7> do_page_fault+0x60f/0x70f
 <c040ee98> do_page_fault+0x0/0x70f
 <c0404d07> error_code+0x2b/0x30
 <e10d21a0> network_tx_buf_gc+0xb7/0x1aa [xennet]
 <e10d30ea> netif_int+0x24/0x66 [xennet]
 <c0439ed9> handle_IRQ_event+0x42/0x85
 <c0439fa9> __do_IRQ+0x8d/0xdc
 <c040662a> do_IRQ+0x1a/0x25
 <c0518f4c> evtchn_do_upcall+0x66/0x9f
 <c0404d49> hypervisor_callback+0x3d/0x48
 <c0407a2f> safe_halt+0x79/0x9c
 <c0402bde> xen_idle+0x46/0x4e
 <c0402cfd> cpu_idle+0x94/0xad
 <c0651772> start_kernel+0x346/0x34c

[root@westbengal ~]# xm list
Name              ID  Mem(MiB)  VCPUs  State   Time(s)
Domain-0           0       605      2  r-----    829.2
Zombie-newdelhi    7       512      1  ----cd   2844.6
calcutta           5       512      1  -b----   1801.6
[root@westbengal ~]#
----

While the 'calcutta' server is listed as fine, it is also dead. I have to do a reboot of the entire machine.

Is there some new kernel (Xen0 and/or XenU) that I should be testing? It seems that .2145 and .2157 both react the same in this case.

*** Bug 201504 has been marked as a duplicate of this bug. ***

Just an update. I'm yet to reproduce this problem here so that's why it's taking a bit of time to resolve. There's also a bug fix patch that may be relevant which should make it into an FC5 kernel soon that I'd like you guys to test.

Created attachment 134318 [details]
[NET] back: Initialise first fragment properly
This patch should fix the problem. It'll enter the kernels soon.
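
As a rough aid for spotting the Zombie state shown in the `xm list` output earlier in this report, the following minimal Python sketch scans that output for domains whose State column carries the `c` (crashed) or `d` (dying) flag. The sample text is the output quoted in this report with its whitespace collapsed; the function name `find_dead_domains` is hypothetical and not part of any Xen tool. It is only a partial check: as noted above, 'calcutta' was listed as fine while also being dead.

```python
# Sketch only: flag domains that `xm list` reports as crashed or dying.
# SAMPLE is the output quoted in this bug report; find_dead_domains is a
# hypothetical helper, not part of the xen package.
SAMPLE = """\
Name ID Mem(MiB) VCPUs State Time(s)
Domain-0 0 605 2 r----- 829.2
Zombie-newdelhi 7 512 1 ----cd 2844.6
calcutta 5 512 1 -b---- 1801.6
"""


def find_dead_domains(xm_list_output):
    """Return the names of domains whose state flags include 'c' or 'd'."""
    dead = []
    for line in xm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:
            continue  # ignore prompts, blank lines, or unexpected rows
        name, state = fields[0], fields[4]
        if "c" in state or "d" in state:
            dead.append(name)
    return dead


if __name__ == "__main__":
    print(find_dead_domains(SAMPLE))
```

In practice the input would come from running `xm list` on the dom0 rather than from a pasted string.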
I hate to be anxious, but do you know when there might be a kernel we can test? A XenU kernel panic'd late last night and it was down again this morning (same "EIP: [<e10d21a0>] network_tx_buf_gc+0xb7/0x1aa [xennet]" as last night).

I've been told that the merge should be done on Monday or Tuesday.

What's the status on this? (And how would I find out, short of asking this way?) I've been bitten many, many times by a bug that appears to be this one over the weekend.

I updated to the latest kernel package which I assumed would have the fix in it, but it Zombie'd again this morning. It may also be useful to note that once this happens, any attempt to shut down other XenUs stops at the unloading of iptables. It eventually times out with that XenU also being a Zombie.

---
Fedora Core release 5 (Bordeaux)
Kernel 2.6.17-1.2187_FC5xenU on an i686

pune login: BUG: unable to handle kernel NULL pointer dereference at virtual address 000000d1
printing eip:
e10f9206
*pde = ma 15531067 pa 13c93067
*pte = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
Modules linked in: ipv6 autofs4 xennet ip_conntrack_netbios_ns ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables dm_snapshot dm_zero dm_mirror dm_mod
CPU: 0
EIP: 0061:[<e10f9206>] Not tainted VLI
EFLAGS: 00210046 (2.6.17-1.2187_FC5xenU #1)
EIP is at network_tx_buf_gc+0xc4/0x1b7 [xennet]
eax: 00000051 ebx: 00000063 ecx: deb38cf8 edx: 00000000
esi: 00000001 edi: deb38400 ebp: 00000041 esp: c0651edc
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, threadinfo=c0650000 task=c05f2800)
Stack: <0>deb38cf8 00000000 00000000 00000004 deb38000 0001eb44 0001eb46 0001eb39
       00000000 deb38488 deb38400 deb38000 e10fa3b7 c0991360 00000000 00000000
       00000107 c043a63d 00000107 deb38000 c0651f88 c0651f88 00000107 c0644780
Call Trace:
 <e10fa3b7> netif_int+0x24/0x66 [xennet]
 <c043a63d> handle_IRQ_event+0x42/0x85
 <c043a70d> __do_IRQ+0x8d/0xdc
 <c040665a> do_IRQ+0x1a/0x25
 <c051a159> evtchn_do_upcall+0x66/0x9f
 <c0404d79> hypervisor_callback+0x3d/0x48
 <c0407aad> safe_halt+0x84/0xa7
 <c0402bde> xen_idle+0x46/0x4e
 <c0402cfd> cpu_idle+0x94/0xad
 <c0655772> start_kernel+0x346/0x34c
Code: b4 9f fc 08 00 00 50 e8 a0 17 42 df c7 84 9f fc 08 00 00 00 00 00 00 8b 87 f4 00 00 00 89 84 9f f4 00 00 00 89 9f f4 00 00 00 90 <ff> 8d 90 00 00 00 0f 94 c0 83 c4 10 84 c0 74 62 bb 00 e0 ff ff
EIP: [<e10f9206>] network_tx_buf_gc+0xc4/0x1b7 [xennet] SS:ESP 0069:c0651edc
<0>Kernel panic - not syncing: Fatal exception in interrupt

The newer kernel seems to be far worse, and I'm backing out of it.

Fedora Core release 5 (Bordeaux)
Kernel 2.6.17-1.2187_FC5xenU on an i686

calcutta login: ------------[ cut here ]------------
kernel BUG at net/core/dev.c:1206!
invalid opcode: 0000 [#1]
SMP
Modules linked in: ipv6 xennet ipt_REJECT xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables dm_mirror dm_mod
CPU: 0
EIP: 0061:[<c055821a>] Not tainted VLI
EFLAGS: 00210297 (2.6.17-1.2187_FC5xenU #1)
EIP is at skb_gso_segment+0x29/0xc9
eax: 00000000 ebx: ded26ba4 ecx: 00050003 edx: c05f7700
esi: ded26ba4 edi: 00000008 ebp: df220000 esp: dbb83c60
ds: 007b es: 007b ss: 0069
Process httpd (pid: 1021, threadinfo=dbb82000 task=c01f3870)
Stack: <0>00000001 ded26ba4 c0899300 c055938b ded26ba4 00050003 00000001 df220000
       ded26ba4 df220180 00000000 c0564e1e ded26ba4 df220000 c010d200 00000000
       df220000 dbb82000 ded26ba4 c055af7e df220000 df2f8754 df2f8774 c08992cc
Call Trace:
 <c055938b> dev_hard_start_xmit+0x174/0x203
 <c0564e1e> __qdisc_run+0xe0/0x19a
 <c055af7e> dev_queue_xmit+0x1ce/0x2cc
 <c0575232> ip_output+0x1b6/0x1ec
 <c0574ace> ip_queue_xmit+0x374/0x3b3
 <c040734f> monotonic_clock+0x30/0x70
 <c0440df0> get_page_from_freelist+0x99/0x463
 <c05821f3> tcp_transmit_skb+0x5d2/0x602
 <c058192f> tcp_snd_test+0x17/0xcc
 <c0583d08> tcp_push_one+0xb2/0xd4
 <c057a466> tcp_sendmsg+0x7a1/0x9cc
 <c054f6b5> do_sock_write+0xa3/0xac
 <c055171c> sock_writev+0xab/0xc3
 <c042a45b> autoremove_wake_function+0x0/0x3a
 <c044111b> get_page_from_freelist+0x3c4/0x463
 <c045a1d8> do_readv_writev+0x148/0x23a
 <c040f5b1> do_page_fault+0x414/0x8c1
 <c045a74b> sys_writev+0x3b/0x97
 <c0404ba7> syscall_call+0x7/0xb
Code: eb a6 57 56 53 8b 5c 24 10 8b 83 a0 00 00 00 0f b7 7b 76 83 78 10 00 74 08 0f 0b b5 04 a0 b3 5d c0 8a 43 74 83 e0 0c 3c 04 74 08 <0f> 0b b6 04 a0 b3 5d c0 8b 83 98 00 00 00 8b 53 20 89 43 24 29
EIP: [<c055821a>] skb_gso_segment+0x29/0xc9 SS:ESP 0069:dbb83c60
<0>Kernel panic - not syncing: Fatal exception in interrupt

Well, I don't want to jinx it for myself, but I have yet to have a problem with the new kernel crashing my domU. (It did crash my dom0 once, and I haven't figured that one out yet, but at least my domU seems stable now.)

I wonder if there is a pattern we can figure out to help trace this bug. I have three servers I'm involved with:

One is a "production" server (although with this reliability we risk losing customers) with 6 XenUs. When the machine starts going to Zombies it has nearly always started with a specific XenU which is a busy LAMP server.

The second is a personal box with two XenUs which are LAMP.

The third is a personal box with two XenUs which have BIND/SendMail/Cyrus-IMAPD on one and OpenLDAP on the other.

The first and second boxes go into Zombie mode often, but the third box (knock on wood) has not experienced many problems at all. I receive a lot of email traffic, and there are times of the day when the email server is busier than the LAMP servers, and yet it is the LAMP servers that are crashing.

Another friend runs a Xen box in the same colocation facility as the first. This box is primarily running DNS and various shell utilities, and from what I'm told he has not had any of the problems I've observed.

I wonder if there is something in what Apache or MySQL does with the network that makes servers running these applications more likely to have problems.

Actually this problem is already fixed in rawhide.
It's just that Xen hasn't been updated in FC5 for quite a while due to the effort being focused on FC6 and RHEL5.

Do you have suggested binaries to try that will otherwise integrate with an FC5 environment?

I've just been told that there really is going to be an FC5 Xen update now. It's going to be tested tomorrow.

This is great news. I tried doing a "--enablerepo=development" and while I saw an update to the regular kernel I didn't see a xen0 or xenU kernel update. I did notice:

xen.i386 3.0.2-33 development

If you can point us to the binaries to test when they are available (even before the mirrors get it) it would be helpful. For whatever reason we seem to have just the right conditions to be hitting this specific bug.

I just checked the FC5 branch and unfortunately the update isn't there yet.

Thanks for checking. Just for kicks I tried the kernel-xen, xen and dependencies from the 'development' yum repository. When trying to create the XenUs it gave me an "Error: (22, 'Invalid argument')". Is this enough of a change that I need to install a different kernel in each of the XenUs as well, or did that indicate something else is wrong/missing? The xend logs didn't seem to help much in figuring out the problem.

What kernels should be in the xenUs now? Is there a different setup for FC6 (and these development packages) than what was there for FC5?

If your domU kernel does not have PAE then you'll need to replace it with one that does have PAE.

I experienced similar problems after I moved my xen host and guest to 2187. I seem to have stabilised my situation by rolling the host back to 2.6.17-1.2145_FC5xen0 and leaving the guest at kernel-xenU-2.6.17-1.2187_FC5. This is on x86_64 anyway.

Herbert, it would be great to know when we can expect that FC5 update.

Chris - I'm having the same problem, but am a bit new to the Fedora world - can you tell me where I can get 2145? Or did you mean 2045 that comes with the base distribution? Many thanks!
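
When comparing the oops reports quoted in this thread, it can help to reduce each one to the kernel release and the faulting function. The Python sketch below does that with two regular expressions. The sample strings are short excerpts copied from the traces above, and `summarize_oops` is a hypothetical helper name; the patterns are an assumption that covers only the i386 oops layout seen in this report.

```python
import re

# Matches lines such as "EIP is at network_tx_buf_gc+0xb7/0x1aa [xennet]".
# The trailing "[module]" part is optional (absent for built-in code).
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")
# Matches the console banner "Kernel <release> on an i686".
KERNEL_RE = re.compile(r"Kernel (\S+) on an")

# Excerpts copied from the traces in this report.
OOPS_2145 = (
    "Kernel 2.6.17-1.2145_FC5xenU on an i686\n"
    "EIP is at network_tx_buf_gc+0xb7/0x1aa [xennet]\n"
)
OOPS_2187 = (
    "Kernel 2.6.17-1.2187_FC5xenU on an i686\n"
    "EIP is at skb_gso_segment+0x29/0xc9\n"
)


def summarize_oops(text):
    """Return the kernel release, faulting function and module of an oops."""
    eip = EIP_RE.search(text)
    kernel = KERNEL_RE.search(text)
    return {
        "kernel": kernel.group(1) if kernel else None,
        "function": eip.group(1) if eip else None,
        "module": eip.group(2) if eip else None,
    }


if __name__ == "__main__":
    for sample in (OOPS_2145, OOPS_2187):
        print(summarize_oops(sample))
```

Summaries like these make it easier to see that the `network_tx_buf_gc` crashes in xennet and the `skb_gso_segment` BUG fault in different places, even though both were reported here under the same bug.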
Quentin, the above trick did not hold for long. Looks like I might need to migrate services out of Xen until there's an update. Would be interested to know if others here have stabilised their FC5 Xen setups, and how?

No stabilization here. I'm contemplating moving to FC6t3 because this is supposed to be fixed there.... but I really don't want to go there if I don't have to. That said, I can go days without issues. (Kinda sad "going days without issues" is considered good, though....)

We are experiencing the same kernel panic on the 2.6.17-1.2187 kernel on FC5, but we are not using Xen at all, just a plain machine with the standard kernel and no virtual machines. The problem occurs when a significant amount of outbound network IO is processed, such as running 'dmesg' from an ssh session to the machine. Interestingly if ip_conntrack and all of its dependent modules are removed and then added back in, the problem goes away temporarily.

There is now a 2.6.18 kernel (2189) in FC5 testing that should resolve this bug.

*** Bug 204468 has been marked as a duplicate of this bug. ***

Many thanks, Herbert - the 2.6.18 kernel is looking good so far...

Managed to get the 2189 dom0 kernel installed, it hasn't crashed yet. But the 2189 domU kernel crashed within minutes, with very little load. Since the crash the xend service will not restart.

There seemed to be a mismatch with xen-3.0.2-3.FC5, so I reverted to an older xen0. The xenUs didn't come up properly, outputting messages that possibly related to the mismatched xen0.
4gb seg fixup, process httpd (pid 1079), cs:ip 73:0085d6b8
printk: 10446 messages suppressed.
4gb seg fixup, process sendmail (pid 1017), cs:ip 73:0032143e
printk: 7266 messages suppressed.
4gb seg fixup, process httpd (pid 1128), cs:ip 73:0085d6b8
printk: 37 messages suppressed.

I have reverted back to 2.6.17-1.2174 on, I installed the new 'testing' kernel 2.6.18-1.2189.fc5xen0 however when trying to run an xm command I get the following

[root@xen1 ~]# xm li
Error: Error connecting to xend: No such file or directory. Is xend running?

Is there a new xen package to go along with the new kernel?

That 2189 testing kernel has resolved the problem for me.

For me, service xend start generates the following in /var/log/xend.log:

[2006-09-28 12:43:17 xend] INFO (SrvDaemon:283) Xend Daemon started
[2006-09-28 12:43:17 xend] INFO (SrvDaemon:287) Xend changeset: unavailable .
[2006-09-28 12:43:17 xend] ERROR (SrvDaemon:297) Exception starting xend ((38, 'Function not implemented'))
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 291, in run
    servers = SrvServer.create()
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/SrvServer.py", line 108, in create
    root.putChild('xend', SrvRoot())
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/SrvRoot.py", line 40, in __init__
    self.get(name)
  File "/usr/lib64/python2.4/site-packages/xen/web/SrvDir.py", line 82, in get
    val = val.getobj()
  File "/usr/lib64/python2.4/site-packages/xen/web/SrvDir.py", line 52, in getobj
    self.obj = klassobj()
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/SrvDomainDir.py", line 39, in __init__
    self.xd = XendDomain.instance()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 609, in instance
    inst.init()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 76, in init
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 139, in xen_domains
    domlist = xc.domain_getinfo()
Error: (38, 'Function not implemented')

For those that are having trouble using the new kernel, please make sure you've upgraded both dom0 and domU kernels. You also need the new xen package since the manage ABI has changed.

If you're still having start-up problems with the latest packages, please file a new bug on your problem. Thanks.

Hope you don't mind me continuing this thread, but I don't know where a new xen package can be found. I seem to have the latest available, but still no luck. Am I the only one still having problems?

$ uname -r
2.6.18-1.2189.fc5xen0
$ yum --enablerepo=updates-testing list xen
xen.x86_64 3.0.2-3.FC5 installed
# service xend status
xend is running
# xm list
Error: Error connecting to xend: No such file or directory. Is xend running?
# service xend stop
Stopping xend: [ OK ]
[root@syd3 ~]# service xend start
Starting xend:

[2006-09-28 19:26:27 xend] INFO (SrvDaemon:283) Xend Daemon started
[2006-09-28 19:26:27 xend] INFO (SrvDaemon:287) Xend changeset: unavailable .
[2006-09-28 19:26:27 xend] ERROR (SrvDaemon:297) Exception starting xend ((38, 'Function not implemented'))
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 291, in run
    servers = SrvServer.create()
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/SrvServer.py", line 108, in create
    root.putChild('xend', SrvRoot())
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/SrvRoot.py", line 40, in __init__
    self.get(name)
  File "/usr/lib64/python2.4/site-packages/xen/web/SrvDir.py", line 82, in get
    val = val.getobj()
  File "/usr/lib64/python2.4/site-packages/xen/web/SrvDir.py", line 52, in getobj
    self.obj = klassobj()
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/SrvDomainDir.py", line 39, in __init__
    self.xd = XendDomain.instance()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 609, in instance
    inst.init()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 76, in init
    self._add_domain(
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 139, in xen_domains
    domlist = xc.domain_getinfo()
Error: (38, 'Function not implemented')

OK, please file a new bug report since this is a different problem.

Herbert, I think this is still relevant, and might explain why others have had success. There's another kernel in updates-testing, a -xen (not -xen0):

kernel-xen.x86_64 0:2.6.18-1.2189.fc5

I was about to boot this but noticed it boots a PAE kernel by default:

title Fedora Core (2.6.18-1.2189.fc5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-1.2189.fc5-PAE
        module /vmlinuz-2.6.18-1.2189.fc5xen ro root=/dev/md1
        module /initrd-2.6.18-1.2189.fc5xen.img

There are a number of other kernels provided in the package that seem to be non-PAE kernels.
If I choose a non-PAE variant thus:

title Fedora Core (2.6.18-1.2189.fc5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-1.2189.fc5
        module /vmlinuz-2.6.18-1.2189.fc5xen ro root=/dev/md1
        module /initrd-2.6.18-1.2189.fc5xen.img

Is the system likely to boot? I don't have physical access to the system, nor do I have an x86_64 machine here to test conclusively. Or should I just stay away from the -xen (non -xen0) package?

The PAE xen.gz file is the only one for 64-bit. In fact /xen.gz-2.6.18-1.2189.fc5 (without the PAE suffix) does not exist in the x86-64 package. 64-bit always uses PAE.

Is it possible that the needed updated 'xen' is the one in the development repository? I can't test right now, but someone else might want to.

Installed Packages
xen.i386 3.0.2-3.FC5 installed
Available Packages
xen.i386 3.0.2-36 development
xen.i386 3.0.1-4 core

If that is the problem, file a bug to indicate that there should be a dependency listed, and the new xen also added to the 'updates-testing' so that it will be sent to 'updates' at the same time as the new kernels.

Thanks Herbert. The kernel-xen package produced the same result as the kernel-xen0 package. On boot, service xend status says xend is running, but xm list fails with "Error: Error connecting to xend: No such file or directory. Is xend running?" Guess I'll wait and see if the eventual kernel update proper works.

Please file a new bug report on this. Thanks.

Herbert, a gotcha that needs to be addressed before this kernel is released: with the change to 2.6.18, the raid5 module has been renamed to raid456. mkinitrd is not aware of this, resulting in a broken initrd for systems which have / on raid 5. So a dependency on a newer version of mkinitrd will most definitely be needed.

I have the same problem as Chris with the update. Details in bug 208529.

Please bring up new issues like the raid456 module in new bug reports. Thanks.

Is there a timeframe for the new kernel (and the thus far unreleased xen package) to be pushed to updates?
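
Earlier in this thread the advice was to make sure both the dom0 and domU kernels have been upgraded. As a rough illustration, the Python sketch below parses the kernel release strings that appear in this report and checks whether two of them come from the same build. The helper names `parse_release` and `same_build` are hypothetical, the pattern is an assumption that covers only the Fedora naming seen here (for example `2.6.17-1.2187_FC5xenU` and `2.6.18-1.2189.fc5xen0`), and treating "same base version and build number" as "matching" is a simplification. It says nothing about whether the installed xen package matches the kernel.

```python
import re

# Assumed layout: <base>-<rel>.<build>[._]fc<N><flavor>,
# e.g. 2.6.17-1.2145_FC5xen0 or 2.6.18-1.2189.fc5xen.
RELEASE_RE = re.compile(
    r"^(?P<base>\d+\.\d+\.\d+)-\d+\.(?P<build>\d+)[._]fc\d+"
    r"(?P<flavor>xen0|xenU|xen)?$",
    re.IGNORECASE,
)


def parse_release(release):
    """Split a kernel release string into (base, build, flavor)."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return match.group("base"), int(match.group("build")), match.group("flavor")


def same_build(dom0_release, domu_release):
    """True when both kernels share the same base version and build number."""
    base0, build0, _ = parse_release(dom0_release)
    base_u, build_u, _ = parse_release(domu_release)
    return (base0, build0) == (base_u, build_u)


if __name__ == "__main__":
    # The mixed host/guest combination one commenter tried in this thread.
    print(same_build("2.6.17-1.2145_FC5xen0", "2.6.17-1.2187_FC5xenU"))
    print(parse_release("2.6.18-1.2189.fc5xen0"))
```

The dom0 string would typically come from `uname -r` on the host and the domU string from the same command inside each guest.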
*** Bug 209910 has been marked as a duplicate of this bug. ***

We're also seeing this problem with a four-processor (dual-processor dual-core) Intel Xeon. We had a very stable machine running for months with zero problems, but recently it's been crashing once per day. Currently running 2.6.17-1.2187_FC5xen0, although I believe the problem also occurred with 2.6.16-1.2096_FC5xen0 (the domUs kernel being in sync with the dom0 both times).

For those who are awaiting a fix, we just downgraded our Xen host to 2.6.15-1.2054_FC5xen0 (and downgraded all the domUs to the corresponding xenU kernel) and haven't had a single crash yet.

Andre,

Watch out for a different Bug 203122 when running one of the older Xen kernels. This seemed to have been fixed in the newer kernels, with the newer kernels having different issues which are believed to be fixed in the most recent (which requires the updated 'xen' package).

Is this still reproducible on the latest stable release+updates? Thanks.

Not by me...

This can be closed, as I have not seen this bug for months now. I know it is easier to know a bug is there than know it is not, but the lack of problems over a long period makes me confident.