|
Description
David Juran
2011-09-13 10:55:59 UTC
Created attachment 522888 [details]
dump of when the boot suceeds
Created attachment 522889 [details]
dump of when boot fails
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative. This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative. I am having a similar issue to David ( Connection timeout when trying to PXE ). a difference w/ David issue, is that if I move my VM NIC from Virtio to e1000 or rtl everything works fine. Pls find some tcpdump from different part s Created attachment 523149 [details]
Dump from the NIC of the VM ( as seen from the hypervisor )
Created attachment 523150 [details]
Dump from the bridge of the hypervisor
Created attachment 523151 [details]
Dump from the tftp server
Hello Olivier and David: Not sure whether your issue is the same as https://bugzilla.redhat.com/show_bug.cgi?id=737448. If possible, would you please try the pxe room in fileshare.englab.nay.redhat.com/pub/section2/kvm/akong/virtio-net.rom to check whether it solves your problem? Thanks i ve tried w/ the new virtio-net.rom and I still have the same issue ( tftp timeout ) (In reply to comment #11) > i ve tried w/ the new virtio-net.rom and I still have the same issue ( tftp > timeout ) Thanks for the test, and from the dump file you provides, it seems tftp server recevice ack of block 0 from client but didn't respond for some reasons. So: 1 Please try boot into a guest an use tftp cmd to check whether you can get the pxelinux.0 from the server. 2 Have a look at syslog of tftp server when pxe boot fails to see whether tftpd complains. 3 The tftp server configuration. Thanks Hello. Looking at Olivier's dumps, I believe his issue is different from the one I opened this Bz about. In Olivier's dump, the boot proceeds past DHCP while in the case this Bz is handling, the boot process never gets past DHCP (see https://bugzilla.redhat.com/attachment.cgi?id=522889) Also, I just tried fileshare.englab.nay.redhat.com/pub/section2/kvm/akong/virtio-net.rom mentioned in comment #10 and PXE now works. Of course this doesn't prove anything since the problem wasn't 100% reproducible in the first place, but at least I managed to boot the VM three consecutive times. Anyhow, I suggest we leave wait a few days and if the problem of booting won't re-occur, I guess we can close this Bz as a dup of Bz 737448. Hi Jason, Please find the answer to your question below. ( let me know, if you would like me to open a new BZ as it seems both issues are unrelated ) 1 Please try boot into a guest an use tftp cmd to check whether you can get the pxelinux.0 from the server. Ok, it works without pb 2 Have a look at syslog of tftp server when pxe boot fails to see whether tftpd complains. It looks normal to me, pls find the extract below. Sep 20 13:29:30 satellite dhcpd: DHCPDISCOVER from 00:1a:4a:03:76:00 via eth0 Sep 20 13:29:30 satellite dhcpd: DHCPOFFER on 10.3.118.18 to 00:1a:4a:03:76:00 via eth0 Sep 20 13:29:31 satellite dhcpd: uid lease 10.3.118.231 for client 00:1a:4a:03:76:00 is duplicate on 10.3.118/24 Sep 20 13:29:31 satellite dhcpd: DHCPDISCOVER from 00:1a:4a:03:76:00 via eth0 Sep 20 13:29:31 satellite dhcpd: DHCPOFFER on 10.3.118.18 to 00:1a:4a:03:76:00 via eth0 Sep 20 13:29:33 satellite dhcpd: uid lease 10.3.118.231 for client 00:1a:4a:03:76:00 is duplicate on 10.3.118/24 Sep 20 13:29:33 satellite dhcpd: DHCPREQUEST for 10.3.118.18 (10.3.118.65) from 00:1a:4a:03:76:00 via eth0 Sep 20 13:29:33 satellite dhcpd: DHCPACK on 10.3.118.18 to 00:1a:4a:03:76:00 via eth0 Sep 20 13:29:34 satellite xinetd[12489]: START: tftp pid=6889 from=10.3.118.18 3 The tftp server configuration. Pls, find my dhcpd.conf attached. FYI, I am using cobbler on a satellite server. Also pls note that tftp works perfectly for Physical server or VM using rtl or e1000 NIC. Created attachment 524011 [details]
dhcpd.conf file
Pls find my dhcpd.conf file.
Regards,
Olivier
Olivier: The dumps I'm encountering look quite different from yours, so I'm guessing you issue is different. If you agree, please do open a seperate bz for your issue. Regarding the original issue, with the rom from http://fileshare.englab.nay.redhat.com/pub/section2/kvm/akong/virtio-net.rom today the booting fails. The dumps (attached, captured on the bridge) show that the conversation goes a bit further but not as far as to TFTP. Created attachment 524184 [details]
Dump captured form the bridge with gPXE testing ROM
(In reply to comment #14) > Hi Jason, > > Please find the answer to your question below. ( let me know, if you would like > me to open a new BZ as it seems both issues are unrelated ) > > 1 Please try boot into a guest an use tftp cmd to check whether you can get the > pxelinux.0 from the server. > Ok, it works without pb > > 2 Have a look at syslog of tftp server when pxe boot fails to see whether tftpd > complains. > It looks normal to me, pls find the extract below. > > Sep 20 13:29:30 satellite dhcpd: DHCPDISCOVER from 00:1a:4a:03:76:00 via eth0 > Sep 20 13:29:30 satellite dhcpd: DHCPOFFER on 10.3.118.18 to 00:1a:4a:03:76:00 > via eth0 > Sep 20 13:29:31 satellite dhcpd: uid lease 10.3.118.231 for client > 00:1a:4a:03:76:00 is duplicate on 10.3.118/24 > Sep 20 13:29:31 satellite dhcpd: DHCPDISCOVER from 00:1a:4a:03:76:00 via eth0 > Sep 20 13:29:31 satellite dhcpd: DHCPOFFER on 10.3.118.18 to 00:1a:4a:03:76:00 > via eth0 > Sep 20 13:29:33 satellite dhcpd: uid lease 10.3.118.231 for client > 00:1a:4a:03:76:00 is duplicate on 10.3.118/24 > Sep 20 13:29:33 satellite dhcpd: DHCPREQUEST for 10.3.118.18 (10.3.118.65) from > 00:1a:4a:03:76:00 via eth0 > Sep 20 13:29:33 satellite dhcpd: DHCPACK on 10.3.118.18 to 00:1a:4a:03:76:00 > via eth0 > Sep 20 13:29:34 satellite xinetd[12489]: START: tftp pid=6889 from=10.3.118.18 > > 3 The tftp server configuration. > Pls, find my dhcpd.conf attached. FYI, I am using cobbler on a satellite > server. Also pls note that tftp works perfectly for Physical server or VM using > rtl or e1000 NIC. Thanks Olivier, as David said, this looks like a different bug, please open a new one. (In reply to comment #16) > Olivier: The dumps I'm encountering look quite different from yours, so I'm > guessing you issue is different. If you agree, please do open a seperate bz for > your issue. > > Regarding the original issue, with the rom from > http://fileshare.englab.nay.redhat.com/pub/section2/kvm/akong/virtio-net.rom > today the booting fails. The dumps (attached, captured on the bridge) show that > the conversation goes a bit further but not as far as to TFTP. Hello David: Form the dump file you provides, I'm not sure whether this does matter, but see something interesting: 1. looks like there are two different dhcp servers (10.9.163.2 and 10.9.163.3), and they try to assign the same ip address(10.9.141.246) to the client. Is this the configuration you want to use? 2. And the server indentifier used in server replies are 10.8.136.105 which is different from server ip (10.9.163.2 and 10.9.163.3), does your tftp/dhcp server have multiple network interfaces? Thanks Hello Jason So the DHCP (and TFTP server) really is 10.8.136.105 (it's a RHEL5 machine). It is however on a different subnet from the VM hence the Juniper router (acting as gateway on 10.9.163.1) is used as a DHCP forwarder. Now this Juniper router really is a cluster of two routers (.2 and .3) which use the gateway IP (.1) as a floating IP (VIP). It seems however that the implementation of the DHCP forwarder in Juniper makes both nodes of the Juniper cluster to forward the DHCP individually and that's why we see the two replies. Apart from that it looks weird, I don't _think_ it should pose a problem that the reply comes twice. Also, this configuration works fine on physical hardware (HP Proliant) (In reply to comment #20) > Hello Jason > > So the DHCP (and TFTP server) really is 10.8.136.105 (it's a RHEL5 machine). It > is however on a different subnet from the VM hence the Juniper router (acting > as gateway on 10.9.163.1) is used as a DHCP forwarder. > Now this Juniper router really is a cluster of two routers (.2 and .3) which > use the gateway IP (.1) as a floating IP (VIP). It seems however that the > implementation of the DHCP forwarder in Juniper makes both nodes of the Juniper > cluster to forward the DHCP individually and that's why we see the two replies. > > Apart from that it looks weird, I don't _think_ it should pose a problem that > the reply comes twice. Also, this configuration works fine on physical hardware > (HP Proliant) Hello David: Thanks for the information, and if possible, would you please try the rom with debugging output in http://fileshare.englab.nay.redhat.com/pub/section2/kvm/jasonwang/virtio-net.rom to see what happens when dhcp session fail? Thanks. Hello Jason. Screenshot attached Created attachment 524550 [details]
Screenshot from trying to boot with debug rom
Hi David, Can you test this bug when it's fixed, I can not reproduce it. Thanks Suqin (In reply to comment #23) > Created attachment 524550 [details] > Screenshot from trying to boot with debug rom Hello David, Is the environment met this issue can be used by me for debugging? Thanks Hi, in order to narrow doen the problem, please try the following two roms to see whether it helps: 1 upstream ipxe rom with debugging enabled: https://bugzilla.redhat.com/attachment.cgi?id=527668, please collect the debug infomation from guest serial 2 our gpxe with upstream virtio-net driver: https://bugzilla.redhat.com/attachment.cgi?id=527821 Thanks Hello David, Henri: Since we are close to snapshot, is there any update on this? Thanks Created attachment 529458 [details]
VirtIO PXE log
Log from serial port when using virtio-net ROM attached previously
Created attachment 529459 [details]
VirtIO PXE tcpdump from the bridge
Tcpdump from bridge when using virtio-net ROM attached previously
These two attached files were captured from VM serial port and bridge when PXE booting over network. (In reply to comment #32) > These two attached files were captured from VM serial port and bridge when PXE > booting over network. Hi Henri: Thanks for the testing. Unfortunately, the debug rom does not provide much more valuable information as we even did not receive an ACK of DHCP discovery which is even worse than the what David see. In order to see what happens, please check and provides: 1 The host network configuration (interfaces, bonding or vlans ...) and make sure the forward delay of bridge is zero checking (/sys/class/net/${the bridge used by guest}/bridge/forward_delay) 2 qemu command line 3 dhcp server log ( usually /var/log/messages ) 4 Please also try to dump in dhcp server to see what happens on server 5 Try to test without vhost by doing modprobe -r vhost_net before starting virt-manager Thanks Jason More updated log files being attached at next posts. 1. /sys/class/net/br1/bridge/forward_delay is/was zero 2. qemu command line: /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu core2duo,+lahf_lm,+dca,+xtpr,+cx16,+tm2,+est,+vmx,+ds_cpl,+pbe,+tm,+ht,+ss,+acpi,+ds -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 -name wano -uuid 166a994e-0b88-01af-7403-94bb9b7db4d6 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/wano.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/var/lib/libvirt/images/wano.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=22,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:a2:30,bus=pci.0,addr=0x3,bootindex=1 -chardev file,id=charserial0,path=/tmp/VMpxe-boot.log -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 3. DHCP log file attached, 16 rows. 4. Capture was not successful from DHCP server as too much traffic coming and MAC filtering rules didn't work (router does dhcp-forwarding). 5. Tested and same result. Virt-manager didn't load vhost_net automatically. I hope this updated data will help. Created attachment 529476 [details]
DHCP server log
Created attachment 529477 [details]
tcpdump from br1
Created attachment 529478 [details]
VM console screencapture
Created attachment 529771 [details]
br1 tcpdump (20111024)
tcpdump from vnet0 and br1 made same time with ipxe console log output
Created attachment 529773 [details]
vnet0 tcpdump (20111024)
Created attachment 529774 [details]
virtio-rom log output (20111024)
Created attachment 529778 [details]
ifconfig vnet0 (20111024)
Closing. Not enough info. Maybe it is a setup issue. Seems like the reporter lost interest. We can reopen if the problem returns. |