Description of problem:
PXE booting a LiveCD image no longer works after upgrading the kernel to 3.8.1-201.fc18. The last good kernel I can install is 3.6.10-4. Rebooting into an older kernel and trying the exact same configuration works just fine.
Version-Release number of selected component (if applicable):
dnsmasq-2.65-4.fc18.x86_64
kernel-3.8.1-201.fc18.x86_64
kernel-3.6.10-4.fc18.x86_64
How reproducible:
Set up dnsmasq as a tftp/dhcp server. Try to boot from any livecd image.
Steps to Reproduce:
1. Set up a PXE boot server
2. Boot a client via PXE
3. Watch the server logs for:
Mar 4 22:04:16 recon dnsmasq-tftp[4855]: failed sending /var/lib/tftpboot/pxelinux.0 to 169.254.132.138
Actual results:
The PXE image isn't sent to the client and the client errors.
Expected results:
The PXE client boots.
Hi. I'm not able to reproduce your issue.
PXE server:
-----------
# dnsmasq -d --enable-tftp --tftp-root=/tftproot/tftpboot --dhcp-option=66,"192.168.133.1" --conf-file= --except-interface lo --bind-dynamic --interface eth1 --dhcp-range 192.168.133.128,192.168.133.254 --dhcp-leasefile=/tmp/hosts.leases --dhcp-lease-max=127 --dhcp-no-override --dhcp-boot=pxelinux.0
dnsmasq: started, version 2.65 cachesize 150
dnsmasq: compile time options: IPv6 GNU-getopt DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack
dnsmasq-dhcp: DHCP, IP range 192.168.133.128 -- 192.168.133.254, lease time 1h
dnsmasq-tftp: TFTP root is /tftproot/tftpboot
dnsmasq: reading /etc/resolv.conf
dnsmasq: using nameserver 192.168.122.1#53
dnsmasq: read /etc/hosts - 2 addresses
dnsmasq-dhcp: DHCPDISCOVER(eth1) 52:54:00:c2:3f:ae
dnsmasq-dhcp: DHCPOFFER(eth1) 192.168.133.169 52:54:00:c2:3f:ae
dnsmasq-dhcp: DHCPDISCOVER(eth1) 52:54:00:c2:3f:ae
dnsmasq-dhcp: DHCPOFFER(eth1) 192.168.133.169 52:54:00:c2:3f:ae
dnsmasq-dhcp: DHCPREQUEST(eth1) 192.168.133.169 52:54:00:c2:3f:ae
dnsmasq-dhcp: DHCPACK(eth1) 192.168.133.169 52:54:00:c2:3f:ae
dnsmasq-tftp: sent /tftproot/tftpboot/pxelinux.0 to 192.168.133.169
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/102c331e-9486-b2da-d1cf-c39c68fb85d3 not found
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/01-52-54-00-c2-3f-ae not found
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/C0A885A9 not found
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/C0A885A not found
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/C0A885 not found
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/C0A88 not found
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/C0A8 not found
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/C0A not found
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/C0 not found
dnsmasq-tftp: file /tftproot/tftpboot/pxelinux.cfg/C not found
dnsmasq-tftp: sent /tftproot/tftpboot/pxelinux.cfg/default to 192.168.133.169
dnsmasq-tftp: sent /tftproot/tftpboot/vesamenu.c32 to 192.168.133.169
dnsmasq-tftp: sent /tftproot/tftpboot/pxelinux.cfg/default to 192.168.133.169
dnsmasq-tftp: sent /tftproot/tftpboot/vmlinuz to 192.168.133.169
dnsmasq-tftp: sent /tftproot/tftpboot/initrd.img to 192.168.133.169
# rpm -qi dnsmasq
Name : dnsmasq
Version : 2.65
Release : 4.fc18
Architecture: x86_64
...
# uname -r
3.8.1-201.fc18.x86_64
Can you please attach the dnsmasq configuration (options) you are using? Also please attach a network communication dump if possible.
Thank you!
Hi,
I get that far too, but the boot image doesn't seem to be sent over the wire. The PXE client then times out and doesn't boot into the OS.
My net is as follows:
wlan0: dhcp IP of 192.168.1.253/255.255.255.0
eth0: static IP of 169.254.0.1/255.255.0.0
dnsmasq started with a modified copy of your suggestion:
dnsmasq -d --enable-tftp --tftp-root=/var/lib/tftpboot \
  --dhcp-option=66,"169.254.0.1" \
  --conf-file= --except-interface lo \
  --bind-dynamic --interface eth0 --dhcp-range 169.254.0.2,169.254.255.254 --dhcp-leasefile=/tmp/hosts.leases \
  --dhcp-lease-max=127 --dhcp-no-override --dhcp-boot=pxelinux.0
and this is the result:
dnsmasq: started, version 2.65 cachesize 150
dnsmasq: compile time options: IPv6 GNU-getopt DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack
dnsmasq-dhcp: DHCP, IP range 169.254.0.2 -- 169.254.255.254, lease time 1h
dnsmasq-tftp: TFTP root is /var/lib/tftpboot
dnsmasq: reading /etc/resolv.conf
dnsmasq: using nameserver 192.168.1.254#53
dnsmasq: read /etc/hosts - 5 addresses
dnsmasq-dhcp: DHCPDISCOVER(eth0) 08:00:27:bb:32:40
dnsmasq-dhcp: DHCPOFFER(eth0) 169.254.199.211 08:00:27:bb:32:40
dnsmasq-dhcp: DHCPREQUEST(eth0) 169.254.199.211 08:00:27:bb:32:40
dnsmasq-dhcp: DHCPACK(eth0) 169.254.199.211 08:00:27:bb:32:40
dnsmasq-tftp: error 0 TFTP Aborted received from 169.254.199.211
dnsmasq-tftp: failed sending /var/lib/tftpboot/pxelinux.0 to 169.254.199.211
dnsmasq-tftp: sent /var/lib/tftpboot/pxelinux.0 to 169.254.199.211
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/78f2b504-2ae2-4d75-b1a3-103e48e325ac not found
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/01-08-00-27-bb-32-40 not found
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/A9FEC7D3 not found
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/A9FEC7D not found
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/A9FEC7 not found
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/A9FEC not found
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/A9FE not found
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/A9F not found
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/A9 not found
dnsmasq-tftp: file /var/lib/tftpboot/pxelinux.cfg/A not found
dnsmasq-tftp: sent /var/lib/tftpboot/pxelinux.cfg/default to 169.254.199.211
dnsmasq-tftp: failed sending /var/lib/tftpboot/vmlinuz0 to 169.254.199.211
As you can see, it partly works, as the PXE client sends back the UUID (78f2b504-2ae2-4d75-b1a3-103e48e325ac).
Rich
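For anyone reading the "not found" lines in the two logs above: they follow a fixed lookup order (client UUID, then "01-" plus the MAC address, then the client IP in hex with one digit dropped per attempt, then "default"). A minimal Python sketch that reproduces the order seen in these logs is below. The function name is hypothetical, and the order is inferred only from the logs in this thread, not taken from the pxelinux sources.

```python
import ipaddress


def pxelinux_config_candidates(client_uuid, mac, ip):
    """Return pxelinux.cfg file names in the order the logs above show them requested.

    Hypothetical helper: the order is inferred from the dnsmasq logs in this
    thread, not from the pxelinux source code.
    """
    # 1. The client UUID, lower-case.
    names = [client_uuid.lower()]
    # 2. "01-" followed by the MAC address with dashes instead of colons.
    names.append("01-" + mac.lower().replace(":", "-"))
    # 3. The IPv4 address as 8 upper-case hex digits, then one digit
    #    removed from the end on each retry.
    hex_ip = format(int(ipaddress.IPv4Address(ip)), "08X")
    names.extend(hex_ip[:n] for n in range(8, 0, -1))
    # 4. Finally the catch-all file.
    names.append("default")
    return names


if __name__ == "__main__":
    for name in pxelinux_config_candidates(
        "78f2b504-2ae2-4d75-b1a3-103e48e325ac",
        "08:00:27:bb:32:40",
        "169.254.199.211",
    ):
        print("pxelinux.cfg/" + name)
```

With Rich's client values this lists the same names as his log, so the run of "not found" lines before "default" is normal and not part of the failure.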
In my case the client starts booting without any problem. I'm concerned by the following line in your dnsmasq output:
> dnsmasq-tftp: error 0 TFTP Aborted received from 169.254.199.211
It looks more like your client is doing something wrong. Can you please attach a network communication dump between the PXE server and the PXE client?
Thank you
> I'm concerned by the following line in your dnsmasq output:
> > dnsmasq-tftp: error 0 TFTP Aborted received from 169.254.199.211
This is reported with the older kernels too.
> It looks more like your client is doing something wrong.
That may be so, but it still boots with older kernels. The only change I've made is booting the PXE server into a newer kernel. Everything else is fine. If I reboot into the older kernel it all works just fine.
> Can you please attach network communication dump between PXE server
> and PXE client?
I will see if I can create one this weekend.
Thanks,
Rich
(In reply to comment #4)
> > I'm concerned by the following line in your dnsmasq output:
> > > dnsmasq-tftp: error 0 TFTP Aborted received from 169.254.199.211
>
> This is reported with the older kernels too.
>
> > It looks more like your client is doing something wrong.
>
> That may be so, but it still boots with older kernels.
> The only change I've made is booting the PXE server into a newer kernel.
> Everything else is fine. If I reboot into the older kernel it all works just fine.
>
> > Can you please attach network communication dump between PXE server
> > and PXE client?
>
> I will see if I can create one this weekend.
Great. Can you please also attach your "default" file from the "pxelinux.cfg" directory in the TFTP server root?
Thanks!
Created attachment 708418 [details] tftp boot default file Attached is the default pxe boot config file created by livecd-iso-to-pxeboot
Created attachment 708421 [details] tcpdump of the broken kernel setup. This is a TCP Dump of the boot process with the broken kernel.
Created attachment 708422 [details] tcpdump of the working kernel setup. This is a TCP dump with the older working kernel. The only difference between the two on system setup is what kernel is running.
I have attached the required tcp dumps.
This all works with the following:
VirtualBox Host Only Adapter
Both my HP laptop and my Lenovo laptop work using eth0 on the PXE server and a crossover cable between the two.
And none of the above devices boot when the PXE server is booted with the newer kernel.
I have just tried this weekend's released kernel (3.8.2-206.fc18.x86_64) and it's still the same ;-(
(In reply to comment #9)
> I have attached the required tcp dumps.
>
> This all works with the following:
>
> VirtualBox Host Only Adapter
>
> Both my HP laptop and my Lenovo laptop work using eth0 on the PXE server and
> a crossover cable between the two.
>
> And none of the above devices boot when the PXE server is booted with the
> newer kernel.
>
> I have just tried this weekend's released kernel (3.8.2-206.fc18.x86_64) and
> it's still the same ;-(
So far I have no idea where the problem might be. I think it is a kernel or some configuration (PXE clients) issue. Could you install the dnsmasq build [1] with extra debugging output and run it on the "bad" kernel? Then please add the output here as a comment or attachment.
Thank you
[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=5127230
Hi,
Do I need to do anything different from what I did in Comment #2 when running this new version to get the extra debugging info? Otherwise all I get is the same as in Comment #2.
I think we could assume from this testing that it's defo a kernel issue, as my only current fix is to use a specific (older) kernel version, and if I boot into a newer kernel it all stops working.
If there's anything else I can do to help, please shout.
Rich
(In reply to comment #11)
> Hi,
>
> Do I need to do anything different to what I did in Comment #2 when running
> this new version to get the extra debugging info. Otherwise all I get is the
> same as in Comment #2.
All you need is to install the build I provided in comment #10 and run dnsmasq with exactly the same options as in comment #2. You should see some extra lines in the dnsmasq output. If you don't see anything different from comment #2, please also check /var/log/messages.
> I think we could assume from this testing that its defo a kernel issue as my
> only current solution to fix this is to use a specific (older) kernel
> version and if I boot into a newer kernel one it all stops working.
I will most probably change the component to kernel, but I want to check whether the cause is in dnsmasq first, since this issue is very strange.
Thanks!
I installed the RPMS from here: http://koji.fedoraproject.org/koji/buildinfo?buildID=402775 But it gave no additional output.
Changing the component to Kernel.
I've just tried this with kernel-3.8.3-203.fc18.x86_64 and tftp-server-5.2-6.fc18.x86_64 and it still fails on 3.8.x kernels, so it seems to be a kernel thing and not a pxe/tftp configuration issue.
Still a problem on 3.8.4-202.fc18
Still a problem on 3.8.6-203.fc18
Still a problem on 3.8.7-201.fc18.x86_64
When you say "on 3.8.x", do you mean the kernel running on the tftp server machine is 3.8.x? You mentioned virtual box in comment #9. Are you trying to tftp a kernel in a virtual box guest? If you could explain a bit more about your setup and exactly what is running where, that would be helpful. Tomas, can you elaborate on why you think this is a kernel problem? Do you think this is some kind of ethernet driver issue corrupting packets, or?
Hi,
The tftp server has issues sending PXE images out when it's running kernel 3.8.x or above. kernel-3.7.9-205.fc18.x86_64 and below were ok.
Basically I have a server set up with both VirtualBox (on vboxnet0) and the motherboard's eth0 used for PXE booting. eth0 is an AR8131 Gigabit Ethernet NIC and vboxnet0 is the standard VirtualBox interface.
I was initially using VirtualBox only and that stopped working, so I set up eth0 and used an HP laptop and a Lenovo laptop to test PXE booting with. Neither of these works with the 3.8.x kernels, but both are fine with the 3.7 and 3.6 kernels.
I believe this is a kernel issue, as the only change I need to make to my system to get things to work is to reboot into a kernel lower than 3.8.x, i.e. 3.6.10-4.fc18.x86_64.
I have tried both tftp-server-5.2-6.fc18.x86_64 and dnsmasq-2.65-5.fc18.x86_64 for sending tftp files and neither works with the 3.8.x kernels on the server side, but both are ok with the 3.6.x kernels.
Thanks,
Rich
oh, and yes, I was booting a virtualbox guest VM via PXE from the tftp server running on the same host node.
(In reply to comment #21)
> oh, and yes, I was booting a virtualbox guest VM via PXE from the tftp
> server running on the same host node.
Is that what you are always trying to boot? Does PXE booting of an actual machine work?
No, I test PXE booting with both the laptops and VMs. When it doesn't work, neither physical machines connected via eth0 nor VMs connected on vboxnet0 boot with a 3.8.x kernel. When using any kernel lower than 3.8.x, both physical machines and VMs boot ok.
Are the vbox modules still loaded on the server machines? If so, could you try this without loading any 3rd party modules? Nobody else has reported issues with PXE booting on 3.8.x and we're at a loss as to why you would be the only person seeing this.
FWIW, looking at the tcpdump, it appears that the tftp server, after getting an ACK to the read request for block 634 (out of 2000-some-odd), just stops sending frames. So if it is a kernel problem I would imagine that other network services would also stop working, i.e. you wouldn't be able to ping the server from other operational systems. But it doesn't sound like that's the case (please correct me if I'm wrong). I would suggest disabling the tftp service in dnsmasq and installing an alternate tftp server (I use the tftp-server package, which provides tftpd). If that daemon is capable of serving files under the newer kernels, I would imagine we can conclude that it's actually a dnsmasq-based problem you're looking at here.
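For anyone wanting to repeat that check on the attached captures, here is a rough Python sketch of how the TFTP packets can be decoded once the UDP payloads have been extracted. It assumes the standard TFTP packet layout (2-byte opcode, then a 2-byte block number for DATA and ACK); the function names are hypothetical, and getting the payload bytes out of the pcap file is left to whatever capture tool you prefer.

```python
import struct

# Standard TFTP opcodes.
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR"}


def decode_tftp(payload):
    """Decode one UDP payload into (kind, value).

    value is the block number for DATA/ACK, (code, message) for ERROR,
    and None for anything else.
    """
    if len(payload) < 4:
        return "TRUNCATED", None
    opcode, second = struct.unpack("!HH", payload[:4])
    kind = OPCODES.get(opcode, "UNKNOWN")
    if kind in ("DATA", "ACK"):
        return kind, second
    if kind == "ERROR":
        # Error message is a NUL-terminated string after the error code.
        message = payload[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return kind, (second, message)
    return kind, None


def summarize_transfer(payloads):
    """Return (highest DATA block seen, last ACKed block) for one transfer.

    If the last ACK equals the highest DATA block and the file is known to
    need more blocks, the server stopped sending after that ACK.
    Block-number wraparound past 65535 is ignored in this sketch.
    """
    highest_data = None
    last_ack = None
    for payload in payloads:
        kind, value = decode_tftp(payload)
        if kind == "DATA":
            highest_data = value if highest_data is None else max(highest_data, value)
        elif kind == "ACK":
            last_ack = value
    return highest_data, last_ack
```

The same decoder also covers the "error 0 TFTP Aborted" line from the dnsmasq log, which would show up as an ERROR packet with code 0 sent by the client.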
I've already tried tftp-server (see comment 15). And tftp is the only network service that stops working with the 3.8.x kernels. SSH etc. is all ok. The only thing I do to get the system working again is reboot into a 3.6.x or a 3.7.x kernel. Both tftp-server and dnsmasq stop working with the newer kernels and both are ok with the older ones.
Ok, that's good information, thank you. That said, it would point to either the application getting confused and not sending on the connection (unlikely since it happens on two separate connections), or frames getting dropped in the kernel. To track this down I think we need to capture the following during a failed tftp:
1) strace output from the tftp server process
2) Stats output from before and after the tftp operation (cat /proc/net/snmp, ethtool -S <ifname>)
3) Use of dropwatch would also be a great help in pinpointing this issue (please let me know if you need usage instructions)
4) /var/log/messages taken from after the failed tftp may also provide clues.
Thanks!
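To make item 2 easier to read, something along these lines could be used to compare the two /proc/net/snmp snapshots and print only the counters that moved. This is a rough Python sketch, not an existing tool; the function names are hypothetical, and it assumes the usual /proc/net/snmp layout of a header line followed by a value line for each protocol.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp text into {(protocol, counter_name): value}.

    Assumes lines come in pairs: "Udp: Name1 Name2 ..." then "Udp: 1 2 ...".
    """
    counters = {}
    lines = [line.split() for line in text.splitlines() if line.strip()]
    for header, values in zip(lines[0::2], lines[1::2]):
        proto = header[0].rstrip(":")
        for field, value in zip(header[1:], values[1:]):
            counters[(proto, field)] = int(value)
    return counters


def changed_counters(before, after):
    """Return {key: delta} for every counter whose value differs between snapshots."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] != before.get(key, 0)
    }


# Intended use (run on the tftp server):
#   before = parse_snmp(open("/proc/net/snmp").read())
#   ... attempt the PXE boot and wait for it to fail ...
#   after = parse_snmp(open("/proc/net/snmp").read())
#   for (proto, name), delta in sorted(changed_counters(before, after).items()):
#       print(proto, name, delta)
```

Error or drop counters in the Udp and Ip groups that increase during the failed transfer would be the ones worth posting here.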
ping, any feedback here?
Hi,
Sorry for the delay... I gave up on the physical host with issues and built a new guest VM in VirtualBox (on the dodgy host) and set that up as a PXE server. This now works for PXE booting VM guest nodes.
I believe my issue came from something getting mixed up when upgrading my FC17 box to FC18 on the fly via fedup. As my VMs work, I'm happy for this to be closed as "Won't fix" or "Invalid", as it seems it was my setup that was the issue and not FC18.
However, if we want to follow this up "just for fun" I'll happily try and collect the info required.
Thanks,
Rich
I would like to follow up, but unfortunately I don't have time. If the problem recurs, however, please reopen this bug and we can pick it up again. Thanks!
I've not found the fix, but I've found the cause of my PXE boot issues breaking with recent kernel releases :-)
Running both Keepalived and dnsmasq as a tftp server on the same box doesn't work nicely...
Keepalived manages a public VIP that moves between two servers depending on which one is master. When one of the two machines is made master, dnsmasq is started to provide PXE boot services.
Anyway, without Keepalived running, I am able to PXE boot :-)
Does anyone know of a nice alternative to Keepalived?