Bug 1411025
Summary: | DHCP server is not ready on time to respond to VMs requests | ||
---|---|---|---|
Product: | [Community] Virtualization Tools | Reporter: | Nadav Goldin <ngoldin> |
Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> |
Status: | CLOSED NOTABUG | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | unspecified | CC: | laine, libvirt-maint, ngoldin, rbalakri, ykaul |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1408371 | Environment: | |
Last Closed: | 2017-06-26 06:45:24 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Nadav Goldin
2017-01-07 15:47:56 UTC
I'm cloning this as it seems not related to the IPv6 issue in the original description. The guest image we're using is cirros(a minimal image - 19MB), with NAT configuration. After creating the network configuration, and then starting a transient domain with 'createXML' call, the guest fails to get an IP. From the guest initial boot up logs, I see it failed to acquire an IP with DHCP: Sending discover... Sending discover... Usage: /sbin/cirros-dhcpc <up|down> No lease, failing WARN: /etc/rc3.d/S40-network failed Watching dnsmasq logs, I don't see any DHCP requests from the guest during the bootup time. Some workarounds that resolve this: 1. Rebooting the machine. 2. Restarting the network via the console(with cirros init.d scripts). 3. Adding to the image's bootup sequence another attempt to acquire the IP in a later stage('S49' worked for me) A workaround using the libvirt API that also seems to work: 1. Start the transient domain with flags=libvirt.VIR_DOMAIN_START_PAUSED 2. Sleep for 0.5-1 seconds 3. Issue a domain.resume() call. Waiting between the time the network is created, and the domain's createXML call is made does not help. The STP forward relay is set to 0 in the XML(as suggested, I think), However when running 'brctl showstp' on the created bridge I see the forward delay is not '0': 608b-lago-ni bridge id 8000.fe52c0a8c802 designated root 8000.fe52c0a8c802 root port 0 path cost 0 max age 20.00 bridge max age 20.00 hello 17:21 2.00 bridge hello 17:21 2.00 forward delay 2.00 bridge forward delay 2.00 ageing 17:21 300.00 hello timer 0.13 tcn timer 0.00 topology change timer 0.00 gc timer 101.13 flags vnet0 (1) port id 8001 state forwarding designated root 8000.fe52c0a8c802 path cost 100 designated bridge 8000.fe52c0a8c802 message age timer 0.00 designated port 8001 forward delay timer 0.00 designated cost 0 hold timer 0.00 flags Versions: kernel 4.8.15-200.fc24.x86_64 libvirt-1.3.3.2-1.fc24.x86_64 dnsmasq-2.76-2.fc24.x86_64 qemu-kvm-2.6.2-5.fc24.x86_64 cirros image: http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img Network XML: <network connections='1'> <name>608b-lago</name> <uuid>511a172d-ab80-4231-808b-336d8d773ae1</uuid> <forward mode='nat'> <nat> <port start='1024' end='65535'/> </nat> </forward> <bridge name='608b-lago-ni' stp='on' delay='0'/> <mac address='52:54:00:a8:5b:33'/> <domain name='lago.local' localOnly='yes'/> <dns forwardPlainNames='yes'> <host ip='192.168.200.2'> <hostname>cirros-image</hostname> </host> <host ip='fd8f:1391:3a82:200::c0a8:c802'> <hostname>cirros-image</hostname> </host> </dns> <ip address='192.168.200.1' netmask='255.255.255.0'> <dhcp> <range start='192.168.200.100' end='192.168.200.254'/> <host mac='54:52:c0:a8:c8:02' name='cirros-image' ip='192.168.200.2'/> </dhcp> </ip> <ip family='ipv6' address='fd8f:1391:3a82:200::1' prefix='64'> <dhcp> <range start='fd8f:1391:3a82:200::c0a8:c864' end='fd8f:1391:3a82:200::c0a8:c8fe'/> <host id='0:3:0:1:54:52:c0:a8:c8:02' name='cirros-image' ip='fd8f:1391:3a82:200::c0a8:c802'/> </dhcp> </ip> </network> Relevant interface from domain XML: <interface type='network'> <mac address='54:52:c0:a8:c8:02'/> <source network='608b-lago' bridge='608b-lago-ni'/> <target dev='vnet0'/> <model type='virtio'/> <alias name='net0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface> Entire domain XML: <domain type='kvm' id='7'> <name>608b29be-cirros-image</name> <uuid>508107d5-1c4a-45de-94d0-b105b7570bf6</uuid> <memory unit='KiB'>4194304</memory> <currentMemory unit='KiB'>4194304</currentMemory> <vcpu placement='static'>2</vcpu> <iothreads>1</iothreads> <resource> <partition>/machine</partition> </resource> <os> <type arch='x86_64' machine='pc-i440fx-2.6'>hvm</type> <bootmenu enable='no'/> </os> <features> <acpi/> <apic/> <pae/> </features> <cpu mode='custom' match='exact'> <model fallback='allow'>Westmere</model> <topology sockets='2' cores='1' threads='1'/> <feature policy='optional' name='vmx'/> <feature policy='optional' name='svm'/> </cpu> <clock offset='utc'> <timer name='kvmclock'/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <devices> <emulator>/usr/bin/qemu-kvm</emulator> <disk type='file' device='disk'> <driver name='qemu' type='qcow2' discard='unmap'/> <source file='/home/ngoldin/src/nvgoldin.github.com/lago-cirros-img/minimal_vm_nohack_2.qcow2'/> <backingStore/> <target dev='vda' bus='virtio'/> <serial>1</serial> <boot order='1'/> <alias name='virtio-disk0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </disk> <controller type='usb' index='0' model='none'> <alias name='usb'/> </controller> <controller type='pci' index='0' model='pci-root'> <alias name='pci.0'/> </controller> <controller type='virtio-serial' index='0'> <alias name='virtio-serial0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </controller> <interface type='network'> <mac address='54:52:c0:a8:c8:02'/> <source network='608b-lago' bridge='608b-lago-ni'/> <target dev='vnet0'/> <model type='virtio'/> <alias name='net0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface> <serial type='pty'> <source path='/dev/pts/2'/> <target port='0'/> <alias name='serial0'/> </serial> <console type='pty' tty='/dev/pts/2'> <source path='/dev/pts/2'/> <target type='serial' port='0'/> <alias name='serial0'/> </console> <channel type='unix'> <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-7-608b29be-cirros-imag/org.qemu.guest_agent.0'/> <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/> <alias name='channel0'/> <address type='virtio-serial' controller='0' bus='0' port='1'/> </channel> <input type='mouse' bus='ps2'> <alias name='input0'/> </input> <input type='keyboard' bus='ps2'> <alias name='input1'/> </input> <graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0' keymap='en-us'> <listen type='address' address='0.0.0.0'/> </graphics> <video> <model type='cirrus' vram='16384' heads='1' primary='yes'/> <alias name='video0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </video> <memballoon model='none'> <alias name='balloon0'/> </memballoon> <rng model='virtio'> <backend model='random'>/dev/random</backend> <alias name='rng0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> </rng> </devices> <seclabel type='dynamic' model='selinux' relabel='yes'> <label>system_u:system_r:svirt_t:s0:c157,c953</label> <imagelabel>system_u:object_r:svirt_image_t:s0:c157,c953</imagelabel> </seclabel> </domain> > The STP forward relay is set to 0 in the XML(as suggested, I think),
> However when running 'brctl showstp' on the created bridge I see the forward
> delay is not '0':
> [...]
> forward delay 2.00 bridge forward delay 2.00
Yes, this is normal. Although libvirt can have the STP delay set to 0, and the kernel allows setting it to 0 (only before the bridge is IFF_UP!), that is internally adjusted to 2.0 when the bridge is marked IFF_UP.
If your guest really times out on DHCP due to a 2 second forwarding delay, then its DHCP timeout is *much* too short.
You can see if this is the problem by setting stp to "off" in the network) (then stop and restart the network before testing again). Turning STP off is the only way to guarantee that the port begins forwarding data immediately when the tap device is marked IFF_UP.
Indeed looks like the CirrOS DHCP client attempts at boot are not enough. > Yes, this is normal. Although libvirt can have the STP delay set to 0, and the kernel allows setting it to 0 (only before the bridge is IFF_UP!), that is internally adjusted to 2.0 when the bridge is marked IFF_UP. Thanks for the explanation. Maybe it is documented and I missed it, if not, I think it should be, because it is misleading to see "..delay='0'/>". > You can see if this is the problem by setting stp to "off" in the network) (then stop and restart the network before testing again). Turning STP off is the only way to guarantee that the port begins forwarding data immediately when the tap device is marked IFF_UP. I'll test and update. (In reply to Nadav Goldin from comment #3) > Indeed looks like the CirrOS DHCP client attempts at boot are not enough. > > > Yes, this is normal. Although libvirt can have the STP delay set to 0, and the kernel allows setting it to 0 (only before the bridge is IFF_UP!), that is internally adjusted to 2.0 when the bridge is marked IFF_UP. > > Thanks for the explanation. Maybe it is documented and I missed it, if not, > I think it should be, because it is misleading to see "..delay='0'/>". > > > > You can see if this is the problem by setting stp to "off" in the network) (then stop and restart the network before testing again). Turning STP off is the only way to guarantee that the port begins forwarding data immediately when the tap device is marked IFF_UP. > > I'll test and update. Any updates? Sorry, I think this can be closed. It was most likely the DHCP timeouts in the CirrOS image. I will test this on the newer CirrOS image and reopen if it looks like a libvirt issue. |