Bug 1186425 - VM fails to start when out of allocated hugepages; worked on f20
Summary: VM fails to start when out of allocated hugepages; worked on f20
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 21
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-01-27 16:17 UTC by Jerry James
Modified: 2015-03-31 19:31 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-31 19:31:47 UTC
Type: Bug
Embargoed:


Attachments
virsh dumpxml output for the running machine (5.52 KB, text/plain)
2015-01-27 16:17 UTC, Jerry James
virsh dumpxml output for the failed machine (3.93 KB, text/plain)
2015-01-27 16:18 UTC, Jerry James

Description Jerry James 2015-01-27 16:17:54 UTC
Created attachment 984745 [details]
virsh dumpxml output for the running machine

Description of problem:
I have about a dozen VMs.  I can start any one of the VMs, and it works normally.  If I attempt to start any other VM as a second guest, the second one fails to start.  Messages in /var/log/messages hint that there is some kind of problem setting up the network for the second guest.

Version-Release number of selected component (if applicable):
libvirt-1.2.9.1-2.fc21.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Start any one of my VMs.
2. Attempt to start a second VM while the first is still running.
3.

Actual results:
The second VM fails to start.

Expected results:
The second VM should start.

Additional info:
See https://lists.fedoraproject.org/pipermail/virt/2015-January/004210.html for additional information.  I will attach the output of "virsh dumpxml" for an attempt I made this morning.  The first attachment is for the machine I started first (which worked); the second attachment is for the machine I attempted to start second (which failed).

Comment 1 Jerry James 2015-01-27 16:18:37 UTC
Created attachment 984746 [details]
virsh dumpxml output for the failed machine

Comment 2 Jerry James 2015-01-27 16:19:48 UTC
The contents of /var/log/libvirt/qemu/Rawhide64.log for today's failed attempt to start are as follows:

2015-01-27 15:54:12.581+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -name Rawhide64 -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu Nehalem,+rdtscp,+pdcm,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 1024 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid db41d1a4-0ba4-4858-bd45-eec371ff084a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/Rawhide64.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/dev/fedora_diannao/Rawhide64,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:8d:f1:4c,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/Rawhide64.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5901,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
char device redirected to /dev/pts/7 (label charserial0)
2015-01-27 15:54:12.929+0000: shutting down

Comment 3 Cole Robinson 2015-01-27 19:20:54 UTC
Hmm, annoying that there isn't a qemu error message, and libvirt doesn't give any useful error...

It's also weird that syslog has the back-to-back systemd messages.

Is this 100% reproducible?
Does it only affect two particular VMs, or any combination of VMs?
Does starting both VMs one after the other also reproduce the issue, or does one need to be running for a long time?

Comment 4 Jerry James 2015-01-27 20:05:17 UTC
This is 100% reproducible.  It affects every combination of VMs I have tried so far.  No matter which one I run first, it starts.  No matter which one I run second, it doesn't start.

I haven't tried starting two very quickly.  I've always had one running for a while before starting the second one.  I'll try that experiment this afternoon when I'm done with the VM I'm currently using.

Comment 5 Jerry James 2015-01-28 16:17:44 UTC
I figured it out.  Since I nearly always have at least one VM running, I have configured some hugepages on my host machine.  However, since I nearly always have at most one VM running, I only configured enough hugepages to cover a single guest (they all use 1 GB of RAM).  This is what I have in the configs:

  <memoryBacking>
    <hugepages/>
  </memoryBacking>

On Fedora 20, if I ran two guests, the first guest would consume all available hugepages, and the second guest would fall back to regular memory allocation.  That is no longer happening on Fedora 21.  Now the second guest is unable to allocate any memory, and so it exits.

Has the config changed for this?  Do I need to do something different to say "try to allocate hugepages, but fall back to regular memory allocation if there aren't enough hugepages"?
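
For reference, a minimal sketch of sizing the host pool so it covers two 1 GiB guests, assuming the default 2 MiB hugepage size (the page count and sysctl file name below are illustrative, not taken from this report):

  # Inspect the current hugepage pool on the host
  grep -i hugepages /proc/meminfo

  # Reserve enough 2 MiB pages for two 1 GiB guests: 2 * (1024 MiB / 2 MiB) = 1024
  sysctl vm.nr_hugepages=1024

  # Persist the setting across reboots (file name is illustrative)
  echo 'vm.nr_hugepages = 1024' > /etc/sysctl.d/80-hugepages.conf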

Comment 6 Cole Robinson 2015-01-29 17:37:14 UTC
There have been some libvirt changes WRT hugepages between the f20 and f21 versions, but I didn't look at the details. CCing mprivozn.

Did you just figure this out by deduction, or did you get libvirt to generate an actual error message here?

Comment 7 Jerry James 2015-01-29 17:56:36 UTC
I went back to /var/log/libvirt/qemu/<machine>.log to compare the messages I used to get with Fedora 20 with the messages I am getting now on Fedora 21.  I saw that back in the Fedora 20 days, when I started up a second machine, I would get this message in the log:

file_ram_alloc: can't mmap RAM pages: Cannot allocate memory

... after which the machine would start anyway.  That isn't happening anymore.  But that gave me the thought that hugepages might have something to do with it, which turned out to be a lucky guess.  So I guess there are two bugs here: (1) lack of a fallback to non-hugepages allocation and (2) lack of a log message indicating that failure to start is due to failure to allocate memory.

Comment 8 Michal Privoznik 2015-01-30 08:17:19 UTC
(In reply to Jerry James from comment #7)
> I went back to /var/log/libvirt/qemu/<machine>.log to compare the messages I
> used to get with Fedora 20 with the messages I am getting now on Fedora 21. 
> I saw that back in the Fedora 20 days, when I started up a second machine, I
> would get this message in the log:
> 
> file_ram_alloc: can't mmap RAM pages: Cannot allocate memory
> 
> ... after which the machine would start anyway.  That isn't happening
> anymore.  But that gave me the thought that hugepages might have something
> to do with it, which turned out to be a lucky guess.  So I guess there are
> two bugs here: (1) lack of a fallback to non-hugepages allocation and (2)
> lack of a log message indicating that failure to start is due to failure to
> allocate memory.

Well, (1) is not currently possible. If you set your guest to be backed by hugepages, it's your responsibility to make sure there are enough hugepages in the pool. I admit that it's not very user friendly, but it's the best we can do with the current kernel APIs. They are very racy - even if libvirt were to allocate enough hugepages and resize the pool to match the needs of a guest that is starting, another process could consume them before qemu mmap()s them. And having libvirt silently drop a setting from the XML is a big no-no.

As for (2) - yeah, that's nasty. I wrote a patch against qemu a while ago to produce a more meaningful error message; it's part of the v2.2.0 release. What qemu version are you running? At any rate, it's qemu that does the actual memory allocation (*), so it should produce an error message in case of failure. From libvirt's POV we just see the qemu process exiting and have no idea why.

* - allocation is used here in two senses: for regular system pages, allocation makes them accessible to the process. For hugepages, it means carving off some contiguous free RAM and placing it into a pool; if a process wants to use those pages, it has to mmap() them. That's what the kernel folks thought would be the best approach. I know.
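
To illustrate the semantics described above: the pool is reserved up front, and qemu consumes it by mmap()ing files on hugetlbfs, so a second 1 GiB guest finds the pool already empty. A rough sketch, assuming a pool of 512 x 2 MiB pages (the guest name and counter values are illustrative):

  grep HugePages_ /proc/meminfo
  #  HugePages_Total:     512
  #  HugePages_Free:      512
  virsh start <first-guest>        # qemu mmap()s the whole pool from hugetlbfs
  grep HugePages_ /proc/meminfo
  #  HugePages_Total:     512
  #  HugePages_Free:        0     <- nothing left for a second guest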

Comment 9 Cole Robinson 2015-01-30 14:52:50 UTC
FYI, it looks like the qemu error message patch was only released in qemu 2.1.3, but that's in updates-testing now.

Comment 10 Cole Robinson 2015-03-31 19:31:47 UTC
The improved qemu error message fix is in F21 now, so this should be less confusing. Aside from that, it doesn't sound like there's anything else to do here, so I'm closing. Please reopen if I've misunderstood.

