Description of problem: Sometimes pygrub dies with a traceback instead of starting the VM. Version-Release number of selected component (if applicable): xen-3.0.3-120.el5_6.3 ( 5.6 / updates ) xen-3.0.3-132.el5 ( 5.7 / base ) How reproducible: sometimes Steps to Reproduce: 1. install CentOS 5.7 with xen packages and kernel-xen 2. user virt-manager to create a VM; finish the installation normally 3. attempt to start the newly created VM Actual results: [root@pc68 ~]# xm create -c test1 Using config file "/etc/xen/test1". Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst Traceback (most recent call last): File "/usr/bin/pygrub", line 691, in ? chosencfg = run_grub(file, entry, fs) File "/usr/bin/pygrub", line 547, in run_grub curses.wrapper(run_main) File "/usr/lib64/python2.4/curses/wrapper.py", line 44, in wrapper return func(stdscr, *args, **kwds) File "/usr/bin/pygrub", line 543, in run_main sel = g.run() File "/usr/bin/pygrub", line 418, in run self.run_main(timeout) File "/usr/bin/pygrub", line 451, in run_main self.screen.addstr(20, 5, "Will boot selected entry in %2d seconds" _curses.error: addstr() returned ERR No handlers could be found for logger "xend" Error: Boot loader didn't return any data! Usage: xm create <ConfigFile> [options] [vars] [...] usage instructions printed here Expected results: The VM should start and the standard selection of kernels offered by pygrub should appear Additional info: 1) The VM is started normally after deleting the xen logs: [root@pc68 ~]# ls /var/log/xen/ console/ qemu-dm.3929.log xend-debug.log xen-hotplug.log domain-builder-ng.log qemu-dm.4235.log xend.log [root@pc68 ~]# rm /var/log/xen/*log -f [root@pc68 ~]# xm create -c test1 Using config file "/etc/xen/test1". Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst Started domain test1 2) The problem was first noticed on several CentOS 5.6 servers, one week ago, after a maintenance window when several physical servers were rebooted to activate the new versions of kernel and xen installed from 5.6/updates.
sorry for the initial component selection, it should have been xen but my my mouse played ugly tricks.
This error happens usualy when terminal is too small to display grub menu. What size is terminal when error occurs? Does it occur when you do not open console (not use -c param)?
(In reply to comment #2) > This error happens usualy when terminal is too small to display grub menu. > > > What size is terminal when error occurs? large enough to work correctly after I delete the logs > Does it occur when you do not open > console (not use -c param)? yes
(In reply to comment #3) > (In reply to comment #2) > > This error happens usualy when terminal is too small to display grub menu. > > > > > > What size is terminal when error occurs? > > large enough to work correctly after I delete the logs > > > > Does it occur when you do not open > > console (not use -c param)? > yes As I have said, the problem was first seen when several VMs did not autostart after the physical machines were rebooted. Odd thing is that I found no obvious reason, on the same server some VMs did start, some did not. And those which would not start ( either with xm create or with xm create -c ) did start after either of the next two actions took place - remove logs - replace the pygrub line in the config with kernel+ramdisk.
ENOSPC on /var/log/xen? (Asking just to exclude it.)
Yes, not enough space on /var partition is probably the problem here as pygrub excract kernel and ramdisk to /var/lib/xen. Can you verify this?
I did think of that but a) Today's event happened on a fresh install of CentOS 5.7 with more than plenty room left in all partitions ( 120 GB disk, 30 GB allocated to /var/lib/xen/images ot of which 20GB taken by the test VM; all the rest went into / ) Meanwhile the machine was reassigned to another purpose and reformatted but if mandatory I can try to postpone the new assignment and recreate the problem. b) one of the servers which failed last week ( still at 5.6 level) has [root@xenh12 ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/md0 9.5G 1.3G 8.3G 14% / /dev/md2 4.8G 1.1G 3.4G 25% /var /dev/md1 965M 18M 898M 2% /tmp /dev/md4 99M 17M 78M 18% /boot tmpfs 3.8G 0 3.8G 0% /dev/shm /dev/md5 46G 20G 27G 43% /var/lib/xen/images /dev/sdc5 92G 79G 14G 86% /var/lib/xen/images/stuff none 3.8G 216K 3.8G 1% /var/lib/xenstored [root@xenh12 ~]# du -sh /var/log/xen/ 1.2M /var/log/xen/ No changes occured on this server after I managed to start the VMs, 10 days ago.
From another server that failed: [root@xenh03 ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/xen_group-root 4.8G 1.3G 3.5G 27% / /dev/mapper/xen_group-xen_images 281G 248G 19G 94% /var/lib/xen/images /dev/sda1 99M 31M 64M 33% /boot tmpfs 128M 0 128M 0% /dev/shm /dev/mapper/VolGroup02-Vol01 74G 25G 46G 36% /var/lib/xen/images2 none 128M 88K 128M 1% /var/lib/xenstored [root@xenh03 ~]# du -sh /var/log/xen/ 920K /var/log/xen/
Can you please provided following information after fail to create guest: - xend log file (xend.log) - output of `df /var/lib/xen` - output of `xm dmesg` a `dmesg` - /var/log/audit/audit.log
I'll try to reproduce the problem and provide the info that you have asked for. Unfortunately now all the servers are in normal production mode and it is a bit difficult to take the VMs offline for testing purposes.
Created attachment 523129 [details] Fix pygrub when window is too small for curses (In reply to comment #0) > Description of problem: > Sometimes pygrub dies with a traceback instead of starting the VM. > > Version-Release number of selected component (if applicable): > xen-3.0.3-120.el5_6.3 ( 5.6 / updates ) > xen-3.0.3-132.el5 ( 5.7 / base ) > > How reproducible: > sometimes > > Steps to Reproduce: > 1. install CentOS 5.7 with xen packages and kernel-xen > 2. user virt-manager to create a VM; finish the installation normally > 3. attempt to start the newly created VM > > Actual results: > [root@pc68 ~]# xm create -c test1 > > Using config file "/etc/xen/test1". > > Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst > Traceback (most recent call last): > File "/usr/bin/pygrub", line 691, in ? > chosencfg = run_grub(file, entry, fs) > File "/usr/bin/pygrub", line 547, in run_grub > curses.wrapper(run_main) > File "/usr/lib64/python2.4/curses/wrapper.py", line 44, in wrapper > return func(stdscr, *args, **kwds) > File "/usr/bin/pygrub", line 543, in run_main > sel = g.run() > File "/usr/bin/pygrub", line 418, in run > self.run_main(timeout) > File "/usr/bin/pygrub", line 451, in run_main > self.screen.addstr(20, 5, "Will boot selected entry in %2d seconds" > _curses.error: addstr() returned ERR > > No handlers could be found for logger "xend" > It's possible that the the actual error is being masked by this error (which already happened with one or more bugs and it take a longer time to see the actual cause because the real error has been masked by others) that the window size is too small. I'm attaching the patch to switch back to non-interactive behaviour but go on when this happens so if pygrub will meet some other error during the process you should be able to see it with this patch applied. Michal
(In reply to comment #11) > Created attachment 523129 [details] > Fix pygrub when window is too small for curses > > (In reply to comment #0) > > Description of problem: > > Sometimes pygrub dies with a traceback instead of starting the VM. > > > > Version-Release number of selected component (if applicable): > > xen-3.0.3-120.el5_6.3 ( 5.6 / updates ) > > xen-3.0.3-132.el5 ( 5.7 / base ) > > > > How reproducible: > > sometimes > > > > Steps to Reproduce: > > 1. install CentOS 5.7 with xen packages and kernel-xen > > 2. user virt-manager to create a VM; finish the installation normally > > 3. attempt to start the newly created VM > > > > Actual results: > > [root@pc68 ~]# xm create -c test1 > > > > Using config file "/etc/xen/test1". > > > > Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst > > Traceback (most recent call last): > > File "/usr/bin/pygrub", line 691, in ? > > chosencfg = run_grub(file, entry, fs) > > File "/usr/bin/pygrub", line 547, in run_grub > > curses.wrapper(run_main) > > File "/usr/lib64/python2.4/curses/wrapper.py", line 44, in wrapper > > return func(stdscr, *args, **kwds) > > File "/usr/bin/pygrub", line 543, in run_main > > sel = g.run() > > File "/usr/bin/pygrub", line 418, in run > > self.run_main(timeout) > > File "/usr/bin/pygrub", line 451, in run_main > > self.screen.addstr(20, 5, "Will boot selected entry in %2d seconds" > > _curses.error: addstr() returned ERR > > > > No handlers could be found for logger "xend" > > > > > It's possible that the the actual error is being masked by this error (which > already happened with one or more bugs and it take a longer time to see the > actual cause because the real error has been masked by others) that the window > size is too small. I'm attaching the patch to switch back to non-interactive > behaviour but go on when this happens so if pygrub will meet some other error > during the process you should be able to see it with this patch applied. > > Michal The test RPMs are present at: http://people.redhat.com/minovotn/xen/new Michal
Michal, is http://wdl.lug.ro/Image1074.jpg the intended output of the test rpms ? [root@pc69 ~]# rpm -qa xen\* kernel\* kernel-2.6.18-274.el5 kernel-xen-2.6.18-274.3.1.el5 xen-libs-3.0.3-134.el5virttest0.g57d6661 kernel-xen-2.6.18-274.el5 xen-3.0.3-134.el5virttest0.g57d6661
PS: yes, I know rpms should not be there :)
On a more serious note: I am performing tests with xen-3.0.3-134.el5virttest0.g57d6661. Thank you for making the packages available.
The new version of pygrub boots fine. When the window is very small, it proceeds with booting the default selected kernel. Otherwise it beheves as usual. Unfortunately I have not managed yet to reproduce the bug which originated this bug report. What I am 100% sure of is that in all of the occasions when pygrub failed there was enough free space ( 40+ GB on pc68, > 3 GB on the xenhNN servers ). I'll keep trying to create a reproducer.
(In reply to comment #16) > The new version of pygrub boots fine. When the window is very small, it > proceeds with booting the default selected kernel. Otherwise it beheves as > usual. > > > Unfortunately I have not managed yet to reproduce the bug which originated this > bug report. What I am 100% sure of is that in all of the occasions when pygrub > failed there was enough free space ( 40+ GB on pc68, > 3 GB on the xenhNN > servers ). I'll keep trying to create a reproducer. Right Manuel. The thing is that the xen-libs-3.0.3-134.el5virttest0.g57d6661 is not a real patch rather than the triage to help the issue be unmasked to see what's really going on. Please keep trying using my package until you meet the exception/error and please post related details to this bugzilla once you got them. Thanks, Michal
Hi Manuel, any progress with reproducing the problem?
The short answer is no. Unfortunately I could not test on any of the servers where the problem was initially spotted. All of them are production servers and I can not "play" with them at will. I'd love to reboot at least one of them but this was not an option so far.
Closing as no updates on issue and we are not able to proceed without reproducer. In case new data are available feel free to reopen bug with attached data.
Makes sense. I'll reopen the bug if I manage to decommission one of the affected servers and I find a reproducer using CentOS 5.8 Thank you for attention so far.