Bug 736610 - pygrub does not start VMs unless the logs are deleted
Summary: pygrub does not start VMs unless the logs are deleted
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.7
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: rc
: ---
Assignee: Xen Maintainance List
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-09-08 08:40 UTC by manuel wolfshant
Modified: 2012-03-19 07:02 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-03-19 06:32:56 UTC


Attachments (Terms of Use)
Fix pygrub when window is too small for curses (1.96 KB, patch)
2011-09-14 11:45 UTC, Michal Novotny
no flags Details | Diff

Description manuel wolfshant 2011-09-08 08:40:30 UTC
Description of problem:
Sometimes pygrub dies with a traceback instead of starting the VM.

Version-Release number of selected component (if applicable):
xen-3.0.3-120.el5_6.3 ( 5.6 / updates ) 
xen-3.0.3-132.el5 ( 5.7 / base )

How reproducible:
sometimes

Steps to Reproduce:
1. install CentOS 5.7 with xen packages and kernel-xen
2. user virt-manager to create a VM; finish the installation normally
3. attempt to start the newly created VM
  
Actual results:
[root@pc68 ~]# xm create -c test1
                                                                   
Using config file "/etc/xen/test1".
                                                                      
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Traceback (most recent call last):                            
  File "/usr/bin/pygrub", line 691, in ?    
    chosencfg = run_grub(file, entry, fs)   
  File "/usr/bin/pygrub", line 547, in run_grub
    curses.wrapper(run_main)
  File "/usr/lib64/python2.4/curses/wrapper.py", line 44, in wrapper
    return func(stdscr, *args, **kwds)
  File "/usr/bin/pygrub", line 543, in run_main
    sel = g.run()                                  
  File "/usr/bin/pygrub", line 418, in run
   self.run_main(timeout)                                     
  File "/usr/bin/pygrub", line 451, in run_main
   self.screen.addstr(20, 5, "Will boot selected entry in %2d seconds"
_curses.error: addstr() returned ERR
         
No handlers could be found for logger "xend"
      
Error: Boot loader didn't return any data!
      
Usage: xm create <ConfigFile> [options] [vars]
[...] usage instructions printed here

Expected results:
The VM should start and the standard selection of kernels offered by pygrub should appear

Additional info:
1) The VM is started normally after deleting the xen logs:
[root@pc68 ~]# ls /var/log/xen/
console/ qemu-dm.3929.log  xend-debug.log xen-hotplug.log domain-builder-ng.log  qemu-dm.4235.log xend.log
[root@pc68 ~]# rm /var/log/xen/*log -f                    
[root@pc68 ~]# xm create -c test1
Using config file "/etc/xen/test1".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst 
Started domain test1 
2) The problem was first noticed on several CentOS 5.6 servers, one week ago, after a maintenance window when several physical servers were rebooted to activate the new versions of kernel and xen installed from 5.6/updates.

Comment 1 manuel wolfshant 2011-09-08 08:53:49 UTC
sorry for the initial component selection, it should have been xen but my my mouse played ugly tricks.

Comment 2 Miroslav Rezanina 2011-09-08 12:35:20 UTC
This error happens usualy when terminal is too small to display grub menu.


What size is terminal when error occurs? Does it occur when you do not open console (not use -c param)?

Comment 3 manuel wolfshant 2011-09-08 12:56:24 UTC
(In reply to comment #2)
> This error happens usualy when terminal is too small to display grub menu.
> 
> 
> What size is terminal when error occurs?

large enough to work correctly after I delete the logs


> Does it occur when you do not open
> console (not use -c param)?
yes

Comment 4 manuel wolfshant 2011-09-08 13:00:37 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > This error happens usualy when terminal is too small to display grub menu.
> > 
> > 
> > What size is terminal when error occurs?
> 
> large enough to work correctly after I delete the logs
> 
> 
> > Does it occur when you do not open
> > console (not use -c param)?
> yes

As I have said, the problem was first seen when several VMs did not autostart after the physical machines were rebooted. Odd thing is that I found no obvious reason, on the same server some VMs did start, some did not. And those which would not start ( either with xm create or with xm create -c ) did start after either of the next two actions took place
- remove logs
- replace the pygrub line in the config with kernel+ramdisk.

Comment 5 Laszlo Ersek 2011-09-08 13:20:26 UTC
ENOSPC on /var/log/xen? (Asking just to exclude it.)

Comment 6 Miroslav Rezanina 2011-09-08 13:26:42 UTC
Yes, not enough space on /var partition is probably the problem here as pygrub excract kernel and ramdisk to /var/lib/xen. Can you verify this?

Comment 7 manuel wolfshant 2011-09-08 13:52:08 UTC
I did think of that but 

a) Today's event happened on a fresh install of CentOS 5.7 with more than plenty room left in all partitions ( 120 GB disk, 30 GB allocated to /var/lib/xen/images ot of which 20GB taken by the test VM; all the rest went into / )
Meanwhile the machine was reassigned to another purpose and reformatted but if mandatory I can try to postpone the new assignment and recreate the problem.

b) one of the servers which failed last week ( still at 5.6 level) has

[root@xenh12 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              9.5G  1.3G  8.3G  14% /
/dev/md2              4.8G  1.1G  3.4G  25% /var
/dev/md1              965M   18M  898M   2% /tmp
/dev/md4               99M   17M   78M  18% /boot
tmpfs                 3.8G     0  3.8G   0% /dev/shm
/dev/md5               46G   20G   27G  43% /var/lib/xen/images
/dev/sdc5              92G   79G   14G  86% /var/lib/xen/images/stuff
none                  3.8G  216K  3.8G   1% /var/lib/xenstored
[root@xenh12 ~]# du -sh /var/log/xen/
1.2M    /var/log/xen/

No changes occured on this server after I managed to start the VMs, 10 days ago.

Comment 8 manuel wolfshant 2011-09-08 14:05:46 UTC
From another server that failed:

[root@xenh03 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/xen_group-root
                      4.8G  1.3G  3.5G  27% /
/dev/mapper/xen_group-xen_images
                      281G  248G   19G  94% /var/lib/xen/images
/dev/sda1              99M   31M   64M  33% /boot
tmpfs                 128M     0  128M   0% /dev/shm
/dev/mapper/VolGroup02-Vol01
                       74G   25G   46G  36% /var/lib/xen/images2
none                  128M   88K  128M   1% /var/lib/xenstored

[root@xenh03 ~]# du -sh /var/log/xen/
920K    /var/log/xen/

Comment 9 Miroslav Rezanina 2011-09-09 11:51:58 UTC
Can you please provided following information after fail to create guest:

- xend log file (xend.log)
- output of `df /var/lib/xen`
- output of `xm dmesg` a `dmesg`
- /var/log/audit/audit.log

Comment 10 manuel wolfshant 2011-09-09 12:00:53 UTC
I'll try to reproduce the problem and provide the info that you have asked for. Unfortunately now all the servers are in normal production mode and it is a bit difficult to take the VMs offline for testing purposes.

Comment 11 Michal Novotny 2011-09-14 11:45:46 UTC
Created attachment 523129 [details]
Fix pygrub when window is too small for curses

(In reply to comment #0)
> Description of problem:
> Sometimes pygrub dies with a traceback instead of starting the VM.
> 
> Version-Release number of selected component (if applicable):
> xen-3.0.3-120.el5_6.3 ( 5.6 / updates ) 
> xen-3.0.3-132.el5 ( 5.7 / base )
> 
> How reproducible:
> sometimes
> 
> Steps to Reproduce:
> 1. install CentOS 5.7 with xen packages and kernel-xen
> 2. user virt-manager to create a VM; finish the installation normally
> 3. attempt to start the newly created VM
> 
> Actual results:
> [root@pc68 ~]# xm create -c test1
> 
> Using config file "/etc/xen/test1".
> 
> Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
> Traceback (most recent call last):                            
>   File "/usr/bin/pygrub", line 691, in ?    
>     chosencfg = run_grub(file, entry, fs)   
>   File "/usr/bin/pygrub", line 547, in run_grub
>     curses.wrapper(run_main)
>   File "/usr/lib64/python2.4/curses/wrapper.py", line 44, in wrapper
>     return func(stdscr, *args, **kwds)
>   File "/usr/bin/pygrub", line 543, in run_main
>     sel = g.run()                                  
>   File "/usr/bin/pygrub", line 418, in run
>    self.run_main(timeout)                                     
>   File "/usr/bin/pygrub", line 451, in run_main
>    self.screen.addstr(20, 5, "Will boot selected entry in %2d seconds"
> _curses.error: addstr() returned ERR
> 
> No handlers could be found for logger "xend"
> 


It's possible that the the actual error is being masked by this error (which already happened with one or more bugs and it take a longer time to see the actual cause because the real error has been masked by others) that the window size is too small. I'm attaching the patch to switch back to non-interactive behaviour but go on when this happens so if pygrub will meet some other error during the process you should be able to see it with this patch applied.

Michal

Comment 12 Michal Novotny 2011-09-14 13:02:59 UTC
(In reply to comment #11)
> Created attachment 523129 [details]
> Fix pygrub when window is too small for curses
> 
> (In reply to comment #0)
> > Description of problem:
> > Sometimes pygrub dies with a traceback instead of starting the VM.
> > 
> > Version-Release number of selected component (if applicable):
> > xen-3.0.3-120.el5_6.3 ( 5.6 / updates ) 
> > xen-3.0.3-132.el5 ( 5.7 / base )
> > 
> > How reproducible:
> > sometimes
> > 
> > Steps to Reproduce:
> > 1. install CentOS 5.7 with xen packages and kernel-xen
> > 2. user virt-manager to create a VM; finish the installation normally
> > 3. attempt to start the newly created VM
> > 
> > Actual results:
> > [root@pc68 ~]# xm create -c test1
> > 
> > Using config file "/etc/xen/test1".
> > 
> > Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
> > Traceback (most recent call last):                            
> >   File "/usr/bin/pygrub", line 691, in ?    
> >     chosencfg = run_grub(file, entry, fs)   
> >   File "/usr/bin/pygrub", line 547, in run_grub
> >     curses.wrapper(run_main)
> >   File "/usr/lib64/python2.4/curses/wrapper.py", line 44, in wrapper
> >     return func(stdscr, *args, **kwds)
> >   File "/usr/bin/pygrub", line 543, in run_main
> >     sel = g.run()                                  
> >   File "/usr/bin/pygrub", line 418, in run
> >    self.run_main(timeout)                                     
> >   File "/usr/bin/pygrub", line 451, in run_main
> >    self.screen.addstr(20, 5, "Will boot selected entry in %2d seconds"
> > _curses.error: addstr() returned ERR
> > 
> > No handlers could be found for logger "xend"
> > 
> 
> 
> It's possible that the the actual error is being masked by this error (which
> already happened with one or more bugs and it take a longer time to see the
> actual cause because the real error has been masked by others) that the window
> size is too small. I'm attaching the patch to switch back to non-interactive
> behaviour but go on when this happens so if pygrub will meet some other error
> during the process you should be able to see it with this patch applied.
> 
> Michal

The test RPMs are present at: http://people.redhat.com/minovotn/xen/new

Michal

Comment 13 manuel wolfshant 2011-09-14 13:18:58 UTC
Michal, is http://wdl.lug.ro/Image1074.jpg the intended output of the test rpms ?

[root@pc69 ~]# rpm -qa xen\* kernel\*
kernel-2.6.18-274.el5
kernel-xen-2.6.18-274.3.1.el5
xen-libs-3.0.3-134.el5virttest0.g57d6661
kernel-xen-2.6.18-274.el5
xen-3.0.3-134.el5virttest0.g57d6661

Comment 14 manuel wolfshant 2011-09-14 13:20:19 UTC
PS: yes, I know rpms should not be there :)

Comment 15 manuel wolfshant 2011-09-14 13:32:03 UTC
On a more serious note: I am performing tests with xen-3.0.3-134.el5virttest0.g57d6661. Thank you for making the packages available.

Comment 16 manuel wolfshant 2011-09-14 14:05:32 UTC
The new version of pygrub boots fine. When the window is very small, it proceeds with booting the default selected kernel. Otherwise it beheves as usual.


Unfortunately I have not managed yet to reproduce the bug which originated this bug report. What I am 100% sure of is that in all of the occasions when pygrub failed there was enough free space ( 40+ GB on pc68, > 3 GB on the xenhNN servers ). I'll keep trying to create a reproducer.

Comment 17 Michal Novotny 2011-09-19 07:55:42 UTC
(In reply to comment #16)
> The new version of pygrub boots fine. When the window is very small, it
> proceeds with booting the default selected kernel. Otherwise it beheves as
> usual.
> 
> 
> Unfortunately I have not managed yet to reproduce the bug which originated this
> bug report. What I am 100% sure of is that in all of the occasions when pygrub
> failed there was enough free space ( 40+ GB on pc68, > 3 GB on the xenhNN
> servers ). I'll keep trying to create a reproducer.

Right Manuel. The thing is that the xen-libs-3.0.3-134.el5virttest0.g57d6661 is not a real patch rather than the triage to help the issue be unmasked to see what's really going on. Please keep trying using my package until you meet the exception/error and please post related details to this bugzilla once you got them.

Thanks,
Michal

Comment 18 Miroslav Rezanina 2011-10-26 12:24:49 UTC
Hi Manuel,
any progress with reproducing the problem?

Comment 19 manuel wolfshant 2011-10-26 12:39:50 UTC
The short answer is no.

Unfortunately I could not test on any of the servers where the problem was initially spotted. All of them are production servers and I can not "play" with them at will. I'd love to reboot at least one of them but this was not an option so far.

Comment 20 Miroslav Rezanina 2012-03-19 06:32:56 UTC
Closing as no updates on issue and we are not able to proceed without reproducer. In case new data are available feel free to reopen bug with attached data.

Comment 21 manuel wolfshant 2012-03-19 07:02:15 UTC
Makes sense. I'll reopen the bug if I manage to decommission one of the affected servers and I find a reproducer using CentOS 5.8


Thank you for attention so far.


Note You need to log in before you can comment on or make changes to this bug.