Bug 1498169

Summary: grubby fatal error: unable to find a suitable template -- with a reproducer
Product: Red Hat Enterprise Linux 7 Reporter: R P Herrold <herrold>
Component: grubby Assignee: Peter Jones <pjones>
Status: CLOSED WONTFIX QA Contact: Release Test Team <release-test-team-automation>
Severity: high Docs Contact:
Priority: high    
Version: 7.4 CC: adm.fkt.physik, chayang, herrold, juzhang, pjones, qzhang, release-test-team-automation, rharwood, sluo, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1034591 Environment:
Last Closed: 2021-01-15 07:43:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1034591    
Bug Blocks:    
Attachments:
Description Flags
from the noise producing update
none
from the noise producing update
none
from the noise producing update
none
from the noise producing update
none
from the noise producing update pre update
none
second instance README-20171109 manifest of saved matter for this report part
none
second instance yum.log-PRE
none
second instance yum.log-POST
none
second instance grubby-PRE
none
second instance grubby-POST
none
second instance grubby_prune_debug-PRE
none
second instance grubby_prune_debug-POST
none

Description R P Herrold 2017-10-03 15:45:21 UTC
Created attachment 1333788 [details]
from the noise producing update

+++ This bug was initially created as a clone of Bug #1034591 +++

--- Additional comment from Peter Jones on 2014-01-10 10:48:21 EST ---

Alright, well if you can't reproduce it any more, and since all these log entries seem to be for successful usage, I'm going to close this.  Please re-open with fresh logs if you encounter the problem again.

===============================

You mentioned re-opening when a reproducible case was 'in hand'.  I now have a reproducible case, and if you wish a copy of the VM provoking it, I can get you one without charge (the pre-update image), so you may deploy it, test a fix, and re-start from the image if the fix fails, etc.

[root@billing log]# pwd
/var/log
[root@billing log]# ls *log
boot.log  lastlog  maillog  tallylog  yum.log
[root@billing log]# cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core) 
[root@billing log]# 

early on, while running a long-overdue:
    yum clean all ; yum update

   ...
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : libgcc-4.8.5-16.el7.x86_64                                                                                                               1/436 
  Installing : 1:grub2-common-2.02-0.64.el7.centos.noarch                                                                                               2/436 
  Updating   : centos-release-7-4.1708.el7.centos.x86_64                                                                                                3/436 
...                                                                                                     
  171/436 
  Updating   : kbd-1.15.5-13.el7.x86_64                                                                                                               172/436 
  Installing : kernel-3.10.0-693.2.2.el7.x86_64                                                                                                       173/436 
grubby fatal error: unable to find a suitable template
  Updating   : tuned-2.8.0-5.el7.noarch                                                                                                               174/436 
...                                                                                                179/436 
  Updating   : fail2ban-0.9.7-1.el7.noarch                                                                                                            180/436 
  Updating   : ntp-4.2.6p5-25.el7.centos.2.x86_64                                                                                                     181/436 
  Installing : 1:grub2-2.02-0.64.el7.centos.x86_64                                                                                                    182/436 
  Updating   : sendmail-cf-8.14.7-5.el7.noarch                                                                                                        183/436 
  Updating   : 1:NetworkManager-bluetooth-1.8.0-9.el7.x86_64
...                                                                                          190/436 
  Updating   : audit-2.7.6-3.el7.x86_64                                                                                                               191/436 
  Installing : 1:grub2-tools-efi-2.02-0.64.el7.centos.x86_64                                                                                          192/436 
  Updating   : dracut-config-rescue-033-502.el7.x86_64                                                                                                193/436 
  Updating   : net-tools-2.0-0.22.20131004git.el7.x86_64      

and no similar error noise from there through to EOJ (end of job)

Replaced:
  NetworkManager.x86_64 1:1.4.0-17.el7_3 grub2.x86_64 1:2.02-0.44.el7.centos grub2-tools.x86_64 1:2.02-0.44.el7.centos pygobject3-base.x86_64 0:3.14.0-3.el7

Complete!
[root@billing log]# 

-----------

[root@billing log]# wc -l grubby*
  572 grubby
    4 grubby_prune_debug
  576 total
[root@billing log]# 
grep: mariadb: Is a directory
messages:Oct  3 11:11:04 billing yum[9883]: Installed: 1:grub2-common-2.02-0.64.el7.centos.noarch
messages:Oct  3 11:11:07 billing yum[9883]: Installed: 1:grub2-pc-modules-2.02-0.64.el7.centos.noarch
messages:Oct  3 11:13:42 billing yum[9883]: Updated: grubby-8.28-23.el7.x86_64
messages:Oct  3 11:15:28 billing yum[9883]: Installed: 1:grub2-tools-minimal-2.02-0.64.el7.centos.x86_64
messages:Oct  3 11:15:43 billing yum[9883]: Installed: 1:grub2-tools-extra-2.02-0.64.el7.centos.x86_64
messages:Oct  3 11:15:52 billing yum[9883]: Installed: 1:grub2-tools-2.02-0.64.el7.centos.x86_64
messages:Oct  3 11:15:54 billing yum[9883]: Installed: 1:grub2-pc-2.02-0.64.el7.centos.x86_64
messages:Oct  3 11:20:58 billing yum[9883]: Installed: 1:grub2-2.02-0.64.el7.centos.x86_64
messages:Oct  3 11:21:09 billing yum[9883]: Installed: 1:grub2-tools-efi-2.02-0.64.el7.centos.x86_64
grep: ntpstats: Is a directory


Your sosreport has been generated and saved in:
  /var/tmp/sosreport-rherrold.new-bug-20171003114040.tar.xz

The checksum is: 023ac404ddd79990b2857a725036de69

Comment 2 R P Herrold 2017-10-03 15:48:45 UTC
Created attachment 1333789 [details]
from the noise producing update

Comment 3 R P Herrold 2017-10-03 15:49:35 UTC
Created attachment 1333790 [details]
from the noise producing update

Comment 4 R P Herrold 2017-10-03 15:50:02 UTC
Created attachment 1333791 [details]
from the noise producing update

Comment 5 R P Herrold 2017-10-03 15:50:32 UTC
Created attachment 1333792 [details]
from the noise producing update pre update

Comment 6 R P Herrold 2017-10-03 15:51:46 UTC
the SOS report is rather fat and may contain credentials so I do not add it to a public bug

Comment 7 R P Herrold 2017-10-03 15:52:19 UTC
oops

did not attach the size indication

[root@billing log]# ls -alh /var/tmp/sosreport-rherrold.new-bug-20171003114040.tar.xz
-rw-------. 1 root root 11M Oct  3 11:42 /var/tmp/sosreport-rherrold.new-bug-20171003114040.tar.xz

Comment 8 R P Herrold 2017-10-03 15:57:34 UTC
I have retained a local copy, so that it does not get aged away, at:

[herrold@centos-7 1498169]$ pwd
/home/herrold/grubby/1498169
[herrold@centos-7 1498169]$ ls -al
total 10532
drwxrwxr-x. 2 herrold herrold     4096 Oct  3 11:53 .
drwxrwxr-x. 3 herrold herrold     4096 Oct  3 11:53 ..
-rw-r--r--. 1 herrold herrold    32376 Oct  3 11:48 grubby
-rw-r--r--. 1 herrold herrold      296 Oct  3 11:37 grubby_prune_debug
-rw-r--r--. 1 herrold herrold   127502 Oct  3 11:44 messages
-rw-------. 1 herrold herrold 10551828 Oct  3 11:53 sosreport-rherrold.new-bug-20171003114040.tar.xz
-rw-r--r--. 1 herrold herrold    25947 Oct  3 11:38 yum.log
-rw-r--r--. 1 herrold herrold    12843 Oct  3 11:37 yum.log-20151117
[herrold@centos-7 1498169]$ 


the underlying VM is:

CentOS 7 x86_64
Deployed: 2015-11-16
768 MB RAM
12 GB HD
128 kbps BW

Comment 9 R P Herrold 2017-10-03 16:05:23 UTC
internal identifying information

VM Owner: herrold@col
VM Name: vm_36023
Dom0: kvm-n026
VM Arch: x86_64
Virt. Type: kvm
Monthly VM cost: 5200

Comment 10 R P Herrold 2017-10-03 16:06:26 UTC
Please ask if you need any further information, and I can provide it as well

When the next kernel update comes out, I will 'watch' it and report any new stderr matter as well

Comment 11 R P Herrold 2017-10-03 16:07:51 UTC
snapping a post update backup

Oct  3 12:07:09 secure pmmanLog[8896]: pmmanLog ( | _event_id: 10 | _owner_id: 14 | _vm_id: 772 | _message: [A] VM system quiescent backup (unnamed) has been ordered | _admin: -NULL- | _thread_id:

Comment 12 R P Herrold 2017-11-09 16:50:38 UTC
the problem occurred on another unit, also without a separate /boot (by design in the VM model we are using)

our identifier is: vm_19276, and I have saved the items listed in the README (which I will attach in a moment) so that they do not get aged away

This time I updated grub* and kernel-tools* before letting the kernel upgrade run, so the file snapshots from /var/log are labeled PRE and POST

Comment 13 R P Herrold 2017-11-09 16:52:58 UTC
Created attachment 1350024 [details]
second instance README-20171109 manifest of saved matter for this report part

Comment 14 R P Herrold 2017-11-09 16:54:47 UTC
Created attachment 1350025 [details]
second instance yum.log-PRE

Comment 15 R P Herrold 2017-11-09 16:55:54 UTC
Created attachment 1350026 [details]
second instance yum.log-POST

Comment 16 R P Herrold 2017-11-09 16:56:31 UTC
Created attachment 1350028 [details]
second instance grubby-PRE

Comment 17 R P Herrold 2017-11-09 16:57:09 UTC
Created attachment 1350029 [details]
second instance grubby-POST

Comment 18 R P Herrold 2017-11-09 16:57:48 UTC
Created attachment 1350030 [details]
second instance grubby_prune_debug-PRE

Comment 19 R P Herrold 2017-11-09 16:58:23 UTC
Created attachment 1350031 [details]
second instance grubby_prune_debug-POST

Comment 20 R P Herrold 2017-11-09 16:59:26 UTC
if there is any way to 'dial up' logging, please let me know and I will use it.

Comment 21 R P Herrold 2017-11-09 17:01:59 UTC
the problem is of course that the newly installed kernel is not being pointed to -- this after a SELinux relabel and reboot

[root@nagios log]# rpm -qa kernel\* ; uname -a
kernel-tools-libs-3.10.0-693.5.2.el7.x86_64
kernel-3.10.0-693.2.2.el7.x86_64
kernel-3.10.0-514.26.2.el7.x86_64
kernel-tools-3.10.0-693.5.2.el7.x86_64
kernel-3.10.0-693.5.2.el7.x86_64
kernel-3.10.0-123.el7.x86_64
kernel-3.10.0-514.10.2.el7.x86_64
Linux nagios.pmman.net 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@nagios log]# date
Thu Nov  9 12:00:53 EST 2017
[root@nagios log]# w
 12:01:37 up 57 min,  1 user,  load average: 0.17, 0.07, 0.10
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    router.owlriver. 11:31    1.00s  0.10s  0.02s w
[root@nagios log]#
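
A minimal sketch of the comparison shown in the session above: pick the highest installed kernel package version and compare it with the running one. The function names `newest_kernel` and `kernel_is_stale` are hypothetical, and `sort -V` (GNU coreutils) only approximates RPM version ordering, so treat the result as a hint rather than a verdict.

```shell
# Hypothetical helper: read package names (as printed by `rpm -qa kernel\*`)
# on stdin and print the highest plain "kernel-<version>" version string.
# kernel-tools and similar subpackages are skipped, because the sed pattern
# requires a digit immediately after "kernel-".
newest_kernel() {
    sed -n 's/^kernel-\([0-9].*\)$/\1/p' | sort -V | tail -n 1
}

# Hypothetical helper: succeed (exit 0) when the running kernel release ($1)
# differs from the newest installed one ($2), i.e. the update did not take.
kernel_is_stale() {
    [ "$1" != "$2" ]
}
```

On an affected unit this could be used as:

    newest=$(rpm -qa kernel\* | newest_kernel)
    kernel_is_stale "$(uname -r)" "$newest" && echo "running $(uname -r), newest installed is $newest"

With the package list above, the newest installed version is 3.10.0-693.5.2.el7.x86_64 while the running kernel is 3.10.0-123.el7.x86_64.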

Comment 22 R P Herrold 2017-11-20 19:20:56 UTC
two more 'aid to memory' links

https://bugzilla.redhat.com/show_bug.cgi?id=864198

https://github.com/rhboot/grubby/issues/22

The "'there is just one partition' and so looking for a /boot mount" approach is recurring more often in new approaches to partition layout.  Is there anything I may do to help here?

Comment 23 R P Herrold 2018-01-04 20:50:56 UTC
With the kernel side-channel cache leakage exploits, there are now kernel packages in my 'to be installed' queue.

Is there anything I can do to provide more information to help get this fixed, when running these changes?

Comment 24 R P Herrold 2018-01-30 19:41:28 UTC
Unlike bug #1177843, this CentOS 7 unit is already at the offered grubby-8.28-23 level

[root@nagios ~]# rpm -q grubby
grubby-8.28-23.el7.x86_64

so, proceeding through the tests at comment #27 in that prior bug

a. - no kernel line

MAYBE -- see the bottom of this update: 

[herrold@centos-7 sysconfig]$ grep kernel grub 
GRUB_CMDLINE_LINUX="vconsole.keymap=us crashkernel=auto  vconsole.font=latarcyrheb-sun16 rhgb quiet rootflags=nouuid"
[herrold@centos-7 sysconfig]$ cd ../../boot/grub2/
[herrold@centos-7 grub2]$ grep kernel *
grub.cfg:       linux16 /boot/vmlinuz-3.10.0-327.10.1.el7.x86_64 root=/dev/vda1 ro vconsole.keymap=us crashkernel=auto  vconsole.font=latarcyrheb-sun16 rhgb quiet rootflags=nouuid
grub.cfg:       linux16 /boot/vmlinuz-3.10.0-123.13.2.el7.x86_64 root=/dev/vda1 ro vconsole.keymap=us crashkernel=auto  vconsole.font=latarcyrheb-sun16 rhgb quiet rootflags=nouuid
grub.cfg:       linux16 /boot/vmlinuz-0-rescue-16290318d1874648e91e93d3be661d78 root=/dev/vda1 ro vconsole.keymap=us crashkernel=auto  vconsole.font=latarcyrheb-sun16 rhgb quiet rootflags=nouuid
[herrold@centos-7 grub2]$ 

b. - access(kernel_path, R_OK) fails 

no -- the permissions are 'stock'


c.  - no root= line and root= isn't on the kernel arguments line

nope

[herrold@centos-7 grub2]$ grep "root=" *
grub.cfg:       set root='hd0,msdos1'
grub.cfg:       linux16 /boot/vmlinuz-3.10.0-327.10.1.el7.x86_64 root=/dev/vda1 ro vconsole.keymap=us crashkernel=auto  vconsole.font=latarcyrheb-sun16 rhgb quiet rootflags=nouuid


d.  - root= can't be resolved to a device

no, as above

perhaps as an aside, 'sosreport' did not save a copy of /etc/mtab
  I will file this separately

also, when unpacking the sosreport tarball as a non-root user, it failed on the mknod for ./dev.null ... out of scope here, however

-------------

e. - we couldn't parse /etc/mtab

I know there was a normally permissioned /etc/mtab there, but as noted, 'sosreport' did not save a copy, or the copy is later in the sosreport tarball, so it was not reached ... will test by unpacking as root, in a moment

actually I see the mtab detail as to the 'root' device in another part of the sosreport

sos_commands/filesys/df_-al

Filesystem     1K-blocks    Used Available Use% Mounted on
rootfs                 -       -         -    - /
sysfs                  0       0         0    - /sys
proc                   0       0         0    - /proc
devtmpfs          369956       0    369956   0% /dev
securityfs             0       0         0    - /sys/kernel/security
tmpfs             379424       0    379424   0% /dev/shm
devpts                 0       0         0    - /dev/pts
tmpfs             379424   38228    341196  11% /run
tmpfs             379424       0    379424   0% /sys/fs/cgroup
 ... snip a bunch of cgroup stuff
configfs               0       0         0    - /sys/kernel/config
/dev/vda1       12571648 3076084   9495564  25% /
selinuxfs              0       0         0    - /sys/fs/selinux

f. -- UUID matters

no UUID on this unit
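
Checks a. through c. above can be sketched as a small script run against grub.cfg. This is only an illustration of the checklist worked through in this comment, not grubby's actual template selection code. The function name `check_grub_entries` is hypothetical, and it assumes the kernel paths in grub.cfg are also valid filesystem paths, which holds on units without a separate /boot such as the ones in this report.

```shell
# Hypothetical helper: for every linux/linux16/linuxefi line in the given
# grub.cfg, print the kernel path followed by "ok" or a list of problems.
check_grub_entries() {
    cfg=$1

    # a. is there any kernel line at all?
    lines=$(grep -E '^[[:space:]]*linux(16|efi)?[[:space:]]' "$cfg")
    if [ -z "$lines" ]; then
        echo "no kernel lines found"
        return 1
    fi

    printf '%s\n' "$lines" | while read -r kw kernel args; do
        problems=""
        # b. is the kernel image readable (cf. access(kernel_path, R_OK))?
        [ -r "$kernel" ] || problems="unreadable"
        # c. is root= present on the kernel arguments line?
        #    (rootflags= does not count, hence the "=" directly after root)
        case " $args " in
            *" root="*) ;;
            *) problems="${problems:+$problems,}no-root" ;;
        esac
        printf '%s %s\n' "$kernel" "${problems:-ok}"
    done
}
```

For example, `check_grub_entries /boot/grub2/grub.cfg` prints one line per kernel entry. Items d. through f. (resolving root= to a device, parsing /etc/mtab, UUID handling) are not covered by this sketch.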


As always, if there is other information I can provide, please let me know

I have amended my checklist to manually make copies, pre and post of:

    /etc/mtab
    /etc/sysconfig/grub*

    show the accessibility of the kernel

and show the pre and post (here, post a reboot) of:

    grubby --info=` grubby --default-kernel `

which should show all this more concisely 
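
The amended checklist could be captured by a small script along these lines. It is a sketch only: the function name `snapshot_boot_state`, the default directory, and the output file names are hypothetical, while `grubby --default-kernel` and `grubby --info` are the same invocations used in this comment.

```shell
# Hypothetical helper: save the checklist items under <dir>/<label>.
#   $1  label, e.g. PRE or POST
#   $2  base directory (assumed default: /var/tmp/grubby-snapshots)
snapshot_boot_state() {
    label=$1
    dest=${2:-/var/tmp/grubby-snapshots}/$label
    mkdir -p "$dest" || return 1

    # Copies of the files named in the checklist; missing files are skipped.
    # -L follows symlinks, since /etc/mtab is commonly a link into /proc.
    for f in /etc/mtab /etc/sysconfig/grub*; do
        [ ! -e "$f" ] || cp -pL "$f" "$dest/" || true
    done

    # Accessibility of the installed kernel images.
    ls -l /boot/vmlinuz-* > "$dest/kernel-access.txt" 2>&1 || true

    # What grubby reports for the default entry. An empty default is recorded
    # explicitly instead of being passed on to --info.
    {
        if command -v grubby >/dev/null 2>&1; then
            default=$(grubby --default-kernel 2>/dev/null) || true
            echo "default kernel: ${default:-<none reported>}"
            [ -z "$default" ] || grubby --info="$default" || true
        else
            echo "grubby not installed"
        fi
    } > "$dest/grubby-info.txt" 2>&1
    return 0
}
```

Run `snapshot_boot_state PRE` before the update and `snapshot_boot_state POST` after it (here, after a reboot), then compare the two directories with `diff -r`.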



INTERESTINGLY, on the unit initially evidencing this, running that command, I get this:

[root@nagios ~]# grubby --info=` grubby --default-kernel `
grubby: kernel not found
[root@nagios ~]# grubby --default-kernel
[root@nagios ~]# rpm -qa kernel
kernel-3.10.0-693.2.2.el7.x86_64
kernel-3.10.0-514.26.2.el7.x86_64
kernel-3.10.0-693.5.2.el7.x86_64
kernel-3.10.0-123.el7.x86_64
kernel-3.10.0-514.10.2.el7.x86_64
[root@nagios ~]# 

[root@nagios ~]# rpm -V kernel
.......T.    /lib/modules/3.10.0-123.el7.x86_64/modules.devname
.......T.    /lib/modules/3.10.0-123.el7.x86_64/modules.softdep
[root@nagios ~]# 

so somewhere along the way, your bullet item 1 seems to have become unresolvable to 'grubby', and the updates stopped being applied ... see that the running kernel is the oldest installed kernel

[root@nagios proc]# cat cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-123.el7.x86_64 root=/dev/vda1 ro vconsole.keymap=us crashkernel=auto vconsole.font=latarcyrheb-sun16 rhgb quiet rootflags=nouuid
[root@nagios proc]# uname -a
Linux nagios.pmman.net 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@nagios proc]#

Comment 25 R P Herrold 2018-02-01 18:53:43 UTC
I pulled the 7.5 beta SRPMs -- do you wish me to build, and then update grubby, and possibly more, before re-testing?

Comment 26 R P Herrold 2018-03-22 20:23:42 UTC
issue observed again today

Comment 27 R P Herrold 2018-06-18 22:51:44 UTC
is this patch suitable for RHEL 7 space?

https://src.fedoraproject.org/rpms/grub2/c/db7cf3a089075af0f4a8b955af508aea3893465a

this is from the mailing list thread:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/UGUSZIBOLV4XUZPKZ3ZTYZ2HJO36KPES/

Comment 29 RHEL Program Management 2021-01-15 07:43:03 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.