RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1411576 - [LLNL 7.4 FEAT] Don't exit if /boot is not writable
Summary: [LLNL 7.4 FEAT] Don't exit if /boot is not writable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kexec-tools
Version: 7.4
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: 7.4
Assignee: Bhupesh Sharma
QA Contact: Qiao Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1384121 1384257 1446211
TreeView+ depends on / blocked
 
Reported: 2017-01-10 01:59 UTC by Ben Woodard
Modified: 2017-09-21 09:07 UTC (History)
6 users (show)

Fixed In Version: kexec-tools-2.0.14-7.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 09:31:21 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2300 0 normal SHIPPED_LIVE kexec-tools bug fix and enhancement update 2017-08-01 12:40:58 UTC

Description Ben Woodard 2017-01-10 01:59:49 UTC
Description of problem:
The logic in systemd's kdumpctl file aborts and exits if /boot is not writable. We believe that this is not necessarily the best approach. 

Not being able to rebuild the kdump kernel doesn't mean that the kdump service cannot run, it simply means that whatever kdump kernel is there is the one that must be used. 

In pseudocode:

if /boot is writable
  if kdump.conf includes force_rebuild
    rebuild_kdump_initrd=1
  else if kdump.conf is newer than kdump.initrd
    rebuild_kdump_initrd=1
  else if <whatever other tests>
    rebuild_kdump_initrd=1
  fi
else # /boot is not writable
  if kdump.conf includes force_rebuild
    exit 1 #this is a legitimate error case
  rebuild_kdump_initrd=0
fi

if rebuild_kdump_initrd == 1
  rebuild kdump.initrd
fi

if kernel-version-kdump.initrd does not exist
  # something either went wrong building the kdump initrd
  # or /boot was not writable and a kdump initrd did not exist for this kernel version
  exit 1 
fi
load_kdump_kernel
<...>

The point is just skip all the logic testing if you need to rebuild the kdump initrd if  /boot is not writable. This allows a system administrator to place a suitable kdump initrd in a boot directory and not have to worry about whatever logic you are using to decide if you need to rebuild the kdump initrd. It is there call, it is there choice for what they put there, it is their responsibility. If it doesn't work, it is their fault and they will fix it. You do not need to worry about that. 

As it is, they must go through and reverse engineer all the tests that you make and carefully make sure to disarm them so that the kdump logic ends up deciding that it doesn't need to make a new kdump initrd.

You may ask, why /boot is not writable? These are diskless systems that netboot. 

This is a continuation of previous case. The changes in the logic that you made from 7.2 to 7.3 did not sufficiently address the issue.

Comment 1 Ben Woodard 2017-01-10 02:02:00 UTC
As a hack to work around this problem we are currently running with the following patch.

--- kexec-tools_a/kdumpctl      2016-11-23 10:30:14.196149000 -0800
+++ kexec-tools_b/kdumpctl      2016-11-23 11:06:24.771011000 -0800
@@ -590,13 +595,13 @@
        fi
 
        if [[ ! -w "$KDUMP_BOOTDIR" ]];then
-               echo "$KDUMP_BOOTDIR does not have write permission. Can not rebuild $TARGET_INITRD"
-               return 1
+               echo "$KDUMP_BOOTDIR does not have write permission. Will not rebuild $TARGET_INITRD"
+               return 0
+       else
+               echo "Rebuilding $TARGET_INITRD"
+               rebuild_initrd
+               return $?
        fi
-
-       echo "Rebuilding $TARGET_INITRD"
-       rebuild_initrd
-       return $?
 }

Comment 2 Pratyush Anand 2017-01-10 11:35:11 UTC
(In reply to Ben Woodard from comment #0)
> Description of problem:
> The logic in systemd's kdumpctl file aborts and exits if /boot is not
> writable. We believe that this is not necessarily the best approach. 
> 
> Not being able to rebuild the kdump kernel doesn't mean that the kdump
> service cannot run, it simply means that whatever kdump kernel is there is
> the one that must be used. 
> 


Following is the current implementation:
--------------------------------------------------------------------------------
        if [ $image_time -eq 0 ]; then
                echo  -n "No kdump initial ramdisk found."; echo
        elif [ $DEFAULT_DUMP_MODE == "fadump" ] && [ "$initramfs_has_fadump" -eq "0" ]; then
                echo "$TARGET_INITRD has no fadump support"
        elif [ "$force_rebuild" != "0" ]; then
                echo -n "Force rebuild $TARGET_INITRD"; echo
        elif [ "$system_modified" != "0" ]; then
                :
        else
                return 0
        fi

        if [[ ! -w "$KDUMP_BOOTDIR" ]];then
                echo "$KDUMP_BOOTDIR does not have write permission. Can not rebuild $TARGET_INITRD"
                return 1
        fi
--------------------------------------------------------------------------------

Code flow reaches to check for writeable /boot directory  only when it seems that a rebuild is needed, ie system was modified. So, kdumpctrl will still be successful if there was no rebuild condition true.
Not sure, if it will be a good idea to return success without attempting rebuild when there was a newer kernel available even for RO /boot. Last time [1] we had discussed to fail the service in such scenario.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1322621#c9

Comment 4 Ben Woodard 2017-01-10 17:16:15 UTC
You are missing the point of what I was saying!
You are deciding that a rebuild is necessary BEFORE looking if you can make a rebuild and store it. 

Look at it this way:
The system administrator (i.e. God for that system) has decided that the kdump initrd that he has placed in the read-only /boot directory is one that will work for that system and is appropriate for the kernel. End of story. 

There is ABSOLUTELY NO REASON!!!! that some dumb kdump service script should EVER think that it knows better than the system administrator and through some primitive logic and come to believe a rebuild is appropriate. So first check that /boot is writable and if it is, THEN you can go through all of your logic to decide if you should rebuild.

If /boot is not writable load the kdump initrd for the booted kernel. If it doesn't work, it is not your problem. It is the system administrator's problem. He put it there and made /boot read-only.

Comment 5 Pratyush Anand 2017-01-17 08:05:03 UTC
(In reply to Ben Woodard from comment #4)
> 
> Look at it this way:
> The system administrator (i.e. God for that system) has decided that the
> kdump initrd that he has placed in the read-only /boot directory is one that
> will work for that system and is appropriate for the kernel. End of story. 
>

So it would be fine to have a kdump.conf option like force_not_rebuild. When this option is enabled we load the already available kdump initrd.

Comment 6 Ben Woodard 2017-01-17 20:14:50 UTC
force_not_rebuild is one way to look at but that would add more fiddly options which adds complexity to the configuration, requires documentation...The approach that I've been advocating seems much more elegant. What is wrong with this logic?

if /boot is writable
  if you need to rebuild
    rebuild kdump initrd
else # you can't rebuild even if you wanted to so don't bother testing
  if force_rebuild
    exit with an error

# at this point we either have
# a) an initrd which didn't need to be rebuilt
# b) an initrd which was just rebuilt
# c) an initrd which was provided by the sysadmin in a RO /boot directory

load the kdump initrd
if load_failed
  exit with an error

Comment 7 Dave Young 2017-01-23 06:47:53 UTC
There could be risk to silently go ahead for a readonly mounted /boot, sometimes a corrupt disk will be mounted as ro automaticlly or for some other reason one do not intend to do this. For those cases failout should be better.

Add a force_rebuild would benefit people who want a perticular initrd and no need  any change. For example for me I do not need kernel modules and I always built in most of functionalities in a bzImage, so when I test a new kernel, rebuild kdump initrd is useless.

Comment 8 Pratyush Anand 2017-02-21 05:59:50 UTC
So as per comment 7,it would be good to go with a kdump.conf option like force_not_rebuild. If there is no more conflict then, I will cook a patch for it.

Comment 9 Ben Woodard 2017-02-22 03:00:07 UTC
That should be OK.

Comment 10 Ben Woodard 2017-02-22 03:04:41 UTC
...As long as you do the logic correct so that it doesn't exit if /boot is RO and force_not_rebuild is set.

This unequivocally must be tested to work on a diskless machine with a read-only root and boot. That is the fundamental requirement.

Comment 16 errata-xmlrpc 2017-08-01 09:31:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2300


Note You need to log in before you can comment on or make changes to this bug.