Bug 1091785 - dracut kdump does not work for nfs shares anymore
Summary: dracut kdump does not work for nfs shares anymore
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 20
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-04-28 03:25 UTC by Oleg Drokin
Modified: 2015-06-29 20:19 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-29 20:19:32 UTC
Type: Bug
Embargoed:


Attachments
failure session (54.07 KB, text/plain)
2014-05-16 17:18 UTC, Oleg Drokin

Description Oleg Drokin 2014-04-28 03:25:26 UTC
A recent change to dracut broke dumping to NFS.
This is dracut-037-11.git20140402.fc20.x86_64

It used to work in February when I last needed this functionality.

The symptom is that the NFS share is not mounted (the mount is not even attempted), and as a result the dump fails.
kdump.conf excerpt:
nfs 192.168.10.1:/exports/crashdumps
path /
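For scripting sanity checks against a config like this, here is a minimal sketch that pulls a single directive's value out of a kdump.conf-style file. The helper name `kdump_directive` is hypothetical; kdump's own scripts parse the file in their own way, and this sketch assumes one directive per line with whitespace-separated values.

```shell
# Print the value of directive $1 from kdump.conf-style file $2.
# Lines whose first word is not $1 (including "#" comments) are ignored;
# only the first match is printed, and nothing is printed if it is absent.
kdump_directive() {
    awk -v key="$1" '
        $1 == key {
            $1 = ""             # drop the directive name
            sub(/^ +/, "")      # trim the separator left behind
            print
            exit
        }
    ' "$2"
}
```

For the excerpt above, `kdump_directive nfs /etc/kdump.conf` would print the NFS target and `kdump_directive path /etc/kdump.conf` would print `/`.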


After some tracing I found this code in /usr/lib/dracut/modules.d/95fstab-sys/mount-sys.sh:

# systemd will mount and run fsck from /etc/fstab and we don't want to
# run into a race condition.
if [ -z "$DRACUT_SYSTEMD" ]; then
    [ -f /etc/fstab ] && fstab_mount /etc/fstab
fi

If I make the condition always true (e.g. by pointing it at a variable that certainly does not exist), the mount happens and the crash dump then works.

So either systemd is not actually attempting any mounts now, or the reduced kdump initrd does not ship this functionality.
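To make the effect of that gate explicit, here is a self-contained sketch of the same logic. It assumes `fstab_mount` is replaced by a stub that only reports what would happen (the real helper lives in dracut); the `else` branch and the function name `maybe_mount_fstab` are illustrative additions, not part of mount-sys.sh.

```shell
# Stub standing in for dracut's real fstab_mount helper: it only reports
# what would happen instead of mounting anything.
fstab_mount() {
    echo "dracut mounts entries from $1"
}

# Same gate as the mount-sys.sh excerpt, with the fstab path passed in
# and an illustrative else branch to show the other outcome.
maybe_mount_fstab() {
    if [ -z "$DRACUT_SYSTEMD" ]; then
        [ -f "$1" ] && fstab_mount "$1"
    else
        echo "skipped: left to systemd-fstab-generator"
    fi
}
```

With DRACUT_SYSTEMD set, as in a systemd-based initramfs, this script mounts nothing, so the NFS mount depends entirely on systemd; the workaround described above amounts to always taking the first branch.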

Comment 1 Vivek Goyal 2014-05-16 13:29:34 UTC
This change was introduced recently. The idea was that systemd would mount the NFS share and run fsck, after which the kdump script would save the dump.

So it looks like something is going wrong and systemd is not mounting NFS. Either /etc/fstab is not being generated properly, the corresponding .mount unit file is not being generated properly, or something else is wrong. I am not sure.

Comment 2 Vivek Goyal 2014-05-16 13:31:44 UTC
commit e920bfb1e8a5917e7b0f360d1c51d200db3acbfd
Author: WANG Chao <chaowang>
Date:   Tue Apr 1 15:20:49 2014 +0800

    fstab: do not mount and fsck from fstab if using systemd
    
    If using systemd in initramfs, we could run into a race condition when
    dracut and systemd both are trying to mount and run fsck for the same
    filesystem, and mount or fsck could be a failure.
    
    To fix such failure, we should use systemd to mount/fsck from /etc/fstab
    only.
    
    v2: check $DRACUT_SYSTEMD suggested by Alexander Tsoy
    
    Signed-off-by: WANG Chao <chaowang>

Comment 3 Vivek Goyal 2014-05-16 13:32:11 UTC
Chao, would you have any idea what's going on? I think you had tested the patch and it worked for you?

Comment 4 Oleg Drokin 2014-05-16 16:36:29 UTC
Additional bit of info:
I have both NFSROOT and crashdump to NFS.
The NFS root mounts fine, but the crashdump share does not.

fstab appears to be generated correctly; a manual mount of /var/crash works (once you mkdir /var/crash, which does not exist).

Comment 5 Vivek Goyal 2014-05-16 17:01:54 UTC
Oleg,

Would you mind pasting the console logs? I am trying to figure out exactly where it failed.

Also, what version of kexec-tools are you using? Can you give the latest kexec-tools version (release 2.0.6-5) a try?

Comment 6 Oleg Drokin 2014-05-16 17:18:55 UTC
Created attachment 896484
failure session

Hm, there was nothing specific on the console.

The kdump image would boot, sit idle (apparently waiting for the /var/crash mount to appear), and then complain that it did not appear.

I also checked rdsosreport.txt, and it does not contain anything interesting either (see attached; this is from before I found the issue).
This is with 2.0.4-27.fc20, which is the latest Fedora 20 kexec-tools package available.
I'll try to see whether 2.0.6-5 will install.

Comment 7 Vivek Goyal 2014-05-19 12:47:05 UTC
I saw the following error message in the logs.


systemd-fstab-generator[336]: Checking was requested for "192.168.10.1:/exports/crashdumps", but it is not a device

The systemd fstab generator somehow thinks that the NFS server you mentioned is not a valid device.
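That message is consistent with the sixth fstab field (fs_passno) being non-zero: a non-zero value requests an fsck, which only makes sense for a block device, not for a `host:/export` NFS source. Below is a minimal sketch for spotting such entries, assuming a standard whitespace-separated fstab and treating only `nfs`, `nfs4` and `cifs` as network types; `find_fsck_on_network_fs` is a hypothetical helper name.

```shell
# Print NFS/CIFS fstab entries whose sixth field (fs_passno) is non-zero,
# i.e. entries for which an fsck is requested although the source is a
# network share rather than a block device.
find_fsck_on_network_fs() {
    awk '
        /^[ \t]*#/ || NF < 4 { next }          # skip comments and short lines
        $3 == "nfs" || $3 == "nfs4" || $3 == "cifs" {
            passno = (NF >= 6) ? $6 : 0        # a missing field 6 counts as 0
            if (passno + 0 != 0) print $1, $2, "passno=" passno
        }
    ' "$1"
}
```

Running it against the fstab inside the kdump initramfs (rather than the host's) is what would show whether the generated entry requests a check.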

Comment 8 Vivek Goyal 2014-05-19 12:50:15 UTC
What does your network configuration look like? Are you using a bond device? Can you give some details about it?

Also, can you change "default" to "shell" in /etc/kdump.conf? That will drop you into a bash shell on error instead of rebooting, and you can then look around and check whether the right network interfaces have come up.
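A minimal sketch of that change, assuming the directive is spelled `default` as in the comment above and the file has one directive per line. It edits whatever file is passed in, so it can be tried on a copy first; `set_kdump_default` is a hypothetical name.

```shell
# Set the "default" (failure action) directive in kdump.conf-style file $1
# to $2: rewrite an existing uncommented "default ..." line, or append one.
set_kdump_default() {
    conf=$1
    action=$2
    if grep -q '^default[[:space:]]' "$conf"; then
        # Write to a temporary file and move it back instead of relying on sed -i.
        sed "s/^default[[:space:]].*/default $action/" "$conf" > "$conf.tmp" &&
            mv "$conf.tmp" "$conf"
    else
        printf 'default %s\n' "$action" >> "$conf"
    fi
}
```

For example, `set_kdump_default /etc/kdump.conf shell`. The kdump initramfs has to be regenerated afterwards (for example by restarting the kdump service) before the new failure action takes effect.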

Comment 9 WANG Chao 2014-05-19 14:21:04 UTC
The following two patches are also needed, along with commit e920bfb ("fstab: do not mount and fsck from fstab if using systemd"):

1) [PATCH] dracut-initqueue service runs before remote-fs-pre.target
- http://thread.gmane.org/gmane.linux.kernel.initramfs/3680

2) [PATCH] dracut-pre-pivot pulls in remote-fs.target
- http://thread.gmane.org/gmane.linux.kernel.initramfs/3683

1) is merged in dracut:
commit b31250e
Author: WANG Chao <chaowang>
Date:   Thu Apr 3 15:49:26 2014 +0800

    dracut-initqueue service runs before remote-fs-pre.target

2) has not been merged yet; I'll ping Harald about it.

If both are backported to F20, the problem should be solved.

Thanks
WANG Chao
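As an illustration only, the ordering the two patch subjects describe could be written as systemd drop-ins. This is reconstructed from the subject lines alone, not from the patches at the gmane links; the drop-in file names, the `write_ordering_dropins` helper, and the use of `After=` next to `Wants=` for the second patch are assumptions. In practice such units would have to end up inside the kdump initramfs, not on the host.

```shell
# Illustrative only: write drop-ins expressing the ordering from the two
# patch subjects under the root directory $1 (e.g. a staging tree).
write_ordering_dropins() {
    unitdir="$1/etc/systemd/system"
    mkdir -p "$unitdir/dracut-initqueue.service.d" \
             "$unitdir/dracut-pre-pivot.service.d"
    # Patch 1: the initqueue service is ordered before remote-fs-pre.target,
    # so remote mounts (ordered after that target) wait for it.
    printf '[Unit]\nBefore=remote-fs-pre.target\n' \
        > "$unitdir/dracut-initqueue.service.d/remote-fs-order.conf"
    # Patch 2: pre-pivot pulls in remote-fs.target; After= (an assumption)
    # makes it wait until the remote mounts are done.
    printf '[Unit]\nWants=remote-fs.target\nAfter=remote-fs.target\n' \
        > "$unitdir/dracut-pre-pivot.service.d/remote-fs-order.conf"
}
```

`Wants=` alone only pulls the target into the transaction; it is the ordering directive that would make the pre-pivot hook run after the NFS mount instead of racing with it.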

Comment 11 Harald Hoyer 2014-05-20 08:28:04 UTC
(In reply to Vivek Goyal from comment #7)
> I saw following error message in the logs.
> 
> 
> systemd-fstab-generator[336]: Checking was requested for
> "192.168.10.1:/exports/crashdumps", but it is not a device
> 
> systemd fstab generator somehow thinks that nfs server you mentioned is not
> a valid device.

It probably should have "0 0" at the end of the fstab entry.
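A sketch of the suggested change applied to a copy of an fstab file: fields 5 and 6 (fs_freq and fs_passno) of NFS entries are set to `0 0`, so no fsck is requested, and every other line is passed through unchanged. The helper name `zero_nfs_fsck` is hypothetical, and collapsing the field separators of rewritten lines to single spaces is a side effect of rebuilding them in awk.

```shell
# Print fstab file $1 with fields 5 and 6 (fs_freq, fs_passno) of NFS
# entries forced to "0 0"; all other lines are passed through unchanged.
zero_nfs_fsck() {
    awk '
        !/^[ \t]*#/ && NF >= 4 && ($3 == "nfs" || $3 == "nfs4") {
            print $1, $2, $3, $4, 0, 0
            next
        }
        { print }
    ' "$1"
}
```

Where the fstab is generated rather than hand-written, the same change would have to be made in whatever generates it.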

Comment 12 Oleg Drokin 2014-05-20 17:53:20 UTC
This fstab is autogenerated.
My real fstab has 0 0 at the end of course:
192.168.10.1:/exports/crashdumps /var/crash nfs defaults,vers=3 0 0

Autogenerated fstab looks like this:
# cat etc/fstab
192.168.10.1:/exports/crashdumps /var/crash nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.10.1,mountvers=3,mountport=47701,mountproto=udp,local_lock=none,addr=192.168.10.1,nofail 0 2

Also, answering the earlier question: there's nothing strange about the network configuration. It's a regular network device with an IP from DHCP, and it does come up: when I run mkdir /var/crash ; mount /var/crash after the automatic dump fails (I do have the default target set to shell to try this), the mount works.

BTW, I also wanted to note an unrelated fact: unlike earlier versions (say RHEL6), where you just specify the dump location in kdump.conf and it works, Fedora 20 insists that /var/crash is actually mounted on the node during normal operation; otherwise dump initrd generation fails.
I don't really need /var/crash mounted on my worker nodes, though, as I examine crashdumps on a separate system set up for that purpose (it also happens to be the dump server).

Comment 13 Vivek Goyal 2014-05-20 18:02:49 UTC
(In reply to Oleg Drokin from comment #12)
> 
> BTW, I also wanted to note this unrelated fact: unlike earlier versions (say
> RHEL6) where you just specify dump location in kdump.conf and it works,
> Fedora 20 insists that the /var/crash is actually mounted on the node during
> normal operations, otherwise dump initrd generation fails.
> I don't really need /var/crash mounted on my worker nodes, though, as I
> examine crashdumps on a separate system set up for that purpose (it also
> happens to be the dump server).

Yep, this is a behavior change. Previously we would mount NFS, make sure it could be mounted, and check that a directory could be created at the specified location.

Now we have dropped the code that mounts NFS automatically; instead, we expect the user to have it mounted already.
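The new precondition can be approximated with a check like the following. It is a sketch, not kdump's actual check: it assumes a `/proc/mounts`-style file (default `/proc/mounts`) and compares only the mount-point field, so paths containing escaped characters such as `\040` are not handled; `is_mounted` is a hypothetical name.

```shell
# Succeed if $1 is listed as a mount point (second field) in the
# /proc/mounts-style file $2, which defaults to /proc/mounts.
is_mounted() {
    awk -v target="$1" '
        $2 == target { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

For example, `is_mounted /var/crash || echo "/var/crash is not mounted"`. util-linux's `mountpoint -q` performs a similar test against the live system.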

Comment 14 WANG Chao 2014-05-21 03:30:04 UTC
(In reply to WANG Chao from comment #9)
> The following two patches are also needed, along with commit e920bfb
> ("fstab: do not mount and fsck from fstab if using systemd"):
> 
> 1) [PATCH] dracut-initqueue service runs before remote-fs-pre.target
> - http://thread.gmane.org/gmane.linux.kernel.initramfs/3680
> 
> 2) [PATCH] dracut-pre-pivot pulls in remote-fs.target
> - http://thread.gmane.org/gmane.linux.kernel.initramfs/3683
> 
> 1) is merged in dracut:
> commit b31250e
> Author: WANG Chao <chaowang>
> Date:   Thu Apr 3 15:49:26 2014 +0800
> 
>     dracut-initqueue service runs before remote-fs-pre.target
> 
> 2) has not, I'll ping harald about it.
> 
> If both of those two are backported to F20, the problem should be solved.

Hi, Harald

Do you plan to backport these two commits to F20? With these two, NFS mounts should be ready before the dracut-pre-pivot hook is entered.

Thanks
WANG Chao

Comment 15 Fedora End Of Life 2015-05-29 11:41:38 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 16 Fedora End Of Life 2015-06-29 20:19:32 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

