Description of problem:
We were trying to reproduce an issue in nfs-ganesha where the nfs-ganesha daemon segfaults and should dump core, but we could not find the core anywhere. Once we disabled vdsmd on the RHGS node (based on RHEL 7.1), we were able to get the coredump.
Hence vdsmd somehow prevents the coredump from being generated.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. create a volume of 6x2 type
2. configure nfs-ganesha
3. on a RHEL client, execute cthon lock test using the command,
time ./server -l -o vers=3 -p /<volname> -m /mnt -N 1 <server-name/IP-address>
4. check if the core is generated in /var/log/core
5. disable vdsmd using command "systemctl disable vdsmd" and reboot the node
6. again repeat step 3.
7. check if core generated in /var/spool/abrt
Actual results:
The cthon test fails and nfs-ganesha segfaults, so a core should be generated, but there is no core in /var/log/core.
A core is found in /var/spool/abrt after disabling vdsmd, rebooting, and repeating step 3.
Expected results:
vdsm should not be a deterrent for other processes to dump core.
Additional info:
To get the core dump, I had also tried updating the file,
with the variable "DefaultLimitCORE=infinity",
and then executing "systemctl daemon-reexec".
This workaround was tried out without disabling vdsmd.
As per Dan, we need to set core_dump_enable=false in vdsm.conf
Testing is even easier than the steps listed in comment #0. It is sufficient to do this (no configuration needed):
# systemctl start nfs-ganesha
# killall -s SIGSEGV ganesha.nfsd
The coredump should get mentioned in /var/log/messages. When abrtd captures the core, there will be additional messages from abrtd.
nfs-ganesha is just an example; I expect that segfaults of any other daemon are affected too.
I have set core_dump_enable=false in vdsm.conf and restarted the vdsmd and supervdsmd services, but I still could not get a core dump when something goes wrong.
I am also unable to get a core dump even after I disable vdsm by following "Steps to Reproduce".
It works only after I run "ulimit -c unlimited".
[root@dhcp43-8 test]# ./seg # (Seg is a test program to produce coredump)
[root@dhcp43-8 test]# ulimit -c
[root@dhcp43-8 test]# ulimit -c unlimited
[root@dhcp43-8 test]# ./seg
Segmentation fault (core dumped)
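Note that "ulimit -c" only reports the limit of the current shell; a daemon started by systemd carries its own limit, which can be read from /proc/<pid>/limits. A minimal sketch for checking that, assuming the usual layout of that file (core_soft_limit is a made-up helper name, not part of any package):

```shell
# Print the soft "Max core file size" limit from a /proc/<pid>/limits style file.
# core_soft_limit is a hypothetical helper used only for illustration.
core_soft_limit() {
    # Expected line: "Max core file size   <soft>   <hard>   bytes"
    # Fields 1-4 are the label, field 5 is the soft limit.
    awk '/^Max core file size/ { print $5 }' "$1"
}
```

For example, "core_soft_limit /proc/$$/limits" shows the limit of the current shell, and "core_soft_limit /proc/$(pidof ganesha.nfsd)/limits" shows the limit of the running daemon. A value of 0 would explain a missing core file when the kernel writes cores directly to disk.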
When you change the vdsm settings, could you restart the abrtd-ccpp service (or reboot)? That service reconfigures the kernel.core_pattern and enables capturing of userspace coredumps by abrt.
Yes, it works only after I restart the abrtd-ccpp service. What about a node that does not have this abrtd-ccpp service?
[root@dhcp43-8 test]# systemdctl restart abrtd-ccpp.service
-bash: systemdctl: command not found
[root@dhcp43-8 test]# systemctl restart abrtd-ccpp.service
Failed to issue method call: Unit abrtd-ccpp.service failed to load: No such file or directory.
[root@dhcp43-8 test]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.1 (Maipo)
(In reply to Niels de Vos from comment #4)
> Testing is even easier than the steps listed in comment #0. It is sufficient
> to do this (no configuration needed):
> # systemctl start nfs-ganesha
> # killall -s SIGSEGV ganesha.nfsd
> The coredump should get mentioned in /var/log/messages. When abrtd captures
> the core, there will be additional messages from abrtd.
> nfs-ganesha is just an example; I expect that segfaults of any other daemon
> are affected too.
FYI, I tested this with glusterd (killall -s SIGSEGV glusterd), and a core dump is generated with the vdsm service enabled. (The abrtd service was not running.)
I did some more testing, and I'm not sure the reported issue has to do with vdsm. (Testing was done with upstream bits.)
On a CentOS 7 installation (no vdsm installed), I installed nfs-ganesha from https://copr.fedoraproject.org/coprs/devos/nfs-ganesha/
[root@d ~]# cat /proc/sys/kernel/core_pattern
[root@d ~]# systemctl start nfs-ganesha
[root@d ~]# killall -s SIGSEGV ganesha.nfsd
[root@d ~]# ls -la / | grep core
-- No core dump files
I tried editing /usr/lib/systemd/system/nfs-ganesha.service
-- but no core dump
Only after editing DefaultLimitCORE=infinity in /etc/systemd/system.conf was the core dump generated.
However, for other services like glusterd, a core dump was generated without the above setting.
Niels, do you know more about this?
I'm removing the blocker flag from this bug as it's unclear if it's a vdsm issue.
I'm not sure how you installed your CentOS-7 system. My guess is that abrt is not installed. Once you install abrt (and reboot), nfs-ganesha will generate a core when you kill it with SIGSEGV. You can then use "abrt-cli list" to list the cores that abrt captured.
This only works because the sysctl is configured:
[root@vm018 ~]# sysctl kernel.core_pattern
kernel.core_pattern = |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I
If vdsm overwrites the kernel.core_pattern that abrt configured, the standard RHEL tools will become non-functional.
Could you check that, Sahina?
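One rough way to do that check is to classify the current value of kernel.core_pattern. The sketch below is only an illustration: core_pattern_handler is a made-up name, and the labels "abrt", "pipe" and "file" are my own. It relies on the kernel convention that a pattern starting with "|" is piped to a helper program, while anything else is treated as a file name template.

```shell
# Classify a kernel.core_pattern value.
# core_pattern_handler is a hypothetical helper used only for illustration.
core_pattern_handler() {
    case "$1" in
        "|"*abrt-hook-ccpp*) echo "abrt" ;;  # piped to the abrt hook
        "|"*)                echo "pipe" ;;  # piped to some other helper
        *)                   echo "file" ;;  # written to disk by the kernel
    esac
}
```

Running core_pattern_handler "$(cat /proc/sys/kernel/core_pattern)" before and after starting vdsmd should show whether the abrt hook is still in place.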
What I was missing was the abrt-cli package. After installing it and restarting the abrtd service, I see that kernel.core_pattern has changed; however, there are still no core dumps.
[root@dhcp43-106 ~]# cat /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I
[root@dhcp43-106 ~]# systemctl restart abrt-ccpp
[root@dhcp43-106 ~]# systemctl restart abrtd
[root@dhcp43-106 ~]# systemctl start nfs-ganesha
[root@dhcp43-106 ~]# killall -s SIGSEGV ganesha.nfsd
[root@dhcp43-106 ~]# abrt-cli list
I'm sure I'm missing something!
Btw, mine was a minimal CentOS install.
By default, abrt is configured to only capture cores from binaries that come from GPG-signed packages. The packages from a copr repository are not signed. I think /var/log/messages mentioned in your case that the core was skipped, dropped, or something similar.
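To see how a given machine is configured, the signature check is controlled by the OpenGPGCheck option. A small sketch for reading it follows. The helper name abrt_gpg_check is made up, and the default path is where I believe abrt keeps this option on RHEL/CentOS 7, so treat both as assumptions:

```shell
# Print the value of abrt's OpenGPGCheck option ("yes" or "no").
# abrt_gpg_check is a hypothetical helper; the default path is an assumption
# about where abrt stores this option on RHEL/CentOS 7.
abrt_gpg_check() {
    conf="${1:-/etc/abrt/abrt-action-save-package-data.conf}"
    # Match "OpenGPGCheck = <value>" and skip commented-out lines.
    sed -n 's/^[[:space:]]*OpenGPGCheck[[:space:]]*=[[:space:]]*\([A-Za-z]*\).*/\1/p' "$conf"
}
```

If this prints "yes", cores from unsigned packages such as copr builds are expected to be dropped.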
The easiest is to test with packages from the Storage SIG. These can be installed directly from CentOS repositories and are signed:
# yum install centos-release-gluster
# yum install nfs-ganesha
The first command adds a new .repo file for yum; the second uses the new repository to get nfs-ganesha and its dependencies.
This worked for me, could you try that too?
Yes, it did! That was indeed the issue.
Thanks for clarifying.
Tim, with vdsm in RHGS rebased to 4.19 and the abrt integration available, this issue is probably fixed. Could you check?