1259402 – vdsm: vdsmd does not allow core dump for other processes

Summary: vdsm: vdsmd does not allow core dump for other processes

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	vdsm
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Sahina Bose
QA Contact:	RHS-C QE
Docs Contact:
URL:
Whiteboard:
Depends On:	917062
Blocks:	1322672

Reported:	2015-09-02 14:09 UTC by Saurabh
Modified:	2023-09-14 03:04 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	Known Issue
Doc Text:	When vdsmd and abrt are installed alongside each other, vdsmd overwrites the abrt core dump configuration in /proc/sys/kernel/core_pattern. This prevents NFS-Ganesha from generating core dumps. Workaround: disable core dumps in /etc/vdsm/vdsm.conf by setting "core_dump_enable = false", then restart the abrt-ccpp service with "systemctl restart abrt-ccpp".
Clone Of:
Environment:
Last Closed:
Embargoed:
Dependent Products:

Description Saurabh 2015-09-02 14:09:31 UTC

Description of problem:
We were trying to reproduce an issue in nfs-ganesha where the nfs-ganesha daemon segfaults and should dump core, but we could not find the core anywhere. Once we disabled vdsmd on the RHGS node (based on RHEL 7.1), we were able to get the coredump.

Hence vdsmd is somehow preventing the coredump from being generated.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-14.el7rhgs.x86_64
nfs-ganesha-2.2.0-7.el7rhgs.x86_64
vdsm-4.16.20-1.2.el7rhgs.x86_64

How reproducible:
always

Steps to Reproduce:
1. create a volume of 6x2 type
2. configure nfs-ganesha
3. on a RHEL client, execute cthon lock test using the command,
cd cthon04
time ./server -l -o vers=3 -p /<volname> -m /mnt -N 1 <server-name/IP-address>
4. check if the core is generated in /var/log/core
5. disable vdsmd using command "systemctl disable vdsmd" and reboot the node
6. again repeat step 3.
7. check if the core is generated in /var/spool/abrt

Actual results:
step 3,
cthon test fails and nfs-ganesha seg-faults, core should get generated
step 4,
no core in /var/log/core
step 6,
core found in /var/spool/abrt after disabling vdsmd, rebooting, and repeating step 3

Expected results:
vdsm should not be a deterrent for other processes to dump core. 

Additional info:
In order to get the core dump I had also tried to update the file,
/etc/systemd/system.conf
with variable "DefaultLimitCORE=infinity"
then execute, systemctl daemon-reexec.

This workaround was tried out without disabling vdsmd.

Comment 3 Sahina Bose 2015-09-03 11:07:29 UTC

As per Dan, we need to set core_dump_enable=false in vdsm.conf
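
A minimal sketch for checking what the setting currently is on a node. The function name vdsm_core_dump_setting is made up for this note, and it is an assumption that the key is written as a plain "core_dump_enable = value" line in /etc/vdsm/vdsm.conf (typically under a [vars] section):

```shell
#!/bin/sh
# Print the value of core_dump_enable from a vdsm.conf-style file,
# or "unset" when the file is unreadable or the key is absent/commented out.
# Hypothetical helper; the default path is the one named in this bug.
vdsm_core_dump_setting() {
    conf="${1:-/etc/vdsm/vdsm.conf}"
    [ -r "$conf" ] || { echo "unset"; return 0; }
    awk -F= '
        /^[[:space:]]*core_dump_enable[[:space:]]*=/ {
            gsub(/[[:space:]]/, "", $2)   # strip blanks around the value
            value = $2
        }
        END { print (value == "" ? "unset" : value) }
    ' "$conf"
}
```

Running "vdsm_core_dump_setting" without arguments reads /etc/vdsm/vdsm.conf. Per the Doc Text of this bug, the abrt-ccpp service has to be restarted after changing the value.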

Comment 4 Niels de Vos 2015-09-03 12:35:24 UTC

Testing is even easier than the steps listed in comment #0. It is sufficient to do this (no configuration needed):

  # systemctl start nfs-ganesha
  # killall -s SIGSEGV ganesha.nfsd

The coredump should get mentioned in /var/log/messages. When abrtd captures the core, there will be additional messages from abrtd.

nfs-ganesha is just an example; I expect that segfaults of any other daemon are affected too.

Comment 5 Timothy Asir 2015-09-04 07:28:30 UTC

I have set core_dump_enable=false in vdsm.conf and restarted the vdsmd and supervdsmd services, but I still could not get a core dump when something goes wrong.

Also, I am unable to get a core dump even after I disable vdsm by following "Steps to Reproduce".

It works only after I run "ulimit -c unlimited":

[root@dhcp43-8 test]# ./seg  # (Seg is a test program to produce coredump)
Segmentation fault
[root@dhcp43-8 test]# ulimit -c
0
[root@dhcp43-8 test]# ulimit -c unlimited
[root@dhcp43-8 test]# ./seg
Segmentation fault (core dumped)
[root@dhcp43-8 test]#
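
Note that "ulimit -c" only reports the limit of the current shell; a daemon started by systemd has its own limit. A rough sketch for looking at both follows. The function names are made up for this note, and reading /proc/<pid>/limits assumes a Linux /proc filesystem:

```shell
#!/bin/sh
# Interpret a soft core-size limit as printed by "ulimit -c".
describe_core_limit() {
    case "$1" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "limited" ;;
    esac
}

# Print the soft core-size limit of a running process, taken from the
# "Max core file size" row of /proc/<pid>/limits (fifth field = soft limit).
process_core_limit() {
    awk '/^Max core file size/ { print $5 }' "/proc/$1/limits"
}
```

For the shell itself, call: describe_core_limit "$(ulimit -c)". For a daemon, pass its PID to process_core_limit and feed the result to describe_core_limit.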

Comment 6 Niels de Vos 2015-09-04 07:47:35 UTC

When you change the vdsm settings, could you restart the abrtd-ccpp service (or reboot)? That service reconfigures the kernel.core_pattern and enables capturing of userspace coredumps by abrt.

Thanks!

Comment 7 Timothy Asir 2015-09-08 10:29:38 UTC

Yes, it works only after I restart the abrtd-ccpp service. What about a node that does not have this abrtd-ccpp service?

[root@dhcp43-8 test]# systemdctl restart abrtd-ccpp.service
-bash: systemdctl: command not found
[root@dhcp43-8 test]# systemctl restart abrtd-ccpp.service
Failed to issue method call: Unit abrtd-ccpp.service failed to load: No such file or directory.
[root@dhcp43-8 test]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.1 (Maipo)

Comment 10 Sahina Bose 2016-01-05 11:57:29 UTC

(In reply to Niels de Vos from comment #4)
> Testing is even easier than the steps listed in comment #0. It is sufficient
> to do this (no configuration needed):
> 
>   # systemctl start nfs-ganesha
>   # killall -s SIGSEGV ganesha.nfsd
> 
> The coredump should get mentioned in /var/log/messages. When abrtd captures
> the core, there will be additional messages from abrtd.
> 
> nfs-ganesha is just an example; I expect that segfaults of any other daemon
> are affected too.

FYI, tested this with glusterd (killall -s SIGSEGV glusterd), and a core dump is generated with the vdsm service enabled. (The abrtd service was not running.)

Comment 11 Sahina Bose 2016-01-06 13:44:03 UTC

Did some more testing, and I'm not very sure the issue reported is to do with vdsm. (Testing was done with upstream bits)

On a CentOS 7 installation (no vdsm installed), installed nfs-ganesha from https://copr.fedoraproject.org/coprs/devos/nfs-ganesha/

[root@d ~]# cat  /proc/sys/kernel/core_pattern
core
[root@d ~]# systemctl start nfs-ganesha
[root@d ~]# killall -s SIGSEGV ganesha.nfsd
[root@d ~]# ls -la / | grep core
-- No core dump files

I tried editing /usr/lib/systemd/system/nfs-ganesha.service to set
LimitCORE=infinity

-- but no core dump

Only after editing DefaultLimitCORE=infinity in /etc/systemd/system.conf was the core dump generated.

However, for other services like glusterd, a core dump was generated without the above setting.

Niels, do you know more on this?
I'm removing the blocker flag from this bug as it's unclear if it's a vdsm issue.
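
For reference, a per-service limit can also be set through a drop-in file instead of editing the packaged unit file. The sketch below only writes such a drop-in; the helper name and the file name limitcore.conf are made up for this note, and nothing in this bug establishes that a drop-in alone would have changed the outcome above:

```shell
#!/bin/sh
# Write a drop-in that sets LimitCORE=infinity for one unit.
#   $1 = unit name, e.g. nfs-ganesha.service
#   $2 = systemd configuration root (defaults to /etc/systemd/system)
# Hypothetical helper; "limitcore.conf" is an arbitrary file name.
write_limitcore_dropin() {
    unit="$1"
    root="${2:-/etc/systemd/system}"
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitCORE=infinity\n' > "$dir/limitcore.conf"
}
```

After writing the file as root, "systemctl daemon-reload" followed by a restart of the service is needed before the new limit applies to the running daemon.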

Comment 12 Niels de Vos 2016-01-06 15:52:05 UTC

I'm not sure how you installed your CentOS-7 system. My guess is that abrt is not installed. Once you install abrt (and reboot), nfs-ganesha will generate a core when you kill it with SIGSEGV. You can then use "abrt-cli list" to list the cores that abrt captured.

This only works because the sysctl is configured:

  [root@vm018 ~]# sysctl kernel.core_pattern
  kernel.core_pattern = |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I

If vdsm overwrites the kernel.core_pattern that abrt configured, the standard RHEL tools will become non-functional.

Could you check that, Sahina?
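
A small sketch of that check, assuming the abrt hook keeps the name abrt-hook-ccpp shown in the sysctl output above. The function name classify_core_pattern is made up for this note:

```shell
#!/bin/sh
# Classify a kernel.core_pattern value:
#   abrt - piped to the abrt hook
#   pipe - piped to some other handler
#   file - written as a plain core file
classify_core_pattern() {
    case "$1" in
        "|"*abrt-hook-ccpp*) echo "abrt" ;;
        "|"*)                echo "pipe" ;;
        *)                   echo "file" ;;
    esac
}
```

Usage: classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)". Anything other than "abrt" on a node where abrt is installed suggests the pattern was overwritten after abrt-ccpp configured it.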

Comment 13 Sahina Bose 2016-01-06 17:05:50 UTC

Thanks, Niels.
What I was missing was the abrt-cli package. After installing it and restarting the abrtd service, I see that kernel.core_pattern has changed; however, there are still no core dumps.

[root@dhcp43-106 ~]# cat  /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I
[root@dhcp43-106 ~]# systemctl restart abrt-ccpp
[root@dhcp43-106 ~]# systemctl restart abrtd
[root@dhcp43-106 ~]# systemctl start nfs-ganesha
[root@dhcp43-106 ~]# killall -s SIGSEGV ganesha.nfsd
[root@dhcp43-106 ~]# abrt-cli list
[root@dhcp43-106 ~]# 

I'm sure I'm missing something!
Btw, mine was a minimal CentOS install.

Comment 14 Niels de Vos 2016-01-07 08:44:08 UTC

Hi Sahina,

by default abrt is configured to only capture cores from binaries that come from gpg-signed packages. The packages from a copr repository are not signed. I think /var/log/messages mentioned in your case that the core is skipped/dropped or something.

The easiest is to test with packages from the Storage SIG. These can be installed directly from CentOS repositories and are signed:

  # yum install centos-release-gluster
  # yum install nfs-ganesha

The 1st command adds a new .repo file for yum, the 2nd uses the new repository to get nfs-ganesha and dependencies.

This worked for me, could you try that too?
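
A rough way to check whether the signature requirement is active on a node. This is a sketch under assumptions: it assumes the check is controlled by an "OpenGPGCheck = yes" line in /etc/abrt/abrt-action-save-package-data.conf, which may differ between abrt versions, and the function name is made up for this note:

```shell
#!/bin/sh
# Succeed (exit status 0) when the given abrt config file contains an
# active "OpenGPGCheck = yes" line; fail otherwise.
# Hypothetical helper; the default path is an assumption, see above.
abrt_gpg_check_enabled() {
    conf="${1:-/etc/abrt/abrt-action-save-package-data.conf}"
    grep -Eq '^[[:space:]]*OpenGPGCheck[[:space:]]*=[[:space:]]*yes[[:space:]]*$' \
        "$conf" 2>/dev/null
}
```

Whether an installed package is signed can be seen in the "Signature" line of "rpm -qi nfs-ganesha"; an unsigned package shows "(none)" there.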

Comment 15 Sahina Bose 2016-01-07 09:07:50 UTC

Yes, it did! That was indeed the issue.
Thanks for clarifying.

Comment 21 Sahina Bose 2018-03-21 08:06:50 UTC

Tim, with vdsm in RHGS rebased to 4.19, and the abrt integration available - this issue is probably fixed? Could you check?

Comment 24 Red Hat Bugzilla 2023-09-14 03:04:46 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
