Bug 1298076 - Gluster crashes when adding volume with wrong brick hostnames
Product: GlusterFS
Classification: Community
Component: glusterd
x86_64 Linux
Priority: high, Severity: high
Assigned To: Atin Mukherjee
: Triaged
Reported: 2016-01-13 03:10 EST by Arik
Modified: 2017-07-24 23:25 EDT

Doc Type: Bug Fix
Last Closed: 2017-01-23 00:21:19 EST
Type: Bug

Attachments
glusterd logs around the crash (16.99 KB, application/x-gzip)
2016-01-19 18:01 EST, Arik

Description Arik 2016-01-13 03:10:31 EST
Description of problem:

I have a 4-node gluster setup for use with oVirt and was adding the ISO domain. When I did so, I accidentally used a script from a different environment with the wrong hostnames. All other parameters were correct; only the hostnames of each brick in the new replica 2 volume I was trying to add were wrong.

I corrected the error and tried to add it again, but by that point the glusterd processes on two servers had crashed, and those nodes had dropped out of the cluster according to peer status. The fix was to remove /var/lib/glusterd on those nodes and re-probe them, but not before causing a fair amount of chaos.

Version-Release number of selected component (if applicable):


How reproducible:

Unfortunately, I cannot tell. Our development environment has somehow become too critical. I will try to reproduce it, time permitting.

Steps to Reproduce:
1. Add new volume but misspell brick hostname
2. Add same volume but with correct hostname
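
A minimal sketch of the two steps above, written as the command lines a creation script might build. The volume name, brick path, and all hostnames are hypothetical placeholders, not values from this report; only the general `gluster volume create <name> replica 2 <host>:<path> ...` form is taken from the gluster CLI.

```python
from typing import List


def volume_create_cmd(volume: str, hosts: List[str], brick_path: str,
                      replica: int = 2) -> List[str]:
    """Build the argument list for `gluster volume create` on a replicated volume."""
    bricks = [f"{host}:{brick_path}" for host in hosts]
    return ["gluster", "volume", "create", volume, "replica", str(replica), *bricks]


# Hypothetical values for illustration only.
VOLUME = "iso"
BRICK_PATH = "/bricks/iso"
WRONG_HOSTS = ["otherenv-node1", "otherenv-node2"]  # hostnames from another environment
RIGHT_HOSTS = ["node1", "node2"]                    # actual peers of this cluster

# Step 1: create with the wrong brick hostnames.
step1 = volume_create_cmd(VOLUME, WRONG_HOSTS, BRICK_PATH)
# Step 2: create the same volume with the correct hostnames.
step2 = volume_create_cmd(VOLUME, RIGHT_HOSTS, BRICK_PATH)

for cmd in (step1, step2):
    print(" ".join(cmd))
```

The sketch only prints the commands; it does not run them against a cluster.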

Actual results:
gluster crashes on at least some servers

Expected results:
A client error message is returned indicating that the hosts are not peers in the cluster, but otherwise nothing happens.
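
As an illustration of the expected behaviour, a creation script could run a check like the following before calling gluster at all. This is a sketch under assumptions, not glusterd's own validation: the function name is hypothetical, the peer list is passed in by the caller (for example, collected by hand from `gluster peer status`), and bricks are assumed to use the `host:/path` form with no IPv6 literals.

```python
from typing import Iterable, List


def unknown_brick_hosts(bricks: Iterable[str], peers: Iterable[str]) -> List[str]:
    """Return brick hostnames that are not in the known peer list, in first-seen order."""
    known = {peer.lower() for peer in peers}  # hostnames compare case-insensitively
    missing: List[str] = []
    for brick in bricks:
        host, sep, path = brick.partition(":")
        if not sep or not host or not path:
            raise ValueError(f"brick must look like host:/path, got {brick!r}")
        if host.lower() not in known and host not in missing:
            missing.append(host)
    return missing


# Hypothetical hostnames for illustration only.
bad = unknown_brick_hosts(["node1:/bricks/iso", "typo2:/bricks/iso"], ["node1", "node2"])
if bad:
    print("hosts are not peers in the cluster:", ", ".join(bad))
```

A script that refuses to continue when this list is non-empty would stop before any volume command is sent.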

Additional info:

Servers are up-to-date CentOS-7.1.1503, Linux 3.10.0-229.20.1.el7.x86_64, on Dell R730 with recent firmware updates.
Comment 1 Jiffin 2016-01-19 07:13:48 EST
Can you attach the glusterd logs (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log)?

Usually, for a wrong hostname, it should throw a "host not in cluster" error instead of crashing.
Comment 2 Arik 2016-01-19 18:01 EST
Created attachment 1116408 [details]
glusterd logs around the crash

I did the volume add on 2016-01-12, so I am sending all logs from that day, with lines containing "nfs" removed, as most log lines are those benign warnings.

Looking back, it looks more like trying to start a volume that does not exist is what triggered the crash. I ran this from a script that is used to create the volume and add properties for oVirt.
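
One way a script like this could avoid issuing a start for a volume that was never created is to stop at the first failing command. The sketch below is an assumption-laden illustration, not the script from this report: the helper names are hypothetical, the volume name and bricks are placeholders, and it assumes the gluster CLI exits with a non-zero status when a command fails.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_cli(cmd: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def run_until_failure(commands: Sequence[Sequence[str]],
                      runner: Runner = run_cli) -> List[Sequence[str]]:
    """Run commands in order and stop at the first non-zero exit status.

    Returns the commands that completed successfully.
    """
    done: List[Sequence[str]] = []
    for cmd in commands:
        if runner(cmd) != 0:
            break  # do not run later steps, e.g. `volume start` after a failed create
        done.append(cmd)
    return done


# Hypothetical volume name and bricks for illustration only.
STEPS = [
    ["gluster", "volume", "create", "iso", "replica", "2",
     "node1:/bricks/iso", "node2:/bricks/iso"],
    ["gluster", "volume", "start", "iso"],
]
```

Calling `run_until_failure(STEPS)` on a cluster node would run the create first and skip the start if the create is rejected. This only changes what the script sends; it says nothing about why glusterd crashed.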

Apologies, I should have checked more carefully before making the report, since it does look like gluster refused the wrong hostname, but I think this is still just as serious a bug.
Comment 3 Atin Mukherjee 2016-01-20 00:18:25 EST
Could you also attach the core file? Without it we cannot analyse the reason for the crash.
Comment 4 Atin Mukherjee 2017-01-23 00:21:19 EST
Given that we have not received sufficient information (especially the core file), I am closing this bug now. Please reopen it if the issue persists.
