Bug 1298076 - Gluster crashes when adding volume with wrong brick hostnames
Summary: Gluster crashes when adding volume with wrong brick hostnames
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.7.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-01-13 08:10 UTC by Arik
Modified: 2017-07-25 03:25 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-01-23 05:21:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterd logs around the crash (16.99 KB, application/x-gzip)
2016-01-19 23:01 UTC, Arik

Description Arik 2016-01-13 08:10:31 UTC
Description of problem:

I have a 4-node gluster setup for use with oVirt and was adding the ISO domain. When I did so, I accidentally used a script from a different environment with the wrong hostnames. All other parameters were correct; only the hostnames of each brick in the new replica 2 volume I was trying to add were wrong.

I corrected the error and tried to add the volume again, but by that point the glusterd processes on two servers had crashed, and peer status showed that those nodes had dropped out of the cluster. The fix was to remove /var/lib/glusterd on those nodes and re-probe them, but not before this caused a fair amount of chaos.

Version-Release number of selected component (if applicable):

3.7.6


How reproducible:

Unfortunately, I cannot tell. Our development environment has somehow become too critical to experiment on. I will try to reproduce it, time permitting.

Steps to Reproduce:
1. Add new volume but misspell brick hostname
2. Add same volume but with correct hostname
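
As a rough sketch of the two steps above, the following Python snippet only builds the `gluster volume create` command lines and prints them. The volume name, hostnames, and brick path are hypothetical placeholders, since the report does not give the real values.

```python
# Hypothetical names: the report does not give the real volume name,
# hostnames, or brick paths.
VOLUME = "iso"
BRICK_PATH = "/bricks/iso"
WRONG_HOSTS = ["wrong-host1", "wrong-host2"]  # assumed not to be peers
RIGHT_HOSTS = ["node1", "node2"]              # assumed to be real peers


def create_cmd(volume, hosts, brick_path, replica=2):
    """Build the argv for `gluster volume create` with one brick per host."""
    bricks = [f"{host}:{brick_path}" for host in hosts]
    return ["gluster", "volume", "create", volume,
            "replica", str(replica), *bricks]


# Step 1: create with the misspelled hostnames (expected to be rejected).
step1 = create_cmd(VOLUME, WRONG_HOSTS, BRICK_PATH)
# Step 2: the same create with the correct hostnames.
step2 = create_cmd(VOLUME, RIGHT_HOSTS, BRICK_PATH)

if __name__ == "__main__":
    # Print the commands instead of running them against a cluster.
    for cmd in (step1, step2):
        print(" ".join(cmd))
```

The commands are printed rather than executed so the sketch can be reviewed before anything touches a live cluster.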

Actual results:
glusterd crashes on at least some servers.

Expected results:
A client error message is returned indicating that the hosts are not peers in the cluster, with no other side effects.

Additional info:

Servers are up-to-date CentOS-7.1.1503 linux 3.10.0-229.20.1.el7.x86_64 on DELL R730 with recent firmware updates

Comment 1 Jiffin 2016-01-19 12:13:48 UTC
Can you attach the glusterd logs (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log)?

Usually, for a wrong hostname, it should report that the host is not in the cluster instead of crashing.

Comment 2 Arik 2016-01-19 23:01:00 UTC
Created attachment 1116408 [details]
glusterd logs around the crash

I did the volume add on 2016-01-12, so I am sending all logs from that day, with lines containing "nfs" removed, since most log lines are those benign warnings.

Looking back, it looks more like trying to start a volume that does not exist is what triggered the crash. I ran this from a script that is used to create the volume and set its properties for oVirt.

Apologies, I should have checked more carefully before making the report, since it does look like gluster refused the wrong hostname, but I think this is still just as serious a bug.
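
The sequence described in this comment (a script that creates a volume and then starts it) could be guarded along the following lines. This is only a sketch, not the reporter's actual script: the `provision` helper and its injectable `run` parameter are hypothetical, and only the standard `gluster volume create` and `gluster volume start` commands are assumed.

```python
import subprocess


def provision(volume, bricks, replica=2, run=subprocess.run):
    """Create a volume and start it only if creation succeeded.

    `run` is injectable so the flow can be exercised without a cluster.
    Returns the list of commands that were actually issued.
    """
    issued = []
    create = ["gluster", "volume", "create", volume,
              "replica", str(replica), *bricks]
    issued.append(create)
    result = run(create)
    if result.returncode != 0:
        # Creation was rejected (for example, a brick host is not a peer):
        # stop here instead of starting a volume that does not exist.
        return issued
    start = ["gluster", "volume", "start", volume]
    issued.append(start)
    run(start)
    return issued
```

Checking the exit status of the create step before continuing means a rejected hostname ends the script early, so the start step is never sent for a volume that was not created.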

Comment 3 Atin Mukherjee 2016-01-20 05:18:25 UTC
Could you also attach the core file? Without it, we cannot analyse the reason for the crash.

Comment 4 Atin Mukherjee 2017-01-23 05:21:19 UTC
Since we have not received sufficient information (especially the core file), I am closing this bug now. Please reopen it if the issue persists.

