Red Hat Bugzilla – Bug 1298076
Gluster crashes when adding volume with wrong brick hostnames
Last modified: 2017-07-24 23:25:49 EDT
Description of problem:
I have a 4-node gluster setup for use with oVirt and was adding the ISO domain. When I did so, I accidentally used a script from a different environment with the wrong hostnames. All other parameters were correct; only the hostnames of each brick in the new replica 2 volume I was trying to add were wrong.
I corrected the error and tried to add it again, but at that point the glusterd processes on two servers had crashed, and those nodes had dropped out of the cluster according to peer status. The fix was to remove /var/lib/glusterd on those nodes and re-probe them, but not before the crash had caused a fair amount of chaos.
Version-Release number of selected component (if applicable):
Unfortunately I cannot tell. Our development environment has somehow become too critical. I will reproduce it, time permitting.
Steps to Reproduce:
1. Add new volume but misspell brick hostname
2. Add same volume but with correct hostname
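For reference, the two steps correspond roughly to the command sequence built below. This is only a sketch: the volume name `iso`, the hostnames, and the brick path are placeholders rather than the values from the affected cluster, and the script only prints the commands instead of running them.

```python
import shlex


def create_cmd(volume, hosts, brick_path, replica=2):
    """Build the argv for `gluster volume create` on a replicated volume."""
    bricks = [f"{host}:{brick_path}" for host in hosts]
    return ["gluster", "volume", "create", volume,
            "replica", str(replica), *bricks]


# Placeholder values (assumptions, not taken from the report).
VOLUME = "iso"
BRICK_PATH = "/bricks/iso/brick"
WRONG_HOSTS = ["wrong-node1", "wrong-node2"]  # hostnames that are not peers
RIGHT_HOSTS = ["node1", "node2"]              # hostnames of actual peers

steps = [
    create_cmd(VOLUME, WRONG_HOSTS, BRICK_PATH),  # step 1: misspelled hosts
    create_cmd(VOLUME, RIGHT_HOSTS, BRICK_PATH),  # step 2: corrected hosts
]

# Print the commands for review; nothing is executed here.
for argv in steps:
    print(shlex.join(argv))
```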
gluster crashes on at least some servers
A client error message is returned indicating that the hosts are not peers in the cluster, but otherwise nothing.
Servers are up-to-date CentOS-7.1.1503 linux 3.10.0-229.20.1.el7.x86_64 on DELL R730 with recent firmware updates
Can you attach the glusterd logs (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log)?
Usually, for a wrong hostname, it should report that the host is not in the cluster instead of crashing.
Created attachment 1116408 [details]
glusterd logs around the crash
I did the volume add on 2016-01-12, so I am sending all logs from that day, with lines containing "nfs" removed, as most log lines are those benign warnings.
Looking back, it looks rather like trying to start a volume that does not exist is what triggered the crash. I ran this from a script that is used to create the volume and add properties for oVirt.
Apologies, I should have checked more carefully before making the report, since it does look like gluster refused the wrong hostname, but I think this is still just as serious a bug.
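A minimal sketch of how a provisioning script could avoid issuing `gluster volume start` after a failed `gluster volume create`. It assumes the script simply continued past the failed create, which the report suggests but does not confirm. The volume name, hostnames, and brick paths are placeholders, and this guard is a workaround on the script side, not a fix for the glusterd crash.

```python
import subprocess


def run_steps(steps, runner=subprocess.run):
    """Run each argv in order and stop at the first non-zero exit status.

    Returns the list of commands that completed successfully.
    """
    done = []
    for argv in steps:
        result = runner(argv)
        if result.returncode != 0:
            # Do not continue: later steps assume the volume exists.
            break
        done.append(argv)
    return done


# Placeholder values (assumptions, not taken from the report).
VOLUME = "iso"
STEPS = [
    ["gluster", "volume", "create", VOLUME, "replica", "2",
     "node1:/bricks/iso/brick", "node2:/bricks/iso/brick"],
    ["gluster", "volume", "start", VOLUME],
]
```

Calling `run_steps(STEPS)` on a gluster node would skip the start command whenever the create command exits with a non-zero status.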
Could you also attach the core file? Without it we cannot analyse the reason for the crash.
Given that we have not received sufficient information (especially the core file), I am closing this bug now. Please reopen it if the issue persists.