Bug 763606 - (GLUSTER-1874) need a way to untangle glusterd

Status: CLOSED NOTABUG
Product: GlusterFS
Classification: Community
Component: cli
Version: 3.1-beta
Hardware/OS: All Linux
Priority: high
Severity: high
Assigned To: Amar Tumballi

Reported: 2010-10-08 17:59 EDT by Allen Lu
Modified: 2015-12-01 11:45 EST

Description Allen Lu 2010-10-08 17:59:37 EDT
In my testing of the 3.1qa43 build, I managed to render glusterd inoperative, even after restarting it with /etc/init.d/glusterd restart.

Support will need a method to recover glusterd when it gets into this state. How do we reset glusterd?
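
For support triage, a minimal first pass in this state might look like the sketch below (paths are taken from the transcript that follows; it assumes a stock 3.1 install, and is a sketch rather than a verified procedure):

# The management volfile must exist for glusterd to start at all:
ls -l /etc/glusterfs/glusterd.vol
# Startup errors land in the glusterd log:
tail -n 20 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
# Run glusterd in the foreground with debug logging to see why it exits:
glusterd --debug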

gluster> volume log locate mirrorvol1 10.1.30.126
wrong brick type: 10.1.30.126, use <HOSTNAME>:<export-dir-abs-path>
getting log file location information failed
gluster> volume log locate mirrorvol1 10.1.30.126:/mnt2
log file location: /etc/glusterd/logs/bricks
gluster> quit
[root@alu-vm1 glusterd]# cd /etc/glusterd/logs/bricks
-bash: cd: /etc/glusterd/logs/bricks: No such file or directory
[root@alu-vm1 glusterd]# gluster volume info all

Volume Name: mirrorvol1
Type: Distributed-Replicate
Status: Created
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.30.127:/mnt1
Brick2: 10.1.30.127:/mnt2
Brick3: 10.1.30.126:/mnt1
Brick4: 10.1.30.126:/mnt2
[root@alu-vm1 glusterd]# gluster volume start mirrorvol1
Starting volume mirrorvol1 has been unsuccessful

[root@alu-vm1 glusterd]# gluster volume info all

Volume Name: mirrorvol1
Type: Distributed-Replicate
Status: Created
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.30.127:/mnt1
Brick2: 10.1.30.127:/mnt2
Brick3: 10.1.30.126:/mnt1
Brick4: 10.1.30.126:/mnt2
[root@alu-vm1 glusterd]# gluster volume info all

Volume Name: mirrorvol1
Type: Distributed-Replicate
Status: Created
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.30.127:/mnt1
Brick2: 10.1.30.127:/mnt2
Brick3: 10.1.30.126:/mnt1
Brick4: 10.1.30.126:/mnt2
[root@alu-vm1 glusterd]# /etc/init.d/glusterd restart
Stopping glusterd:                                         [  OK  ]
Starting glusterd:                                         [  OK  ]
[root@alu-vm1 glusterd]# gluster volume info all

[root@alu-vm1 glusterd]# /etc/init.d/glusterd restart
Stopping glusterd:                                         [FAILED]
Starting glusterd:                                         [  OK  ]
[root@alu-vm1 glusterd]# gluster volume info all

[root@alu-vm1 glusterd]#

[root@alu-vm1 glusterd]# cd /etc/glusterd
[root@alu-vm1 glusterd]# ls -la
total 44
drwxr-xr-x  5 root root  4096 Oct  8 11:21 .
drwxr-xr-x 97 root root 12288 Oct  8 13:53 ..
-rw-r--r--  1 root root  7194 Oct  8 14:36 .cmd_log_history
-rw-r--r--  1 root root    42 Oct  8 11:21 glusterd.info
lrwxrwxrwx  1 root root    18 Oct  8 11:21 logs -> /var/log/glusterfs
drwxr-xr-x  3 root root  4096 Oct  8 14:24 nfs
drwxr-xr-x  2 root root  4096 Oct  8 12:39 peers
drwxr-xr-x  3 root root  4096 Oct  8 14:31 vols
[root@alu-vm1 glusterd]# cd vols
[root@alu-vm1 vols]# ls -la
total 12
drwxr-xr-x 3 root root 4096 Oct  8 14:31 .
drwxr-xr-x 5 root root 4096 Oct  8 11:21 ..
drwxr-xr-x 4 root root 4096 Oct  8 14:32 mirrorvol1
[root@alu-vm1 vols]# cd mirrorvol1
[root@alu-vm1 mirrorvol1]# ls -la
total 44
drwxr-xr-x 4 root root 4096 Oct  8 14:32 .
drwxr-xr-x 3 root root 4096 Oct  8 14:31 ..
drwxr-xr-x 2 root root 4096 Oct  8 14:31 bricks
-rw-r--r-- 1 root root   15 Oct  8 14:31 cksum
-rw-r--r-- 1 root root  214 Oct  8 14:31 info
-rw-r--r-- 1 root root  662 Oct  8 14:31 mirrorvol1.10.1.30.126.mnt1.vol
-rw-r--r-- 1 root root  662 Oct  8 14:31 mirrorvol1.10.1.30.126.mnt2.vol
-rw-r--r-- 1 root root  662 Oct  8 14:31 mirrorvol1.10.1.30.127.mnt1.vol
-rw-r--r-- 1 root root  662 Oct  8 14:31 mirrorvol1.10.1.30.127.mnt2.vol
-rw-r--r-- 1 root root 1546 Oct  8 14:31 mirrorvol1-fuse.vol
drwxr-xr-x 2 root root 4096 Oct  8 14:32 run

[root@alu-vm1 mirrorvol1]# cd run
[root@alu-vm1 run]# ls -la
total 8
drwxr-xr-x 2 root root 4096 Oct  8 14:32 .
drwxr-xr-x 4 root root 4096 Oct  8 14:32 ..
[root@alu-vm1 run]#

[root@alu-vm1 logs]# tail etc-glusterfs-glusterd.vol.log
[2010-10-08 14:39:00.186690] E [glusterfsd.c:323:get_volfp] glusterfsd: /etc/glusterfs/glusterd.vol: No such file or directory
[2010-10-08 14:39:00.186956] E [glusterfsd.c:1356:glusterfs_volumes_init] glusterfsd: Cannot reach volume specification file
[2010-10-08 14:41:50.136349] E [glusterfsd.c:323:get_volfp] glusterfsd: /etc/glusterfs/glusterd.vol: No such file or directory
[2010-10-08 14:41:50.136643] E [glusterfsd.c:1356:glusterfs_volumes_init] glusterfsd: Cannot reach volume specification file
[root@alu-vm1 logs]# cd /etc/glusterfs

[root@alu-vm1 glusterfs]# /etc/init.d/glusterd stop
Stopping glusterd:                                         [FAILED]
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3486     1  0 14:24 ?        00:00:00 /usr/sbin/glusterfsd --xlator-option mirrorvol1-server.listen-port=24010 -s localhost --volfile-id mirrorvol1.10.1.30.127.mnt2 -p /etc/glusterd/vols/mirrorvol1/run/10.1.30.127-mnt2.pid --brick-name /mnt2 --brick-port 24010 -l /etc/glusterd/logs/bricks/mnt2.log
root      3781  3312  0 14:54 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# kill -9 3486
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3784  3312  0 14:54 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# /etc/init.d/glusterd start
Starting glusterd:                                         [  OK  ]
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3793  3312  0 14:54 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3795  3312  0 14:54 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# /etc/init.d/glusterd restart
Stopping glusterd:                                         [FAILED]
Starting glusterd:                                         [  OK  ]
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3815  3312  0 14:55 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# gluster volume info
Comment 1 Lakshmipathi G 2010-10-09 00:11:17 EDT
Allen,
Could you please attach the glusterd log file?
Comment 2 Allen Lu 2010-10-09 07:07:21 EDT
There were no logs to be found. To get things back to normal, I had to reinstall the core package.

A suitable fix would be to give admins a way to recreate the data structures glusterd needs to function, without reinstalling; a rough sketch of what that could look like follows.
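
A hypothetical sketch of such a recovery, based on the /etc/glusterd layout listed in the description above (the glusterd.info format is an assumption inferred from that listing, and generating a fresh UUID changes the node's peer identity, so this would only be safe on a node with no trusted peers):

# Recreate the skeleton glusterd expects under its working directory:
mkdir -p /etc/glusterd/{peers,vols,nfs}
ln -sfn /var/log/glusterfs /etc/glusterd/logs
# glusterd.info holds this node's UUID; create one only if the file is gone:
[ -f /etc/glusterd/glusterd.info ] || echo "UUID=$(uuidgen)" > /etc/glusterd/glusterd.info
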
Comment 3 Amar Tumballi 2010-10-10 22:54:25 EDT
> [root@alu-vm1 logs]# tail etc-glusterfs-glusterd.vol.log
> [2010-10-08 14:39:00.186690] E [glusterfsd.c:323:get_volfp] glusterfsd:
> /etc/glusterfs/glusterd.vol: No such file or directory
> [2010-10-08 14:39:00.186956] E [glusterfsd.c:1356:glusterfs_volumes_init]
> glusterfsd: Cannot reach volume specification file
> [2010-10-08 14:41:50.136349] E [glusterfsd.c:323:get_volfp] glusterfsd:
> /etc/glusterfs/glusterd.vol: No such file or directory
> [2010-10-08 14:41:50.136643] E [glusterfsd.c:1356:glusterfs_volumes_init]
> glusterfsd: Cannot reach volume specification file
 

Thinking of having 'glusterd.vol' built into glusterd in memory itself, instead of loading it from a volfile on disk. Team, please reply with your ideas.
Comment 4 Amar Tumballi 2010-10-11 06:27:55 EDT
(In reply to comment #2)
> There was no logs to be found. In order to get things back to normal, I had to
> reinstall the core package.
> 
> A suitable fix to this is to allow admins a way to recreate the data structures
> needed for glusterd to be functional without reinstalling.

We can work around this problem by simply copying the 'glusterd.vol' file back to the proper path; there is no need to reinstall glusterfs.
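
Concretely, the workaround looks something like this (the volfile body below approximates the stock management volfile from the 3.1 packages and is an assumption; prefer restoring the packaged file or copying it from a healthy node):

# Restore the management volfile, then restart glusterd:
cat > /etc/glusterfs/glusterd.vol <<'EOF'
volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
end-volume
EOF
/etc/init.d/glusterd restart
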
Comment 5 Vijay Bellur 2010-10-28 04:32:38 EDT
glusterd.vol was missing, hence the behavior. Resolving as invalid.
