Bug 763606 (GLUSTER-1874)

Summary: need a way to untangle glusterd
Product: [Community] GlusterFS
Reporter: Allen Lu <allen>
Component: cli
Assignee: Amar Tumballi <amarts>
Status: CLOSED NOTABUG
Severity: high
Priority: high
Version: 3.1-beta
CC: gluster-bugs, lakshmipathi, vijay, vraman
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix

Description Allen Lu 2010-10-08 21:59:37 UTC
While testing the 3.1qa43 build, I managed to render glusterd inoperative, and it stays that way even after restarting it with /etc/init.d/glusterd restart.

Support will need a method to recover glusterd when it gets into this state. How do we reset glusterd?

gluster> volume log locate mirrorvol1 10.1.30.126
wrong brick type: 10.1.30.126, use <HOSTNAME>:<export-dir-abs-path>
getting log file location information failed
gluster> volume log locate mirrorvol1 10.1.30.126:/mnt2
log file location: /etc/glusterd/logs/bricks
gluster> quit
[root@alu-vm1 glusterd]# cd /etc/glusterd/logs/bricks
-bash: cd: /etc/glusterd/logs/bricks: No such file or directory
[root@alu-vm1 glusterd]# gluster volume info all

Volume Name: mirrorvol1
Type: Distributed-Replicate
Status: Created
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.30.127:/mnt1
Brick2: 10.1.30.127:/mnt2
Brick3: 10.1.30.126:/mnt1
Brick4: 10.1.30.126:/mnt2
[root@alu-vm1 glusterd]# gluster volume start mirrorvol1
Starting volume mirrorvol1 has been unsuccessful

[root@alu-vm1 glusterd]# gluster volume info all

Volume Name: mirrorvol1
Type: Distributed-Replicate
Status: Created
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.30.127:/mnt1
Brick2: 10.1.30.127:/mnt2
Brick3: 10.1.30.126:/mnt1
Brick4: 10.1.30.126:/mnt2
[root@alu-vm1 glusterd]# gluster volume info all

Volume Name: mirrorvol1
Type: Distributed-Replicate
Status: Created
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.30.127:/mnt1
Brick2: 10.1.30.127:/mnt2
Brick3: 10.1.30.126:/mnt1
Brick4: 10.1.30.126:/mnt2
[root@alu-vm1 glusterd]# /etc/init.d/glusterd restart
Stopping glusterd:                                         [  OK  ]
Starting glusterd:                                         [  OK  ]
[root@alu-vm1 glusterd]# gluster volume info all

[root@alu-vm1 glusterd]# /etc/init.d/glusterd restart
Stopping glusterd:                                         [FAILED]
Starting glusterd:                                         [  OK  ]
[root@alu-vm1 glusterd]# gluster volume info all

[root@alu-vm1 glusterd]#

[root@alu-vm1 glusterd]# cd /etc/glusterd
[root@alu-vm1 glusterd]# ls -la
total 44
drwxr-xr-x  5 root root  4096 Oct  8 11:21 .
drwxr-xr-x 97 root root 12288 Oct  8 13:53 ..
-rw-r--r--  1 root root  7194 Oct  8 14:36 .cmd_log_history
-rw-r--r--  1 root root    42 Oct  8 11:21 glusterd.info
lrwxrwxrwx  1 root root    18 Oct  8 11:21 logs -> /var/log/glusterfs
drwxr-xr-x  3 root root  4096 Oct  8 14:24 nfs
drwxr-xr-x  2 root root  4096 Oct  8 12:39 peers
drwxr-xr-x  3 root root  4096 Oct  8 14:31 vols
[root@alu-vm1 glusterd]# cd vols
[root@alu-vm1 vols]# ls -la
total 12
drwxr-xr-x 3 root root 4096 Oct  8 14:31 .
drwxr-xr-x 5 root root 4096 Oct  8 11:21 ..
drwxr-xr-x 4 root root 4096 Oct  8 14:32 mirrorvol1
[root@alu-vm1 vols]# cd mirrorvol1
[root@alu-vm1 mirrorvol1]# ls -la
total 44
drwxr-xr-x 4 root root 4096 Oct  8 14:32 .
drwxr-xr-x 3 root root 4096 Oct  8 14:31 ..
drwxr-xr-x 2 root root 4096 Oct  8 14:31 bricks
-rw-r--r-- 1 root root   15 Oct  8 14:31 cksum
-rw-r--r-- 1 root root  214 Oct  8 14:31 info
-rw-r--r-- 1 root root  662 Oct  8 14:31 mirrorvol1.10.1.30.126.mnt1.vol
-rw-r--r-- 1 root root  662 Oct  8 14:31 mirrorvol1.10.1.30.126.mnt2.vol
-rw-r--r-- 1 root root  662 Oct  8 14:31 mirrorvol1.10.1.30.127.mnt1.vol
-rw-r--r-- 1 root root  662 Oct  8 14:31 mirrorvol1.10.1.30.127.mnt2.vol
-rw-r--r-- 1 root root 1546 Oct  8 14:31 mirrorvol1-fuse.vol
drwxr-xr-x 2 root root 4096 Oct  8 14:32 run

[root@alu-vm1 mirrorvol1]# cd run
[root@alu-vm1 run]# ls -la
total 8
drwxr-xr-x 2 root root 4096 Oct  8 14:32 .
drwxr-xr-x 4 root root 4096 Oct  8 14:32 ..
[root@alu-vm1 run]#

[root@alu-vm1 logs]# tail etc-glusterfs-glusterd.vol.log
[2010-10-08 14:39:00.186690] E [glusterfsd.c:323:get_volfp] glusterfsd: /etc/glusterfs/glusterd.vol: No such file or directory
[2010-10-08 14:39:00.186956] E [glusterfsd.c:1356:glusterfs_volumes_init] glusterfsd: Cannot reach volume specification file
[2010-10-08 14:41:50.136349] E [glusterfsd.c:323:get_volfp] glusterfsd: /etc/glusterfs/glusterd.vol: No such file or directory
[2010-10-08 14:41:50.136643] E [glusterfsd.c:1356:glusterfs_volumes_init] glusterfsd: Cannot reach volume specification file
[root@alu-vm1 logs]# cd /etc/glusterfs

[root@alu-vm1 glusterfs]# /etc/init.d/glusterd stop
Stopping glusterd:                                         [FAILED]
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3486     1  0 14:24 ?        00:00:00 /usr/sbin/glusterfsd --xlator-option mirrorvol1-server.listen-port=24010 -s localhost --volfile-id mirrorvol1.10.1.30.127.mnt2 -p /etc/glusterd/vols/mirrorvol1/run/10.1.30.127-mnt2.pid --brick-name /mnt2 --brick-port 24010 -l /etc/glusterd/logs/bricks/mnt2.log
root      3781  3312  0 14:54 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# kill -9 3486
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3784  3312  0 14:54 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# /etc/init.d/glusterd start
Starting glusterd:                                         [  OK  ]
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3793  3312  0 14:54 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3795  3312  0 14:54 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# /etc/init.d/glusterd restart
Stopping glusterd:                                         [FAILED]
Starting glusterd:                                         [  OK  ]
[root@alu-vm1 glusterfs]# ps -ef | grep gluster
root      3815  3312  0 14:55 pts/0    00:00:00 grep gluster
[root@alu-vm1 glusterfs]# gluster volume info

Comment 1 Lakshmipathi G 2010-10-09 04:11:17 UTC
Allen,
Could you please attach the glusterd log file ?

Comment 2 Allen Lu 2010-10-09 11:07:21 UTC
There were no logs to be found. In order to get things back to normal, I had to reinstall the core package.

A suitable fix for this would be to give admins a way to recreate the data structures needed for glusterd to be functional without reinstalling.

Comment 3 Amar Tumballi 2010-10-11 02:54:25 UTC
> [root@alu-vm1 logs]# tail etc-glusterfs-glusterd.vol.log
> [2010-10-08 14:39:00.186690] E [glusterfsd.c:323:get_volfp] glusterfsd:
> /etc/glusterfs/glusterd.vol: No such file or directory
> [2010-10-08 14:39:00.186956] E [glusterfsd.c:1356:glusterfs_volumes_init]
> glusterfsd: Cannot reach volume specification file
> [2010-10-08 14:41:50.136349] E [glusterfsd.c:323:get_volfp] glusterfsd:
> /etc/glusterfs/glusterd.vol: No such file or directory
> [2010-10-08 14:41:50.136643] E [glusterfsd.c:1356:glusterfs_volumes_init]
> glusterfsd: Cannot reach volume specification file
 

I'm thinking of keeping 'glusterd.vol' in memory instead of loading it from a volfile on disk. Team, please get back to me with your ideas.

Comment 4 Amar Tumballi 2010-10-11 10:27:55 UTC
(In reply to comment #2)
> There were no logs to be found. In order to get things back to normal, I had to
> reinstall the core package.
> 
> A suitable fix for this would be to give admins a way to recreate the data
> structures needed for glusterd to be functional without reinstalling.

We can work around this problem by copying the 'glusterd.vol' file to its proper path (/etc/glusterfs/glusterd.vol, per the log above). There is no need to reinstall glusterfs.
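
For anyone hitting the same state, the workaround could be scripted roughly as follows. This is only a sketch: the function name restore_volfile is hypothetical, the default destination is the path reported in the log quoted earlier in this thread, and the source file is assumed to be a known-good copy of glusterd.vol obtained elsewhere (for example from a healthy peer or from the package). It copies the file only when the destination is missing and never overwrites an existing volfile.

```shell
#!/bin/sh
# restore_volfile SRC [DEST]   (hypothetical helper, not part of glusterfs)
# Copy a known-good glusterd.vol into place only if DEST is missing.
# DEST defaults to the path named in the glusterd log in this report;
# SRC is an assumption: any trusted copy of the volfile.
restore_volfile() {
    src="$1"
    dest="${2:-/etc/glusterfs/glusterd.vol}"

    # Never overwrite a volfile that is already in place.
    if [ -e "$dest" ]; then
        echo "present: $dest (left untouched)"
        return 0
    fi

    # Refuse to continue without a readable source copy.
    if [ ! -r "$src" ]; then
        echo "cannot read source volfile: $src" >&2
        return 1
    fi

    # Create the target directory if needed, then copy preserving mode/times.
    mkdir -p "$(dirname "$dest")" && cp -p "$src" "$dest" && echo "restored: $dest"
}
```

A call such as restore_volfile /root/glusterd.vol.good (the source path here is illustrative) would be followed by /etc/init.d/glusterd restart, as used earlier in this report.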

Comment 5 Vijay Bellur 2010-10-28 08:32:38 UTC
glusterd.vol was missing, hence the behavior. Resolving as invalid.
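
The missing file can be spotted directly from the glusterd log. The sketch below pulls out every path that a get_volfp error reports as "No such file or directory", using the line format shown in the description. The function name missing_volfiles is hypothetical, and the pattern is assumed from the log lines in this report only; other builds may word the message differently.

```shell
#!/bin/sh
# missing_volfiles LOGFILE   (hypothetical helper, not part of glusterfs)
# Print each distinct volfile path that the log reports as missing.
# Matches lines of the form seen in this report:
#   [...] E [glusterfsd.c:323:get_volfp] glusterfsd: <path>: No such file or directory
missing_volfiles() {
    log="$1"
    # Keep only the <path> portion of matching lines, then de-duplicate.
    sed -n 's/.*get_volfp] glusterfsd: \(.*\): No such file or directory$/\1/p' "$log" | sort -u
}
```

Run against the log from the description (etc-glusterfs-glusterd.vol.log in the logs directory), this should list the volfile path that glusterd failed to open.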