Bug 1218573

Summary: [Snapshot] Scheduled job is not processed when one of the nodes of the shared storage volume is down
Product: [Community] GlusterFS
Component: snapshot
Version: mainline
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Reporter: Anil Shah <ashah>
Assignee: Avra Sengupta <asengupt>
CC: asengupt, bugs, gluster-bugs, rjoseph
Keywords: Reopened, Triaged
Whiteboard: Scheduler
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Type: Bug
Clones: 1223205, 1230399
Bug Blocks: 1223205, 1230399
Last Closed: 2016-06-16 12:58:24 UTC

Description Anil Shah 2015-05-05 09:42:46 UTC
Description of problem:

The scheduler does not pick up scheduled jobs when one of the storage nodes of the shared storage volume is down.

Version-Release number of selected component (if applicable):

[root@localhost glusterfs]# rpm -qa | grep glusterfs
glusterfs-debuginfo-3.7.0alpha0-0.9.git989bea3.el7.centos.x86_64
glusterfs-libs-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-fuse-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-extra-xlators-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-geo-replication-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-cli-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-api-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-server-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-devel-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64


How reproducible:

100%

Steps to Reproduce:

1. Create a 2x2 distributed-replicate volume (testvol).

2. Create a replicate shared storage volume on storage nodes that are not part of the volume whose snapshots are scheduled, and mount it on every storage node at /var/run/gluster/shared_storage.
3. Initialize the scheduler on each storage node, i.e. run snap_scheduler.py init.
4. Enable the scheduler on the storage nodes, i.e. run snap_scheduler.py enable.
5. Add a job to create snapshots of the volume at a 5-minute interval, e.g. snap_scheduler.py add job1 "*/5 * * * *" testvol (a consolidated command sketch follows these steps).
6. Bring down both shared storage nodes.
7. Bring up any one of the shared storage nodes.
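
A rough consolidated sketch of steps 1-5 as shell commands, assuming illustrative host names (node1..node4 hosting testvol, meta1/meta2 hosting the shared storage volume) and the brick paths from the volume info below:

# On any testvol node: create and start the 2x2 distributed-replicate volume
gluster volume create testvol replica 2 node1:/rhs/brick1/b1 node2:/rhs/brick1/b2 node3:/rhs/brick1/b3 node4:/rhs/brick1/b4
gluster volume start testvol

# On a node that is not part of testvol: create and start the shared storage volume
gluster volume create meta replica 2 meta1:/rhs/brick1/b1 meta2:/rhs/brick1/b2
gluster volume start meta

# On every storage node: mount the shared storage and initialise the scheduler
mount -t glusterfs meta1:/meta /var/run/gluster/shared_storage
snap_scheduler.py init

# On any one node: enable the scheduler and add the job
snap_scheduler.py enable
snap_scheduler.py add "job1" "*/5 * * * *" "testvol"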

Actual results:

The scheduled job is not picked up by the scheduler.

Expected results:

The scheduler should pick up the scheduled jobs.


Additional info:

[root@localhost glusterfs]# gluster v info testvol
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: f5eed851-6f24-4cde-903e-7669f5437bc9
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.143:/rhs/brick1/b1
Brick2: 10.70.47.145:/rhs/brick1/b2
Brick3: 10.70.47.150:/rhs/brick1/b3
Brick4: 10.70.47.151:/rhs/brick1/b4
Options Reconfigured:
features.quota: on
features.quota-deem-statfs: on
features.uss: enable
features.barrier: disable
====================================
Shared storage volume

[root@localhost ~]# gluster v info meta
 
Volume Name: meta
Type: Replicate
Volume ID: b07daf4e-891d-4022-972a-af181250dc07
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.46.248:/rhs/brick1/b1
Brick2: 10.70.46.251:/rhs/brick1/b2

Comment 1 senaik 2015-05-08 09:45:30 UTC
Version : glusterfs 3.7.0beta1 built on May  7 2015
=======

Another scenario where jobs are not picked up:

1) Create a dist-rep volume and mount it.

2) Create a shared storage volume and mount it.

Enable the scheduler and schedule jobs on the volumes:
snap_scheduler.py add "A1"  "*/5 * * * * " "vol1"
snap_scheduler: Successfully added snapshot schedule

snap_scheduler.py add "A2"  "*/10 * * * * " "vol2"
snap_scheduler: Successfully added snapshot schedule

3) Take a snapshot of the shared storage 
gluster snapshot create MV_Snap gluster_shared_storage 
snapshot create: success: Snap MV_Snap_GMT-2015.05.08-09.20.26 created successfully

4) Add some more jobs, A3 and A4.

5) Stop the shared storage volume and see that at the next scheduled time no job is picked up.

6) Restore the shared storage volume to the snapshot taken in step 3 and start the volume.

7) After the restore, the scheduler lists jobs A1 and A2, but none of them are picked up (see the command sketch below).
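
For reference, a sketch of how steps 5-7 might look as commands, using the snapshot name created in step 3 (an illustration, not a verbatim transcript):

# Step 5: stop the shared storage volume; after this, no scheduled job fires
gluster volume stop gluster_shared_storage

# Step 6: restore the shared storage volume to the earlier snapshot and start it again
gluster snapshot restore MV_Snap_GMT-2015.05.08-09.20.26
gluster volume start gluster_shared_storage

# Step 7: the schedules are still listed, but none of them are triggered
snap_scheduler.py list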

Comment 2 Anand Avati 2015-06-09 13:29:35 UTC
REVIEW: http://review.gluster.org/11139 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#1) for review on master by Avra Sengupta (asengupt)

Comment 3 Anand Avati 2015-06-09 15:02:27 UTC
REVIEW: http://review.gluster.org/11139 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#2) for review on master by Avra Sengupta (asengupt)
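
The idea named in the patch subject can be sketched roughly as follows (an illustration only; the actual change is in the review linked above). crond only re-reads a file under /etc/cron.d/ when its modification time changes, so once the shared storage is mounted again, refreshing the mtime of the task file is enough for cron to start honouring the schedules again:

# Illustration only, not the merged patch
if mountpoint -q /var/run/gluster/shared_storage; then
    touch /etc/cron.d/glusterfs_snap_cron_tasks   # force crond to reload the scheduled jobs
fi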

Comment 4 Avra Sengupta 2015-07-05 06:51:20 UTC
Moving it to assigned, as the shared storage brick is wiped clean on node reboot. This happens because the shared storage brick is currently placed at /var/run/gluster/ss_brick, which lives on a tmpfs.
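
That observation can be checked quickly on an affected node (a sketch; the exact filesystem types depend on the installation):

# /var/run (i.e. /run) is typically a tmpfs on EL7, so a brick under it is lost on reboot,
# while /var/lib sits on the persistent root filesystem.
df -T /var/run/gluster /var/lib/glusterd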

Comment 5 Anand Avati 2015-07-05 06:55:39 UTC
REVIEW: http://review.gluster.org/11533 (glusterd/shared_storage: Use /var/lib/glusterd/ss_brick as shared stroage's brick) posted (#1) for review on master by Avra Sengupta (asengupt)

Comment 6 Anand Avati 2015-07-06 05:37:26 UTC
REVIEW: http://review.gluster.org/11533 (glusterd/shared_storage: Use /var/lib/glusterd/ss_brick as shared storage's brick) posted (#2) for review on master by Avra Sengupta (asengupt)
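
On a build carrying this change, the new brick location can be confirmed with something like the following (a sketch; gluster_shared_storage is the glusterd-managed shared storage volume referenced in comment 1):

gluster volume info gluster_shared_storage | grep -i brick
# The brick should now point at /var/lib/glusterd/ss_brick instead of /var/run/gluster/ss_brick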

Comment 7 Nagaprasad Sathyanarayana 2015-10-25 14:57:30 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ has been fixed in a GlusterFS release and closed. Hence, this mainline BZ is being closed as well.

Comment 8 Niels de Vos 2016-06-16 12:58:24 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user