+++ This bug was initially created as a clone of Bug #1218573 +++

Description of problem:
The scheduler does not pick up scheduled jobs when one of the storage nodes of the shared storage volume is down.

Version-Release number of selected component (if applicable):
[root@localhost glusterfs]# rpm -qa | grep glusterfs
glusterfs-debuginfo-3.7.0alpha0-0.9.git989bea3.el7.centos.x86_64
glusterfs-libs-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-fuse-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-extra-xlators-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-geo-replication-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-cli-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-api-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-server-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-devel-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume.
2. Create a replicated shared storage volume on storage nodes that are not part of the volume whose snapshot is scheduled, and mount it on each storage node at /var/run/gluster/shared_storage.
3. Initialize the scheduler on each storage node, e.g. run snap_scheduler.py init.
4. Enable the scheduler on the storage nodes, e.g. run snap_scheduler.py enable.
5. Add a job to create snapshots of the volume at a 5-minute interval, e.g. snap_scheduler.py add job1 "*/5 * * * *" testvol
6. Bring down both shared storage nodes.
7. Bring up any one of the shared storage nodes.

Actual results:
The scheduled job is not picked up by the scheduler.

Expected results:
The scheduler should pick up the scheduled jobs.

Additional info:
[root@localhost glusterfs]# gluster v info testvol

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: f5eed851-6f24-4cde-903e-7669f5437bc9
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.143:/rhs/brick1/b1
Brick2: 10.70.47.145:/rhs/brick1/b2
Brick3: 10.70.47.150:/rhs/brick1/b3
Brick4: 10.70.47.151:/rhs/brick1/b4
Options Reconfigured:
features.quota: on
features.quota-deem-statfs: on
features.uss: enable
features.barrier: disable

====================================
Shared storage volume

[root@localhost ~]# gluster v info meta

Volume Name: meta
Type: Replicate
Volume ID: b07daf4e-891d-4022-972a-af181250dc07
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.46.248:/rhs/brick1/b1
Brick2: 10.70.46.251:/rhs/brick1/b2

--- Additional comment from on 2015-05-08 05:45:30 EDT ---

Version: glusterfs 3.7.0beta1 built on May 7 2015

=======

Another scenario where jobs are not picked up:

1) Create a dist-rep volume and mount it.
2) Create a shared storage volume and mount it. Enable the scheduler and schedule jobs on the volumes:
snap_scheduler.py add "A1" "*/5 * * * * " "vol1"
snap_scheduler: Successfully added snapshot schedule
snap_scheduler.py add "A2" "*/10 * * * * " "vol2"
snap_scheduler: Successfully added snapshot schedule
3) Take a snapshot of the shared storage:
gluster snapshot create MV_Snap gluster_shared_storage
snapshot create: success: Snap MV_Snap_GMT-2015.05.08-09.20.26 created successfully
4) Add some more jobs - A3 and A4.
5) Stop the volume and see that at the next scheduled time no job is picked up.
6) Restore the shared storage to the snapshot taken earlier and start the volume.
7) After the restore, the scheduler lists jobs A1 and A2, but none of them are picked up (see the verification sketch below).

--- Additional comment from Anand Avati on 2015-06-09 09:29:35 EDT ---

REVIEW: http://review.gluster.org/11139 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#1) for review on master by Avra Sengupta (asengupt)

--- Additional comment from Anand Avati on 2015-06-09 11:02:27 EDT ---

REVIEW: http://review.gluster.org/11139 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#2) for review on master by Avra Sengupta (asengupt)
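For either reproduction scenario above, a quick way to confirm whether a scheduled job actually fired is to compare the volume's snapshot count across one scheduled interval. The helper below is a minimal illustrative sketch, not part of the original report; it assumes that "gluster snapshot list <volname>" prints one snapshot name per line and that snap_scheduler.py list is available on the node.

#!/usr/bin/env python
# Illustrative helper (not part of the original report): check whether a
# scheduled snapshot job fired for a volume within one interval.
import subprocess
import time


def snapshot_count(volname):
    # Count snapshots for a volume; assumes one snapshot name per output line.
    out = subprocess.check_output(["gluster", "snapshot", "list", volname])
    lines = [l.strip() for l in out.decode().splitlines() if l.strip()]
    # Messages such as "No snapshots present" contain spaces and are ignored;
    # snapshot names do not.
    return len([l for l in lines if " " not in l])


def job_fired(volname, interval_minutes=5):
    # Return True if at least one new snapshot appeared within one interval.
    before = snapshot_count(volname)
    time.sleep(interval_minutes * 60 + 30)  # wait one interval plus a margin
    after = snapshot_count(volname)
    return after > before


if __name__ == "__main__":
    print("scheduled jobs:")
    subprocess.call(["snap_scheduler.py", "list"])
    print("job fired: %s" % job_fired("testvol"))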
REVIEW: http://review.gluster.org/11168 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#1) for review on release-3.7 by Avra Sengupta (asengupt)
REVIEW: http://review.gluster.org/11168 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#2) for review on release-3.7 by Avra Sengupta (asengupt)
COMMIT: http://review.gluster.org/11168 committed in release-3.7 by Rajesh Joseph (rjoseph)

------

commit 6dd6a7157a3b8e0532b20bb5033fcd146aacc1e6
Author: Avra Sengupta <asengupt>
Date: Tue Jun 9 18:00:57 2015 +0530

snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available

Backport of http://review.gluster.org/#/c/11139/

If shared storage is not accessible, create a flag in /var/run/gluster/, so that when /etc/cron.d/glusterfs_snap_cron_tasks is available again, the flag will tell us to reload /etc/cron.d/glusterfs_snap_cron_tasks.

Change-Id: I41b19f57ff0b8f7e0b820eaf592b0fdedb0a5d86
BUG: 1230399
Signed-off-by: Avra Sengupta <asengupt>
Reviewed-on: http://review.gluster.org/11168
Tested-by: Gluster Build System <jenkins.com>
Tested-by: NetBSD Build System <jenkins.org>
Reviewed-by: Rajesh Joseph <rjoseph>
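The mechanism described in this commit message can be summarised as: if the shared storage is not reachable when the scheduler runs, drop a flag file under /var/run/gluster/, and on a later run, once the schedule file is reachable again, use that flag to rewrite /etc/cron.d/glusterfs_snap_cron_tasks so that crond reloads it. The sketch below only illustrates that idea; the flag file name, the schedule-file path on shared storage, and the helper functions are assumptions, not the actual scheduler code.

import os
import shutil

# Illustrative paths; the flag file name (and the exact schedule-file path on
# shared storage) are assumptions, not the names used by the actual patch.
GCRON_TASKS = "/var/run/gluster/shared_storage/snaps/glusterfs_snap_cron_tasks"
GCRON_CROND_TASK = "/etc/cron.d/glusterfs_snap_cron_tasks"
RELOAD_FLAG = "/var/run/gluster/snap_scheduler_reload_flag"  # hypothetical name


def shared_storage_available():
    # The schedule file lives on the shared storage mount, so its presence
    # serves as a proxy for the mount being accessible.
    return os.path.exists(GCRON_TASKS)


def sync_cron_tasks():
    if not shared_storage_available():
        # Shared storage is down: remember that a reload is needed later.
        open(RELOAD_FLAG, "a").close()
        return
    if os.path.exists(RELOAD_FLAG):
        # Shared storage is back: rewrite the cron.d file so that crond
        # notices the changed mtime and reloads the scheduled jobs.
        shutil.copy(GCRON_TASKS, GCRON_CROND_TASK)
        os.remove(RELOAD_FLAG)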
Moving it to assigned, as the shared storage brick is wiped clean on node reboot. This happens because the shared storage brick is currently placed at /var/run/gluster/ss_brick, which is on a tmpfs.
REVIEW: http://review.gluster.org/11534 (glusterd/shared_storage: Use /var/lib/glusterd/ss_brick as shared storage's brick) posted (#1) for review on release-3.7 by Avra Sengupta (asengupt)
REVIEW: http://review.gluster.org/11534 (glusterd/shared_storage: Use /var/lib/glusterd/ss_brick as shared storage's brick) posted (#2) for review on release-3.7 by Avra Sengupta (asengupt)
COMMIT: http://review.gluster.org/11534 committed in release-3.7 by Rajesh Joseph (rjoseph)

------

commit 0efffcdb4eccac48d9eac26d7715ce24493ed753
Author: Avra Sengupta <asengupt>
Date: Sun Jul 5 12:21:31 2015 +0530

glusterd/shared_storage: Use /var/lib/glusterd/ss_brick as shared storage's brick

Backport of http://review.gluster.org/#/c/11533/

The brick path we use to create shared storage is /var/run/gluster/ss_brick. The problem with this brick path is that /var/run/gluster is a tmpfs, so all of the brick/shared storage data will be wiped off when the node restarts. Hence, use /var/lib/glusterd/ss_brick as the brick path for the shared storage volume, as this brick and the shared storage volume are created internally by us (albeit on the user's request) and contain only internal state data, no user data.

Change-Id: I808d1aa3e204a5d2022086d23bdbfdd44a2cfb1c
BUG: 1230399
Signed-off-by: Avra Sengupta <asengupt>
Reviewed-on: http://review.gluster.org/11534
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Rajesh Joseph <rjoseph>
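The reasoning in this commit is that anything placed under /var/run/gluster is lost across reboots because that directory sits on a tmpfs. A small generic check like the one below (an illustrative sketch, not part of the fix) can confirm which filesystem type backs a given brick path by walking /proc/mounts:

import os


def fs_type(path):
    # Return the filesystem type backing `path`, based on /proc/mounts.
    path = os.path.realpath(path)
    best = ("", "unknown")
    with open("/proc/mounts") as mounts:
        for line in mounts:
            fields = line.split()
            mount_point, fstype = fields[1], fields[2]
            # Pick the longest mount point that is a prefix of the path.
            prefix = mount_point.rstrip("/") + "/"
            if (path == mount_point or path.startswith(prefix)) \
                    and len(mount_point) > len(best[0]):
                best = (mount_point, fstype)
    return best[1]


if __name__ == "__main__":
    # On an affected node this would typically report "tmpfs" for the old
    # brick location and a persistent filesystem for the new one.
    for brick in ("/var/run/gluster/ss_brick", "/var/lib/glusterd/ss_brick"):
        print("%s -> %s" % (brick, fs_type(brick)))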
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user