+++ This bug was initially created as a clone of Bug #1218573 +++

Description of problem:
The scheduler does not pick up scheduled jobs when one of the storage nodes of the shared storage volume is down.

Version-Release number of selected component (if applicable):
[root@localhost glusterfs]# rpm -qa | grep glusterfs
glusterfs-debuginfo-3.7.0alpha0-0.9.git989bea3.el7.centos.x86_64
glusterfs-libs-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-fuse-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-extra-xlators-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-geo-replication-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-cli-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-api-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-server-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64
glusterfs-devel-3.7.0beta1-0.14.git09bbd5c.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume.
2. Create a replicated shared storage volume on storage nodes that are not part of the volume whose snapshot is scheduled, and mount it on each storage node at /var/run/gluster/shared_storage.
3. Initialize the scheduler on each storage node, e.g. run snap_scheduler.py init.
4. Enable the scheduler on the storage nodes, e.g. run snap_scheduler.py enable.
5. Add a job to create snapshots of the volume at a 5-minute interval, e.g. snap_scheduler.py add job1 "*/5 * * * *" testvol
6. Bring down both shared storage nodes.
7. Bring up any one of the shared storage nodes.

Actual results:
The scheduled job is not picked up by the scheduler.

Expected results:
The scheduler should pick up the scheduled jobs.

Additional info:
[root@localhost glusterfs]# gluster v info testvol

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: f5eed851-6f24-4cde-903e-7669f5437bc9
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.143:/rhs/brick1/b1
Brick2: 10.70.47.145:/rhs/brick1/b2
Brick3: 10.70.47.150:/rhs/brick1/b3
Brick4: 10.70.47.151:/rhs/brick1/b4
Options Reconfigured:
features.quota: on
features.quota-deem-statfs: on
features.uss: enable
features.barrier: disable

====================================
Shared storage volume

[root@localhost ~]# gluster v info meta

Volume Name: meta
Type: Replicate
Volume ID: b07daf4e-891d-4022-972a-af181250dc07
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.46.248:/rhs/brick1/b1
Brick2: 10.70.46.251:/rhs/brick1/b2

--- Additional comment from on 2015-05-08 05:45:30 EDT ---

Version: glusterfs 3.7.0beta1 built on May 7 2015

=======

Another scenario where jobs are not picked up:

1) Create a dist-rep volume and mount it.
2) Create a shared storage volume and mount it. Enable the scheduler and schedule jobs on the volumes:
snap_scheduler.py add "A1" "*/5 * * * * " "vol1"
snap_scheduler: Successfully added snapshot schedule
snap_scheduler.py add "A2" "*/10 * * * * " "vol2"
snap_scheduler: Successfully added snapshot schedule
3) Take a snapshot of the shared storage:
gluster snapshot create MV_Snap gluster_shared_storage
snapshot create: success: Snap MV_Snap_GMT-2015.05.08-09.20.26 created successfully
4) Add some more jobs - A3 and A4.
5) Stop the volume and see that at the next scheduled time no job is picked up.
6) Restore the shared storage to the snapshot taken earlier and start the volume.
7) After the restore, the scheduler lists jobs A1 and A2, but none of them are picked up (see the verification sketch below).

--- Additional comment from Anand Avati on 2015-06-09 09:29:35 EDT ---

REVIEW: http://review.gluster.org/11139 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#1) for review on master by Avra Sengupta (asengupt)

--- Additional comment from Anand Avati on 2015-06-09 11:02:27 EDT ---

REVIEW: http://review.gluster.org/11139 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#2) for review on master by Avra Sengupta (asengupt)
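For either reproduction scenario above, a quick way to confirm whether a scheduled job actually fired is to compare the volume's snapshot count across one scheduled interval. The helper below is a minimal illustrative sketch, not part of the original report; it assumes that "gluster snapshot list <volname>" prints one snapshot name per line and that snap_scheduler.py list is available on the node.

#!/usr/bin/env python
# Illustrative helper (not part of the original report): check whether a
# scheduled snapshot job fired for a volume within one interval.
import subprocess
import time


def snapshot_count(volname):
    # Count snapshots for a volume; assumes one snapshot name per output line.
    out = subprocess.check_output(["gluster", "snapshot", "list", volname])
    lines = [l.strip() for l in out.decode().splitlines() if l.strip()]
    # Messages such as "No snapshots present" contain spaces and are ignored;
    # snapshot names do not.
    return len([l for l in lines if " " not in l])


def job_fired(volname, interval_minutes=5):
    # Return True if at least one new snapshot appeared within one interval.
    before = snapshot_count(volname)
    time.sleep(interval_minutes * 60 + 30)  # wait one interval plus a margin
    after = snapshot_count(volname)
    return after > before


if __name__ == "__main__":
    print("scheduled jobs:")
    subprocess.call(["snap_scheduler.py", "list"])
    print("job fired: %s" % job_fired("testvol"))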
REVIEW: http://review.gluster.org/11168 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#1) for review on release-3.7 by Avra Sengupta (asengupt)
REVIEW: http://review.gluster.org/11168 (snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available) posted (#2) for review on release-3.7 by Avra Sengupta (asengupt)
COMMIT: http://review.gluster.org/11168 committed in release-3.7 by Rajesh Joseph (rjoseph)

------

commit 6dd6a7157a3b8e0532b20bb5033fcd146aacc1e6
Author: Avra Sengupta <asengupt>
Date: Tue Jun 9 18:00:57 2015 +0530

snapshot/scheduler: Reload /etc/cron.d/glusterfs_snap_cron_tasks when shared storage is available

Backport of http://review.gluster.org/#/c/11139/

If shared storage is not accessible, create a flag in /var/run/gluster/, so that when /etc/cron.d/glusterfs_snap_cron_tasks is available again, the flag will tell us to reload /etc/cron.d/glusterfs_snap_cron_tasks.

Change-Id: I41b19f57ff0b8f7e0b820eaf592b0fdedb0a5d86
BUG: 1230399
Signed-off-by: Avra Sengupta <asengupt>
Reviewed-on: http://review.gluster.org/11168
Tested-by: Gluster Build System <jenkins.com>
Tested-by: NetBSD Build System <jenkins.org>
Reviewed-by: Rajesh Joseph <rjoseph>
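The mechanism described in this commit message can be summarised as: if the shared storage is not reachable when the scheduler runs, drop a flag file under /var/run/gluster/, and on a later run, once the schedule file is reachable again, use that flag to rewrite /etc/cron.d/glusterfs_snap_cron_tasks so that crond reloads it. The sketch below only illustrates that idea; the flag file name, the schedule-file path on shared storage, and the helper functions are assumptions, not the actual scheduler code.

import os
import shutil

# Illustrative paths; the flag file name (and the exact schedule-file path on
# shared storage) are assumptions, not the names used by the actual patch.
GCRON_TASKS = "/var/run/gluster/shared_storage/snaps/glusterfs_snap_cron_tasks"
GCRON_CROND_TASK = "/etc/cron.d/glusterfs_snap_cron_tasks"
RELOAD_FLAG = "/var/run/gluster/snap_scheduler_reload_flag"  # hypothetical name


def shared_storage_available():
    # The schedule file lives on the shared storage mount, so its presence
    # serves as a proxy for the mount being accessible.
    return os.path.exists(GCRON_TASKS)


def sync_cron_tasks():
    if not shared_storage_available():
        # Shared storage is down: remember that a reload is needed later.
        open(RELOAD_FLAG, "a").close()
        return
    if os.path.exists(RELOAD_FLAG):
        # Shared storage is back: rewrite the cron.d file so that crond
        # notices the changed mtime and reloads the scheduled jobs.
        shutil.copy(GCRON_TASKS, GCRON_CROND_TASK)
        os.remove(RELOAD_FLAG)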
Moving it to assigned, as the shared storage brick is wiped clean on node reboot. This happens because the shared storage brick is currently placed at /var/run/gluster/ss_brick, which is on a tmpfs.
REVIEW: http://review.gluster.org/11534 (glusterd/shared_storage: Use /var/lib/glusterd/ss_brick as shared storage's brick) posted (#1) for review on release-3.7 by Avra Sengupta (asengupt)
REVIEW: http://review.gluster.org/11534 (glusterd/shared_storage: Use /var/lib/glusterd/ss_brick as shared storage's brick) posted (#2) for review on release-3.7 by Avra Sengupta (asengupt)
COMMIT: http://review.gluster.org/11534 committed in release-3.7 by Rajesh Joseph (rjoseph)

------

commit 0efffcdb4eccac48d9eac26d7715ce24493ed753
Author: Avra Sengupta <asengupt>
Date: Sun Jul 5 12:21:31 2015 +0530

glusterd/shared_storage: Use /var/lib/glusterd/ss_brick as shared storage's brick

Backport of http://review.gluster.org/#/c/11533/

The brick path we use to create shared storage is /var/run/gluster/ss_brick. The problem with this brick path is that /var/run/gluster is a tmpfs, so all of the brick/shared storage data will be wiped off when the node restarts. Hence, use /var/lib/glusterd/ss_brick as the brick path for the shared storage volume, as this brick and the shared storage volume are created internally by us (albeit on the user's request) and contain only internal state data, no user data.

Change-Id: I808d1aa3e204a5d2022086d23bdbfdd44a2cfb1c
BUG: 1230399
Signed-off-by: Avra Sengupta <asengupt>
Reviewed-on: http://review.gluster.org/11534
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Rajesh Joseph <rjoseph>
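The reasoning in this commit is that anything placed under /var/run/gluster is lost across reboots because that directory sits on a tmpfs. A small generic check like the one below (an illustrative sketch, not part of the fix) can confirm which filesystem type backs a given brick path by walking /proc/mounts:

import os


def fs_type(path):
    # Return the filesystem type backing `path`, based on /proc/mounts.
    path = os.path.realpath(path)
    best = ("", "unknown")
    with open("/proc/mounts") as mounts:
        for line in mounts:
            fields = line.split()
            mount_point, fstype = fields[1], fields[2]
            # Pick the longest mount point that is a prefix of the path.
            prefix = mount_point.rstrip("/") + "/"
            if (path == mount_point or path.startswith(prefix)) \
                    and len(mount_point) > len(best[0]):
                best = (mount_point, fstype)
    return best[1]


if __name__ == "__main__":
    # On an affected node this would typically report "tmpfs" for the old
    # brick location and a persistent filesystem for the new one.
    for brick in ("/var/run/gluster/ss_brick", "/var/lib/glusterd/ss_brick"):
        print("%s -> %s" % (brick, fs_type(brick)))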
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user