Bug 1223205
Summary: | [Snapshot] Scheduled job is not processed when one of the nodes of the shared storage volume is down | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Avra Sengupta <asengupt> |
Component: | snapshot | Assignee: | Avra Sengupta <asengupt> |
Status: | CLOSED ERRATA | QA Contact: | senaik |
Severity: | urgent | Docs Contact: | |
Priority: | high | | |
Version: | rhgs-3.0 | CC: | annair, ashah, asrivast, nsathyan, rcyriac, rhs-bugs, rjoseph, senaik, storage-qa-internal, vagarwal |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | RHGS 3.1.0 | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | Scheduler | | |
Fixed In Version: | glusterfs-3.7.1-8 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | 1218573 | Environment: | |
Last Closed: | 2015-07-29 04:43:50 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1218573, 1230399 | ||
Bug Blocks: | 1202842, 1223636 |
Description
Avra Sengupta
2015-05-20 06:06:39 UTC
Upstream Url: http://review.gluster.org/#/c/11139/
RHS 3.1 Url: http://review.gluster.org/#/c/11168/
RHGS 3.1 Url: https://code.engineering.redhat.com/gerrit/#/c/50514/

Version : glusterfs-3.7.1-7.el6rhs.x86_64

Shared storage was created using `gluster v set all cluster.shared-storage enable`, which creates the shared storage volume with bricks at /var/run/gluster/ss_brick. That path is on tmpfs, so on a node reboot the shared storage brick is wiped clean and all the scheduled jobs are lost. Proposing this bug as a Blocker, since all data on the shared storage volume is lost when the nodes are rebooted.

Steps followed :
===============
1) Create a 2-node cluster (Node1 and Node2) and run `gluster v set all cluster.shared-storage enable` - this creates a 1x2 replicate volume with bricks at /var/run/gluster/ss_brick:

```
gluster v info

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: be49ed27-8cb3-4ae3-9d20-f5d8f375c0c9
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs-arch-srv2.lab.eng.blr.redhat.com:/var/run/gluster/ss_brick
Brick2: 10.70.34.50:/var/run/gluster/ss_brick
Options Reconfigured:
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
```

2) Attach Node3 and Node4 to the cluster and mount the shared storage on /var/run/gluster/shared_storage.
3) Create a volume with bricks from Node3 and Node4.
4) Initialise the snapshot scheduler on all nodes and enable it. Check the status from all nodes - it shows enabled.
5) Add a job that schedules a snapshot of the volume every 10 minutes.
6) Power off Node1 and Node2 (the nodes which host the bricks of the shared storage volume).
7) Power on Node2 and check snap_scheduler status on srv2 - disabled.
8) Check snap_scheduler status on srv3 and srv4 - disabled.
9) snap_scheduler list shows no jobs!

(A command-level sketch of these steps is appended at the end of this report.)

Moving the bug back to Assigned and proposing it as a 'Blocker'.

Version : glusterfs-3.7.1-8.el6rhs.x86_64

Followed the steps mentioned in Comment 4; after a reboot of the nodes, the jobs are still listed and the scheduler continues to create snapshots:

```
snap_scheduler.py list
JOB_NAME         SCHEDULE         OPERATION        VOLUME NAME
--------------------------------------------------------------------
J1               */5 * * * *      Snapshot Create  vol0
```

Rebooted the nodes after 5 snapshots were created. After the nodes were back up, snapshot creation continued:

```
gluster snapshot list | wc -l
184
```

The shared storage bricks now reside at /var/lib/glusterd/ss_brick instead of the tmpfs-backed /var/run/gluster/ss_brick:

```
gluster v info

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 1002e97f-2f03-4040-a3f9-a403995a35fa
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs-arch-srv2.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Brick2: 10.70.34.50:/var/lib/glusterd/ss_brick
Options Reconfigured:
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
```

Marking the bug 'Verified'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
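For reference, a minimal sketch of the commands behind the reproduction steps above, run from a shell on the cluster nodes. Hostnames and the volume/job names (Node1, vol0, J1) are taken from the report; the volume option name follows the "Options Reconfigured" output shown above, and the snap_scheduler.py sub-commands (init, enable, status, add, list) are the standard scheduler CLI - exact quoting of the `add` arguments may differ slightly by build.

```sh
# Rough reproduction sketch (assumes a working trusted storage pool).

# Steps 1-2: enable the shared storage volume; on the affected build its
# bricks land on tmpfs under /var/run/gluster/ss_brick. Mount it on the
# peers that do not host its bricks (Node1 stands in for any pool member).
gluster volume set all cluster.enable-shared-storage enable
gluster volume info gluster_shared_storage
mount -t glusterfs Node1:/gluster_shared_storage /var/run/gluster/shared_storage

# Step 4: initialise and enable the snapshot scheduler on every node,
# then confirm it reports Enabled.
snap_scheduler.py init
snap_scheduler.py enable
snap_scheduler.py status

# Step 5: schedule a snapshot of the test volume every 10 minutes.
snap_scheduler.py add "J1" "*/10 * * * *" vol0
snap_scheduler.py list

# Steps 6-9: power off the brick-hosting nodes, power one back on, and
# re-check from every node; on the broken build the scheduler shows
# Disabled and the job list comes back empty.
snap_scheduler.py status
snap_scheduler.py list
```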
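Similarly, a small sketch of the verification check, assuming the fixed build (glusterfs-3.7.1-8): it only confirms that the shared-storage bricks now sit on persistent storage and that scheduled jobs survive a reboot of the brick-hosting nodes. The expected output noted in the comments is illustrative.

```sh
# The bricks should be under /var/lib/glusterd/ss_brick (persistent),
# not /var/run/gluster/ss_brick (tmpfs, wiped on reboot).
gluster volume info gluster_shared_storage | grep '^Brick[0-9]'

# /var/lib/glusterd should be backed by a regular filesystem, not tmpfs.
df -hT /var/lib/glusterd

# After rebooting the brick-hosting nodes, the job must still be listed
# and snapshots must keep being created.
snap_scheduler.py list
gluster snapshot list | wc -l
```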