Description of problem:
I have a 3-node GlusterFS cluster, version 4.1.5, with the daemon running inside a container. Creating a snapshot fails with "Snapshot command failed", and I have no idea how to debug it. Please help, thanks!

Version-Release number of selected component (if applicable):
glusterfs 4.1.5

How reproducible:

Steps to Reproduce:

1. Check the volume status:

sh-4.2# gluster volume status
Status of volume: xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.16.28.211:/var/lib/heketi/mounts/
vg_56a228b70115bf46499a5075b0304a59/brick_4
cf7c1c0b1a469130163d635fe98b48a/brick       49152     0          Y       13558

Task Status of Volume xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2
------------------------------------------------------------------------------
There are no active volume tasks

2. List the volumes:

sh-4.2# gluster volume list
xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2

3. Try to create a snapshot; it fails:

sh-4.2# gluster snapshot create snap1 xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2
snapshot create: failed: Commit failed on localhost. Please check log file for details.
Snapshot command failed

4. Check glusterd.log; it shows the errors below, but they give no hint about the root cause:

[2019-12-17 01:33:26.497606] W [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2019-12-17 01:33:26.497628] E [MSGID: 106122] [glusterd-mgmt.c:1637:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Snapshot on local node
[2019-12-17 01:33:26.497648] E [MSGID: 106122] [glusterd-mgmt.c:2539:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
The message "W [MSGID: 101095] [xlator.c:181:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/4.1.5/xlator/nfs/server.so: cannot open shared object file: No such file or directory" repeated 3 times between [2019-12-17 01:33:26.334197] and [2019-12-17 01:33:26.498294]
[2019-12-17 01:33:26.553252] E [MSGID: 106061] [glusterd-snapshot.c:6413:glusterd_do_snap_cleanup] 0-glusterd: Unable to get volume name
[2019-12-17 01:33:26.553309] W [MSGID: 106039] [glusterd-snapshot.c:8263:glusterd_snapshot_create_postvalidate] 0-management: cleanup operation failed
[2019-12-17 01:33:26.553324] W [MSGID: 106029] [glusterd-snapshot.c:9246:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2019-12-17 01:33:26.553338] W [MSGID: 106120] [glusterd-mgmt.c:469:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2019-12-17 01:33:26.553359] E [MSGID: 106120] [glusterd-mgmt.c:1893:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2019-12-17 01:33:26.553375] E [MSGID: 106121] [glusterd-mgmt.c:2598:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed

Could you help on this?
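A note for anyone debugging a similar commit failure: GlusterFS volume snapshots require every brick to sit on a thinly provisioned LVM volume, so one reasonable first check is whether the brick's LV is backed by a thin pool. A minimal sketch, using the VG name taken from the brick path in the volume status above (the exact LV names on other setups will differ):

sh-4.2# lvs -o vg_name,lv_name,pool_lv,lv_attr vg_56a228b70115bf46499a5075b0304a59
# A thin LV shows "V" as the first lv_attr character and a non-empty pool_lv
# column; an empty pool_lv means the brick is not thin-provisioned, and
# snapshot creation is expected to fail on it.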
Rafi, can you take a look?
Hi,

IMO, this issue has been addressed by patch [1].

[1] https://review.gluster.org/#/c/glusterfs/+/20854/

Can you please check and confirm? Also, please upgrade to our latest release; your workload is running on a very old version.

/sunny
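For reference, a quick way to confirm the running version before and after the upgrade (a generic check, not specific to this bug):

sh-4.2# gluster --version
# The first line of output names the installed glusterfs release; it should
# report something newer than 4.1.5 once the upgrade is done.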
@Sunny Kumar thanks, I will give the latest release a try. As for the original issue, I found that when glusterfs runs inside a container, the logical volumes it creates are not shown by `lvdisplay` on the host, and the snapshot then fails. After running `vgscan --cache` on the host, the snapshot works.
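For anyone hitting the same symptom, a minimal sketch of that workaround (the volume name is the one from this report; substitute your own):

# On the host, outside the container: rescan so the host's LVM metadata
# cache picks up logical volumes created from inside the gluster container
vgscan --cache

# Then, inside the gluster container, retry the snapshot
sh-4.2# gluster snapshot create snap1 xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2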
This bug has been moved to https://github.com/gluster/glusterfs/issues/909 and will be tracked there from now on. Visit the GitHub issue URL for further details.