Bug 1785225

Summary: [snapshot] failed with "Snapshot Commit Failed" error
Product: [Community] GlusterFS Reporter: haoqing <haoqing>
Component: snapshot Assignee: bugs <bugs>
Status: CLOSED UPSTREAM QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 4.1 CC: bugs, pasik, rkavunga, sabose, sunkumar
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-12 12:34:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description haoqing 2019-12-19 12:29:21 UTC
Description of problem:
I have a 3-node GlusterFS cluster running version 4.1.5, with the daemon running inside a container. When I try to create a snapshot, it fails with a "Snapshot command failed" error. I have no idea how to debug this. Please help, thanks!

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
sh-4.2# gluster volume status
Status of volume: xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.16.28.211:/var/lib/heketi/mounts/
vg_56a228b70115bf46499a5075b0304a59/brick_4
cf7c1c0b1a469130163d635fe98b48a/brick       49152     0          Y       13558

Task Status of Volume xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2
------------------------------------------------------------------------------
There are no active volume tasks

2. 
sh-4.2# gluster volume list
xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2

3.
sh-4.2# gluster  snapshot create snap1 xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2
snapshot create: failed: Commit failed on localhost. Please check log file for details.
Snapshot command failed

4.
Checking glusterd.log shows the errors below, but I have no idea what the root cause is.

[2019-12-17 01:33:26.497606] W [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2019-12-17 01:33:26.497628] E [MSGID: 106122] [glusterd-mgmt.c:1637:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Snapshot on local node
[2019-12-17 01:33:26.497648] E [MSGID: 106122] [glusterd-mgmt.c:2539:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
The message "W [MSGID: 101095] [xlator.c:181:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/4.1.5/xlator/nfs/server.so: cannot open shared object file: No such file or directory" repeated 3 times between [2019-12-17 01:33:26.334197] and [2019-12-17 01:33:26.498294]
[2019-12-17 01:33:26.553252] E [MSGID: 106061] [glusterd-snapshot.c:6413:glusterd_do_snap_cleanup] 0-glusterd: Unable to get volume name
[2019-12-17 01:33:26.553309] W [MSGID: 106039] [glusterd-snapshot.c:8263:glusterd_snapshot_create_postvalidate] 0-management: cleanup operation failed
[2019-12-17 01:33:26.553324] W [MSGID: 106029] [glusterd-snapshot.c:9246:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2019-12-17 01:33:26.553338] W [MSGID: 106120] [glusterd-mgmt.c:469:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2019-12-17 01:33:26.553359] E [MSGID: 106120] [glusterd-mgmt.c:1893:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2019-12-17 01:33:26.553375] E [MSGID: 106121] [glusterd-mgmt.c:2598:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
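
One way to narrow a log like the one above to its error-level entries is to filter on the severity letter that follows the timestamp. This is only a minimal sketch: the function name `glusterd_errors` is hypothetical, and the log path in the usage comment is an assumption that may not hold inside a container.

```shell
# Hypothetical helper: print only error-level (E) entries from a glusterd log
# supplied on stdin. Entries look like: [timestamp] E [MSGID: ...] ...
glusterd_errors() {
    grep -E '^\[[^]]+] E \['
}

# Usage (the path is an assumption; adjust to where glusterd writes its log):
# glusterd_errors < /var/log/glusterfs/glusterd.log
```

Applied to the excerpt above, this would keep the "Commit failed", "Unable to get volume name", and "Post Validation" error lines and drop the warning-level ones.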


Could you help with this?

Comment 1 Sahina Bose 2019-12-23 11:27:53 UTC
Rafi, can you take a look?

Comment 2 Sunny Kumar 2020-02-04 15:27:00 UTC
Hi,

IMO, this issue has been addressed by patch[1].

[1] https://review.gluster.org/#/c/glusterfs/+/20854/


Can you please check and confirm? Also, please upgrade to our latest release, as your workload is running on a very old version.

/sunny

Comment 3 haoqing 2020-02-19 05:50:40 UTC
@Sunny Kumar thanks, I will try the latest release.

As for the original issue: I found that when GlusterFS runs inside a container, the logical volumes it creates are not shown by `lvdisplay` on the host, and the snapshot then fails. After running `vgscan --cache` on the host, the snapshot works.
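
The host-side part of this workaround could be scripted roughly as follows. This is a sketch under assumptions: the function name `refresh_lvm_cache` and the `DRY_RUN` switch are hypothetical, and only the `lvdisplay` and `vgscan --cache` commands come from this report.

```shell
# Hypothetical helper, meant to run on the host (not inside the GlusterFS
# container). With DRY_RUN=1 it only prints the commands it would run.
refresh_lvm_cache() {
    # 1) list logical volumes before the rescan
    # 2) rescan volume groups and refresh the LVM cache
    # 3) list logical volumes again to see whether the missing ones appear
    for cmd in "lvdisplay" "vgscan --cache" "lvdisplay"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1   # intentionally unquoted so the flag is split off
        fi
    done
}

# Usage:
# DRY_RUN=1 refresh_lvm_cache   # preview the commands
# refresh_lvm_cache             # run them (typically requires root)
```

Once the logical volumes show up on the host, the `gluster snapshot create` command can be retried from inside the container.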

Comment 4 Worker Ant 2020-03-12 12:34:48 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/909 and will be tracked there from now on. Visit the GitHub issue URL for further details.