Description of problem:
I have a 3-node GlusterFS cluster, version 4.1.5, with the daemon running inside a container. Creating a snapshot fails with "Snapshot command failed", and I have no idea how to debug it. Please help, thanks!

Version-Release number of selected component (if applicable):
glusterfs 4.1.5

How reproducible:

Steps to Reproduce:

1. Check the volume status:

sh-4.2# gluster volume status
Status of volume: xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.16.28.211:/var/lib/heketi/mounts/
vg_56a228b70115bf46499a5075b0304a59/brick_4
cf7c1c0b1a469130163d635fe98b48a/brick       49152     0          Y       13558

Task Status of Volume xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2
------------------------------------------------------------------------------
There are no active volume tasks

2. List the volumes:

sh-4.2# gluster volume list
xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2

3. Try to create a snapshot; it fails:

sh-4.2# gluster snapshot create snap1 xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2
snapshot create: failed: Commit failed on localhost. Please check log file for details.
Snapshot command failed

4. Check glusterd.log; it shows the errors below, but they give no hint about the root cause:

[2019-12-17 01:33:26.497606] W [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2019-12-17 01:33:26.497628] E [MSGID: 106122] [glusterd-mgmt.c:1637:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Snapshot on local node
[2019-12-17 01:33:26.497648] E [MSGID: 106122] [glusterd-mgmt.c:2539:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
The message "W [MSGID: 101095] [xlator.c:181:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/4.1.5/xlator/nfs/server.so: cannot open shared object file: No such file or directory" repeated 3 times between [2019-12-17 01:33:26.334197] and [2019-12-17 01:33:26.498294]
[2019-12-17 01:33:26.553252] E [MSGID: 106061] [glusterd-snapshot.c:6413:glusterd_do_snap_cleanup] 0-glusterd: Unable to get volume name
[2019-12-17 01:33:26.553309] W [MSGID: 106039] [glusterd-snapshot.c:8263:glusterd_snapshot_create_postvalidate] 0-management: cleanup operation failed
[2019-12-17 01:33:26.553324] W [MSGID: 106029] [glusterd-snapshot.c:9246:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2019-12-17 01:33:26.553338] W [MSGID: 106120] [glusterd-mgmt.c:469:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2019-12-17 01:33:26.553359] E [MSGID: 106120] [glusterd-mgmt.c:1893:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2019-12-17 01:33:26.553375] E [MSGID: 106121] [glusterd-mgmt.c:2598:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed

Could you help on this?
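A note for anyone debugging a similar commit failure: GlusterFS volume snapshots require every brick to sit on a thinly provisioned LVM volume, so one reasonable first check is whether the brick's LV is backed by a thin pool. A minimal sketch, using the VG name taken from the brick path in the volume status above (the exact LV names on other setups will differ):

sh-4.2# lvs -o vg_name,lv_name,pool_lv,lv_attr vg_56a228b70115bf46499a5075b0304a59
# A thin LV shows "V" as the first lv_attr character and a non-empty pool_lv
# column; an empty pool_lv means the brick is not thin-provisioned, and
# snapshot creation is expected to fail on it.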
Rafi, can you take a look?
Hi,

IMO, this issue has been addressed by patch [1].

[1] https://review.gluster.org/#/c/glusterfs/+/20854/

Can you please check and confirm? Also, please upgrade to our latest release; your workload is running on a very old version.

/sunny
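For reference, a quick way to confirm the running version before and after the upgrade (a generic check, not specific to this bug):

sh-4.2# gluster --version
# The first line of output names the installed glusterfs release; it should
# report something newer than 4.1.5 once the upgrade is done.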
@Sunny Kumar thanks, I will give the latest release a try. As for the original issue, I found that when glusterfs runs inside a container, the logical volumes it creates are not shown by `lvdisplay` on the host, and the snapshot then fails. After running `vgscan --cache` on the host, the snapshot works.
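For anyone hitting the same symptom, a minimal sketch of that workaround (the volume name is the one from this report; substitute your own):

# On the host, outside the container: rescan so the host's LVM metadata
# cache picks up logical volumes created from inside the gluster container
vgscan --cache

# Then, inside the gluster container, retry the snapshot
sh-4.2# gluster snapshot create snap1 xxx_kube-system_glusterpvc_a2e0e477-216d-11ea-9927-0016ac101cd2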
This bug has been moved to https://github.com/gluster/glusterfs/issues/909 and will be tracked there from now on. Visit the GitHub issue URL for further details.