Description of problem:
Running snapshot creation in a loop on a tiered volume while a large file-creation workload is in progress causes some of the snapshot creations to fail with a "pre-validation failed" message.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-5

How reproducible:
Always

Steps to Reproduce:
1. Create a tiered volume with a 2x(4+2) EC cold tier and a 2x2 dist-rep hot tier.
2. FUSE mount the volume on a client.
3. Start creating 20000 files of 100KB each on the mount point.
4. Simultaneously start creating 100 snapshots in a loop, with a sleep of 5 between two snapshot creations.
5. Observe that a few of the snapshot creations fail while the IO is in progress (in my case, 11 out of 100 snapshots failed with the pre-validation failed message).
6. Below are the failure messages in the logs:

[2015-11-05 21:56:02.975294] E [MSGID: 106062] [glusterd-snapshot.c:1504:glusterd_snap_create_clone_pre_val_use_rsp_dict] 0-management: failed to get the volume count
[2015-11-05 21:56:02.975451] E [MSGID: 106062] [glusterd-snapshot.c:1813:glusterd_snap_pre_validate_use_rsp_dict] 0-management: Unable to use rsp dict
[2015-11-05 21:56:02.975459] E [MSGID: 106122] [glusterd-mgmt.c:600:glusterd_pre_validate_aggr_rsp_dict] 0-management: Failed to aggregate prevalidate response dictionaries.
[2015-11-05 21:56:02.975467] E [MSGID: 106108] [glusterd-mgmt.c:701:gd_mgmt_v3_pre_validate_cbk_fn] 0-management: Failed to aggregate response from node/brick
[2015-11-05 21:56:02.975497] E [MSGID: 106116] [glusterd-mgmt.c:134:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on 10.70.35.140. Please check log file for details.
[2015-11-05 21:56:05.833521] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/11f5a41d4df7a19d42d4e641eb784bfa.socket failed (Invalid argument)
The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 28 times between [2015-11-05 21:54:39.740819] and [2015-11-05 21:56:05.833576]
[2015-11-05 21:56:08.702620] E [MSGID: 106122] [glusterd-mgmt.c:883:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
[2015-11-05 21:56:08.702694] E [MSGID: 106122] [glusterd-mgmt.c:2164:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed
[2015-11-05 22:04:59.652100] E [MSGID: 106572] [glusterd-snapshot.c:1998:glusterd_snapshot_pause_tier] 0-management: Failed to pause tier. Errstr=(null)
[2015-11-05 22:04:59.652159] E [MSGID: 106572] [glusterd-snapshot.c:2592:glusterd_snapshot_create_prevalidate] 0-management: Failed to pause tier in snap prevalidate.
[2015-11-05 22:04:59.652201] W [MSGID: 106030] [glusterd-snapshot.c:8380:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed
[2015-11-05 22:04:59.652215] W [MSGID: 106122] [glusterd-mgmt.c:166:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2015-11-05 22:04:59.652228] E [MSGID: 106122] [glusterd-mgmt.c:820:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node
[2015-11-05 22:04:59.652247] E [MSGID: 106122] [glusterd-mgmt.c:2164:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed
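
For anyone trying to reproduce this, the snapshot loop from step 4 could be scripted roughly as below. This is only a sketch, not the script used for the original report: the snapshot naming scheme (snap1, snap2, ...), treating the sleep of 5 as seconds, and treating a non-zero exit status of the gluster CLI as a failed creation are all assumptions. The run and sleep parameters exist only so the loop can be exercised without a real cluster.

```python
import subprocess
import time


def create_snapshots(volname, count=100, delay=5,
                     run=subprocess.run, sleep=time.sleep):
    """Try to create `count` snapshots of `volname`, one after another.

    Returns the names of the snapshots whose creation failed.
    `delay` is assumed to be in seconds (the report only says "sleep of 5").
    """
    failed = []
    for i in range(1, count + 1):
        snapname = f"snap{i}"  # hypothetical naming scheme
        result = run(
            ["gluster", "snapshot", "create", snapname, volname],
            capture_output=True,
            text=True,
        )
        # Assumption: the CLI exits non-zero when pre-validation fails.
        if result.returncode != 0:
            failed.append(snapname)
        if i < count:
            sleep(delay)  # pause between two snapshot creations
    return failed
```

Calling create_snapshots("tiervol") on a node of the cluster while the file creation from step 3 is running (with "tiervol" standing in for the real volume name) returns the list of snapshots that failed, which can then be matched against the timestamps in the glusterd log.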
REVIEW: http://review.gluster.org/12877 (cluster/tier: fix loading tier.so into glusterd) posted (#1) for review on release-3.7 by N Balachandran (nbalacha)
REVIEW: http://review.gluster.org/12877 (cluster/tier: fix loading tier.so into glusterd) posted (#2) for review on release-3.7 by N Balachandran (nbalacha)
COMMIT: http://review.gluster.org/12877 committed in release-3.7 by Dan Lambright (dlambrig)
------
commit 0ef60a5c371359d2a5d0d8684a8a58f1f5801525
Author: N Balachandran <nbalacha>
Date: Fri Dec 4 10:34:37 2015 +0530

    cluster/tier: fix loading tier.so into glusterd

    The glusterd process loads the shared libraries of client translators.
    This failed for tiering due to a reference to dht_methods which is
    defined as a global variable which is not necessary.
    The global variable has been removed and this is now a member of
    dht_conf and is now initialised in the *_init calls.

    > Change-Id: Ifa0a21e3962b5cd8d9b927ef1d087d3b25312953
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: http://review.gluster.org/12863
    > Tested-by: NetBSD Build System <jenkins.org>
    > Tested-by: Gluster Build System <jenkins.com>
    > Reviewed-by: Dan Lambright <dlambrig>
    > Tested-by: Dan Lambright <dlambrig>

    (cherry picked from commit 96fc7f64da2ef09e82845a7ab97574f511a9aae5)

    Change-Id: If3cc908ebfcd1f165504f15db2e3079d97f3132e
    BUG: 1288352
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: http://review.gluster.org/12877
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Dan Lambright <dlambrig>
    Tested-by: Dan Lambright <dlambrig>
REVIEW: http://review.gluster.org/13199 (cluster/tier: allow db queries to be interruptable) posted (#1) for review on release-3.7 by Dan Lambright (dlambrig)
REVIEW: http://review.gluster.org/13199 (cluster/tier: allow db queries to be interruptable) posted (#2) for review on release-3.7 by Dan Lambright (dlambrig)
REVIEW: http://review.gluster.org/13199 (cluster/tier: allow db queries to be interruptable) posted (#3) for review on release-3.7 by Dan Lambright (dlambrig)
REVIEW: http://review.gluster.org/13199 (cluster/tier: allow db queries to be interruptable) posted (#4) for review on release-3.7 by Dan Lambright (dlambrig)
REVIEW: http://review.gluster.org/13199 (cluster/tier: allow db queries to be interruptable) posted (#5) for review on release-3.7 by Dan Lambright (dlambrig)
REVIEW: http://review.gluster.org/13199 (cluster/tier: allow db queries to be interruptable) posted (#6) for review on release-3.7 by Dan Lambright (dlambrig)
COMMIT: http://review.gluster.org/13199 committed in release-3.7 by Dan Lambright (dlambrig)
------
commit 92d08cee31044af4b792ed283011bf7287b00883
Author: Dan Lambright <dlambrig>
Date: Mon Dec 28 10:57:53 2015 -0500

    cluster/tier: allow db queries to be interruptable

    A query to the database may take a long time if the database
    has many entries. The tier daemon also sends IPC calls to the
    bricks which can run slowly, especially in RHEL6.

    While it is possible to track down each such instance,
    the snapshot feature should not be affected by database
    operations. It requires no migration be underway. Therefore
    it is okay to pause tiering at any time except when DHT is
    moving a file.

    This fix implements this strategy by monitoring when control
    passes to DHT to migrate a file using the GF_XATTR_FILE_MIGRATE_KEY
    trigger. If it is not, the pause operation is successful.

    > Change-Id: I21f168b1bd424077ad5f38cf82f794060a1fabf6
    > BUG: 1287842
    > Signed-off-by: Dan Lambright <dlambrig>
    > Reviewed-on: http://review.gluster.org/13104
    > Reviewed-by: Joseph Fernandes
    > Tested-by: Gluster Build System <jenkins.com>

    Signed-off-by: Dan Lambright <dlambrig>
    Change-Id: I667e0af24eaa66afefa860c4d73b324e4f39b997
    BUG: 1288352
    Signed-off-by: Dan Lambright <dlambrig>
    Reviewed-on: http://review.gluster.org/13199
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
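
The pause rule described in the commit message above (pausing is allowed at any time except while a file is being moved) can be pictured with a small toy model. This is not the code from the patch, which lives in the C tier translator; the class and method names are hypothetical, and refusing to start a new migration while paused is an assumption made here to keep the model consistent.

```python
import threading


class TierPauseModel:
    """Toy model: a pause request succeeds unless a migration is in flight."""

    def __init__(self):
        self._lock = threading.Lock()
        self._migrating = False  # True only while a file is being moved
        self._paused = False

    def begin_migration(self):
        """Mark the start of a file move; refused while paused (assumption)."""
        with self._lock:
            if self._paused:
                return False
            self._migrating = True
            return True

    def end_migration(self):
        """Mark the end of a file move."""
        with self._lock:
            self._migrating = False

    def request_pause(self):
        """Pause succeeds immediately if no file is being moved right now."""
        with self._lock:
            if self._migrating:
                return False
            self._paused = True
            return True

    def resume(self):
        """Lift the pause so migrations may start again."""
        with self._lock:
            self._paused = False
```

In this model a long-running database query never blocks request_pause(), because only the window between begin_migration() and end_migration() is treated as non-interruptible.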
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.7, please open a new bug report.

glusterfs-3.7.7 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user