Description of problem:
Running snapshot creation in a loop on a tiered volume while large file creation is in progress causes some of the snapshot creations to fail with a "pre-validation failed" message.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-5

How reproducible:
Always

Steps to Reproduce:
1. Create a tiered volume with a 2x(4+2) EC cold tier and a 2x2 dist-rep hot tier.
2. FUSE mount the volume on a client.
3. Start creating 20000 files on the mount point, each of 100KB.
4. Simultaneously start creating 100 snapshots in a loop, with a sleep of 5 seconds between every 2 snapshot creations.
5. Observe that a few of the snapshot creations fail while the IO is in progress (in my case, 11 out of 100 snapshots failed with a pre-validation failed message).
6. Below are the failure messages from the logs:

[2015-11-05 21:56:02.975294] E [MSGID: 106062] [glusterd-snapshot.c:1504:glusterd_snap_create_clone_pre_val_use_rsp_dict] 0-management: failed to get the volume count
[2015-11-05 21:56:02.975451] E [MSGID: 106062] [glusterd-snapshot.c:1813:glusterd_snap_pre_validate_use_rsp_dict] 0-management: Unable to use rsp dict
[2015-11-05 21:56:02.975459] E [MSGID: 106122] [glusterd-mgmt.c:600:glusterd_pre_validate_aggr_rsp_dict] 0-management: Failed to aggregate prevalidate response dictionaries.
[2015-11-05 21:56:02.975467] E [MSGID: 106108] [glusterd-mgmt.c:701:gd_mgmt_v3_pre_validate_cbk_fn] 0-management: Failed to aggregate response from node/brick
[2015-11-05 21:56:02.975497] E [MSGID: 106116] [glusterd-mgmt.c:134:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on 10.70.35.140. Please check log file for details.
[2015-11-05 21:56:05.833521] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/11f5a41d4df7a19d42d4e641eb784bfa.socket failed (Invalid argument)
The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 28 times between [2015-11-05 21:54:39.740819] and [2015-11-05 21:56:05.833576]
[2015-11-05 21:56:08.702620] E [MSGID: 106122] [glusterd-mgmt.c:883:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
[2015-11-05 21:56:08.702694] E [MSGID: 106122] [glusterd-mgmt.c:2164:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed
[2015-11-05 22:04:59.652100] E [MSGID: 106572] [glusterd-snapshot.c:1998:glusterd_snapshot_pause_tier] 0-management: Failed to pause tier. Errstr=(null)
[2015-11-05 22:04:59.652159] E [MSGID: 106572] [glusterd-snapshot.c:2592:glusterd_snapshot_create_prevalidate] 0-management: Failed to pause tier in snap prevalidate.
[2015-11-05 22:04:59.652201] W [MSGID: 106030] [glusterd-snapshot.c:8380:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed
[2015-11-05 22:04:59.652215] W [MSGID: 106122] [glusterd-mgmt.c:166:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2015-11-05 22:04:59.652228] E [MSGID: 106122] [glusterd-mgmt.c:820:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node
[2015-11-05 22:04:59.652247] E [MSGID: 106122] [glusterd-mgmt.c:2164:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed

Actual results:
Snapshot creation fails while IO is running on the tiered volume.

Expected results:
Snapshot creation should be successful without any issues.

Additional info:
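For reference, step 4's snapshot loop can be driven by a small script along these lines. This is an illustrative sketch only: the `create_snaps` helper, the `snap$i` name prefix, and the `GLUSTER` override are my assumptions, not taken from this report.

```shell
# Illustrative sketch of the step-4 snapshot loop; create_snaps, the
# "snap$i" name prefix, and the GLUSTER override are assumptions.
GLUSTER=${GLUSTER:-gluster}   # set GLUSTER=echo for a dry run

create_snaps() {
    # $1 = volume name, $2 = number of snapshots, $3 = sleep seconds
    vol=$1; count=$2; interval=$3
    i=1
    while [ "$i" -le "$count" ]; do
        $GLUSTER snapshot create "snap$i" "$vol"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Something like `create_snaps <volname> 100 5` approximates the loop above; the report sleeps 5 seconds per 2 creations, so the exact pacing of the original script may differ.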
Can you provide all the logs, including the tier logs and the complete glusterd log? Ideally you could also set the client log level to DEBUG.
Created attachment 1094861 [details] logs from one node
This issue is still reproducible with the latest build: a few of the snapshots fail when a promotion or demotion is happening on the volume.

Steps to reproduce:
1) Write around 1000 files of 100KB on the tiered volume and make sure these files land in the hot tier initially.
2) As soon as demotion of these files starts, start creating snapshots in a loop with a time interval of 10 seconds between each snap create.
3) Observe that the first snapshot fails with a pre-validation failed message, "failed to pause tier in snap prevalidate":

[2015-12-01 09:14:12.968667] E [MSGID: 106122] [glusterd-mgmt-handler.c:337:glusterd_handle_pre_validate_fn] 0-management: Pre Validation failed on operation Snapshot
[2015-12-01 09:14:24.649486] W [MSGID: 101095] [xlator.c:143:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so: undefined symbol: dht_methods
[2015-12-01 09:16:17.069099] E [MSGID: 106572] [glusterd-snapshot.c:1998:glusterd_snapshot_pause_tier] 0-management: Failed to pause tier. Errstr=(null)
[2015-12-01 09:16:17.069171] E [MSGID: 106572] [glusterd-snapshot.c:2592:glusterd_snapshot_create_prevalidate] 0-management: Failed to pause tier in snap prevalidate.
[2015-12-01 09:16:17.069195] W [MSGID: 106030] [glusterd-snapshot.c:8390:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed
[2015-12-01 09:16:17.069211] W [MSGID: 106122] [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed

Also, sometimes snapshots in between fail with the same error. Attached sos report from one node.
Created attachment 1100826 [details] sos report from one node
I found that tier.so was not being loaded properly by glusterd and wrote fix 12840. This appears to be the root cause of the problem, but I have not confirmed it.
http://review.gluster.org/12840
This issue is still reproducible with the latest build, glusterfs-3.7.5-10.

[root@dhcp35-141 b3]# gluster --version
glusterfs 3.7.5 built on Dec 8 2015 01:14:33
[root@dhcp35-141 b3]# rpm -qa|grep glusterfs
glusterfs-libs-3.7.5-10.el7rhgs.x86_64
glusterfs-api-3.7.5-10.el7rhgs.x86_64
glusterfs-rdma-3.7.5-10.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-10.el7rhgs.x86_64
glusterfs-3.7.5-10.el7rhgs.x86_64
glusterfs-fuse-3.7.5-10.el7rhgs.x86_64
glusterfs-cli-3.7.5-10.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-10.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-10.el7rhgs.x86_64
glusterfs-server-3.7.5-10.el7rhgs.x86_64

Steps followed are as below:
1) Created a tiered volume, started it, enabled quota and attached a tier to it.
2) Set cluster.watermark-low to 1 and cluster.watermark-hi to 2 for the tiered volume.

Volume Name: tiervolume
Type: Tier
Volume ID: 70b879ba-067d-498b-a29e-4003c650b68b
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.35.140:/bricks/brick3/b3
Brick2: 10.70.35.142:/bricks/brick3/b3
Brick3: 10.70.35.141:/bricks/brick3/b3
Brick4: 10.70.35.228:/bricks/brick3/b3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.35.228:/bricks/brick0/b0
Brick6: 10.70.35.141:/bricks/brick0/b0
Brick7: 10.70.35.142:/bricks/brick0/b0
Brick8: 10.70.35.140:/bricks/brick0/b0
Brick9: 10.70.35.228:/bricks/brick1/b1
Brick10: 10.70.35.141:/bricks/brick1/b1
Brick11: 10.70.35.142:/bricks/brick1/b1
Brick12: 10.70.35.140:/bricks/brick1/b1
Brick13: 10.70.35.228:/bricks/brick2/b2
Brick14: 10.70.35.141:/bricks/brick2/b2
Brick15: 10.70.35.142:/bricks/brick2/b2
Brick16: 10.70.35.140:/bricks/brick2/b2
Options Reconfigured:
cluster.watermark-hi: 2
cluster.watermark-low: 1
features.barrier: disable
cluster.tier-mode: cache
features.ctr-enabled: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
auto-delete: enable
snap-activate-on-create: disable
3) Created around 20000 (100KB) files from the mount point.
4) As soon as the demotion of files started, ran a script creating 50 snapshots with a time interval of 10 seconds between each creation.
5) Observed that every second snap creation failed with the "failed to pause tier in snap prevalidate" message:

[2015-12-08 10:22:26.442693] E [MSGID: 106572] [glusterd-snapshot.c:1998:glusterd_snapshot_pause_tier] 0-management: Failed to pause tier. Errstr=(null)
[2015-12-08 10:22:26.443009] E [MSGID: 106572] [glusterd-snapshot.c:2586:glusterd_snapshot_create_prevalidate] 0-management: Failed to pause tier in snap prevalidate.
[2015-12-08 10:22:26.443037] W [MSGID: 106030] [glusterd-snapshot.c:8386:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed
[2015-12-08 10:22:26.443330] W [MSGID: 106122] [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2015-12-08 10:22:26.443365] E [MSGID: 106122] [glusterd-mgmt-handler.c:337:glusterd_handle_pre_validate_fn] 0-management: Pre Validation failed on operation Snapshot
[2015-12-08 10:23:04.746618] E [MSGID: 106122] [glusterd-mgmt-handler.c:337:glusterd_handle_pre_validate_fn] 0-management: Pre Validation failed on operation Snapshot
[2015-12-08 10:23:15.549670] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
The message "W [MSGID: 106122] [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed" repeated 2 times between [2015-12-08 10:22:26.443330] and [2015-12-08 10:23:55.522315]
[2015-12-08 10:23:55.522320] E [MSGID: 106122] [glusterd-mgmt-handler.c:337:glusterd_handle_pre_validate_fn] 0-management: Pre Validation failed on operation Snapshot
[2015-12-08 10:23:15.552542] I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
The message "E [MSGID: 106572] [glusterd-snapshot.c:1998:glusterd_snapshot_pause_tier] 0-management: Failed to pause tier. Errstr=(null)" repeated 2 times between [2015-12-08 10:22:26.442693] and [2015-12-08 10:23:55.522289]
The message "E [MSGID: 106572] [glusterd-snapshot.c:2586:glusterd_snapshot_create_prevalidate] 0-management: Failed to pause tier in snap prevalidate." repeated 2 times between [2015-12-08 10:22:26.443009] and [2015-12-08 10:23:55.522304]
The message "W [MSGID: 106030] [glusterd-snapshot.c:8386:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed" repeated 2 times between [2015-12-08 10:22:26.443037] and [2015-12-08 10:23:55.522312]

6) Out of 50 snapshots, 25 failed with the above error messages in the logs.

Note: sosreports are at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1278798/New
Tested with the time interval increased to 60 seconds between two snap creations, with continuous promotions and demotions in progress. Around 10 out of 30 snapshot creations failed with the same "Failed to pause tier" error message.
With the latest build, glusterfs-3.7.5-11, snapshot creations scheduled via snap_scheduler every 30 minutes fail with the below error messages:

[2015-12-15 11:00:03.488940] E [MSGID: 106062] [glusterd-snapshot.c:1504:glusterd_snap_create_clone_pre_val_use_rsp_dict] 0-management: failed to get the volume count
[2015-12-15 11:00:03.488966] E [MSGID: 106062] [glusterd-snapshot.c:1813:glusterd_snap_pre_validate_use_rsp_dict] 0-management: Unable to use rsp dict
[2015-12-15 11:00:03.489100] E [MSGID: 106122] [glusterd-mgmt.c:658:glusterd_pre_validate_aggr_rsp_dict] 0-management: Failed to aggregate prevalidate response dictionaries.
[2015-12-15 11:00:03.489109] E [MSGID: 106108] [glusterd-mgmt.c:760:gd_mgmt_v3_pre_validate_cbk_fn] 0-management: Failed to aggregate response from node/brick
[2015-12-15 11:00:03.489158] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on 10.70.35.140. Please check log file for details.
[2015-12-15 11:00:03.489207] E [MSGID: 106122] [glusterd-mgmt.c:942:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
[2015-12-15 11:00:03.489226] E [MSGID: 106122] [glusterd-mgmt.c:2224:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed
[2015-12-15 11:00:03.487870] E [MSGID: 106572] [glusterd-snapshot.c:1998:glusterd_snapshot_pause_tier] 0-management: Failed to pause tier. Errstr=(null)
[2015-12-15 11:00:03.488025] E [MSGID: 106572] [glusterd-snapshot.c:2586:glusterd_snapshot_create_prevalidate] 0-management: Failed to pause tier in snap prevalidate.
[2015-12-15 11:00:03.488066] W [MSGID: 106030] [glusterd-snapshot.c:8386:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed
[2015-12-15 11:00:03.488079] W [MSGID: 106122] [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2015-12-15 11:00:03.488104] E [MSGID: 106122] [glusterd-mgmt-handler.c:337:glusterd_handle_pre_validate_fn] 0-management: Pre Validation failed on operation Snapshot
Verified this bug again with the 3.7.5-13 build, and the issue is still reproducible, with very inconsistent behavior.

[root@dhcp35-228 ~]# rpm -qa|grep glusterfs
glusterfs-libs-3.7.5-13.el7rhgs.x86_64
glusterfs-api-3.7.5-13.el7rhgs.x86_64
glusterfs-rdma-3.7.5-13.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-12.el7rhgs.x86_64
glusterfs-3.7.5-13.el7rhgs.x86_64
glusterfs-fuse-3.7.5-13.el7rhgs.x86_64
glusterfs-cli-3.7.5-13.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-13.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-13.el7rhgs.x86_64
glusterfs-server-3.7.5-13.el7rhgs.x86_64

1) With a time interval of 30s between each snap creation, running 50 snapshot creations in a loop.
Result: 3 out of 50 snapshots fail with a pre-validation failed message.
2) With a time interval of 60s between each snap creation, running 50 snapshot creations in a loop.
Result: 13 out of 50 snapshots fail with a pre-validation failed message.
3) With a time interval of 300s between each snap creation, running 10 snapshot creations in a loop.
Result: 3 out of 10 snapshots fail with a pre-validation failed message.

All the snap creations were performed while promotion and demotion of files were going on. The failures are a mix of the below two error messages:

1)
[2015-12-23 09:28:30.682950] E [MSGID: 106062] [glusterd-snapshot.c:1504:glusterd_snap_create_clone_pre_val_use_rsp_dict] 0-management: failed to get the volume count
[2015-12-23 09:28:30.683168] E [MSGID: 106062] [glusterd-snapshot.c:1813:glusterd_snap_pre_validate_use_rsp_dict] 0-management: Unable to use rsp dict
[2015-12-23 09:28:30.683185] E [MSGID: 106122] [glusterd-mgmt.c:658:glusterd_pre_validate_aggr_rsp_dict] 0-management: Failed to aggregate prevalidate response dictionaries.
[2015-12-23 09:28:30.683201] E [MSGID: 106108] [glusterd-mgmt.c:760:gd_mgmt_v3_pre_validate_cbk_fn] 0-management: Failed to aggregate response from node/brick
[2015-12-23 09:28:30.683255] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on 10.70.35.140. Please check log file for details.
[2015-12-23 09:28:32.680682] E [MSGID: 106122] [glusterd-mgmt.c:942:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
[2015-12-23 09:28:32.680733] E [MSGID: 106122] [glusterd-mgmt.c:2224:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed

2)
[2015-12-23 09:28:29.253855] E [MSGID: 106572] [glusterd-snapshot.c:1998:glusterd_snapshot_pause_tier] 0-management: Failed to pause tier. Errstr=(null)
[2015-12-23 09:28:30.681882] E [MSGID: 106572] [glusterd-snapshot.c:2586:glusterd_snapshot_create_prevalidate] 0-management: Failed to pause tier in snap prevalidate.
[2015-12-23 09:28:30.681918] W [MSGID: 106030] [glusterd-snapshot.c:8386:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed
[2015-12-23 09:28:30.681937] W [MSGID: 106122] [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2015-12-23 09:28:30.681972] E [MSGID: 106122] [glusterd-mgmt-handler.c:337:glusterd_handle_pre_validate_fn] 0-management: Pre Validation failed on operation Snapshot

The failure is very inconsistent, and we can't predict when it will occur. With the above observations, assigning the bug back. Let me know in case any more information is required from my side.
Can you recreate the sos reports with DEBUG logs?
Can you get the debug logs and sos reports for the same?
sosreports from two nodes are at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1278798/Latest/. I am facing some issues while generating the sosreports for the other two nodes, as the generation hangs. I will resolve that and update them as well; meanwhile, you can look into the provided data.
*** Bug 1286761 has been marked as a duplicate of this bug. ***
When the database gets populated with many entries, queries take longer, and it is possible the timeout will be exceeded at that point. A callback function is called for every entry found in the query; we should check in that function whether pause tier has been invoked. Another, rarer issue exists in which a file being migrated at the same time as it is being updated takes a long time; this also appears to lead to timeouts. That problem appears to be embedded in legacy DHT logic and is present on brick add/removal. As it is longstanding, it will not be covered in this bug.
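The shape of the proposed per-entry check can be illustrated with a standalone shell analogue. This is not the glusterd code; `process_entries` and the pause-flag file are hypothetical stand-ins for the query callback and the pause-tier state. The point is that the consumer stops draining the query result as soon as a pause has been requested, instead of processing every remaining entry and overrunning the pause timeout.

```shell
# Hypothetical analogue of the proposed fix: a per-entry loop that
# checks a pause flag on every iteration and bails out early.
process_entries() {
    # $1 = pause flag file; entries are read from stdin, one per line
    flag=$1
    processed=0
    while read -r entry; do
        if [ -e "$flag" ]; then
            break    # pause requested: stop instead of draining the query
        fi
        processed=$((processed + 1))   # "migrate" the entry here
    done
    echo "$processed"
}
```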
Performed the test with the 3.7.5-14 build packages which were provided, and out of 30, 12 snapshots failed with "Failed to pause tier". This test was performed on nodes created over VMs, and I am yet to try it on physical machines. The sosreport from the test is placed under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1278798/3.7.5-14/ Will update once I have the results from the physical machine runs.
Was running a script to create 100000 (100KB) files. Continuous IO was going on during snap creation, and demotions/promotions started at around ~40000 (100KB) files. Kept watermark-low at 5 and watermark-hi at 20.
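The file-creation workload described above can be sketched as follows. This is a hypothetical helper, not the actual test script; the function name and mount path are assumptions.

```shell
# Hypothetical sketch of the file-creation workload: write $2 files of
# 100KB each into directory $1 (the FUSE mount point in the real test).
create_files() {
    dir=$1; count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/urandom of="$dir/file$i" bs=1K count=100 2>/dev/null
        i=$((i + 1))
    done
}
```

Something like `create_files /mnt/tiervolume 100000` would reproduce the load, where the mount path is an assumption.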
Are you doing write updates to existing files, or 100% new file creates? I suspect you hit issue #2 in comment 20; I will see if I can confirm that in the sos logs.
Can you rerun this with DEBUG logs and post them?

gluster v set <vol> diagnostics.client-log-level DEBUG

Here is how I recreated the original problem last week:
1. Two physical nodes with a tiered volume.
2. Create 40000 files.
3. In parallel, write-update the same 100 files in a loop.
4. Create a snapshot every 30 seconds, delete it, repeat.

I ran in tiering "test" mode to force promotion/demotion to happen frequently. I'll try 100K files. Are there any other things you are doing that I am not doing? In the sos logs I do not see the same patterns as I did earlier; I do not see any errors in the server-side logs this time.
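The parallel write-update load (step 3) can be sketched like this. The `update_files` helper and the `file$n` names are assumptions, not the actual test script.

```shell
# Hypothetical sketch of the write-update load: append to the same 100
# files repeatedly so they stay hot and keep tier migration busy.
update_files() {
    # $1 = directory, $2 = number of rounds
    dir=$1; rounds=$2
    r=1
    while [ "$r" -le "$rounds" ]; do
        n=1
        while [ "$n" -le 100 ]; do
            echo "round $r" >> "$dir/file$n"
            n=$((n + 1))
        done
        r=$((r + 1))
    done
}
```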
Once the demotions start, I write to a specific set of files as well. The steps you mentioned above look pretty much the same, apart from the fact that I am running the tests in cache mode with watermark-low as 5 and watermark-hi as 20. Will share the results with debug logs as well.
Results from the physical machine runs: 11 out of 50 snapshot creations failed. The logs from the physical machine run in test mode can be found under: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1278798/Final/
Wrote fix 13104 upstream.

1. Borrowed performance machines and created thin volumes over physical drives, rather than loopback devices.
2. Created 200K files.
3. Continuously updated 100 files in a loop, slept, repeated every minute.
4. Ran overnight, creating snapshots every 45 seconds.

With fix 13104 I saw no errors in step #4. I'll bump step #2 to 500K files. I suspect the loopback devices I had earlier created in step #1 masked problems in my earlier tests, as they are slower. Please let me know if this differs from your tests. I'll start getting the fix reviewed. I may ask you to run the fix for sanity and will work through that with Vivek.
Verified this bug with the latest build, glusterfs-3.7.5-15, and the results look good: none of the 70 snap creations failed during continuous demotions/promotions, with the same set of tests which caused failures earlier.

Build details below:
[root@dhcp35-141 b3]# gluster --version
glusterfs 3.7.5 built on Jan 7 2016 23:32:29
[root@dhcp35-141 b3]# rpm -qa|grep glusterfs
glusterfs-client-xlators-3.7.5-15.el7rhgs.x86_64
glusterfs-cli-3.7.5-15.el7rhgs.x86_64
glusterfs-ganesha-3.7.5-15.el7rhgs.x86_64
glusterfs-3.7.5-15.el7rhgs.x86_64
glusterfs-api-3.7.5-15.el7rhgs.x86_64
glusterfs-fuse-3.7.5-15.el7rhgs.x86_64
glusterfs-server-3.7.5-15.el7rhgs.x86_64
glusterfs-api-devel-3.7.5-15.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-15.el7rhgs.x86_64
glusterfs-libs-3.7.5-15.el7rhgs.x86_64
glusterfs-devel-3.7.5-15.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-15.el7rhgs.x86_64

However, keeping this bug open and letting the state remain ON_QA till we finish regression. It will be marked as verified once we complete the regression cycle.
Moving this bug to the Verified state. It will be reopened if we hit this issue again during the regression cycle.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html