Created attachment 1476762 [details] gfapi log Description of problem: Version-Release number of selected component (if applicable): How reproducible: There's mulitple application using gfapi concurrently creating file in the same directory (e51fd83622674cc9) and (e21ea6832d2b13d0) are log from different application processes. application log ----------------- timezone is GMT+8 2018-08-18 19:35:03,703 DEBUG -31021968- writing to file cluster=4 FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88 (app:1461)(e51fd83622674cc9) 2018-08-18 19:35:03,734 DEBUG -32369552- writing to file cluster=4 FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88 (app:1461)(e21ea6832d2b13d0) 2018-08-18 19:35:03,786 DEBUG -31022448- Create new directory [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m] on cluster [4] ((unknown file): 0)(e51fd83622674cc9) 2018-08-18 19:35:03,795 CRITICAL -31021968- Failed to open cluster [4] object [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/ 0004_bfab2d1ea2da11e8a3196c92bf5c1b88] with mode [w]: [[Errno 5] Input/output error] (app:1461)(e51fd83622674cc9) 2018-08-18 19:35:03,903 DEBUG -32366672- Directory [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m] exists on cluster [4] ((unknown file): 0)(e21ea6832d2b13d0) 2018-08-18 19:35:03,945 DEBUG -32369552- Open cluster [4] file [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88] with mode [w] (app:1461)(e21ea6832d2b13d0) 2018-08-18 19:35:04,127 DEBUG -31021968- Open cluster [4] file [FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88] with mode [w] (app:1461)(e51fd83622674cc9) 2018-08-18 19:35:04,391 INFO -32369552- Rename file: cluster=4 src=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001_bfafdf58a2da11e8a3196c92bf5c1b88 dst=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0001 (app:1461)(e21ea6832d2b13d0) 2018-08-18 19:35:04,485 INFO -31021968- Rename file: cluster=4 src=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004_bfab2d1ea2da11e8a3196c92bf5c1b88 dst=FS/rt/mbXx/service-log_0/0760dee6406533f5aefa43f83bdd8918_171654947375444628.m/0004 (app:1461)(e51fd83622674cc9) Actual results: IO error happended when creating file, success after retry dht-selfheal failure is observed in gfapi log, there is unmatched inode unlock request reported from brick. Expected results: Additional info: "gluster volume status" output is all ok, but runing "gluster volume heal vol0 info" blocks and no output gluster volume info -------------------- Volume Name: vol0 Type: Distributed-Replicate Volume ID: 18e1c05d-570a-4c97-aa91-ef984881c4f2 Status: Started Snapshot Count: 0 Number of Bricks: 36 x 3 = 108 Transport-type: tcp Options Reconfigured: locks.trace: false client.event-threads: 6 cluster.self-heal-daemon: enable performance.write-behind: True transport.keepalive: True cluster.rebal-throttle: lazy server.event-threads: 4 performance.io-cache: False nfs.disable: True cluster.quorum-type: auto network.ping-timeout: 120 features.cache-invalidation: False performance.read-ahead: False performance.client-io-threads: True cluster.server-quorum-type: none performance.md-cache-timeout: 0 performance.readdir-ahead: True
production cluster recently upgraded from 3.7.6 to gluster-3.10.12
Created attachment 1476763 [details] error log in brick no lock found when doing inode unlock, it seems unblock request was from dht self-heal
Created attachment 1476765 [details] glfsheal log there was many "has not responded in the last 120 seconds, disconnecting." in it
Input/output seen on 0.01% operations, reproduce in every hour
*** Bug 1618925 has been marked as a duplicate of this bug. ***
Host eviroment Centos 7.2 3.10.0-327.18.2.el7.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
I have another offline storage cluster to reproduce the "gluster heal info" hanging problem. "killall glusterd glusterfsd" then restart glusterd did not fix it. I have triggered state dump, attach them later.
Created attachment 1477213 [details] state dump of offline cluster Another offline cluster which upgraded from 3.7.6 to 3.10.12. "gluster volum heal info hang" after stress test. These are state dump file.
gdb on "glfsheal vol0" command, stuck at heal/src/glfs-heal.c:1401
upgraded to 3.12.11, "glfsheal vol0" stuck problem still persist
I have revert the following two commit in release-3.12 branch to get "gluster volume heal vol0 info" working properly. It may lead to blocking client.event-thread. The situation was described in https://lists.gluster.org/pipermail/gluster-users/2018-September/034871.html https://lists.gluster.org/pipermail/gluster-users/2018-September/034876.html commit 086f1d088e257a3abce68a1807692c36172a8c0f Author: Xavier Hernandez <jahernan> Date: Thu Feb 1 16:06:32 2018 +0100 cluster/afr: remove unnecessary child_up initialization The child_up array was initialized with all elements being -1 to allow afr_notify() to differentiate down bricks from bricks that haven't reported yet. With current implementation this is not needed anymore and it was causing unexpected results when other parts of the code considered that if child_up[i] != 0, it meant that it was up. Backport of: > BUG: 1541038 Change-Id: I2a9d712ee64c512f24bd5cd3a48dcb37e3139472 BUG: 1541930 Signed-off-by: Xavier Hernandez <jahernan> commit 94faf8c4a1f5a020593ca65f44d68e1ba2f632eb Author: karthik-us <ksubrahm> Date: Wed Sep 20 18:30:21 2017 +0530 cluster/afr: Sending subvol up/down events when subvol comes up or goes down > BUG: 1493539 (cherry picked from commit 3bbb4fe4b33dc3a3ed068ed2284077f2a4d8265a) Change-Id: I6580351b245d5f868e9ddc6a4eb4dd6afa3bb6ec BUG: 1492066 Signed-off-by: karthik-us <ksubrahm>
hi, I went through the logs, as you pointed out, issue seems to be the ping-timeout. It is not clear as to why it is happening though. Ping timeout generally happens when the brick doesn't respond in ping-timeout seconds, which it looks to be configured as 120 seconds. Do you have this problem only with heal info command or the mount's I/O is also hanging (as per your bz description it doesn't look like it, but want a confirmation)? If it only happens with heal info, will it be possible for you to help find the RC? we need to check if there is any dead-lock in the heal-info process. Pranith
It only happend in "heal info", and always reproduciable. "gluster volume status" shows all bricks were online. Mount by fuse were instant. Write throught gfapi are OK most of the time. Except there were very rare case that gfapi write operation return after blocking several hours. It turned out that "heal info" problem was not related to IO error problem. Our heals are all normal for now (reverted the two commit I pointed out earlier). IO error problem still reproduciable for the moment.
>> will it be possible for you to help find the RC? yes, if there are future patches I'm happy to test them out
Created attachment 1485128 [details] glfs heal trace level log with version 3.12.11 using v3.12.11, change log to trace level, with some attitional debug log added.
Created attachment 1485129 [details] glfs heal trace level log with version 3.12.1 with version 3.12.1, sucessful run
(In reply to frostyplanet from comment #13) > Except there were very rare case that gfapi write operation return after > blocking several hours. Ignore this, I checked the log it was a bug not related to gluster
Previously in my debuging with v3.12.11, after setting "network.ping-timeout" to 0, the client side behaviour is looping in reconnecting to all the bricks, success after a long time. If set client.event-thread to 24 (equal to number of bricks), heal info success in 30 seconds. So it leads to my suspect. Afterwards I used git bisect to locate commit 94faf8c4a1 and 086f1d08 were the cause. commit 94faf8c4a1 added two events to fire, but EVENT_AFR_SUBVOL_UP/EVENT_AFR_SUBVOLS_DOWN are not processed in any where of the code. Commit 086f1d08 removed some variable initialization code, seams a regression.
Created attachment 1487508 [details] gfapi log for io error in 3.12.14 New sample of gfapi log in version 3.12.14. IO error happended while creating file
Release 3.12 has been EOLd and this bug was still found to be in the NEW state, hence moving the version to mainline, to triage the same and take appropriate actions.
Nitya, As per the log in comment-19, dht_create() is failing as it couldn't find any subvolume for doing create I think. When can this happen?
If multiple clients are creating the same directory, it is possible that one client gets an incomplete layout as the mkdir is not complete. It should, however, heal it itself. If for some reason it could not heal the directory we could run into this issue. Can you set client-log-level to DEBUG and send us the logs once you hit this? Or better yet, is there a test program I can use to try this out?
(In reply to Nithya Balachandran from comment #22) > If multiple clients are creating the same directory, it is possible that one > client gets an incomplete layout as the mkdir is not complete. It should, > however, heal it itself. If for some reason it could not heal the directory > we could run into this issue. > > Can you set client-log-level to DEBUG and send us the logs once you hit > this? Or better yet, is there a test program I can use to try this out? Just noticed that trace logs are already provided. I will take a look and get back.
Changing the component to DHT as a part of triaging. Please assign back to replicate if you believe there is something that needs investigation in AFR.
(In reply to Nithya Balachandran from comment #23) > (In reply to Nithya Balachandran from comment #22) > > If multiple clients are creating the same directory, it is possible that one > > client gets an incomplete layout as the mkdir is not complete. It should, > > however, heal it itself. If for some reason it could not heal the directory > > we could run into this issue. > > > > Can you set client-log-level to DEBUG and send us the logs once you hit > > this? Or better yet, is there a test program I can use to try this out? > > Just noticed that trace logs are already provided. I will take a look and > get back. The logs indicate that the client version is 3.10. [2018-08-18 11:35:03.685415] D [write-behind.c:1696:wb_process_queue] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fc082b0ffab] (--> /usr/lib64/glusterfs/3.10.12/xlator/performance/write-behind.so(+0x8adb)[0x7fc073b62adb] (--> /usr/lib64/glusterfs/3.10.12/xlator/performance/write-behind.so(+0xb880)[0x7fc073b65880] (--> /lib64/libglusterfs.so.0(default_lookup+0xbd)[0x7fc082b893bd] (--> /usr/lib64/glusterfs/3.10.12/xlator/performance/quick-read.so(+0x2596)[0x7fc07374c596] ))))) 0-vol0-write-behind: processing queues This could be because of https://bugzilla.redhat.com/show_bug.cgi?id=1455104 which was fixed in release 3.12. Do you still see the I/O errors with 3.12 or later?
(In reply to frostyplanet from comment #19) > Created attachment 1487508 [details] > gfapi log for io error in 3.12.14 > > New sample of gfapi log in version 3.12.14. IO error happended while > creating file I do not see IO errors in the gfapi log. Please provide debug/trace logs when the issue is seen in 3.12
This bug is moved to https://github.com/gluster/glusterfs/issues/957, and will be tracked there from now on. Visit GitHub issues URL for further details
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days