Description of problem:

GlusterFS now supports the fencing feature, which makes it possible to support the ALUA feature in LIO/TCMU.

The fencing doc: https://review.gluster.org/#/c/glusterfs-specs/+/21925/6/accepted/fencing.md
The fencing test example: https://review.gluster.org/#/c/glusterfs/+/21496/12/tests/basic/fencing/fence-basic.c
The LIO/tcmu-runner PR supporting ALUA: https://github.com/open-iscsi/tcmu-runner/pull/554

But currently, when testing tcmu-runner based on the above PR by shutting down one of the HA nodes and starting it again after 2~3 minutes, glfs_file_lock() gets stuck on all the HA nodes. The following is from /var/log/tcmu-runner.log:

====
2019-06-06 13:50:15.755 1316 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block3: lock call state 2 retries 0. tag 65535 reopen 0
2019-06-06 13:50:15.757 1316 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block3: lock call done. lock state 1
2019-06-06 13:50:55.845 1316 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block4: lock call state 2 retries 0. tag 65535 reopen 0
2019-06-06 13:50:55.847 1316 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block4: lock call done. lock state 1
2019-06-06 13:57:50.102 1315 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block3: lock call state 2 retries 0. tag 65535 reopen 0
2019-06-06 13:57:50.103 1315 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block3: lock call done. lock state 1
2019-06-06 13:57:50.121 1315 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block4: lock call state 2 retries 0. tag 65535 reopen 0
2019-06-06 13:57:50.132 1315 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block4: lock call done. lock state 1
2019-06-06 14:09:03.654 1328 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block3: lock call state 2 retries 0. tag 65535 reopen 0
2019-06-06 14:09:03.662 1328 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block3: lock call done. lock state 1
2019-06-06 14:09:06.700 1328 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block4: lock call state 2 retries 0. tag 65535 reopen 0
====

The last lock operation never returns.

I am using the following glusterfs packages built by myself:

# rpm -qa | grep glusterfs
glusterfs-extra-xlators-7dev-0.0.el7.x86_64
glusterfs-api-devel-7dev-0.0.el7.x86_64
glusterfs-7dev-0.0.el7.x86_64
glusterfs-server-7dev-0.0.el7.x86_64
glusterfs-cloudsync-plugins-7dev-0.0.el7.x86_64
glusterfs-resource-agents-7dev-0.0.el7.noarch
glusterfs-api-7dev-0.0.el7.x86_64
glusterfs-devel-7dev-0.0.el7.x86_64
glusterfs-regression-tests-7dev-0.0.el7.x86_64
glusterfs-gnfs-7dev-0.0.el7.x86_64
glusterfs-client-xlators-7dev-0.0.el7.x86_64
glusterfs-geo-replication-7dev-0.0.el7.x86_64
glusterfs-debuginfo-7dev-0.0.el7.x86_64
glusterfs-fuse-7dev-0.0.el7.x86_64
glusterfs-events-7dev-0.0.el7.x86_64
glusterfs-libs-7dev-0.0.el7.x86_64
glusterfs-cli-7dev-0.0.el7.x86_64
glusterfs-rdma-7dev-0.0.el7.x86_64

How reproducible:
30%.

Steps to Reproduce:
1. Create one replica volume (HA >= 2) with the mandatory lock enabled.
2. Create one gluster-blockd target.
3. Log in and run fio on the client node.
4. Shut down one of the HA nodes, wait 2~3 minutes, and start it again.

Actual results:
The fio never recovers and the read/write bandwidth stays at 0 KB/s, and we can see tons of logs like the following in the /var/log/tcmu-runner.log file:

2019-06-06 15:01:06.641 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process.
2019-06-06 15:01:06.648 1328 [DEBUG_SCSI_CMD] tcmu_cdb_print_info:353 glfs/block4: 28 0 0 3 1f 80 0 0 8 0
2019-06-06 15:01:06.648 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process.
2019-06-06 15:01:06.655 1328 [DEBUG_SCSI_CMD] tcmu_cdb_print_info:353 glfs/block4: 28 0 0 3 1f 80 0 0 8 0
2019-06-06 15:01:06.655 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process.
2019-06-06 15:01:06.661 1328 [DEBUG_SCSI_CMD] tcmu_cdb_print_info:353 glfs/block4: 28 0 0 3 1f 80 0 0 8 0
2019-06-06 15:01:06.662 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process.

Expected results:
As soon as the shut-down node comes back up, the fio should recover.
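For reference, below is a minimal gfapi client that takes the same mandatory lock the tcmu-runner path takes, modeled on the fence-basic.c test linked above. This is only a sketch: the volume name, server name, and file path are placeholders, error handling is trimmed, and the xattr macro fallback simply mirrors the value in the glusterfs headers.

#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <glusterfs/api/glfs.h>

#ifndef GF_ENFORCE_MANDATORY_LOCK
/* Value as defined in the glusterfs headers. */
#define GF_ENFORCE_MANDATORY_LOCK "trusted.glusterfs.enforce-mandatory-lock"
#endif

int main(void)
{
    /* "repvol", "rhel1" and "/block0" are placeholder names. */
    glfs_t *fs = glfs_new("repvol");
    glfs_set_volfile_server(fs, "tcp", "rhel1", 24007);
    if (glfs_init(fs) < 0)
        return 1;

    glfs_fd_t *fd = glfs_open(fs, "/block0", O_RDWR);
    if (!fd)
        return 1;

    /* Mark the file for mandatory-lock enforcement, as in fence-basic.c. */
    if (glfs_fsetxattr(fd, GF_ENFORCE_MANDATORY_LOCK, "set", 8, 0) < 0)
        return 1;

    struct flock lock = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,   /* 0 == lock the whole file */
    };

    /* This is the call that never returns in the hang reported above. */
    if (glfs_file_lock(fd, F_SETLKW, &lock, GLFS_LK_MANDATORY))
        fprintf(stderr, "glfs_file_lock failed, errno %d\n", errno);

    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}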
Created attachment 1577819 [details] pstack on node rhel3
Created attachment 1577820 [details] pstack on node rhel1
The bt output from gdb:

[root@rhel1 ~]# gdb -p 1325
(gdb) bt
#0  0x00007fc7761baf47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fc7773de468 in event_dispatch_epoll (event_pool=0x559f03d4b560) at event-epoll.c:847
#2  0x0000559f02419658 in main (argc=21, argv=0x7fff9c6722c8) at glusterfsd.c:2871
(gdb)

[root@rhel3 ~]# gdb -p 7669
(gdb) bt
#0  0x00007fac80bd9f47 in pthread_join () from /usr/lib64/libpthread.so.0
#1  0x00007fac81dfd468 in event_dispatch_epoll (event_pool=0x55de6f845560) at event-epoll.c:847
#2  0x000055de6f143658 in main (argc=21, argv=0x7ffcafc3eff8) at glusterfsd.c:2871
(gdb)

The pl_inode->fop_wind_count is:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fc742184700 (LWP 1829))]
#0  0x00007fc7761bd965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) frame 2
#2  0x00007fc76379c13b in pl_lk (frame=frame@entry=0x7fc750001128, this=this@entry=0x7fc75c0128f0, fd=fd@entry=0x7fc73c0977d8, cmd=cmd@entry=6, flock=flock@entry=0x7fc73c076938, xdata=xdata@entry=0x7fc73c071828) at posix.c:2637
2637        ret = pl_lock_preempt(pl_inode, reqlock);
(gdb) p pl_inode->fop_wind_count
$1 = -30
(gdb)

For the pstack logs, please see the attachments.

Thanks.
BRs
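From the backtrace, pl_lk() is blocked inside pl_lock_preempt() in a pthread_cond_wait(), apparently waiting for in-flight fops to drain, while fop_wind_count has gone negative. A rough sketch of such a drain wait is below; the struct, field, and function names are illustrative, not the actual locks-xlator code:

#include <pthread.h>

/* Illustrative stand-in for the relevant part of pl_inode_t. */
struct pl_inode {
    pthread_mutex_t mutex;
    pthread_cond_t  check_fop_wind_count;
    int             fop_wind_count;    /* observed at -30 in gdb above */
};

/* If lock preemption waits for the in-flight fop counter to drain to
 * exactly zero, a counter that has underflowed to a negative value can
 * never satisfy the condition, and the lock request blocks forever,
 * which would explain the stuck glfs_file_lock(). */
static void wait_for_fop_drain(struct pl_inode *pl_inode)
{
    pthread_mutex_lock(&pl_inode->mutex);
    while (pl_inode->fop_wind_count != 0)
        pthread_cond_wait(&pl_inode->check_fop_wind_count,
                          &pl_inode->mutex);
    pthread_mutex_unlock(&pl_inode->mutex);
}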
Just a small update: there are cases where fop_wind_count can go negative. A basic fix would be to never decrement it below zero, i.e., leave its value untouched when it is already zero. I will update more on this later as I am busy with a few other issues at the moment.

Susant
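A minimal sketch of that guard, reusing the illustrative struct from the sketch above and assuming the current unwind path decrements the counter unconditionally:

/* Guarded decrement for the fop-unwind path: never let the in-flight
 * counter underflow below zero, so the drain loop can still terminate.
 * A decrement that would underflow indicates an unpaired unwind and is
 * simply skipped here. */
static void fop_wind_count_dec(struct pl_inode *pl_inode)
{
    pthread_mutex_lock(&pl_inode->mutex);
    if (pl_inode->fop_wind_count > 0)
        pl_inode->fop_wind_count--;
    if (pl_inode->fop_wind_count == 0)
        pthread_cond_broadcast(&pl_inode->check_fop_wind_count);
    pthread_mutex_unlock(&pl_inode->mutex);
}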
Hi Susant,

Is there any new update about this?

Thanks.
(In reply to Xiubo Li from comment #5)
> Hi Susant,
>
> Is there any new update about this?
>
> Thanks.

Hey Xiubo, most likely I will be sending a patch by end of day today.
(In reply to Susant Kumar Palai from comment #6)
> (In reply to Xiubo Li from comment #5)
> > Hi Susant,
> >
> > Is there any new update about this?
> >
> > Thanks.
>
> Hey Xiubo, most likely I will be sending a patch by end of day today.

Sure, please take your time Susant :-)

Thanks very much.
BRs
Moved to POST by mistake. Resetting.
REVIEW: https://review.gluster.org/23088 (locks/fencing: Address hang while lock preemption) posted (#1) for review on master by Susant Palai
Xiubo, if you could test it out, it would be great. (Make sure you enable fencing before you create any client)
(In reply to Susant Kumar Palai from comment #10)
> Xiubo, if you could test it out, it would be great. (Make sure you enable
> fencing before you create any client)

@Susant,

Yeah, thanks very much for your work on this. I will test it later today or tomorrow.

BRs
Xiubo
@Susant,

Two issues are found:

1. From my test, glfs_file_lock() sometimes takes around 43 seconds. Is that normal? And why?

90456  2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: lxb--------------glfs_file_lock start
...
319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: lxb--------------glfs_file_lock end

2. After the lock is broken, all the fio callbacks return -1 with errno set to EPERM, not the EBUSY we discussed before. Has the return value changed? I am only checking for -EBUSY and -ENOTCONN, and only then is the lock state on the local tcmu node changed. Otherwise the local state in the tcmu-runner service will stay LOCKED even though it has actually lost the lock and should be UNLOCKED, so all the IOs will fail.

Thanks,
BRs
(In reply to Xiubo Li from comment #12)
> @Susant,
>
> Two issues are found:
>
> 1. From my test, glfs_file_lock() sometimes takes around 43 seconds. Is
> that normal? And why?
>
> 90456  2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: lxb--------------glfs_file_lock start
> ...
> 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: lxb--------------glfs_file_lock end

I wonder if it is related to the draining of fops. Let me do some testing around this.

> 2. After the lock is broken, all the fio callbacks return -1 with errno set
> to EPERM, not the EBUSY we discussed before. Has the return value changed?
> I am only checking for -EBUSY and -ENOTCONN, and only then is the lock
> state on the local tcmu node changed. Otherwise the local state in the
> tcmu-runner service will stay LOCKED even though it has actually lost the
> lock and should be UNLOCKED, so all the IOs will fail.

This is interesting. Will get back after some code checking.
(In reply to Susant Kumar Palai from comment #13)
> (In reply to Xiubo Li from comment #12)
> [...]
> > 1. From my test, glfs_file_lock() sometimes takes around 43 seconds. Is
> > that normal? And why?
>
> I wonder if it is related to the draining of fops. Let me do some testing
> around this.

Sure.

> > 2. After the lock is broken, all the fio callbacks return -1 with errno
> > set to EPERM, not the EBUSY we discussed before. Has the return value
> > changed?
> [...]
>
> This is interesting. Will get back after some code checking.

Please take your time, @Susant.

Thanks,
BRs
(In reply to Xiubo Li from comment #12)
> @Susant,
>
> Two issues are found:
>
> 1. From my test, glfs_file_lock() sometimes takes around 43 seconds. Is
> that normal? And why?
>
> 90456  2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: lxb--------------glfs_file_lock start
> ...
> 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: lxb--------------glfs_file_lock end

Checked the time taken for the file lock and it completes immediately for me:

    ret = glfs_fsetxattr(fd1, GF_ENFORCE_MANDATORY_LOCK, "set", 8, 0);
    if (ret < 0) {
        LOG_ERR("glfs_fsetxattr", errno);
        ret = -1;
        goto out;
    }

    time(&before);
    /* take a write mandatory lock */
    ret = glfs_file_lock(fd1, F_SETLKW, &lock, GLFS_LK_MANDATORY);
    if (ret) {
        LOG_ERR("glfs_file_lock", errno);
        goto out;
    }
    time(&after);

    diff = (unsigned long)after - before;
    fprintf(fp, "time %lu %lu %lu\n", diff, before, after);

Output:

    time 0 1563867824 1563867824

Can you attach the brick log here when you run the test next time?

> 2. After the lock is broken, all the fio callbacks return -1 with errno set
> to EPERM, not the EBUSY we discussed before. Has the return value changed?
> [...]

Please attach the brick log after enabling trace logging: <gluster v set <vol> brick-log-level TRACE>
(In reply to Susant Kumar Palai from comment #15)
> (In reply to Xiubo Li from comment #12)
> [...]
>
> Checked the time taken for the file lock and it completes immediately for
> me:
> [...]
>
> Can you attach the brick log here when you run the test next time?
>
> Please attach the brick log after enabling trace logging: <gluster v set
> <vol> brick-log-level TRACE>

Sure, I will do that after my current work in hand, possibly late today or tomorrow morning.

Thanks,
BRs
On the permission denied:

I did not see any error related to EPERM but saw EBUSY in the brick logs.

[2019-07-24 08:15:22.236283] E [MSGID: 101191] [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-07-24 08:15:46.083306] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 29: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9cc9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable]
[2019-07-24 08:15:46.088292] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 31: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9cc9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable]
[2019-07-24 08:15:46.119463] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 33: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9cc9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable]
[2019-07-24 08:15:46.124067] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 35: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9cc9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable]
[2019-07-24 08:15:46.294554] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 37: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9cc9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable]
[2019-07-24 08:15:46.298672] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 39: READV 0 (7db899f8-bf56-4b

Is it possible that the lower layer is converting the errnos to EPERM? Can you check the gfapi logs and tcmu logs for corresponding error messages and confirm?
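For reference, the mapping between the errno values discussed in this thread and the strings appearing in the gluster logs (standard glibc strerror() text), verifiable with a few lines of C:

#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    /* The three errnos discussed in this thread and the message
     * strings they produce in the logs. */
    printf("EAGAIN : %s\n", strerror(EAGAIN)); /* Resource temporarily unavailable */
    printf("EBUSY  : %s\n", strerror(EBUSY));  /* Device or resource busy */
    printf("EPERM  : %s\n", strerror(EPERM));  /* Operation not permitted */
    return 0;
}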
(In reply to Susant Kumar Palai from comment #17)
> On the permission denied:
>
> I did not see any error related to EPERM but saw EBUSY in the brick logs.
> [...]
>
> Is it possible that the lower layer is converting the errnos to EPERM? Can
> you check the gfapi logs and tcmu logs for corresponding error messages and
> confirm?

If so, maybe gfapi is doing this. I will send you the gfapi logs; the EPERM value comes from gfapi directly and tcmu-runner does nothing with it.
Checked the gfapi log; it is also full of:

[2019-07-24 08:23:41.042339] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy]
[2019-07-24 08:23:41.042381] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy]
[2019-07-24 08:23:41.042556] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy]
[2019-07-24 08:23:41.042574] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy]
[2019-07-24 08:23:41.042655] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy]
[2019-07-24 08:23:41.042671] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy]
[2019-07-24 08:23:41.042709] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy]
[2019-07-24 08:23:41.042722] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy]
[2019-07-24 08:23:41.042784] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy]

Checked the gfapi source code:

677 out:
678     if (rsp.op_ret == -1) {
679         gf_msg(this->name, GF_LOG_WARNING, gf_error_to_errno(rsp.op_errno),
680                PC_MSG_REMOTE_OP_FAILED, "remote operation failed");
681     } else if (rsp.op_ret >= 0) {
682         if (local->attempt_reopen)
683             client_attempt_reopen(local->fd, this);
684     }
685     CLIENT_STACK_UNWIND(writev, frame, rsp.op_ret,
686                         gf_error_to_errno(rsp.op_errno), &prestat, &poststat,
687                         xdata);
688
689     if (xdata)
690         dict_unref(xdata);

It seems the return value is converted.

Thanks,
BRs
(In reply to Xiubo Li from comment #18)
> (In reply to Susant Kumar Palai from comment #17)
[...]
> Checked the gfapi source code:
>
> 677 out:
> 678     if (rsp.op_ret == -1) {

It seems rsp.op_ret is what is returned here to the callback:

static void glfs_async_cbk(glfs_fd_t *fd, ssize_t ret, void *data)

not rsp.op_errno.

> 679         gf_msg(this->name, GF_LOG_WARNING, gf_error_to_errno(rsp.op_errno),
> 680                PC_MSG_REMOTE_OP_FAILED, "remote operation failed");
> 681     } else if (rsp.op_ret >= 0) {
> 682         if (local->attempt_reopen)
> 683             client_attempt_reopen(local->fd, this);
> 684     }
> 685     CLIENT_STACK_UNWIND(writev, frame, rsp.op_ret,
> 686                         gf_error_to_errno(rsp.op_errno), &prestat, &poststat,
> 687                         xdata);
> 688
> 689     if (xdata)
> 690         dict_unref(xdata);
>
> It seems the return value is converted.
>
> Thanks,
> BRs
Checking errno directly when ret == -1 works for me now. But I can get both -EAGAIN and -EBUSY, while only -EBUSY is expected. So the remaining question is: why is there always an -EAGAIN every time before the lock is acquired?

Thanks,
BRs
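A sketch of what the errno-based check on the tcmu-runner side could look like, given the behavior described above. The function and state names are illustrative, not actual tcmu-runner code, and it assumes gfapi's async callback convention of ret == -1 with errno carrying the failure cause:

#include <errno.h>
#include <sys/types.h>

/* Illustrative lock states, mirroring the LOCKED/UNLOCKED states
 * discussed above. */
enum lock_state { STATE_LOCKED, STATE_UNLOCKED };

/* Called from the glfs write completion callback with the callback's
 * ret value. Per the discussion above, ret is -1 on failure and the
 * real reason must be read from errno. */
static enum lock_state classify_write_failure(ssize_t ret)
{
    if (ret >= 0)
        return STATE_LOCKED;       /* IO succeeded; lock still held */

    switch (errno) {
    case EBUSY:                    /* lock preempted by another node */
    case ENOTCONN:                 /* brick connection lost */
        return STATE_UNLOCKED;     /* must re-acquire before more IO */
    default:                       /* e.g. the unexpected EAGAIN/EPERM */
        return STATE_LOCKED;       /* treated as transient here */
    }
}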
REVIEW: https://review.gluster.org/23088 (locks/fencing: Address hang while lock preemption) merged (#4) on master by Amar Tumballi
@Susant,

Since the fencing patch has made it into release 6, this follow-up fix should be backported, right?

Thanks.
BRs
(In reply to Xiubo Li from comment #22)
> @Susant,
>
> Since the fencing patch has made it into release 6, this follow-up fix
> should be backported, right?
>
> Thanks.
> BRs

Will backport to release 6 and 7.