Bug 1790208 - When the network of the second server is disconnected, applications on the first hang for duration of ping.timeout
Summary: When the network of the second server is disconnected, applications on the first hang for duration of ping.timeout
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 6
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Sanju
QA Contact:
Depends On:
Reported: 2020-01-12 15:35 UTC by vebmasterHtml
Modified: 2020-01-13 11:49 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-01-13 11:31:28 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:


Description vebmasterHtml 2020-01-12 15:35:05 UTC
Ubuntu 18.04

Description of problem:
If you disconnect the second server from the network, files on the first server become inaccessible: the program hangs for the duration of "network.ping-timeout" or longer.

Version-Release number of selected component (if applicable):
The problem is present on two versions I use:
glusterfs 6.7
glusterfs 7.1

How reproducible:
File 1.php for server 1:

	<?php
	// Loop forever, writing then reading a file on the Gluster mount.
	// (The loop is implied by the repeated timestamped output below.)
	while (true) {
		file_put_contents("/mnt/gluster/text.txt", "text message");
		$txt = file_get_contents("/mnt/gluster/text.txt");
		echo date("H:i:s") . " : $txt \n";
	}

Steps to Reproduce:
server 1: run "php 1.php"
server 2: run "service networking stop"
server 1: look at the output of "php 1.php"
my video: https://yadi.sk/i/v6ghR2ETk8wF_A

Actual results:
17:54:24 : text message
17:54:24 : text message
17:54:25 : text message
17:54:25 : text message
17:54:26 : text message
17:54:26 : text message
17:54:27 : text message
17:54:27 : text message
17:54:28 : text message
17:54:28 : text message
17:54:29 : text message
17:54:48 : text message
17:54:48 : text message
17:54:49 : text message
17:54:49 : text message
17:54:50 : text message
17:54:50 : text message

Between 17:54:29 and 17:54:48 there is a pause of 19 seconds.

Additional info:
I think the application should not hang like this.

Comment 1 Sanju 2020-01-13 06:02:39 UTC

Can you please elaborate? Please share details such as how many nodes are in the cluster, the configuration of the volume facing the issue, and the exact operations performed that led to this.


Comment 2 vebmasterHtml 2020-01-13 08:47:04 UTC
two servers: "s1" and "s2"

add-apt-repository ppa:gluster/glusterfs-6
add-apt-repository ppa:gluster/glusterfs-7
apt install glusterfs-server

On s1 and s2:
mkdir /mnt/dir1

On s1:
gluster peer probe s2
gluster volume create vol02 replica 2 transport tcp s1:/mnt/dir1 s2:/mnt/dir1 force
gluster volume set vol02 network.ping-timeout 10

On s1 and s2 check:
gluster peer status

Mount on s1 and s2:
mkdir /mnt/gluster
mount.glusterfs localhost:/vol02 /mnt/gluster
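To confirm which timeouts are actually in effect on the volume, each option can be queried with "gluster volume get". A sketch; "vol02" is the volume created in the steps above, and the keepalive options are listed only as examples of related server-side tunables:

```shell
# Query the timeout options in effect on the volume (run on either server).
gluster volume get vol02 network.ping-timeout
gluster volume get vol02 server.tcp-user-timeout
gluster volume get vol02 server.keepalive-time
gluster volume get vol02 server.keepalive-interval
gluster volume get vol02 server.keepalive-count
```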

Comment 3 Sanju 2020-01-13 11:31:28 UTC

This is the expected behaviour.

A comment below from Raghavendra G explains it:
Maximum latency of a single fop from application/kernel during a single ungraceful shutdown (hard reboot/ethernet cable pull/hard power down etc) of a hyperconverged node (which has a brick and client of the same volume) is dependent on following things:

1. Time required for the client to fail the operations pending on the rebooted brick. These operations can include lock and non-lock operations like (f)inodelk, write, lookup, (f)stat etc. Since this requires the client to identify the unresponsive/dead brick, it is bounded by (2 * network.ping-timeout).

2. Time required for the client to acquire a lock on a healthy brick (as clients can be doing transactions in afr). Note that the lock request could conflict with a lock already granted to the dead client on the rebooted node. So, the lock request from a healthy client to a healthy brick cannot proceed till the stale lock from the dead client is cleaned up. This means the healthy brick needs to identify that the client is dead. A brick can identify that a client connected to it is dead using a combination of (tcp-user-timeout and keepalive) tunings on the brick/server. There are quite a few scenarios in this case:
   2a. The healthy brick never writes a response to the dead client. In this case the tcp-keepalive tunings on the server bound the maximum time required for the brick to clean up stale locks from the dead client: (server.keepalive-time + server.keepalive-interval * server.keepalive-count) seconds after the last communication with the dead client. server.tcp-user-timeout plays no role in this case.
   2b. The healthy brick writes a response (maybe to one of the requests the dead client sent before it died) to the socket. Note that writing a response to the socket doesn't necessarily mean the dead client read the response.
         2b.i The healthy brick tries to write a response after the keepalive timer has expired since its last communication with the dead client (in reality it can't, as keepalive timer expiry would close the connection). Since the keepalive timer has already closed the connection, the maximum time for the brick to identify the dead client is bounded by the server.keepalive tunings.
         2b.ii The healthy brick writes a response to the socket immediately after the last communication with the dead client (i.e., the last acked communication). In this case the healthy brick terminates the connection to the dead client server.tcp-user-timeout seconds after the last successful communication with the dead client.
         2b.iii The healthy brick writes a response before the keepalive timer has expired since its last communication with the dead client (the case explained by comment #140), i.e., the response is written after keepalive is triggered but before it expires. In this case the tcp-keepalive timer is stopped and the tcp-user-timeout timer is started. So, the healthy brick can identify the dead client at most (server.tcp-user-timeout + server.keepalive) seconds after the last communication with the dead client.

Note that 1 and 2 can happen serially based on different transactions done by afr.

So the worst-case (maximum) latency of a fop from the application is bounded by (2 * network.ping-timeout + server.tcp-user-timeout + (server.keepalive-time + server.keepalive-interval * server.keepalive-count)).
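Plugging numbers into this bound is simple arithmetic. A minimal sketch in Python; network.ping-timeout=10 matches the volume configuration from comment 2, while the tcp-user-timeout and keepalive values below are purely illustrative assumptions, not values taken from this cluster:

```python
def worst_case_latency(ping_timeout, tcp_user_timeout,
                       keepalive_time, keepalive_interval, keepalive_count):
    """Worst-case latency (seconds) of a single fop after an ungraceful
    shutdown, per the bound above:
      2 * network.ping-timeout + server.tcp-user-timeout
      + (server.keepalive-time + server.keepalive-interval * server.keepalive-count)
    """
    return (2 * ping_timeout
            + tcp_user_timeout
            + keepalive_time + keepalive_interval * keepalive_count)

# ping-timeout=10 is from this report; the remaining values are
# illustrative assumptions only.
print(worst_case_latency(10, 42, 20, 2, 9))  # -> 100
```

Under these example numbers a single fop could stall for well over a minute in the worst case, and the 19-second pause observed in the report falls comfortably inside that bound.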

Since this is expected, closing as not a bug.

Comment 4 vebmasterHtml 2020-01-13 11:45:30 UTC
But I believe this is the wrong behavior.
Continuous operation of programs on the local server should come first, and synchronization only second.
Regardless of what happens to the other servers in the cluster, the current server should continue to work as if glusterfs did not exist.
It's my opinion.

Comment 5 vebmasterHtml 2020-01-13 11:49:24 UTC
Even if another server in the cluster is down, the current server should continue to work as if glusterfs did not exist.
