Created attachment 1116509 [details]
tcp dump while cthon lock test running

Description of problem:
cthon lock test hang for gluster nfs

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-16.el7rhgs.x86_64

How reproducible:
Always on this glusterfs build

Steps to Reproduce:
1. Create a dist-rep volume
2. Mount using gluster-nfs v3
3. run cthon lock test, it hangs

cthon lock test hangs at the 7th test case:

Test #7 - Test parent/child mutual exclusion.
    Parent: 7.0 - F_TLOCK [ ffc, 9] PASSED.
    Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
    Parent: Now free child to run, should block on lock.
    Parent: Check data in file to insure child blocked.
    Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
    Parent: 7.1 - COMPARE [ ffc, 7] PASSED.
    Parent: Now unlock region so child will unblock.
    Parent: 7.2 - F_ULOCK [ ffc, 9] PASSED.

Attaching the tcp dump from server, while this test was running.

Actual results:

Expected results:
cthon lock test should pass

Additional info:
Could you please disable selinux and firewalld once and re-run the test?
It's confirmed that this issue is caused by blocked ports on the NFS client machine. The NFS/NLM server sends GRANTED calls to the client once the blocked locks are granted. These calls were being blocked by the client machine. After disabling firewalld or flushing the iptables rules, the cthon tests pass. We need to document that the nlockmgr ports need to be opened on the NFS client machine as well.
As mentioned above, the nlockmgr and rpc.statd services need their corresponding ports opened on the NFS client machine for lock functionality to work as expected on RHEL 7 machines. This needs to be documented in the admin guide.
Since we are using an iptables flush (iptables -F), I am not sure if this can be documented in the guide. Niels, can you let us know the best way to document this?
These are the steps needed on the RHEL-7 NFS-client.

1. unmount all NFS-exports

2. edit /etc/sysconfig/nfs and uncomment the lines:

   LOCKD_TCPPORT=32803
   LOCKD_UDPPORT=32769
   STATD_PORT=662

3. reconfigure the NFS-client

   # systemctl restart nfs-config
   # systemctl restart nfs-lock

4. open the listening ports in the persistent firewall configuration

   # firewall-cmd --add-port=662/tcp --add-port=662/udp --permanent
   # firewall-cmd --add-port=32803/tcp --add-port=32769/udp --permanent

5. open the listening ports in the runtime firewall configuration

   # firewall-cmd --add-port=662/tcp --add-port=662/udp
   # firewall-cmd --add-port=32803/tcp --add-port=32769/udp

6. mount the NFS-exports again

7. verify that 'rpcinfo -p' shows the correct tcp and udp ports

Comment #5 does not mention opening the STATD_PORT. This might explain why clearing all firewall rules (with 'iptables -F') made it work again.
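The edit in step 2 above can be scripted. What follows is only a minimal sketch, assuming the file still carries the stock commented-out lines; the function name enable_nfs_client_ports is hypothetical and not part of any package. It removes the leading '#' from the three settings named in step 2 and leaves every other line alone.

```shell
#!/bin/sh
# Hypothetical helper (not shipped anywhere): uncomment the lockd/statd
# port settings in an nfs sysconfig-style file.
enable_nfs_client_ports() {
    conf="$1"                       # e.g. /etc/sysconfig/nfs
    tmp=$(mktemp) || return 1
    # Strip the leading '#' only from the three keys named in step 2.
    if sed -e 's/^#[[:space:]]*\(LOCKD_TCPPORT=\)/\1/' \
           -e 's/^#[[:space:]]*\(LOCKD_UDPPORT=\)/\1/' \
           -e 's/^#[[:space:]]*\(STATD_PORT=\)/\1/' \
           "$conf" > "$tmp"; then
        # Write back in place so the original file keeps its ownership and mode.
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

It would be called as 'enable_nfs_client_ports /etc/sysconfig/nfs' before restarting nfs-config and nfs-lock in step 3. Try it on a copy of the file first.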
Bug 1245103 (marked as Depends On) is targeted for 3.1.3. Based on my discussion with Niels and Soumya, it was decided to fix the doc bug for the 3.1.3 release after bug 1245103 is fixed. I shall update the bug if there are any changes to the decision.
The list of ports which need to be opened are:

############### on the server node:

NFS service (by default 2049)
MOUNT service
NLM service
Rquota service (current default is 4501)
STATD service

Since at the moment there are no default ports provided for MOUNT and NLM services, we need to configure them by adding the below two lines in '/etc/ganesha/ganesha.conf' and add firewall rules to open them up.

MNT_Port=<mnt_port_num>
NLM_Port=<nlm_port_num>

This will be fixed by BZ#1300175. We will package ganesha.conf with below default ports configured for all these services and then provide ganesha.xml to be used by firewalld to open those ports.

MNT_Port = 20048;
NLM_Port = 32803;
Rquota_Port = 875;

Note: these port numbers are chosen based on the values mentioned in '/etc/sysconfig/nfs'

For STATD service, edit below line in '/etc/sysconfig/nfs' file and restart statd service after adding firewall rule to open it.

# Port rpc.statd should listen on.
#STATD_PORT=662

# Outgoing port statd should used. The default is port
# is random
#STATD_OUTGOING_PORT=2020

############### on the client machine:

MOUNTD service
LOCKD service
STATD service

To configure these ports, edit below lines in '/etc/sysconfig/nfs' file and restart the services post adding the firewall rules to open up those ports.

# Port rpc.statd should listen on.
#STATD_PORT=662

# TCP port rpc.lockd should listen on.
#LOCKD_TCPPORT=32803

# UDP port rpc.lockd should listen on.
#LOCKD_UDPPORT=32769

# Port rpc.mountd should listen on.
#MOUNTD_PORT=892
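Once the ports above are pinned and the services restarted, the 'rpcinfo -p' output can be checked against the expected values. A rough sketch of such a check follows. The function name rpc_port_registered is made up for illustration, and it assumes the usual five-column 'rpcinfo -p' layout (program, vers, proto, port, service), where statd appears under the service name 'status' and lockd under 'nlockmgr'.

```shell
#!/bin/sh
# Hypothetical helper: read `rpcinfo -p` output on stdin and succeed only
# if the given service is registered on the given protocol and port.
# Usage: rpcinfo -p | rpc_port_registered <proto> <port> <service>
rpc_port_registered() {
    awk -v proto="$1" -v port="$2" -v svc="$3" '
        # Columns: program vers proto port service
        $3 == proto && $4 == port && $5 == svc { found = 1 }
        END { exit !found }
    '
}
```

For example, 'rpcinfo -p | rpc_port_registered tcp 32803 nlockmgr' would return success on a client where LOCKD_TCPPORT=32803 has taken effect, and 'rpcinfo -p | rpc_port_registered udp 662 status' would do the same for STATD_PORT=662.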
*** Bug 1289112 has been marked as a duplicate of this bug. ***
With bug 1264387, the options to configure the ports used by LOCKD on the client side will change. We need to make modifications accordingly.
Hi Jiffin, As Soumya is on leave, can you let me know the changes that have to be made for the 3.1.3 release.
Thanks Jiffin, I have updated the port details accordingly: http://jenkinscat.gsslab.pnq.redhat.com:8080/view/Gluster/job/doc-Red_Hat_Gluster_Storage-3.1.3-Administration_Guide%20%28html-single%29/lastBuild/artifact/tmp/en-US/html-single/index.html#sect-NFS_Ganesha
There are a lot of changes required in the 3.1.3 documentation for the contents under the Important section and also in the "Defining Service Ports" section. Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1245103#c11 for details and reach out to qe/dev for any more information.
Hi Jiffin, Based on comment 22, there seem to be a lot more changes required than the ones you provided. Can you please look into this and let me know what changes are required?
@Bhavana

Below are the changes required to the current 3.1.3 doc:

**** Under the Important section ****

>>> Under section "Allow the firewall service in the active zones, run the following commands" (change as below):

firewall-cmd --zone=public --add-service=nlm --add-service=nfs --add-service=rpc-bind --add-service=high-availability --add-service=mountd --add-service=rquota

firewall-cmd --zone=public --add-service=nlm --add-service=nfs --add-service=rpc-bind --add-service=high-availability --add-service=mountd --add-service=rquota --permanent

firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp

firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp --permanent

>>> On the NFS-client machine, execute the following commands

firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp \
--add-port=32803/tcp --add-port=32769/udp

firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp \
--add-port=32803/tcp --add-port=32769/udp --permanent

**** Defining Service Ports section ****

>>> Remove the point below completely, as it is taken care of when installing the ganesha packages:

Edit '/etc/ganesha/ganesha.conf' as mentioned below:

# sed -i '/NFS_Core_Param/a \ \ \ \ \ \ \ \ MNT_Port = 20048;' /etc/ganesha/ganesha.conf
# sed -i '/NFS_Core_Param/a \ \ \ \ \ \ \ \ NLM_Port = 32803;' /etc/ganesha/ganesha.conf

@Soumya

Can you please verify the doc and the above steps once, to confirm that I didn't miss anything?
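The runtime and permanent commands above differ only in the trailing --permanent flag, so each port list has to be typed twice and kept in sync. As an illustration only, a small dry-run helper can generate both lines from one list. The name print_firewall_cmds is hypothetical. It prints the commands without running them and uses only the firewall-cmd options already shown in this comment.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the runtime and
# permanent firewall-cmd lines for one zone and a list of port/proto pairs.
# Usage: print_firewall_cmds <zone> <port/proto>...
print_firewall_cmds() {
    zone="$1"
    shift
    args=""
    for p in "$@"; do
        args="$args --add-port=$p"
    done
    # Runtime configuration first, then the same ports made persistent.
    echo "firewall-cmd --zone=$zone$args"
    echo "firewall-cmd --zone=$zone$args --permanent"
}
```

For the NFS-client case, 'print_firewall_cmds public 662/tcp 662/udp 32803/tcp 32769/udp' prints the same two commands listed above, each on a single line. The output can be reviewed and then run by hand as root.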
Thanks Shashank, I have updated the sections accordingly: http://jenkinscat.gsslab.pnq.redhat.com:8080/view/Gluster/job/doc-Red_Hat_Gluster_Storage-3.1.3-Administration_Guide%20%28html-single%29/lastBuild/artifact/tmp/en-US/html-single/index.html#sect-NFS_Ganesha @Soumya, let us know if there are any further changes required. Thanks.
Thanks for the corrections. The changes overall look good to me. A few minor suggestions:

>>> On Red Hat Enterprise Linux 7, enable the NFS-Ganesha firewall service for mountd and HA in the active zones for runtime and permanent mode using the following commands:

As we now have nlm and rquota services too, please edit the line to:

On Red Hat Enterprise Linux 7, enable the NFS-Ganesha firewall service for nfs, rpcbind, mountd, nlm, rquota and HA in the active zones for runtime and permanent mode using the following commands. In addition, configure firewalld to add port '662' which shall be used by statd service.

>>> On the NFS-client machine, execute the following commands:

Please edit the line to:

On the NFS-client machine, configure firewalld to add ports used by statd and nlm services by executing the following commands:
Thanks for your comments Soumya. Here is the updated link with the changes: http://jenkinscat.gsslab.pnq.redhat.com:8080/view/Gluster/job/doc-Red_Hat_Gluster_Storage-3.1.3-Administration_Guide%20%28html-single%29/lastBuild/artifact/tmp/en-US/html-single/index.html#sect-NFS_Ganesha
Bhavana,

Please change the Section to 7.2.4.3.1 under the Important section in Section 7.2.3.3.1 Pre-requisites to run nfs-ganesha:

>> Ensure to configure the ports mentioned above. For more information see Defining Service Ports.

Soumya,

Just above the Defining Service Ports section, we have mentioned:

>> As part of cluster setup, port 4501 is used to bind to the Rquota service. If this port is already in use, assign a different port to this service by modifying following line in ‘/etc/ganesha/ganesha.conf’ file on all the nodes.

# Use a non-privileged port for RQuota
Rquota_Port = 4501;

This needs to be changed here as well, keeping in mind the bug (https://bugzilla.redhat.com/show_bug.cgi?id=1332129).

Please suggest as appropriate.
Yes, good catch. Bhavana, we have now changed the Rquota port to 875. So please change '4501' in the above mentioned statement to '875'. Thanks!
The changes are made. Following is the updated link: http://jenkinscat.gsslab.pnq.redhat.com:8080/view/Gluster/job/doc-Red_Hat_Gluster_Storage-3.1.3-Administration_Guide%20%28html-single%29/lastBuild/artifact/tmp/en-US/html-single/index.html#sect-NFS_Ganesha
The changes look fine in the document mentioned in comment 30, and all the required changes are there now. So marking this bug as Verified.