Description of problem:
Gluster NFS started running on one of the nodes of the ganesha cluster, even though ganesha was already running on it.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-7.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Run the automated root-squash test cases.
2. Create a volume testvol, perform a few root-squash test cases, then delete the volume.
3. Create two new volumes, nfsvol1 and nfsvol2, and enable ganesha on them.

Actual results:
On one of the servers the volumes got exported as Gluster NFS volumes rather than ganesha volumes, and the Gluster NFS process started on that node. On all the other nodes they got exported as ganesha volumes.

Expected results:
On all the servers the volumes must be exported as ganesha volumes; the Gluster NFS process should not start on any server while the ganesha process is running.

Additional info:

[root@vm1 distaf]# gluster v status
Status of volume: gluster_shared_storage
Gluster process                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.64:/var/run/gluster/ss_brick           49152     0          Y       13357
Brick 10.70.46.63:/var/run/gluster/ss_brick           49152     0          Y       12916
Brick 10.70.46.59:/var/run/gluster/ss_brick           49152     0          Y       14208
Self-heal Daemon on localhost                         N/A       N/A        Y       14266
Self-heal Daemon on 10.70.46.65                       N/A       N/A        Y       15524
Self-heal Daemon on 10.70.46.69                       N/A       N/A        Y       18675
Self-heal Daemon on 10.70.46.62                       N/A       N/A        Y       27717
Self-heal Daemon on 10.70.46.60                       N/A       N/A        Y       32501
Self-heal Daemon on 10.70.46.64                       N/A       N/A        Y       18666
Self-heal Daemon on 10.70.46.63                       N/A       N/A        Y       25384
Self-heal Daemon on 10.70.46.68                       N/A       N/A        Y       18476

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: nfsvol1
Gluster process                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.59:/rhs/brick1/brick1/nfsvol1_brick0   49199     0          Y       13694
Brick 10.70.46.64:/rhs/brick1/brick1/nfsvol1_brick1   49199     0          Y       18151
Brick 10.70.46.63:/rhs/brick1/brick1/nfsvol1_brick2   49199     0          Y       24941
Brick 10.70.46.60:/rhs/brick1/brick0/nfsvol1_brick3   49198     0          Y       32057
Brick 10.70.46.62:/rhs/brick1/brick0/nfsvol1_brick4   49175     0          Y       27300
Brick 10.70.46.65:/rhs/brick1/brick0/nfsvol1_brick5   49175     0          Y       15127
Brick 10.70.46.68:/rhs/brick1/brick0/nfsvol1_brick6   49175     0          Y       18075
Brick 10.70.46.69:/rhs/brick1/brick0/nfsvol1_brick7   49175     0          Y       18264
Brick 10.70.46.59:/rhs/brick1/brick2/nfsvol1_brick8   49200     0          Y       13713
Brick 10.70.46.64:/rhs/brick1/brick2/nfsvol1_brick9   49200     0          Y       18169
Brick 10.70.46.63:/rhs/brick1/brick2/nfsvol1_brick10  49200     0          Y       24959
Brick 10.70.46.60:/rhs/brick1/brick1/nfsvol1_brick11  49199     0          Y       32075
NFS Server on localhost                               2049      0          Y       14517
Self-heal Daemon on localhost                         N/A       N/A        Y       14266
NFS Server on 10.70.46.64                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.64                       N/A       N/A        Y       18666
NFS Server on 10.70.46.68                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.68                       N/A       N/A        Y       18476
NFS Server on 10.70.46.65                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.65                       N/A       N/A        Y       15524
NFS Server on 10.70.46.62                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.62                       N/A       N/A        Y       27717
NFS Server on 10.70.46.63                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.63                       N/A       N/A        Y       25384
NFS Server on 10.70.46.69                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.69                       N/A       N/A        Y       18675
NFS Server on 10.70.46.60                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.60                       N/A       N/A        Y       32501

Task Status of Volume nfsvol1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: nfsvol2
Gluster process                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.59:/rhs/brick1/brick3/nfsvol2_brick0   49201     0          Y       14177
Brick 10.70.46.64:/rhs/brick1/brick3/nfsvol2_brick1   49201     0          Y       18546
Brick 10.70.46.63:/rhs/brick1/brick3/nfsvol2_brick2   49201     0          Y       25323
Brick 10.70.46.60:/rhs/brick1/brick2/nfsvol2_brick3   49200     0          Y       32444
Brick 10.70.46.62:/rhs/brick1/brick1/nfsvol2_brick4   49176     0          Y       27672
Brick 10.70.46.65:/rhs/brick1/brick1/nfsvol2_brick5   49176     0          Y       15485
Brick 10.70.46.68:/rhs/brick1/brick1/nfsvol2_brick6   49176     0          Y       18437
Brick 10.70.46.69:/rhs/brick1/brick1/nfsvol2_brick7   49176     0          Y       18635
Brick 10.70.46.59:/rhs/brick1/brick4/nfsvol2_brick8   49202     0          Y       14195
Brick 10.70.46.64:/rhs/brick1/brick4/nfsvol2_brick9   49202     0          Y       18564
Brick 10.70.46.63:/rhs/brick1/brick4/nfsvol2_brick10  49202     0          Y       25341
Brick 10.70.46.60:/rhs/brick1/brick3/nfsvol2_brick11  49201     0          Y       32470
NFS Server on localhost                               2049      0          Y       14517
Self-heal Daemon on localhost                         N/A       N/A        Y       14266
NFS Server on 10.70.46.65                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.65                       N/A       N/A        Y       15524
NFS Server on 10.70.46.69                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.69                       N/A       N/A        Y       18675
NFS Server on 10.70.46.60                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.60                       N/A       N/A        Y       32501
NFS Server on 10.70.46.63                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.63                       N/A       N/A        Y       25384
NFS Server on 10.70.46.64                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.64                       N/A       N/A        Y       18666
NFS Server on 10.70.46.62                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.62                       N/A       N/A        Y       27717
NFS Server on 10.70.46.68                             N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.68                       N/A       N/A        Y       18476

Task Status of Volume nfsvol2
------------------------------------------------------------------------------
There are no active volume tasks

[root@vm1 distaf]# for i in `seq 1 8`; do echo vm$i; ssh vm$i showmount -e localhost; echo "-----------------"; done
vm1
Export list for localhost:
/nfsvol1 *
/nfsvol2 *
-----------------
vm2
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm3
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm4
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm5
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm6
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm7
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)
-----------------
vm8
Export list for localhost:
/nfsvol1 (everyone)
/nfsvol2 (everyone)

[root@vm1 distaf]# ps -aux | grep nfs
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root      1433  0.0  0.0 105452   896 pts/1 S+ 00:46 0:00 less /var/log/glusterfs/nfs.log
root      5606  0.3 19.9 5480848 1637560 ? Ssl Jul06 5:43 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root      5889  0.0  0.0 103308   828 pts/0 S+ 00:57 0:00 grep nfs
root     13694  0.0  0.4 1326056 38456 ? Ssl 00:01 0:00 /usr/sbin/glusterfsd -s 10.70.46.59 --volfile-id nfsvol1.10.70.46.59.rhs-brick1-brick1-nfsvol1_brick0 -p /var/lib/glusterd/vols/nfsvol1/run/10.70.46.59-rhs-brick1-brick1-nfsvol1_brick0.pid -S /var/run/gluster/61980065e9cb731bc330384b1a13bc6f.socket --brick-name /rhs/brick1/brick1/nfsvol1_brick0 -l /var/log/glusterfs/bricks/rhs-brick1-brick1-nfsvol1_brick0.log --xlator-option *-posix.glusterd-uuid=ea1dc0ac-28f0-4349-95b6-3c64cc0e39c9 --brick-port 49199 --xlator-option nfsvol1-server.listen-port=49199
root     13713  0.0  0.5 1274832 45328 ? Ssl 00:01 0:00 /usr/sbin/glusterfsd -s 10.70.46.59 --volfile-id nfsvol1.10.70.46.59.rhs-brick1-brick2-nfsvol1_brick8 -p /var/lib/glusterd/vols/nfsvol1/run/10.70.46.59-rhs-brick1-brick2-nfsvol1_brick8.pid -S /var/run/gluster/b6804ae6272a2e3c205e46212130c4a4.socket --brick-name /rhs/brick1/brick2/nfsvol1_brick8 -l /var/log/glusterfs/bricks/rhs-brick1-brick2-nfsvol1_brick8.log --xlator-option *-posix.glusterd-uuid=ea1dc0ac-28f0-4349-95b6-3c64cc0e39c9 --brick-port 49200 --xlator-option nfsvol1-server.listen-port=49200
root     14177  0.0  0.4 1070044 38060 ? Ssl 00:02 0:00 /usr/sbin/glusterfsd -s 10.70.46.59 --volfile-id nfsvol2.10.70.46.59.rhs-brick1-brick3-nfsvol2_brick0 -p /var/lib/glusterd/vols/nfsvol2/run/10.70.46.59-rhs-brick1-brick3-nfsvol2_brick0.pid -S /var/run/gluster/a3ce24cefc8c02f3ebc20397f5204a24.socket --brick-name /rhs/brick1/brick3/nfsvol2_brick0 -l /var/log/glusterfs/bricks/rhs-brick1-brick3-nfsvol2_brick0.log --xlator-option *-posix.glusterd-uuid=ea1dc0ac-28f0-4349-95b6-3c64cc0e39c9 --brick-port 49201 --xlator-option nfsvol2-server.listen-port=49201
root     14195  0.0  0.4 993236 38704 ? Ssl 00:02 0:00 /usr/sbin/glusterfsd -s 10.70.46.59 --volfile-id nfsvol2.10.70.46.59.rhs-brick1-brick4-nfsvol2_brick8 -p /var/lib/glusterd/vols/nfsvol2/run/10.70.46.59-rhs-brick1-brick4-nfsvol2_brick8.pid -S /var/run/gluster/8d8cec41b11d92c673f70b8dc174b4da.socket --brick-name /rhs/brick1/brick4/nfsvol2_brick8 -l /var/log/glusterfs/bricks/rhs-brick1-brick4-nfsvol2_brick8.log --xlator-option *-posix.glusterd-uuid=ea1dc0ac-28f0-4349-95b6-3c64cc0e39c9 --brick-port 49202 --xlator-option nfsvol2-server.listen-port=49202
root     14517  0.0  2.0 809320 171200 ? Ssl 00:02 0:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/1f8cbbf6b4f5acf88e0a42bccc1ed867.socket
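For reference, a minimal shell sketch of the flow in the steps above, assuming an NFS-Ganesha HA cluster is already configured on these nodes; the brick paths and replica layout shown are illustrative, not the exact layout from the status output:

# Step 2: create a throwaway volume, run the root-squash cases, delete it.
# (Brick paths here are illustrative.)
gluster volume create testvol replica 2 \
    10.70.46.59:/rhs/brick1/brick0/testvol_brick0 \
    10.70.46.64:/rhs/brick1/brick0/testvol_brick1
gluster volume start testvol
# ... automated root-squash test cases run against testvol here ...
gluster volume stop testvol
gluster volume delete testvol

# Step 3: create a new volume and export it through NFS-Ganesha.
gluster volume create nfsvol1 replica 2 <12 bricks as in the status output>
gluster volume start nfsvol1
gluster volume set nfsvol1 ganesha.enable on

# Verify which server is actually exporting the volume. In the output
# above, the Gluster NFS export shows as "/nfsvol1 *" while the
# NFS-Ganesha exports show as "/nfsvol1 (everyone)".
showmount -e localhost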
Hi Apeksha,

I just looked at your machines. You have an old instance of NFS-Ganesha running on the first machine. This process was started on the 6th of July and is no longer responsive. Ideally, you should have torn down the cluster, which would have stopped all NFS-Ganesha services. This old instance still shows up in "service nfs-ganesha status". When you run "gluster nfs-ganesha enable", we internally execute "service nfs-ganesha start", and since the older instance shows up in the status, it reports success. In reality, a working NFS-Ganesha instance does not exist. I have raised a bug for this upstream: https://bugzilla.redhat.com/show_bug.cgi?id=1119601

If you look at the glusterd logs, you can clearly see that "ganesha.enable on" does stop the Gluster NFS service:

[2015-07-07 18:32:08.206306] I [MSGID: 106540] [glusterd-utils.c:4153:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NFSV3 successfully
[2015-07-07 18:32:08.206495] I [MSGID: 106540] [glusterd-utils.c:4162:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v4 successfully
[2015-07-07 18:32:08.206676] I [MSGID: 106540] [glusterd-utils.c:4171:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v1 successfully
[2015-07-07 18:32:08.206869] I [MSGID: 106540] [glusterd-utils.c:4180:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered ACL v3 successfully

Because the NFS-Ganesha on that node is a stale instance, the next time you start a volume, Gluster NFS starts on that machine. On all the other nodes NFS-Ganesha is a fresh, working instance, so Gluster NFS does not and cannot come up there.

I had worked on the bug listed above; it is too intermittent to reproduce. If you can reproduce this bug consistently, you can propose it as a blocker.
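As a quick triage aid, the stale-instance state described above can be checked per node with standard tools (a sketch, not part of the fix; the key point is that the init script only sees the PID, not whether the daemon responds):

# "service nfs-ganesha status" can report the stale PID as running,
# so also check whether the daemon actually answers requests.
service nfs-ganesha status    # may still report the old, unresponsive instance
pgrep -fl ganesha.nfsd        # the stale process still appears in the process list
rpcinfo -t localhost nfs      # checks whether whatever registered as "nfs" answers;
                              # a dead instance times out
showmount -e localhost        # "(everyone)" entries come from NFS-Ganesha; in this
                              # report the "*" entries came from Gluster NFS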
Reproduced the issue again:

1. Create a 6x2 volume, say testvol, and perform some root-squash tests.
   Export list for localhost:
   /testvol (everyone)
2. Stop glusterd on server1 and start it again.
   Stopping glusterd:                                         [  OK  ]
   Starting glusterd:                                         [  OK  ]
3. Delete volume testvol.
4. Create a new volume, say nfsvol1, and enable ganesha on it; it gets exported as a Gluster NFS volume:
   Export list for localhost:
   /nfsvol1 *
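The glusterd restart in step 2 is the new element compared to the original steps; the "Stopping/Starting glusterd" output above indicates a SysV-init (el6) node, where it corresponds to:

# Step 2, on a SysV-init (el6) node:
service glusterd stop
service glusterd start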
I followed the same steps as recorded, but the issue did not reproduce; it did not reproduce on my setup or on the QE setup either. I am not sure how to reproduce it. Please update this bug if you hit it again, and attach all the logs.
You had also executed refresh-config before running these tests.
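For context, refresh-config refers to the ganesha-ha.sh operation that re-reads the export configuration and re-exports volumes across the ganesha cluster; a typical invocation (the HA config directory and volume name are placeholders, not taken from this report) looks like:

/usr/libexec/ganesha/ganesha-ha.sh --refresh-config <HA_CONFDIR> <volume>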
This bug is fixed as part of the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1242749
Ran the automated root-squash test cases on glusterfs-3.7.1-12.el7rhgs.x86_64; did not hit this issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1845.html