Bug 762693 (GLUSTER-961)
Summary: | Unmount with invalid export crashes nfsx | | |
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Shehjar Tikoo <shehjart> |
Component: | nfs | Assignee: | Shehjar Tikoo <shehjart> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | low | | |
Version: | nfs-alpha | CC: | gluster-bugs, lakshmipathi |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | | Type: | --- |
Regression: | RTP | Mount Type: | nfs |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description Shehjar Tikoo 2010-05-27 06:02:14 UTC
Steps to reproduce (say the export is /distribute):

1. In the vSphere client, create an NFS datastore that mounts this export.
2. Stop the GlusterFS process running nfsx.
3. Edit the nfsx volfile: change the name of the distribute volume to dist and change the subvolumes line in nfsx. Say the new name is dist, so the export will be /dist.
4. Start the glusterfs process with the nfsx volfile.
5. Mount the new name on a Linux NFS client from the command line: $ mount <srv>:/dist /mnt
6. In vSphere, unmount the datastore created earlier. The nfs translator would've crashed in the unmount path.

Suppose the export is /distribute and is mounted as such by the ESX NFS client. Then nfsx is restarted, but this time the export name is changed to /dist. After the restart, unmounting from VMware crashes nfsx, because VMware sends the old export name in the unmount request and this is not handled properly in nfsx. It wasn't seen till now because the Linux NFS client does not send an unmount request when it sees that the server has restarted; I think it assumes that the export names may have changed after the restart.

The problem is not with vMotion'ing only but also with simple self-heal scenarios. Even with just two replicas and a dd run there are two problems. Setup: an NFS export of a 2-replica volume, no perf translators, just io-threads on posix.

Scenario 1:
1. Start a dd run on the NFS mount.
2. Bring down the replica that is not the NFS server (the NFS server is also a replica).
3. After a few seconds, restart the downed replica.
4. dd finishes.
5. The file on the downed replica is the same size as on the NFS server replica, but corrupted.

Scenario 2:
1. Start a dd run on the NFS mount.
2. Bring down the replica that is not the NFS server (the NFS server is also a replica).
3. dd finishes.
4. Bring up the downed replica.
5. Do ls -lRh on the mount point.
6. Check the file on the downed replica. It has not been self-healed.

Comment three is not applicable for this bug. Typed in wrong browser tab.

PATCH: http://patches.gluster.com/patch/3357 in master (mount3: Handle unmount for unknown volume names)

Regression Test

The crash is caused by nfsx trying to remove a non-existent mount from its mount list (see the sketch at the end of this report).

Test Case
1. Create a posix+nfsx volfile and start glusterfsd.
2. Start the vSphere client tool on a Windows machine and connect to the ESX server.
3. If the steps in comment 2 do not crash the NFS server, the test is a success.

Verified with nfs-beta-rc10. Regression URL: http://test.gluster.com/show_bug.cgi?id=91
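For illustration only, here is a minimal, self-contained sketch in C of the general class of check described above: when an unmount request names an export that is not in the server's mount list, look it up first and answer with the mount-protocol error MNT3ERR_NOENT instead of operating on a missing entry. The structure and function names (mount_entry, mount_list_remove, and so on) are hypothetical and are not taken from the actual mount3.c code or from patch 3357.

```c
/*
 * Hypothetical sketch of a lookup-before-remove in an NFS mount-protocol
 * UMNT handler. Not the GlusterFS implementation; names are made up.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MNT3_OK        0
#define MNT3ERR_NOENT  2   /* RFC 1813 mount protocol: no such entry */

struct mount_entry {
        char               *exportname;   /* e.g. "/distribute" */
        struct mount_entry *next;
};

static struct mount_entry *mount_list = NULL;

/* Record an export in the in-memory mount list (called on MNT). */
static void
mount_list_add (const char *exportname)
{
        struct mount_entry *me = calloc (1, sizeof (*me));
        /* error handling elided for brevity */
        me->exportname = strdup (exportname);
        me->next = mount_list;
        mount_list = me;
}

/* Handle UMNT: remove the entry if present, report NOENT otherwise. */
static int
mount_list_remove (const char *exportname)
{
        struct mount_entry **pp = &mount_list;

        for (; *pp != NULL; pp = &(*pp)->next) {
                if (strcmp ((*pp)->exportname, exportname) == 0) {
                        struct mount_entry *me = *pp;
                        *pp = me->next;
                        free (me->exportname);
                        free (me);
                        return MNT3_OK;
                }
        }

        /* Unknown export name, e.g. a stale "/distribute" sent by an ESX
         * client after the server was restarted with export "/dist".
         * Returning an error here instead of touching a missing entry is
         * what prevents the crash. */
        return MNT3ERR_NOENT;
}

int
main (void)
{
        mount_list_add ("/dist");

        /* Stale unmount request for the pre-restart export name. */
        printf ("umnt /distribute -> %d (expect MNT3ERR_NOENT)\n",
                mount_list_remove ("/distribute"));

        /* Valid unmount request for the current export name. */
        printf ("umnt /dist       -> %d (expect MNT3_OK)\n",
                mount_list_remove ("/dist"));

        return 0;
}
```

The point of interest is treating a stale export name from a client that outlived a server restart as an ordinary protocol error rather than an internal invariant, which is the behaviour the test case above verifies.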