Bug 1480182
| Summary: | [CephFS] Posix test failed with EPERM error on Ceph-Fuse mount. | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Ramakrishnan Periyasamy <rperiyas> |
| Component: | CephFS | Assignee: | Patrick Donnelly <pdonnell> |
| Status: | CLOSED ERRATA | QA Contact: | Ramakrishnan Periyasamy <rperiyas> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.0 | CC: | ceph-eng-bugs, hnallurv, john.spray, kdreyer, rperiyas |
| Target Milestone: | rc | | |
| Target Release: | 3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | RHEL: ceph-12.2.1-1.el7cp Ubuntu: ceph_12.2.1-2redhat1xenial | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-12-05 23:39:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 1311694 [details]
Kernel mount logs
Created attachment 1311695 [details]
Normal_linux_folder run logs
Attached the POSIX_Compliance tool command execution logs for Normal_linux_Folder, Fuse_mount, and kernel_mount.
> Test summary results for local_linux_folder, FUSE_mount & kernel_mount:
> 1. Testing on normal linux folder failed for 4 tests. Test case no's: 141, 145, 149, 153
> 2. Testing in Ceph kernel mount failed for 6 tests. Test case no's: 84, 88, 141, 145, 149, 153
> 3. Testing in Ceph Fuse mount failed for 8 tests. Test case no's: 36-37, 68-69, 83-84, 88, 141, 145, 149, 153

So a "normal linux folder" would be ext4 or xfs? The 141, 145, 149, and 153 tests are perhaps just racy?

Tests 84 and 69 expect the file mode not to contain the set-user-ID/set-group-ID bits (S_ISUID, S_ISGID). That one concerns me and should be looked at. Since it affects both clients, it probably originates in the MDS. It's interesting that 69 does not fail for the kernel client.

Thanks for the report, Ramakrishnan.

None of these look serious enough to delay Luminous. Moving to 3.1.

> 2. Testing in Ceph kernel mount failed for 6 tests. Test case no's: 84, 88, 141, 145, 149, 153

84 and 88 indicate a real bug where the kernel client is not clearing suid/sgid bits after chown on a file with executable bits set.

> 3. Testing in Ceph Fuse mount failed for 8 tests. Test case no's: 36-37, 68-69, 83-84, 88, 141, 145, 149, 153

36 (and subsequently 37, 68, 69, 83, 84) fail because supplementary groups are not checked by default in the fuse client. You need to set the config variable "fuse_set_user_groups = true". Probably this should be on by default. 88 fails because 87 was a no-op because 83 failed.

Ramakrishnan, please retest the fuse client with that config variable set to see if that resolves those issues for you.

Adding a related issue from the Ceph bug tracker which does not cause any of the bugs in the report.

> 84 and 88 indicate a real bug where the kernel client is not clear suid/sgid bits after chown on a file with executable bits set.
Actually, from the kernel client log:

++ /opt/qa/tools/posix-testsuite/tests/chown/../../fstest mkdir fstest_e20b6eaef4417193e1c037ead65bf156 0755

The file is actually a directory. POSIX does not specify clearing the suid/sgid bits in that case, per: http://pubs.opengroup.org/onlinepubs/9699919799/functions/chown.html

The tests 84 and 88 are wrong.

Upstream PR for changing "fuse_set_user_groups" to true by default, plus other fixes for chown POSIX compatibility that do not cause failures in this bz: https://github.com/ceph/ceph/pull/17053

Ran the test with config param "fuse_set_user_groups = true"; the kernel and fuse mounts now fail the same set of test cases.

Test Summary of Fuse mount:
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 6)
Failed tests: 84, 88, 141, 145, 149, 153
Files=185, Tests=1962, 421 wallclock secs ( 3.43 usr 1.50 sys + 24.66 cusr 42.16 csys = 71.75 CPU)
Result: FAIL
end: 02:11:11

Test Summary of kernel mount:
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 6)
Failed tests: 84, 88, 141, 145, 149, 153
Files=185, Tests=1962, 428 wallclock secs ( 3.47 usr 1.46 sys + 25.04 cusr 44.23 csys = 74.20 CPU)
Result: FAIL
end: 02:11:21

Tests "36-37, 68-69" are no longer failing for the fuse mount.

Can you upload the new log files too, please?

Created attachment 1315154 [details]
Fuse mount logs with param "fuse_set_user_groups = true"
Attaching Fuse mount logs with param "fuse_set_user_groups = true"
Created attachment 1315155 [details]
Kernel mount logs with param "fuse_set_user_groups = true"
Attaching Kernel mount logs with param "fuse_set_user_groups = true"
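The failure lists in this report mix single test numbers with ranges such as "36-37", which makes comparing mounts by eye error-prone. A minimal Python sketch for expanding and diffing them follows; the helper name parse_failed_tests is hypothetical and not part of the test suite, and the input strings are copied from the summaries in this bug.

```python
def parse_failed_tests(spec):
    """Expand a list such as "36-37, 68-69, 88" into a set of test numbers."""
    failed = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            # A range like "36-37" is inclusive on both ends.
            first, last = item.split("-", 1)
            failed.update(range(int(first), int(last) + 1))
        else:
            failed.add(int(item))
    return failed


# Failure lists copied from this report.
local_dir = parse_failed_tests("141, 145, 149, 153")
kernel = parse_failed_tests("84, 88, 141, 145, 149, 153")
fuse_before = parse_failed_tests("36-37, 68-69, 83-84, 88, 141, 145, 149, 153")
fuse_after = parse_failed_tests("84, 88, 141, 145, 149, 153")

print("fuse-only before the config change:", sorted(fuse_before - kernel))
print("fixed on fuse by the config change:", sorted(fuse_before - fuse_after))
print("failing on CephFS but not locally:", sorted(kernel - local_dir))
```

Set differences keep the comparison explicit: whatever remains in kernel - local_dir is what still needs an explanation specific to CephFS.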
So to summarize:

Test failures 36-37, 68-69, 83 on ceph-fuse were caused by the "fuse_set_user_groups" setting defaulting to false. This will be changed to true in a future backport to luminous: http://tracker.ceph.com/issues/21107

Test failures 84 and 88 are caused by the test being broken. POSIX does not mandate that S_ISGID and S_ISUID be dropped on directories after chown.

Tests 141, 145, 149, 153 are wrong as well. POSIX is very clear that ctime need not be updated if the uid and gid arguments to chown are both -1:

"Upon successful completion, chown() shall mark for update the last file status change timestamp of the file, except that if owner is (uid_t)-1 and group is (gid_t)-1, the file status change timestamp need not be marked for update."

From: http://pubs.opengroup.org/onlinepubs/9699919799/functions/chown.html

Moving this bug to verified state.
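The two POSIX chown() rules cited in this bug can be written down as a small decision table. The Python sketch below is only a reading of the chown() specification linked above, not code from Ceph or the test suite; the function names and the returned labels are made up for illustration.

```python
import stat


def suid_sgid_after_chown(mode, privileged):
    """Say what POSIX asks of S_ISUID/S_ISGID after a successful chown().

    `mode` is a full st_mode value (file type bits plus permission bits).
    """
    has_exec = bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    has_setid = bool(mode & (stat.S_ISUID | stat.S_ISGID))
    if not has_setid or not has_exec:
        # Nothing to clear, or the clearing rules do not apply.
        return "no requirement"
    if stat.S_ISREG(mode):
        # Regular file with an execute bit set.
        return "implementation-defined" if privileged else "must clear"
    # Directories and other non-regular files: clearing is optional.
    return "may clear"


def ctime_update_required(owner, group):
    """ctime need not be marked for update when both arguments are -1."""
    return not (owner == -1 and group == -1)


print(suid_sgid_after_chown(stat.S_IFDIR | 0o6555, privileged=False))
print(suid_sgid_after_chown(stat.S_IFREG | 0o6555, privileged=False))
print(ctime_update_required(-1, -1))
```

Under this reading, a test that demands cleared suid/sgid bits on a directory (84, 88), or a ctime change after chown(-1, -1) (141, 145, 149, 153), asserts more than POSIX guarantees.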
[ubuntu@host058 ~]$ sudo ceph --admin-daemon /var/run/ceph/qetest-mds.host058.asok config diff get fuse_set_user_groups
{
"diff": {
"current": {
"fuse_set_user_groups": "true"
},
"defaults": {
"fuse_set_user_groups": "true"
}
}
}
Got the same summary result as in comment 10.
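The same check can be scripted when verifying several hosts. The Python sketch below assumes the admin socket output has exactly the JSON shape shown in this comment ("diff" containing "current" and "defaults"); other Ceph releases may print a different shape, and the helper name option_matches_default is hypothetical.

```python
import json


def option_matches_default(diff_output, option):
    """Return True when the running value of `option` equals its default.

    Assumes the {"diff": {"current": {...}, "defaults": {...}}} layout
    seen in this bug report.
    """
    diff = json.loads(diff_output)["diff"]
    return diff["current"][option] == diff["defaults"][option]


# Output copied from the verification comment in this bug.
sample = """
{
  "diff": {
    "current": {"fuse_set_user_groups": "true"},
    "defaults": {"fuse_set_user_groups": "true"}
  }
}
"""

print(option_matches_default(sample, "fuse_set_user_groups"))
```

A KeyError from this helper means the option is missing from the output, which is worth treating as a failed verification rather than silently ignoring.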
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387
Created attachment 1311693 [details]
Fuse mount log

Description of problem:
Ran POSIX compliance tests on CephFS Fuse and kernel mounts; got an "EPERM" error on the Fuse mount.

Tool: Used the POSIX_Compliance tool used by the Gluster team, which is available on an NFS share (mount -t nfs -o vers=4 rhsqe-repo.lab.eng.blr.redhat.com:/opt /opt)

Command used to run:
sudo /opt/qa/tools/system_light/run.sh -w /mnt/cephfs/fuse/ -t posix_compliance -l /home/ubuntu/fuse_posix2.log 2> fuse_cmd2.log

Total tests available in Tool: 171

Test summary results for local_linux_folder, FUSE_mount & kernel_mount:
1. Testing on normal linux folder failed for 4 tests. Test case no's: 141, 145, 149, 153
2. Testing in Ceph kernel mount failed for 6 tests. Test case no's: 84, 88, 141, 145, 149, 153
3. Testing in Ceph Fuse mount failed for 8 tests. Test case no's: 36-37, 68-69, 83-84, 88, 141, 145, 149, 153

Failure details:
1. Test 84 & 88: The flow of this test is: create a file, change its mode to 06555, then change the group twice and check the ownership and group. The issue was not found when the steps were executed manually, but the tool reproduces it in every run.
2. EPERM error: Tests 36, 68 & 83 failed with an EPERM error on the Fuse client; this issue is not seen on the kernel client. Tests 37, 69 & 84 failed because 36, 68 & 83 failed. Flow of test: Create a file with 0644 permissions, change ownership to 65534 then 65533, change permissions to 06555, check the stat and then change groups (65533, 65532) and uid's (65534); tests failed with EPERM error.

Version-Release number of selected component (if applicable):
ceph: ceph version 12.1.2-1.el7cp (b661348f156f148d764b998b65b90451f096cb27) luminous (rc)

How reproducible:
5/5

Steps to Reproduce:
1. Mount the NFS share.
2. Run the posix_compliance tool on the CephFS Fuse and kernel mounts.
3. The test will run for 15 minutes and will print a test summary in the console.
4. To capture the commands executed by the tool, redirect stderr as shown in the "command used to run" section. This log contains the test IDs and the sequence of steps followed.

Ran this on a different FUSE client but got the same issue every time.

Actual results:
Got EPERM error in CephFS Fuse mount which is not observed in kernel mount.

Expected results:
NA

Additional info:
Attaching tool command output logs for normal linux folder, CephFS Fuse mount and CephFS kernel mount.
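The EPERM failures described above are consistent with the explanation given in the comments: a non-root file owner may only change a file's group to one of its own groups, so a client that does not look at the caller's supplementary groups will refuse an otherwise valid chown. The Python sketch below is a simplified model of that rule, assuming root is the only privileged caller; it is not Ceph's implementation, and the function name and the uid/gid values are illustrative only.

```python
def chgrp_allowed(caller_uid, caller_gid, caller_groups, file_uid, new_gid):
    """Simplified rule for chown(path, -1, new_gid) by a given caller.

    `caller_groups` is the supplementary group list the client knows about.
    """
    if caller_uid == 0:
        return True  # simplification: treat root as fully privileged
    if caller_uid != file_uid:
        return False  # only the owner may change the group
    return new_gid == caller_gid or new_gid in caller_groups


# Illustrative caller: uid 65534, primary gid 65533, supplementary gid 65532.
uid, gid, supplementary = 65534, 65533, (65532,)

# Client that passes supplementary groups along: the change is allowed.
print(chgrp_allowed(uid, gid, supplementary, file_uid=65534, new_gid=65532))

# Client that ignores supplementary groups: the same request is refused,
# which a caller would see as EPERM.
print(chgrp_allowed(uid, gid, (), file_uid=65534, new_gid=65532))
```

In this model the only difference between the two calls is whether the supplementary group list reaches the permission check, which is what "fuse_set_user_groups = true" is described as enabling for ceph-fuse.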