Bug 1480182 - [CephFS] Posix test failed with EPERM error on Ceph-Fuse mount.
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: CephFS
Version: 3.0
Hardware: Unspecified OS: Unspecified
Priority: medium Severity: high
Target Milestone: rc
Target Release: 3.0
Assigned To: Patrick Donnelly
QA Contact: Ramakrishnan Periyasamy
Depends On:
Blocks:
Reported: 2017-08-10 06:52 EDT by Ramakrishnan Periyasamy
Modified: 2017-12-05 18:39 EST

See Also:
Fixed In Version: RHEL: ceph-12.2.1-1.el7cp Ubuntu: ceph_12.2.1-2redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-05 18:39:05 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Fuse mount log (138.40 KB, text/plain)
2017-08-10 06:52 EDT, Ramakrishnan Periyasamy
Kernel mount logs (138.36 KB, text/plain)
2017-08-10 06:53 EDT, Ramakrishnan Periyasamy
Normal_linux_folder run logs (138.36 KB, text/plain)
2017-08-10 06:55 EDT, Ramakrishnan Periyasamy
Fuse mount logs with param "fuse_set_user_groups = true" (138.36 KB, text/plain)
2017-08-18 05:54 EDT, Ramakrishnan Periyasamy
Kernel mount logs with param "fuse_set_user_groups = true" (138.37 KB, text/plain)
2017-08-18 05:55 EDT, Ramakrishnan Periyasamy


External Trackers
Tracker ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 21004 None None None 2017-08-16 14:01 EDT

Description Ramakrishnan Periyasamy 2017-08-10 06:52:37 EDT
Created attachment 1311693 [details]
Fuse mount log

Description of problem:
Ran POSIX compliance tests on CephFS Fuse and kernel mounts; the Fuse mount returned "EPERM" errors.

Tool: the POSIX_Compliance tool used by the Gluster team, which is available on an NFS share (mount -t nfs -o vers=4 rhsqe-repo.lab.eng.blr.redhat.com:/opt /opt)

Command used to run: sudo /opt/qa/tools/system_light/run.sh -w /mnt/cephfs/fuse/ -t posix_compliance -l /home/ubuntu/fuse_posix2.log 2> fuse_cmd2.log

Total tests available in Tool: 171

Test summary results for local_linux_folder, FUSE_mount & kernel_mount:
1. Testing on normal linux folder failed for 4 tests. Test case no's: 141, 145, 149, 153
2. Testing in Ceph kernel mount failed for 6 tests. Test case no's: 84, 88, 141, 145, 149, 153
3. Testing in Ceph Fuse mount failed for 11 tests. Test case no's: 36-37, 68-69, 83-84, 88, 141, 145, 149, 153
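
As a quick aid for reading the summary, the following minimal Python sketch computes which failures are specific to each mount type by set difference. The test numbers are copied from the list above; the names `LOCAL`, `KERNEL`, `FUSE`, and `only_in` are hypothetical helpers introduced only for this illustration.

```python
# Failing test numbers copied from the summary above (ranges expanded).
LOCAL = {141, 145, 149, 153}
KERNEL = {84, 88, 141, 145, 149, 153}
FUSE = {36, 37, 68, 69, 83, 84, 88, 141, 145, 149, 153}


def only_in(target, *baselines):
    """Return the failures in target that appear in none of the baselines."""
    return sorted(target.difference(*baselines))


# Failures seen on the kernel mount but not on the local folder.
print("CephFS kernel mount only:", only_in(KERNEL, LOCAL))
# Failures seen on the Fuse mount but on neither other target.
print("CephFS Fuse mount only:", only_in(FUSE, KERNEL, LOCAL))
```

The second set (36, 37, 68, 69, 83) is the group of Fuse-only failures that the rest of this report concentrates on.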

Failure details: 
1. Tests 84 & 88: The flow of this test is: create a file, set its mode to 06555, then change the group twice and check the ownership and group. The issue could not be reproduced when the steps were executed manually, but the tool reproduces it on every run.

2. EPERM error: Tests 36, 68 & 83 failed with EPERM on the Fuse client; the kernel client does not show this issue. Tests 37, 69 & 84 failed as a consequence of 36, 68 & 83 failing.
Flow of test: create a file with 0644 permissions, change ownership to 65534 then 65533, change permissions to 06555, check the stat output, and then change groups (65533, 65532) and UIDs (65534); the tests failed with EPERM.

Version-Release number of selected component (if applicable):
ceph: ceph version 12.1.2-1.el7cp (b661348f156f148d764b998b65b90451f096cb27) luminous (rc)

How reproducible:
5/5

Steps to Reproduce:
1. Mount the NFS share.
2. Run the posix_compliance tool on the CephFS Fuse and kernel mounts.
3. The test runs for 15 minutes and prints a test summary to the console.
4. To capture the commands the tool executes, redirect stderr as shown in the "command used to run" section. That log contains the test IDs and the sequence of steps followed.

Ran this on different FUSE clients and got the same issue every time.

Actual results:
Got EPERM errors on the CephFS Fuse mount that are not observed on the kernel mount.

Expected results:
NA

Additional info:
Attaching tool command output logs for normal linux Folder, CephFS Fuse mount and CephFS kernel mount.
Comment 2 Ramakrishnan Periyasamy 2017-08-10 06:53 EDT
Created attachment 1311694 [details]
Kernel mount logs
Comment 3 Ramakrishnan Periyasamy 2017-08-10 06:55 EDT
Created attachment 1311695 [details]
Normal_linux_folder run logs

Attached the POSIX_Compliance tool command execution logs for Normal_linux_Folder, Fuse_mount, and kernel_mount.
Comment 4 Patrick Donnelly 2017-08-15 14:40:13 EDT
> Test summary results for local_linux_folder, FUSE_mount & kernel_mount:
> 1. Testing on normal linux folder failed for 4 tests. Test case no's: 141, 145, 149, 153
> 2. Testing in Ceph kernel mount failed for 6 tests. Test case no's: 84, 88, 141, 145, 149, 153
> 3. Testing in Ceph Fuse mount failed for 11 tests. Test case no's: 36-37, 68-69, 83-84, 88, 141, 145, 149, 153

So a "normal linux folder" would be ext4 or xfs? The 141, 145, 149, and 153 tests are perhaps just racy?

84 and 69 indicate that the test expects the file mode not to have the set-user-ID/set-group-ID bits set (S_ISUID, S_ISGID). That one concerns me and should be looked at. Since it affects both clients, it probably originates in the MDS. It's interesting that 69 does not fail for the kernel client.


Thanks for the report Ramakrishnan.
Comment 5 Patrick Donnelly 2017-08-15 16:12:21 EDT
None of these look serious enough to delay Luminous. Moving to 3.1.
Comment 6 Patrick Donnelly 2017-08-16 14:00:05 EDT
> 2. Testing in Ceph kernel mount failed for 6 tests. Test case no's: 84, 88, 141, 145, 149, 153

84 and 88 indicate a real bug where the kernel client is not clearing the suid/sgid bits after chown on a file with executable bits set.

> 3. Testing in Ceph Fuse mount failed for 11 tests. Test case no's: 36-37, 68-69, 83-84, 88, 141, 145, 149, 153

36 (and subsequently 37, 68, 69, 83, 84) fails because supplementary groups are not checked by default in the fuse client. You need to set the config variable "fuse_set_user_groups = true". This should probably be on by default.

88 fails because 87 was a no-op because 83 failed.

Ramakrishnan, please retest the fuse client with that config variable set to see if that resolves those issues for you.
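
To illustrate why ignoring supplementary groups leads to EPERM, here is a simplified Python model of the POSIX permission rule for a group-only chown. It is a sketch, not Ceph's implementation: the function name `chgrp_errno` is hypothetical, and treating 65533 as the caller's primary group and 65532 as a supplementary group is an assumption based on the test flow in the description.

```python
import errno


def chgrp_errno(caller_uid, caller_gid, supplementary_gids, file_uid, new_gid):
    """Model chown(path, -1, new_gid): return 0 if permitted, else EPERM.

    Simplified rule: root may always change the group; otherwise the caller
    must own the file and new_gid must be the caller's effective group or
    one of its supplementary groups.
    """
    if caller_uid == 0:
        return 0
    if caller_uid != file_uid:
        return errno.EPERM
    if new_gid == caller_gid or new_gid in supplementary_gids:
        return 0
    return errno.EPERM


# Supplementary groups visible to the permission check: allowed.
print(chgrp_errno(65534, 65533, {65532}, 65534, 65532))
# Supplementary groups not passed along (assumed effect of
# fuse_set_user_groups = false): the same request is refused.
print(chgrp_errno(65534, 65533, set(), 65534, 65532) == errno.EPERM)
```

In this model the only difference between the two calls is whether the supplementary group list reaches the check, which matches the explanation given for test 36.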
Comment 7 Patrick Donnelly 2017-08-16 14:01:23 EDT
Adding related issue from Ceph bug tracker which does not cause any of the bugs in the report.
Comment 8 Patrick Donnelly 2017-08-16 14:11:47 EDT
> 84 and 88 indicate a real bug where the kernel client is not clearing the suid/sgid bits after chown on a file with executable bits set.

Actually, from the kernel client log:

++ /opt/qa/tools/posix-testsuite/tests/chown/../../fstest mkdir fstest_e20b6eaef4417193e1c037ead65bf156 0755

The file is actually a directory. POSIX does not specify clearing the suid/sgid in that case per:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/chown.html

The tests 84 and 88 are wrong.
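
The distinction can be sketched in Python as a small predicate. This is one reading of the linked chown() page, not code from the test suite or from Ceph; the name `chown_must_clear_setid` is hypothetical. It encodes that clearing is mandatory only for a regular file with at least one execute bit set when the caller lacks appropriate privileges.

```python
import stat

# Any of the three execute permission bits.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def chown_must_clear_setid(mode, privileged):
    """Return True only where POSIX requires chown() to clear S_ISUID/S_ISGID.

    For non-regular files (such as directories) the bits merely *may* be
    cleared, and for privileged callers the behavior on regular files is
    implementation-defined, so neither case counts as mandatory here.
    """
    is_regular = stat.S_ISREG(mode)
    has_exec = bool(mode & EXEC_BITS)
    return bool(is_regular and has_exec and not privileged)


# A directory with mode 06555, as created by the mkdir step in the log.
print(chown_must_clear_setid(stat.S_IFDIR | 0o6555, privileged=False))
# A regular file with the same permission bits, unprivileged caller.
print(chown_must_clear_setid(stat.S_IFREG | 0o6555, privileged=False))
```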
Comment 9 Patrick Donnelly 2017-08-16 14:24:11 EDT
Upstream PR for changing "fuse_set_user_groups" to true by default and other fixes for chown POSIX compatibility that do not cause failures in this bz.

https://github.com/ceph/ceph/pull/17053
Comment 10 Ramakrishnan Periyasamy 2017-08-17 02:16:34 EDT
Ran the test with config param "fuse_set_user_groups = true"; the kernel and fuse mounts now fail the same set of test cases.

Test Summary of Fuse mount:
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 6)
  Failed tests:  84, 88, 141, 145, 149, 153
Files=185, Tests=1962, 421 wallclock secs ( 3.43 usr  1.50 sys + 24.66 cusr 42.16 csys = 71.75 CPU)
Result: FAIL
end: 02:11:11


Test Summary of kernel mount:
-------------------
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 6)
  Failed tests:  84, 88, 141, 145, 149, 153
Files=185, Tests=1962, 428 wallclock secs ( 3.47 usr  1.46 sys + 25.04 cusr 44.23 csys = 74.20 CPU)
Result: FAIL
end: 02:11:21

Tests "36-37, 68-69" no longer fail on the fuse mount.
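
For comparing runs like the two above, a small Python sketch can extract the failed test numbers from the summary text. The helper name `parse_failed_tests` is hypothetical; it assumes the "Failed tests:" line format shown in the summaries and also expands ranges such as "36-37" as used earlier in this report.

```python
import re


def parse_failed_tests(summary):
    """Return the test numbers listed on the first 'Failed tests:' line."""
    match = re.search(r"Failed tests:\s*(.+)", summary)
    if match is None:
        return []
    numbers = []
    for token in match.group(1).split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            # Expand an inclusive range such as "36-37".
            first, last = token.split("-", 1)
            numbers.extend(range(int(first), int(last) + 1))
        else:
            numbers.append(int(token))
    return numbers


FUSE_SUMMARY = """\
/opt/qa/tools/posix-testsuite/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 6)
  Failed tests:  84, 88, 141, 145, 149, 153
"""

print(parse_failed_tests(FUSE_SUMMARY))
```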
Comment 11 Patrick Donnelly 2017-08-17 13:37:40 EDT
Can you upload the new log files too, please?
Comment 12 Ramakrishnan Periyasamy 2017-08-18 05:54 EDT
Created attachment 1315154 [details]
Fuse mount logs with param "fuse_set_user_groups = true"

Attaching Fuse mount logs with param "fuse_set_user_groups = true"
Comment 13 Ramakrishnan Periyasamy 2017-08-18 05:55 EDT
Created attachment 1315155 [details]
Kernel mount logs with param "fuse_set_user_groups = true"

Attaching Kernel mount logs with param "fuse_set_user_groups = true"
Comment 14 Patrick Donnelly 2017-08-24 17:49:19 EDT
So to summarize:

Test failures 36-37, 68-69, 83 on ceph-fuse were caused by the "fuse_set_user_groups" setting defaulting to false. This will be changed to true in a future backport to luminous:

http://tracker.ceph.com/issues/21107

Test failures 84 and 88 are caused by the test being broken. POSIX does not mandate S_ISGID and S_ISUID to be dropped on directories after chown.

Tests 141, 145, 149, 153 are wrong as well. POSIX is very clear that ctime need not be updated if uid and gid arguments to chown are both -1:

"Upon successful completion, chown() shall mark for update the last file status change timestamp of the file, except that if owner is (uid_t)-1 and group is (gid_t)-1, the file status change timestamp need not be marked for update."

From: http://pubs.opengroup.org/onlinepubs/9699919799/functions/chown.html
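
To observe what a given filesystem actually does in the case quoted above, a minimal Python probe can compare ctime before and after a chown with both IDs left unchanged. This is a sketch for manual checking, not part of the test suite; the function name `ctime_changes_on_noop_chown` and the 50 ms delay are assumptions, and either result is acceptable under the quoted POSIX text.

```python
import os
import tempfile
import time


def ctime_changes_on_noop_chown(directory=None, delay=0.05):
    """Report whether chown(path, -1, -1) updates ctime in `directory`.

    Creates a temporary file, records st_ctime_ns, waits briefly so a new
    timestamp would be distinguishable, calls chown with both IDs unchanged,
    and compares the timestamps. The temporary file is always removed.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        before = os.stat(path).st_ctime_ns
        time.sleep(delay)
        os.chown(path, -1, -1)  # -1 leaves the owner and the group unchanged
        after = os.stat(path).st_ctime_ns
        return after != before
    finally:
        os.unlink(path)


# Example: pass a directory on the mount under test, for instance the
# /mnt/cephfs/fuse/ path used in this report, instead of the default.
print(ctime_changes_on_noop_chown())
```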
Comment 20 Ramakrishnan Periyasamy 2017-10-16 04:02:23 EDT
Moving this bug to verified state.

[ubuntu@host058 ~]$ sudo ceph --admin-daemon /var/run/ceph/qetest-mds.host058.asok config diff get fuse_set_user_groups
{
    "diff": {
        "current": {
            "fuse_set_user_groups": "true"
        },
        "defaults": {
            "fuse_set_user_groups": "true"
        }
    }
}

Got the same summary result as in comment 10.
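
The verification step above can also be checked programmatically. The following Python sketch parses JSON of the shape shown in this comment and returns the current and default values of an option; `option_values` is a hypothetical helper, and the sample string is copied from the output above.

```python
import json

# Output of the "config diff get" admin socket command, copied from above.
SAMPLE_OUTPUT = """
{
    "diff": {
        "current": {
            "fuse_set_user_groups": "true"
        },
        "defaults": {
            "fuse_set_user_groups": "true"
        }
    }
}
"""


def option_values(diff_json, option):
    """Return (current, default) for `option`, using None where it is absent."""
    diff = json.loads(diff_json)["diff"]
    current = diff.get("current", {}).get(option)
    default = diff.get("defaults", {}).get(option)
    return current, default


print(option_values(SAMPLE_OUTPUT, "fuse_set_user_groups"))
```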
Comment 23 errata-xmlrpc 2017-12-05 18:39:05 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387
