Bug 2063929 - Fedora/FCOS 35 seems to have an issue writting to CephFS mounted filesystems
Summary: Fedora/FCOS 35 seems to have an issue writting to CephFS mounted filesystems
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 35
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Jeff Layton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-03-14 16:24 UTC by John Fortin
Modified: 2022-05-24 18:13 UTC (History)
40 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-24 18:13:06 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description John Fortin 2022-03-14 16:24:50 UTC
Description of problem:

Build process writing to the local OKD image registry are now getting permission denied messages writing with the builder service account. I've tried multiple builds in multiple projects. These worked fine in OKD 4.9...

Version-Release number of selected component (if applicable):

OKD 4.10

How reproducible:

100%

Steps to Reproduce:
1. Build an image and write it to the local registry

Actual results:

Warning: Push failed, retrying in 5s ...
Getting image source signatures
Copying blob sha256:9ffaa45bf5e6266a04fd0689618322dab52cf9d8319d12a48841c29934387b26
Copying blob sha256:10231e77f89e7377b7253ce34c6decef817a2a2515b282b1acbb96d59970e714
Copying blob sha256:3cf11d9cb4970993e49b10f6450f99f858b1c794bbba1ca47291daf62ddcb626
Copying blob sha256:acc2461c71d34a368f915c0b352c6be549b320c7ad3dff9efbb7210cae4f39d3
Copying blob sha256:761b97ab1a7f433105e1ecdb4d51cf0a117facecae590abcb679c285bc8ab7e0
Copying blob sha256:be41b66cdf6346857fd0718fc076b40ad0d0f70e617971c859e39d0d5b5e5a28
Copying blob sha256:8b59f5c45c07ddea0229ea111d3f401728d3547179170e3084c75147fba96435
Copying blob sha256:26b183a2a5b5355f4fc04dffcbda29efeab1c15ce300add22510bd5821387552
Copying blob sha256:040d420a318c713ea0c95c23c0b526dab67e47c9db0e965ab0800a17b05f6078
Copying blob sha256:df962d62a357a8bb27ad8a896ad1e582cfe4fa992836ea44638fb37e97d90249
Copying blob sha256:19f65d057b806b3501ea05c15cabdbb2287dff8a8654f1cb05909818803f5466
Copying blob sha256:a927642233452ff5e4bb2ab7b99129d871229ea4bcfc36126a3413ff04268069
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: writing blob: initiating layer upload to /v2/dev-shop-micro/artifact-micro-addressbook-jdk11/blobs/uploads/ in 
image-registry.openshift-image-registry.svc:5000: received unexpected HTTP status: 500 Internal Server Error

Expected results:

good push to local registry

Additional info:

logs from image registry:

time="2022-03-14T15:44:27.02764624Z" 
level=error 
msg="response completed with error" 
err.code=unknown 
err.detail="filesystem: open /registry/docker/registry/v2/repositories/dev-shop-micro/artifact-micro-addressbook-jdk11/_uploads/16726db4-2946-4ce6-9e83-28ce1980fcd0/data: permission denied" 
err.message="unknown error" 
go.version=go1.17.5 
http.request.host="image-registry.openshift-image-registry.svc:5000" 
http.request.id=a3d79156-6c59-4355-b54f-1b30d409bba1 
http.request.method=POST 
http.request.remoteaddr="10.131.0.142:41420" 
http.request.uri=/v2/dev-shop-micro/artifact-micro-addressbook-jdk11/blobs/uploads/ 
http.request.useragent="containers/5.16.1 (github.com/containers/image)" 
http.response.contenttype="application/json; charset=utf-8" 
http.response.duration=21.812669ms 
http.response.status=500 
http.response.written=279 
openshift.auth.user="system:serviceaccount:dev-shop-micro:builder" 
vars.name=dev-shop-micro/artifact-micro-addressbook-jdk11

Comment 1 John Fortin 2022-03-14 16:42:11 UTC
It may be something different.  Look at the permissions on the data directory:

sh-4.4$ ls -al /registry/docker/registry/v2/repositories/dev-shop-micro/artifact-micro-addressbook-jdk11/_uploads/16726db4-2946-4ce6-9e83-28ce1980fcd0/
ls: cannot access '/registry/docker/registry/v2/repositories/dev-shop-micro/artifact-micro-addressbook-jdk11/_uploads/16726db4-2946-4ce6-9e83-28ce1980fcd0/data': Permission denied
total 1
drwxr-xr-x. 2 1000330000 root  2 Mar 14 15:44 .
drwxr-xr-x. 5 1000330000 root  3 Mar 14 16:32 ..
-?????????? ? ?          ?     ?            ? data
-rw-r--r--. 1 1000330000 root 20 Mar 14 15:44 startedat

so maybe the registry is doing something odd with the data file permissions?

Comment 2 Jan grieb 2022-03-18 07:00:11 UTC
We experience this issue with odd permissions and strange denials of write access with any cephfs volume both on a systems with a clean okd 4.10 installation with rook 1.8.7 / ceph 16.2.7 as well as with a okd 4.9 updated to 4.10. Both installations were based on FCOS 35.

Creating more than one new file on a cephfs volume leads to permission denied errors. Waiting more than a minute between the requests seems to help (but is obviously no solution).


sh-4.4$ echo 1 > /test/1.txt
sh-4.4$ echo 2 > /test/2.txt
sh: /test/2.txt: Permission denied
sh-4.4$ echo 3 > /test/3.txt
sh: /test/3.txt: Permission denied
sh-4.4$ ls -la /test/
ls: cannot access '/test/3.txt': Permission denied
ls: cannot access '/test/2.txt': Permission denied
total 1
drwxrwxrwx. 2 root root  3 Mar 17 22:46 .
dr-xr-xr-x. 1 root root 40 Mar 17 22:44 ..
-rw-r--r--. 1 rook rook  2 Mar 17 22:46 1.txt
-?????????? ? ?    ?     ?            ? 2.txt
-?????????? ? ?    ?     ?            ? 3.txt
sh-4.4$ sleep 120
sh-4.4$ echo > /test/4.txt
sh-4.4$ ls -la /test/
ls: cannot access '/test/3.txt': Permission denied
ls: cannot access '/test/2.txt': Permission denied
total 1
drwxrwxrwx. 2 root root  4 Mar 17 22:48 .
dr-xr-xr-x. 1 root root 40 Mar 17 22:44 ..
-rw-r--r--. 1 rook rook  2 Mar 17 22:46 1.txt
-?????????? ? ?    ?     ?            ? 2.txt
-?????????? ? ?    ?     ?            ? 3.txt
-rw-r--r--. 1 rook rook  1 Mar 17 22:48 4.txt


No problems with rdb block volumes. No hints in logs or on the ceph status dashboard.

Comment 3 Christian Glombek 2022-03-18 10:04:21 UTC
xref'ing the reports/discussions on Github for further context:

https://github.com/openshift/okd/discussions/1153
https://github.com/openshift/okd/issues/1160

Comment 4 John Fortin 2022-03-18 14:34:26 UTC
This actually seems to be an issue with Fedora/FCOS So I am redirecting this to the Fedora team (hopefully)

Comment 5 Sri Ramanujam 2022-04-22 04:29:34 UTC
I managed to track down what I believe to be the root cause of this issue. See https://github.com/openshift/okd/issues/1160#issuecomment-1105940751 and https://github.com/openshift/okd/issues/1160#issuecomment-1105980765 for more details. 

It looks like https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f7a67b463fb83a4b9b11ceaa8ec4950b8fb7f902 is the root cause of the issue we're seeing. Enabling async dirops for Ceph seems to cause SELinux contexts (possibly all xattrs in general, though I did not explicitly test that) to not be propagated correctly on the local mountpoint (ie, the mount on which the files were originally created). I don't have a Ceph cluster or CephFS volumes outside of OKD to test this with, but I believe this repro is sound so long as your Ceph volume mounts have selinux enabled.

Reproduction steps:

1. Mount a CephFS volume.
2. Create a new directory in that volume.
3. Rapidly create a bunch of files in that directory - perhaps with a bash loop like this: for i in $(seq 1 1000); do echo "testing123" > test$i.txt; done
4. Note that the SELinux contexts for all of these newly created files EXCEPT FOR THE FIRST ONE MADE (usually test1.txt) are NOT what you expect, but are instead system_u:object_r:unlabeled_t:s0
5. Unmount the volume and re-mount it.
6. Navigate back to your test folder and note that all of the SELinux contexts are now correct.

Re-mounting the volume with the wsync option (-o wsync) and performing the test should show that all the created files have the correct context on them right away.

Comment 6 Dusty Mabe 2022-04-22 15:21:24 UTC
Thanks Sri,

Can you follow up with the upstream commiter (jlayton) on the issue?

Should we move this bug to the `kernel` component?

Comment 7 Jeff Layton 2022-04-22 22:47:19 UTC
Sounds like a plan. I'll take a look soon. Thanks.

Comment 8 Jeff Layton 2022-04-25 11:04:50 UTC
It looks like we are sending off the SELinux context when we create the files, but then we end up creating the inode locally without setting the xattr. I'm still looking at what the right fix is at this point and I'll need to set up a reproducer to ensure I understand the problem correctly.

Comment 9 Jeff Layton 2022-04-25 16:59:21 UTC
Doing a bit of testing with this today, I made a script to create 10 files, and after the first one (which is created synchronously), the others don't have the same selinux context:

[jlayton@client1 ~]$ ls -lZ /mnt/test/mkfiles
total 0
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 0
-rw-r--r--. 1 root root system_u:object_r:unlabeled_t:s0     0 Apr 25 12:54 1
-rw-r--r--. 1 root root system_u:object_r:unlabeled_t:s0     0 Apr 25 12:54 2
-rw-r--r--. 1 root root system_u:object_r:unlabeled_t:s0     0 Apr 25 12:54 3
-rw-r--r--. 1 root root system_u:object_r:unlabeled_t:s0     0 Apr 25 12:54 4
-rw-r--r--. 1 root root system_u:object_r:unlabeled_t:s0     0 Apr 25 12:54 5
-rw-r--r--. 1 root root system_u:object_r:unlabeled_t:s0     0 Apr 25 12:54 6
-rw-r--r--. 1 root root system_u:object_r:unlabeled_t:s0     0 Apr 25 12:54 7
-rw-r--r--. 1 root root system_u:object_r:unlabeled_t:s0     0 Apr 25 12:54 8
-rw-r--r--. 1 root root system_u:object_r:unlabeled_t:s0     0 Apr 25 12:54 9


But... if we unmount and mount, then we do see it:

[jlayton@client1 ~]$ sudo umount /mnt/test
[jlayton@client1 ~]$ sudo mount /mnt/test
[jlayton@client1 ~]$ ls -lZ /mnt/test/mkfiles
total 0
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 0
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 1
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 2
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 3
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 4
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 5
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 6
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 7
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 8
-rw-r--r--. 1 root root unconfined_u:object_r:unlabeled_t:s0 0 Apr 25 12:54 9

So the files are being created with the correct contexts, but the asynchronously created inodes aren't inheriting it correctly so you won't see them until you remount (or the inodes get flushed out of the cache). I think we're not passing the xattr blob along correctly to finish the async create. I'll have a look at a fix.

Comment 10 Jeff Layton 2022-04-25 18:53:18 UTC
I just sent a patch that should fix this. I'll see about getting Xiubo to mark it for stable too since this is a regression now that async dirops are enabled.

Comment 12 Dusty Mabe 2022-04-26 01:15:17 UTC
Thanks Jeff!

Comment 13 Dusty Mabe 2022-05-18 15:55:41 UTC
Did this ever land? If so is there an upstream commit hash (or link)?

Comment 14 Jeff Layton 2022-05-18 15:57:57 UTC
Yes:

commit 620239d9a32e9fe27c9204ec11e40058671aeeb6
Author: Jeff Layton <jlayton>
Date:   Mon Apr 25 15:54:27 2022 -0400

    ceph: fix setting of xattrs on async created inodes

Comment 15 Dusty Mabe 2022-05-18 17:40:44 UTC
Looks like that made it into 5.17.9 too:

commit 25633e355cbea61e5a18b938a56f391b7185cf60                                                                                              
Refs: v5.17.8-92-g25633e355cbe                                                                                                               
Author:     Jeff Layton <jlayton>                                                                                                 
AuthorDate: Mon Apr 25 15:54:27 2022 -0400                                                                                                   
Commit:     Greg Kroah-Hartman <gregkh>                                                                                  
CommitDate: Wed May 18 10:28:21 2022 +0200                                                                                                   

    ceph: fix setting of xattrs on async created inodes

Comment 16 Timothée Ravier 2022-05-24 17:28:23 UTC
5.17.9 is now in F35+ so I think we can close this one.


Note You need to log in before you can comment on or make changes to this bug.