Bug 2124630

Summary: Please include multi-threading in squashfuse_ll
Product: Fedora EPEL
Component: squashfuse
Version: epel7
Status: CLOSED UPSTREAM
Reporter: Dave Dykstra <dwd>
Assignee: Kyle Fazzari <kyrofa>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: epel-packagers-sig, kyrofa, michel
Severity: unspecified
Priority: unspecified
Type: Bug
Last Closed: 2022-09-16 16:24:52 UTC

Description Dave Dykstra 2022-09-06 16:20:32 UTC
Description of problem:

The performance of squashfuse turns out to be quite poor for multi-core applications. Please see this issue for details:

https://github.com/apptainer/apptainer/issues/665

We are about to release a version of apptainer (which I maintain in EPEL and Fedora) that depends heavily on squashfuse.  The upstream squashfuse_ll multithreading patch at

https://github.com/vasi/squashfuse/pull/70

makes a huge difference, so much so that we have decided, for now, to include a patched version of squashfuse_ll in the apptainer distribution.  We would of course much prefer that it come directly from the EPEL/Fedora version of the squashfuse package.

Version-Release number of selected component (if applicable):

0.1.102

How reproducible:

Very

Steps to Reproduce:

Please see the above issue for details.

Actual results:

With the detailed benchmark on a 16-core el7 node, reading from a local disk takes 6:23, EPEL squashfuse by itself takes 41:11, and EPEL squashfuse_ll takes 13:06.
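The comparison above can be reproduced roughly as follows (a hypothetical sketch: `app.sqfs` and `run_bench.sh` are placeholders for the actual image and multi-core benchmark described in the linked apptainer issue, not real artifacts from this report):

```shell
# Mount the same image with each FUSE variant in turn and time the same workload.
mkdir -p mnt
squashfuse_ll app.sqfs mnt    # or: squashfuse app.sqfs mnt
time ./run_bench.sh mnt       # placeholder for the 16-core benchmark
fusermount -u mnt
```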

Expected results:

I expect results much more comparable to local disk; in fact, with the multithreaded patch the benchmark time goes down to 6:35, nearly the local-disk time.

Additional info:

Would you consider upgrading to squashfuse 0.1.105, including the multithreaded patch, and compiling it with `--enable-multithreading`?  The upstream maintainer does not seem to be in any hurry to include the patch; he hasn't even commented on the thread, which has been open for several months.
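For reference, the requested build would look roughly like this (a hedged sketch only: it assumes the PR #70 patch has been exported to a file and applies cleanly to a 0.1.105 source tree, and the exact autotools steps may differ between releases; `pr70.patch` is a placeholder name):

```shell
# Sketch: apply the multithreading patch, then build with the flag above.
tar xf squashfuse-0.1.105.tar.gz
cd squashfuse-0.1.105
patch -p1 < pr70.patch          # placeholder file name for the PR #70 diff
./autogen.sh
./configure --enable-multithreading
make
```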

Comment 1 Kyle Fazzari 2022-09-06 16:23:31 UTC
In general I'm not a huge fan of the idea of shipping something that isn't upstream. I'd rather avoid maintaining a fork, here.

Comment 2 Dave Dykstra 2022-09-06 16:28:26 UTC
I understand; I would feel the same. But this makes such a huge difference, and upstream isn't moving on it.  Do you have any influence with the upstream owner?

Comment 3 Kyle Fazzari 2022-09-16 16:24:52 UTC
I'm afraid I have no influence upstream. I agree, this is actually my biggest annoyance with squashfuse, but it makes me very uncomfortable to carry a 1k-line patch that is unapproved and indeed unreviewed by the upstream maintainer. Let's keep pressure there. Perhaps someone needs to offer to help maintain the upstream project?

Comment 4 Kyle Fazzari 2022-09-16 16:31:21 UTC
By the way, since your experience seems to contrast with the metrics shared in https://github.com/vasi/squashfuse/pull/70#issuecomment-1186259602, you might consider adding your own. Right now it actually looks like the multithreaded version performs quite a bit worse without a crazy number of threads.

Comment 5 Dave Dykstra 2022-09-16 19:44:42 UTC
I did add my own metrics in the github issue that I linked from my comment on that PR.

Comment 6 Kyle Fazzari 2022-09-16 19:53:23 UTC
I doubt vasi will look at that. Obviously reviews are few and far between. Anything we can do to:

1. Make it an easy review so we don't need too many (slow) passes
2. Make it look like a worthwhile PR to review

would be worthwhile. We didn't write the patch, but we could help with (1) by reviewing it ourselves and trying to get it into such a shape that, once vasi gets to it, it takes as few passes as possible. (2) is easier: show that the patch is actually worthwhile. If I'm vasi, taking a quick look at PRs, the current comments on that particular one look like it drags overall performance down. I'm not sure that would be worth a closer look with limited time.

Comment 7 Dave Dykstra 2022-09-17 18:34:41 UTC
I don't understand what you mean -- why do you doubt vasi will look at what I posted in the PR?  I showed an amazing improvement with the PR, and a benchmark showing it basically equivalent to the kernel SquashFS.  One person posted something that looked like a decrease in performance at low thread counts, but never posted their methodology even when asked by the author of the PR.