Bug 2178510
| Summary: | mksquashfs crashes (signal 11) during installer image build | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
| Component: | squashfs-tools | Assignee: | Bruno Wolff III <bruno> |
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rawhide | CC: | bruno |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | openqa | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-03-15 20:27:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2143444 | | |
| Attachments: | backtrace of the crash (attachment 1951092) | | |
Description
Adam Williamson
2023-03-15 06:54:30 UTC
I have been running tests of the updated packages locally and so far they have been clean, so this isn't something that always happens. squashfs-tools-4.6-0.6.20230314git36abab0.fc39 has the memory leak fixes, so if it first broke last night, it might be related to that. As Phillip noted on GitHub, most of the other recent changes have been related to documentation.

I wasn't planning to build this for f38 until maybe after the f38 release, so that it got more testing. I don't think there were any security fixes between 4.5.1 and 4.6. The reason I put it in f39 was to help Phillip get the code tested more before he actually published a release. I figured Fedora owes him some help for the work he does on squashfs-tools, which is on his own time.

One other change I need to check is one that allows the build to be affected by environment variables when running make. I didn't look at it in depth. On the surface it looked like what we were doing in the spec file for CC. But it might be that something started getting set that was incompatible with libraries or something like that. I'm not sure if that commit got included in 4.6-0.5.20230312gitaaf011a.fc39 or if it didn't get in until the broken update. I'll try to figure this stuff out today.

In my comment about plans for the update I realized "release" was ambiguous. What I meant was that I wasn't going to do an f38 update until after f38 final had been out for a while. Beta has already passed, and screwing up the final release ISOs would cause a lot of pain. squashfs-tools will also need a release for me to do that, but that is very likely to happen before f38 final.

I checked, and aaf011a868c786b06e74cbdaf860d45793939f35 was the commit that added the feature for customizing the makefile from the command line. That commit was in 4.6-0.5.20230312gitaaf011a.fc39, which I think was used to successfully build ISOs, so it probably isn't the problem.

Yes, the cause should be one of the six commits since then.
The test did fail across four executions in openQA (we rerun failed tests once automatically, and we have both prod and staging instances; both tests on both instances failed the same way). It did also break last night's nightly compose, see e.g. https://koji.fedoraproject.org/koji/taskinfo?taskID=98710743 - you can see that ended in "2023-03-15 05:48:50,608: rootfs.img creation failed. See program.log", indicating the same problem. I'll see if I can get a coredump one way or another (either reproduce manually, or tweak the openQA test to disable the coredump limit).

Created attachment 1951092: backtrace of the crash

Here's a backtrace I was able to get by reproducing this locally.
Think I've got this figured out, am doing a scratch build to confirm.

Thanks for fixing this. I'm glad this was caught before the 4.6 squashfs-tools release. I'll do another update for f39 after Phillip merges or incorporates your fix.

https://koji.fedoraproject.org/koji/taskinfo?taskID=98730435 has my fix for this backported.

While the fix is correct, I'm not sure the stated cause was. Since j was only initialized, and not declared, in the for statement, I think it is still in scope after the loop exits. However, it would be equal to i at that point and the desired value was i-1, so the subscript after the loop exit should have been j-1, not j. I'm not 100% sure of this, and using i-1 seems safer than using j-1, so I'm not advocating changing anything.

Oh yeah, you might be right there. It probably doesn't matter a lot, though. :D
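For anyone reading this later, the scoping point in the last two comments can be shown with a small standalone sketch. This is not the squashfs-tools code: `index_after_loop` and `last_element` are hypothetical helpers, and `count` stands in for the `i` mentioned above. It only illustrates that an index declared before the `for` statement stays in scope after the loop and ends up equal to the bound, so the last valid subscript is one less.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from squashfs-tools): returns the value the
 * loop index holds once the loop has finished. */
int index_after_loop(int count)
{
    int j;  /* declared before the for statement, so it outlives the loop */

    for (j = 0; j < count; j++)
        ;   /* empty body: only the final value of j matters here */

    return j;   /* equals count when count >= 0, one past the last index */
}

/* Hypothetical helper: returns the last element of a non-empty array. */
int last_element(const int *values, int count)
{
    int j;

    assert(values != NULL && count > 0);

    for (j = 0; j < count; j++)
        ;   /* walk to the end of the array */

    /* values[j] would read one element past the end of the array;
     * the last valid subscript is j - 1 (the same as count - 1). */
    return values[j - 1];
}
```

If `j` had instead been declared inside the `for` statement, using it after the loop would be a compile error rather than an out-of-bounds read, which is one reason indexing by the bound directly (the i-1 form) is the less error-prone choice.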