A race condition occurs when multiple DBus modules are started concurrently during anaconda initialization, specifically affecting the Localization module. The issue manifests as "Illegal instruction" errors when multiple processes simultaneously access the thread-unsafe langtable C extension library.

Reproducible: Sometimes

Steps to Reproduce:
Reproduction Environment: Live media creation via livemedia-creator
Frequency: Intermittent (race condition dependent)
Log evidence: Found in live/mock-live-*/build/logs/anaconda/dbus.log

Actual Results:
Build failures: ~40% of live media builds fail due to this race condition in anaconda

Expected Results:
live media builds pass

Additional Information:

*Affected Component*
Module: pyanaconda/modules/localization/
Function: _build_layout_infos() in pyanaconda/localization.py
Library: langtable C extension

*Symptoms*
Failed builds: Show "Illegal instruction (core dumped)" errors in DBus logs
Successful builds: Complete without langtable-related errors
Error pattern: Service org.fedoraproject.Anaconda.Modules.Localization has failed to start: Process exited with status 132

*Root Cause*
Concurrent module startup: StartModulesTask starts multiple DBus modules simultaneously
Thread-unsafe C extension: langtable library is not safe for concurrent access
Race condition: Multiple processes call _build_layout_infos() during LocalizationService initialization
Memory corruption: Simultaneous access to langtable's C functions causes "Illegal instruction" errors

*Code Locations*
Race trigger: pyanaconda/modules/localization/localization.py:82 - self._layout_infos = _build_layout_infos()
Concurrent startup: pyanaconda/modules/boss/module_manager/start_modules.py:125-135
langtable usage: pyanaconda/localization.py:397-420 - _build_layout_infos() function

*Impact*
Build failures: ~40% of live media builds fail due to this race condition
Installation blocking: Prevents successful system installation
Resource waste: Failed builds consume time and resources

*Proposed Solutions*
Add synchronization: Implement locks around langtable usage
Lazy initialization: Move _build_layout_infos() to on-demand execution
Process isolation: Restrict langtable access to main process only
Retry mechanism: Add automatic retry for failed module starts

*Evidence Files*
Failed build logs: live/mock-live-l4564cra/build/logs/anaconda/dbus.log
Successful build logs: live/mock-live-rt9e6oqx/build/logs/anaconda/dbus.log
Source code: pyanaconda/modules/localization/localization.py

*Builds performed*
Build Results Summary with Timestamps

✅ SUCCESSFUL Builds (9/15 = 60%)
live/mock-live-ouzs7l0z ✅ 2025-08-15 20:30:31,096 - Results are in /build/results
live/mock-live-rt9e6oqx ✅ 2025-08-15 22:10:31,624 - Results are in /build/results
live/mock-live-4p_pgtlt ✅ 2025-08-15 23:16:53,699 - Results are in /build/results
live/mock-live-i4usnz84 ✅ 2025-08-16 00:22:39,141 - Results are in /build/results
live/mock-live-z84u4uj7 ✅ 2025-08-16 01:36:36,836 - Results are in /build/results
live/mock-live-1fivcx53 ✅ 2025-08-16 02:50:16,707 - Results are in /build/results
live/mock-live-6r6p6xnx ✅ 2025-08-16 04:03:33,094 - Results are in /build/results
live/mock-live-25p6kg_d ✅ 2025-08-16 05:17:10,352 - Results are in /build/results
live/mock-live-r8lwbe6s ✅ 2025-08-16 05:52:30,257 - Complete! (installation finished)

❌ FAILED Builds (6/15 = 40%)
live/mock-live-l4564cra ❌ 2025-08-15 16:15:51,069 - Running anaconda failed
live/mock-live-j8qj1qom ❌ 2025-08-15 21:06:53,178 - Running anaconda failed
live/mock-live-mnfl0m82 ❌ 2025-08-16 00:34:13,983 - Running anaconda failed
live/mock-live-0qtpriq3 ❌ 2025-08-16 01:47:45,440 - Running anaconda failed
live/mock-live-ahbl9ew7 ❌ 2025-08-16 03:01:47,062 - Running anaconda failed
live/mock-live-_arsq4gf ❌ 2025-08-16 04:15:01,157 - Running anaconda failed

Key Statistics
Total builds: 15
Successful: 9 (60%)
Failed: 6 (40%)
Failure rate: 40%

Timeline Analysis
Builds spanned: 2025-08-15 16:15 to 2025-08-16 05:52 (approximately 13.5 hours)
Failure pattern: Failures occurred throughout the timeline, not clustered
Success pattern: Successful builds also distributed across the timeline
Race condition: Intermittent failures confirm the race condition nature

The timestamps show that the race condition affects builds randomly throughout the build process, with no clear pattern based on timing, confirming it's a true race condition rather than a systematic issue.
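
To make the "Lazy initialization" and "Add synchronization" proposals above more concrete, here is a minimal sketch of building the layout data on first use behind a lock. It is not anaconda's code: `LazyLayoutInfos` and `build_layout_infos` are hypothetical names, and the builder is a stub standing in for the langtable-backed `_build_layout_infos()`.

```python
import threading


def build_layout_infos():
    """Hypothetical stand-in for the expensive, langtable-backed builder."""
    return {"us": "English (US)", "cz": "Czech"}


class LazyLayoutInfos:
    """Build the layout data on first access instead of at service start."""

    def __init__(self, builder):
        self._builder = builder
        self._lock = threading.Lock()
        self._value = None
        self._built = False

    def get(self):
        # The lock ensures that only one thread in this process runs the
        # builder; later callers reuse the cached result.
        with self._lock:
            if not self._built:
                self._value = self._builder()
                self._built = True
            return self._value


layout_infos = LazyLayoutInfos(build_layout_infos)
print(layout_infos.get()["us"])
```

Note that a `threading.Lock` only serializes threads inside one process. If the modules really do run as separate processes, as the report describes, this kind of lock would not coordinate them; deferring the call would only move the work away from the concurrent startup window.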
Created attachment 2103817
anaconda_race_condition_logs.tar.gz

This archive contains sanitized logs and configuration files from 15 live media creation attempts that demonstrate a race condition in pyanaconda's Localization module. The data shows a clear pattern of intermittent failures (46.7% failure rate) caused by concurrent access to the thread-unsafe langtable C extension library.

Contents
15 build directories with complete logs from both successful and failed builds
Sanitized log files: build.log, anaconda.log, dbus.log, packaging.log, storage.log
Sanitized kickstart files: live-aarch64.ks (personal information removed)
Timing analysis: Build timestamps showing success/failure patterns
Error evidence: "Illegal instruction" errors in failed builds

Key Evidence
Race Condition Pattern: 7 failed builds vs 8 successful builds over ~13 hours
Error Signature: "Illegal instruction" errors in Localization module initialization
Root Cause: Concurrent access to langtable library during _build_layout_infos() calls
Reproducibility: Consistent failure pattern across multiple builds

Sanitization Applied
✅ Personal usernames replaced with [USERNAME]
✅ SSH keys replaced with [SSH_KEY_REDACTED]
✅ Hostnames replaced with [HOSTNAME]
✅ Password hashes replaced with [PASSWORD_HASH_REDACTED]
✅ Large debug logs replaced with explanatory messages
✅ Post-installation scripts removed from kickstart files

Technical Details
Compression: Maximum pigz compression (6.4x ratio)
Files: 265 total files, 2.5M compressed from 16M original
Time Range: 2025-08-15 16:15 to 2025-08-16 05:17
Architecture: aarch64 builds
This is not limited to the Localization module. Other modules fail the same way:

Build: mock-live-0_rl_ix1
Failed Module: org.fedoraproject.Anaconda.Modules.Payloads
Error: Process org.fedoraproject.Anaconda.Modules.Payloads exited with status 139 (Segmentation fault)
Timing: Failed at 21:29:34 during concurrent module startup

Build: mock-live-7d5thxk3
Failed Module: org.fedoraproject.Anaconda.Modules.Storage
Error: Process org.fedoraproject.Anaconda.Modules.Storage exited with status 132 (Illegal instruction)
Timing: Failed at 22:36:41 during concurrent module startup

Build: mock-live-funuh0c9
Failed Module: org.fedoraproject.Anaconda.Modules.Services
Error: Process org.fedoraproject.Anaconda.Modules.Services exited with status 132 (Illegal instruction)
Timing: Failed at 21:25:53 during concurrent module startup
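
For anyone triaging more logs, the statuses above can be decoded mechanically. This is a rough sketch, assuming the logged status follows the shell convention of 128 plus the signal number, which matches the pairings reported here (132 with "Illegal instruction", 139 with "Segmentation fault"). The helper names are made up for illustration, and the sample lines are copied from this comment rather than from a real dbus.log.

```python
import re
import signal

# Matches the "exited with status N" fragment quoted in this bug.
STATUS_RE = re.compile(r"exited with status (\d+)")


def decode_exit_status(status):
    """Return the signal name for a status above 128, or None otherwise."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return None
    return None


def find_signal_exits(lines):
    """Collect (status, signal name) pairs from log lines."""
    results = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            status = int(match.group(1))
            results.append((status, decode_exit_status(status)))
    return results


sample = [
    "Process org.fedoraproject.Anaconda.Modules.Payloads exited with status 139",
    "Process org.fedoraproject.Anaconda.Modules.Storage exited with status 132",
]
for status, name in find_signal_exits(sample):
    print(status, name)
```

On Linux, signal 4 is SIGILL and signal 11 is SIGSEGV, so the same scan can be used to separate the two crash types across all the build directories.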
This is a dupe, but can you add any new information to the original bug? Thanks for looking into it.

Note the bug doesn't affect Kiwi, so a good way to avoid this is to build your lives with Kiwi. :D

*** This bug has been marked as a duplicate of bug 2247319 ***
Is this really a dupe, though? It looks to me like this report clearly finds the cause and suggests fixes, while the other bug is against libblockdev and not anaconda.

Also 'use kiwi' isn't really an option for some users, especially with something that's been working. It also seems that this bug could possibly hit regular installer users.
It seems like pretty clearly the same bug to me, yeah: SIGILL when building live images on aarch64 (only) with livemedia-creator. I agree that this bug seems to have a new diagnosis that's worth investigating, but that doesn't mean it's a different bug. Closing a later bug as a dupe of an earlier one isn't a judgement of how "good" the report is, it's just done to ensure the discussion doesn't split and everyone who is interested in the bug is aware of this new information.

The libblockdev assignment on the original bug was done in https://bugzilla.redhat.com/show_bug.cgi?id=2247319#c16 based on analysis of a backtrace that Kevin got, but after we dug through that for quite some time, we kinda started thinking around https://bugzilla.redhat.com/show_bug.cgi?id=2247319#c75 that we might be barking up the wrong tree. This definitely looks like a different tree to bark up.

> Also 'use kiwi' isn't really an option for some users

Is it not? We are kind of generally trying to push things lightly towards Kiwi, for various reasons: it's *one* tool that does a lot of the things we want, it's actively maintained by more than one person, there are multiple Fedora-aligned people who vaguely understand how it works (Neal, me, Kevin), and the kiwi template repository has *actual CI* so when we change something in it, we can tell whether it's broken before we break a real compose. If there are any remaining reasons to use lmc beyond "I'm used to it", I think we'd quite like to get rid of them for reasons far beyond this bug.