Description of problem: The 389-ds project has found an issue that causes system instability in all 1.3.x and 1.4.x versions of the server on the i686 platform. This is a hardware limitation of the platform related to how we consume atomic types. It may lead to thread unsafety and other issues. Since there have been no reports of issues from users, the team has decided to remove i686 support rather than invest time in repairing an unused platform. As a result, we will be removing i686 support from 1.4.x. This will affect f28 and current rawhide.
The excluded arch will also affect freeIPA. As a consequence, freeIPA is going to drop i686 arch support for all server packages:

* freeipa-server
* python2-ipaserver
* python3-ipaserver
* freeipa-server-common
* freeipa-server-dns
* freeipa-server-trust-ad

The change does **not** affect client support. freeipa-client will still support i686. The freeIPA upstream ticket is https://pagure.io/freeipa/issue/7400.
The following comment is copied from bug 1544386, as this one is the right place to discuss it, IMHO. William, could you please respond when you are available?

(In reply to Carlos O'Donell from comment #9)
> (In reply to Christian Heimes from comment #8)
> > Mark,
> >
> > I was going through William's mails and noticed that only i686 is affected.
> >
> > ~~~~~~~~~~~~~~~~
> > William wrote in reply to this question:
> > > Is it only about i686 or problem is in all 32 bit architectures?
> > > Fedora also has "armv7hl"
> >
> > i686 only.
> > ~~~~~~~~~~~~~~~~
> >
> > We can lift the restriction and only exclude ix86 architecture. 32bit ARMv7 should be fine.
>
> I'm surprised it's an i686 issue.
>
> I just double checked 32-bit i686, s390, and ppc.
>
> On i686 the compiler effectively generates 'lock cmpxchg8b' to work with the 64-bit values atomically, no problems there.
>
> On s390 the compiler effectively generates 'cds' (compare double and swap) to work with the 64-bit values atomically, no problems there.
>
> On ppc the compiler generates out-of-line calls to __atomic_compare_exchange_8 and the like and call into libatomic (part of gcc), so you have to support linking that in for 32-bit ppc.
>
> There is a lot of use of __atomic_* operations in the code, along with the potential to use NSPR's PR_Atomic* operations also, so one has to know exactly which was selected at build time.
>
> However, I note that slapi_counter.c has at least some instances of relaxed memory ordering which will lose counter counts, so it has to be a rough estimate of count, but if you fall back to the pthread operations, the counts will be accurate. No comments seem to mention this in the API for slapi_counter_add, slapi_counter_subtract, slapi_counter_set_value, or slapi_counter_get_value.
>
> Likewise lfds711_porting_abstraction_layer_compiler.h uses relaxed memory ordering which could lead to multiple threads succeeding at LFDS711_PAL_ATOMIC_CAS (depending on the version used), but there the user in lfds711_freelist_push seems to couple this with load and store barriers to create synchronization points with other threads to be able to see their stores. So this looks more correct than the slapi_counter.c functions.
>
> However, vattr_map_entry_free certainly looks like it could have double-free calls to slapi_ch_free by using relaxed MO, and this looks like it could result in double calls to free() in glibc, which is undefined behaviour. This would need more investigation.
>
> Overall there are a lot of suspicious uses of relaxed MO in the code which might cause problems.
>
> I'm curious to hear what problems arose on i686 and what their root cause was.
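The double-free concern raised in the comment above can be illustrated with a minimal C11 sketch. It is not taken from the 389-ds source: `entry_t`, `entry_new`, `entry_acquire`, and `entry_release` are hypothetical names, and the pattern shown (only the thread that drops the last reference frees the object, with acquire-release ordering on the decrement) is one common way to avoid such a problem, not necessarily how vattr_map_entry_free is or should be written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted entry; NOT a 389-ds type. */
typedef struct entry {
    atomic_uint refcnt;
    int payload;
} entry_t;

/* Allocate an entry holding one reference; returns NULL if malloc fails. */
static entry_t *entry_new(int payload)
{
    entry_t *e = malloc(sizeof(*e));
    if (e == NULL) {
        return NULL;
    }
    atomic_init(&e->refcnt, 1);
    e->payload = payload;
    return e;
}

/* Take an additional reference. The caller must already hold one,
 * so relaxed ordering is sufficient for the increment itself. */
static void entry_acquire(entry_t *e)
{
    assert(e != NULL);
    atomic_fetch_add_explicit(&e->refcnt, 1, memory_order_relaxed);
}

/* Drop one reference. atomic_fetch_sub returns the previous value, so
 * exactly one caller observes 1 and becomes the only one to call free().
 * acq_rel makes earlier writes by other releasing threads visible to
 * the thread that frees. Returns true if this call freed the entry. */
static bool entry_release(entry_t *e)
{
    assert(e != NULL);
    if (atomic_fetch_sub_explicit(&e->refcnt, 1, memory_order_acq_rel) == 1) {
        free(e);
        return true;
    }
    return false;
}
```

With two references held, the first `entry_release` call returns false and the second returns true, so free() runs once regardless of which thread makes which call.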
> However, I note that slapi_counter.c has at least some instances of relaxed memory ordering which will lose counter counts, so it has to be a rough estimate of count, but if you fall back to the pthread operations, the counts will be accurate. No comments seem to mention this in the API for slapi_counter_add, slapi_counter_subtract, slapi_counter_set_value, or slapi_counter_get_value.
>
> Likewise lfds711_porting_abstraction_layer_compiler.h uses relaxed memory ordering which could lead to multiple threads succeeding at LFDS711_PAL_ATOMIC_CAS (depending on the version used), but there the user in lfds711_freelist_push seems to couple this with load and store barriers to create synchronization points with other threads to be able to see their stores. So this looks more correct than the slapi_counter.c functions.
>
> However, vattr_map_entry_free certainly looks like it could have double-free calls to slapi_ch_free by using relaxed MO, and this looks like it could result in double calls to free() in glibc, which is undefined behaviour. This would need more investigation.
>
> Overall there are a lot of suspicious uses of relaxed MO in the code which might cause problems.

Carlos: Are these i686 specific issues?
(In reply to Peter Robinson from comment #3)
> > Overall there are a lot of suspicious uses of relaxed MO in the code which might cause problems.
>
> Carlos: Are these i686 specific issues?

No, they are not specific to i686. These issues can affect all arches. I would strongly suggest removing all direct atomic manipulations, and instead simply using pthread locks until it can be shown they are too expensive and that bespoke lockless algorithms need to be used. Granted, trivial uses of atomics are OK, but there are some rather complex uses of relaxed MO which seem entirely incorrect.
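The suggestion to use pthread locks first could look roughly like the following C sketch of a mutex-protected 64-bit counter. The type and function names (`locked_counter_t`, `locked_counter_add`, and so on) are hypothetical and are not the real slapi_counter API; this only illustrates the lock-based approach under that assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical lock-protected counter; NOT the real slapi_counter API. */
typedef struct locked_counter {
    pthread_mutex_t lock;
    uint64_t value;
} locked_counter_t;

/* Initialise the mutex and set the counter to zero. */
static void locked_counter_init(locked_counter_t *c)
{
    assert(c != NULL);
    int rc = pthread_mutex_init(&c->lock, NULL);
    assert(rc == 0);
    (void)rc; /* silence unused warning when built with NDEBUG */
    c->value = 0;
}

/* Add n under the lock and return the new value. */
static uint64_t locked_counter_add(locked_counter_t *c, uint64_t n)
{
    uint64_t result;
    pthread_mutex_lock(&c->lock);
    c->value += n;
    result = c->value;
    pthread_mutex_unlock(&c->lock);
    return result;
}

/* Subtract n under the lock and return the new value. */
static uint64_t locked_counter_subtract(locked_counter_t *c, uint64_t n)
{
    uint64_t result;
    pthread_mutex_lock(&c->lock);
    c->value -= n;
    result = c->value;
    pthread_mutex_unlock(&c->lock);
    return result;
}

/* Read the current value under the lock. */
static uint64_t locked_counter_get(locked_counter_t *c)
{
    uint64_t result;
    pthread_mutex_lock(&c->lock);
    result = c->value;
    pthread_mutex_unlock(&c->lock);
    return result;
}
```

Because every read and write of `value` happens while the mutex is held, the code does not depend on the width of the platform's native atomic instructions, at the cost of a lock acquisition per operation.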
(In reply to Carlos O'Donell from comment #4)
> (In reply to Peter Robinson from comment #3)
> > > Overall there are a lot of suspicious uses of relaxed MO in the code which might cause problems.
> >
> > Carlos: Are these i686 specific issues?
>
> No, they are not specific to i686. These issues can affect all arches. I would strongly suggest removing all direct atomic manipulations, and instead simply using pthread locks until it can be shown they are too expensive and that bespoke lockless algorithms need to be used. Granted, trivial uses of atomics are OK, but there are some rather complex uses of relaxed MO which seem entirely incorrect.

...

I just want to reiterate that these are not my decisions to take, though. The package maintainer is responsible for these decisions. All I can do is review the code and offer advice and support.

I am looking into the claims that 64-bit atomics do not work on i686, though, since this would be a serious toolchain defect. This is being looked into here: https://bugzilla.redhat.com/show_bug.cgi?id=1544386#c18
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle. Changing version to '28'.
FESCo is trying to make a decision about this issue, and it would help us if we could get answers to a few questions:

0) Should the bug title be updated to state that this affects all 32-bit architectures, or does it only affect i686?

1) Are there any known real-world use cases where problems are detected, or are the problems only noticeable via tests (i.e., is it reproducible under real usage)?

2) Is there any evidence as to whether this issue affects primary architectures?

If you feel strongly that this should be dropped in F28, please file a change request for FESCo to review during our next meeting. Thanks!
(In reply to Randy Barlow from comment #8)
> FESCo is trying to make a decision about this issue, and it would help us if we could get answers to a few questions:
>
> 0) Should the bug title be updated to state that this affects all 32-bit architectures, or does it only affect i686?

It is apparently only on i686.

> 1) Are there any known real-world use cases where problems are detected, or are the problems only noticeable via tests (i.e., is it reproducible under real usage)?

It was discovered by internal cmocka testing. All of the code that is showing this problem is currently upstream (rawhide), and has yet to be tested in the real world by customers. So we really don't know the impact yet, but the tests consistently show a problem on i686.

> 2) Is there any evidence as to whether this issue affects primary architectures?

No.

> If you feel strongly that this should be dropped in F28, please file a change request for FESCo to review during our next meeting. Thanks!
Please file a change for this for Fedora 28.
(In reply to Randy Barlow from comment #10) > Please file a change for this for Fedora 28. This is already a f28 bug, so what exactly needs to be filed?
I believe we need a FESCo change filed: https://fedoraproject.org/wiki/Changes/Policy#For_developers
Viktor is correct - we want a Change filed for this as described on the Wiki he linked. This allows us to communicate this to our users (among other things), who would probably only otherwise find this BZ after they are affected.
Thanks, I'll get this filed.
https://fedoraproject.org/wiki/Changes/389-ds-base-remove-686

From what I'm reading, this is all I need to do for now. If I am missing something, please let me know.

Thanks!
Mark
Thanks Mark, but we also need to add:

- FreeIPA server will not be available on i686 due to this
- slapi-nis set of plugins will not be available on i686 due to this
- Upgrade of an i686 instance of Fedora with FreeIPA server will not be possible without fully uninstalling the FreeIPA replica

I don't think we should file a separate change for the other packages (freeipa, slapi-nis).
(In reply to Alexander Bokovoy from comment #16)
> Thanks Mark, but we also need to add:
>
> - FreeIPA server will not be available on i686 due to this
> - slapi-nis set of plugins will not be available on i686 due to this
> - Upgrade of i686 instance of Fedora with FreeIPA server will not be possible without fully uninstalling FreeIPA replica
>
> I don't think we should file separate change for other packages (freeipa, slapi-nis).

I'll add these points to the description section. Feel free to edit the doc as well (if you can).
Upstream ticket to request review of change request: https://pagure.io/releng/issue/7412
This has been fixed