Bug 1530832 - 389-ds-base removing i686 support from 1.4.x
Summary: 389-ds-base removing i686 support from 1.4.x
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: 389-ds-base
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: mreynolds
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: FE-ExcludeArch-x86, F-ExcludeArch-x86
Reported: 2018-01-04 00:00 UTC by wibrown@redhat.com
Modified: 2018-06-14 15:55 UTC

Fixed In Version: 389-ds-base-1.4.0.10-2.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-14 15:55:53 UTC
Type: Bug



Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1544386 None CLOSED freeipa-server removing i686 arch support in F28 2018-11-27 13:19:22 UTC

Internal Links: 1544386

Description wibrown@redhat.com 2018-01-04 00:00:17 UTC
Description of problem:
The 389-ds project has found an issue that causes system instability in all 1.3.x and 1.4.x versions of the server on the i686 platform. This is a hardware limitation of the platform related to how we consume atomic types. It may lead to thread-safety problems and other issues.

The team has decided that, since there have been "no reports" of issues, we would rather remove i686 support than invest time in repairing an unused platform.

As a result, we will be removing i686 support from 1.4.x. This will affect f28 and current rawhide.

Comment 1 Christian Heimes 2018-02-12 10:32:44 UTC
The excluded arch will also affect freeIPA. As a consequence, freeIPA is going to drop i686 arch support for all server packages:

* freeipa-server
* python2-ipaserver
* python3-ipaserver
* freeipa-server-common
* freeipa-server-dns
* freeipa-server-trust-ad

The change does **not** affect client support. freeipa-client will still support i686. The freeIPA upstream ticket is https://pagure.io/freeipa/issue/7400.

Comment 2 Alexander Bokovoy 2018-02-15 20:28:43 UTC
The following comment is copied from bug 1544386, as this one is the right place for the discussion, IMHO. William, could you please respond when you are available?

(In reply to Carlos O'Donell from comment #9)
> (In reply to Christian Heimes from comment #8)
> > Mark,
> > 
> > I was going through William's mails and noticed that only i686 is affected.
> > 
> > ~~~~~~~~~~~~~~~~
> > William wrote in reply to this question:
> > > Is it only about i686 or problem is in all 32 bit architectures?
> > > Fedora also has "armv7hl"
> > 
> > i686 only.
> > ~~~~~~~~~~~~~~~~
> > 
> > We can lift the restriction and only exclude ix86 architecture. 32bit ARMv7
> > should be fine.
> 
> I'm surprised it's an i686 issue.
> 
> I just double checked 32-bit i686, s390, and ppc.
> 
> On i686 the compiler effectively generates 'lock cmpxchg8b' to work with the
> 64-bit values atomically, no problems there.
> 
> On s390 the compiler effectively generates 'cds' (compare double and swap)
> to work with the 64-bit values atomically, no problems there.
> 
> On ppc the compiler generates out-of-line calls to
> __atomic_compare_exchange_8 and the like and call into libatomic (part of
> gcc), so you have to support linking that in for 32-bit ppc.
> 
> There is a lot of use of __atomic_* operations in the code, along with the
> potential to use NSPR's PR_Atomic* operations also, so one has to know
> exactly which was selected at build time.
> 
> However, I note that slapi_counter.c has at least some instances of relaxed
> memory ordering which will lose counter counts, so it has to be a rough
> estimate of count, but if you fall back to the pthread operations, the
> counts will be accurate. No comments seem to mention this in the API for
> slapi_counter_add, slapi_counter_subtract, slapi_counter_set_value, or
> slapi_counter_get_value.
> 
> Likewise lfds711_porting_abstraction_layer_compiler.h uses relaxed memory
> ordering which could lead to multiple threads succeeding at
> LFDS711_PAL_ATOMIC_CAS (depending on the version used), but there the user
> in lfds711_freelist_push seems to couple this with load and store barriers
> to create synchronization points with other threads to be able to see their
> stores.  So this looks more correct than the slapi_counter.c functions.
> 
> However, vattr_map_entry_free certainly looks like it could have double-free
> calls to slapi_ch_free by using relaxed MO, and this looks like it could
> result in double calls to free() in glibc, which is undefined behaviour.
> This would need more investigation.
> 
> Overall there are a lot of suspicious uses of relaxed MO in the code which
> might cause problems.
> 
> I'm curious to hear what problems arose on i686 and what their root cause
> was.
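
For illustration only, here is a minimal sketch of the kind of 64-bit atomic access discussed in the quoted comment, assuming a GCC-compatible compiler and its `__atomic_*` builtins. The helper names (`cas_u64`, `load_u64`, `u64_always_lock_free`) are hypothetical and are not taken from 389-ds-base. The sketch uses sequentially consistent ordering rather than relaxed ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names; none of these come from 389-ds-base. */

/* Strong 64-bit compare-and-swap with sequentially consistent ordering.
 * Returns true if *ptr held `expected` and now holds `desired`. */
bool cas_u64(uint64_t *ptr, uint64_t expected, uint64_t desired)
{
    assert(ptr != NULL);
    return __atomic_compare_exchange_n(ptr, &expected, desired,
                                       false,             /* strong, not weak */
                                       __ATOMIC_SEQ_CST,  /* order on success */
                                       __ATOMIC_SEQ_CST); /* order on failure */
}

/* Atomic 64-bit load, so a reader never observes a torn value. */
uint64_t load_u64(uint64_t *ptr)
{
    assert(ptr != NULL);
    return __atomic_load_n(ptr, __ATOMIC_SEQ_CST);
}

/* True if the compiler guarantees that 8-byte atomics never fall back to
 * locks on the build target (decided at compile time). */
bool u64_always_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(uint64_t), 0);
}
```

On targets where the compiler emits out-of-line calls instead of inline instructions (32-bit ppc in the comment above), a program using these helpers would also need to link against libatomic, for example with `-latomic`.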

Comment 3 Peter Robinson 2018-02-16 12:07:40 UTC
> However, I note that slapi_counter.c has at least some instances of relaxed
> memory ordering which will lose counter counts, so it has to be a rough
> estimate of count, but if you fall back to the pthread operations, the
> counts will be accurate. No comments seem to mention this in the API for
> slapi_counter_add, slapi_counter_subtract, slapi_counter_set_value, or
> slapi_counter_get_value.
> 
> Likewise lfds711_porting_abstraction_layer_compiler.h uses relaxed memory
> ordering which could lead to multiple threads succeeding at
> LFDS711_PAL_ATOMIC_CAS (depending on the version used), but there the user
> in lfds711_freelist_push seems to couple this with load and store barriers
> to create synchronization points with other threads to be able to see their
> stores.  So this looks more correct than the slapi_counter.c functions.
> 
> However, vattr_map_entry_free certainly looks like it could have double-free
> calls to slapi_ch_free by using relaxed MO, and this looks like it could
> result in double calls to free() in glibc, which is undefined behaviour.
> This would need more investigation.
> 
> Overall there are a lot of suspicious uses of relaxed MO in the code which
> might cause problems.

Carlos: Are these i686 specific issues?

Comment 4 Carlos O'Donell 2018-02-19 06:01:24 UTC
(In reply to Peter Robinson from comment #3)
> > Overall there are a lot of suspicious uses of relaxed MO in the code which
> > might cause problems.
> 
> Carlos: Are these i686 specific issues?

No, they are not specific to i686. These issues can affect all arches. I would strongly suggest removing all direct atomic manipulations and simply using pthread locks instead, until it can be shown that they are too expensive and that bespoke lockless algorithms need to be used. Granted, trivial uses of atomics are OK, but there are some rather complex uses of relaxed MO which seem entirely incorrect.
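
As a rough sketch of the suggestion above, a 64-bit counter can be protected with a plain pthread mutex instead of direct atomic operations. The type and function names below (`locked_counter` and friends) are hypothetical; the actual `Slapi_Counter` layout and API internals are not shown in this bug, so this is an assumption-laden illustration, not the project's implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter type; not the real Slapi_Counter. */
typedef struct {
    pthread_mutex_t lock; /* serialises every access to value */
    uint64_t value;
} locked_counter;

/* Initialise the counter to zero. Returns 0 on success, else an errno value. */
int locked_counter_init(locked_counter *c)
{
    assert(c != NULL);
    c->value = 0;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Add n under the lock and return the new value. */
uint64_t locked_counter_add(locked_counter *c, uint64_t n)
{
    uint64_t result;

    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    c->value += n;
    result = c->value;
    pthread_mutex_unlock(&c->lock);
    return result;
}

/* Read the current value under the lock. */
uint64_t locked_counter_get(locked_counter *c)
{
    uint64_t result;

    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    result = c->value;
    pthread_mutex_unlock(&c->lock);
    return result;
}

/* Release the mutex; the counter must not be used afterwards. */
void locked_counter_destroy(locked_counter *c)
{
    assert(c != NULL);
    pthread_mutex_destroy(&c->lock);
}
```

Because every read and write happens while the mutex is held, the behaviour does not depend on how the target architecture handles 64-bit atomic instructions. The trade-off is lock contention, which would need to be measured before replacing this with a lockless design.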

Comment 5 Carlos O'Donell 2018-02-19 06:03:19 UTC
(In reply to Carlos O'Donell from comment #4)
> (In reply to Peter Robinson from comment #3)
> > > Overall there are a lot of suspicious uses of relaxed MO in the code which
> > > might cause problems.
> > 
> > Carlos: Are these i686 specific issues?
> 
> No, they are not specific to i686. These issues can affect all arches. I
> would strongly suggest removal of all direct atomic manipulations, and
> instead simply use pthread locks until it can be shown they are too
> expensive and that bespoke lockless algorithms need to be used. Granted that
> trivial uses of atomics are OK, but there are some rather complex uses of
> relaxed MO which seem entirely incorrect.

... I just want to reiterate that these are not my decisions to make, though. The package maintainer is responsible for these decisions. All I can do is review the code and offer advice and support.

I am, however, looking into the claims that 64-bit atomics do not work on i686, since this would be a serious toolchain defect. This is being looked into here:
https://bugzilla.redhat.com/show_bug.cgi?id=1544386#c18

Comment 6 Fedora End Of Life 2018-02-20 15:21:25 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle.
Changing version to '28'.

Comment 8 Randy Barlow 2018-03-05 14:05:59 UTC
FESCo is trying to make a decision about this issue, and it would help us if we could get answers to a few questions:

0) Should the bug title be updated to state that this affects all 32-bit architectures, or does it only affect i686?
1) Are there any known real-world use cases where problems are detected, or are the problems only noticeable via tests (i.e., is it reproducible under real usage)?
2) Is there any evidence as to whether this issue affects primary architectures?

If you feel strongly that this should be dropped in F28, please file a change request for FESCo to review during our next meeting. Thanks!

Comment 9 mreynolds 2018-03-05 14:25:35 UTC
(In reply to Randy Barlow from comment #8)
> FESCo is trying to make a decision about this issue, and it would help us if
> we could get answers to a few questions:
> 
> 0) Should the bug title be updated to state that this affects all 32-bit
> architectures, or does it only affect i686?

It apparently only affects i686.

> 1) Are there any known real-world use cases where problems are detected, or
> are the problems only noticeable via tests (i.e., is it reproducible under
> real usage)?

It was discovered by internal cmocka testing.  All of the code that is showing this problem is currently upstream (rawhide), and has yet to be tested in the real world by customers.  So we really don't know the impact yet, but the tests consistently show a problem on i686.

> 2) Is there any evidence as to whether this issue affects primary
> architectures?

No.

> 
> If you feel strongly that this should be dropped in F28, please file a
> change request for FESCo to review during our next meeting. Thanks!

Comment 10 Randy Barlow 2018-03-26 02:34:55 UTC
Please file a change for this for Fedora 28.

Comment 11 mreynolds 2018-03-26 13:23:36 UTC
(In reply to Randy Barlow from comment #10)
> Please file a change for this for Fedora 28.

This is already a f28 bug, so what exactly needs to be filed?

Comment 12 Viktor Ashirov 2018-03-26 13:37:50 UTC
I believe we need FESCo change filed:

https://fedoraproject.org/wiki/Changes/Policy#For_developers

Comment 13 Randy Barlow 2018-03-26 13:46:34 UTC
Viktor is correct - we want a Change filed for this as described on the Wiki he linked. This allows us to communicate this to our users (among other things), who would probably only otherwise find this BZ after they are affected.

Comment 14 mreynolds 2018-03-26 13:51:26 UTC
Thanks, I'll get this filed.

Comment 15 mreynolds 2018-03-26 14:15:24 UTC
https://fedoraproject.org/wiki/Changes/389-ds-base-remove-686

From what I'm reading, this is all I need to do for now.  If I am missing something, please let me know.

Thanks!
Mark

Comment 16 Alexander Bokovoy 2018-03-26 14:23:41 UTC
Thanks Mark, but we also need to add:

 - FreeIPA server will not be available on i686 due to this
 - slapi-nis set of plugins will not be available on i686 due to this
 - Upgrade of i686 instance of Fedora with FreeIPA server will not be possible without fully uninstalling FreeIPA replica

I don't think we should file separate change for other packages (freeipa, slapi-nis).

Comment 17 mreynolds 2018-03-26 14:34:11 UTC
(In reply to Alexander Bokovoy from comment #16)
> Thanks Mark, but we also need to add:
> 
>  - FreeIPA server will not be available on i686 due to this
>  - slapi-nis set of plugins will not be available on i686 due to this
>  - Upgrade of i686 instance of Fedora with FreeIPA server will not be
> possible without fully uninstalling FreeIPA replica
> 
> I don't think we should file separate change for other packages (freeipa,
> slapi-nis).

I'll add these points to the description section.  Feel free to edit the doc as well (if you can).

Comment 18 mreynolds 2018-03-27 12:56:37 UTC
Upstream ticket to request review of change request:

https://pagure.io/releng/issue/7412

Comment 19 mreynolds 2018-06-14 15:55:53 UTC
This has been fixed

