Bug 1644936

Summary: Installer images fail to reach graphical installer since 20181025.n.0 (Rawhide), 20181031.n.0 (F29)
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: anacondaAssignee: Miro Hrončok <mhroncok>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rawhideCC: anaconda-maint-list, awilliam, bkabrda, cagney, cstratak, dmalcolm, dustymabe, harald, hugh, jkonecny, jkysela, jonathan, kellin, kevin, klember, ksinny, lbalhar, mcyprian, mhroncok, mkolman, pbrobinson, peter.hutterer, pviktori, rkuska, robatino, rvykydal, shcherbina.iryna, skumari, tomspur, torsava, vanmeeuwen+fedora, vponcova, vtrefny, wwoods
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: anaconda-30.16-1 anaconda-29.24.7-2.fc29 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-24 04:31:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
anaconda.log (from Rawhide 20181030.n.0 openQA test fail)
none
dnf.librepo.log (from Rawhide 20181030.n.0 openQA test fail)
none
packaging.log (from Rawhide 20181030.n.0 openQA test fail)
none
program.log (from Rawhide 20181030.n.0 openQA test fail)
none
storage.log (from Rawhide 20181030.n.0 openQA test fail)
none
syslog (from Rawhide 20181030.n.0 openQA test fail)
none
X.log (from Rawhide 20181030.n.0 openQA test fail)
none
tarball of /var/log (from Rawhide 20181030.n.0 openQA test fail) none

Description Adam Williamson 2018-11-01 01:10:53 UTC
In openQA and local VM testing of current Rawhide and F29 nightly composes, installer images do not boot to the graphical installer. Instead they boot to a black screen with a cursor and nothing more.

So far I've no idea what's causing this. anaconda did not change in the relevant time frame. The most likely suspects are all the things that changed in *both* the Rawhide 20181025.n.0 compose *and* the F29 20181031.n.0 compose, which is these:

alsa-lib (new version 1.1.7-2.fc30 / 1.1.7-2.fc29)
cairo (new version 1.16.0-1.fc30 / 1.16.0-1.fc29)
dracut (new version 049-11.git20181024.fc30 / 049-11.git20181024.fc29)
libarchive (new version 3.3.3-1.fc30 / 3.3.3-1.fc29)
libinput (new version 1.12.2-1.fc30 / 1.12.2-1.fc29)
libtasn1 (new version 4.13-5.fc30 / 4.13-5.fc29)
python3 (new version 3.7.1-1.fc30 / 3.7.1-1.fc29)
rpm (new version 4.14.2.1-2.fc30 / 4.14.2.1-1.fc29)

of those, it's hard to guess which is the culprit, so I just picked one. Will CC all relevant maintainers.

You can download an image that suffers from this bug here, for investigation:

https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20181025.n.0/compose/Server/x86_64/iso/Fedora-Server-netinst-x86_64-Rawhide-20181025.n.0.iso

Comment 1 Adam Williamson 2018-11-01 01:24:09 UTC
Note the text install test boots OK and completes successfully, so this is specific to the *graphical* installer somehow. So far I haven't spotted any smoking guns in the logs.

Comment 2 Adam Williamson 2018-11-01 02:05:52 UTC
Created attachment 1499700 [details]
anaconda.log (from Rawhide 20181030.n.0 openQA test fail)

Comment 3 Adam Williamson 2018-11-01 02:06:25 UTC
Created attachment 1499701 [details]
dnf.librepo.log (from Rawhide 20181030.n.0 openQA test fail)

Comment 4 Adam Williamson 2018-11-01 02:07:00 UTC
Created attachment 1499702 [details]
packaging.log (from Rawhide 20181030.n.0 openQA test fail)

Comment 5 Adam Williamson 2018-11-01 02:07:56 UTC
Created attachment 1499703 [details]
program.log (from Rawhide 20181030.n.0 openQA test fail)

Comment 6 Adam Williamson 2018-11-01 02:08:25 UTC
Created attachment 1499704 [details]
storage.log (from Rawhide 20181030.n.0 openQA test fail)

Comment 7 Adam Williamson 2018-11-01 02:09:10 UTC
Created attachment 1499705 [details]
syslog (from Rawhide 20181030.n.0 openQA test fail)

Comment 8 Adam Williamson 2018-11-01 02:10:35 UTC
Created attachment 1499708 [details]
X.log (from Rawhide 20181030.n.0 openQA test fail)

Comment 9 Adam Williamson 2018-11-01 02:14:26 UTC
Created attachment 1499709 [details]
tarball of /var/log (from Rawhide 20181030.n.0 openQA test fail)

Comment 10 Charalampos Stratakis 2018-11-01 13:24:25 UTC
Adam is there anything we can do at the moment from the python side to help you with that? Or should we wait for more info/investigation?

Comment 11 Adam Williamson 2018-11-01 17:43:06 UTC
Well, I mean, *someone* needs to figure out what's actually going on. Anyone is more than welcome to try. :P It's hard to be more specific until we have some idea of exactly what's getting stuck and where.

I'll probably start swinging a fairly blunt ax at it soon, by doing custom image builds based on 1021.n.0 with one of the 'suspect' packages added at a time until it starts breaking...

Comment 12 Miro Hrončok 2018-11-01 19:10:41 UTC
BTW The Anaconda installer on a Live iso starts fine?

Comment 13 Adam Williamson 2018-11-01 19:13:18 UTC
I don't actually know, as both live images seem to encounter other issues in testing before they try and start anaconda. I'll test locally in a minute.

Comment 14 Adam Williamson 2018-11-01 19:35:12 UTC
Lives are really badly broken and have been since 1014.n.0, apparently - they just don't behave like live images at all. Workstation live boots to gnome-initial-setup, KDE live boots to sddm. I will file this separately, of course.

You can get into a new user session with the Workstation live. Running 'sudo liveinst' from that seems to get stuck during installer init, so that might well be the same problem as this bug.

Comment 15 Adam Williamson 2018-11-01 20:43:53 UTC
So it seems python3 really *is* the culprit. I built a test image locally with lorax from the 20181021.n.0 Rawhide tree; as expected, that boots fine. Then I ran an identical build except for adding a single side repo with python3-3.7.1-1.fc30 in it; that image has the bug, it does not boot to the installer.

So I guess next up is figuring out why Python 3.7.1 breaks anaconda!

Comment 16 Miro Hrončok 2018-11-01 21:11:46 UTC
Was that python3-3.7.0-10.fc30 -> python3-3.7.1-1.fc30,
or       python3-3.7.0-9.fc30  -> python3-3.7.1-1.fc30?

Comment 17 Miro Hrončok 2018-11-01 21:17:17 UTC
Answering my own (lazy) question: https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20181021.n.0/compose/Everything/x86_64/os/Packages/p/ has python3-3.7.0-10.fc30.


If you can share the instructions how to build the iso with a custom built package, I can try to bisect the CPython commit that started this.

Comment 18 Miro Hrončok 2018-11-01 21:21:04 UTC
In the meantime, I've kicked of a 3.7.0-11 scratch build to eliminate "something got changed in the buildroot" problem: https://koji.fedoraproject.org/koji/taskinfo?taskID=30598946

Comment 19 Miro Hrončok 2018-11-01 21:24:44 UTC
You can also grab simple-koji-ci built RPMs of 3.7.1rc1 and 3.7.1rc2 from:

https://src.fedoraproject.org/rpms/python3/pull-request/58
https://src.fedoraproject.org/rpms/python3/pull-request/60

Comment 20 Miro Hrončok 2018-11-01 21:27:46 UTC
(In reply to Miro Hrončok from comment #19)
> You can also grab simple-koji-ci built RPMs of 3.7.1rc1 and 3.7.1rc2 from:
> 
> https://src.fedoraproject.org/rpms/python3/pull-request/58
> https://src.fedoraproject.org/rpms/python3/pull-request/60

Oh, those got garbage collected already, so resubmitted:

rc1 https://koji.fedoraproject.org/koji/taskinfo?taskID=30599047
rc2 https://koji.fedoraproject.org/koji/taskinfo?taskID=30599059

Comment 21 Adam Williamson 2018-11-02 05:53:38 UTC
Building a custom installer image is quite easy actually. Install lorax, create a scratch directory, and do this:

setenforce Permissive
lorax -p Fedora -v Rawhide -r Rawhide -s https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20181021.n.0/compose/Everything/x86_64/os/ -s https://www.happyassassin.net/temp/repo/x86_64/ ./results/

The image winds up as ./results/images/boot.iso . Of course, that command creates an image based on Fedora-Rawhide-20181021.n.0 , with additional packages from https://www.happyassassin.net/temp/repo/x86_64/ . Create your own side repo and change the base as you desire. (I think you can use a local side repo too, but I've just always used a remote one as I have a pet web server to use anyway). Each time you want to re-do this, you have to delete results/ , lorax will bail if the result directory exists when you run it.

If you don't get to it first, I'll try some more custom images tomorrow.

Comment 22 Miro Hrončok 2018-11-02 08:27:18 UTC
Successfully built an rc1 iso.

I'll take this.

Comment 23 Miro Hrončok 2018-11-02 08:33:50 UTC
3.7.0rc1 boots into Anaconda (in GNOME Boxes).

Comment 24 Miro Hrončok 2018-11-02 09:01:20 UTC
3.7.0rc2 boots into black screen with a blinking _ cursor (shows Anaconda starting text and then this). I'll give it a couple more minutes (maybe it just takes a while to show up), but it seems long enough.

Comment 25 Miro Hrončok 2018-11-02 12:59:57 UTC
3b699932e5ac3e76031bbb6d700fbea07492641d is the first new commit
commit 3b699932e5ac3e76031bbb6d700fbea07492641d
Author: Miss Islington (bot) <31488909+miss-islington.github.com>
Date:   Sun Oct 7 00:09:59 2018 -0700

    bpo-6721: Hold logging locks across fork() (GH-4071) (#9291)
    
    bpo-6721: When os.fork() was called while another thread holds a logging lock, the child process may deadlock when it tries to log.  This fixes that by acquiring all logging locks before fork and releasing them afterwards.
    
    A regression test that fails before this change is included.
    
    Within the new unittest itself: There is a small _potential_ due to mixing of fork and a thread in the child process if the parent's thread happened to hold a non-reentrant library call lock (malloc?) when the os.fork() happens.  buildbots and time will tell if this actually manifests itself in this test or not.  :/  A functionality test that avoids that would be a challenge.
    
    An alternate test that isn't trying to produce the deadlock itself but just checking that the release and acquire calls are made would be the next best alternative if so.
    (cherry picked from commit 19003841e965bbf56fd06824d6093620c1b66f9e)
    
    Co-authored-by: Gregory P. Smith <greg> [Google]

:040000 040000 54bff49aed96e3c886b25711c9313a44f0c209c1 a5268f5ecd2b53aa62dc49a10637483cd629cc87 M	Lib
:040000 040000 de53a0eb6d67653599a6b0ce28f782112a257911 b077206b973a7ce6b696185c939d5f2739c42536 M	Misc


https://bugs.python.org/issue6721
https://github.com/python/cpython/commit/3b699932e5ac3e76031bbb6d700fbea07492641d
https://github.com/python/cpython/pull/4071

Comment 26 Miro Hrončok 2018-11-02 13:43:02 UTC
I have a 3.7.1 built with this commit reverted, however gnome boxes won't work any more.

So here it is again: https://koji.fedoraproject.org/koji/taskinfo?taskID=30608435


I wonder what Anaconda does that this affects it.

Comment 27 Adam Williamson 2018-11-02 15:31:29 UTC
Thanks a lot! I'm building a test image with that Python build now.

(I use virt-manager rather than Boxes, FWIW. It's a bit of a better fit for testing work.)

Comment 28 Miro Hrončok 2018-11-02 15:52:10 UTC
virt-manager works for me!

However my latest built iso is from 3b699932e and I cannot rebuild it ATM with lorax, because I am on train and the internet is too slow (and lorax seems to always download, no cache etc.).

Hope you'll get lucky with that iso. If not, I can try again once I get more stable connection.

Anyway, even if it works with it, I don't feel like reverting that commit is a "solution", more like an immediate workaround. How urgent is this, should we do it (assuming it works) and investigate later, or do we have time to investigate first?

Comment 29 Adam Williamson 2018-11-02 21:30:08 UTC
Confirmed a test image built with the Python 3 build linked above (https://koji.fedoraproject.org/koji/taskinfo?taskID=30608435 ) boots fine.

I would really like the revert for now, please - this is basically a showstopper bug for Rawhide testing, and we are *REALLY* trying to keep Rawhide basically functional from day to day. The problem if we leave a bug like this in is that none of the tests will get past the installer and so we won't have any idea what *other* bugs are creeping in while this one is being fixed; by the time this one is fixed, maybe ten other bugs have backed up "behind" it.

Of course, we can still treat investigating this issue as urgent, I'd just really like us not to leave Rawhide broken while we do that.

Thanks!

Comment 30 Miro Hrončok 2018-11-04 18:29:34 UTC
https://src.fedoraproject.org/rpms/python3/pull-request/63

I'll wait for the CI build, I'll rebuild an iso with the result, retest and push and build if it works.

Let me know about Fedora 29, it seems that it is not needed there (yet? ever?).

Comment 31 Adam Williamson 2018-11-04 23:00:43 UTC
We will need it for 29 too, as we compose regularly for Atomic. Those composes are already broken by this, since 3.7.1 update was pushed stable.

Comment 32 Martin Kolman 2018-11-05 18:19:36 UTC
Looks like there might be more cases either related or similar:

bug 1644936
- also stuck at UI startup
- fails both in GUI *and TUI*

bug 1646568
- Initial Setup on F29 IoT edition blocks boot
- UI startup is the last thing in logs but
  no UI can be seen

Comment 33 Fedora Update System 2018-11-05 20:21:42 UTC
python3-3.7.1-2.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-230c3c19f5

Comment 34 Miro Hrončok 2018-11-05 21:47:39 UTC
I can no longer work on this any more (I am sick).

Reassigning back to the default.

It seems like the logging antideadlock commit actually might have introduced some deadlock. I suggest we try a reproducer that doesn't involve building isos and we bring this to upstream.

Comment 35 Adam Williamson 2018-11-05 23:00:50 UTC
well, I think you can reproduce by just trying to run anaconda from a live image, if the live image doesn't have the python3 with the revert.

I suppose it may be possible to reproduce by just running initial-setup on an installed system with the non-reverted python3, too? I'll try and check on that soon.

Comment 36 Miro Hrončok 2018-11-06 20:11:39 UTC
FWIW anaconda does it's own logging + locks so I'll look in there:

https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/anaconda_logging.py

Comment 37 Fedora Update System 2018-11-06 22:01:28 UTC
python3-3.7.1-2.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-230c3c19f5

Comment 38 Adam Williamson 2018-11-08 19:47:17 UTC
Here's an odd thing: F29 updates-testing composes are still failing in what appears to be the same way, even though they include the python3-3.7.1-2.fc29 build. I'm not sure why this is, yet.

https://openqa.fedoraproject.org/tests/306219

This log seems to show python3-3.7.1-2.fc29 is in the installer build:

https://kojipkgs.fedoraproject.org/compose/updates/Fedora-29-updates-testing-20181108.0/logs/x86_64/AtomicHost/ostree_installer-3/lorax.log

really not sure what's going on there yet.

Comment 39 Fedora Update System 2018-11-09 06:02:34 UTC
python3-3.7.1-2.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 40 Sinny Kumari 2018-11-09 06:31:22 UTC
(In reply to Adam Williamson from comment #38)
> Here's an odd thing: F29 updates-testing composes are still failing in what
> appears to be the same way, even though they include the
> python3-3.7.1-2.fc29 build. I'm not sure why this is, yet.
> 
> https://openqa.fedoraproject.org/tests/306219
> 
> This log seems to show python3-3.7.1-2.fc29 is in the installer build:
> 
> https://kojipkgs.fedoraproject.org/compose/updates/Fedora-29-updates-testing-
> 20181108.0/logs/x86_64/AtomicHost/ostree_installer-3/lorax.log
> 
> really not sure what's going on there yet.

openqa test are fine now for AH ISO generated from 20181109 for both Fedora-29-updates (https://openqa.fedoraproject.org/tests/306830) and Fedora-29-updates-testing (https://openqa.fedoraproject.org/tests/306833).

I think reason of openqa test failure in previous fedora-29-updates-testing could be the Python version available in the buildroot.

If we look at root.log from runroot task for AH ISO from fedora-29-updates-tetsing compose 20181108 (https://kojipkgs.fedoraproject.org//work/tasks/7628/30747628/root.log contains python3-3.7.1-1.fc29) and 20181109 (https://kojipkgs.fedoraproject.org//work/tasks/7628/30747628/root.log contains python3-3.7.1-2.fc29) in buildroot becasue python3-3.7.1-2.fc29 was pushed to stable later before 20181109 compose run.

Comment 41 Adam Williamson 2018-11-09 06:50:38 UTC
hmm, still, it's odd that the actual installer build log shows the newer python...oh well, if it's working now, that's fine.

Comment 42 Sinny Kumari 2018-11-09 07:01:35 UTC
Yeah, even I was wondering if we are providing latest updates-testing repo (which should contain latest Python3 from updates-testing) to lorax for composing ISO, then why was the problem. So, from my debugging I can only think of having older Python3 in the buildroot affecting somehow.

Comment 43 Miro Hrončok 2018-11-09 09:51:44 UTC
OK. I've reset the urgency and removed the blocker status.

This is now "solved" however in a most ugly fashion.

Hence, keeping open.

Comment 44 Lumír Balhar 2018-11-14 08:16:12 UTC
I am trying to investigate it.

I am working with the latest live Workstation image and I am able to reproduce the problem, which means that:

- anaconda works properly with the fixed version of Python and
- anaconda GUI doesn't start when I use the "broken Python"

It seems that `anaconda` command runs in the background as `/usr/bin/python3 /sbin/anaconda --liveinst --method=livecd:/dev/mapper/live-base`.
Interesting is that this command works with both Python versions and I see only one difference between `anaconda` and the longer command - `anaconda` reboots a computer when quits but the longer command doesn't.

Is there any other difference between commands:
- anaconda
- /usr/bin/python3 /sbin/anaconda --liveinst --method=livecd:/dev/mapper/live-base
?

Comment 45 Lumír Balhar 2018-11-14 09:10:40 UTC
I've tested it again and it seems that there is a difference between commands `anaconda` and `/usr/bin/python3 /sbin/anaconda` because the later one works with both Python versions and the former one not.

Comment 46 Lumír Balhar 2018-11-15 08:42:02 UTC
After some more testing, I can say that there is no difference between executing `anaconda` and `/usr/bin/python3 /sbin/anaconda` but the behavior of anaconda itself isn't deterministic.

I did more than ten tests (all with the newest Python) and in a few of them, GUI started. What happened every time is that the anaconda window was registered into Gnome - it was visible in app switcher but the window itself was blank (transparent).

I don't know anaconda well enough to debug this issue more but I am willing to help anybody.

Comment 47 Miro Hrončok 2018-11-15 09:53:23 UTC
Anaconda team. Please help a fellow snake team.

We reverted an upstream commit in Python 3.7 to make the installer work. But we don't want to carry that revert patch forever.

What is that anaconda does, that this commit renders it broken?

Comment 48 Martin Kolman 2018-11-16 11:19:04 UTC
(In reply to Miro Hrončok from comment #47)
> Anaconda team. Please help a fellow snake team.
> 
> We reverted an upstream commit in Python 3.7 to make the installer work. But
> we don't want to carry that revert patch forever.
> 
> What is that anaconda does, that this commit renders it broken?
Looking at the Python commit I really suspects it deadlocks the Anaconda logging code, eq. a log() somewhere during Anaconda startup never returns due to the Python logging machinery getting stuck on something.

IIRC we already had something similar in Blivet some time ago due to lazy log evaluation:

log.debug("doing %s, self.foo)

Where foo was a property that when evaluated resulted in another attempt to log which deadlocked the whole thing. I'm not sure if it Blivet has a custom logging lock or it was something in Python itself causing the issue. Adding Vojta to CC to elaborate.

Also the Python commit mentions process forking so I'm wondering if the issue might be possibly related to Anaconda/Blivet or another library calling some utility with Subprocess & associated logging. Especially Blivet calls a lot of utilities during the startup phase when it is scanning available storage, which could easily trigger this if it is actually a race condition.

Comment 49 Vendula Poncova 2018-12-11 17:44:30 UTC
Anaconda freezes during import of modules, while we hold the imp's lock:

https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/core/util.py#L1359

I wanted to replace imp with importlib, but it is not so easy. However,
it seems to be enough to just remove the locking for now:

https://github.com/rhinstaller/anaconda/pull/1721

Comment 50 Vendula Poncova 2019-01-02 12:37:59 UTC
Hi Miro, is the suggested solution acceptable for the Python team, or should I investigate it further?

Comment 51 Charalampos Stratakis 2019-01-02 12:44:30 UTC
(In reply to Vendula Poncova from comment #50)
> Hi Miro, is the suggested solution acceptable for the Python team, or should
> I investigate it further?

Is it confirmed that python, without having the commit reverted, in conjunction with the locking removal, fixes the issue?

If so then I'd say it's acceptable.

Comment 52 Vendula Poncova 2019-01-04 14:23:05 UTC
(In reply to Charalampos Stratakis from comment #51)
> Is it confirmed that python, without having the commit reverted, in
> conjunction with the locking removal, fixes the issue?

I was testing anaconda with python3-3.7.1-1.fc30 and I wasn't able to reproduce the issue with the removed locking.

Comment 53 Miro Hrončok 2019-01-04 15:04:33 UTC
That anaconda is in the repos or just locally built?


I'll prep a python3 PR to remove the patch.

Comment 55 Miro Hrončok 2019-01-07 10:54:11 UTC
Here's a build to test it with: https://koji.fedoraproject.org/koji/taskinfo?taskID=31819177

I can do the test, but please answer my question: Does anaconda in rawhide contain the fix?

Thanks

Comment 56 Vendula Poncova 2019-01-07 11:38:08 UTC
(In reply to Miro Hrončok from comment #55)
> Here's a build to test it with:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=31819177

Ok, I will test it. Thanks!
 
> I can do the test, but please answer my question: Does anaconda in rawhide
> contain the fix?

The fix will be part of anaconda-30.16-1. We will release it this week.

Comment 57 Vendula Poncova 2019-01-07 14:08:41 UTC
(In reply to Vendula Poncova from comment #56)
> (In reply to Miro Hrončok from comment #55)
> > Here's a build to test it with:
> > https://koji.fedoraproject.org/koji/taskinfo?taskID=31819177
> 
> Ok, I will test it. Thanks!

I can confirm that Anaconda works with the fix and python3-3.7.2-2.fc30 from the build.

Comment 58 Miro Hrončok 2019-01-07 14:11:41 UTC
Thanks! Please report here once anaconda-30.16-1 is built in rawhide.

Will the fix go to Fedora 29 as well?

Comment 59 Adam Williamson 2019-01-07 16:05:19 UTC
Please don't build/tag 30.16 until https://bugzilla.redhat.com/show_bug.cgi?id=1663585 is sorted out, because it will break composes again.

Comment 60 Martin Kolman 2019-01-07 16:54:29 UTC
(In reply to Miro Hrončok from comment #58)
> Thanks! Please report here once anaconda-30.16-1 is built in rawhide.
> 
> Will the fix go to Fedora 29 as well?
We normally don't update Anaconda post Fedora GA & there are no install image respins post GA, so no.

(In reply to Adam Williamson from comment #59)
> Please don't build/tag 30.16 until
> https://bugzilla.redhat.com/show_bug.cgi?id=1663585 is sorted out, because
> it will break composes again.
I currently plan to build anaconda-30.16-1 tomorrow with fix for this bug and bug 1663585.

Comment 61 Peter Robinson 2019-01-07 16:58:02 UTC
(In reply to Martin Kolman from comment #60)
> (In reply to Miro Hrončok from comment #58)
> > Thanks! Please report here once anaconda-30.16-1 is built in rawhide.
> > 
> > Will the fix go to Fedora 29 as well?
> We normally don't update Anaconda post Fedora GA & there are no install
> image respins post GA, so no.

That's not entirely true. Atomic do two weekly releases, and the community respins a bunch of stuff even if it's not official.

Comment 62 Adam Williamson 2019-01-07 16:58:48 UTC
"there are no install image respins post GA, so no."

this is no longer strictly true; the two-week Atomic builds include an Atomic installer image. I'm not sure whether they build the image itself from post-release updates, though, or just use a *payload* generated from post-release packages but keep the *installer itself* built from the frozen release tree.

Comment 63 Miro Hrončok 2019-01-07 16:59:38 UTC
I'd rather drop that patch from Fedora 29 as well, can you make an exception?

Comment 64 Martin Kolman 2019-01-07 18:21:15 UTC
(In reply to Peter Robinson from comment #61)
> (In reply to Martin Kolman from comment #60)
> > (In reply to Miro Hrončok from comment #58)
> > > Thanks! Please report here once anaconda-30.16-1 is built in rawhide.
> > > 
> > > Will the fix go to Fedora 29 as well?
> > We normally don't update Anaconda post Fedora GA & there are no install
> > image respins post GA, so no.
> 
> That's not entirely true. Atomic do two weekly releases, and the community
> respins a bunch of stuff even if it's not official.

(In reply to Adam Williamson from comment #62)
> "there are no install image respins post GA, so no."
> 
> this is no longer strictly true; the two-week Atomic builds include an
> Atomic installer image. I'm not sure whether they build the image itself
> from post-release updates, though, or just use a *payload* generated from
> post-release packages but keep the *installer itself* built from the frozen
> release tree.

You are correct, I just didn't want to go into unnecessary details. We already did a couple post GA builds in the past for Atomic or recently for the IoT edition. But in all cases this was more or less exception due to the blocking/urgent nature of the fixes contained in the builds. We certainly do not mind fixing urgent issues in this way, especially if the fixes are minor and not very invasive.

If more is expected from the Anaconda team post GA it should documented somewhere so that we can take it into account during planing and possibly others might have to do so as well (Fedora QA ?). We already maintain 3 to 4 (2 RHEL & 1-2 Fedora versions) Anaconda versions in parallel, so possibly adding one more would be something to take into account.

In this concrete case if there is a good reason for the Python team to drop the workaround patch from Python for F29, then we could backport the Anaconda fix to F29 as well. Still I would recommend waiting a bit after the Anaconda fix has been in Rawhide for a bit in case some subtle regressions show up.

Comment 65 Adam Williamson 2019-01-07 18:31:51 UTC
I don't think anything "more" is expected, but as this is a potentially installer-breaking bug, obviously if the Python change is backported to F29, we'd want the fix for this backported too :) It certainly seems to count as an "urgent issue".

Comment 66 Miro Hrončok 2019-01-07 22:23:40 UTC
Why backport the fix to F29 and drop the patch form Python:

1. the reverted commit is a bugfix. somebody might get hit by this bug. if a fix for anaconda exists, I rather have that fix applied than carrying a potentially dangerous workaround by having this bugfix not applied.
2. we advertise Fedora as a platform for using various Python versions to test user's code - if users test on our 3.7 and it works, they might be surprised if it doesn't work somewhere else.
3. as much careful as we try to be, we might accidentally backport the commit that removes the workaround
4. Fedora 29 and master are still in sync git wise and I'd like to keep it that way if possible


I don't think many users will actually be affected by the bug the reverted commit fixes, so the severity of carrying that patch is not huge. The other two points are trivial.

I would not argue for this if the fix for Fedora 29 was particularly nasty or complicated. However this shall be simple backport of one very small commit.

I think that backporting a bugfix in one application (even an important one like anaconda) beats reverting a bugfix in a language interpreter. However that's juts my view and I'm awfully biased here :D

Comment 67 Martin Kolman 2019-01-08 11:34:12 UTC
Sounds like good reasons to backport the Anaconda fix. :)

Still, would suggest to do it after it is in Rawhide for a while, so maybe sometime next week ?

Comment 68 Miro Hrončok 2019-01-08 12:18:24 UTC
Definitively.

Comment 69 Martin Kolman 2019-01-08 12:58:05 UTC
anaconda-30.16-1 has been built with the Python fix and fix for bug 1663585:

https://koji.fedoraproject.org/koji/taskinfo?taskID=31892395

Comment 70 Miro Hrončok 2019-01-16 08:15:09 UTC
Now is the time I'd really love to fast forward the f29 branch. Can we have this bug fixed in anaconda and do a joined bodhi update please?

Comment 71 Martin Kolman 2019-01-16 13:52:32 UTC
(In reply to Miro Hrončok from comment #70)
> Now is the time I'd really love to fast forward the f29 branch. Can we have
> this bug fixed in anaconda and do a joined bodhi update please?

Sure, this is the F29 Anaconda build with the patch from Rawhide backported:
https://koji.fedoraproject.org/koji/taskinfo?taskID=32066291

As the Bodhi update will have to contain packages from 2 different groups of maintainers I think a proven package will have to create it.

Comment 72 Miro Hrončok 2019-01-16 14:56:53 UTC
Thank you. I'm in the provenpackagers group.

Comment 73 Fedora Update System 2019-01-16 21:10:26 UTC
anaconda-29.24.7-2.fc29 python3-3.7.2-4.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-00870e8bfc

Comment 74 Fedora Update System 2019-01-17 02:11:33 UTC
anaconda-29.24.7-2.fc29, python3-3.7.2-4.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-00870e8bfc

Comment 75 Miro Hrončok 2019-01-17 09:36:14 UTC
Adam, Martin, Vendula, Lumír, Sinny - Thank You all!

Comment 76 Fedora Update System 2019-01-24 04:31:01 UTC
anaconda-29.24.7-2.fc29, python3-3.7.2-4.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 77 D. Hugh Redelmeier 2019-04-02 22:37:54 UTC
This might be relevant https://bugzilla.redhat.com/show_bug.cgi?id=1691434

Some things deadlock in python3*-3.7.2-4.fc29.x86_64
that worked in python3*-3.7.1-4.fc29.x86_64

These don't involve anaconda.

What is "the python fix" mentioned in comment #69?
Is it https://github.com/python/cpython/commit/3b699932e5ac3e76031bbb6d700fbea07492641d ?  (That seems to cause problems.)
Or is it reverting that commit? (That seems to fix problems.)
Or something else?

Comment 78 Miro Hrončok 2019-04-03 00:15:44 UTC
The "the python fix" mentioned in comment #69 is reverting the reverting of that commit.

Comment 79 Cagney 2019-04-04 15:01:49 UTC
@DHR

The "fix" removed a lock:

- Don't acquire the imp's lock (#1644936) (vponcova)


+diff --git a/pyanaconda/core/util.py b/pyanaconda/core/util.py
+index 6106671c8..084c32da9 100644
+--- a/pyanaconda/core/util.py
++++ b/pyanaconda/core/util.py
+@@ -1354,7 +1354,6 @@ def collect(module_pattern, path, pred):
+         module_path = None
+ 
+         try:
+-            imp.acquire_lock()
+             (fo, module_path, module_flags) = imp.find_module(mod_name, [path])
+             module = sys.modules.get(module_pattern % mod_name)
+ 
+@@ -1427,8 +1426,6 @@ def collect(module_pattern, path, pred):
+             log.error("Failed to import module %s from path %s in collect: %s", mod_name, module_path, imperr)
+             continue
+         finally:
+-            imp.release_lock()
+-
+             if mod_info and mod_info[0]:  # pylint: disable=unsubscriptable-object
+                 mod_info[0].close()  # pylint: disable=unsubscriptable-object
+