Bug 1666112 - "beta nag" dialog sometimes does not appear for a long time after Continue button is pressed
Summary: "beta nag" dialog sometimes does not appear for a long time after Continue bu...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 31
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Anaconda Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-01-14 23:12 UTC by Adam Williamson
Modified: 2020-11-24 15:19 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-24 15:19:30 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Adam Williamson 2019-01-14 23:12:59 UTC
I've observed some odd failures in openQA in the last few weeks, like this:

https://openqa.stg.fedoraproject.org/tests/465123

The test fails because the 'beta nag' dialog takes a long time to appear after the Continue button is clicked on the language select screen.

openQA is set to wait 60 seconds, but sometimes, it takes longer than this before the dialog eventually appears, by which time openQA has given up waiting, assumed the screen is not going to appear, and gone on to looking for the hub.

While we're waiting, the UI just shows the welcome screen with the spinning 'busy' cursor. Then, eventually, the betanag dialog fades in.

I added some debug logging lines to welcome.py's _on_continue_clicked() to try and figure out what was going on:

    # Override the default in StandaloneSpoke so we can display the beta
    # warning dialog first.
    def _on_continue_clicked(self, window, user_data=None):
        log.info("XXX in welcome _on_continue_clicked")
        # Don't display the betanag dialog if this is the final release or
        # when autostep has been requested as betanag breaks the autostep logic.
        if not isFinal and not self.data.autostep.seen:
            log.info("XXX in welcome occ isfinal conditional 1")
            dlg = self.builder.get_object("betaWarnDialog")
            log.info("XXX in welcome occ isfinal conditional 2")
            with self.main_window.enlightbox(dlg):
                log.info("XXX in welcome occ isfinal conditional 3")
                rc = dlg.run()
                log.info("XXX in welcome occ isfinal conditional 4")
                dlg.hide()
                log.info("XXX in welcome occ isfinal conditional 5")
            if rc != 1:
                ipmi_abort(scripts=self.data.scripts)
                sys.exit(0)

and the anaconda.log messages are rather interesting:

22:07:34,286 DBG ui.common: Entered spoke: WelcomeLanguageSpoke
...
22:09:26,589 INF ui.gui.spokes.welcome: XXX in welcome _on_continue_clicked

and indeed, I was manually monitoring this test, and the 'beta nag' dialog finally actually appeared right around 22:09:27. Per openQA's log, it had clicked the 'Continue' button at 22:07:47.443 . So it took around 1:40 from clicking the 'Continue' button to the _on_continue_clicked() handler actually triggering.

It's a bit difficult to tell what we're actually waiting on for all that time, but one possibility is repository metadata refresh. If we look in the packaging.log file - https://openqa.stg.fedoraproject.org/tests/465123/file/_boot_to_anaconda-packaging.log - we see this:

22:07:35,494 INF packaging: configuring base repo
...
22:09:27,824 DBG packaging: repo updates: _sync_metadata success from https://mirrors.fedoraproject.org/metalink?repo=updates-released-frawhide&arch=x86_64

i.e. repo configuration took a long time, and finally completed right around the time the beta nag dialog appeared. It does clearly complete a second or so *later*, though, which doesn't *quite* fit in with the theory. It's also interesting that anaconda.log has these messages:

22:07:31,871 INF threading: Running Thread: AnaPayloadThread (140485204723456)
22:07:31,871 INF threading: Thread Done: AnaPayloadRestartThread (140485213116160)
22:07:31,872 DBG payload: Updating payload thread state: 0
22:07:31,873 DBG payload: Updating payload thread state: 1
...
22:07:35,494 DBG payload: Updating payload thread state: 2
22:07:35,494 DBG payload: Updating payload thread state: 4
...
22:07:36,174 DBG payload: Updating payload thread state: 5

The '5' state is the last thing we see apart from repeating "setting locale to: en_US.UTF-8" before we *finally* hit _on_continue_clicked a minute and a half later...but interestingly, after the '5' message, I cannot find a message for state "6" *or* a message for state "-1", even though the PayloadManager._run_thread() code looks like it should always proceed from 5 (STATE_GROUP_MD) to either 6 (STATE_FINISHED) or -1 (STATE_ERROR):

        # Gather the group data
        self._setState(self.STATE_GROUP_MD)
        payload.gatherRepoMetadata()
        payload.release()

        # Check if that failed
        if not payload.baseRepo:
            log.error("No base repo configured")
            self._error = self.ERROR_MD
            self._setState(self.STATE_ERROR)
            payload.unsetup()
            return

        # run payload specific post configuration tasks
        payload.postSetup()

        self._setState(self.STATE_FINISHED)

so, that's all a bit mysterious too.

If you're wondering *why* repo refresh takes so long, btw, I think it happens when metalink is out-of-date. Right now the 2019-01-14 compose is done and synced to the infra mirror (which is always the first choice for openQA tests) and many other mirrors, but the metalink - https://mirrors.fedoraproject.org/metalink?repo=fedora-rawhide&arch=x86_64 - is still serving out the checksums for the 2019-01-08 compose, so metadata refresh keeps failing until it finds an out of date mirror with the 2019-01-08 compose still on it. So the likelihood of this bug happening is, if it *is* the metadata refresh we're waiting on, affected by the mirror and metalink state.

Adding to the mysteriousness, I cannot recreate this behaviour in a local VM at all. If I just spin up a VM on my laptop and boot the same ISO - https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190114.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20190114.n.0.iso - then type 'english' in the language box and click 'Continue' (as the openQA test does) on the language select screen, the beta nag dialog appears instantly. I've tried to test if I can get it to block by waiting different amount of times before clicking Continue, but with no luck so far...

So there's a few mysteries here, but the basic behaviour does at least seem to happen quite a lot on openQA...

Comment 1 Vendula Poncova 2019-01-15 14:44:52 UTC
Hi, I am not able to reproduce the issue as well, which makes it super difficult to debug. So I can only offer a few wild guesses.

From anaconda.log:

22:07:40,514 DBG localization: setting locale to: en_US.UTF-8
22:07:40,528 DBG localization: setting locale to: en_US.UTF-8
22:07:40,539 DBG localization: setting locale to: en_US.UTF-8
22:07:40,555 DBG localization: setting locale to: en_US.UTF-8
22:07:40,592 DBG localization: setting locale to: en_US.UTF-8
22:07:40,597 DBG localization: setting locale to: en_US.UTF-8
22:07:40,895 DBG localization: setting locale to: en_US.UTF-8
22:07:41,133 DBG localization: setting locale to: en_US.UTF-8
22:07:42,163 DBG localization: setting locale to: en_US.UTF-8
22:07:42,169 DBG localization: setting locale to: en_US.UTF-8

This part of logs is very interesting, because I have no idea what could repeatedly trigger the function setup_locale in this case. It would be great to find out what causes it.

Also, the busy cursor suggests, that we are probably waiting for an action that is decorated with the timed_action decorator. Based on the code, the action, that could be related to the WelcomeSpoke, is the method on_entry_changed from the class LangLocaleHandler. Could something repeatedly change the text entry of the welcome spoke? I don't know.

Comment 2 Ben Cotton 2019-08-13 16:47:43 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 3 Ben Cotton 2020-11-03 17:01:58 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 4 Ben Cotton 2020-11-24 15:19:30 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.