952661 – [RFE] Provide a better failure message when Anaconda prompts for user input

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Beaker
Classification:	Retired
Component:	lab controller
Sub Component:
Version:	0.12
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	0.15.3
Assignee:	Dan Callaghan
QA Contact:	tools-bugs
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	980357
Depends On:	1054035
Blocks:

Reported:	2013-04-16 11:35 UTC by Petr Beňas
Modified:	2018-02-06 00:41 UTC
CC List:	10 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-02-03 04:51:33 UTC
Embargoed:

Comment 4 Nick Coghlan 2013-08-28 02:36:24 UTC

Even for kickstart installs, Anaconda will sometimes stop and prompt for user input on the serial console.

Similar behaviour is also seen if Anaconda drops into an interactive install for some reason (it asks the user to choose a language).

In Beaker, both of these appear as an External Watchdog Timeout when the install gets aborted.

Similar to the panic detection, it may be desirable to monitor the console log for output that looks like an installer prompt, and provide that information in the main UI (e.g. as a task result for /distribution/install) rather than requiring users to go look at the console log to determine that an anaconda prompt was triggered.

Comment 6 Dan Callaghan 2014-01-15 04:00:29 UTC

Does anybody have some sample output from Anaconda prompting during installation? The jobs from comment 0 have unfortunately expired.

I will need to devise some good heuristics for finding Anaconda errors/prompts. I suspect we can look for either the ASCII art or the wording about F12, which should be fairly consistent for curses installs up to RHEL6. For RHEL7 I'm less sure what we can look for. And we will also need to handle cmdline installs, such as on S/390. So I will need to collect as many example outputs as I can...

Comment 7 Dan Callaghan 2014-01-15 06:52:06 UTC

Another similar failure mode we may be able to detect here is when the installation completes but the harness was not installed, in which case the system sits there doing nothing until EWD.

Comment 8 Dan Callaghan 2014-01-15 08:08:02 UTC

I've assembled a collection of console logs from as many different Anaconda failure scenarios as I can think of:

http://fedorapeople.org/~dcallagh/bz952661-anaconda-failures/

The tl;dr is that there is very little in common between them all.

My first idea of looking for "<F12> next screen" is not going to work because that appears under normal circumstances too. We had the idea of looking for "<F12> next screen" followed by a pause of >5 minutes in output, to detect Anaconda displaying a screen and waiting, but the problem with that approach is that there are some circumstances (such as creating a filesystem on a very large volume) where there is no output for many minutes, and we don't want to detect that as an error.

Given the huge variety of different outputs from the various Anaconda versions (and all the various mangling from the serial consoles and Beaker's control char sanitization) I don't think there is any general way we can scrape the console log to detect the case where Anaconda is displaying a prompt or an error and waiting. I think the best we can do is devise some regexes, like the existing kernel panic detection, to match on certain hardcoded error strings which we know indicate that Anaconda has failed unrecoverably.
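
A minimal sketch of what such regex matching could look like, assuming the console log is available as a sequence of text lines. This is an illustration only, not Beaker's actual implementation: the function name and both pattern strings are invented for the example and are not taken from real Anaconda output.

```python
import re

# Hypothetical failure patterns, made up for this example.
# Real patterns would have to be derived from actual console logs.
FAILURE_PATTERNS = [
    re.compile(r"EXAMPLE: installer cannot continue"),
    re.compile(r"EXAMPLE: fatal error \d+"),
]

def find_install_failure(console_lines):
    """Return the first line matching a known failure pattern, or None."""
    for line in console_lines:
        for pattern in FAILURE_PATTERNS:
            if pattern.search(line):
                return line.strip()
    return None
```

Matching only on known strings avoids the false positives of the pause heuristic, at the cost of missing any failure nobody has written a pattern for yet.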

Comment 9 Nick Coghlan 2014-01-15 08:46:07 UTC

Could we potentially have a config subdirectory called "installfailed.d" or similar, and put files containing lists of regexes in there?

So rather than hardcoding them, our default list of regexes would go in there as "kickstart.conf", and users would be free to add additional regexes that they are confident indicate an install failure, without needing to update Beaker.

The other advantage of such an approach is that adding additional regex files would be straightforward if Beaker is ever updated to support other bootstrapping methods.
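
A rough sketch of how loading patterns from such a directory might work. The file format here (one regex per line, blank lines and lines starting with "#" ignored) is an assumption made for the example, as is the function name; nothing in this bug specifies either.

```python
import os
import re

def load_failure_patterns(directory):
    """Compile one regex per non-blank, non-comment line of each file in directory.

    Assumed format: one regex per line; lines starting with '#' are comments,
    so a pattern that itself begins with '#' would need to be escaped.
    """
    patterns = []
    # Sort the file names so the pattern order is deterministic.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    patterns.append(re.compile(line))
    return patterns
```

With this layout, adding patterns for another bootstrapping method would just mean dropping another file into the directory.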

Comment 10 Bill Peck 2014-01-15 13:41:36 UTC

When anaconda prompts like this does it also run %traceback?  I'm guessing not or you would have looked at using that.

Comment 11 Dan Callaghan 2014-01-16 00:38:25 UTC

(In reply to Bill Peck from comment #10)
> When anaconda prompts like this does it also run %traceback?  I'm guessing
> not or you would have looked at using that.

%traceback is interesting; I had never heard of it before. But I don't think it will help much here, since it only fires for unhandled exceptions. It doesn't seem to fire for errors that Anaconda handles itself by displaying a prompt or message to the user (which is most of the cases I could find).

Comment 12 Dan Callaghan 2014-01-16 00:39:41 UTC

(In reply to Nick Coghlan from comment #9)
> Could we potentially have a config subdirectory called "installfailed.d" or
> similar, and put files containing lists of regexes in there?

Right, I shouldn't have said hardcoded. I did intend that the regexes would be in a config file somewhere, like the panic regex currently is.

I like the idea of having a directory containing patterns though. That makes it easier to organise things.

Comment 13 Dan Callaghan 2014-01-17 03:57:48 UTC

On Gerrit: http://gerrit.beaker-project.org/2696

Comment 17 Dan Callaghan 2014-01-22 23:05:12 UTC

*** Bug 980357 has been marked as a duplicate of this bug. ***

Comment 19 Dan Callaghan 2014-01-23 22:45:26 UTC

(In reply to xjia from comment #18)
> However, this bug is verified. I'll list some issues (not bugs):
> 1. Why doesn't RHEL4 have failure patterns? I didn't see any patches for RHEL4.

The output of RHEL4 Anaconda looks mostly the same as that of RHEL3 Anaconda, so we don't need any specific patterns for RHEL4; it is covered by the RHEL3 ones.

> 2. On the job details page, the user can see the error message from
> console.log, so could we avoid special characters such as "┤"?

Ultimately the failure message we report back is a best effort only, and is never going to give an exact indication of why Anaconda failed. The user will need to look in the logs to understand what went wrong.

However, this patch improves the pattern matching so that we don't report those decorative characters, while still requiring them to be present in order to match:

http://gerrit.beaker-project.org/2726

That will make the messages look slightly less bizarre.
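
One way to get that behaviour, sketched here under assumptions rather than copied from the patch: require the box-drawing characters in the regex, but put only the text between them in a capture group and report just that group. The pattern and the names below are hypothetical.

```python
import re

# Hypothetical pattern: a box-drawing character must appear on each side
# for the line to match, but only the named group "message" is reported.
PATTERN = re.compile(r"[┤│]\s*(?P<message>[^┤├│]+?)\s*[├│]")

def extract_message(line):
    """Return the text between the decorative characters, or None if absent."""
    match = PATTERN.search(line)
    return match.group("message") if match else None
```

A line without the decoration does not match at all, so plain log output containing the same words is not reported as a failure.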

Comment 20 Nick Coghlan 2014-02-03 04:51:33 UTC

This change is included in the Beaker 0.15.3 maintenance release:

http://beaker-project.org/docs/whats-new/release-0.15.html#beaker-0-15-3
