Even for kickstart installs, Anaconda will sometimes stop and prompt for user input on the serial console. Similar behaviour is also seen if Anaconda drops into an interactive install for some reason (it asks the user to choose a language). In Beaker, both of these appear as an External Watchdog Timeout when the install gets aborted. Similar to the panic detection, it may be desirable to monitor the console log for output that looks like an installer prompt, and provide that information in the main UI (e.g. as a task result for /distribution/install) rather than requiring users to go and look at the console log to determine that an Anaconda prompt was triggered.
Does anybody have some sample output from Anaconda prompting during installation? The jobs from comment 0 have unfortunately expired. I will need to devise some good heuristics for finding Anaconda errors/prompts. I suspect we can look for either the ASCII art or the wording about F12, which should be fairly consistent for curses installs up to RHEL6. For RHEL7 I'm less sure what we can look for. And we will also need to handle cmdline installs, such as on S/390. So I will need to collect as many example outputs as I can...
Another similar failure mode we may be able to detect here is when the installation completes but the harness was not installed, in which case the system sits there doing nothing until EWD.
I've assembled a collection of console logs from as many different Anaconda failure scenarios as I can think of:

http://fedorapeople.org/~dcallagh/bz952661-anaconda-failures/

The tl;dr is that there is very little in common between them all. My first idea of looking for "<F12> next screen" is not going to work because that appears under normal circumstances too. We had the idea of looking for "<F12> next screen" followed by a pause of >5 minutes in output, to detect Anaconda displaying a screen and waiting, but the problem with that approach is that there are some circumstances (such as creating a filesystem on a very large volume) where there is no output for many minutes, and we don't want to detect that as an error.

Given the huge variety of different outputs from the various Anaconda versions (and all the various mangling from the serial consoles and Beaker's control char sanitization) I don't think there is any general way we can scrape the console log to detect the case where Anaconda is displaying a prompt or an error and waiting. I think the best we can do is devise some regexes, like the existing kernel panic detection, to match on certain hardcoded error strings which we know indicate that Anaconda has failed unrecoverably.
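To make the regex approach concrete, here is a minimal sketch of scanning a console log against a list of known-fatal patterns. It is not Beaker's actual implementation: the function name `find_install_failure` and the pattern strings are hypothetical placeholders, not real Anaconda output.

```python
import re

# Placeholder patterns for illustration only. These strings are made up and
# are NOT real Anaconda messages; a real list would be built from the
# collected console logs.
EXAMPLE_PATTERNS = [
    re.compile(r"^EXAMPLE-FATAL: (.*)$", re.MULTILINE),
    re.compile(r"example installer cannot continue"),
]


def find_install_failure(console_log, patterns=EXAMPLE_PATTERNS):
    """Return the text matched by the first pattern that hits, or None.

    Only strings known to indicate an unrecoverable failure are listed,
    so ordinary output such as "<F12> next screen" is never reported.
    """
    for pattern in patterns:
        match = pattern.search(console_log)
        if match is not None:
            return match.group(0)
    return None
```

Because nothing is inferred from pauses in the output, a long silent step such as creating a large filesystem cannot trigger a false positive under this scheme.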
Could we potentially have a config subdirectory called "installfailed.d" or similar, and put files containing lists of regexes in there? So rather than hardcoding them, our default list of regexes would go in there as "kickstart.conf", and users would be free to add additional regexes that they are confident indicate an install failure, without needing to update Beaker. The other advantage of such an approach is that adding additional regex files would be straightforward if Beaker is ever updated to support other bootstrapping methods.
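A rough sketch of how such a directory could be read, assuming a file format of one regex per line with blank lines and "#" comment lines ignored. That format, the function name `load_failure_patterns`, and the directory layout are assumptions for illustration, not something Beaker defines.

```python
import os
import re


def load_failure_patterns(directory):
    """Compile every regex found in the regular files under `directory`.

    Assumed (hypothetical) file format: one regex per line; blank lines
    and lines starting with "#" are skipped. Files are read in sorted
    order so the resulting pattern order is predictable.
    """
    patterns = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories and other non-files
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                patterns.append(re.compile(line))
    return patterns
```

With this layout, a site could drop an extra file next to the default one to add its own patterns, and the defaults would stay untouched across upgrades.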
When anaconda prompts like this does it also run %traceback? I'm guessing not or you would have looked at using that.
(In reply to Bill Peck from comment #10)
> When anaconda prompts like this does it also run %traceback? I'm guessing
> not or you would have looked at using that.

%traceback is interesting, I had never heard of it before. But I don't think it will help much here, since it only fires for unhandled exceptions. It doesn't seem to fire for errors that Anaconda handles itself by displaying a prompt or message to the user (which is most of the cases I could find).
(In reply to Nick Coghlan from comment #9)
> Could we potentially have a config subdirectory called "installfailed.d" or
> similar, and put files containing lists of regexes in there?

Right, I shouldn't have said hardcoded. I did intend that the regexes would be in a config file somewhere, like the panic regex currently is. I like the idea of having a directory containing patterns though. That makes it easier to organise things.
On Gerrit: http://gerrit.beaker-project.org/2696
*** Bug 980357 has been marked as a duplicate of this bug. ***
(In reply to xjia from comment #18)
> However, this bug is verified. I list some issues(not bug)
> 1. Why rhel4 don't have failure patterns? i didn't see any patches for rhel4.

The output of RHEL4 Anaconda looks mostly the same as that of RHEL3 Anaconda, so we don't need any RHEL4-specific patterns; the RHEL3 ones cover it.

> 2. Because on job details page, user could see the wrong message in
> console.log . So we could avoid the special character, such as "┤".

Ultimately the failure message we report back is best effort only, and is never going to give an exact indication of why Anaconda failed. The user will need to look in the logs to understand what went wrong. However, this patch improves the pattern matching so that we don't report those decorative characters, while still requiring them to be present in order to match:

http://gerrit.beaker-project.org/2726

That will make the messages look slightly less bizarre.
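For illustration, one way to require the decorative characters while leaving them out of the reported message is a capture group. This is only a sketch of the technique, not the pattern from the Gerrit change. The title layout "┤ ... ├", the name `extract_failure_message`, and the sample text are all assumptions.

```python
import re

# Hypothetical pattern: the box-drawing characters must be present for the
# match to succeed, but only the text between them is captured.
TITLE_PATTERN = re.compile(r"┤ (?P<message>[^├┤]+?) ├")


def extract_failure_message(console_log):
    """Return the captured title text without its decoration, or None."""
    match = TITLE_PATTERN.search(console_log)
    if match is None:
        return None
    return match.group("message")
```

Reporting `match.group("message")` rather than the whole match keeps the decoration as part of the matching condition but out of the result shown to the user.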
This change is included in the Beaker 0.15.3 maintenance release: http://beaker-project.org/docs/whats-new/release-0.15.html#beaker-0-15-3