It seems the exact wording of this oops message differs from the ones we normally expect. In this case, the important line indicating that an oops happened is this one:

    [ 1595.106182] Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP

but the PANIC_REGEX setting in the lab controller settings only matches on "Oops: " (note the colon). We can expand the pattern, of course, but I am curious why this message on arm64 is different and what other variants there might be.
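To illustrate the mismatch, here is a minimal Python sketch. The pattern is a simplified stand-in for only the "Oops: " alternative described above, not the actual PANIC_REGEX value, and the x86-style line is made up for comparison rather than taken from a real log.

```python
import re

# Simplified stand-in for the "Oops: " alternative in PANIC_REGEX
# (assumption: the real setting contains more alternatives than this).
colon_pattern = re.compile(r"Oops: ")

# The line from the arm64 console log quoted above.
arm64_line = ("[ 1595.106182] Internal error: Oops - SP/PC alignment "
              "exception: 8a000000 [#1] SMP")
# Illustrative x86-style line, not taken from a real log.
x86_line = "[  100.000000] Oops: 0002 [#1] SMP"

print(bool(colon_pattern.search(x86_line)))    # True
print(bool(colon_pattern.search(arm64_line)))  # False
```

The arm64 line has " - SP/PC alignment exception" between "Oops" and the colon, so a pattern anchored on "Oops: " never sees it.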
Adding jbastian to cc... Jeff, you have probably seen more than your fair share of arm64 oops messages :-) so I was wondering if you have an opinion here. Should we just add another pattern to Beaker's panic regex to match the "Internal error: Oops" string? I feel like we are fighting a bit of a losing battle if kernel developers keep using arbitrary formatting and spelling for their oops messages, but maybe there is nothing better we can do.
I was curious what abrt does, since it also catches oopses by reading kernel messages. It has quite a lengthy list of possible patterns it will match on: https://github.com/abrt/abrt/blob/faf826e9b76f9a0de0c2b046080cf792f1232668/src/lib/kernel.c#L77
However, I cannot see anywhere that it matches on the actual string "Oops" or anything like it. Maybe I am misreading that code.
Just from perusing the kernel source code, I see lots of arbitrary formatting on the ARM side:

    $ find . -type f | xargs grep Oops
    ...
    ./arm/kernel/traps.c: str = "Oops - BUG";
    ./arm/kernel/traps.c: arm_notify_die("Oops - undefined instruction", regs, &info, 0, 6);
    ./arm/kernel/traps.c: die("Oops - bad mode", regs, 0);
    ./arm/kernel/traps.c: arm_notify_die("Oops - bad syscall", regs, &info, n, 0);
    ./arm/kernel/traps.c: arm_notify_die("Oops - bad syscall(2)", regs, &info, no, 0);
    ./arm/kernel/traps.c: panic("Oops failed to kill thread");
    ./arm/mm/alignment.c: * Oops, we didn't handle the instruction.
    ./arm/mm/fault.c: * Oops. The kernel tried to access some page that wasn't present.
    ./arm/mm/fault.c: die("Oops", regs, fsr);
    ./arm64/kernel/traps.c: die("Oops - bad mode", regs, 0);
    ./arm64/kernel/traps.c: die("Oops - BUG", regs, 0);
    ./arm64/mm/fault.c: die("Oops", regs, esr);
    ./arm64/mm/fault.c: arm64_notify_die("Oops - SP/PC alignment exception", regs, &info, esr);
    ...

The x86 Oops messages are wrapped in a single function to give consistent formatting:

    ./x86/mm/fault.c: if (__die("Oops", regs, error_code))

The __die() function adds the colon:

    int __die(const char *str, struct pt_regs *regs, long err)
    {
    ...
        printk(KERN_DEFAULT "%s: %04lx [#%d]%s%s%s%s\n", str,
               err & 0xffff, ++die_counter,
    ...

I suppose a safe regex would be to search for the string Oops surrounded by word boundaries, e.g., \bOops\b
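As a quick check of the word-boundary idea, the following Python sketch runs \bOops\b over some of the message strings from the grep output above. The x86-style entry and the two non-matching strings are made up for illustration.

```python
import re

word_pattern = re.compile(r"\bOops\b")

# Message strings from the grep output, plus an illustrative x86-style
# line in the "%s: %04lx [#%d]" format that __die() prints.
messages = [
    "Oops - BUG",
    "Oops - undefined instruction",
    "Oops - bad mode",
    "Oops - bad syscall(2)",
    "Oops - SP/PC alignment exception",
    "Oops failed to kill thread",
    "Oops: 0002 [#1] SMP",
]
# Made-up strings that should not match: no word boundary after "Oops",
# or a different capitalization.
non_messages = ["Oopsie", "Whoops"]

matched = [m for m in messages if word_pattern.search(m)]
rejected = [m for m in non_messages if not word_pattern.search(m)]

print(len(matched) == len(messages))        # True
print(len(rejected) == len(non_messages))   # True
```

Note that the pattern matches the bare word wherever it appears, so any other console output containing "Oops" as a standalone word would match as well.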
At the risk of missing the mark with this fix or clashing with Dan's progress, I put up a patch: https://gerrit.beaker-project.org/c/5989/
Injecting the following string into the dmesg log results in the expected failure during a Beaker run:

    echo 'Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP' > /dev/kmsg
Beaker 25.0 has been released. Release notes are available upstream: https://beaker-project.org/docs/whats-new/release-25.html
This new pattern is too broad: it now triggers if someone puts "Oops" into their test case name (bug 1572880). We need to find a narrower one...
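The false positive can be reproduced with a short Python sketch. The test-name line is made up (the actual name from bug 1572880 is not quoted here). The `narrow` pattern is only a hypothetical candidate, not the pattern Beaker adopted. It assumes the message formats seen earlier in this thread, where "Oops" is followed by an optional " - description" and then ": <hex> [#N]".

```python
import re

# The broad word-boundary pattern discussed earlier in this bug.
broad = re.compile(r"\bOops\b")

# Hypothetical narrower candidate (an assumption for illustration only):
# "Oops", an optional " - description", then ": <hex> [#N]".
narrow = re.compile(r"\bOops(?: - [^:\n]+)?: [0-9a-fA-F]+ \[#\d+\]")

arm64_line = "Internal error: Oops - SP/PC alignment exception: 8a000000 [#1] SMP"
# Illustrative x86-style line, not taken from a real log.
x86_line = "Oops: 0002 [#1] SMP"
# Made-up console line with a test name that contains "Oops".
test_name_line = "Running test /kernel/examples/Oops-handling"

for line in (arm64_line, x86_line, test_name_line):
    print(bool(broad.search(line)), bool(narrow.search(line)))
# True True
# True True
# True False
```

A pattern like this would still need checking against the other architectures' oops formats before being trusted.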