+++ This bug was initially created as a clone of Bug #1499260 +++

I have prepared a F-27 installer image (from https://kojipkgs.fedoraproject.org/compose/branched/Fedora-27-20171004.n.0/compose/Everything/s390x/os/) and now I get a segfault when kernel switches to user-space.

================================================================================
[ 3.786004] Key type big_key registered
[ 3.787436] Key type encrypted registered
[ 3.787720] Freeing unused kernel memory: 664K
[ 3.787725] Write protected read-only-after-init data: 20k
[ 3.787728] rodata_test: all tests were successful
[ 3.790779] User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
[ 3.790786] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
[ 3.790788] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[ 3.790789] task: 00000000fafc8000 task.stack: 00000000fafc4000
[ 3.790791] User PSW : 0705200180000000 000003ff93c14e70
[ 3.790792]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
[ 3.790794] User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
[ 3.790795]            0000000000000000 0000000000000002 0000000000000000 000003ff00000000
[ 3.790796]            0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
[ 3.790797]            000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
[ 3.790805] User Code: 000003ff93c14e62: 60e0b030      std   %f14,48(%r11)
[ 3.790805]            000003ff93c14e66: 60f0b038      std   %f15,56(%r11)
[ 3.790805]           #000003ff93c14e6a: e5600000ff0e  tbegin 0,65294
[ 3.790805]           >000003ff93c14e70: a7740006      brc   7,3ff93c14e7c
[ 3.790805]            000003ff93c14e74: a7080000      lhi   %r0,0
[ 3.790805]            000003ff93c14e78: a7f40023      brc   15,3ff93c14ebe
[ 3.790805]            000003ff93c14e7c: b2220000      ipm   %r0
[ 3.790805]            000003ff93c14e80: 8800001c      srl   %r0,28
[ 3.790819] Last Breaking-Event-Address:
[ 3.790821]  [<000003ff93c14de4>] 0x3ff93c14de4
[ 3.790950] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[ 3.790950]
[ 3.790952] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
[ 3.790953] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[ 3.790953] Call Trace:
RUNNING DEVEL2

I have a guest running F-27 without problem, but what changed is that z/VM 6.4 hypervisor (updated last weekend to 6.4) now exposes the Transactional Execution bit (TE) to z/VM guests. It wasn't the case with the previously installed z/VM (6.1 or 6.3?)

Version-Release number of selected component (if applicable):
glibc-2.26-8.fc27.s390x

--- Additional comment from Carlos O'Donell on 2017-10-06 11:53:59 EDT ---

I find it suspicious that this is after a 'tbegin' instruction has started executing a transactional region.

Exactly what hardware is this and does it claim to support HWCAP_S390_TE?

--- Additional comment from Dan Horák on 2017-10-06 12:12:49 EDT ---

(In reply to Carlos O'Donell from comment #1)
> I find it suspicious that this is after a 'tbegin' instruction has started
> executing a transactional region.
>
> Exactly what hardware is this and does it claim to support HWCAP_S390_TE?

It's RH zEC12 which supports TE, and I read in z/VM 6.4 news that it brings "Guest Transactional Execution support". That's a change in our environment since last week.

--- Additional comment from Carlos O'Donell on 2017-10-06 12:26:07 EDT ---

(In reply to Dan Horák from comment #2)
> (In reply to Carlos O'Donell from comment #1)
> > I find it suspicious that this is after a 'tbegin' instruction has started
> > executing a transactional region.
> >
> > Exactly what hardware is this and does it claim to support HWCAP_S390_TE?
>
> It's RH zEC12 which supports TE, and I read in z/VM 6.4 news that it brings
> "Guest Transactional Execution support". That's a change in our environment
> since last week.

Is there any way to disable TE at the hardware level so the kernel doesn't report it and then see if this fixes the boot issue?

Otherwise I will have to rebuild F27 glibc for s390x with elision turned off until I get the upstream tunables in place.

--- Additional comment from Carlos O'Donell on 2017-10-06 12:26:53 EDT ---

(In reply to Carlos O'Donell from comment #3)
> (In reply to Dan Horák from comment #2)
> > (In reply to Carlos O'Donell from comment #1)
> > > I find it suspicious that this is after a 'tbegin' instruction has started
> > > executing a transactional region.
> > >
> > > Exactly what hardware is this and does it claim to support HWCAP_S390_TE?
> >
> > It's RH zEC12 which supports TE, and I read in z/VM 6.4 news that it brings
> > "Guest Transactional Execution support". That's a change in our environment
> > since last week.
>
> Is there any way to disable TE at the hardware level so the kernel doesn't
> report it and then see if this fixes the boot issue?
>
> Otherwise I will have to rebuild F27 glibc for s390x with elision turned off
> until I get the upstream tunables in place.

... if that's the issue.

--- Additional comment from Dan Horák on 2017-10-09 07:09:32 EDT ---

so with a glibc that correctly disables the lock elision (https://koji.fedoraproject.org/koji/taskinfo?taskID=22348456) the boot of the installation image continues correctly, without the segfault ...

--- Additional comment from Dan Horák on 2017-10-09 07:37:46 EDT ---

The installation then succeeds, it installs glibc-2.26-8.fc27.s390x from the Fedora repos and the installed system boots without an issue.

--- Additional comment from Dan Horák on 2017-10-12 06:03:07 EDT ---

for the record, rawhide compose has the same problem (https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20171011.n.0/compose/Server/s390x/os/images/)

--- Additional comment from Carlos O'Donell on 2017-10-13 12:44:59 EDT ---

It goes without saying that we are very interested in the root cause analysis of this issue with input from IBM, since this should "just work" (tm).

--- Additional comment from Dan Horák on 2017-10-16 08:21:09 EDT ---

There is (only) one difference I'm aware of between the working and failing scenarios: the installer boot is initiated from the CMS shell using the virtual card reader, while the installed system starts from CP using a DASD.
Cloning the original bug to allow further investigation.
Created attachment 1347225 [details]
full boot log

Boot log from a Rawhide installation started via zipl from a previous installation.

When the same glibc and kernel as in the installer are installed on an already installed system, the crash seen when running the installer does not occur. Installing the kernel after glibc ensures that the new glibc is included in the boot initramfs.
Switching to kernel, because

- the interruption code 0013 means a special-operation exception and PoP says for TBEGIN: "2. A special-operation exception is recognized and the operation is suppressed if the transactional-execution control, bit 8 of control register 0, is zero."

- the content of CR0 can be confirmed with

00: d xg0
00: CRG 0 = 0000000014846A12

after catching the exception with a trace (TR prog 13)

00: -> 0000000001000582' TBEGIN E5600000FF0E 0000000000000000
00: *** 0000000001000582' PROG 0013 -> 00000000008EF144"
00: SPECIAL OPERATION

The problem can be reproduced with something like

#include <stdio.h>

/* Start a transaction: returns 0 if it started, else the condition code. */
int tbegin(void)
{
	int ret;

	__asm__ __volatile__ (
		"	tbegin	0, 0xFF0E\n\t"
		"	jnz	0f\n\t"
		"	lhi	%0, 0\n\t"
		"	j	1f\n\t"
		"0:	ipm	%0\n\t"		/* fetch the condition code ... */
		"	srl	%0, 28\n\t"	/* ... and shift it into the low bits */
		"1:"
		: "=&d" (ret)
		:
		: "cc", "memory"		/* tbegin sets the condition code */
	);
	return ret;
}

/* End the transaction. */
void tend(void)
{
	__asm__ __volatile__ (
		"	tend\n\t"
		:
		:
		: "cc", "memory"
	);
}

int main(void)
{
	int t;

	t = tbegin();
	printf("tbegin=%d\n", t);
	tend();
	return 0;
}

linked statically (or dynamically) and used as /init.

CR0 content from another guest in the same z/VM instance:

[sharkcz@devel11 ~]$ sudo vmcp d xg0
[sudo] password for sharkcz:
CRG 0 = 0080000014846A12

I suspect the kernel doesn't explicitly enable TE (under some conditions) when it runs on a machine with TE available in the facility bits (like our zEC12 with z/VM 6.4.0).
The PPC kernel recently added a command-line option to disable transactional memory, see http://patchwork.ozlabs.org/patch/824763/

It might be useful for s390x to follow the same idea.
------- Comment From h.carstens.com 2017-11-08 16:38 EDT------- (In reply to comment #7) > PPC kernel added recently a command line option to disable transactional > memory - see http://patchwork.ozlabs.org/patch/824763/ > It might be useful for s390x to follow the same idea. Yes, indeed. This bug has proven that would be useful.
Created attachment 1349586 [details] test patch against v4.13 ------- Comment on attachment From h.carstens.com 2017-11-08 16:41 EDT------- I had just a short look into the code and suspect the missing initial enabling of the transactional execution facility early at startup to be the culprit. Later on the corresponding control register bit will be set at every task switch. Could you please try this, and let me know if it fixes the problem? The patch is against vanilla v4.13 but should easily apply to your kernel sources.
yes, the problem is gone, both with the minimal test case from comment #3 and also with Rawhide installer images recreated with an updated kernel.
Shouldn't something like this be applied as well?

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 164a1e16b53e..ad6e4a4c65dc 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -764,7 +764,7 @@ static int __init setup_hwcaps(void)
 	/*
 	 * Transactional execution support HWCAP_S390_TE is bit 10.
 	 */
-	if (test_facility(50) && test_facility(73))
+	if (test_facility(50) && test_facility(73) && __ctl_get_bit(0, 55))
 		elf_hwcap |= HWCAP_S390_TE;
 
 	/*

And how about KVM or qemu? Or arch/s390/kernel/ptrace.c (it checks for MACHINE_HAS_TE)?

Would you have an explanation why we saw the problem only in an installation, but not in already installed guests?
------- Comment From h.carstens.com 2017-11-09 07:08 EDT-------
(In reply to comment #10)
> yes, the problem is gone, both with the minimal test case from comment #3 and
> also with Rawhide installer images recreated with an updated kernel.

Thanks for testing. After looking a bit deeper into the code, the patch looks incomplete, however. I will mark the current patch obsolete and attach a new one. There are more bugs with respect to transactional execution.

> Shouldn't something like this be applied as well?

> * Transactional execution support HWCAP_S390_TE is bit 10.
> */
> - if (test_facility(50) && test_facility(73))
> + if (test_facility(50) && test_facility(73) && __ctl_get_bit(0, 55))
> elf_hwcap |= HWCAP_S390_TE;

As a cleanup this should be converted to something like

	if (MACHINE_HAS_TE)

But that doesn't fix any bug.

> And how about KVM or qemu? Or arch/s390/kernel/ptrace.c (it checks for
> MACHINE_HAS_TE)?

KVM + QEMU are fine. This is a "simple" kernel bug.

> Would you have an explanation why we saw the problem only in a installation,
> but did not in already installed guests?

Yes, as soon as the system runs, control register 0 is updated on every task switch and the facility gets enabled (if the following process is not a kernel thread). So you will see this behavior only on startup. Unless you use ptrace requests which disable it again... which is yet another bug. See new patch.

------- Comment From h.carstens.com 2017-11-09 07:11 EDT-------
Could you give the new patch also a try, please?
Created attachment 1349861 [details]
For more bugs within transactional execution

------- Comment on attachment From h.carstens.com 2017-11-09 07:10 EDT-------

There are several bugs with control register handling with respect to transactional execution:

- on task switch update_per_regs() is only called if the next task has an mm (is not a kernel thread). This however is incorrect. This breaks e.g. for user mode helper handling, where the kernel creates a kernel thread and then execve's a user space program. Control register contents related to transactional execution won't be updated on execve. If the previous task ran with transactional execution disabled then the new task will also run with transactional execution disabled, which is incorrect. Therefore call update_per_regs() unconditionally within switch_to().

- on startup the transactional execution facility is not enabled for the idle thread. This is not really a bug, but an inconsistency with other facilities. Therefore enable the facility if it is available.

- on fork the new thread's per_flags field is not cleared. This means that a child process inherits the PER_FLAG_NO_TE flag. This flag can be set with a ptrace request to disable transactional execution for the current process. It should not be inherited by new child processes in order to be consistent with the handling of all other PER related debugging options. Therefore clear the per_flags field in copy_thread_tls().
> ------- Comment From h.carstens.com 2017-11-09 07:11 EDT------- > Could you give the new patch also a try, please? yes, all looks still good with the new patch
And from the "Cc: <stable.org> # v3.7+" note in the patch, I assume we need the fix in RHEL-7 kernel as well, right? Hanns, will you manage it from the IBM side for the whole enterprise part, or shall I initiate the process by cloning this bug to RHEL?
------- Comment From h.carstens.com 2017-11-10 05:32 EDT------- Yes, a backport to RHEL-7 is also needed.
(In reply to Dan Horák from comment #12) > And from the "Cc: <stable.org> # v3.7+" note in the patch, I > assume we need the fix in RHEL-7 kernel as well, right? > > Hanns, will you manage it from the IBM side for the whole enterprise part, > or shall I initiate the process by cloning this bug to RHEL? . ... Dan, I will check at our side how to proceed for RHEL ...
(In reply to IBM Bug Proxy from comment #10) > Created attachment 1349861 [details] > For more bugs within transactional execution > > . fyi ... the upstream git commit in the s390 tree is https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 ("s390: fix transactional execution control register handling") .
and merged to mainline as https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. As kernel maintainers, we try to keep up with bugzilla, but due to the rate at which the upstream kernel project moves, bugs may be fixed without any indication to us. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.15.3-300.fc27. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE ************** This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you did actually update, we apologize for the inconvenience (there are a lot of bugs). If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.