Bug 1504264

Summary: hangs in mkfs.ext4/loop with 4.13.6-200.fc26.armv7hl+lpae
Product: Fedora
Component: kernel
Version: 27
Reporter: Kevin Fenzi <kevin>
Assignee: Peter Robinson <pbrobinson>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Status: CLOSED INSUFFICIENT_DATA
Severity: unspecified
Priority: unspecified
CC: airlied, bskeggs, dan, dustymabe, eparis, esandeen, hdegoede, herrold, ichavero, itamar, jarodwilson, jcm, jforbes, jglisse, jonathan, josef, jwboyer, kernel-maint, labbott, linville, mchehab, mjg59, ngompa13, nhorman, pbrobinson, quintela, steved
Flags: jforbes: needinfo?
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-08-29 15:18:02 UTC
Type: Bug
Bug Blocks: 245418    
Attachments:
  dmesg-w
  dmesg-t
  test-output

Description Kevin Fenzi 2017-10-19 20:03:20 UTC
We have 3 armv7 compose VMs that are used by the nightly rawhide and branched composes to do 'runroot' jobs. These are jobs where koji makes a mock chroot, runs some commands in it, and writes the results out to shared storage.

In the past we used hardware armv7 SoCs (Calxeda) for this, but those are end of life and limited, so we have moved to armv7 VMs on top of aarch64 VM hosts (HP Moonshot).

I switched over to using the new armv7 VMs on Oct 3rd and they worked fine.
Then on Oct 17th, I noticed that I had installed the normal 'kernel' on them instead of 'kernel-lpae', so they only saw 4 GB of memory. I updated the kernel and rebooted, and since the 17th they have been having this issue. So, without LPAE it's fine; with LPAE this issue happens.
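
For anyone reproducing this, a quick way to confirm which kernel variant a guest actually has installed and booted (the +lpae suffix in the release string is the giveaway):

  rpm -q kernel kernel-lpae   # which kernel variants are installed
  uname -r                    # e.g. 4.13.6-200.fc26.armv7hl+lpae when the LPAE kernel is running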

Here's a pstree of a hung instance: 

  |   `-kojid,29182 /usr/sbin/kojid --fg --force-lock --verbose
  |       `-mock,6571 -tt /usr/libexec/mock/mock -r koji/f28-build-10234478-800588 chroot -- /bin/sh -c...
  |           `-sh,6585 -c...
  |               |-sh,6586 -c...
  |               |   `-lorax,6591 /usr/sbin/lorax --product=Fedora --version=Rawhide --release=Rawhide...
  |               |       `-mkfs.ext4,12116 -L Anaconda -b 4096 -m 0 /dev/loop0
  |               `-tee,6587 /builddir/runroot.log

Here are the stuck processes: 

root       368  0.0  0.0  14756  1704 ?        D<sl 15:53   0:00 /sbin/auditd
root     32380  0.0  0.0      0     0 ?        D    18:51   0:03 [kworker/u8:1]
root     11377  0.0  0.0      0     0 ?        D<   19:26   0:00 [loop0]
root     11381  0.0  0.0   7616  6368 ?        D    19:26   0:00 mkfs.ext4 -L Anaconda -b 4096 -m 0 /dev/loop0
root     11568  0.0  0.0  18140  4772 ?        Ds   19:56   0:00 /usr/lib/systemd/systemd-journald
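
As an aside, a minimal way to see where a D-state task is blocked in the kernel, assuming the running kernel exposes /proc/<pid>/stack and sysrq is enabled (the PIDs are the ones from the listing above):

  cat /proc/11381/stack            # kernel stack of the hung mkfs.ext4
  cat /proc/11377/stack            # kernel stack of the loop0 worker
  echo w > /proc/sysrq-trigger     # or dump all blocked tasks to dmesg at once
  dmesg | tail -n 100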

The only thing in dmesg is systemd-journald getting upset: 

[Thu Oct 19 19:50:30 2017] systemd-journald[11565]: File /var/log/journal/d08c4416aa9845648e06760c48cc0292/system.journal corrupted or uncleanly shut down, renaming and replacing.
[Thu Oct 19 19:53:30 2017] systemd-journald[11567]: File /var/log/journal/d08c4416aa9845648e06760c48cc0292/system.journal corrupted or uncleanly shut down, renaming and replacing.
[Thu Oct 19 19:56:31 2017] systemd-journald[11568]: File /var/log/journal/d08c4416aa9845648e06760c48cc0292/system.journal corrupted or uncleanly shut down, renaming and replacing.
[Thu Oct 19 19:59:31 2017] systemd-journald[11610]: File /var/log/journal/d08c4416aa9845648e06760c48cc0292/system.journal corrupted or uncleanly shut down, renaming and replacing.

The VMs are running the 4.13.6-200.fc26.armv7hl+lpae kernel.
The VM host is on 4.5.0-15.4.2.el7.aarch64.

Comment 1 Jon Masters 2018-01-02 17:20:10 UTC
This made me ponder whether it's time to just cut the 32-bit builders over to being containers on 64-bit VMs. That might make for a much more easily supportable setup.

Comment 2 Peter Robinson 2018-01-02 23:55:33 UTC
(In reply to Jon Masters from comment #1)
> This made me ponder whether it's time to just cut the 32-bit builders over
> to being containers on 64-bit VMs. That might make for a much more easily
> supportable setup.

Define supportable; it would be nothing like any of our other infrastructure at this point in time.

Comment 3 Kevin Fenzi 2018-01-03 00:28:17 UTC
We aren't even using the systemd-nspawn mode of mock yet, so I fear we aren't ready to try and containerize builders. At the very least that mode breaks some image creation tasks. 

The host machines have been updated to the rhel7-alt kernel/userspace, and I've moved these armv7 VMs to Fedora 27. For a day or two I thought that fixed this issue, but it seems it just made it happen less often; we have still hit it since moving to F27 on the VMs. :( It seems to take a few days/composes before it hits. 

I suppose I could try Fedora 27 as the host OS? Or is there anything else I could gather to help track this down?

Comment 4 Kevin Fenzi 2018-01-18 23:51:36 UTC
Just a note that we are still hitting this. Daily rawhide composes hit these builders and sometimes hang them; rebooting them then lets the compose continue.

Comment 5 Kevin Fenzi 2018-02-15 17:16:54 UTC
We are still seeing this. 

The VMs now have: 4.14.8-300.fc27.armv7hl+lpae
The hosts are on: 4.11.0-44.2.1.el7a.aarch64

I am going to try installing the VMs on different storage (iSCSI if I can), and perhaps try running F27 on the host. Other ideas welcome.

Comment 6 Dan Horák 2018-02-16 13:19:40 UTC
If kdump/crashdump won't work on armv7, then the sysrq interface could provide some help. "w" and "t" commands look useful.

Comment 7 Dusty Mabe 2018-02-16 13:33:14 UTC
(In reply to Dan Horák from comment #6)
> If kdump/crashdump won't work on armv7, then the sysrq interface could
> provide some help. "w" and "t" commands look useful.

Yeah. And in case you haven't done this in a while, I wrote down how to send sysrq to a KVM guest here: 

https://dustymabe.com/2012/04/21/send-magic-sysrq-to-a-kvm-guest-using-virsh/
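
For reference, a minimal sketch of that approach from the VM host; the guest name buildvm-armv7-01 is just an example, and sysrq must be enabled inside the guest:

  # Alt+SysRq+W: show blocked (D state) tasks
  virsh send-key buildvm-armv7-01 --codeset linux KEY_LEFTALT KEY_SYSRQ KEY_W
  # Alt+SysRq+T: dump the state of all tasks
  virsh send-key buildvm-armv7-01 --codeset linux KEY_LEFTALT KEY_SYSRQ KEY_T
  # the output shows up in the guest's dmesg / serial console log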

Comment 8 Neal Gompa 2018-02-17 14:51:50 UTC
(In reply to Jon Masters from comment #1)
> This made me ponder whether it's time to just cut the 32-bit builders over
> to being containers on 64-bit VMs. That might make for a much more easily
> supportable setup.

Does the cross-section of hardware used for AArch64 servers running AArch64 VMs support armv7hl containers (that is, do they have support for 32-bit ARM instructions)? My experience thus far with several ARM servers (like the SoftIron ones) is that they lack that.
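
For what it's worth, a quick way to check this on a given host or guest, assuming util-linux's lscpu is available: an op-mode line of "32-bit, 64-bit" means the cores can run AArch32, while "64-bit" alone means they cannot.

  lscpu | grep -i 'op-mode'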

Comment 9 Kevin Fenzi 2018-02-18 18:33:53 UTC
Created attachment 1397637 [details]
dmesg-w

(In reply to Dan Horák from comment #6)
> If kdump/crashdump won't work on armv7, then the sysrq interface could
> provide some help. "w" and "t" commands look useful.

Here's dmesg-w and dmesg-t output.

Comment 10 Kevin Fenzi 2018-02-18 18:34:31 UTC
Created attachment 1397638 [details]
dmesg-t

dmesg-t

Comment 11 Laura Abbott 2018-02-19 19:05:31 UTC
Based on the backtraces, this smells like dirty page balancing getting stuck forever. Given this is 32-bit with highmem, there might be something off in the page calculations (especially since such things have happened in the past). I'll send an e-mail to linux-mm asking about this. In parallel it might be worth testing 4.15.

Comment 12 Laura Abbott 2018-02-19 20:22:32 UTC
Looking again, it seems like the writeback is just getting throttled a lot. We _might_ be hitting something fixed by https://patchwork.kernel.org/patch/10201593/, but given we know the underlying storage is slow, it might be worth testing on a different medium.

Comment 13 Kevin Fenzi 2018-02-19 20:28:32 UTC
I've updated the guests to 4.15.3-300.fc27.armv7hl+lpae.

Note that the underlying storage is an LV on the host backed by SSDs... so it shouldn't be all that slow. ;(

Comment 14 Laura Abbott 2018-02-26 22:32:23 UTC
I asked upstream and I got this response:

"How much dirtyable memory does the system have? We do allow only lowmem
to be dirtyable by default on 32b highmem systems. Maybe you have the
lowmem mostly consumed by the kernel memory. Have you tried to enable
highmem_is_dirtyable?"

Can we check/try adjusting the highmem_is_dirtyable setting?
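
For reference, the knob lives under /proc/sys/vm, so either interface below should work (a sketch of checking it and enabling it for the current boot only):

  cat /proc/sys/vm/highmem_is_dirtyable    # current value; 0 (only lowmem dirtyable) is the default
  sysctl -w vm.highmem_is_dirtyable=1      # allow highmem to count as dirtyable memory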

Comment 15 Kevin Fenzi 2018-02-26 22:43:59 UTC
ok. I have set that to 1 on the compose builders. Will see if the problem happens again.

Comment 16 Kevin Fenzi 2018-03-01 19:54:05 UTC
So, since setting that to 1: 

buildvm-armv7-01.arm.fedoraproject.org | SUCCESS | rc=0 | (stdout)  19:52:37 up 6 days, 19:41,  0 users,  load average: 0.01, 0.06, 0.62
buildvm-armv7-02.arm.fedoraproject.org | SUCCESS | rc=0 | (stdout)  19:52:37 up 6 days, 21:16,  0 users,  load average: 0.01, 0.07, 0.64
buildvm-armv7-03.arm.fedoraproject.org | SUCCESS | rc=0 | (stdout)  19:52:37 up 6 days, 58 min,  0 users,  load average: 0.00, 0.03, 0.51

No reboots needed, no hangs. ;) 

I guess this is something we just need to keep manually setting? Or is it something upstream would be willing to change the default on?
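
If we do end up keeping it, a sysctl.d drop-in along these lines would persist it across reboots; the file name is just an example:

  # /etc/sysctl.d/90-highmem-dirtyable.conf
  # allow highmem to be counted as dirtyable on the 32-bit LPAE build VMs
  vm.highmem_is_dirtyable = 1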

Comment 17 Laura Abbott 2018-03-01 19:56:45 UTC
If it's a tunable setting I think the preference is for us to set it, but I'll follow up with upstream, because multiple processes stuck in D state is a bad failure mode for a tuning setting.

Comment 18 Laura Abbott 2018-03-06 00:41:50 UTC
Upstream wanted a little bit more information. Can you run the scratch build https://koji.fedoraproject.org/koji/taskinfo?taskID=25509848 (which has one additional debugging patch applied), let it hang _WITHOUT_ the highmem_is_dirtyable setting, and then run 

---------- command line start ----------
# echo m > /proc/sysrq-trigger
# echo t > /proc/sysrq-trigger
# sleep 10
# echo m > /proc/sysrq-trigger
# echo t > /proc/sysrq-trigger
# sleep 10
# echo m > /proc/sysrq-trigger
# echo t > /proc/sysrq-trigger
---------- command line end ----------


Basically we want to collect the memory (sysrq-m) and task state (sysrq-t) output a couple of times to see if there is any change.
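
One way to script the sequence above once a hang is observed and capture everything in a single file (the output path is just an example, and this assumes sysrq is enabled in the guest):

  for i in 1 2 3; do
      echo m > /proc/sysrq-trigger   # dump memory info
      echo t > /proc/sysrq-trigger   # dump task states
      [ "$i" -lt 3 ] && sleep 10     # wait between samples
  done
  dmesg -T > /root/sysrq-output.txt  # collect the combined output with timestamps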

Comment 19 Kevin Fenzi 2018-03-18 19:13:50 UTC
Just as an update here: I booted one of our arm build VMs into this kernel last week and have been waiting for it to hang. So far it hasn't. Hopefully it will soon.

Comment 20 Kevin Fenzi 2018-03-23 14:20:53 UTC
Created attachment 1412138 [details]
test-output

Here's the output of the various sysrq commands and dmesg at the end.

Comment 21 Justin M. Forbes 2018-07-23 15:26:43 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 22 Justin M. Forbes 2018-08-29 15:18:02 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.