Bug 1247382 - i386 Rawhide install fails to boot
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i386
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Depends On:
Blocks:
 
Reported: 2015-07-27 16:47 EDT by Mike Ruckman
Modified: 2015-09-17 12:50 EDT
CC List: 21 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1250148
Environment:
Last Closed: 2015-09-17 12:50:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
full boot log (48.19 KB, text/plain)
2015-07-28 09:30 EDT, Paul Whalen
full boot log from a VM (377.21 KB, text/plain)
2015-07-28 13:10 EDT, Mike Ruckman

Description Mike Ruckman 2015-07-27 16:47:45 EDT
Description of problem:
Attempt to boot i386 Server DVD results in a crash.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Boot image
2. Crash

Actual results:
Crash

Expected results:
Boot to installer

Additional info:

transcribed logs from verification in a VM: http://fpaste.org/248730/02938314/

If you attempt to check the media at boot time it also fails to start with:
[FAILED] Failed to start checkisomd5@-dev-sr0.service

The checksum of my iso is valid.

Tested this on bare metal (dd to usb stick) and in a VM.
Comment 1 Fedora Blocker Bugs Application 2015-07-27 16:50:42 EDT
Proposed as a Blocker for 23-alpha by Fedora user roshi using the blocker tracking app because:

 This bug violates the following Alpha criterion: "All release-blocking images must boot in their supported configurations."
Comment 2 satellitgo 2015-07-27 16:54:45 EDT
Tested in F23 Virtual Machine Manager:

boot.iso i386 0727: "BUG: Bad page state in process kworker/u5:4 pfn:3619
...
page dumped because: page still charged to cgroup"

Workstation netinstall i386 failed in the same manner, at "Starting Switch Root".
Comment 3 Paul Whalen 2015-07-27 17:00:59 EDT
Also happening with ARM pxe installations:

[   53.203157] BUG: Bad page state in process kworker/u9:15  pfn:1016e
[   53.209480] page:ee5fb378 count:0 mapcount:0 mapping:ed3d497c index:0xac
[   53.216212] flags: 0x20021(locked|lru|mappedtodisk)
[   53.221172] page dumped because: page still charged to cgroup
[   53.226940] bad because of flags:
[   53.230287] flags: 0x21(locked|lru)
[   53.233824] page->mem_cgroup:ed83ac00
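The `flags:` values in these dumps are bitmasks of page flags. As a reading aid, here is a minimal Python sketch that decodes them. The bit positions are an assumption: PG_locked = 0 and PG_lru = 5 follow the kernel's page-flag ordering of this era, and PG_mappedtodisk = 17 is inferred from the log itself (0x20021 minus 0x21 is 0x20000, i.e. 1 << 17). Real numbering depends on kernel version and configuration, and `decode_page_flags` is a hypothetical helper, not a kernel tool.

```python
# Assumed bit positions (see the note above); the real layout is
# version- and config-dependent.
PAGE_FLAG_BITS = {
    0: "locked",
    5: "lru",
    17: "mappedtodisk",
}


def decode_page_flags(value: int) -> list[str]:
    """Return names of known flag bits set in value, lowest bit first.

    Any set bits not in PAGE_FLAG_BITS are reported as one 'unknown(...)' entry.
    """
    names = []
    for bit in sorted(PAGE_FLAG_BITS):
        if value & (1 << bit):
            names.append(PAGE_FLAG_BITS[bit])
    known_mask = sum(1 << bit for bit in PAGE_FLAG_BITS)
    unknown = value & ~known_mask
    if unknown:
        names.append(f"unknown(0x{unknown:x})")
    return names


if __name__ == "__main__":
    # Values copied from the "Bad page state" report above.
    for raw in (0x20021, 0x21):
        print(f"0x{raw:x} -> {'|'.join(decode_page_flags(raw))}")
```

Run on the two values from the report, it reproduces the kernel's own annotations, locked|lru|mappedtodisk and locked|lru, which is a quick sanity check on the assumed bit positions.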
Comment 4 Josh Boyer 2015-07-27 19:16:40 EDT
Which kernel version?  Do you have the full boot log?  That would be very helpful.

I'll note that the kernel team pushed i686 issues to community support some time ago.  Given that it happens on ARM as well, there are some grounds for a blocker, but that is jumping the gun a bit.  We don't even have sufficient information to start debugging it at this point.

Nobody has reported this on a 64-bit architecture that I'm aware of.
Comment 5 Paul Whalen 2015-07-28 09:30:27 EDT
Created attachment 1057029
full boot log
Comment 6 Paul Whalen 2015-07-28 09:30:45 EDT
F23 Alpha TC2 with kernel-4.2.0-0.rc3.git4.1.fc23. Full boot log attached.
Comment 7 Josh Boyer 2015-07-28 09:35:44 EDT
(In reply to Paul Whalen from comment #6)
> F23 Alpha TC2 with kernel-4.2.0-0.rc3.git4.1.fc23. Full boot log attached.

This is the first major oops in your case.  It looks ARM specific.  Everything after it is already suspect.

If you'd like, split your issue out into a separate bug.  We'll focus on i686 in this one.  rc3.git4.1 booted in our i686 KVM test setup and has booted elsewhere for others as well given we have submitted test results from it.

[   15.294300] ------------[ cut here ]------------
[   15.298919] kernel BUG at arch/arm/mm/highmem.c:114!
[   15.303877] Internal error: Oops - BUG: 0 [#1] SMP ARM
[   15.309011] Modules linked in: xgmac sata_highbank xts lrw gf128mul sha256_arm dm_crypt dm_round_robin linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor xor_neon async_tx raid6_pq raid1 raid0 scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs dm_multipath
[   15.340086] CPU: 2 PID: 598 Comm: dracut-rootfs-g Tainted: G        W       4.2.0-0.rc3.git4.1.fc23.armv7hl #1
[   15.350079] Hardware name: Highbank
[   15.353568] task: c62c8b80 ti: c6424000 task.ti: c6424000
[   15.358972] PC is at __kunmap_atomic+0x54/0x178
[   15.363503] LR is at copy_page_to_iter+0x15c/0x258
[   15.368289] pc : [<c0222f84>]    lr : [<c052d9f0>]    psr: 200b0013
[   15.368289] sp : c6425e78  ip : 00000020  fp : 8164aac0
[   15.379759] r10: c6425f0c  r9 : c6425f14  r8 : 00001000
[   15.384975] r7 : fff00000  r6 : 00000000  r5 : c0d47554  r4 : ffeff000
[   15.391493] r3 : 0001f000  r2 : ffee0000  r1 : 2c686000  r0 : ffeff000
[   15.398011] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   15.405135] Control: 10c5387d  Table: 0646404a  DAC: 00000015
[   15.410872] Process dracut-rootfs-g (pid: 598, stack limit = 0xc6424220)
[   15.417563] Stack: (0xc6425e78 to 0xc6426000)
[   15.421911] 5e60:                                                       c6425f0c 00003000
[   15.430080] 5e80: 00000000 c052d9f0 ef78f4d8 ffeff000 00001000 00000000 ecc4e348 00000000
[   15.438248] 5ea0: 00000002 00002000 c6425f28 ecd5e900 c6425f14 c0342ee4 000200da 00000000
[   15.446416] 5ec0: 00001000 00000000 ecc4e434 00001000 c6424000 ef78f4d8 00000002 ecd5e900
[   15.454585] 5ee0: 00000000 c6425f88 0000833d c020fae4 c6424000 00000200 00000000 c037fe30
[   15.462754] 5f00: 0000833d c020fae4 c6424000 81648ac0 0000833d 00000000 00002000 0000633d
[   15.470921] 5f20: c6425f0c 00000001 ecd5e900 00000000 00000000 00000000 00000000 00000000
[   15.479090] 5f40: 00000000 00000000 816488b0 81648ac0 ecd5e900 c6425f88 0000833d c03804f8
[   15.487259] 5f60: ecd5e900 81648ac0 0000833d ecd5e900 ecd5e900 81648ac0 0000833d c020fae4
[   15.495428] 5f80: c6424000 c0380dac 00000000 00000000 0000833d 816468f8 00000002 00000003
[   15.503596] 5fa0: 00000003 c020fad0 816468f8 00000002 00000003 81648ac0 0000833d 81648ac0
[   15.511765] 5fc0: 816468f8 00000002 00000003 00000003 0000833d 00000000 7f5f7b9c 00000000
[   15.519933] 5fe0: 00000000 bef285fc 7f5fb358 b6e26a70 60010010 00000003 00000000 00000000
[   15.528114] [<c0222f84>] (__kunmap_atomic) from [<c052d9f0>] (copy_page_to_iter+0x15c/0x258)
[   15.536549] [<c052d9f0>] (copy_page_to_iter) from [<c0342ee4>] (shmem_file_read_iter+0x218/0x2a0)
[   15.545420] [<c0342ee4>] (shmem_file_read_iter) from [<c037fe30>] (__vfs_read+0xb0/0xd8)
[   15.553506] [<c037fe30>] (__vfs_read) from [<c03804f8>] (vfs_read+0x8c/0x13c)
[   15.560632] [<c03804f8>] (vfs_read) from [<c0380dac>] (SyS_read+0x48/0x88)
[   15.567503] [<c0380dac>] (SyS_read) from [<c020fad0>] (__sys_trace_return+0x0/0x10)
[   15.575153] Code: e1a03603 e0632002 e1540002 0a000000 (e7f001f2) 
[   15.581240] ---[ end trace cb88537fdc8fa202 ]---
Comment 8 Zbigniew Jędrzejewski-Szmek 2015-07-28 11:14:17 EDT
(In reply to Mike Ruckman from comment #0)
> If you attempt to check the media at boot time it also fails to start with:
> [FAILED] Failed to start checkisomd5@-dev-sr0.service
This should be fixed already, bug #1241704.
Comment 9 Peter Robinson 2015-07-28 12:02:47 EDT
> This is the first major oops in your case.  It looks ARM specific. 
> Everything after it is already suspect.
> 
> If you'd like, split your issue out into a separate bug.  We'll focus on

Yep, that's fine.

> i686 in this one.  rc3.git4.1 booted in our i686 KVM test setup and has
> booted elsewhere for others as well given we have submitted test results
> from it.
> 
> [   15.294300] ------------[ cut here ]------------
> [   15.298919] kernel BUG at arch/arm/mm/highmem.c:114!

Just for reference the only commit that has gone anywhere near this file in recent history (5 Dec 14) is:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2cb7c9cb426660b5ed58b643d9e7dd5d50ba901f
Comment 10 Mike Ruckman 2015-07-28 13:10:21 EDT
Created attachment 1057077
full boot log from a VM

(In reply to Josh Boyer from comment #4)
> Which kernel version?  Do you have the full boot log?  That would be very
> helpful.

Kernel version: 4.2.0-0.rc3.git4.1.fc23. Full boot log attached.

> Nobody has reported this on a 64-bit architecture that I'm aware of.

As far as I know, this is only a 32-bit issue.
Comment 11 Adam Williamson 2015-07-28 14:38:43 EDT
Just to make sure it's clear, AFAIK everyone who's tried recent i686 F23 DVD / boot.iso images (Alpha TC2 and nightlies) on any hardware has hit this, it appears to be a showstopper for those images. It's not system specific, but it *is* particular to the i686 images, it does not happen with the x86_64 images.

i686 is still a Fedora primary architecture, so far as I'm aware. Showstoppers in primary arch release-blocking images are release blockers. If kernel team is sufficiently down on i686 that Fedora can't reasonably keep it as a primary arch that's obviously a conversation that needs to happen, but it seems like something that should be considered calmly and through appropriate processes ideally *outside* of an already-happening release cycle, not after the fact of us discovering blocker bugs...
Comment 12 Josh Boyer 2015-07-29 09:16:41 EDT
(In reply to Adam Williamson from comment #11)
> Just to make sure it's clear, AFAIK everyone who's tried recent i686 F23 DVD
> / boot.iso images (Alpha TC2 and nightlies) on any hardware has hit this, it
> appears to be a showstopper for those images. It's not system specific, but
> it *is* particular to the i686 images, it does not happen with the x86_64
> images.

Does it require physical hardware?  Our autotest setup boots 32-bit VM images for every build and has not seen this issue.

https://apps.fedoraproject.org/kerneltest/kernel/4.2.0-0.rc3.git4.1.fc23.i686+PAE
https://apps.fedoraproject.org/kerneltest/kernel/4.2.0-0.rc4.git0.1.fc23.i686+PAE
https://apps.fedoraproject.org/kerneltest/kernel/4.2.0-0.rc4.git1.1.fc23.i686+PAE


> i686 is still a Fedora primary architecture, so far as I'm aware.

Correct.

> Showstoppers in primary arch release-blocking images are release blockers.

Also correct.

> If kernel team is sufficiently down on i686 that Fedora can't reasonably

It has nothing to do with being "down" on anything.  It has everything to do with only having 3 people and having to make priority calls.

> keep it as a primary arch that's obviously a conversation that needs to
> happen, but it seems like something that should be considered calmly and
> through appropriate processes ideally *outside* of an already-happening
> release cycle, not after the fact of us discovering blocker bugs...

It isn't after the fact.  We announced quite a while ago that i686 kernel issues would be the lowest priority and we have needed a community behind them.  As far as I'm aware, Fedora is still a community-driven distribution.
Comment 13 Josh Boyer 2015-07-29 10:16:30 EDT
Has anyone determined the earliest ISO image that fails?
Comment 14 Paul Whalen 2015-07-29 10:31:54 EDT
kernel-4.2.0-0.rc3.git4.1.fc23 also boots fine on ARM (with the highmem OOPS). The 'Bad page state' bug only occurs during an installation attempt.
Comment 15 Mike Ruckman 2015-07-29 11:57:12 EDT
(In reply to Josh Boyer from comment #12)
> (In reply to Adam Williamson from comment #11)
> > Just to make sure it's clear, AFAIK everyone who's tried recent i686 F23 DVD
> > / boot.iso images (Alpha TC2 and nightlies) on any hardware has hit this, it
> > appears to be a showstopper for those images. It's not system specific, but
> > it *is* particular to the i686 images, it does not happen with the x86_64
> > images.
> 
> Does it require physical hardware?  Our autotest setup boots 32-bit VM
> images for every build and has not seen this issue.
> 
> https://apps.fedoraproject.org/kerneltest/kernel/4.2.0-0.rc3.git4.1.fc23.
> i686+PAE
> https://apps.fedoraproject.org/kerneltest/kernel/4.2.0-0.rc4.git0.1.fc23.
> i686+PAE
> https://apps.fedoraproject.org/kerneltest/kernel/4.2.0-0.rc4.git1.1.fc23.
> i686+PAE

It doesn't require physical hardware. I see the same results on 32-bit bare metal and when launching from virt-manager (which is where that boot log came from). I'll try to narrow down which kernel this started in.
Comment 16 Adam Williamson 2015-07-29 13:04:34 EDT
"We announced quite a while ago that i686 kernel issues would be the lowest priority and we have needed a community behind them."

Sure, but given that it didn't come along with a proposal to downgrade i686 from being a Fedora primary arch, I kind of took it as read that it didn't involve the possibility of not fixing release blocker bugs.

As Mike says, the bug is trivially reproducible in a VM but seems to be specific to the traditional installer images in some way (i.e. it's probably something specific they do during boot that triggers it). That's probably why your tests didn't hit it.

I'm (re-)setting up a boot.iso building environment now, so I can help with the requested testing for this. However, it's not very easy (or really even possible, I don't think) to build images with old kernels for testing - dnf is always going to want to install the most recent kernel, even if we provide a side repo with an older one it'll pull the newer one from the official repo when populating the environment.

If the problem cropped up some time since July 1 I should be able to bisect it down reasonably well, as we still have the nightlies from that date onwards archived; if it appeared before then, though, triaging it will be hard.
Comment 17 Josh Boyer 2015-07-29 13:12:42 EDT
(In reply to Adam Williamson from comment #16)
> "We announced quite a while ago that i686 kernel issues would be the lowest
> priority and we have needed a community behind them."
> 
> Sure, but given that it didn't come along with a proposal to downgrade i686
> from being a Fedora primary arch, I kind of took it as read that it didn't
> involve the possibility of not fixing release blocker bugs.

<ignoring this as it is irrelevant to this bug.  conversation to be had elsewhere.>

> As Mike says, the bug is trivially reproducible in a VM but seems to be
> specific to the traditional installer images in some way (i.e. it's probably
> something specific they do during boot that triggers it). That's probably
> why your tests didn't hit it.

Yes, possibly.  Also terrible if so, because the turnaround time is really long in such scenarios, but alas.

> I'm (re-)setting up a boot.iso building environment now, so I can help with
> the requested testing for this. However, it's not very easy (or really even
> possible, I don't think) to build images with old kernels for testing - dnf
> is always going to want to install the most recent kernel, even if we
> provide a side repo with an older one it'll pull the newer one from the
> official repo when populating the environment.

That will at least be helpful for testing side-builds of test kernels.  One might show up today.

> If the problem cropped up some time since July 1 I should be able to bisect
> it down reasonably well, as we still have the nightlies from that date
> onwards archived; if it appeared before then, though, triaging it will be
> hard.

If this bug has been sitting here for almost a month and this is the first report we've gotten on it, then it strikes me as a serious gap in coverage somewhere.  Here's hoping it came in more recently.
Comment 18 Adam Williamson 2015-07-29 13:22:03 EDT
Unfortunately the oldest bootable nightly we still have - 2015-07-04 - has this bug (07-01 to 07-03 had a syslinux bug and don't make it to the boot menu). So we can say it existed at least as early as kernel-4.2.0-0.rc0.git4.1.fc23 (which is what's in that nightly tree), but it's hard to be more precise. I'll see if I can do anything by building images off of F22.
Comment 19 Chris Murphy 2015-07-29 13:48:35 EDT
DMI: Dell Computer Corporation Latitude D600  /0G5152, BIOS A16 06/29/2005
On Fedora 22 (server) with these kernels, I cannot reproduce this bug.
4.2.0-0.rc3.git4.1.fc23.i686
4.2.0-0.rc3.git4.1.fc23.i686+debug
4.2.0-0.rc4.git0.1.fc23.i686

So it must be something rather unique about the install environment poking the kernel to trigger this.
Comment 20 Bruno Wolff III 2015-07-29 16:07:28 EDT
I tried the Fedora-Live-Games-i686-23-20150729.iso image on a usb drive and it booted fine. I can't conveniently test installs right now.
Comment 21 Adam Williamson 2015-07-29 17:11:05 EDT
So I did some package-level bisecting on this, building F22 images with F23 kernels on top.

Result is that an image with kernel-4.1.0-0.rc5.git0.1.fc23 works, an image with kernel-4.1.0-0.rc6.git0.1.fc23 hits the error (and it's definitely the same error, we checked). So this came in somewhere between 4.1rc5 and 4.1rc6, it appears.

For those keeping track of how much we suck at finding bugs - rc6 was built on 2015-06-01, so Rawhide/23 i686 installer images have likely been broken for about two months. For the record, this bug falls neatly in a gap between several automated testing systems: kernel team tests kernel images, *including* 32-bit ones, but doesn't test the installer environment. anaconda and QA teams both have tests for installer images, but both only test x86_64 ones at present. We did have several 'nominated' pre-Alpha TC test events, but it looks like no-one happened to choose to test the i686 installer images in those.
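The package-level bisection described here (and continued in Comments 23 and 25) is an ordinary binary search over an ordered list of kernel builds. A minimal Python sketch, assuming a single regression so that every build after the first failing one also fails; `first_bad` and the `boot_fails` predicate are hypothetical stand-ins for building an installer image with a given kernel and booting it.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered list of builds for the first failing one.

    Assumes builds[0] is known good, builds[-1] is known bad, and that once a
    build fails every later build also fails (a single regression).
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid
        else:
            good = mid
    return builds[bad]


if __name__ == "__main__":
    # rc5.git0 (known good) .. rc5.git9, then rc6.git0 (known bad).
    builds = [f"rc5.git{n}" for n in range(10)] + ["rc6.git0"]
    # Stand-in for a real image build and boot: it simply encodes the
    # result reported in Comment 25 (git6 onward fails).
    known_bad = set(builds[builds.index("rc5.git6"):])
    tested = []

    def boot_fails(build: str) -> bool:
        tested.append(build)
        return build in known_bad

    # prints: rc5.git6 after 3 boot tests
    print(first_bad(builds, boot_fails), "after", len(tested), "boot tests")
```

Each probe here corresponds to a full image build and boot, so halving the range on every test matters more than it would for a quick unit test.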
Comment 22 Felix Miata 2015-07-30 06:37:33 EDT
I just dnf distro-sync'd from F22 to F23 on i386 host gx27b. The new kernel/initrd 4.2.0-0.rc4 doesn't seem to boot any differently than F22's 4.1.2 did, ~53 seconds to reach a working multi-user.target shell prompt.
Comment 23 Josh Boyer 2015-07-30 10:27:35 EDT
(In reply to Adam Williamson from comment #21)
> So I did some package-level bisecting on this, building F22 images with F23
> kernels on top.
> 
> Result is that an image with kernel-4.1.0-0.rc5.git0.1.fc23 works, an image
> with kernel-4.1.0-0.rc6.git0.1.fc23 hits the error (and it's definitely the
> same error, we checked). So this came in somewhere between 4.1rc5 and
> 4.1rc6, it appears.

I have more kernels for testing (when they complete building)

rc5-git1: https://koji.fedoraproject.org/koji/taskinfo?taskID=10542372
rc5-git2: https://koji.fedoraproject.org/koji/taskinfo?taskID=10542418
rc5-git3: https://koji.fedoraproject.org/koji/taskinfo?taskID=10542528
rc5-git4: https://koji.fedoraproject.org/koji/taskinfo?taskID=10542589
rc5-git5: https://koji.fedoraproject.org/koji/taskinfo?taskID=10542687
rc5-git6: https://koji.fedoraproject.org/koji/taskinfo?taskID=10542748
rc5-git7: https://koji.fedoraproject.org/koji/taskinfo?taskID=10542800
rc5-git8: https://koji.fedoraproject.org/koji/taskinfo?taskID=10542841
rc5-git9: https://koji.fedoraproject.org/koji/taskinfo?taskID=10542886
Comment 24 Adam Williamson 2015-07-30 10:59:28 EDT
Felix: we already know that. There is something in the installer image boot process which triggers the kernel problem.
Comment 25 Adam Williamson 2015-07-30 16:42:08 EDT
rc5-git5 is good, rc5-git6 is bad.
Comment 26 Josh Boyer 2015-07-30 19:07:55 EDT
(In reply to Adam Williamson from comment #25)
> rc5-git5 is good, rc5-git6 is bad.

Here are the commits between those two snapshots:

[jwboyer@vader linux]$ git log --pretty=oneline c2102f3d73d8..0f1e5b5d19f6
0f1e5b5d19f6c06fe2078f946377db9861f3910d Merge tag 'dm-4.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
1c220c69ce0dcc0f234a9f263ad9c0864f971852 dm: fix casting bug in dm_merge_bvec()
15b94a690470038aa08247eedbebbe7e2218d5ee dm: fix reload failure of 0 path multipath mapping on blk-mq devices
e5d8de32cc02a259e1a237ab57cba00f2930fa6a dm: fix false warning in free_rq_clone() for unmapped requests
45714fbed4556149d7f1730f5bae74f81d5e2cd5 dm: requeue from blk-mq dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
4c6dd53dd3674c310d7379c6b3273daa9fd95c79 dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path
3a1407559a593d4360af12dd2df5296bf8eb0d28 dm: fix NULL pointer when clone_and_map_rq returns !DM_MAPIO_REMAPPED
4ae9944d132b160d444fa3aa875307eb0fa3eeec dm: run queue on re-queue
[jwboyer@vader linux]$ 

It is interesting to note that we're also carrying a patch in our 4.1 kernel for loop performance reasons that went into upstream 4.2.  That patch is blk-loop-avoid-too-many-pending-per-work-IO.patch which corresponds to upstream commit 4d4e41aef9429872ea3b105e83426941f7185ab6
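One of the commits listed above, "dm: fix casting bug in dm_merge_bvec()", is the one later confirmed as the trigger (Comments 28 and 29). Its code is not shown in this thread, so the Python sketch below is only a generic illustration of how a sector-to-byte conversion can go wrong when the result is narrowed to a 32-bit integer; it is not a reconstruction of dm_merge_bvec(). It assumes 512-byte sectors (a shift of 9) and models a C `int` that wraps modulo 2**32, as GCC does; `to_int32` and `sectors_to_bytes_int32` are hypothetical helpers.

```python
SECTOR_SHIFT = 9  # assumed 512-byte sectors


def to_int32(value: int) -> int:
    """Emulate storing value in a 32-bit two's-complement int (wraps modulo 2**32)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sectors_to_bytes_int32(sectors: int) -> int:
    """Hypothetical: byte count after the shifted value is narrowed to an int."""
    return to_int32(sectors << SECTOR_SHIFT)


if __name__ == "__main__":
    for sectors in (8, 2**22, 2**23):
        exact = sectors << SECTOR_SHIFT
        print(f"{sectors} sectors = {exact} bytes; as int32: {sectors_to_bytes_int32(sectors)}")
```

An 8-sector (4 KiB) request converts cleanly, but 2**22 sectors becomes negative and 2**23 sectors (4 GiB) wraps to exactly zero, with no error at the point of conversion.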
Comment 27 Josh Boyer 2015-07-31 07:34:49 EDT
More kernels to test, this time one for each commit mentioned above.

rc5.git5.2: https://koji.fedoraproject.org/koji/taskinfo?taskID=10557623 (0001-dm-run-queue-on-re-queue.patch)

rc5.git5.3: https://koji.fedoraproject.org/koji/taskinfo?taskID=10557656 (0002-dm-fix-NULL-pointer-when-clone_and_map_rq-returns-DM.patch)

rc5.git5.4: https://koji.fedoraproject.org/koji/taskinfo?taskID=10557701 (0003-dm-mpath-fix-leak-of-dm_mpath_io-structure-in-blk-mq.patch)

rc5.git5.5: https://koji.fedoraproject.org/koji/taskinfo?taskID=10557753 (0004-dm-requeue-from-blk-mq-dm_mq_queue_rq-using-BLK_MQ_R.patch)

rc5.git5.6: https://koji.fedoraproject.org/koji/taskinfo?taskID=10557799 (0005-dm-fix-false-warning-in-free_rq_clone-for-unmapped-r.patch)

rc5.git5.7: https://koji.fedoraproject.org/koji/taskinfo?taskID=10557857 (0006-dm-fix-reload-failure-of-0-path-multipath-mapping-on.patch)

rc5.git5.8: https://koji.fedoraproject.org/koji/taskinfo?taskID=10557909 (0007-dm-fix-casting-bug-in-dm_merge_bvec.patch)

Last night Mike mentioned that he thinks dm-fix-casting-bug-in-dm_merge_bvec would be the only thing relevant.  They should all complete building at a similar time, so testing them in either increasing or decreasing order is fine.
Comment 28 Adam Williamson 2015-07-31 14:26:38 EDT
So based on that I took an educated guess and tried rc5.git5.7 first - and indeed it boots. So it looks a lot like 0007-dm-fix-casting-bug-in-dm_merge_bvec.patch is the culprit indeed. I'll confirm that git5.8 doesn't boot in a few minutes.
Comment 29 Adam Williamson 2015-07-31 14:43:00 EDT
Yep, git5.8 fails, so that's definitely our culprit.
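Comment 28's inference (git5.7 boots, so 0007 is the likely culprit, confirmed by git5.8 failing) only holds if each build in Comment 27 carries all of the earlier patches as well. Assuming that cumulative layout, with rc5.git5.1 as the unpatched baseline (an assumption; the thread calls it rc5-git5), mapping a good/bad pair of builds to the responsible patch can be sketched in Python; `patches_in` and `culprit` are hypothetical helpers.

```python
# Patch order as listed in Comment 27. Assumption: build rc5.git5.N carries
# patches 0001 through 000(N-1) on top of an unpatched rc5.git5.1 baseline.
PATCHES = [
    "0001-dm-run-queue-on-re-queue.patch",
    "0002-dm-fix-NULL-pointer-when-clone_and_map_rq-returns-DM.patch",
    "0003-dm-mpath-fix-leak-of-dm_mpath_io-structure-in-blk-mq.patch",
    "0004-dm-requeue-from-blk-mq-dm_mq_queue_rq-using-BLK_MQ_R.patch",
    "0005-dm-fix-false-warning-in-free_rq_clone-for-unmapped-r.patch",
    "0006-dm-fix-reload-failure-of-0-path-multipath-mapping-on.patch",
    "0007-dm-fix-casting-bug-in-dm_merge_bvec.patch",
]


def patches_in(build: str) -> list[str]:
    """Patches assumed to be applied in a build named like 'rc5.git5.N'."""
    release = int(build.rsplit(".", 1)[1])
    return PATCHES[: release - 1]


def culprit(last_good: str, first_bad: str) -> str:
    """Return the one patch present in first_bad but not in last_good."""
    good = set(patches_in(last_good))
    extra = [p for p in patches_in(first_bad) if p not in good]
    if len(extra) != 1:
        raise ValueError(f"{last_good} -> {first_bad} adds {len(extra)} patches, expected 1")
    return extra[0]


if __name__ == "__main__":
    # Results from Comments 28 and 29: git5.7 boots, git5.8 does not.
    print(culprit("rc5.git5.7", "rc5.git5.8"))
```

If each build had instead carried only its own single patch, git5.7 booting would only have cleared patch 0006, and testing git5.7 first would not have pointed at 0007.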
Comment 30 Josh Boyer 2015-07-31 16:31:23 EDT
https://koji.fedoraproject.org/koji/taskinfo?taskID=10564890 is a scratch build of today's kernel with that commit reverted.
Comment 31 Adam Williamson 2015-07-31 16:36:24 EDT
I think we can call this an automatic blocker: it satisfies "Complete failure of any release-blocking TC/RC image to boot at all under any circumstance - "DOA" image (conditional failure is not an automatic blocker)". So far as reports indicate, there is no known circumstance in which the i686 netinst/DVD images boot to any kind of usable state.

https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Automatic_blockers
Comment 32 Adam Williamson 2015-08-01 01:24:13 EDT
The scratch build in #c30 works. Of course just reverting the commit probably isn't the right thing to do, but it does make it boot.
Comment 33 Josh Boyer 2015-08-03 10:52:04 EDT
(In reply to Adam Williamson from comment #32)
> The scratch build in #c30 works. Of course just reverting the commit
> probably isn't the right thing to do, but it does make it boot.

I've reverted it in F23 and rawhide with the 4.2-rc5 build.  Hopefully Joe and Mike come up with a better solution before 4.2 final is out.
Comment 34 Josh Boyer 2015-08-03 13:04:38 EDT
Adam, Mike has a similar yet more complete fix for this issue he'd like tested.  I've started another scratch build here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=10588836

Could you test that out when it completes?
Comment 35 Adam Williamson 2015-08-03 20:49:16 EDT
The build from #c34 seems to work, yep.
Comment 36 Fedora Update System 2015-08-04 03:03:36 EDT
kernel-4.2.0-0.rc5.git0.2.fc23 has been submitted as an update for Fedora 23.
https://admin.fedoraproject.org/updates/kernel-4.2.0-0.rc5.git0.2.fc23
Comment 37 Fedora Update System 2015-08-04 20:12:16 EDT
Package kernel-4.2.0-0.rc5.git0.2.fc23:
* should fix your issue,
* was pushed to the Fedora 23 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-4.2.0-0.rc5.git0.2.fc23'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-12746/kernel-4.2.0-0.rc5.git0.2.fc23
then log in and leave karma (feedback).
Comment 38 Fedora Update System 2015-08-06 02:01:56 EDT
kernel-4.2.0-0.rc5.git0.2.fc23 has been pushed to the Fedora 23 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 39 Adam Williamson 2015-09-11 14:50:24 EDT
This seems to have re-appeared in Rawhide on 2015-09-04. Rawhide 2015-09-03 had kernel-4.2.0-1.fc24 , Rawhide 2015-09-04 had kernel-4.3.0-0.rc0.git6.1.fc24 . The 32-bit openQA tests on 09-03 did not hit this bug, the 32-bit openQA tests on 09-04 did hit what looks to be this same bug or something very similar, so this seems to have somehow reappeared in kernel 4.3.

F23 is still fine, so dropping the F23 blocker stuff.
Comment 40 Josh Boyer 2015-09-11 15:31:53 EDT
The function that was fixed in 4.2 doesn't exist any longer in 4.3.0-0.rc0.git6.1.fc24.  That kernel corresponds to Linux v4.2-6105-gdd5cdb48edfd which contains commit 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely.  So whatever fix was made in dm_merge_bvec doesn't seem to have made it to whatever replaced it.
Comment 41 Mike Snitzer 2015-09-11 17:29:06 EDT
(In reply to Josh Boyer from comment #40)
> The function that was fixed in 4.2 doesn't exist any longer in
> 4.3.0-0.rc0.git6.1.fc24.  That kernel corresponds to Linux
> v4.2-6105-gdd5cdb48edfd which contains commit
> 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely.  So
> whatever fix was made in dm_merge_bvec doesn't seem to have made it to
> whatever replaced it.

The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix dm_merge_bvec regression on 32 bit systems").  But I'm not sure there is a clear equivalent in the late bio splitting code that replaced block core's merge_bvec logic.

merge_bvec was all about limiting bios (by asking "can/should this page be added to this bio?") whereas the late bio splitting is more "build the bios as large as possible and worry about splitting later".
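The contrast between the two designs can be made concrete with a toy model in Python. This is a hypothetical sketch, not block-layer code: a "bio" is just a list of page sizes, `build_with_merge_check` consults a `can_add` callback before adding each page (the merge_bvec approach), and `build_then_split` puts everything into one bio first and splits it to a byte limit afterwards (the late-splitting approach). The 4096-byte page and 8192-byte limit are arbitrary illustrative values.

```python
from typing import Callable, List

Bio = List[int]  # toy model: a bio is a list of page sizes in bytes


def build_with_merge_check(pages: List[int], can_add: Callable[[int, int], bool]) -> List[Bio]:
    """merge_bvec style: ask, before adding each page, whether it fits the current bio."""
    bios: List[Bio] = []
    for page in pages:
        if bios and can_add(sum(bios[-1]), page):
            bios[-1].append(page)
        else:
            bios.append([page])  # start a new bio; a lone page is always accepted
    return bios


def split_bio(bio: Bio, max_bytes: int) -> List[Bio]:
    """Split one oversized bio into pieces no larger than max_bytes."""
    pieces: List[Bio] = []
    current: Bio = []
    for page in bio:
        if current and sum(current) + page > max_bytes:
            pieces.append(current)
            current = []
        current.append(page)
    if current:
        pieces.append(current)
    return pieces


def build_then_split(pages: List[int], max_bytes: int) -> List[Bio]:
    """Late-splitting style: build the largest possible bio, split it afterwards."""
    return split_bio(list(pages), max_bytes)


if __name__ == "__main__":
    pages, limit = [4096] * 5, 8192
    print(build_with_merge_check(pages, lambda cur, page: cur + page <= limit))
    print(build_then_split(pages, limit))
```

Both produce the same bios for this input; the difference is where the size limit is enforced. That is why a fix that lived inside the dm_merge_bvec() callback had no direct counterpart once the callback was removed (Comments 40 and 41).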

Regardless, this regression needs to be reported to Ming Lin <ming.l@ssi.samsung.com>, Jens Axboe and the others involved in maintaining the late bio splitting changes in block core.
Comment 42 Josh Boyer 2015-09-15 09:50:01 EDT
Adam, could you try this scratch build when it completes?  It contains the patch Ming requested testing on.

http://koji.fedoraproject.org/koji/taskinfo?taskID=11094555
Comment 43 Josh Boyer 2015-09-17 12:50:48 EDT
The patch was added to rawhide and it's queued for upstream inclusion.  Closing again.
