Bug 1514686 - Workstation-27-1.6.aarch64 hangs during boot
Summary: Workstation-27-1.6.aarch64 hangs during boot
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 27
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-18 01:57 UTC by jeff
Modified: 2018-11-19 15:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 15:51:02 UTC
Type: Bug
Embargoed:


Attachments
dmesg-hdmi (36.90 KB, text/plain)
2018-05-27 16:17 UTC, Tom Lane
dmesg-dvi (60.43 KB, text/plain)
2018-05-27 16:18 UTC, Tom Lane

Description jeff 2017-11-18 01:57:15 UTC
Description of problem:

boot hangs
the last line printed is
[23.8] fb: switching to vc4drmfb from EFI VGA

I had a similar problem with Fedora 26; see bug report 1471324.
It might be nice if Fedora worked on a Raspberry Pi 3.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. make a micro sd card
2. expand the partition
3. try to boot
4. cry

Actual results:

boot hangs

Expected results:

successful boot

Additional info:

I have a Raspberry Pi 3, and I am using a 32 GB micro SD card.

Comment 1 Mark Lamourine 2017-12-02 16:21:59 UTC
I find the same behavior.  I am using a single-port 2400 mA power supply.  During the boot process I see the "low power" lightning bolt go on and off.  The red activity light continues changing, as does the low power indicator.  The display does not progress, but it is not clear that the boot process has stopped.

No response to the keyboard.

I am going to try booting before expanding the / partition.

I also have a serial port adapter coming and will see if I can get anything on that.

Comment 2 jeff 2017-12-02 16:35:45 UTC
I get some video, but after a while, it freezes.
As I said in my initial report, the last line displayed is:
[23.8] fb: switching to vc4drmfb from EFI VGA

It sure seems as if the boot process is stuck / hung.

Comment 3 Mark Lamourine 2017-12-02 22:12:38 UTC
I enabled the uart and added a serial console.

The system continues booting and I was able to run the initialization to completion using the serial console.

The HDMI video however remained stuck at the indicated point.

The system hasn't hung, it's just stopped displaying to the monitor in use.

Comment 4 jeff 2017-12-02 22:32:50 UTC
That seems like a serious problem, as video is pretty useful.

I reported a similar issue with Fedora 26, but nobody ever responded to my bug report.

Comment 5 Brendan Conoboy 2017-12-07 18:16:40 UTC
Hi guys, I'm switching this from arm-boot-config to kernel.  The arm-boot-config package is a legacy wrapper uboot script generation tool.  What you're seeing looks more like a kernel or framebuffer issue.

Comment 6 Jussi Eloranta 2017-12-23 04:41:49 UTC
It is most likely the vc4 GPU driver, which has been messed up since day one. Just blacklist it in /etc/modprobe.d by creating blacklist-vc4.conf that contains a single line:
blacklist vc4

I don't remember, but you may also have to run dracut to regenerate the ramdisk (?). Without vc4, at least my Pi 3 is very stable with Fedora. Of course, no playing YouTube videos or anything that requires the GPU.
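
For anyone who wants to script the workaround from this comment, here is a minimal Python sketch. The function name `write_vc4_blacklist` and its `conf_dir` parameter are made up for illustration; only the file name and the `blacklist vc4` line come from the comment itself. Writing into /etc/modprobe.d requires root.

```python
from pathlib import Path


def write_vc4_blacklist(conf_dir="/etc/modprobe.d"):
    """Create blacklist-vc4.conf holding the single line 'blacklist vc4'.

    conf_dir is a hypothetical parameter so the function can be tried
    against a scratch directory before touching the real one.
    """
    path = Path(conf_dir) / "blacklist-vc4.conf"
    path.write_text("blacklist vc4\n")
    return path
```

Calling `write_vc4_blacklist()` as root writes the real file. If the module is also packed into the initramfs, regenerating it afterwards (for example with `dracut --force`) may be needed; that step is an assumption here, matching the uncertainty in the comment.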

Comment 7 Jussi Eloranta 2017-12-23 04:44:28 UTC
Forgot to say: since you are on aarch64, the standard framebuffer seemed to have the colors mixed up (at least a while back it did). I reported that but did not see any activity around the report, so the problem is likely still there. If you run 32-bit, this problem is not there.

Comment 8 Laura Abbott 2018-02-20 19:56:20 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  As kernel maintainers, we try to keep up with bugzilla, but due to the rate at which the upstream kernel project moves, bugs may be fixed without any indication to us. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.
 
Fedora 27 has now been rebased to 4.15.3-300.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you experience different issues, please open a new bug report for those.

Comment 9 jeff 2018-02-20 20:51:02 UTC
It is not clear how a kernel update can be tested unless a new install image has been created, as this issue occurs during the first boot. If there is a new install image, please provide a URL to it.

As I said initially:

Steps to Reproduce:
1. make a micro sd card
2. expand the partition
3. try to boot
4. cry

Comment 10 Tom Lane 2018-05-26 22:04:45 UTC
FWIW, I'm seeing more or less the same behavior with current F28 on an RPI 3B+.  However, it's not 100% reproducible: about one time in three or four, the boot succeeds.  Here's some of the dmesg output after a successful boot:

[   23.345967] rc rc0: RC for vc4 as /devices/platform/soc/3f902000.hdmi/rc/rc0
[   23.368296] input: RC for vc4 as /devices/platform/soc/3f902000.hdmi/rc/rc0/input3
[   23.411773] vc4_hdmi 3f902000.hdmi: vc4-hdmi-hifi <-> 3f902000.hdmi mapping ok
[   23.440752] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
[   23.473579] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
[   23.473806] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
[   23.474157] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
[   23.474443] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
[   23.474686] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
[   23.527496] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
[   23.589012] checking generic (3e8a8000 753000) vs hw (0 ffffffffffffffff)
[   23.589036] fb: switching to vc4drmfb from EFI VGA
[   23.604884] Console: switching to colour dummy device 80x25
[   23.618011] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
[   23.624867] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   23.631664] [drm] Driver supports precise vblank timestamp query.
[   23.684839] Console: switching to colour frame buffer device 240x67
[   23.757904] vc4-drm soc:gpu: fb0:  frame buffer device

What I see is that the screen goes black for a second or two (much longer than you'd think from this trace) after the "fb: switching ..." message and then comes back.  On an unsuccessful boot, it just goes black, and after a few seconds my monitor indicates it's getting no video signal.
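
To put a number on "much longer than you'd think from this trace", the gap between the two relevant messages can be read straight from the dmesg timestamps. The Python sketch below is only an illustration; the helper names `first_timestamp` and `handover_gap` are made up, and the two sample lines are copied from the trace above.

```python
import re

# Matches the "[   23.589036] message" prefix that dmesg prints.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s(.*)$")


def first_timestamp(lines, needle):
    """Return the timestamp in seconds of the first message containing needle, or None."""
    for line in lines:
        m = DMESG_LINE.match(line)
        if m and needle in m.group(2):
            return float(m.group(1))
    return None


def handover_gap(lines):
    """Seconds between the 'fb: switching' message and the new framebuffer console."""
    start = first_timestamp(lines, "fb: switching to vc4drmfb")
    end = first_timestamp(lines, "Console: switching to colour frame buffer device")
    if start is None or end is None:
        return None
    return end - start


# Two lines copied from the trace in this comment.
sample = [
    "[   23.589036] fb: switching to vc4drmfb from EFI VGA",
    "[   23.684839] Console: switching to colour frame buffer device 240x67",
]
print(f"{handover_gap(sample):.3f} s")  # prints "0.096 s"
```

By the kernel's own timestamps the handover takes about a tenth of a second, so the second or two of black screen is not accounted for in the log.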

Currently using kernel-4.16.11-300.fc28.aarch64 but the last couple of kernel updates acted the same.  While I can't positively swear to this, it seemed like there wasn't a problem when I first started playing around with the 3B+, around the beginning of May; at least, the crash frequency seemed a lot lower than it is now.

Comment 11 Tom Lane 2018-05-26 22:17:53 UTC
(In reply to Jussi Eloranta from comment #6)
> It is most likely the vc4 GPU driver, which has been messed up since day
> one. Just blacklist that in /etc/modprobe.d by creating blacklist-vc4.conf
> that contains single line:
> blacklist vc4

This works for me ... be nice to have the GPU though

Comment 12 Jussi Eloranta 2018-05-27 02:08:01 UTC
Still the same situation with 4.16.11. Booting with vc4 enabled ends up in a loop trying to sync with the monitor. No display ever shows up. I am guessing that this is likely a monitor-dependent problem. All my monitors are DVI and have HDMI-DVI adapters. This is such a serious problem that vc4 should be disabled by default. It is going to keep lots of people away from using Fedora on the RPi 3.

Comment 13 Tom Lane 2018-05-27 02:41:04 UTC
oooh ... I bet Jussi is on to something.  I too am using an old DVI monitor with an HDMI-to-DVI cable.  But when I was first playing with the RPI, I had it plugged into a newer TV via a native HDMI input --- and that might account for my not having seen the problem initially.  I can't say for certain that the problem started when I changed the display setup, but it seems plausible.

Comment 14 Tom Lane 2018-05-27 16:17:52 UTC
Created attachment 1442421
dmesg-hdmi

dmesg output after successful boot with HDMI cable

Comment 15 Tom Lane 2018-05-27 16:18:51 UTC
Created attachment 1442422
dmesg-dvi

dmesg output after failed boot with HDMI-to-DVI cable

Comment 16 Tom Lane 2018-05-27 16:32:05 UTC
OK, so after further experimentation I think Jussi's theory is on the nose.  Using a plain HDMI/HDMI cable into an HDMI-input TV, 4.16.11-300.fc28.aarch64 boots successfully every time: I did it ten times in a row without a problem.  Same kernel, same RPI, HDMI-to-DVI cable into a DVI monitor, it usually fails.  I can now also confirm the theory that the kernel hasn't crashed; it just takes significantly longer than normal to reach the point of responding to the network, and I'd not waited long enough for that.

I've attached dmesg output from both cases for comparison's sake, but the thing that is probably telling the tale is the repeated stack traces mentioning drm_atomic_helper_wait_for_vblanks in the dmesg output for the failure case.  Also note how "wlan0: link becomes ready" comes out a full minute later in one trace than the other.
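
A rough way to compare the two attachments along these lines is sketched below in Python. It assumes the attachments have been saved locally as `dmesg-hdmi` and `dmesg-dvi`; the function names `summarize` and `summarize_files` are made up for illustration, and the matched strings are the ones quoted in this comment.

```python
import re

# Matches the "[   23.589036] message" prefix that dmesg prints.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s(.*)$")


def summarize(text):
    """Return (number of lines naming drm_atomic_helper_wait_for_vblanks,
    timestamp of the first 'wlan0: link becomes ready' line or None)."""
    vblank_hits = 0
    wlan_ready = None
    for line in text.splitlines():
        m = DMESG_LINE.match(line)
        if not m:
            continue  # skip anything that is not a timestamped dmesg line
        stamp, msg = float(m.group(1)), m.group(2)
        if "drm_atomic_helper_wait_for_vblanks" in msg:
            vblank_hits += 1
        if wlan_ready is None and "wlan0: link becomes ready" in msg:
            wlan_ready = stamp
    return vblank_hits, wlan_ready


def summarize_files(paths):
    """Print one summary line per saved dmesg file."""
    for name in paths:
        with open(name, errors="replace") as fh:
            hits, ready = summarize(fh.read())
        print(f"{name}: {hits} vblank-wait lines, wlan0 ready at {ready}")
```

Running `summarize_files(["dmesg-hdmi", "dmesg-dvi"])` in the directory holding the saved attachments would print the two counts and the two wlan0 timestamps side by side.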

I note that even in the working setup, the screen goes blank for ~2.75 seconds after the "fb: switching" message, which seems like it's not working quite right.

In case it matters, the HDMI-DVI cable I'm using is this:
https://www.amazon.com/gp/product/B014I8UQJY/ref=oh_aui_detailpage_o05_s00?ie=UTF8&psc=1

and the DVI monitor is this:
https://www.amazon.com/gp/product/B003HFCDLY/ref=oh_aui_search_detailpage?ie=UTF8&psc=1

Comment 17 Tom Lane 2018-05-27 17:47:12 UTC
Some further notes after a few more reboot cycles:

* Although the dmesg output seems to quiesce after a minute or so of "vblank wait timed out" complaints, there's still some lingering unhappiness somewhere, because after a failed sync, shutdown takes about a minute longer than normal.  Presumably that's some other timeout, but I don't know how to investigate it.  sshd kicks me out immediately after the shutdown command, but the wifi interface continues to respond to pings for just about exactly one minute; once that stops, shutdown seems to proceed as normal.

* The behavior isn't 100% reproducible.  As I noted earlier, one time in four or so, it'll successfully sync.  I've also seen it take longer than depicted in "dmesg-dvi" to reach the wifi-alive point in a failure, and even not do so at all --- or at least, I gave up waiting after a full ten minutes.  So I'm no longer convinced that I was being unreasonably impatient when I concluded earlier that this was a kernel crash.  Maybe I just saw a couple of the very-slow or not-at-all cases.

Comment 18 Jussi Eloranta 2018-05-27 18:00:15 UTC
With my monitor and DVI adapter it happens every time (tried about ten times). Otherwise, the symptoms are exactly the same.

Comment 19 Justin M. Forbes 2018-07-23 15:23:15 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

Comment 20 Tom Lane 2018-07-23 16:16:55 UTC
This bug is certainly still there in recent F28 kernels, as per comments above.  I don't seem to have permissions to change the version field, though.

Comment 21 Laura Abbott 2018-10-01 21:27:03 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.
 
Fedora 27 has now been rebased to 4.18.10-100.fc27.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 28 or Fedora 29, and are still experiencing this issue, please change the version to Fedora 28 or 29.
 
If you experience different issues, please open a new bug report for those.

Comment 22 Tom Lane 2018-10-04 23:14:29 UTC
AFAICT, this bug is fixed as of kernel 4.18.11-200.fc28.aarch64.  I just did four consecutive reboots without seeing the hang, whereas previously it hung much more often than not.

Comment 23 jeff 2018-11-19 15:51:02 UTC
Fixed in fedora 29.
Too bad it took a year to fix.

