Bug 1002732
Summary: | anaconda intermittently fails to install f19 on Calxeda Midway hardware | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Mark Langsdorf <mark.langsdorf> | ||||||
Component: | kernel | Assignee: | Kyle McMartin <kmcmartin> | ||||||
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 19 | CC: | blc, gansalmon, g.kaviyarasu, itamar, jfeeney, jonathan, kernel-maint, madhu.chinakonda, marcelo.barbosa, mark.langsdorf, mkolman, peterm, sbueno, vanmeeuwen+fedora | ||||||
Target Milestone: | --- | Flags: | jforbes: needinfo? | ||||||
Target Release: | --- | ||||||||
Hardware: | arm | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2014-03-10 14:42:35 UTC | Type: | Bug | ||||||
Description
Mark Langsdorf
2013-08-29 19:57:07 UTC
(In reply to Mark Langsdorf from comment #0)
...
> Steps to reproduce
> 1. Install f19 using PXE and the kickstart file found on the fedora project
> wiki.
...

Thanks for your report. Could you provide a link to the kickstart file you "found on the fedora project wiki"?

http://fedorapeople.org/~pwhalen/f19/f19-highbank.ks, from http://fedoraproject.org/wiki/Architectures/ARM/F19/Installation

Thanks.

$ curl --location http://download.fedoraproject.org/pub/fedora-secondary/releases/19/Fedora/armhfp/os
curl: (6) Could not resolve host: mirror.chpc.utah.edu; Name or service not known

works for me

mlangsdorf@mjl-dell:~/work/tmp$ curl --location http://download.fedoraproject.org/pub/fedora-secondary/releases/19/Fedora/armhfp/os
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /fedora-secondary/releases/19/Fedora/armhfp/os</title>
</head>
<body>
<h1>Index of /fedora-secondary/releases/19/Fedora/armhfp/os</h1>
<pre><a href="?C=N;O=D">Name</a> <a href="?C=M;O=A">Last modified</a> <a href="?C=S;O=A">Size</a> <hr><a href="/fedora-secondary/releases/19/Fedora/armhfp/">Parent Directory</a> -
<a href="LiveOS/">LiveOS/</a> 28-Jun-2013 17:27 -
<a href="Packages/">Packages/</a> 28-Jun-2013 17:28 -
<a href="images/">images/</a> 28-Jun-2013 17:28 -
<a href="repodata/">repodata/</a> 28-Jun-2013 17:28 -
<a href="repoview/">repoview/</a> 27-Jun-2013 21:17 -
<hr></pre>
<address>Apache/2.2.15 (Red Hat) Server at mirrors.kernel.org Port 80</address>
</body></html>
<address>Apache/2.2.15 (Red Hat) Server at mirrors.kernel.org Port 80</address>

OK, you are getting redirected to a different mirror.[1] In the interests of reproducibility, it might be better to configure a specific mirror:
http://mirrors.fedoraproject.org/publiclist/Fedora/19/armhfp/

[1] This shows the redirect:
$ curl -I http://download.fedoraproject.org/pub/fedora-secondary/releases/19/Fedora/armhfp/os
HTTP/1.1 302 FOUND
Date: Fri, 30 Aug 2013 16:11:12 GMT
Server: Apache/2.2.15 (Red Hat)
cache-control: no-cache
location: http://mirror.chpc.utah.edu/pub/fedora-secondary/releases/19/Fedora/armhfp/os/
AppTime: D=25678
AppServer: mirrorlist-osuosl.fedoraproject.org
Content-Type: text/html; charset=utf-8
ProxyTime: D=208047
ProxyServer: proxy04.fedoraproject.org

I haven't tested this, but these might work better:

url --url="http://mirrors.kernel.org/fedora-secondary/releases/19/Everything/armhfp/os/"
repo --name=updates "http://mirrors.kernel.org/fedora-secondary/updates/19/armhfp/"

==

$ curl -I "http://mirrors.kernel.org/fedora-secondary/releases/19/Everything/armhfp/os/"
HTTP/1.1 200 OK
Date: Fri, 30 Aug 2013 16:34:54 GMT
Server: Apache/2.2.15 (Red Hat)
Content-Type: text/html;charset=UTF-8

$ curl -I "http://mirrors.kernel.org/fedora-secondary/updates/19/armhfp/"
HTTP/1.1 200 OK
Date: Fri, 30 Aug 2013 16:35:14 GMT
Server: Apache/2.2.15 (Red Hat)
Content-Type: text/html;charset=UTF-8

(In reply to Steve Tyler from comment #6)
> I haven't tested this, but these might work better:
>
> url
> --url="http://mirrors.kernel.org/fedora-secondary/releases/19/Everything/
> armhfp/os/"
> repo --name=updates
> "http://mirrors.kernel.org/fedora-secondary/updates/19/armhfp/"
...

OK, I tested these, and there needs to be a patch ... :-)

"--baseurl=" needs to be added to the repo option:

url --url="http://mirrors.kernel.org/fedora-secondary/releases/19/Everything/armhfp/os/"
repo --name=updates --baseurl="http://mirrors.kernel.org/fedora-secondary/updates/19/armhfp/"

With x86_64, the install went fine, and the installed system booted to a root login.
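
For anyone repeating this check, here is a minimal sketch, assuming Python 3 on the machine that edits the kickstart file. It does two things discussed above: it reads the `location:` header from `curl -I` style output to see which mirror the redirector picked, and it builds `url`/`repo` lines pinned to a single mirror. The function names are hypothetical, and the directory layout is copied from the mirrors.kernel.org URLs tested above; other mirrors may lay things out differently.

```python
from typing import Optional


def redirect_target(raw_headers: str) -> Optional[str]:
    """Return the Location header value from `curl -I` style output, or None."""
    for line in raw_headers.splitlines():
        # Split at the first colon only, so the URL's own colons survive.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None


def pinned_kickstart_lines(mirror_base: str, release: str, arch: str) -> str:
    """Build kickstart url/repo lines that always point at one mirror.

    Assumption: the mirror uses the same layout as the mirrors.kernel.org
    URLs quoted in this bug (releases/<ver>/Everything/<arch>/os/ and
    updates/<ver>/<arch>/).
    """
    base = mirror_base.rstrip("/")
    return (
        f'url --url="{base}/releases/{release}/Everything/{arch}/os/"\n'
        f'repo --name=updates --baseurl="{base}/updates/{release}/{arch}/"\n'
    )


if __name__ == "__main__":
    headers = (
        "HTTP/1.1 302 FOUND\n"
        "location: http://mirror.chpc.utah.edu/pub/fedora-secondary/"
        "releases/19/Fedora/armhfp/os/\n"
    )
    print(redirect_target(headers))
    print(
        pinned_kickstart_lines(
            "http://mirrors.kernel.org/fedora-secondary", "19", "armhfp"
        ),
        end="",
    )
```

Note that the `repo` line is generated with `--baseurl=`, which is the fix found in the test above.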
The kickstart file was not otherwise changed:

url --url="http://mirrors.kernel.org/fedora/releases/19/Everything/x86_64/os/"
repo --name=updates --baseurl="http://mirrors.kernel.org/fedora/updates/19/x86_64/"

Tested in a VM with:
$ qemu-kvm -m 4096 -hda f19-test-2.img -cdrom ~/xfr/fedora/F19/Fedora-19-x86_64-DVD.iso -vga std -boot menu=on

Created attachment 792311
f19-highbank.ks (reference copy to preserve with this bug report)
http://fedorapeople.org/~pwhalen/f19/f19-highbank.ks (Comment 2)

Created attachment 792313
ks-bz1002732-1.cfg used for test in Comment 7.

There's no difference between attachment 792311 and Paul's original ks file.

And attachment 792313 is self-evidently not going to work on Calxeda Midway hardware, given that a Calxeda Midway is a 32-bit ARM part and is not going to work with 64-bit x86 RPMs.

So I guess I don't understand if those are supposed to be solutions to my problem, and if so, how?

FWIW, the same ks file consistently works on the Calxeda Highbank hardware.

(In reply to Mark Langsdorf from comment #10)
> There's no difference between attachment 792311 and Paul's
> original ks file.

Right. It is better to attach files than to link files, so it is a reference copy that will not disappear at the whim of a user on an external site. Since I asked you for the link, I made the attachment myself. Ordinarily, if a bug reporter gives a link to an external site, I refuse to follow it and ask the bug reporter to provide a proper attachment. So I was doing you a favor ... :-)

> And attachment 792313 is self-evidently not going to work on
> Calxeda Midway hardware, given that a Calxeda Midway is a 32-bit ARM part
> and is not going to work with 64-bit x86 RPMs.

Right, again. qemu may support arm emulation, but I am familiar with x86_64. If Calxeda Midway would like to ship me a test system, I would be delighted to make room for it in my computer room. :-)

> So I guess I don't understand if those are supposed to be solutions to my
> problem, and if so, how?

1. We have confirmed that an arm mirror is down or unreliable. Since there is no guarantee that you will get kernel.org every time you run your kickstart file, you may have been getting the bad mirror site sometimes. That could explain the intermittent hangs you are seeing. If your net admins could look at your connection logs to verify what sites you were connecting to, that would be great.
2. Verifying the kickstart file with an independent test install, as I have done, tends to rule out any other problems with it.
3. BZ is a Fedora bug reporting site, not a technical support site. If you want professional tech support, I suggest you purchase Red Hat support:
http://www.redhat.com/contact/sales.html

> FWIW, the same ks file consistently works on the Calxeda Highbank hardware.

AFAICT, the urls are the problem, not the kickstart file. Please try the urls I suggested in Comment 7. That will help rule out a problem with the mirrors. After that we can work on why there is a difference between Midway and Highbank.

Are you able to switch to the installer console? There are normally installer logs in /tmp, but it is not clear from your report if you are able to switch to a console. Press ctrl-alt-f2, if possible. Otherwise, I would suggest connecting a keyboard and monitor to the system under test, so you don't lose control when the installer hangs. Sometimes the installer hangs, but the kernel and console remain functional. Do you have direct access to the system?

Steve,

These systems don't have a keyboard or a monitor. The console is Serial Over Lan. Think vt100. Is there a vt100 keyboard sequence to switch consoles?

I'm also seeing the same symptom Mark has reported on Calxeda systems inside Red Hat. The system hangs on the first attempt to install, but succeeds on the second. It's improbably consistent.

FYI, there is no way Calxeda can buy Red Hat support for Fedora :-)

ctrl-alt-f3 gets intercepted by host computer and does not get sent over the SoL link. These are headless systems without keyboard, mouse, video, USB, floppy drives, CD-ROM drives, DVD drives, PCI, PCMCIA, or most other peripheral busses and interfaces. The only way to talk to them is a serial-over-lan connection that acts as a serial console.

Thanks, Brendan and Mark.

Ordinarily, you can get an installer shell console by pressing ctrl-alt-f2, and you can see various logs on the various console ttys by pressing ctrl-alt-fN. If you can't get in that way, you might be able to enable sshd on the kernel command-line by appending "inst.sshd" and then logging in via ssh over the network.[1]

How are you booting the system? What install media are you using?

BTW, I meant purchase RHEL support. Someone has to pay RH employees ... :-) For the record, I work for *free*.

[1] From the anaconda tarball (https://git.fedorahosted.org/git/anaconda.git):
$ less -N anaconda-20.9-1/docs/boot-options.txt
...
    262 === inst.sshd ===
    263 Start up `sshd` during system installation. You can then ssh in while the
    264 installation progresses to debug or monitor its progress.
    265
    266 *NOTE*: The `root` account has _no password by default_. You can set one using
    267 the `sshpw` kickstart command.
...

The install media is PXE, because there's nothing but network.

What do I append the inst.sshd command to? The original PXE command line?

Calxeda would be willing* to buy a RHEL support license if we could just convince Red Hat to sell one for ARM computers...

* well, we'd probably expect an engineering support license like AMD and Intel have for development efforts, but you know what I mean.

(In reply to Mark Langsdorf from comment #14)
> ctrl-alt-f3 gets intercepted by host computer and does not get sent over the
> SoL link.

ctrl-alt-f2 is what is needed to get to the installer shell console.
Could you try all of the ctrl-alt-fN keys and report what you see? There should be messages from various anaconda components on several of them. If there are, and you can't get to the installer shell console, could you take digital photos and attach them to this bug report? That may help determine where the hang is occurring.

> these are headless systems without keyboard, mouse, video, USB, floppy
> drives, CD-ROM drives, DVD drives, PCI, PCMCIA, or most other peripheral
> busses and interfaces. The only way to talk to them is a serial-over-lan
> connection that acts as a serial console.

OK, thanks for clarifying that. This may be Big Iron, but it lacks a lot of features. :-)

You do have networking, so you also have the possibility of logging in via ssh, as I noted in Comment 15.

Also, do you have a way to do network sniffing? Getting a wireshark capture of any network traffic to and from the system might help determine where the problem is ...

(In reply to Mark Langsdorf from comment #16)
> The install media is PXE, because there's nothing but network.

OK, thanks.

> What do I append the inst.sshd command to? The original PXE command line?

Since I have zero experience with PXE boots, I will have to research it. Ordinarily, you append those options to the kernel command-line, which looks like this:
vmlinuz initrd=initrd.img ...

I am guessing that would be on the PXE server somewhere. Or do you get a PXE boot menu? With the installer DVD, if you press Tab at a boot menu item, you can edit the kernel command-line and then press Enter to continue booting.

> Calxeda would be willing* to buy a RHEL support license if we could just
> convince Red Hat to sell one for ARM computers...

Money talks ...

> * well, we'd probably expect an engineering support license like AMD and
> Intel have for development efforts, but you know what I mean.

More money talks louder ... :-)

(In reply to Steve Tyler from comment #18)
...
> Or do you get a PXE boot menu? With the installer DVD, if you press Tab at a
> boot menu item, you can edit the kernel command-line and then press Enter to
> continue booting.
...

There is indeed a PXE boot menu:

There is an example PXE-boot config file midway down here:
http://fedoraproject.org/wiki/Architectures/ARM/Anaconda

This is the line to modify by appending "inst.sshd":
append console=ttyAMA0 ip=eth0:dhcp ks=http://<ServerPathToKickstartConfig>/highbank.ks rd.debug rd.shell cmdline

The wiki appears to have an error where it says:
"... this file should be placed in the pxelinux.cfg directory ...".
That should probably be "the pxelinux directory". The file name contains the MAC address of the system being booted.

pxelinux is part of the syslinux package:
$ repoquery syslinux
syslinux-0:4.05-5.fc19.x86_64

In particular, there is documentation in pxelinux.txt:
$ repoquery syslinux --list | grep pxelinux.txt
/usr/share/doc/syslinux-4.05/pxelinux.txt

"PXELINUX is a Syslinux derivative, for booting Linux off a network server, using a network ROM conforming to the Intel PXE (Pre-Execution Environment) specification."

The directory containing the config file is indeed named "pxelinux.cfg":

"Finally, create the directory "/tftpboot/pxelinux.cfg". The configuration file (equivalent of syslinux.cfg -- see syslinux.txt for the options here) will live in this directory. Because more than one system may be booted from the same server, the configuration file name depends on the IP address of the booting machine."

pxelinux will search for a config file with a name that matches any of several possibilities, including one containing the MAC address.

(In reply to Brendan Conoboy from comment #13)
...
> I'm also seeing the same symptom Mark has reported on Calxeda systems inside
> Red Hat. The system hangs on the first attempt to install, but succeeds on
> the second. It's improbably consistent.
...

Could you be more precise about what you mean by "first attempt"? First attempt after a cold boot? After a full moon? After what?

(In reply to Mark Langsdorf from comment #16)
> The install media is PXE, because there's nothing but network.
>
> What do I append the inst.sshd command to? The original PXE command line?
>
> Calxeda would be willing* to buy a RHEL support license if we could just
> convince Red Hat to sell one for ARM computers...
>
> * well, we'd probably expect an engineering support license like AMD and
> Intel have for development efforts, but you know what I mean.

Mark, yes, inst.sshd would be added to the pxeboot configuration as one of the "append" arguments to the command line. But there is an easier way. Anaconda runs inside tmux, and we have several other screens running to display logs and provide a shell. If you hit Ctrl-B and then 2, tmux will switch to a shell window where you can get to the logs in /tmp or see what's running. Hitting Ctrl-B and then 1 will take you back to the first window where anaconda is running.

The fact that you're still seeing the "Not asking for VNC..." messages indicates that anaconda has at least started. When this error occurs, can you switch to the shell and see if anaconda (probably as "/usr/bin/python /sbin/anaconda") is still running, and whether there are any log files in /tmp? If you can attach the log files to this bug that would be great.

The first attempt to install after a cold boot always fails.
The first attempt to re-install after a successful install usually fails.
The first attempt to install after a failed install usually succeeds.

I can log in via ssh when the install works, but not when it fails. No luck so far in recovering the logs. ctrl-b does not seem to be working for me in either case.

I think ^B is failing because the system itself has crashed. This is almost certainly a kernel or hardware issue.

Reassigning to kernel, then.

When using inst.sshd, sshd is started by systemd before anaconda is started, so there isn't much anaconda can do to break that.
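
For reference, the PXE change described above (adding inst.sshd as one of the "append" arguments) could be scripted roughly as follows. This is only a sketch, assuming Python 3 on the PXE server; `add_boot_option` is a hypothetical helper, not part of syslinux or anaconda, and the sample line is the wiki `append` line quoted earlier in this bug.

```python
def add_boot_option(config_text: str, option: str) -> str:
    """Append `option` to every `append` line of a pxelinux config.

    Lines that already carry the option are left alone, so running the
    helper twice gives the same result as running it once.
    """
    out = []
    for line in config_text.splitlines():
        words = line.split()
        # pxelinux keywords are case-insensitive, so compare in lower case.
        if words and words[0].lower() == "append" and option not in words[1:]:
            line = line.rstrip() + " " + option
        out.append(line)
    # Keep the trailing newline if the input had one.
    trailer = "\n" if config_text.endswith("\n") else ""
    return "\n".join(out) + trailer


if __name__ == "__main__":
    sample = (
        "append console=ttyAMA0 ip=eth0:dhcp "
        "ks=http://<ServerPathToKickstartConfig>/highbank.ks "
        "rd.debug rd.shell cmdline\n"
    )
    print(add_boot_option(sample, "inst.sshd"), end="")
```

The helper only rewrites text; copying the result back into the file under pxelinux.cfg is left to the admin, since the file name and TFTP root differ per site.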
I don't know why ^B wouldn't work in the successful cases unless the console program is grabbing the control character.

Thanks for the additional info, Mark.

Brendan, I'm inclined to agree, but we still need a debugging methodology. Don't your engineers have some way to directly monitor and control the system while booting? I don't understand how you could possibly develop a headless system without such a method. What does manufacturing use for testing?

Can you boot into rescue mode and access the system that way? tmux runs in rescue mode, so the ctrl-b method should be applicable.[1]

There is also remote logging, but the documentation suggests that it does not get enabled until "installation is running":
$ less -N anaconda-19.30.13-1/docs/boot-options.txt
...
    290 === inst.syslog ===
    291 `inst.syslog=<host>[:<port>]`::
    292 Once installation is running, send log messages to the syslog process on
    293 the given host. The default port is 514 (UDP).
    294 +
    295 Requires the remote syslog process to accept incoming connections.
...

[1] In rescue mode, anaconda runs under tmux, but it does not install anything.

We have a method to directly monitor the system: it's a console on serial-over-lan. It's generally sufficient for our purposes. We also have a management core with a separate serial log (it's running an RTOS developed in-house). There I'm seeing something interesting: we're getting a resume request on the RTC IRQ, and the management core is responding by resetting the A15 cores that Linux is running on. This is the cause of the Anaconda lock-up. I can't see why the kernel or Anaconda would request an RTC resume without first going into WFI, and I don't know why it would happen only on the first boot after a cold reset. I'm looking into that now.

(In reply to Mark Langsdorf from comment #27)
> ... resetting the A15 cores that Linux is running on. ...

Thanks for the update. That's excellent progress.

For future testing, I would suggest booting the installer in rescue mode.[1] That boots the kernel, enables networking, optionally mounts file systems[2], and ends with a shell prompt[3]:

Troubleshooting
Rescue a Fedora system

[1] I am guessing you can reproduce this problem by booting in rescue mode.
[2] Tab to Skip to skip mounting file systems.
[3] Although anaconda is running, no install is done. Log files are in /tmp, including syslog.

Well "ctrl-b 2" doesn't work at all. I tested on two different x86_64 systems after booting a stock F19 DVD to the installer Installation Summary. There is no response whatsoever. ctrl-alt-fN (N in {1, ..., 7}) works as expected.

Tested with:
Fedora-19-x86_64-DVD.iso on optical media (DVD+R)

I retested with f20-alpha-tc2. I'm not experiencing the failure, but anaconda is also failing due to some Python issue so I'm not sure how valid that is:

anaconda 20.9-1 for Fedora 20-Alpha-TC2 (pre-release) started.
15:32:10 Not asking for VNC because of an automated install
15:32:10 Not asking for VNC because text mode was explicitly asked for in kickstart
Traceback (most recent call last):
  File "/sbin/anaconda", line 985, in <module>
  File "/sbin/anaconda", line 615, in setupDisplay
    anaconda.initInterface(addons)
  File "/usr/lib/python2.7/site-packages/pyanaconda/__init__.py", line 205, in initInterface
    self._intf = TextUserInterface(self.storage, self.payload,
  File "/usr/lib/python2.7/site-packages/pyanaconda/__init__.py", line 155, in storage
    if self.instClass.defaultFS:
  File "/usr/lib/python2.7/site-packages/pyanaconda/__init__.py", line 90, in instClass
    from installclass import DefaultInstall
  File "/usr/lib/python2.7/site-packages/pyanaconda/installclass.py", line 239, in <module>
    baseclass = getBaseInstallClass()
  File "/usr/lib/python2.7/site-packages/pyanaconda/installclass.py", line 215, in getBaseInstallClass
    allavail = availableClasses(showHidden = 1)
  File "/usr/lib/python2.7/site-packages/pyanaconda/installclass.py", line 187, in availableClasses
    obj = loaded.InstallClass
AttributeError: 'module' object has no attribute 'InstallClass'
Pane is dead

ctrl-b 2 to get to the shell works with f20-alpha-tc2.

Mark, can you attach the log files from this failure? They should be in /tmp. Also, can you add the contents of /proc/cmdline? This error should only be fatal if anaconda is being run in debug mode, so if you have a way to remove inst.debug from the boot options, please try that.

Oops, sorry, didn't see yet that you'd opened a new bug.

(In reply to Mark Langsdorf from comment #30)
> I retested with f20-alpha-tc2. I'm not experiencing the failure, but
> anaconda is also failing due to some Python issue so I'm not sure how valid
> that is:
...

Thanks for your update. FYI, F20-Alpha-TC3 is available:
https://dl.fedoraproject.org/pub/alt/stage/20-Alpha-TC3/

From: Andre Robatino <robatino@...>
Subject: Fedora 20 Alpha Test Compose 3 (TC3) Available Now!
Newsgroups: gmane.linux.redhat.fedora.test.announce
Date: 2013-09-04 17:40:16 GMT (1 hour and 24 minutes ago)
http://article.gmane.org/gmane.linux.redhat.fedora.test.announce/750

(In reply to Steve Tyler from comment #29)
> Well "ctrl-b 2" doesn't work at all. I tested on two different x86_64
> systems after booting a stock F19 DVD to the installer Installation Summary.
> There is no response whatsoever. ctrl-alt-fN (N in {1, ..., 7}) works as
> expected.
>
> Tested with:
> Fedora-19-x86_64-DVD.iso on optical media (DVD+R)

That was with the installer in graphical mode, which is not applicable to the Calxeda system.

With the installer in text mode, "ctrl-b 2" gives you a shell prompt on /dev/pts/1. "ctrl-alt-f2" gives you a shell prompt on /dev/tty2. Both per the "tty" command. So, "ctrl-alt-f2" works in both graphical and text mode, while "ctrl-b 2" only works in text mode.

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Was the installer respun? The bug occurs during the install with the install kernel.

No, the installer doesn't get respun. This is going to be an F20 "fix".

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

*********** MASS BUG UPDATE **************

This bug has been in a needinfo state for more than 1 month and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 19, please feel free to reopen the bug and provide the additional information requested.