Bug 805166 - anaconda doesn't start with direct kernel boot and root=live:nfs:
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Will Woods
QA Contact: Fedora Extras Quality Assurance
Whiteboard: RejectedBlocker
 
Reported: 2012-03-20 11:49 EDT by Kamil Páral
Modified: 2012-04-19 17:48 EDT

Doc Type: Bug Fix
Last Closed: 2012-04-19 17:48:38 EDT


Attachments
console log (44.91 KB, text/plain)
2012-03-20 11:50 EDT, Kamil Páral

Description Kamil Páral 2012-03-20 11:49:56 EDT
Description of problem:
I used direct kernel boot (via virt-manager or PXE; it doesn't matter which) and provided only this kernel argument:

root=live:nfs://server:/path/squashfs.img

The installation proceeds to the "Starting the anaconda installation program..." message, then stops and nothing happens. After 3 or 4 minutes a kernel traceback is shown:

[  246.850168] INFO: task loop0:375 blocked for more than 120 seconds.
[  246.852055] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.854606] loop0           D ffffffff  5568   375      2 0x00000000
[  246.856663]  f1d8fc88 00000092 00000000 ffffffff ffffffff dd5ec1dc 0000000b c0e70680
[  246.859260]  f67e0000 c0e70680 f5147680 f10f5640 f6739590 0001e88b 00000000 0000002b
[  246.861796]  c0c008c0 f12d8ecc 00016f84 f1d8fc58 c042a12f f1d8fc60 00000000 f1d8fc88
[  246.864438] Call Trace:
[  246.865240]  [<c042a12f>] ? kvm_clock_read+0x1f/0x30
[  246.866733]  [<c0490703>] ? ktime_get_ts+0xc3/0xf0
[  246.868182]  [<c09bd815>] schedule+0x35/0x50
[  246.869464]  [<c09bd8a8>] io_schedule+0x78/0xb0
[  246.870841]  [<c050bdfd>] sleep_on_page+0xd/0x20
[  246.872520]  [<c09bb349>] __wait_on_bit_lock+0x49/0xa0
[  246.874125]  [<c050bdf0>] ? __lock_page+0x80/0x80
[  246.875602]  [<c050bde8>] __lock_page+0x78/0x80
[  246.876970]  [<c045f900>] ? autoremove_wake_function+0x50/0x50
[  246.878726]  [<c058cfa7>] __generic_file_splice_read+0x4a7/0x5e0
[  246.880523]  [<c058b5d0>] ? page_cache_pipe_buf_release+0x20/0x20
[  246.882338]  [<c042a12f>] ? kvm_clock_read+0x1f/0x30
[  246.883976]  [<c0409a88>] ? sched_clock+0x8/0x10
[  246.885368]  [<c0475cb0>] ? sched_clock_local+0xf0/0x1e0
[  246.886952]  [<c0475ef7>] ? sched_clock_cpu+0xe7/0x190
[  246.888523]  [<c06482d3>] ? file_has_perm+0xe3/0xf0
[  246.889971]  [<c049a04b>] ? trace_hardirqs_off+0xb/0x10
[  246.891542]  [<c0476005>] ? local_clock+0x65/0x70
[  246.893039]  [<c049a87b>] ? lock_release_holdtime.part.27+0x8b/0xf0
[  246.895074]  [<f7da36b5>] ? nfs_have_delegation+0x95/0x160 [nfs]
[  246.896878]  [<f7da36c9>] ? nfs_have_delegation+0xa9/0x160 [nfs]
[  246.898692]  [<f7da364e>] ? nfs_have_delegation+0x2e/0x160 [nfs]
[  246.900489]  [<c058d16c>] generic_file_splice_read+0x8c/0xf0
[  246.902645]  [<f7d77e7f>] nfs_file_splice_read+0x6f/0x100 [nfs]
[  246.904692]  [<c0561131>] ? rw_verify_area+0x61/0x120
[  246.906604]  [<f7d77e10>] ? nfs_file_read+0x140/0x140 [nfs]
[  246.909063]  [<c058b900>] do_splice_to+0x60/0x80
[  246.910452]  [<c058bfba>] splice_direct_to_actor+0xaa/0x1d0
[  246.915214]  [<c0782cc0>] ? do_lo_send_write+0xe0/0xe0
[  246.922660]  [<c0783c9b>] loop_thread+0x2ab/0x520
[  246.927950]  [<c07839f0>] ? loop_control_ioctl+0x120/0x120
[  246.929851]  [<c045f38d>] kthread+0x7d/0x90
[  246.931107]  [<c045f310>] ? kthread_worker_fn+0x170/0x170
[  246.932862]  [<c09c7142>] kernel_thread_helper+0x6/0x10
[  246.934899] no locks held by loop0/375.

(and many more, see log)
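
Since the attached log contains many such reports, a small script can help summarize them. The following is only a minimal sketch, assuming the console log uses the same "INFO: task NAME:PID blocked for more than N seconds" wording as the excerpt above; the function name `summarize_hung_tasks` is hypothetical and not part of any Fedora tool.

```python
import re
from collections import Counter

# Matches kernel hung-task reports such as:
# [  246.850168] INFO: task loop0:375 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(log_text):
    """Count hung-task reports per (task name, pid) in a console log (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    sample = "[  246.850168] INFO: task loop0:375 blocked for more than 120 seconds.\n"
    for (name, pid), count in summarize_hung_tasks(sample).items():
        print(name, pid, count)
```

Running it over the full attachment would show whether loop0 is the only blocked task or whether other tasks are stuck as well.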

The only way out is a hard reboot.

Version-Release number of selected component (if applicable):
anaconda 17.13
F17 Beta TC2

How reproducible:
always
reproduced on 2 different bare metal machines over PXE and a VM using direct kernel boot in virt-manager

Steps to Reproduce:
1. use direct kernel boot with root=live:nfs://server:/path/to/squashfs.img
  
Actual results:
anaconda doesn't start

Expected results:
anaconda starts

Additional info:
There are a lot of AVC denials, but I tried to boot with selinux=0 and it behaves completely the same.
Comment 1 Kamil Páral 2012-03-20 11:50:29 EDT
Created attachment 571456 [details]
console log
Comment 2 Kamil Páral 2012-03-20 11:52:55 EDT
Proposing as F17 Beta blocker due to this criterion:

"The installer must be able to use the HTTP, FTP and NFS remote package source options"
https://fedoraproject.org/wiki/Fedora_17_Beta_Release_Criteria

This bug is not exactly about "package source options", but it is a prerequisite for them, and it makes anaconda unbootable if you want to provide the root image over NFS. It is possible the same bug will manifest if I use the repo=nfs:// option (not tested).
Comment 3 Will Woods 2012-03-21 17:52:27 EDT
Have you tried the traditional "repo=nfs://server:/path/to/repo" instead? Does it give the same result?
Comment 4 Adam Williamson 2012-03-21 19:48:36 EDT
I'm pretty ambivalent about blocker status for this one. We don't really cover remote delivery of the installer via nfs in the criteria, do we? It's not really a precursor to the criterion you cited, as you can use nfs remote package source without requiring nfs delivery of the installer...



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
Comment 5 Jóhann B. Guðmundsson 2012-03-21 20:23:48 EDT
+1 to blocker
Comment 6 Bruno Wolff III 2012-03-21 22:35:27 EDT
+0 blocker, +1 NTH. This seems like a corner case, but it would be nice to have it fixed for the release media.
Comment 7 Kamil Páral 2012-03-22 06:14:29 EDT
(In reply to comment #3)
> Have you tried the traditional "repo=nfs://server:/path/to/repo" instead? Does
> it give the same result?

Yes and no. It mounts squashfs.img correctly, but anaconda fails in the repo selection step. The problem is in 'nfs://'. If I change it to "repo=nfs:server:/path/to/repo" (which matches the documentation), it works in both steps.

I realized I had the wrong syntax even in my original description. But it still behaves the same even after using "root=live:nfs:server:/path/squashfs.img".


(In reply to comment #4)
> We don't really cover
> remote delivery of the installer via nfs in the criteria, do we?

What if I want to boot from PXE, but want to use online repos for installation? Isn't that a valid use case? I have to provide it with squashfs.img, which I have stored on NFS. This seems to me like a reasonable PXE setup.
Comment 8 Jóhann B. Guðmundsson 2012-03-22 06:33:05 EDT
Not sure how anaconda is doing things NFS-wise, but if you guys are using your own units, you need to update them to something similar to what is in bug 769879.

Last time I checked, NFS in general was broken in F17 (granted, that check was performed a while back).
Comment 9 Adam Williamson 2012-03-22 12:26:59 EDT
Kamil: so if I understand comment #7, this can actually be made to work with current anaconda, and in fact the syntax specified in the documentation works - but other syntaxes which previously worked are now broken?

Jóhann: I'm using F17 as an NFS client and it seems fine, but that's pretty light duty (just for mounting my IRC logs from my IRC bouncer machine).



Comment 10 Kamil Páral 2012-03-23 04:43:17 EDT
Adam, please note that I'm talking about both the 'repo=' option and the 'root=' option. Those are different things.

To summarize:

1. Anaconda works fine with the 'repo=nfs:' option. It finds the root image (squashfs.img), mounts it, and also uses the package repository in there.

2. If you don't want to use the 'repo=' option (e.g. you don't have the package repository mirrored), but still want to run anaconda (just the installer itself) from PXE, you need to use the 'root=' option. This option's syntax is *undocumented* (we should probably create a new bug about this).

3. The 'root=' option syntax was mentioned in several bug reports by anaconda team members. It is basically the same as the 'repo=VALUE' syntax; it looks like 'root=live:VALUE/LiveOS/squashfs.img'.

4. 'root=live:nfs:' is broken (as described in this bug). That means the following use case is broken: "start anaconda from PXE (or other direct kernel boot, like in a VM) when you have the root image stored on NFS and you don't have package repositories mirrored". Currently either the 'repo=' option is required, or you have to use a protocol other than NFS in the 'root=' option.

In Fedora 16 we didn't have this problem because stage2 was part of initrd.img (IIRC). Therefore we have no criterion for fetching the anaconda root image from remote locations. That's why I used the "remote package source" criterion, which is very similar but, true, not the same.
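
For readers who want to see the string shapes discussed in this comment side by side, here is a minimal sketch. It assumes only what the thread states: the documented NFS form is 'nfs:server:/path' (not 'nfs://server:/path'), and the 'root=' form is 'root=live:VALUE/LiveOS/squashfs.img'. The helper names are hypothetical and are not part of anaconda or dracut. The sketch only builds strings; it says nothing about whether a given argument will boot, and per this bug 'root=live:nfs:' hangs.

```python
def normalize_nfs_location(location):
    """Rewrite 'nfs://server:/path' as 'nfs:server:/path'; leave other values unchanged."""
    prefix = "nfs://"
    if location.startswith(prefix):
        return "nfs:" + location[len(prefix):]
    return location


def repo_arg(location):
    """Build a repo= kernel argument (hypothetical helper)."""
    return "repo=" + normalize_nfs_location(location)


def root_live_arg(location):
    """Build a root=live: argument in the form quoted in comment 10 (hypothetical helper)."""
    base = normalize_nfs_location(location).rstrip("/")
    return "root=live:" + base + "/LiveOS/squashfs.img"


if __name__ == "__main__":
    print(repo_arg("nfs://server:/path/to/repo"))
    print(root_live_arg("nfs:server:/path/to/repo"))
```

The normalization step mirrors the correction made in comment 7, where dropping the '//' after 'nfs:' made the 'repo=' option work.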
Comment 11 Adam Williamson 2012-03-23 23:47:09 EDT
Right, I kinda got that, but the report got rather confused because you kept trying different things.

If you think the test cases need to be modified or added to in the noloader era, please draft that up; we should certainly test the functionality it makes sense to test.



Comment 12 Adam Williamson 2012-03-26 13:47:41 EDT
Discussed at the 2012-03-26 QA meeting, which acted as a blocker review meeting. Agreed this is not a blocker, per the anaconda team's statement that using root= in this way is not supported or expected to work. The anaconda team is to update the documentation regarding what the repo= parameter is capable of and how it should be used. We should then look at any possible changes/improvements to the install guide and the release criteria.



Comment 13 Adam Williamson 2012-03-27 15:04:42 EDT
Follow-up on this, AIUI: anaconda team says that repo= should always be used, even in the case Kamil describes as "If you don't want to use 'repo=' option (e.g. you don't have the package repository mirrored), but still want to run anaconda (just the installer itself) from PXE". repo= is expected to work with a partial repository - if you have the squashfs file in the appropriate place, even if the rest of the tree is missing, it should at least pull squashfs from there and then use any other repo you provide at the package selection stage for package installation. Please correct me if I'm wrong. Should we then mark this as NOTABUG or WONTFIX?



Comment 14 Kamil Páral 2012-03-28 05:28:30 EDT
(In reply to comment #13)
> repo= is expected to work with
> a partial repository - if you have the squashfs file in the appropriate place,
> even if the rest of the tree is missing, it should at least pull squashfs from
> there and then use any other repo you provide at the package selection stage
> for package installation. Please correct me if I'm wrong.

You are right, wwoods assumed that. But currently that doesn't work; see bug 790348 (the first few comments might be confusing, until we realized what the root cause is). If the anaconda team says the root= option should not be used by users (they just use it internally), let's close this bug, but we should then discuss whether bug 790348 is a blocker for some of our PXE/VM boot use cases (except we don't have any criteria for those ATM).
Comment 15 Kamil Páral 2012-04-19 17:48:38 EDT
I'm closing this one (root= is not for public usage) and proposing bug 790348 as a blocker due to the new criterion.
