Bug 1395248 - [RFE] Kernel address space layout randomization [KASLR] support (virsh)
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: 7.5
Assignee: Martin Kletzander
QA Contact: yafu
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Keywords: FutureFeature, OtherQA
Duplicates: 1451550
Depends On: 1290840 1398633 1411490 1424943 1493125 1519748
Blocks: 1288169 1298243 1317091 1522983 1555276 1568461 1469590 1555268 1568736
Reported: 2016-11-15 14:05 UTC by Ademar Reis
Modified: 2018-04-18 08:03 UTC (History)

Fixed In Version: libvirt-3.9.0-5.el7
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 1290840
Clones: 1398633 1555268 1555276
Environment:
Last Closed: 2018-04-10 10:39:40 UTC
Type: Bug

External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2018:0704 | normal | SHIPPED_LIVE | libvirt bug fix and enhancement update | 2018-04-10 15:57:35 UTC

Description Ademar Reis 2016-11-15 14:05:12 UTC
Patches are ready for most components, but we need a solution for virsh dump when KVM guests have KASLR enabled.

The discussion upstream appears to be converging to a qemu-guest-agent solution for now: http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg01618.html

+++ This bug was initially created as a clone of Bug #1290840 +++

Description of problem:
Kernel Address Space Layout Randomization [KASLR] randomizes the physical and virtual addresses at which the kernel image is decompressed. It is a security feature that deters exploit attempts relying on knowledge of the location of kernel internals.

The feature has been described in LWN article:
https://lwn.net/Articles/569635/

With upstream patchsets of:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e8236c4d9338d52d0f2fcecc0b792ac0542e4ee9

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da2b6fb990cf782b18952f534ec7323453bc4fc9

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a653f3563c51c7bb7de63d607bef09d3baddaeb8

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5bfce5ef55cbe78ee2ee6e97f2e26a8a582008f3

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6145cfe394a7f138f6b64491c5663f97dba12450

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=19259943f0954dcd1817f94776376bf51c6a46d5

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f32360ef6608434a032dc7ad262d45e9693c27f3

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8ab3820fd5b2896d66da7bb2a906bc382e63e7bc

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=82fa9637a2ba285bcc7c5050c73010b2c1b3d803

Version-Release number of selected component (if applicable):
went upstream in 3.14


Additional info:

https://lwn.net/Articles/569635/

--- Additional comment from Baoquan He on 2016-06-22 00:13:24 BRT ---


Hi,

Separate randomization of the kernel text mapping has been through several rounds of review. The latest posting is v9, which addresses the last part of the actual work:

https://lkml.org/lkml/2016/5/25/687
The current status is that Ingo has added the patches to the tip-bot tree for testing.

Then Thomas Garnier from Google raised another ASLR-related issue: memory area address randomization. Ingo accepted the idea and is reviewing the patchset.

[PATCH v7 0/9] x86/mm: memory area address KASLR
http://www.gossamer-threads.com/lists/linux/kernel/2467722

It could be merged into v4.8.

Just updating the progress here for reference.

Thanks
Baoquan

--- Additional comment from Baoquan He on 2016-07-27 23:50:51 BRT ---

Separating kernel image virtual address randomization from physical address randomization and extending kernel physical address randomization to be above 4G; 

Randomize kernel memory regions;

Both of these two new features have been merged into Linus's tree.

90397a4 x86/mm: Add memory hotplug support for KASLR memory randomization
a95ae27 x86/mm: Enable KASLR for vmalloc memory regions
021182e x86/mm: Enable KASLR for physical mapping memory regions
0483e1f x86/mm: Implement ASLR for kernel memory regions
d899a7d x86/mm: Refactor KASLR entropy functions
6daa2ec x86/KASLR: Fix boot crash with certain memory configurations
e066cc4 x86/KASLR: Allow randomization below the load address
ed9f007 x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
8391c73 x86/KASLR: Randomize virtual address separately
11fdf97 x86/KASLR: Clarify identity map interface
65fe935 x86/KASLR, x86/power: Remove x86 hibernation restrictions
d2d3462 x86/KASLR: Clarify purpose of each get_random_long()
071a749 x86/KASLR: Add virtual address choosing function
06486d6 x86/KASLR: Return earliest overlap when avoiding regions
c401cf1 x86/KASLR: Add 'struct slot_area' to manage random_addr slots
434a6c9 x86/KASLR: Initialize mapping_info every time
3a94707 x86/KASLR: Build identity mappings on demand
ed09acd x86/KASLR: Improve comments around the mem_avoid[] logic
549f90d x86/boot: Simplify pointer casting in choose_random_location()
9dc1969 x86/KASLR: Consolidate mem_avoid[] entries
4d2d542 x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
6f9af75 x86/KASLR: Handle kernel relocations above 2G correctly
0f8ede1b x86/KASLR: Warn when KASLR is disabled
e8581e3 x86/KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET
4252db1 x86/KASLR: Update description for decompressor worst case size
9016875 x86/KASLR: Rename "random" to "random_addr"
7de828d x86/KASLR: Clarify purpose of kaslr.c
206f25a x86/KASLR: Remove unneeded boot_params argument
9b23874 x86/KASLR: Rename aslr.c to kaslr.c

--- Additional comment from Dave Young on 2016-11-10 00:09:55 BRST ---

Memo:
userspace support status:
Kexec/kdump: 
 kernel: ready
 kexec-tools: ready
 makedumpfile: patches are ready. will be in makedumpfile 1.6.1
 crash: ready

Systemtap:
 Per systemtap maintainer, it is ready

Kpatch:
 Kpatch team can do it after kernel backport, opened a bug

Crash:
 kdump is ready
 virsh dump is not ready
 dyoung: opened an upstream thread:
 http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg01618.html

So we need to monitor and wait for virsh dump support in qemu upstream before enabling KASLR in the kernel.

Comment 1 Ademar Reis 2016-11-25 12:22:54 UTC
The upstream discussion appears to have converged around the idea that an initial implementation could be made via the guest-agent.

The guest-agent part should be relatively trivial to implement, so it could perhaps be implemented by the same developer working on the libvirt feature.

Comment 2 Daniel Berrange 2016-11-25 12:26:27 UTC
FYI, from libvirt POV, I am *not* in favour of using guest agent for it - IMHO there should be a mechanism to feed this data back to the host via the core platform without requiring running special processes - perhaps something in an ACPI hook that can be exposed via the monitor, so that it is available from the very moment the kernel boots and configures KASLR, instead of only some arbitrary time later.

Comment 3 Ademar Reis 2016-11-25 12:41:19 UTC
(In reply to Daniel Berrange from comment #2)
> FYI, from libvirt POV, I am *not* in favour of using guest agent for it -
> IMHO there should be a mechanism to feed this data back to the host via the
> core platform without requiring running special processes - perhaps
> something in an ACPI hook that can be exposed via the monitor, so that it is
> available from the very moment the kernel boots and configures KASLR,
> instead of only some arbitrary time later.

OK, so no consensus yet... I've created the clone for the QEMU part: Bug 1398633

Comment 8 Marc-Andre Lureau 2017-03-02 14:36:37 UTC
work in progress here, 

I have some experimental qemu-ga patches to export phys_base, and _text value (before we have pstore).

With this, I can compute the kaslr offset and launch crash manually with --machdep phys_base=0x69000000 --kaslr 0x4000000 for example.
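
A minimal sketch of that manual step, assuming the guest reports its runtime `_text` address and `phys_base`: the KASLR offset is taken as the runtime `_text` minus the `_text` address in vmlinux, and both values are passed to crash with the options quoted above. The symbol addresses and the file names below are made-up placeholders, not values from this bug.

```python
import shlex


def crash_argv(vmlinux, dumpfile, phys_base, text_runtime, text_vmlinux):
    """Build a crash command line from guest-reported values.

    text_runtime is the relocated _text address reported by the guest,
    text_vmlinux the link-time _text address from the vmlinux file.
    """
    kaslr_offset = text_runtime - text_vmlinux
    return [
        "crash",
        "--machdep", f"phys_base={phys_base:#x}",
        "--kaslr", f"{kaslr_offset:#x}",
        vmlinux,
        dumpfile,
    ]


# Hypothetical inputs, chosen to reproduce the example values above.
argv = crash_argv(
    "vmlinux", "vm1.dump",
    phys_base=0x69000000,
    text_runtime=0xFFFFFFFF85000000,
    text_vmlinux=0xFFFFFFFF81000000,
)
print(shlex.join(argv))
```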

I don't know if those values could be stored in the ELF core, I am still looking for answers.

taking the bug for now

Comment 9 Dave Anderson 2017-03-02 16:08:52 UTC
this is a snippet of a private email in response to the above:


The traditional kdump facility stores the relocation information in the 
original /proc/kcore ELF file in a "VMCOREINFO" PT_NOTE (along with 
a whole bunch of other stuff). 

The traditional kdump facility doesn't store the phys_base information
in the ELF header directly, but it can be calculated by checking the
PT_LOAD segment that describes the __START_KERNEL_map region (x86_64).

Then if/when makedumpfile is used to create a compressed dumpfile:

 (1) the VMCOREINFO PT_NOTE (and each of the per-cpu NT_PRSTATUS PT_NOTE's) 
     are copied wholesale into a section of the compressed dumpfile header.  
 (2) the phys_base value is calculated, and then stored in its own field
     in the compressed dumpfile header.

The problem at hand is that dumpfiles created by the virsh dump facility
do not have VMCOREINFO PT_NOTE sections, nor do they have any concept 
of the kernel virtual addresses associated with each physical memory 
PT_LOAD segment, so the PT_LOAD segments have "0" fields in their p_vaddr
fields.

I've attached a sample output of a kdump compressed dumpfile header to
this email, but to answer your question, here are the items of interest, 
i.e., the "phys_base" value and the two relevant fields in the VMCOREINFO data:
  
  crash> help -D
  diskdump_data: 
            filename: vmcore
  
  ... [ cut ] ...
  
    sub_header_kdump: 2f14ff0 
             phys_base: 16a000000
  ... [ cut ] ...
     offset_vmcoreinfo: 7072 (0x1ba0)
       size_vmcoreinfo: 1837 (0x72d)
                        OSRELEASE=3.10.0-576.el7.bz1424943.x86_64
  ... [ cut ] ...
                        SYMBOL(_stext)=ffffffff9d2002b8
  ... [ cut ] ...
                        KERNELOFFSET=1c200000
  ...

The KERNELOFFSET item is new, and for backwards-compatibility, the crash 
utility currently doesn't use it.  Instead it compares the (relocated) 
"SYMBOL(_stext)" value with the "_stext" symbol value found in the vmlinux
file to come up with the relocation value.
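
The comparison described above can be sketched roughly as follows. This is an illustration only, not the crash utility's code. The VMCOREINFO lines are the ones from the sample header; `VMLINUX_STEXT` is an assumed link-time address, picked so that it agrees with the KERNELOFFSET in the sample. In practice it would come from the vmlinux symbol table.

```python
# Sketch only: derive the kernel relocation from VMCOREINFO data.
SAMPLE_VMCOREINFO = """\
OSRELEASE=3.10.0-576.el7.bz1424943.x86_64
SYMBOL(_stext)=ffffffff9d2002b8
KERNELOFFSET=1c200000
"""

# Assumed link-time address of _stext in vmlinux (hypothetical value,
# chosen to be consistent with the KERNELOFFSET shown in the sample).
VMLINUX_STEXT = 0xFFFFFFFF810002B8


def parse_vmcoreinfo(text):
    """Split the KEY=VALUE lines of a VMCOREINFO note into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


def relocation_from_stext(info, vmlinux_stext):
    """Relocated _stext minus the vmlinux _stext gives the relocation."""
    return int(info["SYMBOL(_stext)"], 16) - vmlinux_stext


info = parse_vmcoreinfo(SAMPLE_VMCOREINFO)
offset = relocation_from_stext(info, VMLINUX_STEXT)
print(hex(offset))
```

With these inputs the computed relocation matches the KERNELOFFSET field, which is why either source of information would do.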

And again, the phys_base value is the calculated value from the PT_LOAD 
segment of the mapped kernel text/static-data region (__START_KERNEL_map)
in the ELF header.

Now FWIW, you'll also note in the header sample this "empty" entry:

      num_qemu_notes: 0

Since the sample comes from a bare-metal kernel kdump, there are
no qemu notes.  But when they do exist, they consist of a set of
per-cpu QEMUCPUState structures.  Similar to the NT_PRSTATUS notes
in the traditional kdump, "virsh dump" creates its own "QEMU" PT_NOTE
section in the ELF header, which contains "QEMUCPUState" structures
for each cpu.  And those get copied to the compressed dumpfile header.

I've also attached "virsh-dump.ELF" and "virsh-dump.compressed"
header dumps.  Note that they contain both the traditional NT_PRSTATUS
notes plus the "QEMU" notes.  And the compressed version shows an
uninitialized "phys_base" value of 0.

Currently, without KASLR, the "phys_base" value is always a megabyte-aligned
value that is 16MB or less, so the crash utility does a series of reads
of the "linux_banner" symbol at a set of 1MB physical offsets until
it "finds" a linux_banner string.  It's a kludge, but there's no other
way to do it.  But with KASLR in play, the phys_base value can be 
basically anywhere, hence the problem at hand.
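
A rough sketch of the probing described above, and of why it stops working once KASLR is enabled. It is not the crash utility's implementation: `read_phys` is a hypothetical callback that reads bytes from the dump at a physical address, and the banner address in the example is made up.

```python
# Sketch only: probe 1MB-aligned phys_base candidates for linux_banner.
START_KERNEL_MAP = 0xFFFFFFFF80000000  # x86_64 __START_KERNEL_map
MB = 1 << 20
BANNER_PREFIX = b"Linux version "


def find_phys_base(read_phys, banner_vaddr, limit=16 * MB):
    """Return the first candidate phys_base at which the banner is found.

    read_phys(paddr, size) is a hypothetical callback returning bytes
    from the dump at the given physical address.
    """
    for phys_base in range(0, limit + MB, MB):
        paddr = banner_vaddr - START_KERNEL_MAP + phys_base
        if read_phys(paddr, len(BANNER_PREFIX)) == BANNER_PREFIX:
            return phys_base
    return None  # the real value lies outside the probed range


def make_fake_dump(banner_paddr):
    """Stand-in for a dump file: only the banner bytes are present."""
    def read_phys(paddr, size):
        if paddr == banner_paddr:
            return BANNER_PREFIX[:size]
        return b"\0" * size
    return read_phys


banner_vaddr = 0xFFFFFFFF81600000  # made-up symbol address
small = make_fake_dump(banner_vaddr - START_KERNEL_MAP + 2 * MB)
kaslr = make_fake_dump(banner_vaddr - START_KERNEL_MAP + 0x69000000)
print(find_phys_base(small, banner_vaddr))
print(find_phys_base(kaslr, banner_vaddr))
```

The second call fails to find the banner because the relocated kernel sits far above the 16MB window that the probe covers.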

So anyway, it would seem to make the most sense to store the relocation
and the phys_base data in the ELF header, perhaps in a "VMCOREINFO" PT_NOTE.
And then if it's to be compressed, copy the note to the compressed dumpfile
header, and initialize its phys_base field accordingly (although it might
be redundant because it would also be in the VMCOREINFO data).  That way, 
there would be no need to have to update the format of the compressed 
dumpfile header.

That being the case, I don't see the need for any additional --kaslr-text option,
or for that matter, the need for any additional command line options.  It 
should just require "crash vmlinux dumpfile".

Does that all make sense?

Dave

Comment 10 Jaroslav Suchanek 2017-03-03 10:23:55 UTC
(In reply to Marc-Andre Lureau from comment #8)
> 
> taking the bug for now

Marc-Andre, what's your plan with this libvirt bug? Are you going to implement it for libvirt as well? If not, please reassign it back to Martin. Thanks.

Comment 11 Marc-Andre Lureau 2017-03-03 10:37:15 UTC
(In reply to Jaroslav Suchanek from comment #10)
> (In reply to Marc-Andre Lureau from comment #8)
> > 
> > taking the bug for now
> 
> Marc-Andre, what's your plan with this libvirt bug? Are you going to
> implement it for libvirt as well? If not, please reassign it back to Martin.
> Thanks.

I think we won't need anything from libvirt, in fact. But since I am looking at the issue, and this is the top-level bug, I suggest keeping me assigned for now.

Comment 12 Marc-Andre Lureau 2017-03-08 21:43:41 UTC
qemu WIP in this github branch: https://github.com/elmarco/qemu/tree/kaslr
(produces kdump-format dumps that work with crash; the ELF format is lacking the phys_base detail for now)

It probably won't need changes to libvirt.

Comment 13 Laszlo Ersek 2017-03-09 10:29:01 UTC
Marc-André, did you test it with OVMF? If you don't have time for that, I can add it to my todo list, but I won't be fast. Thanks.

Comment 14 Marc-Andre Lureau 2017-03-15 14:27:53 UTC
checked, it seems to work fine with uefi guest

Comment 16 Dave Anderson 2017-05-18 15:05:17 UTC
*** Bug 1452016 has been marked as a duplicate of this bug. ***

Comment 17 Ademar Reis 2017-05-18 16:08:29 UTC
*** Bug 1451550 has been marked as a duplicate of this bug. ***

Comment 18 fj-lsoft-kernel-it 2017-10-23 08:32:44 UTC
Marc-Andre,

Could you tell me the current status of the upstream development for
this ticket?

It looks to me there's no response for both qemu and kernel patch sets
since the last post on Sep 11 and on Sep 22, respectively. I'm
concerned in particular about progress of the patch set for kernel,
which still includes proof-of-concept and RFC patches.

    https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg06783.html
    https://lkml.org/lkml/2017/9/22/337

Fujitsu needs the fix for this issue in RHEL7.5GA. Could you share any
trouble if you have in the development? Maybe, we could help your
development.

For example, I think you can reuse a wide part of implementation from
pvpanic device in this work:

    # find ./qemu -name "*pvpanic*"
    ./tests/pvpanic-test.c
    ./hw/misc/pvpanic.c
    ./docs/specs/pvpanic.txt
    # find ./linux -name "*pvpanic*"
    ./drivers/platform/x86/pvpanic.c

The pvpanic device is used to send a PANIC event from a guest to a
host. The implementation is to expose a I/O port to a guest machine
and the guest machine writes a bit into the I/O port at panic to tell
the host the panic state.

Because pvpanic is implemented this way, I guess the I/O port
interface is not in any experimental state as the DMA write you are
now using.

Thanks.
HATAYAMA, Daisuke

Comment 19 Marc-Andre Lureau 2017-10-23 09:30:24 UTC
(In reply to fj-lsoft-kernel-it from comment #18)
> Marc-Andre,
> 
> Could you tell me the current status of the upstream development for
> this ticket?
> 

The qemu bits are upstream:
https://git.qemu.org/?p=qemu.git;a=commit;h=6e43353f10c6688060af0bc26bdfdd4cf9c96ea2

> It looks to me there's no response for both qemu and kernel patch sets
> since the last post on Sep 11 and on Sep 22, respectively. I'm
> concerned in particular about progress of the patch set for kernel,
> which still includes proof-of-concept and RFC patches.
> 
>     https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg06783.html
>     https://lkml.org/lkml/2017/9/22/337

There has been no review of the kernel patches yet. I pinged Michael T. about it last week. Any help is welcome!

> 
> Fujitsu needs the fix for this issue in RHEL7.5GA. Could you share any
> trouble if you have in the development? Maybe, we could help your
> development.
> 
> For example, I think you can reuse a wide part of implementation from
> pvpanic device in this work:
> 
>     # find ./qemu -name "*pvpanic*"
>     ./tests/pvpanic-test.c
>     ./hw/misc/pvpanic.c
>     ./docs/specs/pvpanic.txt
>     # find ./linux -name "*pvpanic*"
>     ./drivers/platform/x86/pvpanic.c
> 
> The pvpanic device is used to send a PANIC event from a guest to a
> host. The implementation is to expose a I/O port to a guest machine
> and the guest machine writes a bit into the I/O port at panic to tell
> the host the panic state.
> 

pvpanic is x86 only, and doesn't provide interface to write data.

> Because pvpanic is implemented this way, I guess the I/O port
> interface is not in any experimental state as the DMA write you are
> now using.

fw_cfg DMA write has been in qemu since 2.9.

Thanks

Comment 20 fj-lsoft-kernel-it 2017-10-24 09:04:21 UTC
Marc-Andre,

(In reply to Marc-Andre Lureau from comment #19)
> (In reply to fj-lsoft-kernel-it from comment #18)
> > Marc-Andre,
> > 
> > Could you tell me the current status of the upstream development for
> > this ticket?
> > 
> 
> The qemu bits are upstream:
> https://git.qemu.org/?p=qemu.git;a=commit;h=6e43353f10c6688060af0bc26bdfdd4cf9c96ea2

I didn't notice this, thanks.

> > It looks to me there's no response for both qemu and kernel patch sets
> > since the last post on Sep 11 and on Sep 22, respectively. I'm
> > concerned in particular about progress of the patch set for kernel,
> > which still includes proof-of-concept and RFC patches.
> > 
> >     https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg06783.html
> >     https://lkml.org/lkml/2017/9/22/337
> 
> There has been no review of the kernel patches yet. I pinged Michael T. about it
> last week. Any help is welcome!
>

I think it important to make the review progress as soon as
possible. I'll monitor the thread and might comment something from the
crash dump feature's point of view.

> > 
> > Fujitsu needs the fix for this issue in RHEL7.5GA. Could you share any
> > trouble if you have in the development? Maybe, we could help your
> > development.
> > 
> > For example, I think you can reuse a wide part of implementation from
> > pvpanic device in this work:
> > 
> >     # find ./qemu -name "*pvpanic*"
> >     ./tests/pvpanic-test.c
> >     ./hw/misc/pvpanic.c
> >     ./docs/specs/pvpanic.txt
> >     # find ./linux -name "*pvpanic*"
> >     ./drivers/platform/x86/pvpanic.c
> > 
> > The pvpanic device is used to send a PANIC event from a guest to a
> > host. The implementation is to expose a I/O port to a guest machine
> > and the guest machine writes a bit into the I/O port at panic to tell
> > the host the panic state.
> > 
> 
> pvpanic is x86 only, and doesn't provide interface to write data.
>

I can understand you'd like to support as many architectures as
possible. But I'd like you to consider changing the policy when the
deadline gets closer...

Also, according to ./drivers/platform/x86/pvpanic.c:

    static void
    pvpanic_send_event(unsigned int event)
    {
            outb(event, port);
    }

    static int
    pvpanic_panic_notify(struct notifier_block *nb, unsigned long code,
                         void *unused)
    {
            pvpanic_send_event(PVPANIC_PANICKED);
            return NOTIFY_DONE;
    }

This appears to write a status from guest machine into host machine
via ioport.

> > Because pvpanic is implemented this way, I guess the I/O port
> > interface is not in any experimental state as the DMA write you are
> > now using.
> 
> fw_cfg DMA write has been in qemu since 2.9.
> 

But you write in the patch description: "such usage is strongly
discouraged by the maintainers." and so I think you are also concerned
about the status of the fw_cfg DMA write.

Thanks.
HATAYAMA, Daisuke

Comment 21 Marc-Andre Lureau 2017-10-31 15:41:18 UTC
(In reply to fj-lsoft-kernel-it from comment #20)
> 
> I think it important to make the review progress as soon as
> possible. I'll monitor the thread and might comment something from the
> crash dump feature's point of view.

Thanks, I sent v4:
https://lkml.org/lkml/2017/10/31/552

> > pvpanic is x86 only, and doesn't provide interface to write data.
> >
> 
> I can understand you'd like to support as many architectures as
> possible. But I'd like you to consider changing the policy when the
> deadline gets closer...
> 
> Also, according to ./drivers/platform/x86/pvpanic.c:
> 
>     static void
>     pvpanic_send_event(unsigned int event)
>     {
>             outb(event, port);
>     }
>
>     static int
>     pvpanic_panic_notify(struct notifier_block *nb, unsigned long code,
>                          void *unused)
>     {
>             pvpanic_send_event(PVPANIC_PANICKED);
>             return NOTIFY_DONE;
>     }
> 
> This appears to write a status from guest machine into host machine
> via ioport.

The interface can only be used to notify with a status. I don't see how you could use it to write arbitrary data from guest.

> > > Because pvpanic is implemented this way, I guess the I/O port
> > > interface is not in any experimental state as the DMA write you are
> > > now using.
> > 
> > fw_cfg DMA write has been in qemu since 2.9.
> > 
> 
> But you write in the patch description: "such usage is strongly
> discouraged by the maintainers." and so I think you are also concerned
> about the status of the fw_cfg DMA write.

This last patch is explicitly marked as experimental sysfs write support. The maintainers are afraid that userspace will start to rely on write support, so they would rather not expose it.

It is not about usage of fw_cfg write in general (which is also being used by the bios)

I dropped that patch from the series in the last iteration to make that clearer.

thanks

Comment 22 fj-lsoft-kernel-it 2017-11-01 04:26:40 UTC
Marc-Andre,

> (In reply to fj-lsoft-kernel-it from comment #20)
> >
> > I think it important to make the review progress as soon as
> > possible. I'll monitor the thread and might comment something from the
> > crash dump feature's point of view.
> 
> Thanks, I sent v4:
> https://lkml.org/lkml/2017/10/31/552
>

I see.

> > > pvpanic is x86 only, and doesn't provide interface to write data.
> > >
> >
> > I can understand you'd like to support as many architectures as
> > possible. But I'd like you to consider changing the policy when the
> > deadline gets closer...
> >
> > Also, according to ./drivers/platform/x86/pvpanic.c:
> >
> >     static void
> >     pvpanic_send_event(unsigned int event)
> >     {
> >             outb(event, port);
> >     }
> >
> >     static int
> >     pvpanic_panic_notify(struct notifier_block *nb, unsigned long code,
> >                          void *unused)
> >     {
> >             pvpanic_send_event(PVPANIC_PANICKED);
> >             return NOTIFY_DONE;
> >     }
> >
> > This appears to write a status from guest machine into host machine
> > via ioport.
> 
> The interface can only be used to notify with a status. I don't see how you
> could use it to write arbitrary data from guest.
>

Yes, to write arbitrary data, it's necessary to write it multiple
times while communicating with the host. In this approach, you need to
implement a bit of a state transition machine that controls status
register.

Please look at the design of IPMI KCS interface, 9.5 KCS Interface
Registers in the specification below, for example. I imagine this kind
of design.

    Intelligent Platform Management Interface Specification v2.0 rev. 1.1
    https://www.intel.co.jp/content/www/jp/ja/servers/ipmi/ipmi-second-gen-interface-spec-v2-rev1-1.html
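
To make the idea concrete, here is a deliberately simplified sketch of passing multi-byte data through single-byte port writes. It is not the KCS protocol and not anything pvpanic provides: the length-prefixed framing, the class name and the `outb`-style callback are all hypothetical, invented only for illustration.

```python
import struct


class ByteChannelReceiver:
    """Host side: reassembles a length-prefixed message from byte writes."""

    IDLE, DATA = range(2)

    def __init__(self):
        self.state = self.IDLE
        self.remaining = 0
        self.buffer = bytearray()
        self.messages = []

    def write(self, byte):
        """Handle one byte written by the guest to the (imaginary) port."""
        if self.state == self.IDLE:
            # The first byte announces the payload length (1..255).
            if byte:
                self.remaining = byte
                self.buffer.clear()
                self.state = self.DATA
        else:
            self.buffer.append(byte)
            self.remaining -= 1
            if self.remaining == 0:
                self.messages.append(bytes(self.buffer))
                self.state = self.IDLE


def guest_send(outb, payload):
    """Guest side: send the length, then the payload, one byte per write."""
    outb(len(payload))
    for byte in payload:
        outb(byte)


rx = ByteChannelReceiver()
# Example values only: two 64-bit little-endian fields.
guest_send(rx.write, struct.pack("<QQ", 0x4000000, 0x69000000))
kaslr_offset, phys_base = struct.unpack("<QQ", rx.messages[0])
print(hex(kaslr_offset), hex(phys_base))
```

Even this toy version needs state on the host side, which gives a feel for the extra machinery such a design would add on top of a plain status port.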

Another approach is to simplify the data to be passed by the device,
for example, into just two values: kaslr_offset and phys_base. Then,
the mechanism of writing multiple bytes like above is not necessary.

> > > > Because pvpanic is implemented this way, I guess the I/O port
> > > > interface is not in any experimental state as the DMA write you are
> > > > now using.
> > >
> > > fw_cfg DMA write has been in qemu since 2.9.
> > >
> >
> > But you write in the patch description: "such usage is strongly
> > discouraged by the maintainers." and so I think you are also concerned
> > about the status of the fw_cfg DMA write.
> 
> This last patch is explicitly marked as experimental sysfs write support. The
> maintainers are afraid that userspace will start to rely on write support, so they
> would rather not expose it.

Thanks, it's helpful for me to understand what they are concerned
about further.

> 
> It is not about usage of fw_cfg write in general (which is also being used by
> the bios)
>

It's interesting. I found some code related to fw_cfg in seabios.

    # git grep fw_cfg
    docs/Releases.md:  CBFS/fw_cfg "bootorder" file)
    docs/Releases.md:  for extracting option roms from qemu "fw_cfg".
    docs/Runtime_config.md:natively on QEMU the files are passed from QEMU via the fw_cfg
    docs/Runtime_config.md:| pci-optionrom-exec  | Controls option ROM execution for roms found on PCI devices (as opposed to roms found in CBFS/fw_cfg).  Valid values are 0: Execute no ROMs, 1: Execute only VGA ROMs, 2: Execute all ROMs. The default is 2 (execute all ROMs).
    src/Kconfig:            Support controlling of the boot order via the fw_cfg/CBFS
    src/Kconfig:        bool "Floppy images from CBFS or fw_cfg"
    src/Kconfig:            QEMU fw_cfg.
    src/fw/paravirt.c: * QEMU firmware config (fw_cfg) interface
    src/fw/paravirt.c:// List of QEMU fw_cfg entries.  DO NOT ADD MORE.  (All new content
    src/fw/paravirt.c:// should be passed via the fw_cfg "file" interface.)
    src/fw/paravirt.c:// Populate romfile entries for legacy fw_cfg ports (that predate the
    src/fw/paravirt.c:    // Detect fw_cfg interface.
    src/fw/paravirt.c:    dprintf(1, "Found QEMU fw_cfg\n");
    src/fw/paravirt.c:        dprintf(1, "QEMU fw_cfg DMA interface supported\n");
    src/fw/paravirt.c:    // Populate romfiles for legacy fw_cfg entries
    src/fw/paravirt.c:    // Load files found in the fw_cfg file directory
    src/fw/romfile_loader.c:void romfile_fw_cfg_resume(void)
    src/fw/romfile_loader.h:void romfile_fw_cfg_resume(void);
    src/hw/rtc.h:// be passed via the fw_cfg "file" interface.)
    src/resume.c:#include "fw/romfile_loader.h" // romfile_fw_cfg_resume
    src/resume.c:    /* Replay any fw_cfg entries that go back to the host */
    src/resume.c:    romfile_fw_cfg_resume();

> I dropped that patch from the series in the last iteration to make that
> clearer.
> 

I see.

Thanks.
HATAYAMA, Daisuke

Comment 23 Marc-Andre Lureau 2017-11-06 12:11:19 UTC
Sent libvirt RFC patch:
https://www.redhat.com/archives/libvir-list/2017-November/msg00170.html

Comment 24 fj-lsoft-kernel-it 2017-11-10 08:31:42 UTC
Marc-Andre, Dave,

Thanks Marc-Andre for your work to fix the KASLR issue.

I know you are currently working to fix the issue upstream, but the
RHEL7.5 alpha is getting closer. I think it would be better to start
considering an alternative approach now that addresses the KASLR
issue independently of Marc-Andre's work, just in case.

The approach is a workaround patch for crash utility to recalculate
kaslr_offset and phys_base by taking advantage of qemu's runtime data.

The patch was first included in the initial version of the patch set
fixing the KASLR issue on sadump but was removed in the second version
after Dave told us about the progress of Marc-Andre's work:

    https://www.redhat.com/archives/crash-utility/2017-October/msg00004.html

The idea in the workaround patch used for virsh dump is the same as
for sadump, so most of the workaround patch can be implemented by
reusing part of the already included sadump code. Fujitsu is
responsible for maintaining the sadump code, so I expect that
including the workaround patch in the crash utility would not
increase the maintenance risk for Dave.

Thanks.
HATAYAMA, Daisuke

Comment 25 Dave Anderson 2017-11-10 14:27:02 UTC
If Marc-Andre is confident of this patch making it into RHEL7 GA, I don't
see the need for the crash patch.

Comment 26 Marc-Andre Lureau 2017-11-10 14:52:35 UTC
(In reply to Dave Anderson from comment #25)
> If Marc-Andre is confident of this patch making it into RHEL7 GA, I don't
> see the need for the crash patch.

It's hard to tell, as I am not a kernel maintainer. People need to review this series and help get it merged: https://lkml.org/lkml/2017/11/7/560

Comment 31 Jaroslav Suchanek 2017-11-22 14:16:58 UTC
Fixed upstream:

commit 7e4177a35bae49a53b04940be04418daaa988734
Author:     Marc-André Lureau <marcandre.lureau@redhat.com>
AuthorDate: Thu Nov 16 17:49:38 2017 +0100
Commit:     Martin Kletzander <mkletzan@redhat.com>
CommitDate: Sat Nov 18 10:45:10 2017 +0100

    qemu: add vmcoreinfo support
    
    Starting from qemu 2.11, the `-device vmcoreinfo` will create a fw_cfg
    entry for a guest to store dump details, necessary to process kernel
    dump with KASLR enabled and providing additional kernel details.
    
    In essence, it is similar to -fw_cfg name=etc/vmcoreinfo,file=X but in
    this case it is not backed by a file, but collected by QEMU itself.
    
    Since the device is a singleton and shouldn't use additional hardware
    resources, it is presented as a <feature> element in the libvirt
    domain XML.
    
    The device is arm/x86 only for now (targets that support fw_cfg+dma).
    
    Related to:
    https://bugzilla.redhat.com/show_bug.cgi?id=1395248
    
    Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
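
As an illustration of the <feature> element mentioned in the commit message, the following sketch adds a bare `<vmcoreinfo/>` element under `<features>` in a domain XML string. It is only an example of editing the XML with the Python standard library, not part of libvirt; the sample domain is made up, and in practice the XML would be changed with `virsh edit` or a management tool.

```python
import xml.etree.ElementTree as ET


def enable_vmcoreinfo(domain_xml):
    """Return domain XML with <vmcoreinfo/> present under <features>."""
    root = ET.fromstring(domain_xml)
    features = root.find("features")
    if features is None:
        features = ET.SubElement(root, "features")
    if features.find("vmcoreinfo") is None:
        ET.SubElement(features, "vmcoreinfo")
    return ET.tostring(root, encoding="unicode")


# Made-up minimal domain, only to exercise the helper.
SAMPLE_DOMAIN = (
    "<domain type='kvm'><name>vm1</name>"
    "<features><acpi/></features></domain>"
)

updated = enable_vmcoreinfo(SAMPLE_DOMAIN)
print(updated)
```

Running the helper on XML that already contains the element leaves it unchanged, which matches the commit's description of the device as a singleton.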

Comment 35 fj-lsoft-kernel-it 2017-12-05 02:18:18 UTC
Martin,

We at Fujitsu would like to verify the fix for this issue, so I have added
OtherQA to this ticket.

Could you provide us with test rpm and source rpm packages?

Thanks.
HATAYAMA, Daisuke

Comment 37 fj-lsoft-kernel-it 2017-12-11 08:27:12 UTC
Martin,

Could you provide us with test rpm and source rpm packages?

If difficult now, could you tell me current plan?

Thanks.
HATAYAMA, Daisuke

Comment 39 Ademar Reis 2017-12-18 12:33:45 UTC
(In reply to fj-lsoft-kernel-it from comment #37)
> Martin,
> 
> Could you provide us with test rpm and source rpm packages?
> 
> If difficult now, could you tell me current plan?
> 
> Thanks.
> HATAYAMA, Daisuke

The libvirt changes are in libvirt-3.9.0-5.el7, to be included in RHEL-7.5. We're still working on QEMU and Kernel changes, downstream patches are being reviewed, also planned for RHEL-7.5.

Comment 40 yafu 2018-01-19 09:02:11 UTC
Test with:
kernel-3.10.0-831.el7.x86_64
libvirt-3.9.0-8.el7.x86_64
qemu-kvm-rhev-2.10.0-17.el7.x86_64
crash-7.2.0-3.el7.x86_64

Test steps:
1.Start a guest with vmcoreinfo feature and panic device:
#virsh dumpxml vm1
<features>
...
    <vmcoreinfo/>
  </features>
<on_crash>coredump-restart</on_crash>
<devices>
...
<panic model='isa'>
    <address type='isa' iobase='0x505'/>
</panic>
</devices>

2.Stop kdump service and trigger os crash in the guest os:
#systemctl stop kdump
#echo c > /proc/sysrq-trigger

3.Check the coredump file:
# ll /var/lib/libvirt/qemu/dump/
total 3164108
-rw-------. 1 root root 3240045752 Jan 19 01:16 7-vm1-2018-01-19-01:16:09

4.Use the crash tool to analyze the coredump file:
#crash  /usr/lib/debug/lib/modules/3.10.0-831.el7.x86_64/vmlinux 7-vm1-2018-01-19-01\:16\:09 

crash 7.2.0-3.el7
Copyright (C) 2002-2017  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [502MB]: patching 82021 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-831.el7.x86_64/vmlinux 
    DUMPFILE: 7-vm1-2018-01-19-01:16:09
        CPUS: 1
        DATE: Fri Jan 19 01:16:08 2018
      UPTIME: 00:02:11
LOAD AVERAGE: 0.54, 0.42, 0.17
       TASKS: 483
    NODENAME: localhost.localdomain
     RELEASE: 3.10.0-831.el7.x86_64
     VERSION: #1 SMP Wed Jan 17 15:59:59 EST 2018
     MACHINE: x86_64  (2099 Mhz)
      MEMORY: 2.9 GB
       PANIC: "SysRq : Trigger a crash"
         PID: 3875
     COMMAND: "bash"
        TASK: ffff94deb173dee0  [THREAD_INFO: ffff94de961f0000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 3875   TASK: ffff94deb173dee0  CPU: 0   COMMAND: "bash"
 #0 [ffff94de961f3b68] panic at ffffffffa0cef172
 #1 [ffff94de961f3be8] oops_end at ffffffffa0cffac5
 #2 [ffff94de961f3c10] no_context at ffffffffa0cee682
 #3 [ffff94de961f3c60] __bad_area_nosemaphore at ffffffffa0cee719
 #4 [ffff94de961f3cb0] bad_area at ffffffffa0ceeaa9
 #5 [ffff94de961f3cd8] __do_page_fault at ffffffffa0d02bbf
 #6 [ffff94de961f3d40] trace_do_page_fault at ffffffffa0d02cb6
 #7 [ffff94de961f3d80] do_async_page_fault at ffffffffa0d02242
 #8 [ffff94de961f3da0] async_page_fault at ffffffffa0cfe928
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffffa0a289e6  RSP: ffff94de961f3e58  RFLAGS: 00010246
    RAX: 000000000000000f  RBX: ffffffffa12d79c0  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff94deb9e13938  RDI: 0000000000000063
    RBP: ffff94de961f3e58   R8: ffffffffa15bd8bc   R9: 00000000000001a0
    R10: 000000000000037e  R11: 000000000000037d  R12: 0000000000000063
    R13: 0000000000000000  R14: 0000000000000004  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff94de961f3e60] __handle_sysrq at ffffffffa0a29207
#10 [ffff94de961f3e90] write_sysrq_trigger at ffffffffa0a2967f
#11 [ffff94de961f3ea8] proc_reg_write at ffffffffa088cedd
#12 [ffff94de961f3ec8] vfs_write at ffffffffa0817dfd
#13 [ffff94de961f3f08] sys_write at ffffffffa0818c0f
#14 [ffff94de961f3f50] system_call_fastpath at ffffffffa0d07afd
    RIP: 00007f2b965e4ab0  RSP: 00007ffde1885d18  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: 0000000000000002  RCX: 0000000000000063
    RDX: 0000000000000002  RSI: 00007f2b96f0f000  RDI: 0000000000000001
    RBP: 00007f2b96f0f000   R8: 000000000000000a   R9: 00007f2b96ef9740
    R10: 00007f2b96ef9740  R11: 0000000000000246  R12: 00007f2b968bc400
    R13: 0000000000000002  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
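
The "kernel relocated" warning in the crash banner above is the visible sign that crash found the KASLR offset in the dump. A minimal sketch of how a test script might pull that offset out of captured crash output follows. The function name and the regular expression are illustrative assumptions modeled only on the single warning line shown in this comment; other crash versions may word the message differently.

```python
import re

# Pattern modeled on the one warning line quoted in this comment
# (an assumption, not a documented crash output format).
RELOC_RE = re.compile(r"WARNING: kernel relocated \[(\d+)MB\]")


def kernel_relocation_mb(crash_output):
    """Return the relocation offset in MB reported by crash, or None.

    None means the warning was not found, e.g. because the kernel was
    not relocated or the message format differs.
    """
    match = RELOC_RE.search(crash_output)
    return int(match.group(1)) if match else None


# The warning line as it appears in the session above.
SAMPLE = (
    "WARNING: kernel relocated [502MB]: "
    "patching 82021 gdb minimal_symbol values"
)

print(kernel_relocation_mb(SAMPLE))
```

A None result on a guest known to have KASLR enabled would suggest that the dump lacks usable vmcoreinfo data, although that inference is an assumption and was not something tested in this bug.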

Comment 41 Xuesong Zhang 2018-01-29 03:06:12 UTC
hi, HATAYAMA Daisuke

libvirt QE tested and verified this BZ in comment 40. Could you please provide your test results for reference, if possible? We can only update the BZ status to VERIFIED after receiving your results, since this BZ has the keyword OtherQA. Thanks.

Comment 42 fj-lsoft-rh-dump 2018-01-29 04:50:59 UTC
Hi Zhang,

I have yet to complete the verification successfully. It appears that,
although the -device vmcoreinfo option is assigned correctly by libvirt
when I add <vmcoreinfo /> into <devices></devices> via the virsh edit command,
the vmcoreinfo device is not detected by the guest kernel.

I guess the cause is that the qemu-kvm package provided on RHEL7.5 beta,
qemu-kvm-1.5.3-152.el7.x86_64, is slightly older than the one described in
BZ#1411490, qemu-kvm-1.5.3-154.el7.

I'm now about to request the latest qemu-kvm package on BZ#1411490
for our testing.

By the way, to enable the vmcoreinfo device, we need to apply the
-device vmcoreinfo option to the qemu-kvm process via the virsh edit command
as above. Do you have any plan to describe that in the documentation?

Thanks.
HATAYAMA, Daisuke

Comment 43 Martin Kletzander 2018-01-29 08:04:49 UTC
(In reply to fj-lsoft-rh-dump from comment #42)
Just to make sure there is no confusion, <vmcoreinfo/> should be in <features></features>, not in devices.
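
To make the placement concrete, here is a minimal sketch that reports which element of a domain XML holds <vmcoreinfo/>. The function name and the sample XML are hypothetical; the sample is a cut-down domain modeled on the excerpt in comment 40, not the output of a real guest. It uses only Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def vmcoreinfo_parents(domain_xml):
    """Return the sorted tag names of elements that directly contain <vmcoreinfo/>."""
    root = ET.fromstring(domain_xml)
    return sorted(
        parent.tag
        for parent in root.iter()
        if parent.find("vmcoreinfo") is not None  # direct children only
    )


# Hypothetical, trimmed-down domain XML with the element in the right place.
EXAMPLE = """
<domain type='kvm'>
  <name>vm1</name>
  <features>
    <acpi/>
    <vmcoreinfo/>
  </features>
  <on_crash>coredump-restart</on_crash>
  <devices>
    <panic model='isa'>
      <address type='isa' iobase='0x505'/>
    </panic>
  </devices>
</domain>
"""

print(vmcoreinfo_parents(EXAMPLE))
```

In practice the input could come from the output of virsh dumpxml; a result other than a single "features" entry would point to a misplaced or missing element.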

Comment 44 Xuesong Zhang 2018-01-29 08:18:28 UTC
As Martin answered above, <vmcoreinfo/> is configured in <features></features>.
Here is the documentation for your reference; search for the keyword vmcoreinfo:
https://libvirt.org/formatdomain.html#elementsFeatures

(In reply to fj-lsoft-rh-dump from comment #42)

Comment 45 fj-lsoft-rh-dump 2018-01-29 23:46:52 UTC
Zhang,

Sorry for the confusion. I checked my configuration and confirmed
that I had indeed added <vmcoreinfo/> inside *<features>~</features>*.
The description in Comment 42 is wrong.

Thanks.
HATAYAMA, Daisuke

Comment 46 Xuesong Zhang 2018-01-30 06:47:31 UTC
OK, did you get the right qemu-kvm version? If not, I think you will be able to get it via RHEL7.5 snapshot1 in the coming days.

(In reply to fj-lsoft-rh-dump from comment #45)

Comment 47 fj-lsoft-rh-dump 2018-02-02 06:38:44 UTC
Hi Zhang,

As you mentioned, the appropriate version of the qemu-kvm package is shipped
with RHEL7.5 snapshot1, and I have finished verifying the libvirt package
in combination with sufficiently new qemu-kvm and kernel packages.
I confirmed that the issue here has been fixed correctly.

Thanks.
HATAYAMA, Daisuke

Comment 48 Xuesong Zhang 2018-02-02 06:45:34 UTC
Thanks for your confirmation. Updating the BZ status to VERIFIED per comments 40 and 47.

(In reply to fj-lsoft-rh-dump from comment #47)

Comment 52 errata-xmlrpc 2018-04-10 10:39:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704

