Bug 1097779

Summary: gdb no longer works to debug qemu: Remote 'g' packet reply is too long
Product: [Fedora] Fedora
Component: qemu
Version: 20
Hardware: Unspecified
OS: Unspecified
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Type: Bug
Doc Type: Bug Fix
Reporter: Robin Hack <rhack>
Assignee: Fedora Virtualization Maintainers <virt-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: amit.shah, berrange, cfergeau, ciro.santilli, dwmw2, gbenson, itamar, jan.kratochvil, palves, patrickm, pbonzini, pmuldoon, rhack, rjones, sassmann, scottt.tw, sergiodj, todoleza, virt-maint
Last Closed: 2015-06-29 20:37:08 UTC

Description Robin Hack 2014-05-14 13:56:35 UTC
Description of problem:
Hi all. I hope I chose the right component. If not, please change it to the right one.
Maybe gdb is the right component?


Version-Release number of selected component (if applicable):
Host configuration:

    Qemu packages installed:
    qemu-img-1.6.2-4.fc20.x86_64
    qemu-system-arm-1.6.2-4.fc20.x86_64
    qemu-system-x86-1.6.2-4.fc20.x86_64
    qemu-common-1.6.2-4.fc20.x86_64
    qemu-guest-agent-1.6.2-4.fc20.x86_64
    

    Distro: fedora 20

    gdb:
    GNU gdb (GDB) Fedora 7.6.50.20130731-19.fc20
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word".

    Kernel:
    3.14.3-200.fc20.x86_64

    The gdbserver package is not installed on either machine.

Guest configuration:
    Distro: fedora 20

    gdb:
    same as on host

    Kernel:
    3.13.5-200.fc20.x86_64


How reproducible:
always

Steps to Reproduce:
1. Configure the domain XML used by virsh. The opening tag

<domain type='kvm'>

has to be changed to

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

and you also need to add:

<qemu:commandline>
<qemu:arg value='-s'/>
</qemu:commandline>

(original URL: http://gymnasmata.wordpress.com/2010/12/02/setting-up-gdb-to-work-with-qemu-kvm-via-libvirt/)
2. Start the virtual machine.
3. Try to connect from the host:
(gdb) target remote localhost:1234
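
The XML change from step 1 can also be scripted. The following is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name add_gdb_stub_arg is hypothetical, the input is a bare <domain> element rather than a real libvirt definition, and nothing here talks to libvirt itself.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from step 1 above.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"


def add_gdb_stub_arg(domain_xml):
    """Return domain_xml with <qemu:commandline><qemu:arg value='-s'/> appended.

    Hypothetical helper for illustration only.
    """
    # Make the serializer emit the 'qemu' prefix instead of a generated one.
    ET.register_namespace("qemu", QEMU_NS)
    root = ET.fromstring(domain_xml)
    commandline = ET.SubElement(root, "{%s}commandline" % QEMU_NS)
    ET.SubElement(commandline, "{%s}arg" % QEMU_NS, {"value": "-s"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_gdb_stub_arg("<domain type='kvm'><name>guest</name></domain>"))
```

The serializer adds the xmlns:qemu declaration to the root element on its own, so the two manual edits from step 1 collapse into one call.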


Actual results:
Remote debugging using localhost:1234
Remote 'g' packet reply is too long: edffffff00000000ffffffffffffffff0000000000000001000000000000000000000000000000004600000000000000b81ec081ffffffffb81ec081ffffffff000000000000000000000000000000002900000000000000a8e7ec81ffffffff0000000000000000c0f9cf81ffffffffc0a2dc81ffffffffd81fc081ffffffff26f30481ffffffff8602000010000000180000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f0300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ffffff00ffff0000ffff00ffffff000028732900d67f0000606e00c8d67f00000000ff0000000000ff00000000000000706174683d272f6f72672f667265656465736b746f702f44427573272c696e746572666163653d276f72672e667265656465736b746f702e44427573272c6d650000000000000000000000000000000000000000000000000000000000ffffffffffffffffffffff00ffffffffffffff00ffffffffffffff000000000000ffff00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000

After that, the virtual machine appears to be frozen.

Expected results:
A successful connection, and the virtual machine does not freeze.

Additional info:
This is how the guest kernel looks in crash:
      KERNEL: vmlinux                           
    DUMPFILE: fedora20.dump
        CPUS: 2
        DATE: Wed May 14 14:57:23 2014
      UPTIME: 00:00:21
LOAD AVERAGE: 0.19, 0.05, 0.02
       TASKS: 107
    NODENAME: gremlinka.brq.redhat.com
     RELEASE: 3.13.5-200.fc20.x86_64
     VERSION: #1 SMP Mon Feb 24 16:51:35 UTC 2014
     MACHINE: x86_64  (2893 Mhz)
      MEMORY: 1 GB
       PANIC: ""
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffff81c13460  (1 of 2)  [THREAD_INFO: ffffffff81c00000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

I can also share a memory dump of the guest kernel if you need it.

Comment 1 Richard W.M. Jones 2014-05-14 14:53:34 UTC
I have also seen this.  It's very annoying because debugging
qemu is a useful feature.  I suspect it's a bug in gdb however.

Comment 2 Pedro Alves 2014-05-14 15:03:56 UTC
Sounds like you didn't specify the binary to gdb before connecting.

Comment 3 Richard W.M. Jones 2014-05-14 15:07:35 UTC
Shouldn't the error message be "you didn't specify the binary"
instead of "remote 'g' packet reply is too long"?

I'm fairly sure when I did this, I did specify the vmlinux,
but I would have to go back and check that.

Comment 4 Richard W.M. Jones 2014-05-14 15:22:30 UTC
Here is a small, self-contained reproducer.  You will first need to
install 'kernel' and 'kernel-debuginfo' packages, and alter the
version strings below according to what kernel version you actually
installed.

In one window, run:

sudo qemu-system-x86_64 -s -S -kernel /boot/vmlinuz-3.14.3-200.fc20.x86_64 -initrd /boot/initramfs-3.14.3-200.fc20.x86_64.img 

In a second window, run:

$ gdb /usr/lib/debug/lib/modules/3.14.3-200.fc20.x86_64/vmlinux
GNU gdb (GDB) Fedora 7.6.50.20130731-19.fc20
Reading symbols from /usr/lib/debug/lib/modules/3.14.3-200.fc20.x86_64/vmlinux...done.
(gdb) target remote :1234
Remote debugging using :1234
0x0000000000000000 in irq_stack_union ()
(gdb) cont
Continuing.

The virtual machine should start booting.  At some point hit ^C
in the gdb window, and you will see the error:

^CRemote 'g' packet reply is too long: [...]

Also if you kill qemu, then gdb segfaults but that's probably
a different bug.

qemu-system-x86-2.0.0-0.1.rc0.fc21.x86_64
gdb-7.6.50.20130731-19.fc20.x86_64
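
For reference, -s is QEMU's shorthand for -gdb tcp::1234, and -S keeps the CPU stopped at startup until the debugger continues it. Below is a small sketch that builds the command line from this comment for a given kernel version. The helper qemu_debug_command is hypothetical, and the /boot paths simply follow the pattern used above.

```python
import shlex


def qemu_debug_command(kernel_version):
    """Build the qemu argv used in the reproducer (hypothetical helper)."""
    return [
        "qemu-system-x86_64",
        "-s",  # shorthand for -gdb tcp::1234
        "-S",  # do not start the CPU until gdb continues it
        "-kernel", "/boot/vmlinuz-%s" % kernel_version,
        "-initrd", "/boot/initramfs-%s.img" % kernel_version,
    ]


if __name__ == "__main__":
    # Print a copy-and-paste form of the command for one kernel version.
    print(shlex.join(qemu_debug_command("3.14.3-200.fc20.x86_64")))
```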

Comment 5 Robin Hack 2014-05-14 15:31:14 UTC
(In reply to palves from comment #2)
> Sounds like you didn't specify the binary to gdb before connecting.

Yes, I didn't specify the vmlinux binary before connecting. I think a more user-friendly approach (one that also doesn't leave the machine dead, from my point of view) would be better.

Additional info:
With vmlinux specified:
# gdb vmlinux
...
(gdb) target remote localhost:1234


Works almost perfectly.

This is annoying too:
Program received signal SIGINT, Interrupt.
native_safe_halt ()
    at /usr/src/debug/kernel-3.13.fc20/linux-3.13.5-200.fc20.x86_64/arch/x86/include/asm/irqflags.h:50
50	in /usr/src/debug/kernel-3.13.fc20/linux-3.13.5-200.fc20.x86_64/arch/x86/include/asm/irqflags.h

but it is relatively easy to solve.

Comment 6 Pedro Alves 2014-05-14 19:11:26 UTC
(In reply to Richard W.M. Jones from comment #4)
> Here is a small, self-contained reproducer.  You will first need to
> install 'kernel' and 'kernel-debuginfo' packages, and alter the
> version strings below according to what kernel version you actually
> installed.

I can reproduce it.  And indeed qemu grows the g packet reply for some (broken) reason:

Right after connection:

Sending packet: $g#67...Ack
Packet received: 0000000000000000230600000000000000000000000000000000000000000000f0ff00000200000000f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000

I let the kernel boot a bit, and ctrl-c.  Still the same size.  Let it boot some more, and then:

Sending packet: $g#67...Ack
Packet received: 6ca40400000000007d6000000000000071000000000000005313a081ffffffffe53c58060088ffff7b00000000000000d03c58060088ffffd03c58060088ffff0100000000000000ceba9781ffffffffd4ba9781ffffffff000000000000000060a4040000000000ab1ea281ffffffff8f1101000000000000000000000000001d721081ffffffff0602000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f0300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000


So indeed GDB is right in its complaint:

 ^CRemote 'g' packet reply is too long: [...]

Sounds like something changed on the qemu end.

Comment 7 Pedro Alves 2014-05-14 19:17:21 UTC
(In reply to Richard W.M. Jones from comment #4)

> Also if you kill qemu, then gdb segfaults but that's probably
> a different bug.

Yeah.  I couldn't reproduce this one.  But definitely a different bug.

Comment 8 Pedro Alves 2014-05-14 19:25:24 UTC
(In reply to Richard W.M. Jones from comment #3)
> Shouldn't the error message be "you didn't specify the binary"
> instead of "remote 'g' packet reply is too long"?

Thing is, specifying a binary is not always required, if the server sends all the necessary bits.  I see that qemu doesn't send an XML target (register) description, for example.  Actually, I can't think of a reason you'd see that on initial connection if you don't specify a binary.  Absent a description, GDB tries to figure out the architecture from the g packet size.  So if you do see that on initial connection without a binary, then that's something to look at.
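
To illustrate how a register block size can identify an architecture, here is a rough sketch. It is not GDB's actual code. The register groups are an assumption based on the conventional i386 and x86-64 'g' packet layouts, without Linux-specific extras such as orig_eax or orig_rax, and the names layout_size and guess_layout are hypothetical.

```python
# Each entry is (number of registers, bytes per register).
I386_LAYOUT = [
    (8, 4),    # eax, ecx, edx, ebx, esp, ebp, esi, edi
    (1, 4),    # eip
    (1, 4),    # eflags
    (6, 4),    # cs, ss, ds, es, fs, gs
    (8, 10),   # st0-st7
    (8, 4),    # x87 control and status registers
    (8, 16),   # xmm0-xmm7
    (1, 4),    # mxcsr
]

X86_64_LAYOUT = [
    (16, 8),   # rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, r8-r15
    (1, 8),    # rip
    (1, 4),    # eflags
    (6, 4),    # cs, ss, ds, es, fs, gs
    (8, 10),   # st0-st7
    (8, 4),    # x87 control and status registers
    (16, 16),  # xmm0-xmm15
    (1, 4),    # mxcsr
]

LAYOUTS = {"i386": I386_LAYOUT, "x86-64": X86_64_LAYOUT}


def layout_size(layout):
    """Total size in bytes of a register layout."""
    return sum(count * size for count, size in layout)


def guess_layout(g_reply_hex):
    """Return the layout name whose size matches the reply, or None."""
    nbytes = len(g_reply_hex) // 2  # two hex characters per byte
    for name, layout in LAYOUTS.items():
        if layout_size(layout) == nbytes:
            return name
    return None


if __name__ == "__main__":
    for name, layout in LAYOUTS.items():
        print(name, layout_size(layout), "bytes")
```

Under these assumptions the x86-64 layout is 536 bytes, that is 1072 hex characters, so a stub that switches from the smaller layout to the larger one mid-session would produce a reply that no longer fits what the debugger sized its buffers for.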

Comment 9 Pedro Alves 2014-05-14 19:31:31 UTC
And without a binary, indeed I don't get the error immediately:

 (gdb) tar remote :1234
 Remote debugging using :1234
 0x0000fff0 in ?? ()

only if I let qemu boot for a little while, then I get:

(gdb) c
Continuing.
^CRemote 'g' packet reply is too long: d0bfc7c0ff7f00000b000000000000000000000000000000d0bfc7c0ff7f00000000000000000000e6bfc7c0ff7f000060c5c7c0ff7f0000d0bfc7c0ff7f00000090314d6d7f0000a0b9314d6d7f000000a0314d6d7f000046020000000000000b00000000000000f0bfc7c0ff7f0000d095314d6d7f00000000000000000000fde3104d6d7f000002020000330000002b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f03000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002424242424242424242424242424242400000000000000000000000000000000000000000000000000000000000000ff000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000


I wonder whether this isn't the old problem of mode switching?  As in, first qemu is reporting 32-bit mode registers, and then later 64-bit mode?

https://sourceware.org/ml/gdb/2009-01/msg00008.html
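
One way to check the mode-switching theory is to decode the start of a reply as little-endian 32-bit registers. The sketch below does that for what appears to be the beginning of the "right after connection" reply in comment 6. The split into fields and the register names are assumptions (conventional i386 order), and decode_le32 is a hypothetical helper.

```python
import struct

# Assumed register order for the first fields of a 32-bit i386 'g' reply.
FIELDS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
          "eip", "eflags", "cs"]

# Start of the first reply in comment 6, written one assumed field per literal.
REPLY_PREFIX = (
    "00000000" "00000000" "23060000" "00000000"
    "00000000" "00000000" "00000000" "00000000"
    "f0ff0000" "02000000" "00f00000"
)


def decode_le32(hex_block, names):
    """Decode len(names) little-endian 32-bit values from a hex string."""
    raw = bytes.fromhex(hex_block[: 8 * len(names)])
    values = struct.unpack("<%dI" % len(names), raw)
    return dict(zip(names, values))


if __name__ == "__main__":
    for name, value in decode_le32(REPLY_PREFIX, FIELDS).items():
        print("%-6s 0x%08x" % (name, value))
```

Read this way, eip decodes to 0xfff0, which matches the "0x0000fff0 in ?? ()" that gdb printed above, and cs decodes to 0xf000, the x86 reset state. That would be consistent with the stub using a 32-bit register layout at first.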

Comment 10 Pedro Alves 2014-05-14 19:35:58 UTC
(BTW, GDB has indeed become multi-arch since and every frame has an architecture associated with it, but, there's no x86 support for such in the RSP currently.)

Comment 11 Richard W.M. Jones 2014-05-14 19:42:50 UTC
When a virtual machine (or indeed, a real PC) boots, it switches
through 16 bit (8086), 32 bit and 64 bit (long) modes, as a kind of
"ontogeny recapitulates phylogeny".

Comment 12 Pedro Alves 2014-05-14 19:47:23 UTC
Yes, but from Jan's old post I thought qemu nowadays stuck with the 64-bit layout, always.

"(*) QEMU recently decided to stick with 64 bit layout even if the x86-64
target is running in 16 or 32 bit mode. Before that the remote protocol
used to be switched between 32 and 64 bit dynamically, depending on the
current target mode. That solved many issues, but not all (manual 'set
arch' was required, and gdb became confused in a few cases). We are now
discussing again on qemu-devel how to deal with 16/32 bit system-level
debugging in 64 bit emulation environment: either try to improve gdb
quickly or reintroduce the old workaround, at least temporarily."

That was in 2009.  Sounds like qemu decided to change this back?

Or is this not a new bug?  Did this ever work before?

Comment 13 Richard W.M. Jones 2014-05-14 19:49:50 UTC
It worked until fairly recently, probably within the last two months.

Comment 14 Pedro Alves 2014-05-14 19:55:59 UTC
OK.  Would be good to see the RSP log with an older set of tools that worked.  I suspect the change was on the qemu side, but someone should really bisect qemu/gdb looking for whatever change is causing this.  Can't be me though.

Comment 15 Robin Hack 2014-07-16 16:58:15 UTC
(In reply to Pedro Alves from comment #14)
> OK.  Would be good to see the RSP log with an older set of tools that
> worked.  I suspect the change was on the qemu side, but someone should
> really bisect qemu/gdb looking for whatever change is causing this.  Can't
> be me though.

Hi. I can try. I have the time and resources.
Is this also an upstream bug?

Comment 16 Richard W.M. Jones 2014-07-16 17:05:24 UTC
(In reply to Robin Hack from comment #15)
> (In reply to Pedro Alves from comment #14)
> > OK.  Would be good to see the RSP log with an older set of tools that
> > worked.  I suspect the change was on the qemu side, but someone should
> > really bisect qemu/gdb looking for whatever change is causing this.  Can't
> > be me though.
> 
> Hi. I can try. I have the time and resources.
> Is this also an upstream bug?

Pretty sure yes.  Fedora follows upstream qemu very closely.

Comment 17 Robin Hack 2014-07-18 09:37:07 UTC
Well. After little bit testing I found maybe interesting think.

I have an x86_64 host and guest. The guest is fully booted (64-bit long mode).

(gdb) target remote :1234
Remote debugging using :1234
312 -> rsa->sizeof_g_packet  1072 - packet size
Remote 'g' packet reply is too long: edffffff00000000d81fc081ffffffff0000000000000001000000000000000000000000000000004600000000000000981ec081ffffffff981ec081ffffffff000000000000000000000000000000000100000000000000a879ed81ffffffff0000000000000000d81fc081ffffffffc002dd81ffffffffd81fc081ffffffff66450581ffffffff8602000010000000180000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f03000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002f2f2f2f2f2f2f2f2f2f2f2f2f2f2f2f00000000000000000000000000000000ff0000000000000000000000000000ff000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000
(gdb) set architecture i386:x86-64
The target architecture is assumed to be i386:x86-64
(gdb) target remote :1234
Remote debugging using :1234
544 - rsa->sizeof_g_packet 1072 - packet size
0xffffffff81054566 in ?? ()
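
The two debug lines above appear to show the register block size gdb expects in bytes (rsa->sizeof_g_packet) and the length of the reply in hex characters. Assuming that reading, and assuming gdb rejects any reply longer than twice sizeof_g_packet characters, the numbers work out as sketched below. g_reply_fits is a hypothetical name, not a gdb function.

```python
def g_reply_fits(reply_hex_len, sizeof_g_packet):
    """True if a hex-encoded 'g' reply fits the expected register block.

    Each register byte is sent as two hex characters, so the reply may be
    at most 2 * sizeof_g_packet characters long.
    """
    return reply_hex_len <= 2 * sizeof_g_packet


if __name__ == "__main__":
    reply_len = 1072  # hex characters, from the debug output above
    for arch, expected in (("default", 312), ("i386:x86-64", 544)):
        verdict = "fits" if g_reply_fits(reply_len, expected) else "too long"
        print("%s: %d bytes sent, %d expected: %s"
              % (arch, reply_len // 2, expected, verdict))
```

With the default architecture the 536-byte reply exceeds 312 bytes, hence the error. After set architecture i386:x86-64 the limit is 544 bytes and the same reply is accepted.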

Comment 18 Robin Hack 2014-07-18 09:38:02 UTC
(In reply to Robin Hack from comment #17)
> Well. After little bit testing I found maybe interesting think.
*thing of course!

Comment 19 Robin Hack 2014-07-18 09:50:08 UTC
I dumped the TCP connection between gdb and the qemu gdb stub:

gdb sends:
+$qSupported:multiprocess+;xmlRegisters=i386;qRelocInsn+#b5+$Hg0#df+$qTStatus#49+$?#3f+$Hc-1#09+$qAttached#8f+$g#67+

gdb receives:
+$PacketSize=1000#f1+$OK#9a+$#00+$T05thread:01;#07+$OK#9a+$#00+$edffffff00000000d81fc081ffffffff0000000000000001000000000000000000000000000000004600000000000000981ec081ffffffff981ec081ffffffff00000000000000000000000000000000000000000000000000000000000000000000000000000000d81fc081ffffffffc002dd81ffffffffd81fc081ffffffff66450581ffffffff8602000010000000180000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f03000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002f2f2f2f2f2f2f2f2f2f2f2f2f2f2f2f00000000000000000000000000000000ff0000000000000000000000000000ff000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000#55
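
For anyone reading such a dump: each remote protocol packet is framed as $payload#xx, where xx is the sum of the payload bytes modulo 256 in hex, and a lone + is an acknowledgement. Below is a minimal sketch for splitting a captured stream and checking the checksums. It ignores escaping and run-length encoding, and the helper names are hypothetical.

```python
import re

# A packet is '$', a payload without '$' or '#', '#', and two hex digits.
PACKET_RE = re.compile(r"\$([^#$]*)#([0-9a-fA-F]{2})")


def rsp_checksum(payload):
    """Sum of the payload bytes modulo 256."""
    return sum(payload.encode("ascii")) % 256


def split_packets(stream):
    """Yield (payload, checksum_ok) for each packet found in stream."""
    for match in PACKET_RE.finditer(stream):
        payload = match.group(1)
        declared = int(match.group(2), 16)
        yield payload, rsp_checksum(payload) == declared


if __name__ == "__main__":
    # A few of the short packets from the capture above.
    sample = "+$Hg0#df+$?#3f+$Hc-1#09+$qAttached#8f+$g#67+"
    for payload, ok in split_packets(sample):
        print(repr(payload), "ok" if ok else "BAD CHECKSUM")
```

In the capture, the stub's answer to qSupported is only PacketSize=1000, and the empty replies ($#00) mean the stub did not recognize the corresponding query.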

Comment 20 Robin Hack 2014-07-19 07:29:46 UTC
QEMU determines whether the remote gdb understands XML just by checking for qXfer:features:read: in the query packet.

Then I just enabled that feature:
(gdb) set remote target-features-packet  on

And found:
(gdb) target remote :1234
Remote debugging using :1234
Enabled packet qXfer:features:read (target-features) not recognized by stub

I dug into the qemu source code (v1.6.2) and found (file: gdbstub.c):

        if (strncmp(p, "Supported", 9) == 0) {
            snprintf(buf, sizeof(buf), "PacketSize=%x", MAX_PACKET_LENGTH);
            cc = CPU_GET_CLASS(first_cpu);
            if (cc->gdb_core_xml_file != NULL) {
                pstrcat(buf, sizeof(buf), ";qXfer:features:read+");
            }
            put_packet(s, buf);
            break;
        }
        if (strncmp(p, "Xfer:features:read:", 19) == 0) {
            const char *xml;
            target_ulong total_len;

            cc = CPU_GET_CLASS(first_cpu);
            if (cc->gdb_core_xml_file == NULL) {
                goto unknown_command;
            }

            gdb_has_xml = true;

But it looks like (cc->gdb_core_xml_file == NULL) is always true and unknown_command is always reached on my system.
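
The capture in comment 19 shows the stub answering qSupported with only PacketSize=1000, which is consistent with the pstrcat branch above being skipped. Below is a small sketch of checking a qSupported reply for the feature from the debugger side. Both helper names are hypothetical and the parsing is simplified.

```python
def parse_qsupported(reply):
    """Parse a qSupported reply into a dict.

    'name=value' items map name to value; 'name+', 'name-' and 'name?'
    items map name to the trailing flag character.
    """
    features = {}
    for item in reply.split(";"):
        if not item:
            continue
        if "=" in item:
            name, value = item.split("=", 1)
            features[name] = value
        elif item[-1] in "+-?":
            features[item[:-1]] = item[-1]
        else:
            features[item] = ""
    return features


def stub_offers_target_xml(reply):
    """True if the stub advertises qXfer:features:read+."""
    return parse_qsupported(reply).get("qXfer:features:read") == "+"


if __name__ == "__main__":
    for reply in ("PacketSize=1000", "PacketSize=1000;qXfer:features:read+"):
        print(reply, "->", stub_offers_target_xml(reply))
```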

Comment 21 Jan Kratochvil 2014-07-22 09:28:08 UTC
Therefore cc->gdb_core_xml_file is NULL while it should not be NULL.
This seems to be qemu problem.  Or maybe even just qemu build configuration/installation problem.

Comment 22 Robin Hack 2014-07-22 11:51:43 UTC
Yes. I agree.

1) cc->gdb_core_xml_file is NULL on the x86_64 arch
2) I don't see XML files for this architecture in gdb-xml
(the gdbstub-xml.c files are generated from the XML files in the gdb-xml directory)

Comment 23 Fedora End Of Life 2015-05-29 11:50:46 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Fedora End Of Life 2015-06-29 20:37:08 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 25 Ciro Santilli 2015-08-12 13:09:38 UTC
A similar report on the GDB bug tracker: https://sourceware.org/bugzilla/show_bug.cgi?id=13984