Bug 870311 - qemu-ga: guest-file-open after fsfreeze hangs qemu-ga
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Laszlo Ersek
QA Contact: Virtualization Bugs
Duplicates: 580449 870703
Depends On:
Blocks: 559201 580953 580954
Reported: 2012-10-26 02:45 EDT by langfang
Modified: 2013-11-13 11:34 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-04-17 11:46:58 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description langfang 2012-10-26 02:45:58 EDT
Description of problem:
The guest agent cannot execute any further command after running {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"w+"} while the guest is frozen.


Version-Release number of selected component (if applicable):
host:
# uname -r
2.6.32-336.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.331.el6.x86_64

guest:
# uname -r
2.6.32-336.el6.x86_64
qemu-guest-agent-0.12.1.2-2.329.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Boot the guest with:
-chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 ...
2. On the host:

# nc -U /tmp/qga.sock

3. In the guest:
# service qemu-ga start

4. On the host:
# nc -U /tmp/qga.sock
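
The host-side steps above type commands into nc by hand. As a rough sketch only, the same exchange could be scripted so that every request is built with json.dumps (and is therefore always a complete JSON object) and a missing reply shows up as a timeout instead of a silent wait. The helper names send_command and connect are hypothetical, and the code assumes each reply arrives as a single newline-terminated line on the socket from step 1.

```python
import json
import socket


def send_command(sock, name, arguments=None, timeout=5.0):
    """Send one command and return the decoded reply, or None on timeout."""
    request = {"execute": name}
    if arguments is not None:
        request["arguments"] = arguments
    # json.dumps always emits a complete, balanced JSON object.
    sock.sendall(json.dumps(request).encode("utf-8") + b"\n")

    sock.settimeout(timeout)
    buf = b""
    try:
        # Assumption: each reply is one line terminated by "\n".
        while not buf.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break  # peer closed the connection
            buf += chunk
    except socket.timeout:
        return None
    return json.loads(buf.decode("utf-8")) if buf.strip() else None


def connect(path="/tmp/qga.sock"):
    """Open the host side of the chardev socket from step 1."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock
```

With a running guest, something like send_command(connect(), "guest-ping") would stand in for typing the command into nc; a None result means no reply arrived within the timeout.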

  
Actual results:

# nc -U /tmp/qga.sock
{ "execute": "guest-ping"}
{"return": {}}
{"execute":"guest-fsfreeze-freeze" }
{"return": 5}
{"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"r"}}
{"error": {"class": "CommandDisabled", "desc": "The command guest-file-open has been disabled for this instance", "data": {"name": "guest-file-open"}}}--->give friendly error for execute this command
{"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"w+"}
{ "execute": "guest-ping"}---->can not execute any command any more
{ "execute": "guest-sync"}
{"execute":"guest-fsfreeze-status"}
{"execute":"guest-fsfreeze-thaw"}
{ "execute": "guest-sync"}
{"execute":"guest-fsfreeze-freeze" }



Expected results:


Additional info:
1. When the problem occurs and no command can be executed, the qemu-ga service in the guest is still reported as running.
2. Opening a file with mode "r+" while the guest is frozen returns the friendly error, and the guest agent can still run other commands (for example "guest-ping" or "guest-fsfreeze-thaw").
Comment 1 langfang 2012-10-26 02:57:50 EDT
After hitting the problem, the guest eventually hangs (this triggers another bug):

Bug 865217 - fsfreeze lead to guest cpu usage continual increases, final guest will full hang
https://bugzilla.redhat.com/show_bug.cgi?id=869993


the log:
...
Starting ksmtuned: [  OK  ]
Starting crond: [  OK  ]
Starting atd: [  OK  ]
Starting virt-who: [  OK  ]
Starting libvirtd daemon: [  OK  ]
Starting rhsmcertd...[  OK  ]
Starting certmonger: [  OK  ]

Red Hat Enterprise Linux Server release 6.4 Beta (Santiago)
Kernel 2.6.32-336.el6.x86_64 on an x86_64

localhost.localdomain login: INFO: task master:1872 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task python:1987 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task NetworkManager:1583 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task master:1872 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task python:1987 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task Xorg:2229 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task rs:main Q:Reg:1436 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task NetworkManager:1583 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task master:1872 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task python:1987 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Comment 2 langfang 2012-10-26 02:59:36 EDT
(In reply to comment #1)
> after hit the problem,will see the guest hang(this is trigger another bug:
> 

Sorry, a correction: you need to wait some more time for bug 865217 to be triggered.
> Bug 865217 - fsfreeze lead to guest cpu usage continual increases, final
> guest will full hang
> https://bugzilla.redhat.com/show_bug.cgi?id=869993
> 
> 
> the log:
> ...
> Starting ksmtuned: [  OK  ]
> Starting crond: [  OK  ]
> Starting atd: [  OK  ]
> Starting virt-who: [  OK  ]
> Starting libvirtd daemon: [  OK  ]
> Starting rhsmcertd...[  OK  ]
> Starting certmonger: [  OK  ]
> 
> Red Hat Enterprise Linux Server release 6.4 Beta (Santiago)
> Kernel 2.6.32-336.el6.x86_64 on an x86_64
> 
> localhost.localdomain login: INFO: task master:1872 blocked for more than
> 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task python:1987 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task NetworkManager:1583 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task master:1872 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task python:1987 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task Xorg:2229 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task rs:main Q:Reg:1436 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task NetworkManager:1583 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task master:1872 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task python:1987 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Comment 4 Luiz Capitulino 2012-10-29 08:42:57 EDT
Generally speaking, this is not a bug. The FS is frozen so any FS operation will block the running process.

However, there are two important points here:

1. We try not to get qemu-ga blocked. This means that we could consider refusing to execute commands that do FS operations when the FS is frozen (i.e. return an error). I'm not sure how doable this is though, as it can get complicated as the number of commands grows. I'll discuss this upstream.

2. This really seems to be a bug:

{"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"r"}}
{"error": {"class": "CommandDisabled", "desc": "The command guest-file-open has been disabled for this instance", "data": {"name": "guest-file-open"}}}--->give friendly error for execute this command
{"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"w+"}

Did you restart qemu-ga before sending the second command or did it just work?
Comment 5 langfang 2012-10-29 22:31:43 EDT
(In reply to comment #4)
> Generally speaking, this is not a bug. The FS is frozen so any FS operation
> will block the running process.
> 
> However, there are two important points here:
> 
> 1. We try not to get qemu-ga blocked. This means that we could consider to
> refuse to execute commands that do FS operations when the FS is frozen (ie.
> return an error). I'm not sure how doable this is though, as this can get
> complicated to do as the number of commands grow. I'll discuss this upstream.
> 
> 2. This really seems to be a bug:
> 
> {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"r"}}
> {"error": {"class": "CommandDisabled", "desc": "The command guest-file-open
> has been disabled for this instance", "data": {"name":
> "guest-file-open"}}}--->give friendly error for execute this command
> {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"w+"}
> 
> Did you restart qemu-ga before sending the second command or did it just
> work?

1) I did not restart the qemu-ga service before sending the second command.

2) I also tried executing the second command ("w+") first while the guest was frozen; it hits the same problem (no further command can be executed).
Comment 7 Laszlo Ersek 2013-04-05 06:53:13 EDT
Luiz,

(In reply to comment #4)
> Generally speaking, this is not a bug. The FS is frozen so any FS operation
> will block the running process.

(I agree.)

> However, there are two important points here:
> 
> 1. We try not to get qemu-ga blocked. This means that we could consider to
> refuse to execute commands that do FS operations when the FS is frozen (ie.
> return an error). I'm not sure how doable this is though, as this can get
> complicated to do as the number of commands grow. I'll discuss this upstream.

Can you point me to the discussion please?

BTW I think this should be solved from the documentation side, ie. "after a successful freeze request, the host admin controlling qga is allowed to request only a small set of operations: {..., ..., ..., and finally, thaw }". File ops would definitely be absent from that set.
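
To make the documentation idea concrete, here is a minimal host-side sketch of such a restricted set. It is only an illustration: the class FrozenGuard is hypothetical, and the allow-list contains just the command names that appear in this report, so the exact set permitted while frozen is an assumption.

```python
# Hypothetical allow-list: these names appear in this report, but the exact
# set permitted while frozen is an assumption, not taken from qemu-ga docs.
SAFE_WHILE_FROZEN = frozenset({
    "guest-ping",
    "guest-sync",
    "guest-fsfreeze-status",
    "guest-fsfreeze-thaw",
})


class FrozenGuard:
    """Track the freeze state on the host side and vet commands before sending."""

    def __init__(self):
        self.frozen = False

    def check(self, name):
        """Return True if `name` may be sent in the current state."""
        return not self.frozen or name in SAFE_WHILE_FROZEN

    def record(self, name, reply):
        """Update the state after a successful freeze or thaw reply."""
        if "return" not in reply:
            return  # an error reply leaves the state unchanged
        if name == "guest-fsfreeze-freeze":
            self.frozen = True
        elif name == "guest-fsfreeze-thaw":
            self.frozen = False
```

A caller would run check() before each request and record() after each reply, so that file operations are refused on the host side between a successful freeze and the matching thaw.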

> 2. This really seems to be a bug:
> 
> {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"r"}}
> {"error": {"class": "CommandDisabled", "desc": "The command guest-file-open
> has been disabled for this instance", "data": {"name":
> "guest-file-open"}}}--->give friendly error for execute this command
> {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"w+"}
> 
> Did you restart qemu-ga before sending the second command or did it just
> work?

Hmmm. The only path I can see on which "guest-file-open" could be reenabled is

qmp_guest_fsfreeze_thaw()
  ga_unset_frozen()
    ga_enable_non_blacklisted()

But thaw was not invoked.
Comment 8 Laszlo Ersek 2013-04-17 11:46:58 EDT
(In reply to comment #0)
> Description of problem:
> guest can not execute any command after run {"execute":"guest-file-open",
> "arguments":{"path":"/tmp/testqga","mode":"w+"} when guest frozen.

> { "execute": "guest-ping"}
> {"return": {}}

> {"execute":"guest-fsfreeze-freeze" }
> {"return": 5}

> {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"r"}}
> {"error": {"class": "CommandDisabled", "desc": "The command guest-file-open
> has been disabled for this instance", "data": {"name":
> "guest-file-open"}}}--->give friendly error for execute this command

> {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"w+"}
> { "execute": "guest-ping"}---->can not execute any command any more

The JSON string

  {
    "execute":"guest-file-open",
    "arguments":{
      "path":"/tmp/testqga",
      "mode":"w+"
    }

is not terminated; the outermost opening brace is not balanced by a corresponding outermost closing brace. The perceived "lockup" is simply that qemu-ga continues to read the command string until the final closing brace arrives.

Executing the steps in comment 0 with the missing closing brace added,

> {"execute":"guest-file-open",
> "arguments":{"path":"/tmp/testqga","mode":"w+"}}
                                                 ^

everything works as expected (friendly error message is returned for the w+ open as well, and further commands are parsed and run). Closing as NOTABUG.
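
For illustration, the effect can be reproduced with a small brace counter. This is a simplified sketch, not qemu-ga's actual parser; the function name brace_depth is hypothetical. It counts unclosed braces and brackets outside of string literals, which is enough to show why the commands typed after the unterminated request were absorbed into it rather than executed.

```python
def brace_depth(text):
    """Return the number of unclosed '{' / '[' after scanning `text`.

    Braces inside JSON string literals are ignored. A result of 0 means
    every opened object or array has been closed.
    """
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
    return depth


# The request exactly as sent in comment 0 (one closing brace short).
broken = ('{"execute":"guest-file-open", '
          '"arguments":{"path":"/tmp/testqga","mode":"w+"}')
ping = '{ "execute": "guest-ping"}'

print(brace_depth(broken))          # still inside the outer object
print(brace_depth(broken + ping))   # the ping does not close it either
print(brace_depth(broken + "}"))    # balanced once the brace is added
```

Running a check like this on each line before sending it would have flagged the w+ request as incomplete.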
Comment 9 Laszlo Ersek 2013-08-06 17:14:21 EDT
*** Bug 870703 has been marked as a duplicate of this bug. ***
Comment 10 Laszlo Ersek 2013-11-13 11:34:15 EST
*** Bug 580449 has been marked as a duplicate of this bug. ***
