My recent post on fixing terminator was months in the making, partly because I lost the environment I was using to double-check all the steps and partly because other things just kept coming up. Several months back, when I found the original fix, it was only available as a patch in bugzilla. I’d constructed a whole article about how to get the patch, update the spec file, build a new rpm, etc., but in the midst of this I twice lost the virtual machine I was using to a suspend that would not resume. I also ran into some problems using yumdownloader to get the source rpm that I could not reliably reproduce or document.
Time went by and an updated source rpm became available to fix vte, so it seemed silly to document all the tedious steps to patch the spec file and rebuild when that work was already done. Then I was thwarted again when I rebooted my laptop and forgot I had a guest running.
It was a frustrating situation from both the virt-manager GUI and the command line: my only options were resume (which didn’t work because of the error message below) and shut down (which did the opposite of what I wanted the virtual machine to do).
On Google I found scant references to this situation so I turned to a company-wide mailing list at work where anyone can post technical questions. I had the solution in 30 minutes!
This was the cryptic error message I received in virt-manager when trying to resume the guest:
Error restoring domain: Unable to read from monitor: Connection reset by peer
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1050, in startup
    self._backend.create()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 510, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer
The error message made no sense to me--I can’t resume the domain because the monitor cannot be read? How about, “Unable to resume domain. If this persists try ‘virsh managedsave-remove’ from a command line to remove the suspended session and reboot your machine.” Including the word “suspend” in the command instead of “managedsave” would also be more intuitive.
Here is the command to remove a corrupted suspended virtual machine session so you can boot your machine again–naturally you’ll lose the suspended session you had:
$ su -c 'virsh managedsave-remove NameOfDomain'
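If you want to script this, the following is only a rough sketch, not something from the original fix. It checks whether a managed save image exists before removing it. The helper names has_managed_save and discard_managed_save are made up for illustration, and it assumes a virsh new enough to support the --with-managed-save and --name options of "virsh list" (older releases may not have them).

```shell
# Sketch only: remove a domain's managed save image if one exists.
# has_managed_save and discard_managed_save are hypothetical helper names.

has_managed_save() {
    # Lists every domain (running or not) that has a managed save image,
    # one name per line, and looks for an exact match of the given name.
    virsh list --all --with-managed-save --name | grep -Fxq -- "$1"
}

discard_managed_save() {
    domain=$1
    if has_managed_save "$domain"; then
        # The suspended session is lost; the disk and configuration are kept.
        virsh managedsave-remove "$domain"
    else
        echo "No managed save image for $domain; nothing to remove"
    fi
}
```

After sourcing the functions as root, "discard_managed_save NameOfDomain" would do the same thing as the command above, but only when there is actually a saved session to remove.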
I’ve also confirmed on RHEL 6.2 (beta) that rebooting the hypervisor with running guests smoothly suspends them and resumes them again when the hypervisor returns.
April 21, 2021 at 12:47 am
Life saver, thanks
January 7, 2019 at 6:36 am
Good command line, excellent work.
February 24, 2018 at 10:43 am
Thanks John..I am able to start my domain 🙂
October 22, 2016 at 8:49 pm
Very helpful ! Thanks for this post! Managed to start corrupted domains.
May 19, 2016 at 8:55 pm
wow 🙂
awesome stuff ..worked like a charm
May 13, 2015 at 5:56 am
Many thanks
You saved the day
April 1, 2015 at 12:05 am
Thanks a lot ! So glad that the VMs can be working again
March 26, 2015 at 7:50 am
thanks man, really useful stuff
November 15, 2014 at 4:36 pm
At the risk of being just another “me, too”, many thanks. It’s been a long and frustrating day but this post let me eat dinner a happy man.
November 15, 2014 at 6:29 pm
“Me toos” are welcome any time! Glad it helped and sorry that confusing error message is still out there. I filed a bug a while ago.
March 26, 2014 at 2:14 pm
Thanks John this just worked for me on my Centos server.
I have been getting this for months and couldn’t resolve it.
Once again thanks.
March 26, 2014 at 2:18 pm
Thanks for the feedback Reggie and I’m glad this post helped solve your problem!
August 6, 2013 at 5:58 am
John,
Thanks for taking the time to blog this. Exactly what I was looking for (running a XP VM in CentOS 6). Kudos and regards to the RH team 🙂
Steve
August 5, 2013 at 11:22 am
I am using the latest LinuxMint. I had a VM working fine but then it stopped working (won’t start anymore) for no apparent reason. I have not fiddled with anything. I only used default settings that came with the Virtual Machine Manager. I have 8GB on the host, and 2GB for the VM. There are no suspended images. I am attempting a clean startup. What I get is:
Error starting domain: Unable to read from monitor: Connection reset by peer
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 96, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 117, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1092, in startup
    self._backend.create()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 681, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer
Any help would be greatly appreciated.
Blake McBride
May 12, 2014 at 12:39 am
This happened to me too, moments ago, and the solution in my case was to remove a certain Filesystem Passthrough hardware “device” that I had added. The path that was shared on the host was no longer available, apparently causing this weird error message.
October 27, 2014 at 2:58 am
Thanks for the hint, my problem was similar to what you describe. But in my case it was because virsh didn’t accept the qcow2 format of the attached device.
August 1, 2013 at 9:18 pm
is this problem related to img file of OS..??
August 2, 2013 at 7:38 am
Sorry, I don’t know.
August 1, 2013 at 3:08 pm
This just worked for me on a CentOS 6.4-based environment. I rebooted the server, with all the KVM instances going into suspended mode. When coming up, one VM showed this behavior. *Luckily*, of all the 13 VM instances, this one was the least critical… Command given from inside the virsh shell. All OK:
virsh # managedsave-remove ORION
Removed managedsave image for domain ORION
Thanks
July 31, 2013 at 10:16 pm
I’m installing Feather Linux (a lightweight OS) because I have to run many VMs on the Xen hypervisor, so I used virt-manager to install the OS. But after it was installed, the second time I started it after shutting it down, it said no bootable device.
July 31, 2013 at 2:13 am
I have installed a VM on the Xen hypervisor using virt-manager. When I start the VM after shutting it down, it gives me an error saying “no bootable device find”. What am I missing? Is there something related to its settings?
July 31, 2013 at 9:49 pm
Are you booting a LiveCD? If so, that will only work once. After that it expects to boot from a regular device. I’ve always thought this was a really weird and unfinished part of virt-manager.
July 31, 2013 at 9:52 pm
I have given an ISO file in virt-manager. The OS is installed but it doesn’t boot when I start the VM the next time… The boot priority order is Hard disk…
July 31, 2013 at 9:56 pm
And one thing: I’m not booting from a Live CD.
July 31, 2013 at 10:13 pm
Very strange. I guess I would try the whole process again. What OS are you installing? Are you setting a bootable partition with the installer? If it’s Fedora 18 or 19 the installer is very confusing.
Probably not related, but I recently saw a problem where Fedora 18 installed, but would not boot on two different brand new Lenovo T430s. Fedora 19 did.
July 29, 2013 at 9:45 am
This post really pulled me out of the fire. I’m involved in validation for server hardware at a major company and my admin stations are in VMs on a RHEL workstation. I had my host power cycle due to a fat finger on my part and when it came back up one of my main admin VMs was broken. Being behind as it is, I certainly didn’t have time to rebuild the VM; this post played a role in making sure this project didn’t break down. Thanks!
July 31, 2013 at 9:49 pm
Love to hear it! Well, not that you had problems, but that “my pain was your gain.” Hope the rest of your project goes well.
July 16, 2013 at 5:21 pm
thanks a lot! it works for me. thanks for google too.
June 18, 2013 at 1:32 pm
Correction did not work for me. I generated this error by accidentally checking read only on IDE disk 1 in details.
April 9, 2013 at 8:46 am
This same message occurs if one overallocates memory to the VM. I got this message and it took a while before I realized that the issue was not a suspended machine, but overallocation of memory!!
April 7, 2013 at 5:44 pm
I filed a bug for this issue because the comments continue to come in on this blog post–is there anyone out there that has a reliable reproducer to get into the error state?
https://bugzilla.redhat.com/show_bug.cgi?id=892007
February 27, 2013 at 10:05 am
This saved my work!!! Thanks!!!
February 20, 2013 at 8:33 am
Seriously, thanks for posting this. Hit the error, googled, found this and had my problem fixed in under a minute. 🙂
February 15, 2013 at 8:51 am
Besides, I also have this error message:
Error restoring domain: operation failed: failed to read qemu header
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1050, in startup
    self._backend.create()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 511, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: operation failed: failed to read qemu header
February 15, 2013 at 8:50 am
I have a problem: when I run the command it says
error: unknown command: 'managedsave-remove'
and the same problem appears whether I run it inside virsh or outside.
January 21, 2013 at 3:55 pm
Thank you for this information. It saved me a lot of time.
January 11, 2013 at 5:57 am
Thanks for sharing!!!
January 4, 2013 at 10:46 am
Awesome post. Helped resolve my long-lasting VM issue!
November 8, 2012 at 12:42 am
I hit the same issue, but cannot solve it with this solution. Can anyone help me?
virsh start test001
error: Failed to start domain test001
error: Unable to read from monitor: Connection reset by peer
virsh managedsave-remove test001
Domain test001 has no manage save image; removal skipped
November 19, 2012 at 1:30 pm
Tianchen,
Make sure you have enough ram to allocate another instance.
November 1, 2012 at 12:51 pm
Another heartfelt thank you!
October 17, 2012 at 7:59 am
I had an instance in the ‘Shutoff’ state according to virt-manager. It wouldn’t start, giving the same error. (This is after a reboot of the host.)
I had the exact same exception trace as you though and your answer also solved my issue! Thanks!
FTR, the only output from virsh was:
virsh # start NameOfInstance
error: Failed to start domain NameOfInstance
error: Unable to read from monitor: Connection reset by peer
October 17, 2012 at 8:55 am
Thanks for your feedback Joseph. Glad this post helped solve your problem and hopefully your comment will help other people too 🙂
August 21, 2012 at 2:14 pm
Thank you. This solved the issue of my guest not starting up but instead qemu spitting out these lines in /var/log/libvirt/qemu/my.domain.log:
qemu: warning: error while loading state section id 3
load of migration failed
July 23, 2012 at 8:44 am
Thought I had this problem because I got the same errors when rebooting the host. Guests would not resume. Could only clear saved state and boot a new instance of each guest.
Turns out my images were stored someplace other than the default location (/var/lib/libvirt/images). Once I updated the pool location, the resume on host reboot worked as expected.
July 24, 2012 at 6:07 am
Sorry, I posted this too quickly. Changing the storage location did not solve the problem.
When rebooting the host, guests are paused, but they will not resume after the host reboot completes.
I tried moving the guests to another host with same OS, and do not see this problem. Therefore I think it has something to do with the system, but have not figured out what that is yet.
July 19, 2012 at 1:00 pm
Thanks a lot. Hats off, John, you rock.
July 19, 2012 at 10:49 am
Just hit this problem on SL 6.2 running kernel 2.6.32-220.23.1.el6.x86_64 and libvirt-0.9.10-21.el6.x86_64. Is this corruption on shutdown issue a problem with the guest OS, or kvm/qemu/kernel of the host?
July 20, 2012 at 12:29 am
It appears to be a “corruption on suspend” problem in the sense that you can’t resume the guest. I have no idea what causes it. I’m running the latest RHEL 6.3 packages and have not seen the problem again for a long time. Sorry I couldn’t be of more help.
July 15, 2012 at 4:10 pm
Thank you very much, for sharing this INFO.
After updating CentOS 6.2 to 6.3, the guests were not automatically starting up again. But with this nice virsh command everything was quickly solved. Greetings.
July 14, 2012 at 9:54 pm
er I meant I was glad to find the article.. and chalk up my not knowing it needed more info to me being less experienced. Yup.. i need caffeine..
July 14, 2012 at 9:53 pm
no problem. I was glad to find it. Chalk it up to me being less experienced too 😉
July 14, 2012 at 9:45 pm
Thanks so much Bruce! I will give that a whirl when I am a bit more caffeinated and less yawning!
July 14, 2012 at 9:49 pm
Sorry about that Erica. My example was misleading (I’ve updated it) and Bruce has it right, you have to supply the name of the domain you want to manage.
July 14, 2012 at 4:37 pm
erica
I ran into the same problem. I could reboot the host without shutting down the vms on Centos 6.2 so I figured I could do it on 6.3…..not.
The Option it is talking about is the name of the virtual machine.
I accomplished this by running the following commands.
[root@gwdt ~]# virsh
Welcome to virsh, the virtualization interactive terminal.
Type: 'help' for help with commands
'quit' to quit
virsh # managedsave-remove win7
Removed managedsave image for domain win7
virsh #
***** win7 is the name of the virtual machine that would not resume.
July 13, 2012 at 10:16 pm
Thank you very much !! You saved my 3 VM
July 13, 2012 at 7:17 am
I should have mentioned RHEL 6.3
July 13, 2012 at 7:10 am
Hi, I ran into this problem with a RHEL machine at work. Thank you for the fix.
I did run into an issue trying to run the fix:
error: command ‘managedsave-remove’ requires option
I am still researching it. Will let you know if I find it. Thanks!
July 11, 2012 at 7:53 pm
Thank you! Panic mode over.
July 1, 2012 at 11:09 am
Ran into this same issue as well while setting up Spacewalk. Very glad I don’t have to start from a previous snapshot or from scratch! Nice find!
June 5, 2012 at 12:52 pm
Thanks John.!
It solved my issue in 2 different scenarios. It happened when I reset my KVM hosts (gracefully) but after reboot the VMs didn’t come up. My question is: if you say it’s corrupted, how could we possibly retrieve the corrupted VM? So is it not data corruption?
June 5, 2012 at 4:37 pm
Excellent! I’m glad to hear I’ve saved other people time and pain. 🙂
My (limited technical) understanding is that the saved (suspended) session information is corrupted, but the VM’s main disk and machine configuration are not–kind of like if you suspend a notebook and the suspended session doesn’t resume. After a hard reset the notebook boots normally again but that session is lost.
June 4, 2012 at 12:17 am
Thanks, this was my problem.
You fixed it and I am very happy now 🙂
April 12, 2012 at 6:58 pm
Thank you! Good thing I landed here searching for a fix.
January 28, 2012 at 10:04 am
I was able to solve this issue within minutes thanks to your post 😉
January 30, 2012 at 7:19 am
I’m glad! That’s why I like to post stuff like this, so other people don’t have to experience as much pain and frustration as I did.
November 28, 2011 at 5:49 am
THANK YOU. Busy setting up a demo for an exhibition and had this error – last thing you want to see when you have 2 hours left to set up!!