
Tips, Troubleshooting, & Known Issues

This page describes tips, troubleshooting, and known issues that you might find helpful if you run into problems using Google Compute Engine.


General tips

Viewing different response formats

gcloud compute performs most of its actions by making REST API calls. The pretty-printed results show only the most important information returned by a specific command. To see the response in a different output format, use the --format flag; supported formats include json, yaml, and text. For example, to see a list of instances in JSON, use --format json:

gcloud compute instances list --format json
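
The other output formats work the same way. For example, to see the same list in YAML or plain text:

gcloud compute instances list --format yaml
gcloud compute instances list --format text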

Viewing gcloud compute logs

gcloud compute creates and stores log files that you can inspect under $HOME/.config/gcloud/logs. To see the latest log file on a Linux-based operating system, run:

$ less $(find ~/.config/gcloud/logs | sort | tail -n 1)

The log file includes information about all requests and responses made using the gcloud compute tool.
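
For example, to search the most recent log for errors on a Linux-based system, you might run something like the following (the search pattern is only an illustration):

grep -i error $(find ~/.config/gcloud/logs | sort | tail -n 1)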

Selecting resource names

When selecting names for your resources, keep in mind that these friendly names may be visible on support and operational dashboards within Google Compute Engine. For this reason, it is recommended that you choose resource names that do not expose any sensitive information.

Communicating between your instances and the Internet

An instance has direct Internet access only if it has an external IP address. An instance with an external IP can always initiate connections to the Internet. It can also receive connections, provided that a firewall rule is configured to allow access. You can add a custom firewall rule to the default network, or add a new network with custom firewalls. In addition, you can set up a network proxy within the virtual network environment in order to provide proxied access from an instance without an external IP address.
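
For example, to allow inbound HTTP traffic on the default network, you could add a rule along these lines (the rule name and port are illustrative):

gcloud compute firewall-rules create default-allow-http --allow tcp:80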

Note that idle TCP connections are disconnected after 10 minutes. If your instance initiates or accepts long-lived connections with an external host, you can adjust TCP keep-alive settings to prevent these timeouts from dropping connections. You can configure the keep-alive settings on the Compute Engine instance, your external client, or both, depending on the host that typically initiates the connection. You should set the keep-alives to less than 600 seconds to ensure that connections are refreshed before the timeout occurs. The following examples set the keep-alives to one minute (60 seconds). Note that applications running on Linux systems don't enable keep-alive by default, so the server or the client needs to explicitly set the SO_KEEPALIVE socket option when opening TCP connections (see also the Linux TCP Keepalive HOWTO).

Compute Engine instance or Linux client


Run the following command:

sudo /sbin/sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5
To ensure that the settings survive a reboot, add the settings to your /etc/sysctl.conf file.
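
For example, the corresponding entries in /etc/sysctl.conf are:

net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5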

Mac OS X client


Run the following command:

sudo sysctl -w net.inet.tcp.always_keepalive=1 net.inet.tcp.keepidle=60000 net.inet.tcp.keepinit=60000 net.inet.tcp.keepintvl=60000

Windows client


Under the registry path HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\, add the following settings, using the DWORD data type, or edit the values if the settings already exist:

KeepAliveInterval: 1000
KeepAliveTime: 60000
TcpMaxDataRetransmissions: 10
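
If you prefer the command line, you could set the same values with reg add from an elevated command prompt, for example:

reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveTime /t REG_DWORD /d 60000 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveInterval /t REG_DWORD /d 1000 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpMaxDataRetransmissions /t REG_DWORD /d 10 /f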

Accessing Google Compute Engine as a different SSH user

By default, gcloud compute uses the $USER variable to add users to the /etc/passwd file for connecting to virtual machine instances using SSH. You can specify a different user using the --ssh-key-file PRIVATE_KEY_FILE flag when running the gcloud compute ssh command. For example:

gcloud compute ssh example-instance --ssh-key-file my-private-key-file

See the gcloud reference documentation for more information.

Avoiding packet fragmentation to instances built from custom images

The Google Compute Engine network has a maximum transmission unit (MTU) of 1460 bytes. The operating system images provided by Compute Engine are configured with this MTU, so no action is required if you're using one of those images. For custom images, set the MTU to 1460 to avoid increased latency and packet overhead caused by fragmentation.
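
For example, on a Linux-based custom image you could set the MTU at runtime with a command like the following (eth0 is an assumed interface name; to make the change permanent, use your distribution's network configuration files):

sudo ip link set dev eth0 mtu 1460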

When creating client applications that communicate with Compute Engine instances over UDP sockets, send a maximum payload of 1432 bytes to avoid fragmentation.

Troubleshooting

My instance will not start up. What can I do?

Here are some tips to help troubleshoot your persistent boot disk if it doesn't boot.

  • Examine your virtual machine instance's serial port output.

    An instance's BIOS, bootloader, and kernel will print their debug messages into the instance's serial port output, providing valuable information about any errors or issues that the instance experienced. To get your serial port information, run:

    gcloud compute instances get-serial-port-output INSTANCE
    

    You can also access this information in the Google Cloud Platform Console:

    1. Go to the VM instances page in the Cloud Platform Console.
    2. Click the instance that is not booting up.
    3. On the instance's page, scroll to the bottom and click Serial console output.
  • Validate that your disk has a valid file system.

    If your file system is corrupted or otherwise invalid, you won't be able to launch your instance. Validate your disk's file system:

    1. Detach the disk in question from any instance it is attached to, if applicable:

      gcloud compute instances delete old-instance --keep-disks boot
      
    2. Start a new instance with the latest Google-provided image:

      gcloud compute instances create debug-instance --image debian-7
      
    3. Attach your disk as a non-boot disk but don't mount it. Replace DISK with the name of the disk that won't boot. Note that we also provide a device name so that the disk is easily identifiable on the instance:

      gcloud compute instances attach-disk debug-instance --disk DISK --device-name debug-disk
      
    4. Connect to the instance:

      gcloud compute ssh debug-instance
      
    5. Look up the root partition of the disk, which is identified with the part1 notation. In this case, the root partition of the disk is at /dev/sdb1:

      user@debug-instance:~$ ls -l /dev/disk/by-id
      total 0
      lrwxrwxrwx 1 root root  9 Jan 22 17:09 google-debug-disk -> ../../sdb
      lrwxrwxrwx 1 root root 10 Jan 22 17:09 google-debug-disk-part1 -> ../../sdb1
      lrwxrwxrwx 1 root root  9 Jan 22 17:02 google-persistent-disk-0 -> ../../sda
      lrwxrwxrwx 1 root root 10 Jan 22 17:02 google-persistent-disk-0-part1 -> ../../sda1
      lrwxrwxrwx 1 root root  9 Jan 22 17:09 scsi-0Google_PersistentDisk_debug-disk -> ../../sdb
      lrwxrwxrwx 1 root root 10 Jan 22 17:09 scsi-0Google_PersistentDisk_debug-disk-part1 -> ../../sdb1
      lrwxrwxrwx 1 root root  9 Jan 22 17:02 scsi-0Google_PersistentDisk_persistent-disk-0 -> ../../sda
      lrwxrwxrwx 1 root root 10 Jan 22 17:02 scsi-0Google_PersistentDisk_persistent-disk-0-part1 -> ../../sda1
      
    6. Run a file system check on the root partition:

      user@debug-instance:~$ sudo fsck /dev/sdb1
      fsck from util-linux 2.20.1
      e2fsck 1.42.5 (29-Jul-2012)
      /dev/sdb1: clean, 19829/655360 files, 208111/2621184 blocks
      
    7. Mount your file system:

      user@debug-instance:~$ sudo mkdir /mydisk
      user@debug-instance:~$ sudo mount /dev/sdb1 /mydisk
      
    8. Check that the disk has kernel files:

      user@debug-instance:~$ ls /mydisk/boot/vmlinuz-*
      /mydisk/boot/vmlinuz-3.2.0-4-amd64
      
  • Validate that the disk has a valid master boot record (MBR).

    Run the following command on the debug instance to which the persistent boot disk is attached, such as /dev/sdb1:

    $ sudo parted /dev/sdb1 print
    

    If your MBR is valid, it should list information about the filesystem:

    Disk /dev/sdb1: 10.7GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: loop
    
    Number  Start  End     Size    File system  Flags
     1      0.00B  10.7GB  10.7GB  ext4
    

What does it mean for my instance to be in TERMINATED state?

A TERMINATED instance is a stopped instance that can be restarted later. Uptime for a TERMINATED instance is not billed. For more information, see Stopping or Deleting an Instance.
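
For example, a stopped instance can typically be restarted with the instances start command (the instance and zone names below are placeholders):

gcloud compute instances start example-instance --zone us-central1-a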

Why is network traffic to/from my instance being dropped?

Google Compute Engine only allows network traffic that is explicitly permitted by your project's firewall rules to reach your instance. By default, all projects automatically come with a default network that only allows SSH and internal Compute Engine traffic. If you deny all traffic by default, that will also deny SSH connections and all internal traffic. For more information, see the Networking page.

In addition, you may need to adjust TCP keep-alive settings to work around the default idle connection timeout of 10 minutes. For more information, see Communicating between your instances and the Internet.

Troubleshooting SSH errors

Under certain conditions, it is possible a Google Compute Engine instance will no longer accept SSH connections. There are many reasons this could happen, from a full disk to an accidental misconfiguration of sshd. If this happens, accessing the instance can be quite challenging. This section describes a number of tips and approaches to troubleshoot and resolve common SSH issues.

Check your firewall rules

Google Compute Engine provisions each project with a default set of firewall rules which permit SSH traffic. If the default firewall rule that permits SSH connections is somehow removed, you'll be unable to access your instance. Check your list of firewalls with gcloud compute and ensure the default-allow-ssh rule is present. If it is missing, add it back:

gcloud compute firewall-rules list
NAME                   NETWORK SRC_RANGES    RULES                        SRC_TAGS TARGET_TAGS
default-allow-icmp     default 0.0.0.0/0     icmp
default-allow-internal default 10.240.0.0/16 tcp:1-65535,udp:1-65535,icmp

gcloud compute firewall-rules create default-allow-ssh --allow tcp:22

Test the network

You can use the netcat tool to connect to your instance on port 22, and see if the network connection is working. If you connect and see an ssh banner (e.g. SSH-2.0-OpenSSH_6.0p1 Debian-4), your network connection is working, and you can rule out firewall problems. First, use the gcloud tool to obtain the external natIP for your instance:

gcloud compute instances describe example-instance --format yaml | grep natIP
natIP: 108.59.82.95

Use the nc command to connect to your instance:

user@local:~$ nc 108.59.82.95 22 # Check for SSH banner
SSH-2.0-OpenSSH_6.0p1 Debian-4
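
If the connection hangs instead of printing a banner, a firewall is likely dropping the traffic. Adding a timeout makes nc give up instead of waiting indefinitely, for example:

user@local:~$ nc -w 5 108.59.82.95 22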

Try a fresh user

The issue that prevents you from logging in might be limited to your account (e.g. if the permissions on the ~/.ssh/authorized_keys file on the instance were set incorrectly).

The first thing to try is creating a new account on the instance. Because gcloud compute sets up keys and accounts based on your username, the easiest way to do this is to create a new instance (using an f1-micro machine type is fine), log in, add a new user, and switch to this user's account. Then, you can use gcloud compute to try to connect to your existing instance. If this works, you will be able to use this new account to fix the permissions on your primary user's account, as shown in the example after these steps.

  1. Create the instance:

    gcloud compute instances create temp-machine --scopes compute-rw
    
  2. Connect to the instance:

    gcloud compute ssh temp-machine
    
  3. On the instance, create a new user:

    user@temp-machine:~$ sudo useradd -m tempuser
    
  4. Switch to the new user and connect to your existing instance:

    user@temp-machine:~$ sudo su - tempuser

    tempuser@temp-machine:~$ gcloud compute ssh example-instance
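
Once you can connect as the new user, you can repair the permissions and ownership of your primary account's SSH files. For example (USERNAME is a placeholder for your original account name, and the paths assume a standard Linux home directory layout):

tempuser@example-instance:~$ sudo chown -R USERNAME:USERNAME /home/USERNAME/.ssh
tempuser@example-instance:~$ sudo chmod 700 /home/USERNAME/.ssh
tempuser@example-instance:~$ sudo chmod 600 /home/USERNAME/.ssh/authorized_keys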
    

Use your disk on a new instance

If the above steps don't work for you, and the instance you're interested in is booted from a persistent disk, you can detach the persistent disk and attach it to a new instance. Replace DISK in the following example with your disk name:

gcloud compute instances delete old-instance --keep-disks boot
gcloud compute instances create new-instance --disk name=DISK boot=yes auto-delete=no
gcloud compute ssh new-instance

Inspect an instance without shutting it down

You might have an instance you can't connect to that continues to correctly serve production traffic. In this case, you might want to inspect the disk without interrupting the instance's ability to serve users. First, take a snapshot of the instance's boot disk, then create a new disk from that snapshot, create a temporary instance, and finally attach and mount the new persistent disk to your temporary instance to troubleshoot the disk.

  1. Create a new network to host your cloned instance:

    gcloud compute networks create debug-network
    
  2. Add a firewall rule to allow SSH connections to the network:

    gcloud compute firewall-rules create debug-network-allow-ssh --allow tcp:22
    
  3. Create a snapshot of the disk in question, replacing DISK with the disk name:

    gcloud compute disks snapshot DISK --snapshot-name debug-disk-snapshot
    
  4. Create a new disk with the snapshot you just created:

    gcloud compute disks create example-disk-debugging --source-snapshot debug-disk-snapshot
    
  5. Create a new debugging instance without an external IP address:

    gcloud compute instances create debugger --network debug-network --no-address
    
  6. Attach the debugging disk to the instance:

    gcloud compute instances attach-disk debugger --disk example-disk-debugging
    
  7. Follow the instructions to connect to an instance without an external IP address.

  8. Once logged into the debugger instance, troubleshoot the instance. For example, you can look at the instance logs:

    user@debugger:~$ sudo su -
    root@debugger:~# mkdir /mnt/myinstance
    root@debugger:~# mount /dev/disk/by-id/scsi-0Google_PersistentDisk_example-disk-debugging /mnt/myinstance
    root@debugger:~# cd /mnt/myinstance/var/log
    root@debugger:/mnt/myinstance/var/log# ls  # Identify the issue preventing ssh from working
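
     For example, on a Debian-based image you might check the SSH daemon's log and configuration on the mounted disk, and whether the disk is full (file names are typical defaults and may differ by distribution):

    root@debugger:/mnt/myinstance/var/log# tail -n 50 auth.log
    root@debugger:/mnt/myinstance/var/log# less /mnt/myinstance/etc/ssh/sshd_config
    root@debugger:/mnt/myinstance/var/log# df -h /mnt/myinstance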
    

Use a startup script

If none of the above helped, you can create a startup script to collect information right after boot time. Follow the instructions for running a startup script.

Afterwards, you will also need to reset your instance using gcloud compute instances reset before the metadata takes effect. Alternatively, you can also recreate your instance with a diagnostic startup script:

  1. Run gcloud compute instances delete with the --keep-disks flag.

    gcloud compute instances delete INSTANCE --keep-disks boot
    
  2. Add a new instance with the same disk and specify your startup script.

    gcloud compute instances create example-instance --disk name=DISK boot=yes --metadata startup-script-url=URL
    

As a starting point, you can use the compute-ssh-diagnostic script to collect diagnostics information for most common issues.
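
As a rough sketch (not the compute-ssh-diagnostic script itself), a diagnostic startup script might write SSH and disk information to the serial console so that it shows up in the instance's serial port output; service names and log paths vary by distribution:

#!/bin/bash
# Send everything this script prints to the serial console so it appears
# in the serial port output.
exec > /dev/console 2>&1

echo "=== sshd status ==="
service ssh status 2>/dev/null || service sshd status

echo "=== disk usage ==="
df -h

echo "=== recent authentication log entries ==="
tail -n 50 /var/log/auth.log /var/log/secure 2>/dev/null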

Known Issues

CentOS image v20131120 introduced a breaking change where iptables are turned on by default.

The v20131120 release of the CentOS 6 image, centos-6-v20131120, has a breaking change where iptables are turned on by default. This prevents external traffic from reaching CentOS instances that are running centos-6-v20131120, even if there is a relevant Firewall Rule resource permitting the connection.

As a workaround, users will need to disable iptables or update iptables to permit the desired connection (in addition to permitting the traffic using firewall rules). To disable iptables, run:

# Save your iptables settings
user@centos-instance:~$ sudo service iptables save

# Stop the iptables service
user@centos-instance:~$ sudo service iptables stop

# Disable iptables on start up
user@centos-instance:~$ sudo chkconfig iptables off

To update iptables, review the iptables documentation.
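
For example, to permit incoming TCP traffic on a specific port (port 80 is shown only as an illustration) and save the rule so it persists across restarts of the iptables service:

user@centos-instance:~$ sudo iptables -I INPUT -p tcp --dport 80 -j ACCEPT
user@centos-instance:~$ sudo service iptables save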

Google-provided images have a known bug with the ext4/scsi driver in the stable Debian and CentOS kernels

A known ext4 bug may cause a memory leak and an eventual crash of a virtual machine instance under heavy persistent disk load. Both the centos-6-v20131120 and debian-7-wheezy-v20131120 images are affected. For details, please refer to this Linux Kernel Mailing List thread.

Instance names longer than 32 characters can cause problems with various UNIX tools.

Date Reported: June 2012

Although instance names can be up to 63 characters, names that are longer than 32 characters may cause some tools to be unreliable, including tools that may run during boot. As a workaround, choose instance names that are shorter than 32 characters.