Optimizing persistent disk performance

Persistent disks give you the performance described in the disk type chart if the VM drives usage that is sufficient to reach the performance caps. After you size your persistent disk volumes to meet your performance needs, your app and operating system might need some tuning.

In the following sections, we describe a few key elements that can be tuned for better performance and how to apply some of them to specific types of workloads.

Limit heavy I/O loads to a 50 TB span

Heavy I/O loads achieve maximum performance when limited to a 50 TB span. Spans on separate persistent disks that add up to 50 TB or less can be considered equal to a single 50 TB span for performance purposes. A span refers to a contiguous range of logical block addresses on a single physical disk.

Disable lazy initialization and enable DISCARD commands

Persistent disks support DISCARD or TRIM commands, which allow operating systems to inform the disks when blocks are no longer in use. DISCARD support allows the OS to mark disk blocks as no longer needed, without incurring the cost of zeroing out the blocks.

On most Linux operating systems, you enable DISCARD when you mount a persistent disk to your instance. Windows Server 2012 R2 instances enable DISCARD by default when you mount a persistent disk. Windows Server 2008 R2 does not support DISCARD.

Enabling DISCARD can boost general runtime performance, and it can also speed up the performance of your disk when it is first mounted. Formatting an entire disk volume is time consuming, so "lazy formatting" is a common practice. The downside of lazy formatting is that the cost is often paid the first time the volume is mounted. By disabling lazy initialization and enabling DISCARD commands, you can format and mount the volume quickly.

  • Disable lazy initialization and enable DISCARD during format by passing the following parameters to mkfs.ext4:

    -E lazy_itable_init=0,lazy_journal_init=0,discard
    

    The lazy_journal_init=0 parameter does not work on instances with CentOS 6 or RHEL 6 images. For those instances, format persistent disks without that parameter.

    -E lazy_itable_init=0,discard
    
  • Enable DISCARD commands on mount by passing the following flag to the mount command:

    -o discard
    

Persistent disks work well with the discard option enabled. However, you can optionally run fstrim periodically in addition to, or instead of using the discard option. If you do not use the discard option, run fstrim before you create a snapshot of your disk. Trimming the file system lets you create smaller snapshot images, which reduces the cost of storing snapshots.
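Putting these options together, a full format-and-mount sequence might look like the following sketch. The device name /dev/sdb and mount point /mnt/disks/data are placeholders for illustration; substitute the device and path for your own instance.

```shell
# Format the disk with lazy initialization disabled and DISCARD enabled.
# WARNING: this destroys any data on the device.
sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb

# Mount the volume with the discard option enabled.
sudo mkdir -p /mnt/disks/data
sudo mount -o discard /dev/sdb /mnt/disks/data

# If you do not mount with the discard option, trim the file
# system manually before creating a snapshot.
sudo fstrim /mnt/disks/data
```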

I/O queue depth

Many apps have settings that affect their I/O queue depth. Higher queue depths increase IOPS but can also increase latency. Lower queue depths decrease per-I/O latency, but might result in lower maximum IOPS.
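Besides application-level settings, the Linux block layer exposes a per-device queue limit through sysfs. The commands below are a sketch; the device name sdb is a placeholder, and the optimal value is workload-dependent.

```shell
# Show the maximum number of requests the block layer queues for the device.
cat /sys/block/sdb/queue/nr_requests

# Raise the limit if your workload benefits from deeper queues (requires root).
echo 256 | sudo tee /sys/block/sdb/queue/nr_requests
```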

Readahead cache

To improve I/O performance, operating systems employ techniques such as readahead, where more of a file than was requested is read into memory with the assumption that subsequent reads are likely to need that data. Higher readahead increases throughput at the expense of memory and IOPS. Lower readahead increases IOPS at the expense of throughput.

On Linux systems, you can get and set the readahead value with the blockdev command:

$ sudo blockdev --getra /dev/[DEVICE_ID]
$ sudo blockdev --setra [VALUE] /dev/[DEVICE_ID]

The readahead value is measured in 512-byte sectors: divide the desired readahead size in bytes by 512.

For example, for an 8-MB readahead, 8 MB is 8388608 bytes (8 * 1024 * 1024).

8388608 bytes / 512 bytes = 16384

You set blockdev to 16384:

$ sudo blockdev --setra 16384 /dev/[DEVICE_ID]
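The byte-to-sector conversion can be scripted. This small helper is an illustration, not part of blockdev itself:

```shell
# Convert a desired readahead size in bytes to 512-byte sectors,
# the unit that blockdev --setra expects.
ra_sectors() {
  echo $(( $1 / 512 ))
}

ra_sectors $(( 8 * 1024 * 1024 ))   # 8 MB -> 16384
```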

Free CPUs

Reading and writing to persistent disk requires CPU cycles from your VM. To achieve very high, consistent IOPS levels, you must have CPUs free to process I/O.

IOPS-oriented workloads

Databases, whether SQL or NoSQL, have usage patterns of random access to data. Google recommends the following values for IOPS-oriented workloads:

  • I/O queue depth values of 1 per each 400–800 IOPS, up to a limit of 64 on large volumes

  • One free CPU for every 2,000 random read IOPS and 1 free CPU for every 2,500 random write IOPS

Lower readahead values are typically suggested in best practices documents for MongoDB, Apache Cassandra, and other database applications.
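As a worked example of the guidelines above (the 20,000-IOPS target is illustrative, not a recommendation): for a database expected to sustain 20,000 random read IOPS, the values translate roughly as follows.

```shell
TARGET_READ_IOPS=20000

# Queue depth: 1 per each 400-800 IOPS, up to a limit of 64 on large volumes.
QD_LOW=$(( TARGET_READ_IOPS / 800 ))
QD_HIGH=$(( TARGET_READ_IOPS / 400 ))
echo "queue depth: ${QD_LOW}-${QD_HIGH}"   # queue depth: 25-50

# Free CPUs: 1 per every 2,000 random read IOPS.
FREE_CPUS=$(( TARGET_READ_IOPS / 2000 ))
echo "free CPUs: ${FREE_CPUS}"             # free CPUs: 10
```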

Throughput-oriented workloads

Streaming operations, such as a Hadoop job, benefit from fast sequential reads, and larger I/O sizes can increase streaming performance. For throughput-oriented workloads, we recommend I/O sizes of 256 KB or greater.

Optimizing standard persistent disk performance

To achieve maximum throughput levels for standard persistent disks consistently, use the following best practices:

  • Use parallel sequential IO streams when possible

    Use sequential IO on standard persistent disks because the system is designed to optimize IO performance for sequential disk access, similar to a physical hard disk drive (HDD).

    Distributing IO across multiple sequential streams will improve performance significantly. To achieve the best level of consistency, use 8 or more sequential streams.

  • Use large IO size

    Standard persistent disks provide very high throughput when I/O sizes are large. To ensure that IOPS limits and latency don't bottleneck your application performance, use a minimum IO size of 256 KB or higher.

    Use large stripe sizes for distributed file system applications. A random IO workload that uses large stripe sizes (for example, 4 MB or greater) achieves great performance on standard persistent disks because the workload closely mimics multiple sequential stream disk access.

  • Make sure to provide I/O with enough parallelism

    Use as high a queue depth as possible to leverage the parallelism of the OS. A high enough queue depth is especially important for standard persistent disks: it lets you achieve throughput at the limit without your application being bottlenecked by IO latency.

Optimizing SSD persistent disk performance

The performance by disk type chart describes the expected, maximum achievable performance for solid-state persistent disks. To optimize your apps and VM instances to achieve these speeds, use the following best practices:

  • Make sure your app is generating enough I/O

    If your app is generating fewer IOPS than the limit described in the earlier chart, you won't reach that level of IOPS. For example, on a 500-GB disk, the expected IOPS limit is 15,000 IOPS. However, if you generate fewer IOPS than that or the I/O operations are larger than 8 KB, you won't achieve 15,000 IOPS.

  • Make sure to provide I/O with enough parallelism

    Use a high-enough queue depth such that you're leveraging the parallelism of the OS. If you issue 1,000 IOPS in a synchronous manner, with a queue depth of 1, you will achieve far fewer IOPS than the limit described in the chart. At a minimum, your app should use a queue depth of at least 1 per every 400–800 IOPS.

  • Make sure there is enough available CPU on the instance that is generating the I/O

    If your VM instance is starved for CPU, your app won't be able to manage the IOPS described earlier. We recommend that you have one available CPU for every 2,000–2,500 IOPS of expected traffic.

  • Make sure your app is optimized for a reasonable temporal data locality on large disks

    If your app accesses data that is distributed across different parts of a disk over a short period of time (hundreds of GB per vCPU), you won't achieve optimal IOPS. For best performance, optimize for temporal data locality, weighing factors like the fragmentation of the disk and the randomness of accessed parts of the disk.

  • Make sure the I/O scheduler in the OS is configured to meet your specific needs

    On Linux-based systems, you can set the I/O scheduler to noop to achieve the highest number of IOPS on SSD-backed devices.
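One way to check and change the scheduler on a running system is through sysfs. The device name sdb is a placeholder; note that on newer kernels that use the multi-queue block layer, the equivalent no-reordering choice is named none rather than noop.

```shell
# Show the available schedulers; the active one appears in brackets.
cat /sys/block/sdb/queue/scheduler

# Switch to noop (or none on multi-queue kernels) for the current boot.
echo noop | sudo tee /sys/block/sdb/queue/scheduler
```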

Benchmarking SSD persistent disk performance

The following commands assume a 2,500 GB PD-SSD device. If your device size is different, modify the value of the --filesize argument. This disk size is necessary to achieve the 32 vCPU VM throughput limits.

Before you run the benchmarks, install fio:

    # Install dependencies
    sudo apt-get update
    sudo apt-get install -y fio

  1. Fill the disk with nonzero data. Persistent disk reads from empty blocks have a latency profile that is different from blocks that contain data. We recommend filling the disk before running any read latency benchmarks.

    # Running this command causes data loss on the second device.
    # We strongly recommend using a throwaway VM and disk.
    sudo fio --name=fill_disk \
      --filename=/dev/sdb --filesize=2500G \
      --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
      --bs=128K --iodepth=64 --rw=randwrite
    
  2. Test write bandwidth by performing sequential writes with multiple parallel streams (8+), using 1 MB as the I/O size and having an I/O depth that is greater than or equal to 64.

    # Running this command causes data loss on the second device.
    # We strongly recommend using a throwaway VM and disk.
    sudo fio --name=write_bandwidth_test \
      --filename=/dev/sdb --filesize=2500G \
      --time_based --ramp_time=2s --runtime=1m \
      --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
      --bs=1M --iodepth=64 --rw=write --numjobs=8 --offset_increment=100G
    
  3. Test write IOPS. To achieve maximum PD IOPS, you must maintain a deep I/O queue. If, for example, the write latency is 1 millisecond, the VM can achieve, at most, 1,000 IOPS for each I/O in flight. To achieve 15,000 write IOPS, the VM must maintain at least 15 I/Os in flight. If your disk and VM can achieve 30,000 write IOPS, at least 30 I/Os must be in flight. If the I/O size is larger than 4 KB, the VM might reach the bandwidth limit before it reaches the IOPS limit.

    # Running this command causes data loss on the second device.
    # We strongly recommend using a throwaway VM and disk.
    sudo fio --name=write_iops_test \
      --filename=/dev/sdb --filesize=2500G \
      --time_based --ramp_time=2s --runtime=1m \
      --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
      --bs=4K --iodepth=64 --rw=randwrite
    
  4. Test write latency. While testing I/O latency, the VM must not reach maximum bandwidth or IOPS; otherwise, the observed latency won't reflect actual persistent disk I/O latency. For example, if the IOPS limit is reached at an I/O depth of 30 and the fio command uses double that depth, the total IOPS remains the same and the reported I/O latency doubles.

    # Running this command causes data loss on the second device.
    # We strongly recommend using a throwaway VM and disk.
    sudo fio --name=write_latency_test \
      --filename=/dev/sdb --filesize=2500G \
      --time_based --ramp_time=2s --runtime=1m \
      --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
      --bs=4K --iodepth=4 --rw=randwrite
    
  5. Test read bandwidth by performing sequential reads with multiple parallel streams (8+), using 1 MB as the I/O size and having an I/O depth that is equal to 64 or greater.

    sudo fio --name=read_bandwidth_test \
      --filename=/dev/sdb --filesize=2500G \
      --time_based --ramp_time=2s --runtime=1m \
      --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
      --bs=1M --iodepth=64 --rw=read --numjobs=8 --offset_increment=100G
    
  6. Test read IOPS. To achieve the maximum PD IOPS, you must maintain a deep I/O queue. If the I/O size is larger than 4 KB, the VM might reach the bandwidth limit before it reaches the IOPS limit. To achieve the maximum 100k read IOPS, specify --iodepth=256 for this test.

    sudo fio --name=read_iops_test \
      --filename=/dev/sdb --filesize=2500G \
      --time_based --ramp_time=2s --runtime=1m \
      --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
      --bs=4K --iodepth=256 --rw=randread
    
  7. Test read latency. It's important to fill the disk with data to get a realistic latency measurement. The VM must not reach IOPS or throughput limits during this test, because after the persistent disk reaches its saturation limit, it pushes back on incoming I/Os, which shows up as an artificial increase in I/O latency.

    sudo fio --name=read_latency_test \
      --filename=/dev/sdb --filesize=2500G \
      --time_based --ramp_time=2s --runtime=1m \
      --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
      --bs=4K --iodepth=4 --rw=randread
    
  8. Test sequential read bandwidth.

    sudo fio --name=read_bandwidth_test \
      --filename=/dev/sdb --filesize=2500G \
      --time_based --ramp_time=2s --runtime=1m \
      --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
      --numjobs=4 --thread --offset_increment=500G \
      --bs=1M --iodepth=64 --rw=read
    
  9. Test sequential write bandwidth.

    sudo fio --name=write_bandwidth_test \
      --filename=/dev/sdb --filesize=2500G \
      --time_based --ramp_time=2s --runtime=1m \
      --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
      --numjobs=4 --thread --offset_increment=500G \
      --bs=1M --iodepth=64 --rw=write
    
