perfdiag - Run performance diagnostic

Synopsis

gsutil perfdiag [-i in.json]
gsutil perfdiag [-o out.json] [-n objects] [-c processes]
    [-k threads] [-p parallelism type] [-y slices] [-s size] [-d directory]
    [-t tests] [-j ratio] [-m key:value] gs://<bucket_name>...

Description

The perfdiag command runs a suite of diagnostic tests for a given Cloud Storage bucket.

The bucket_name parameter must name an existing bucket to which the user has write permission. Several test files will be uploaded to and downloaded from this bucket. All test files will be deleted at the completion of the diagnostic if it finishes successfully. For a list of relevant permissions, see Cloud IAM permissions for gsutil commands.

gsutil performance can be influenced by a number of factors originating at the client, server, or network level. Some examples include the following:

  • CPU speed

  • Available memory

  • The access path to the local disk

  • Network bandwidth

  • Contention and error rates along the path between gsutil and Google servers

  • Operating system buffering configuration

  • Firewalls and other network elements

The perfdiag command is provided so that customers can run a known measurement suite when troubleshooting performance problems.

Providing Diagnostic Output To The Cloud Storage Team

If the Cloud Storage team asks you to run a performance diagnostic, please use the following command and email the output file (output.json) to the @google.com address provided by the Cloud Storage team:

gsutil perfdiag -o output.json gs://your-bucket

Additional resources for discussing perfdiag results include the Stack Overflow tag for Cloud Storage and the gsutil GitHub repository.

Options

-n

Sets the number of objects to use when downloading and uploading files during tests. Defaults to 5.

-c

Sets the number of processes to use while running throughput experiments. The default value is 1.

-k

Sets the number of threads per process to use while running throughput experiments. Each process will receive an equal number of threads. The default value is 1.

-p

Sets the type of parallelism to use (applicable only when threads or processes are specified and threads * processes > 1). The default is fan. Must be one of the following:

fan

Use one thread per object. This is akin to using gsutil -m cp, with sliced object download / parallel composite upload disabled.

slice

Use Y (specified with -y) threads for each object, transferring one object at a time. This is akin to using sliced object download / parallel composite upload without -m. Sliced uploads are not supported for s3.

both

Use Y (specified with -y) threads for each object, transferring multiple objects at a time. This is akin to simultaneously using sliced object download / parallel composite upload and gsutil -m cp. Parallel composite uploads are not supported for s3.
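The rule for when -p takes effect can be summarized in a short check. This is a minimal sketch, assuming the total worker count is processes times threads per process (since -k sets threads per process); parallelism_plan is a hypothetical helper for illustration, not part of gsutil.

```python
VALID_TYPES = ("fan", "slice", "both")


def parallelism_plan(processes: int = 1, threads: int = 1, ptype: str = "fan") -> dict:
    """Hypothetical helper: summarize how -c, -k, and -p interact."""
    if processes < 1 or threads < 1:
        raise ValueError("processes and threads must be at least 1")
    if ptype not in VALID_TYPES:
        raise ValueError(f"-p must be one of {VALID_TYPES}, got {ptype!r}")
    # -k is threads *per process*, so the total is the product.
    total_workers = processes * threads
    # The parallelism type only matters when more than one worker exists.
    applies = total_workers > 1
    return {
        "total_workers": total_workers,
        "ptype_applies": applies,
        "ptype": ptype if applies else None,
    }


print(parallelism_plan(processes=2, threads=4, ptype="slice"))
```

With the defaults (one process, one thread), the chosen -p value has no effect.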

-y

Sets the number of slices to divide each file/object into while transferring data. Only applicable with the slice (or both) parallelism type. The default is 4 slices.
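To picture what slicing means for a single object, the sketch below splits an object of a given size into Y contiguous byte ranges. It assumes near-equal ranges with the remainder spread over the first slices; slice_ranges is a hypothetical helper, and gsutil's actual component boundaries may differ.

```python
def slice_ranges(size: int, slices: int = 4) -> list[tuple[int, int]]:
    """Split [0, size) into `slices` contiguous, near-equal (start, end) byte ranges.

    `end` is exclusive. Hypothetical helper for illustration only.
    """
    if size < 0 or slices < 1:
        raise ValueError("size must be >= 0 and slices >= 1")
    base, extra = divmod(size, slices)
    ranges = []
    start = 0
    for i in range(slices):
        # Give one extra byte to each of the first `extra` slices.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


print(slice_ranges(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

With the defaults (1 MiB objects, 4 slices), each slice covers 262144 bytes.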

-s

Sets the size (in bytes) for each of the N (set with -n) objects used in the read and write throughput tests. The default is 1 MiB. This can also be specified using byte suffixes such as 500K or 1M.
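The following sketch shows how a size with a byte suffix maps to a byte count. It assumes K, M, and G are binary multiples (1024-based), consistent with the 1 MiB default; the exact set of suffixes gsutil accepts is not covered here, and parse_size is a hypothetical helper.

```python
import re

# Assumption: suffixes are binary multiples (K = 1024), matching the 1 MiB default.
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '500K', '1M', or '2048' into bytes (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]


print(parse_size("500K"))  # 512000
print(parse_size("1M"))    # 1048576
```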

-d

Sets the directory to store temporary local files in. If not specified, a default temporary directory will be used.

-t

Sets the list of diagnostic tests to perform. The default is to run the lat, rthru, and wthru diagnostic tests. Must be a comma-separated list containing one or more of the following:

lat

For N (set with -n) objects, write the object, retrieve its metadata, read the object, and finally delete the object. Record the latency of each operation.

list

Write N (set with -n) objects to the bucket, record how long it takes for the eventually consistent listing call to return the N objects in its result, delete the N objects, then record how long it takes listing to stop returning the N objects.

rthru

Runs N (set with -n) read operations, with at most C (set with -c) reads outstanding at any given time.

rthru_file

The same as rthru, but simultaneously writes data to the disk, to gauge the performance impact of the local disk on downloads.

wthru

Runs N (set with -n) write operations, with at most C (set with -c) writes outstanding at any given time.

wthru_file

The same as wthru, but simultaneously reads data from the disk, to gauge the performance impact of the local disk on uploads.
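The rthru and wthru definitions ("N operations, with at most C outstanding at any given time") can be modeled with a bounded thread pool. This is a minimal sketch, not gsutil's implementation: operation is a hypothetical stand-in for one read or write of a size-byte object, and the real tests may combine processes and threads as configured with -c and -k.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_throughput(operation, n: int, c: int, size: int) -> dict:
    """Run `operation` n times with at most c in flight; report bytes per second.

    Hypothetical harness: `operation(i)` stands in for one read (rthru) or
    write (wthru) of a `size`-byte object.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def tracked(i: int) -> None:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        try:
            operation(i)
        finally:
            with lock:
                in_flight -= 1

    start = time.perf_counter()
    # max_workers caps how many operations are outstanding at any moment.
    with ThreadPoolExecutor(max_workers=c) as pool:
        list(pool.map(tracked, range(n)))  # list() surfaces any exception
    elapsed = time.perf_counter() - start
    return {"bytes_per_sec": n * size / elapsed, "peak_outstanding": peak}


# Simulated 1 MiB transfers that each take about 10 ms.
stats = run_throughput(lambda i: time.sleep(0.01), n=5, c=2, size=1024 * 1024)
print(stats)
```

Throughput here is total bytes over wall-clock time, so raising C helps only until some other resource (network, disk, CPU) becomes the bottleneck.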

-m

Adds metadata to the result JSON file. Multiple -m values can be specified. Example:

gsutil perfdiag -m "key1:val1" -m "key2:val2" gs://bucketname

Each metadata key will be added to the top-level "metadata" dictionary in the output JSON file.
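The sketch below shows how repeated -m "key:value" arguments could map onto the top-level "metadata" dictionary. It is a hypothetical helper that makes two assumptions not stated above: the key ends at the first colon, and a repeated key keeps its last value.

```python
import json


def build_metadata(pairs: list[str]) -> dict:
    """Turn repeated -m "key:value" arguments into a metadata dictionary.

    Hypothetical helper. Assumptions: the key ends at the first ':' and a
    repeated key keeps its last value.
    """
    metadata = {}
    for pair in pairs:
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {pair!r}")
        metadata[key] = value
    return metadata


result = {"metadata": build_metadata(["key1:val1", "key2:val2"])}
print(json.dumps(result, sort_keys=True))
```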

-o

Writes the results of the diagnostic to an output file. The output is a JSON file containing system information and performance diagnostic results. The file can be read and reported later using the -i option.

-i

Reads the JSON output file created using the -o option and prints a formatted description of the results.
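For scripted analysis beyond the -i report, the -o file can also be loaded directly. This is a minimal sketch that relies only on what this page states (a JSON file with a top-level "metadata" dictionary); load_results and summarize are hypothetical helpers, and other top-level keys are listed by name without interpretation.

```python
import json


def load_results(path: str) -> dict:
    """Read a JSON file written by `gsutil perfdiag -o`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(results: dict) -> list[str]:
    """Return "key: value" lines for the top-level "metadata" dictionary.

    Other top-level keys are reported by name only.
    """
    metadata = results.get("metadata", {})
    lines = [f"{key}: {value}" for key, value in sorted(metadata.items())]
    other = sorted(k for k in results if k != "metadata")
    lines.append("other top-level keys: " + ", ".join(other))
    return lines


# Usage, assuming output.json was produced with -o:
# for line in summarize(load_results("output.json")):
#     print(line)
```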

-j

Applies gzip transport encoding and sets the target compression ratio for the generated test files. This ratio can be an integer between 0 and 100 (inclusive), with 0 generating uniform data and 100 generating random data. When you specify the -j option, files being uploaded are compressed in memory and on the wire only. See cp -j for specific semantics.
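To see why the ratio matters, the sketch below builds test buffers whose compressibility varies from uniform to random and measures them with gzip. The generator is a stand-in, not the one perfdiag uses: it assumes the ratio is the percentage of the buffer filled with random bytes, with the remainder a single repeated byte.

```python
import gzip
import os


def make_test_data(size: int, ratio: int) -> bytes:
    """Build `size` bytes whose randomness is set by ratio (0..100).

    Hypothetical generator: the first ratio% of the buffer is random bytes and
    the rest is one repeated byte, so 0 gives uniform data and 100 random data.
    """
    if not 0 <= ratio <= 100:
        raise ValueError("ratio must be an integer between 0 and 100")
    random_len = size * ratio // 100
    return os.urandom(random_len) + b"a" * (size - random_len)


for ratio in (0, 50, 100):
    data = make_test_data(1024 * 1024, ratio)
    compressed = gzip.compress(data)
    print(f"ratio={ratio:3d}  gzip size / original = {len(compressed) / len(data):.3f}")
```

Uniform data shrinks to a tiny fraction of its size, while random data does not compress at all, so the ratio controls how much the transport encoding can save.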

Measuring Availability

The perfdiag command ignores the boto num_retries configuration parameter. Instead, it always retries on HTTP errors in the 500 range (5xx) and keeps track of how many such errors were encountered during the test. The availability measurement is reported at the end of the test.

Note that HTTP responses are only recorded when the request was made in a single process. When using multiple processes or threads, read and write throughput measurements are performed in an external process, so the availability numbers reported won't include the throughput measurements.
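The retry-and-count behavior described above can be sketched as follows. This is not gsutil's code: AvailabilityTracker is hypothetical, the max_attempts cap exists only so the sketch cannot loop forever, backoff delays are omitted, and availability is assumed to be the fraction of requests that did not fail with a 5xx status.

```python
from typing import Callable


class AvailabilityTracker:
    """Hypothetical sketch: retry on 5xx responses and count them."""

    def __init__(self) -> None:
        self.requests = 0
        self.server_errors = 0

    def call(self, send: Callable[[], int], max_attempts: int = 100) -> int:
        """Call `send` (which returns an HTTP status) until it stops returning 5xx.

        max_attempts is a safety cap added for this sketch only.
        """
        for _ in range(max_attempts):
            status = send()
            self.requests += 1
            if 500 <= status <= 599:
                self.server_errors += 1
                continue  # retry regardless of any num_retries setting
            return status
        raise RuntimeError("gave up after repeated 5xx responses")

    def availability(self) -> float:
        # Assumed definition: fraction of requests that did not fail with a 5xx.
        if self.requests == 0:
            return 1.0
        return 1 - self.server_errors / self.requests


# Simulated server: one 503 then a 200 for the first call, a 200 for the second.
responses = iter([503, 200, 200])
tracker = AvailabilityTracker()
tracker.call(lambda: next(responses))
tracker.call(lambda: next(responses))
print(tracker.requests, tracker.server_errors, tracker.availability())
```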

Note

The perfdiag command runs a series of tests that collect system information, such as the following:

  • Retrieves the requester's IP address.

  • Executes DNS queries to Google servers and collects the results.

  • Collects network statistics information from the output of netstat -s and evaluates the BIOS product name string.

  • If a proxy server is configured, attempts to connect to it to retrieve the location and storage class of the bucket being used for performance testing.

None of this information will be sent to Google unless you proactively choose to send it.