Cloud Storage FUSE is an open source FUSE adapter that allows you to mount Cloud Storage buckets as file systems on Linux or macOS systems. It also provides a way for applications to upload and download Cloud Storage objects using standard file system semantics. Cloud Storage FUSE can be run anywhere with connectivity to Cloud Storage, including Google Compute Engine VMs or on-premises systems (see footnote 1).
Cloud Storage FUSE works by translating object storage names into a file and directory system, interpreting the “/” character in object names as a directory separator so that objects with the same common prefix are treated as files in the same directory. Applications can interact with the mounted bucket like a simple file system, providing virtually limitless file storage running in the cloud.
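The name-to-directory translation described above can be illustrated with a minimal Python sketch. It is not how Cloud Storage FUSE is implemented; the function name `list_directory` and the sample object names are hypothetical, and the sketch ignores the default handling of implicit directories described later under "Key differences from a POSIX file system".

```python
def list_directory(object_names, directory=""):
    """Return (subdirectories, files) found directly under `directory`.

    `directory` is "" for the bucket root, or a prefix ending in "/".
    """
    subdirs, files = set(), set()
    for name in object_names:
        if not name.startswith(directory):
            continue  # object lives elsewhere in the bucket
        rest = name[len(directory):]
        if not rest:
            continue  # an object named exactly like the directory itself
        head, sep, _ = rest.partition("/")
        if sep:
            subdirs.add(head)  # more path follows, so `head` acts as a directory
        else:
            files.add(head)    # no further "/", so `head` acts as a file
    return sorted(subdirs), sorted(files)


# Hypothetical object names in a bucket.
objects = [
    "logs/2020/app.log",
    "logs/2020/db.log",
    "logs/readme.txt",
    "notes.txt",
]

root_listing = list_directory(objects)
logs_listing = list_directory(objects, "logs/")
```

Objects that share the prefix `logs/2020/` end up as files in the same directory, which is the behavior the paragraph above describes.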
While Cloud Storage FUSE has a file system interface, it is not like an NFS or CIFS file system on the backend. Cloud Storage FUSE retains the same fundamental characteristics of Cloud Storage, preserving the scalability of Cloud Storage in terms of size and aggregate performance while maintaining the same latency and single object performance. As with the other access methods, Cloud Storage does not support concurrency and locking. For example, if multiple Cloud Storage FUSE clients are writing to the same file, the last flush wins.
For more information about using Cloud Storage FUSE or to file an issue, go to the Google Cloud GitHub repository. In the repository, we recommend that you review the README, semantics, installing, and mounting documents.
Using Cloud Storage FUSE
Full details for installing and working with Cloud Storage FUSE are described in the GoogleCloudPlatform/gcsfuse GitHub repository. The following steps provide a quick overview of how to work with Cloud Storage FUSE interactively, that is, mounting your bucket manually.
Follow the instructions for installing Cloud Storage FUSE and its dependencies.
Set up credentials for Cloud Storage FUSE.
Cloud Storage FUSE auto-discovers credentials based on application default credentials:
- If you are running on a Google Compute Engine instance with the scope storage-full configured, then Cloud Storage FUSE can use the Compute Engine built-in service account. For more information, see Using Service Accounts with Applications.
- If you installed the Google Cloud SDK and ran gcloud auth application-default login, then Cloud Storage FUSE can use these credentials.
- If you set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of a service account's JSON key file, then Cloud Storage FUSE will use this credential. For more information about creating a JSON key file for a service account using the Google Cloud Console, see Creating Service Account Keys.
If more than one credential type is specified, see How the Application Default Credentials Work to learn about the order that credentials are used.
Create a directory.
$ mkdir /path/to/mount
Create the bucket you wish to mount, if it doesn't already exist, using the Google Cloud Console.
Use Cloud Storage FUSE to mount the bucket (for example, example-bucket).
$ gcsfuse example-bucket /path/to/mount
Start working with the mounted bucket.
$ ls /path/to/mount
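For scripted rather than interactive use, the create, mount, and list steps above could be wrapped roughly as follows. This is a sketch under assumptions: the helper names `mount_commands` and `run_steps` are hypothetical, `mkdir -p` is used in place of the plain `mkdir` shown above so that an existing directory is not an error, and the default is a dry run that only prints the commands. It assumes gcsfuse is installed and credentials are already set up.

```python
import shlex
import subprocess


def mount_commands(bucket, mount_point):
    """Return the argv lists for the three interactive steps shown above."""
    return [
        ["mkdir", "-p", mount_point],      # create the mount directory
        ["gcsfuse", bucket, mount_point],  # mount the bucket
        ["ls", mount_point],               # start working with the mounted bucket
    ]


def run_steps(bucket, mount_point, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in mount_commands(bucket, mount_point):
        print("$ " + " ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_steps("example-bucket", "/path/to/mount")
```

Calling `run_steps("example-bucket", "/path/to/mount", dry_run=False)` would actually run the commands.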
Key differences from a POSIX file system
Cloud Storage FUSE helps you make better and quicker use of Cloud Storage by allowing file-based applications to use Cloud Storage without rewriting their I/O code. It is ideal for use cases where Cloud Storage has the right performance and scalability characteristics for an application, and only the file system semantics are missing. When deciding if Cloud Storage FUSE is an appropriate solution, there are some additional differences compared to local file systems that you should take into account:
Pricing: Cloud Storage FUSE access is ultimately Cloud Storage access. All data transfer and operations performed by Cloud Storage FUSE map to Cloud Storage transfers and operations, and are charged accordingly. See the pricing section below for details before using Cloud Storage FUSE.
Performance: Cloud Storage FUSE has much higher latency than a local file system. As such, throughput may be reduced when reading or writing one small file at a time. Using larger files and/or transferring multiple files at a time will help to increase throughput.
- Individual I/O streams run approximately as fast as gsutil.
- The gsutil rsync command can be particularly affected by latency because it reads and writes one file at a time. Using the top-level -m flag with the command is often faster.
- Small random reads are slow due to latency to first byte (don't run a database over Cloud Storage FUSE!).
- Random writes are done by reading in the whole blob, editing it locally, and writing the whole modified blob back to Cloud Storage. Small writes to large files work as expected, but are slow and expensive.
Metadata: Cloud Storage FUSE does not transfer metadata along with the file when uploading to Cloud Storage. This means that if you wish to use Cloud Storage FUSE as an uploading tool, you will not be able to set metadata such as content type and ACLs as you would with other uploading methods. If metadata properties are critical, consider using gsutil, the JSON API, or the Google Cloud Console.
- The exception to this is that Cloud Storage FUSE does store mtime and symlink targets.
Concurrency: There is no concurrency control for multiple writers to a file. When multiple writers try to replace a file, the last write wins and all previous writes are lost. There is no merging, version control, or user notification of the subsequent overwrite.
Linking: Cloud Storage FUSE does not support hard links.
Semantics: Some semantics are not exactly what they would be in a traditional file system. The list of exceptions is in the semantics documentation. For example, metadata like last access time is not supported, and some metadata operations like directory rename are not atomic.
Access: Authorization for files is governed by Cloud Storage permissions. POSIX-style access control does not work.
Availability: Transient errors do at times occur in distributed systems like Cloud Storage, leading to less than 100% availability. It is recommended that retries be attempted using the guidelines of truncated exponential backoff.
Local storage: Objects that are new or modified will be stored in their entirety in a local temporary file until they are closed or synced. When working with large files, be sure you have enough local storage capacity for temporary copies of the files, particularly if you are working with Google Compute Engine instances. For more information, see the readme documentation.
Directories: By default, only directories that are explicitly defined (that is, they are their own object in Cloud Storage) will appear in the file system. Implicit directories (that is, ones that are only parts of the pathname of other files or directories) will not appear by default. If there are files whose pathnames contain an implicit directory, they will not appear in the overall directory tree (since the implicit directory containing them does not appear). A flag is available to change this behavior. For more information, see the semantics documentation.
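The availability item above recommends retrying with truncated exponential backoff. One way an application reading or writing through the mount might do that is sketched below. The names `backoff_delays` and `retry`, the default limits (5 retries, 1 second base, 32 second cap), and the choice to treat `OSError` as the retriable error are all assumptions made for illustration, not values taken from Cloud Storage FUSE or the Cloud Storage guidelines.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, max_backoff=32.0, rng=random.random):
    """Yield wait times of min(base * 2**n + jitter, max_backoff) seconds."""
    for attempt in range(max_retries):
        jitter = rng()  # random fraction of a second to spread out retries
        yield min(base * (2 ** attempt) + jitter, max_backoff)


def retry(operation, retriable=(OSError,), sleep=time.sleep, **backoff_kwargs):
    """Call operation(); on a retriable error, wait and try again.

    Re-raises the last error once the delays are exhausted.
    """
    delays = backoff_delays(**backoff_kwargs)
    while True:
        try:
            return operation()
        except retriable:
            delay = next(delays, None)
            if delay is None:
                raise  # out of retries: surface the original error
            sleep(delay)
```

A read through the mount could then be wrapped as `retry(lambda: open("/path/to/mount/file.txt").read())`, where the file name is a placeholder.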
Charges incurred with Cloud Storage FUSE
Cloud Storage FUSE is available free of charge, but the storage, metadata, and network I/O it generates to and from Cloud Storage are charged like any other Cloud Storage interface. To avoid surprises, you should estimate how your use of Cloud Storage FUSE will translate to Cloud Storage charges. For example, if you are using Cloud Storage FUSE to store log files, you can incur charges quickly if logs are aggressively flushed on hundreds or thousands of machines at the same time.
You should be aware of the following categories of charges related to using Cloud Storage FUSE:
Normal object operations (create, delete, and list) incur charges as described in the Operations section of the Cloud Storage pricing page.
Nearline Storage, Coldline Storage, and Archive Storage objects have costs associated with retrieval and early deletion. See the Retrieval and early deletion section in the Cloud Storage pricing page.
Network egress and data transfer between locations incur costs. See the Network section in the Cloud Storage pricing page.
Cost breakdown example
To get an idea of how using Cloud Storage FUSE translates to Cloud Storage costs,
consider the following sequence of commands and their associated JSON API
operations. You can display information about operations using the
|Command|JSON API Operations|
| |Objects.list (to check credentials)|
| |Objects.insert("subdir/local.txt"), to create an empty object|
| |Objects.insert("subdir/local.txt"), when closing after done writing|
Using the operation charges for the JSON API, the 14 operations break down into 8 Class A operations, 4 Class B operations, and 2 free operations. There is also a charge incurred for the storage of the local.txt file. If you delete the file soon after creating it, that charge will be negligible. For just the 12 charged operations, the cost of this sequence of commands is $0.000084.
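The arithmetic behind that total can be checked with a short Python sketch. The per-operation rates below are assumptions: $0.10 per 10,000 Class A operations and $0.01 per 10,000 Class B operations are used only because they reproduce the $0.000084 figure above. They may not match current prices, so check the Cloud Storage pricing page before estimating real costs. The name `operations_cost` is hypothetical.

```python
from decimal import Decimal

# ASSUMED rates in USD per 10,000 operations; chosen to reproduce the
# total quoted above, not taken from the current pricing page.
PRICE_PER_10K = {
    "class_a": Decimal("0.10"),
    "class_b": Decimal("0.01"),
    "free": Decimal("0"),
}


def operations_cost(counts, price_per_10k=PRICE_PER_10K):
    """Return the total operation charge in USD for the given counts."""
    total = Decimal("0")
    for op_class, count in counts.items():
        total += price_per_10k[op_class] / 10000 * count
    return total


# Operation counts from the example: 14 operations in total.
counts = {"class_a": 8, "class_b": 4, "free": 2}
total = operations_cost(counts)
```

Under these assumed rates, the 8 Class A operations contribute $0.00008 and the 4 Class B operations contribute $0.000004. Decimal arithmetic is used so that such small amounts are not distorted by binary floating point rounding.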
1 Cloud Storage FUSE
is supported in Linux kernel version 3.10 and newer. To check your kernel
version, you can use