This tutorial shows you how to build an event-driven pipeline that can help you automate the evaluation of documents for malicious code.
Manually evaluating the large number of documents uploaded to Cloud Storage is too time-consuming for most apps.
This pipeline is built by using Google Cloud products along with an open source antivirus engine called ClamAV. For this tutorial, ClamAV runs in a Docker container hosted in the App Engine flexible environment. The pipeline also writes log entries to Cloud Logging when a malware-infected document is detected.
You can use these log entries to trigger log-based alerts for infected documents, but setting up these alerts is outside the scope of this tutorial.
The term malware is used throughout this tutorial as an umbrella term to describe trojans, viruses, and other malicious code.
This tutorial assumes that you are familiar with the basic functionality of Cloud Storage, App Engine, Cloud Functions, Docker, and Node.js.
Architecture
The following diagram outlines the steps in the pipeline.
The following steps outline the architectural pipeline:
- You upload files to Cloud Storage.
- The upload event automatically triggers a Cloud Function.
- The Cloud Function invokes the malware-scanner service running in App Engine.
- The malware-scanner service scans the uploaded document for malware.
- If the document is infected, the service moves it to a quarantined bucket; otherwise the document is moved into another bucket that holds uninfected scanned documents.
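The routing decision in the last step can be sketched in a few lines of Node.js. This is only an illustration under stated assumptions: the function name routeUpload, the shape of the scan result (an infected boolean), and the demo bucket names are hypothetical and are not taken from the tutorial's server.js.

```javascript
// Hypothetical sketch of the routing step; not the tutorial's actual service code.
// `scan` is any async function that resolves to { infected: boolean }.
async function routeUpload(file, scan, buckets) {
  const result = await scan(file);
  // Infected documents go to quarantine; everything else goes to the clean bucket.
  const destination = result.infected ? buckets.quarantined : buckets.clean;
  return { name: file.name, from: file.bucket, to: destination };
}

// Example run with a stub scanner that flags any file whose name contains "eicar".
const stubScan = async (file) => ({ infected: file.name.includes('eicar') });
const buckets = { quarantined: 'quarantined-demo', clean: 'scanned-demo' };

routeUpload({ bucket: 'unscanned-demo', name: 'report.txt' }, stubScan, buckets)
  .then((move) => console.log(move.to)); // prints "scanned-demo"
```

Passing the scanner in as a function keeps the routing logic testable without a running ClamAV daemon.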
Objectives
- Build an App Engine flexible environment malware-scanner service to scan documents for malware by using ClamAV.
- Build a Node.js Cloud Function to invoke the malware-scanner service when a document is uploaded to Cloud Storage.
- Build services to move scanned documents to clean or quarantined buckets based on the outcome of the scan.
Costs
This tutorial uses the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the Cloud Functions and App Engine APIs.
- In the Cloud Console, activate Cloud Shell.
  At the bottom of the Cloud Console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.
In this tutorial, you run all commands from Cloud Shell.
Setting up your environment
In this section, you assign settings for values that are used throughout the tutorial, such as region and zone. In this tutorial, you use us-central1 as the region and us-central1-b as the zone.
In Cloud Shell, set the region and zone:
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-b
Create an environment variable for your Google Cloud project number:
export PROJECT_NUMBER=$(gcloud projects describe $DEVSHELL_PROJECT_ID \
    --format='value(projectNumber)')
Create three Cloud Storage buckets with unique names:
Cloud Shell
Create three buckets:
gsutil mb gs://unscanned-$DEVSHELL_PROJECT_ID
gsutil mb gs://quarantined-$DEVSHELL_PROJECT_ID
gsutil mb gs://scanned-$DEVSHELL_PROJECT_ID
$DEVSHELL_PROJECT_ID is an environment variable that Cloud Shell sets to the ID of the active Google Cloud project in the Cloud Console. It's used to make sure that the bucket names are unique.
Cloud Console
In the Cloud Console, go to the Browser:
Click Create bucket.
In the Bucket name text box, enter the name of your bucket, unscanned-PROJECT_ID, and then click Create. Replace PROJECT_ID with your Google Cloud project ID.
Repeat these steps to create two more buckets called quarantined-PROJECT_ID and scanned-PROJECT_ID.
The three buckets you create hold documents at various stages of the pipeline:
- unscanned-PROJECT_ID: Holds documents before they're processed. You upload your documents to this bucket. PROJECT_ID represents your Google Cloud project ID.
- quarantined-PROJECT_ID: Holds documents that the malware-scanner service scans and deems to contain malware.
- scanned-PROJECT_ID: Holds documents that the malware-scanner service scans and finds to be uninfected.
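The naming scheme for the three buckets can be expressed as a small Node.js helper. This is a minimal sketch: the function name pipelineBuckets and the project ID my-project are placeholders chosen for illustration, not names used by the tutorial's code.

```javascript
// Derives the three bucket names used in this tutorial from a project ID.
// Hypothetical helper; the tutorial itself creates the buckets with gsutil.
function pipelineBuckets(projectId) {
  return {
    unscanned: `unscanned-${projectId}`,     // upload target, before scanning
    quarantined: `quarantined-${projectId}`, // documents found to contain malware
    scanned: `scanned-${projectId}`,         // documents found to be uninfected
  };
}

console.log(pipelineBuckets('my-project').quarantined); // prints "quarantined-my-project"
```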
Creating the malware-scanner service in App Engine
In this section, you deploy the server.js script to run the malware-scanner service in the App Engine flexible environment. The service runs in a Docker container in the App Engine flexible environment and contains the following:
- A Node.js script called server.js for the malware-scanner service.
- A Dockerfile to build an image with the service and ClamAV binaries.
- An app.yaml file, which is a configuration file that outlines the definition of the service deployed to App Engine.
- A bootstrap.sh shell script to run the ClamAV and freshclam daemons when the container starts.
Create the malware-scanner service:
In Cloud Shell, clone the GitHub repository that contains the code files:
git clone https://github.com/GoogleCloudPlatform/docker-clamav-malware-scanner.git
Change to the appengine-malwarescanningservice-node directory in the cloned repository:
cd docker-clamav-malware-scanner/appengine-malwarescanningservice-node
Run the following sed command to replace the placeholders in the app.yaml file with your Google Cloud project ID:
sed -i -e "s/PROJECT_ID/$DEVSHELL_PROJECT_ID/g" app.yaml
If it's the first service that you're deploying to App Engine, set the service name in the app.yaml file in this directory to default:
service: default
You can make this change by replacing the existing service name with sed:
sed -i -e "s/malware-scanner/default/g" app.yaml
If it's not the first service that you're deploying to App Engine, keep the existing malware-scanner service name.
For more information about how App Engine services are structured, see the default service.
Create the service and deploy it to App Engine:
gcloud app create --project=$DEVSHELL_PROJECT_ID --region=us-central
gcloud app deploy
When prompted, enter Y.
Make a note of the service URL in the output when the deployment of your app is complete. You need the app's URL in a later step. The service URL has the following format:
https://service-name-dot-PROJECT_ID.appspot.com
If it's your first App Engine service, then the service URL has the following format:
https://PROJECT_ID.appspot.com
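The two URL formats above can be captured in one helper. The following is a sketch for illustration only: the function name serviceUrl and the project ID my-project are hypothetical, and the authoritative URL is always the one printed by the deployment output.

```javascript
// Builds the expected App Engine service URL from the formats shown above.
// The default service has no "service-name-dot-" prefix.
function serviceUrl(projectId, serviceName = 'default') {
  const host = serviceName === 'default'
    ? `${projectId}.appspot.com`
    : `${serviceName}-dot-${projectId}.appspot.com`;
  return `https://${host}`;
}

console.log(serviceUrl('my-project', 'malware-scanner'));
// prints "https://malware-scanner-dot-my-project.appspot.com"
console.log(serviceUrl('my-project'));
// prints "https://my-project.appspot.com"
```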
Assign bucket permissions
Locate the App Engine flexible environment service account name because you need it in the next step to assign permissions to access the buckets you created. Your service account is in the following format:
service-${PROJECT_NUMBER}@gae-api-prod.google.com.iam.gserviceaccount.com
In Cloud Shell, add the App Engine service account as a member with the roles/storage.legacyBucketWriter role to the unscanned-PROJECT_ID bucket:
gsutil iam ch \
    serviceAccount:service-${PROJECT_NUMBER}@gae-api-prod.google.com.iam.gserviceaccount.com:roles/storage.legacyBucketWriter \
    gs://unscanned-$DEVSHELL_PROJECT_ID
Add the App Engine service account as a member with the roles/storage.objectCreator role to the quarantined-PROJECT_ID bucket:
gsutil iam ch \
    serviceAccount:service-${PROJECT_NUMBER}@gae-api-prod.google.com.iam.gserviceaccount.com:roles/storage.objectCreator \
    gs://quarantined-$DEVSHELL_PROJECT_ID
Add the App Engine service account as a member with the roles/storage.objectCreator role to the scanned-PROJECT_ID bucket:
gsutil iam ch \
    serviceAccount:service-${PROJECT_NUMBER}@gae-api-prod.google.com.iam.gserviceaccount.com:roles/storage.objectCreator \
    gs://scanned-$DEVSHELL_PROJECT_ID
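The three permission grants follow one pattern: the same service account, a role, and a bucket. The sketch below summarizes that pattern as data and regenerates the equivalent gsutil commands. The function names bucketBindings and toGsutilCommand, and the sample project ID and project number, are hypothetical; the roles and account format are the ones used in the steps above.

```javascript
// Lists the role that the App Engine flexible environment service account
// receives on each bucket, mirroring the three gsutil commands above.
function bucketBindings(projectId, projectNumber) {
  const member =
    `serviceAccount:service-${projectNumber}@gae-api-prod.google.com.iam.gserviceaccount.com`;
  return [
    { bucket: `gs://unscanned-${projectId}`, member, role: 'roles/storage.legacyBucketWriter' },
    { bucket: `gs://quarantined-${projectId}`, member, role: 'roles/storage.objectCreator' },
    { bucket: `gs://scanned-${projectId}`, member, role: 'roles/storage.objectCreator' },
  ];
}

// Formats one binding as a single-line gsutil command.
function toGsutilCommand(binding) {
  return `gsutil iam ch ${binding.member}:${binding.role} ${binding.bucket}`;
}

// Placeholder project ID and number, for illustration only.
bucketBindings('my-project', '123456789').forEach((b) => console.log(toGsutilCommand(b)));
```

The unscanned bucket gets the broader legacy bucket writer role because the service has to remove documents from it after scanning, while the two destination buckets only need object creation.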
Creating a Cloud Function to trigger the malware-scanner service
In these steps, you deploy the index.js script that contains the Cloud Function that is called when a document is uploaded to your unscanned-PROJECT_ID Cloud Storage bucket. This function runs as a background function and is invoked in response to Cloud Storage events, such as uploading new documents or changing document versions.
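To make the role of the function concrete, here is a minimal sketch of what such a background function could look like. It is not the repository's index.js. The request body field names (location, filename, bucketname) and the injected send parameter are assumptions made for illustration; the only details taken from this tutorial are the function name requestMalwareScan and the SCAN_SERVICE_URL environment variable.

```javascript
// Builds the HTTP request that asks the scanner service to scan one object.
// The body field names (location, filename, bucketname) are assumptions.
function buildScanRequest(file, scanServiceUrl) {
  return {
    method: 'POST',
    uri: scanServiceUrl,
    json: true,
    body: {
      location: `gs://${file.bucket}/${file.name}`,
      filename: file.name,
      bucketname: file.bucket,
    },
  };
}

// Background function body: `data` is the Cloud Storage object that triggered
// the event. `send` is injected so the sketch runs without network access;
// in a deployed function it could be the request-promise module.
async function requestMalwareScan(data, context, send) {
  const options = buildScanRequest(data, process.env.SCAN_SERVICE_URL);
  await send(options);
  return options;
}
```

In a deployed index.js, a wrapper exported under the name requestMalwareScan would call this with a real HTTP client as the send argument. Keeping the request construction in a separate pure function makes it easy to check without deploying anything.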
Cloud Shell
In Cloud Shell, change directories to the function-scantrigger-node folder of the repository that was cloned:
cd ../function-scantrigger-node
Deploy the function, replacing your-service-url with the service URL that you copied previously, for example https://malware-scanner-dot-PROJECT_ID.appspot.com:
gcloud functions deploy requestMalwareScan \
    --runtime nodejs8 \
    --set-env-vars SCAN_SERVICE_URL=your-service-url/scan \
    --trigger-resource gs://unscanned-$DEVSHELL_PROJECT_ID \
    --trigger-event google.storage.object.finalize
Validate that the function successfully deployed:
gcloud functions describe requestMalwareScan
A successful deployment displays a ready status similar to the following:
status: ACTIVE
timeout: 60s
Cloud Console
In the Cloud Console, go to the Cloud Functions Overview page.
Select the project for which you enabled Cloud Functions.
Click Create function.
In the Name text box, replace the default name with requestMalwareScan.
In the Trigger field, select Cloud Storage.
In the Bucket field, click Browse, click your unscanned-PROJECT_ID bucket in the drop-down list, and then click Select.
Under Runtime, select Node.js 8.
Under Source code, check Inline editor.
Paste the following code into the index.js box, replacing the existing text:
In the Function to execute text box, replace helloWorld with requestMalwareScan.
Paste the following code into the package.json text box, replacing the existing text:
{
  "name": "function_malware_scanner",
  "version": "1.0.0",
  "description": "Triggers the Malware Scanner service when a document is uploaded to Cloud Storage",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Google LLC.",
  "license": "Apache-2.0",
  "dependencies": {
    "request": "^2.88.0",
    "request-promise": "^4.2.4"
  }
}
Click More and set the Service Account to App Engine default service account.
Click Environment Variables.
In the Name field, enter SCAN_SERVICE_URL.
In the Value field, enter the malware-scanner service URL that you copied previously, with /scan appended:
https://malware-scanner-dot-PROJECT_ID.appspot.com/scan
If it's your first App Engine service, the service URL is in the following format:
https://PROJECT_ID.appspot.com/scan
Click Save. A green check mark next to the function indicates a successful deployment.
Testing the pipeline by uploading files
You upload one clean (malware-free) file and one infected file to test the pipeline.
Create a sample text file or use an existing clean file to test the pipeline processes.
Copy the sample data file to the unscanned files bucket:
gsutil cp filename gs://unscanned-$DEVSHELL_PROJECT_ID
Replace filename with the name of the clean text file. The malware-scanner service inspects each document and moves it to an appropriate bucket. This document is moved to the scanned-PROJECT_ID bucket.
Check your scanned-PROJECT_ID bucket to see if the processed document is there:
gsutil ls -r gs://scanned-$DEVSHELL_PROJECT_ID
In Cloud Shell, create a document called eicar-infected.txt that contains the EICAR standard antivirus test string, a harmless signature that antivirus engines report as malware. You use this document to test the workflow for infected documents that are uploaded to your unscanned-PROJECT_ID bucket:
echo -e 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' \
    > eicar-infected.txt
Upload the document to your unscanned-PROJECT_ID bucket:
gsutil cp eicar-infected.txt gs://unscanned-$DEVSHELL_PROJECT_ID
Give the pipeline a few seconds to process the document, and then check your quarantined-PROJECT_ID bucket to see if your document successfully went through the pipeline. The service also writes a log entry to Cloud Logging when a malware-infected document is detected.
gsutil ls -r gs://quarantined-$DEVSHELL_PROJECT_ID
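If you want to script these two checks, the expected outcome for each test file can be written down as a small helper. This is a test aid sketched under assumptions, not part of the tutorial's code: the function name expectedBucket and the project ID my-project are hypothetical, and the helper only recognizes the EICAR test string rather than detecting malware.

```javascript
// The 68-character EICAR standard antivirus test string (harmless by design).
const EICAR = 'X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*';

// Predicts which bucket a test upload should end up in. This only recognizes
// the EICAR string; it is a test aid, not a malware scanner.
function expectedBucket(content, projectId) {
  const isEicar = content.trimEnd() === EICAR; // echo appends a trailing newline
  return isEicar ? `quarantined-${projectId}` : `scanned-${projectId}`;
}

console.log(expectedBucket(EICAR + '\n', 'my-project'));  // prints "quarantined-my-project"
console.log(expectedBucket('hello world', 'my-project')); // prints "scanned-my-project"
```

Comparing the bucket returned here with the output of the gsutil ls commands above gives a quick pass or fail signal for each upload.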
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Learn about automating the classification of data uploaded to Cloud Storage.
- Explore Cloud Storage documentation.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.