DataCapture policy

What

The DataCapture policy captures data (such as payload, HTTP headers, and path or query parameters) from an API Proxy for use in Analytics. You can use captured data in custom Analytics reports, as well as to implement Sense, Monetization, and Monitoring rules.

Data Collector REST resource

To use the DataCapture policy, you must first create a Data Collector REST resource. To do so, send an API request like the following:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -X POST -H "content-type:application/json" \
  -d '
{
  "name": "dc_data_collector",
  "description": "Collects data for analysis.",
  "type": "STRING"
}' \
  "https://apigee.googleapis.com/v1/organizations/$PROJECT_ID/datacollectors"

This creates a resource named dc_data_collector, which you can use in the DataCapture policy. For a list of the quantities related to API traffic that you can capture using the Data Collector resource, see Metrics.

<DataCapture>

The <DataCapture> element defines a DataCapture policy.

<DataCapture async="true" continueOnError="true" enabled="true" name="DC">

Here's an example of a DataCapture policy:

<DataCapture name="capturepayment">
    <Capture>
        <DataCollector>dc_data_collector</DataCollector>
        <Collect ref="my_data_variable" />
    </Capture>
</DataCapture>

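The main element of the DataCapture policy is the <Capture> element, which specifies the means of capturing the data. It has two required child elements:

  • <DataCollector>, which specifies the Data Collector resource.
  • <Collect>, which specifies the means for capturing data.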

In this simple example, the data is extracted from a variable named my_data_variable, which has been created elsewhere in the proxy. The variable is specified by the ref attribute.
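For context, one way the variable might be populated before the DataCapture policy runs is with an AssignMessage policy placed earlier in the same flow. The sketch below is illustrative only: the policy name AM-Set-Data-Variable and the literal value are assumptions, not part of the original example.

```xml
<!-- Hypothetical policy that runs before DataCapture and sets my_data_variable. -->
<AssignMessage name="AM-Set-Data-Variable">
    <AssignVariable>
        <Name>my_data_variable</Name>
        <!-- Placeholder value; a string suits a Data Collector of type STRING. -->
        <Value>premium</Value>
    </AssignVariable>
</AssignMessage>
```

Any policy that writes a flow variable could play the same role. AssignMessage is used here only because it is the shortest way to show one.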

The <Collect> element also provides several other ways of capturing data from various sources through its child elements. See Examples for more examples of capturing data with the DataCapture policy.

The DataCapture element has the following syntax.

<DataCapture name="capturepayment" continueOnError="false" enabled="true"> 
  <DisplayName>Data-Capture-Policy-1</DisplayName>
  <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
  <ThrowExceptionOnLimit>false</ThrowExceptionOnLimit>

  <!-- Existing Variable -->
  <Capture>
    <Collect ref="existing-variable" default="0"></Collect>
    <DataCollector>dc_1</DataCollector>
  </Capture>

  <!-- JSONPayload -->
  <Capture>
    <DataCollector>dc_2</DataCollector>
    <Collect default="0">
      <Source>request</Source>
      <JSONPayload>
        <JSONPath>result.var</JSONPath>
      </JSONPayload>
    </Collect>
  </Capture>

  <!-- URIPath -->
  <Capture>
    <DataCollector>dc_3</DataCollector>
    <Collect default="0">
      <URIPath>
        <!-- All patterns must specify a single variable to extract named $ -->
        <Pattern ignoreCase="false">/foo/{$}</Pattern>
        <Pattern ignoreCase="false">/foo/bar/{$}</Pattern>
      </URIPath>
    </Collect>
  </Capture>
</DataCapture>

This element has the following attributes that are common to all policies:

Attribute Default Required? Description
name N/A Required

The internal name of the policy. The value of the name attribute can contain letters, numbers, spaces, hyphens, underscores, and periods. This value cannot exceed 255 characters.

Optionally, use the <DisplayName> element to label the policy in the management UI proxy editor with a different, natural-language name.

continueOnError false Optional Set to false to return an error when a policy fails. This is expected behavior for most policies. Set to true to have flow execution continue even after a policy fails.
enabled true Optional Set to true to enforce the policy. Set to false to turn off the policy. The policy will not be enforced even if it remains attached to a flow.
async   false Deprecated This attribute is deprecated.
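As a sketch of how these attributes combine, the following policy is configured so that a capture failure does not interrupt the API call. The policy name DC-Best-Effort, the display name, and the default value are placeholders chosen for illustration.

```xml
<!-- continueOnError="true": flow execution continues even if this policy fails. -->
<DataCapture name="DC-Best-Effort" continueOnError="true" enabled="true">
    <DisplayName>Capture data (best effort)</DisplayName>
    <Capture>
        <DataCollector>dc_data_collector</DataCollector>
        <!-- my_data_variable is assumed to be set elsewhere in the proxy. -->
        <Collect ref="my_data_variable" default="unknown"/>
    </Capture>
</DataCapture>
```

This setting may be a reasonable choice when analytics data is useful but not worth failing a client request over.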

The following table provides a high-level description of the child elements of <DataCapture>.

Child Element Required Description
<Capture> Required Captures the data for a specified variable.

Examples

The following examples illustrate various ways to use the DataCapture policy.

Capturing data for a built-in variable

The code sample below illustrates how to capture data for a built-in variable, message.content, which contains the content of the request, response, or error message. See Flow variables for more information about built-in variables.

<DataCapture name="capturepayment">
    <Capture>
        <DataCollector>dc_data_collector</DataCollector>
        <Collect ref="message.content" />
    </Capture>
</DataCapture>

In the code above, the ref attribute of the <Collect> element specifies the variable to capture, which in this example is named "message.content".

The sample captures the data with a <Capture> element, which also contains a <DataCollector> element specifying the name of the Data Collector resource.

Capturing data inline

The next example shows how to capture data inline using <JSONPayload>, a child element of the <Collect> element.

<DataCapture name="capturepayment">
    <Capture>
        <DataCollector>dc_data_collector</DataCollector>
        <Collect>
            <JSONPayload>
                <JSONPath>$.results[0].currency</JSONPath>
            </JSONPayload>
        </Collect>
    </Capture>
</DataCapture>

In the code above:

  • The <JSONPayload> element specifies the JSON-formatted message from which the value of the variable is extracted.
  • The <JSONPath> element specifies the JSON path used to extract the value from the message, which in this case is $.results[0].currency.

As an illustration, suppose the value extracted at the time the message was received is 1120. The resulting entry sent to Analytics would then be:

{
    "dc_data_collector": "1120"
}

<Capture>

The <Capture> element specifies the means of capturing the data.

<Capture />

The following table provides a high-level description of the child elements of <Capture>.

Child Element Required? Description
<DataCollector> Required Specifies the Data Collector resource.
<Collect> Required Specifies the means for capturing data.

<DataCollector>

The <DataCollector> element specifies the Data Collector resource.

<DataCollector>dc_data_collector</DataCollector>

The body of the <DataCollector> element contains the name of the Data Collector resource.

<Collect>

The <Collect> element specifies the means for capturing data.

<Collect ref="existing-variable" default="0"/>

The following table describes the attributes of the <Collect> element.

Attribute Description Default Presence Type
ref

The variable for which you are capturing data.

N/A Optional—If ref is omitted, exactly one of the following must be specified: QueryParam, Header, FormParam, URIPath, JSONPayload, or XMLPayload. String
default Specifies the value that is sent to Analytics if the value of the variable is not populated at runtime. For example, if you set default="0", the value sent to Analytics would be 0. If you don't specify the value of default, and the value of the variable is not populated at runtime, the value sent to Analytics is null for a numeric variable or "Not set" for a string variable. Required String

The data can be captured from an existing variable using the ref attribute, or by using one of the child elements of <Collect>.
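The two approaches can sit side by side in one policy, each in its own <Capture> element. The following sketch reuses names from the syntax reference above (dc_1, dc_2, existing-variable). The policy name DC-Two-Sources is a placeholder.

```xml
<DataCapture name="DC-Two-Sources">
    <!-- Option 1: read an existing flow variable through the ref attribute. -->
    <Capture>
        <DataCollector>dc_1</DataCollector>
        <Collect ref="existing-variable" default="0"/>
    </Capture>
    <!-- Option 2: omit ref and use exactly one child element instead. -->
    <Capture>
        <DataCollector>dc_2</DataCollector>
        <Collect default="0">
            <QueryParam name="code">
                <Pattern ignoreCase="true">{$}</Pattern>
            </QueryParam>
        </Collect>
    </Capture>
</DataCapture>
```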

Child elements of <Collect>

The following table provides a high-level description of the child elements of <Collect>:

Child Element Required? Description
<Source> Optional Specifies the variable to be parsed.
<URIPath> Optional Extracts a value from the proxy.pathsuffix of a request source message.
<QueryParam> Optional Extracts a value from the specified query parameter of a request source message.
<Header> Optional Extracts a value from the specified HTTP header of the specified request or response message.
<FormParam> Optional Extracts a value from the specified form parameter of the specified request or response message.
<JSONPayload> Optional Specifies the JSON-formatted message from which the value of the variable will be extracted.
<XMLPayload> Optional Specifies the XML-formatted message from which the value of the variable will be extracted.

<Source>

(Optional) Specifies the variable to be parsed. The value of <Source> defaults to message. The message value is context-sensitive. In a request flow, message resolves to the request message. In a response flow, message resolves to the response message.

While you often use this policy to extract information from a request or response message, you can use it to extract information from any variable. For example, you can use it to extract information from an entity created by the AccessEntity policy, from data returned by the ServiceCallout policy, or extract information from an XML or JSON object.

If <Source> cannot be resolved, or resolves to a non-message type, the policy fails with a runtime error (for example, NonMsgVariable when the variable is not a message).

<Source clearPayload="true|false">request</Source>

Attributes

Attribute Description Default Presence Type
clearPayload

Set to true if you want to clear the payload specified in <Source> after extracting data from it.

Use the clearPayload attribute only if the source message is not required after the DataCapture policy executes. Setting it to true frees the memory used by the message.

false

Optional Boolean
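As a minimal sketch of setting <Source> explicitly, the following policy reads a field from the response message instead of relying on the context-sensitive default. It assumes that the policy is attached to a response flow and that the response body is JSON. The policy name DC-From-Response is a placeholder.

```xml
<DataCapture name="DC-From-Response">
    <Capture>
        <DataCollector>dc_data_collector</DataCollector>
        <Collect default="unknown">
            <!-- Parse the response explicitly instead of the default source. -->
            <!-- clearPayload="false" keeps the body available to later policies. -->
            <Source clearPayload="false">response</Source>
            <JSONPayload>
                <JSONPath>$.results[0].currency</JSONPath>
            </JSONPayload>
        </Collect>
    </Capture>
</DataCapture>
```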

<URIPath>

Extracts a value from the proxy.pathsuffix of a request source message. The path applied to the pattern is the proxy.pathsuffix, which does not include the basepath for the API Proxy. If the source message resolves to a message type of response, the element does nothing.

<Collect>
    <URIPath>
        <Pattern ignoreCase="false">/foo/{$}</Pattern>
    </URIPath>
</Collect>

You can use multiple <Pattern> elements:

<URIPath>
   <Pattern ignoreCase="false">/foo/{$}</Pattern>
   <Pattern ignoreCase="false">/foo/bar/{$}</Pattern>
</URIPath>

Default: N/A
Presence: Optional. However, you must include exactly one of the following: the ref attribute of the <Collect> element, <URIPath>, <QueryParam>, <Header>, <FormParam>, <JSONPayload>, or <XMLPayload>.
Type: N/A

Attributes

Attribute Description Default Presence Type
ignoreCase Specifies to ignore case when matching the pattern.

false

Optional Boolean
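To make the role of proxy.pathsuffix concrete, consider a hypothetical proxy whose base path is /v1/shop. For a request to /v1/shop/foo/123, the path suffix is /foo/123, so the pattern below would be expected to capture 123. The base path, the policy name, and the request are assumptions made for this illustration.

```xml
<!-- Assumed base path: /v1/shop. -->
<!-- For GET /v1/shop/foo/123 the proxy.pathsuffix is /foo/123. -->
<DataCapture name="DC-Path-Segment">
    <Capture>
        <DataCollector>dc_3</DataCollector>
        <Collect default="0">
            <URIPath>
                <!-- {$} marks the single value to extract. -->
                <Pattern ignoreCase="false">/foo/{$}</Pattern>
            </URIPath>
        </Collect>
    </Capture>
</DataCapture>
```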

<QueryParam>

Extracts a value from the specified query parameter of a request source message. If the source message resolves to a message type of response, the element does nothing.

<Collect>
    <QueryParam name="code">
        <Pattern ignoreCase="true">{$}</Pattern>
    </QueryParam>
</Collect>

If multiple query parameters have the same name, use indices to reference the parameters:

<Collect>
    <QueryParam name="code.2">
        <Pattern ignoreCase="true">{$}</Pattern>
    </QueryParam>
</Collect>

Note: You must specify a single variable named {$}. There may be multiple unique Pattern elements, but the first matching pattern will resolve for a particular request.

Default: N/A
Presence: Optional. However, you must include exactly one of the following: the ref attribute of the <Collect> element, <URIPath>, <QueryParam>, <Header>, <FormParam>, <JSONPayload>, or <XMLPayload>.
Type: N/A

Attributes

Attribute Description Default Presence Type
name Specifies the name of the query parameter. If multiple query parameters have the same name, use indexed referencing, where the first instance of the query parameter has no index, the second is at index 2, the third at index 3, etc.

N/A

Required String

<Header>

Extracts a value from the specified HTTP header of the specified request or response message. If multiple headers have the same name, their values are stored in an array.

<Collect>
    <Header name="my_header">
        <Pattern ignoreCase="false">Bearer {$}</Pattern>
    </Header>
</Collect>

If multiple headers have the same name, use indices to reference individual headers in the array:

<Collect>
    <Header name="my_header.2">
        <Pattern ignoreCase="true">{$}</Pattern>
    </Header>
</Collect>

Or the following to list all the headers in the array:

<Collect>
    <Header name="my_header.values">
        <Pattern ignoreCase="true">{$}</Pattern>
    </Header>
</Collect>

Default: N/A
Presence: Optional. However, you must include exactly one of the following: the ref attribute of the <Collect> element, <URIPath>, <QueryParam>, <Header>, <FormParam>, <JSONPayload>, or <XMLPayload>.
Type: N/A

Attributes

Attribute Description Default Presence Type
name Specifies the name of the header from which you extract the value. If multiple headers have the same name, use indexed referencing, where the first instance of the header has no index, the second is at index 2, the third at index 3, etc. Use .values to get all headers in the array.

N/A

Required String
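For reference, here is how a <Header> capture might look inside a complete policy. The header name X-Client-Tier and the policy name DC-Client-Tier are hypothetical. Because <Source> is omitted, the default message source applies.

```xml
<DataCapture name="DC-Client-Tier">
    <Capture>
        <DataCollector>dc_data_collector</DataCollector>
        <!-- "unknown" is sent to Analytics when the header is absent. -->
        <Collect default="unknown">
            <!-- X-Client-Tier is a placeholder header name. -->
            <Header name="X-Client-Tier">
                <Pattern ignoreCase="true">{$}</Pattern>
            </Header>
        </Collect>
    </Capture>
</DataCapture>
```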

<FormParam>

Extracts a value from the specified form parameter of the specified request or response message. Form parameters can be extracted only when the Content-Type header of the specified message is application/x-www-form-urlencoded.

<Collect>
    <FormParam name="greeting">
        <Pattern>hello {$}</Pattern>
    </FormParam>
</Collect>

Default: N/A
Presence: Optional. However, you must include exactly one of the following: the ref attribute of the <Collect> element, <URIPath>, <QueryParam>, <Header>, <FormParam>, <JSONPayload>, or <XMLPayload>.
Type: N/A

Attributes

Attribute Description Default Presence Type
name The name of the form parameter from which you extract the value.

N/A

Optional String

<JSONPayload>

Specifies the JSON-formatted message from which the value of the variable will be extracted. JSON extraction is performed only when message's Content-Type header is application/json.

<Collect>
    <JSONPayload>
        <JSONPath>$.results[0].currency</JSONPath>
    </JSONPayload>
</Collect>

Default: N/A
Presence: Optional. However, you must include exactly one of the following: the ref attribute of the <Collect> element, <URIPath>, <QueryParam>, <Header>, <FormParam>, <JSONPayload>, or <XMLPayload>.
Type: N/A

<JSONPayload>/<JSONPath>

Required child element of the <JSONPayload> element. Specifies the JSON path used to extract a value from a JSON-formatted message.

<JSONPath>$.rss.channel.title</JSONPath>

Default: N/A
Presence: Required
Type: String
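As a worked illustration, the sketch below pairs a JSONPath with a hypothetical request body shown in the comment. With that body, the path $.results[0].currency selects the currency field of the first element of the results array. The payload is invented for this example.

```xml
<!-- Hypothetical request body (Content-Type: application/json):
     {"results": [{"currency": "USD", "amount": 1120}]}
     The path below selects the value USD from that body. -->
<Collect default="unknown">
    <Source>request</Source>
    <JSONPayload>
        <JSONPath>$.results[0].currency</JSONPath>
    </JSONPayload>
</Collect>
```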

<XMLPayload>

Specifies the XML-formatted message from which the value of the variable will be extracted. XML payloads are extracted only when the Content-Type header of the message is text/xml, application/xml, or application/*+xml.

<Collect>
    <XMLPayload>
        <Namespaces>
            <Namespace prefix="apigee">http://www.apigee.com</Namespace>
            <Namespace prefix="gmail">http://mail.google.com</Namespace>
        </Namespaces>
        <XPath>/apigee:test/apigee:example</XPath>
    </XMLPayload>
</Collect>

Default: N/A
Presence: Optional. However, you must include exactly one of the following: the ref attribute of the <Collect> element, <URIPath>, <QueryParam>, <Header>, <FormParam>, <JSONPayload>, or <XMLPayload>.
Type: N/A

The following table provides a high-level description of the child elements of <XMLPayload>.

Child Element Required? Description
<Namespaces> Optional Specifies the namespaces to be used in the XPath evaluation.
<XPath> Required Specifies the XPath defined for the variable.

<XMLPayload>/<Namespaces>

Specifies the namespaces to be used in the XPath evaluation. If you're using namespaces in your XPath expressions, you must declare the namespaces here, as shown in the following example.

<Collect>
    <XMLPayload>
        <Namespaces>
            <Namespace prefix="apigee">http://www.apigee.com</Namespace>
            <Namespace prefix="gmail">http://mail.google.com</Namespace>
        </Namespaces>
        <XPath>/apigee:Directions/apigee:route/apigee:leg/apigee:name</XPath>
    </XMLPayload>
</Collect>

If you are not using namespaces in your XPath expressions, you can omit or comment out the <Namespaces> element, as the following example shows:

<Collect>
    <XMLPayload>
        <!-- <Namespaces/> -->
        <XPath>/Directions/route/leg/name</XPath>
    </XMLPayload>
</Collect>

Default: N/A
Presence: Optional
Type: String

Attributes

Attribute Description Default Presence Type
prefix

The namespace prefix.

N/A

Required String

<XMLPayload>/<XPath>

Required child element of the XMLPayload element. Specifies the XPath defined for the variable. Only XPath 1.0 expressions are supported.

   <XPath>/test/example</XPath>

Note: If you use namespaces in your XPath expressions, you must declare the namespaces in the <XMLPayload><Namespaces> section of the policy.

Default: N/A
Presence: Required
Type: String
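The following sketch shows a namespaced XPath next to a hypothetical payload that it would match. The payload in the comment is invented for illustration. The prefix in the policy only needs to map to the same namespace URI that the message uses.

```xml
<!-- Hypothetical payload (Content-Type: application/xml):
     <apigee:test xmlns:apigee="http://www.apigee.com">
         <apigee:example>42</apigee:example>
     </apigee:test>
     The XPath below selects the apigee:example element, whose text is 42. -->
<Collect default="0">
    <XMLPayload>
        <Namespaces>
            <Namespace prefix="apigee">http://www.apigee.com</Namespace>
        </Namespaces>
        <XPath>/apigee:test/apigee:example</XPath>
    </XMLPayload>
</Collect>
```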

Error Reference

Runtime errors

The table below describes runtime errors, which can occur when the policy executes.

Fault code Cause
DataCollectorTypeMismatch

The value to be captured did not match the DataCollector type.

ExtractionFailure The data extraction failed.
UnresolvedVariable The variable does not exist.
VariableCountLimitExceeded The number of captured variables exceeded the variable count limit of 100 variables.
VariableValueLimitExceeded The size of a captured value exceeded the single variable limit of 400 bytes.
MsgContentParsingFailed Message content failed to be parsed into XML or JSON.
InvalidMsgContentType The message content type does not match the expected message content type in the policy's capture clause.
NonMsgVariable The <Source> element value did not reference a message variable.
JSONPathQueryFailed The JSONPath query failed to resolve to a value.
PrivateVariableAccess Attempt to access a private variable failed.
XPathEvaluationFailed XPath failed to resolve to a value.

Runtime errors are returned in two ways:

  • Error response back to client (continueOnError=false)

    When the policy's continueOnError attribute is set to false, errors that occur during the policy execution will abort the message processing and return a descriptive error message. The policy will attempt to capture all the relevant errors in the data capture policy before returning the message.

  • DataCapture errors analytics field

    The dataCapturePolicyErrors field contains a list of all errors that have occurred. An example of how this would appear in the analytics data map is shown below:

    # Example payload
    [
         {
             errorType: TypeMismatch,
             policyName: MyDataCapturePolicy-1,
             dataCollector: purchaseValue
         },
         {
             errorType: MaxValueSizeLimitReached,
             policyName: MyDataCapturePolicy-1,
             dataCollector: purchasedItems
         },
    ]

    This field is subject to the 400 byte variable size limit.
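When continueOnError is false, a fault rule in the proxy endpoint is one possible way to shape the error returned to the client. The fragment below is a sketch under two assumptions: that the fault codes listed above are exposed through the fault.name flow variable, as they are for other policies, and that a policy named AM-Capture-Error (a placeholder) exists to build the response.

```xml
<!-- Hypothetical fragment of a ProxyEndpoint definition. -->
<FaultRules>
    <FaultRule name="data-capture-json-failure">
        <!-- Assumes the fault code is available in the fault.name variable. -->
        <Condition>(fault.name = "JSONPathQueryFailed")</Condition>
        <Step>
            <!-- Placeholder for a policy that builds the error response. -->
            <Name>AM-Capture-Error</Name>
        </Step>
    </FaultRule>
</FaultRules>
```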

Deployment errors

Fault code Cause
DeploymentAssertionError The DataCollector referenced in the policy couldn't be found in the organization during deployment.
JsonPathCompilationFailed Compiling with the specified JSONPath failed.
XPathCompilationFailed If the prefix or the value used in the XPath element is not part of any of the declared namespaces in the policy, then the deployment of the API proxy fails.
PatternCompilationFailed Pattern compilation failed.

Finding DataCapture Errors in the Debug tool

The dataCapturePolicyErrors variable is available in the Debug tool. This is an additional tool that you can use to catch errors without going to Analytics. For example, you can catch an error that occurs if you upgrade your version of the hybrid runtime and inadvertently break the analytics in an already deployed proxy.
