In streaming analytics applications, it is common to enrich data with additional information that might be useful for further analysis. For example, if you have the storeId for a transaction, you might want to add information about the store location. You would typically add this additional information by taking an element and "denormalizing" it by bringing in information from a lookup table.
For lookup tables that are both slowly changing and smaller in size,
it works well to bring the table into the pipeline as a singleton class that
Map<K,V> interface. This lets you avoid having
each element do an API call for its lookup. Once you include a copy of a table
in the pipeline, you need to update it periodically to keep it fresh.
We recommend using the Apache Beam Side input patterns to handle slow updating side inputs.
Side inputs are loaded in memory and are therefore cached automatically.
You can set the size of the cache by using the
You can share the cache across
DoFn instances and use external triggers to refresh the cache.