Targets
There are two fundamental building blocks to a Prism project: tasks and targets.
What are targets?
Targets are the second fundamental building block of Prism projects.
Targets let you cache the results of your tasks. Put differently, a target stores a task's output at an external location (e.g., a CSV file on your local machine, a table in your data warehouse, or a file in an S3 bucket). In doing so, targets prevent repetitive and costly task re-runs.
For example, let's say you have a project with two tasks, long and short, and that short depends on the output of long. Based on prior runs, we know that long takes 10 minutes to execute. If we don't want to re-run long every time we make updates to short, we can specify a target for long that saves its output to an external location for easy access.
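The caching idea behind this example can be sketched in plain Python. This is not how Prism implements targets; it is a simplified illustration that assumes a file-based target. The names run_long, long_with_target, and run_short are hypothetical stand-ins for the long and short tasks, and a call counter takes the place of the 10-minute runtime.

```python
import tempfile
from pathlib import Path

# Counts how often the expensive task actually executes.
CALLS = {"long": 0}


def run_long() -> str:
    """Hypothetical stand-in for the slow `long` task."""
    CALLS["long"] += 1
    return "result of long"


def long_with_target(loc: Path) -> Path:
    """Execute `long` only when its target file is missing; always return the location."""
    if not loc.exists():
        loc.write_text(run_long())
    return loc


def run_short(long_loc: Path) -> str:
    """Hypothetical `short` task: loads long's cached output from the target location."""
    return long_loc.read_text().upper()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target_loc = Path(tmp) / "long.txt"
        for _ in range(3):  # iterate on `short` three times
            print(run_short(long_with_target(target_loc)))
        print("long executed", CALLS["long"], "time(s)")  # long executed 1 time(s)
```

Only the first iteration pays for long; later iterations read the saved file.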
How do you use targets?
To specify a target for a task, apply the prism.decorators.target decorator to the task's run function.
The decorator takes two arguments, type and loc. The type parameter should be a PrismTarget class; it controls how the output is saved (e.g., as a .txt file or as a Parquet file). The loc parameter should be a string or a pathlib.Path; it controls where the output is saved.
You can also pass additional keyword arguments to customize how the target saves its output (e.g., dropping the index when saving a pandas DataFrame as a CSV).
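As a rough illustration of keyword-argument pass-through, the sketch below uses a hypothetical CsvRowsTarget class (not part of Prism) that forwards any extra keyword arguments to Python's csv.writer. It assumes, for illustration only, that a target stores these options and applies them inside save; the pandas case from the text (dropping the index) would work analogously.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, List, Union


class CsvRowsTarget:
    """Hypothetical CSV target (not part of Prism).

    Extra keyword arguments are forwarded to csv.writer, analogous to passing
    an option such as index=False through to a pandas CSV writer.
    """

    def __init__(self, obj: List[List[Any]], loc: Union[str, Path], **kwargs: Any) -> None:
        self.obj = obj        # the output to save
        self.loc = Path(loc)  # where to save it
        self.kwargs = kwargs  # saving options, passed straight to csv.writer

    def save(self) -> Path:
        with self.loc.open("w", newline="") as f:
            csv.writer(f, **self.kwargs).writerows(self.obj)
        return self.loc


if __name__ == "__main__":
    rows = [["name", "score"], ["a", 1], ["b", 2]]
    with tempfile.TemporaryDirectory() as tmp:
        path = CsvRowsTarget(rows, Path(tmp) / "scores.csv", delimiter=";").save()
        print(path.read_text())  # rows separated by ";" instead of ","
```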
Critical: specifying a target changes the output of the task. Usually, a task's output is some kind of object (e.g., a DataFrame). A target instead makes the output the location where that object is stored. Put differently, the output of the task becomes the loc parameter of the prism.decorators.target decorator.
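To make this behavior concrete without depending on Prism itself, here is a minimal mimic of a target decorator. Everything in it is hypothetical: target, TxtTarget, and HelloWorld are stand-ins rather than prism.decorators.target, prism.target.Txt, or a real Prism task, and run takes no arguments for simplicity. It shows the two effects described above: the return value of run is saved to loc, and the decorated function returns loc instead.

```python
import functools
import tempfile
from pathlib import Path
from typing import Any, Union


class TxtTarget:
    """Hypothetical text target: writes str(obj) to loc."""

    def __init__(self, obj: Any, loc: Union[str, Path], **kwargs: Any) -> None:
        self.obj = obj
        self.loc = Path(loc)
        self.kwargs = kwargs  # extra saving options (unused by this simple target)

    def save(self) -> None:
        self.loc.write_text(str(self.obj))


def target(type, loc, **kwargs):
    """Hypothetical decorator mimic; `type` is named to mirror the argument in the text."""
    def decorator(run):
        @functools.wraps(run)
        def wrapper(*args, **run_kwargs):
            obj = run(*args, **run_kwargs)   # the object the task computes
            type(obj, loc, **kwargs).save()  # persist it with the target class
            return loc                       # the task's output is now the location
        return wrapper
    return decorator


# Stand-in for "/Users/hello_world.txt" so the example runs on any machine.
OUT = Path(tempfile.mkdtemp()) / "hello_world.txt"


class HelloWorld:
    """Hypothetical stand-in for a Prism task."""

    @target(type=TxtTarget, loc=OUT)
    def run(self):
        return "Hello, world!"


if __name__ == "__main__":
    output = HelloWorld().run()
    print(output == OUT)    # True: the output is the location, not the string
    print(OUT.read_text())  # Hello, world!
```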
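For example, if the run function of a task HelloWorld is decorated with a target whose loc is "/Users/hello_world.txt", then the output of HelloWorld is the path "/Users/hello_world.txt" rather than the object run returns. Downstream tasks see this path as well: tasks.ref(...) calls that reference HelloWorld return the path, so a downstream task must read the file at that path to get the saved contents.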
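A downstream task therefore loads the file itself. The sketch below assumes a hypothetical Tasks class whose ref method returns an upstream task's output by name; the key "hello_world" is illustrative and does not reflect the argument format of the real tasks.ref(...).

```python
import tempfile
from pathlib import Path
from typing import Any, Dict


class Tasks:
    """Hypothetical stand-in for the `tasks` object: ref() returns an upstream output."""

    def __init__(self, outputs: Dict[str, Any]) -> None:
        self._outputs = outputs

    def ref(self, name: str) -> Any:
        return self._outputs[name]


def downstream_run(tasks: Tasks) -> str:
    # HelloWorld has a target, so ref() yields a path rather than the string itself.
    hello_path = Path(tasks.ref("hello_world"))
    return hello_path.read_text().upper()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        loc = Path(tmp) / "hello_world.txt"
        loc.write_text("Hello, world!")          # what HelloWorld's target saved
        tasks = Tasks({"hello_world": str(loc)})  # HelloWorld's output is the path
        print(downstream_run(tasks))             # HELLO, WORLD!
```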
What kinds of targets are available?
There are several targets available out of the box. These include Txt, NumpyTxt, PandasCsv, and PysparkParquet. We're always looking to add targets and improve Prism's functionality, so please let us know if there's a target you want us to include in the next update!
If the predefined targets are not sufficient for your use case, you can define your own PrismTarget class. These classes are simple. They have three attributes: obj (the output to save), loc (the location to save the output to), and hooks (see the hooks documentation for more information). They also have one method, save, which specifies how obj should be saved to loc.
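A custom target following this structure might look like the sketch below. PrismTargetSketch is a hypothetical stand-in for the real PrismTarget base class, whose constructor and save signature may differ; JsonTarget shows the part you supply, a save method that writes obj to loc. Passing keyword arguments to save is an assumption made for this example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Union


class PrismTargetSketch:
    """Hypothetical stand-in for a PrismTarget base class (the real one may differ)."""

    def __init__(self, obj: Any, loc: Union[str, Path], hooks: Any = None) -> None:
        self.obj = obj      # the output to save
        self.loc = loc      # where to save it
        self.hooks = hooks  # hooks object; unused in this sketch

    def save(self, **kwargs: Any) -> None:
        raise NotImplementedError("subclasses define how obj is saved to loc")


class JsonTarget(PrismTargetSketch):
    """Example custom target: serializes obj to loc as JSON."""

    def save(self, **kwargs: Any) -> None:
        # Extra keyword arguments (e.g., indent=2) are passed through to json.dump.
        with open(self.loc, "w") as f:
            json.dump(self.obj, f, **kwargs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        loc = Path(tmp) / "summary.json"
        JsonTarget({"rows": 3}, loc).save(indent=2)
        print(loc.read_text())
```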
For reference, the built-in prism.target.Txt class follows this same pattern: its save method writes obj to loc as a .txt file.
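The actual Txt source is not reproduced here. As a hedged approximation only, a Txt-style target built from the attributes described above could look like this; TxtSketch is a hypothetical name, and details such as converting obj with str() and creating missing parent directories are assumptions, not documented Prism behavior.

```python
import tempfile
from pathlib import Path
from typing import Any, Union


class TxtSketch:
    """Hypothetical approximation of a Txt-style target, NOT the prism.target.Txt source.

    obj: the task output (converted to a string before writing)
    loc: destination path (str or pathlib.Path)
    hooks: kept for parity with the attributes described above; unused here
    """

    def __init__(self, obj: Any, loc: Union[str, Path], hooks: Any = None) -> None:
        self.obj = obj
        self.loc = loc
        self.hooks = hooks

    def save(self) -> None:
        path = Path(self.loc)
        path.parent.mkdir(parents=True, exist_ok=True)  # assumption: create missing folders
        path.write_text(str(self.obj))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        loc = Path(tmp) / "out" / "hello.txt"
        TxtSketch("Hello, world!", str(loc)).save()
        print(loc.read_text())  # Hello, world!
```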