Targets

There are two fundamental building blocks to a Prism project: tasks and targets.

What are targets?

Targets are the second fundamental building block of a Prism project.

Targets enable you to cache the results of your tasks. Put differently, targets are used to store the results of your task at an external location (e.g., a CSV on your local machine, a table in your data warehouse, a file in an S3 bucket, and so on). In doing so, they prevent repetitive and costly task re-runs.

Why use targets?

There are three reasons to use targets:

  1. Encapsulated saving behavior - many projects require tasks to save data in a similar format to the same location (e.g., a database, an S3 bucket, etc.). Using targets reduces duplicated code, because it encapsulates common saving behavior in a function for reuse across different tasks.

  2. Cached outputs - for example, say you have a project with two tasks, long and short, where short depends on the output of long. Based on prior runs, you know that long takes 10 minutes to execute. If you don't want to re-run long every time you update short, you can specify a target for long that saves its output to an external location for easy access.

  3. Isolated task runs - when you use targets, you can run downstream tasks without always having to re-run their upstream dependencies. For example, suppose you save a large CSV in taskA that is then further processed by taskB, and suppose you are debugging taskB. Since taskA has a target, Prism allows you to run only taskB; it automatically parses taskA's target definition and provides the saved output to taskB.

These benefits may not sound like much, but they make life easier, especially as your projects grow in size and complexity.
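
The caching idea behind reasons 2 and 3 can be illustrated in plain Python. The sketch below does not use Prism's API: run_long, run_short, load_long_output, and CACHE_PATH are hypothetical names, and the file-existence check is a simplification of whatever logic Prism applies when it reads a target.

```python
import tempfile
from pathlib import Path

# Hypothetical location standing in for a target's `loc`.
CACHE_PATH = Path(tempfile.gettempdir()) / "prism_sketch_long_output.txt"

# Counter used only to show how often the expensive task actually runs.
calls = {"long": 0}


def run_long() -> str:
    """Stand-in for an expensive task; saves its output to CACHE_PATH."""
    calls["long"] += 1
    result = "expensive result"
    CACHE_PATH.write_text(result)
    return result


def load_long_output() -> str:
    """Reuse the saved output if it exists; otherwise run the task."""
    if CACHE_PATH.exists():
        return CACHE_PATH.read_text()
    return run_long()


def run_short() -> str:
    """Downstream task that only needs the output of `long`."""
    return load_long_output().upper()


CACHE_PATH.unlink(missing_ok=True)  # start from a clean slate (Python 3.8+)
first = run_short()   # runs `long` once and saves its output
second = run_short()  # reads the saved output; `long` is not re-run
```

After both calls, calls["long"] is 1: the second run of the downstream task was served entirely from the saved file.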

How do you use targets?

To specify a target for a task, use the prism.decorators.target decorator. This decorator takes two required keyword arguments:

  • type: a valid PrismTarget instance. This specifies how the object should be saved (e.g., as a .txt file, as a .csv file, etc.).

  • loc: a string or pathlib.Path object denoting where the object should be saved

In addition, you can pass extra keyword arguments to customize the target's saving behavior (e.g., removing the index from CSVs when saving a Pandas DataFrame).

Incorporating targets into tasks

For class-based tasks, you directly decorate the run function. For function-based tasks, you place the target decorator call inside the targets keyword argument of the task decorator.

Here's what that looks like for a class-based task:

# tasks/hello_world.py

import prism.task
import prism.target
import prism.decorators

class HelloWorld(prism.task.PrismTask):
    
    @prism.decorators.target(
        type=prism.target.Txt, 
        loc="/Users/hello_world.txt", 
        # optional: extra keyword arguments that customize saving go here
    )
    def run(self):
        test_str = "Hello, world!"
        return test_str

Note that, even though a target is used, the return value of a downstream CurrentRun.ref() call will still be the string "Hello, world!":

# tasks/second_task.py

import prism.task
import prism.target
import prism.decorators
from prism.runtime import CurrentRun

class SecondTask(prism.task.PrismTask):

    def run(self):
        hello_world_str = CurrentRun.ref("hello_world.HelloWorld")  # returns "Hello, world!"
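
One way to picture why the downstream task still receives the string: a decorator can save the return value as a side effect and then hand the same object back. The sketch below is a simplified stand-in, not Prism's implementation. sketch_target and SketchTxt are hypothetical names, and forwarding the extra keyword arguments to save is an assumption based on the description of the decorator above.

```python
import functools
import os
import tempfile


class SketchTxt:
    """Hypothetical stand-in for a text target: holds an object and a location."""

    def __init__(self, obj, loc):
        self.obj = obj
        self.loc = loc

    def save(self, **kwargs):
        # In this sketch, extra keyword arguments go to the built-in open()
        # (for example, encoding="utf-8").
        with open(self.loc, "w", **kwargs) as f:
            f.write(self.obj)


def sketch_target(type, loc, **kwargs):
    """Hypothetical decorator: save the task's output, then return it unchanged."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **func_kwargs):
            obj = func(*args, **func_kwargs)
            type(obj, loc).save(**kwargs)  # side effect: write the output to `loc`
            return obj                     # callers still get the original object

        return wrapper

    return decorator


out_path = os.path.join(tempfile.gettempdir(), "prism_sketch_hello_world.txt")


@sketch_target(type=SketchTxt, loc=out_path, encoding="utf-8")
def run():
    return "Hello, world!"


result = run()  # the string is returned and also written to out_path
```

The parameter is named type only to mirror the decorator's keyword argument; it shadows the Python built-in inside sketch_target, which is harmless here.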

What kinds of targets are available?

Several basic targets are available out of the box, including Txt, NumpyTxt, PandasCsv, and JSON. We're always looking to add targets and improve Prism's functionality, so please let us know if there's a target you want us to include in the next update!

If the pre-defined targets are not sufficient for your use case, you can define your own PrismTarget class. These classes are simple. They have two attributes: obj (the output to save) and loc (the location to save the output). They also have two methods:

  • save: method that specifies how obj should be saved to loc.

  • open: class method that specifies how a task should open the contents of this target.

For reference, here is the full code for the prism.target.Txt class:

class Txt(PrismTarget):

    def save(self, **kwargs):
        with open(self.loc, "w") as f:
            f.write(self.obj, **kwargs)
        f.close()
    
    @classmethod
    def open(cls, loc, hooks):
        with open(loc, 'r') as f:
            obj = f.read()
        return cls(obj, loc, hooks)
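
Following the same pattern, a custom target might look like the sketch below. It is illustrative only: JsonLines is a hypothetical target, and the small PrismTarget class is a stand-in written so the example runs on its own, with its (obj, loc, hooks) constructor inferred from the Txt.open code above. In a real project the class would subclass Prism's own PrismTarget instead.

```python
import json
import os
import tempfile


class PrismTarget:
    """Stand-in for Prism's base class, included only so this sketch runs alone.

    The (obj, loc, hooks) constructor is an assumption inferred from Txt.open.
    """

    def __init__(self, obj, loc, hooks=None):
        self.obj = obj
        self.loc = loc
        self.hooks = hooks


class JsonLines(PrismTarget):
    """Hypothetical custom target: stores a list of records, one JSON document per line."""

    def save(self, **kwargs):
        # Extra keyword arguments are forwarded to json.dumps (e.g. sort_keys=True).
        with open(self.loc, "w") as f:
            for record in self.obj:
                f.write(json.dumps(record, **kwargs) + "\n")

    @classmethod
    def open(cls, loc, hooks):
        # Inside the method body, `open` still refers to the built-in function.
        with open(loc, "r") as f:
            obj = [json.loads(line) for line in f if line.strip()]
        return cls(obj, loc, hooks)


records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
loc = os.path.join(tempfile.gettempdir(), "prism_sketch_records.jsonl")

JsonLines(records, loc).save(sort_keys=True)  # write the records to disk
reloaded = JsonLines.open(loc, hooks=None)    # read them back into a new target
```

The save and open methods are deliberately symmetric: whatever save writes, open must be able to read back into an equivalent obj.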

If you want to save data in your database using a target, you can access connectors in your targets via the CurrentRun object.

Consult the API reference to see the full implementation for all available targets.
