Prism
v0.2.2

Tasks

In its most basic form, any data pipeline can be thought of as a series of discrete steps that run in some sort of sequence. For example, ETL pipelines generally have three steps: extract --> transform --> load.

Prism projects are no different. A Prism project is composed of a set of tasks, and these tasks contain the brunt of the project's core logic.

What are tasks?

In Prism, tasks can be either classes or functions. Here's what each looks like:

# tasks/hello_world.py (class-based)

import prism.task

class HelloWorld(prism.task.PrismTask):

    def run(self, tasks, hooks):
        test_str = "Hello, world!"
        return test_str

# tasks/hello_world.py (function-based)

from prism.decorators import task

@task()
def hello_world(tasks, hooks):
    test_str = "Hello, world!"
    return test_str

We'll go into the technical details of both next.

Class-based tasks

Class-based tasks are classes that inherit from an abstract class called PrismTask. All such tasks must meet two requirements:

  1. Each task must have a method called run. This method should contain all the business logic for the task, and it should return a non-null output.

  2. Tasks must live in a *.py file in the tasks directory.

Important: the output of a task's run function is what's used by downstream tasks in your pipeline. The return value can be anything – a Pandas or Spark DataFrame, a Numpy array, a string, a dictionary, whatever – but it cannot be null. Prism will throw an error if it is.

Apart from these two conditions, feel free to structure and define your tasks however you'd like, e.g., by adding other class methods or class attributes:

# tasks/hello_world.py

from prism.task import PrismTask

class HelloWorld(PrismTask):

    def some_other_function(self, *args, **kwargs):
        # do something
        pass

    def run(self, tasks, hooks):
        test_str = "Hello, world!"
        # helper methods are called through self
        _ = self.some_other_function()
        return test_str

As you can see, our HelloWorld task lives in the tasks directory. It inherits from the PrismTask class, and it contains a run method that returns a non-null string.

And that's it! Create a class that inherits the PrismTask class and implement the run method. Prism will take care of the rest.
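The requirements above can be sanity-checked locally before running a project. The sketch below is plain Python and does not import Prism: `FakeTask` and `check_task` are hypothetical names invented for illustration, and Prism's own validation may work differently.

```python
import inspect


class FakeTask:
    """Hypothetical stand-in for prism.task.PrismTask (not the real class)."""


class HelloWorld(FakeTask):
    def run(self, tasks, hooks):
        return "Hello, world!"


class Broken(FakeTask):
    def run(self, tasks, hooks):
        # Forgets to return a value, so the output is None.
        print("side effect only")


def check_task(task_cls):
    """Return a list of problems found with a class-based task."""
    run = getattr(task_cls, "run", None)
    if not callable(run):
        return ["missing a run method"]

    # Parameters after `self` should be `tasks` and `hooks`.
    params = list(inspect.signature(run).parameters)[1:]
    if params != ["tasks", "hooks"]:
        return [f"run takes {params}, expected ['tasks', 'hooks']"]

    # Call run with placeholder arguments and reject a None result.
    if task_cls().run(tasks=None, hooks=None) is None:
        return ["run returned None"]
    return []
```

Calling `check_task(HelloWorld)` yields an empty list, while `check_task(Broken)` reports the missing return value. A check like this is only a convenience; the authoritative behavior is whatever Prism enforces at run time.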

Good to know: Although user-defined tasks can be arbitrarily long or complex, it is helpful to think of them as discrete steps or objectives in your pipeline. For example, if you are creating an ETL pipeline, then you may want to split your code into three tasks: an extract task, a transform task, and a load task.
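As a rough sketch of that split, the three steps below are written as plain functions that take the tasks and hooks parameters. In a Prism project each would live in its own file under the tasks directory and carry the @task() decorator; here they are chained by hand so the snippet runs without Prism. `StubTasks`, its `record` method, and its string-keyed `ref` lookup are assumptions made for illustration, so check the tasks.ref() documentation for the real call.

```python
class StubTasks:
    """Hypothetical stand-in for the `tasks` argument; not Prism's TaskManager."""

    def __init__(self):
        self._outputs = {}

    def record(self, name, value):
        self._outputs[name] = value

    def ref(self, name):
        # Assumed lookup by name; the real tasks.ref() may take a different key.
        return self._outputs[name]


def extract(tasks, hooks):
    # Pretend these rows came from a source system.
    return [{"name": "a", "value": "1"}, {"name": "b", "value": "2"}]


def transform(tasks, hooks):
    rows = tasks.ref("extract")
    return [{"name": r["name"], "value": int(r["value"]) * 10} for r in rows]


def load(tasks, hooks):
    rows = tasks.ref("transform")
    # A real load step would write somewhere; returning a summary keeps the
    # output non-null.
    return {"rows_loaded": len(rows), "total": sum(r["value"] for r in rows)}


def run_pipeline():
    """Run the three steps in order, storing each output for the next step."""
    tasks = StubTasks()
    result = None
    for step in (extract, transform, load):
        result = step(tasks, hooks=None)
        tasks.record(step.__name__, result)
    return result
```

Each step returns a non-null value, so the next step always has something to read. In a real project Prism, not a hand-written loop, decides the execution order.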

Function-based tasks

You can also define tasks using functions rather than entire classes. There's no real difference between a function-based task and a class-based task; we created the feature so that you can work with whichever style you're most comfortable with.

In order for a function to be a task, it must:

  1. Be decorated with the prism.decorators.task function

As with class-based tasks, the function must return a non-null output and must live in a *.py file in the tasks directory.

Let's take a look at our original example:

# tasks/hello_world.py

from prism.decorators import task

@task()
def hello_world(tasks, hooks):
    test_str = "Hello, world!"
    return test_str 
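One way to see why the two styles are interchangeable is to look at what a decorator like @task() could do. The sketch below is not Prism's implementation: `sketch_task`, `SketchTask`, and `FunctionTask` are hypothetical names, and the snippet only shows the general decorator-factory pattern, which is also why the decorator is written with parentheses.

```python
class SketchTask:
    """Hypothetical base class standing in for a class-based task."""

    def run(self, tasks, hooks):
        raise NotImplementedError


def sketch_task():
    """Decorator factory: calling it returns the actual decorator."""

    def decorator(func):
        # Wrap the function in a class whose run method delegates to it.
        class FunctionTask(SketchTask):
            def run(self, tasks, hooks):
                return func(tasks, hooks)

        FunctionTask.__name__ = func.__name__
        return FunctionTask

    return decorator


@sketch_task()
def hello_world(tasks, hooks):
    test_str = "Hello, world!"
    return test_str
```

After decoration, `hello_world` is a class with a run method, so `hello_world().run(None, None)` returns the same string the function body produces.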

Critical: The run function of a class-based task has two mandatory parameters: tasks and hooks. Both are critical, and Prism will throw an error if it finds a run function without these two parameters.

For additional information on class-based tasks, consult the API reference.

Function-based tasks must likewise take two positional arguments: tasks and hooks. Both are critical, and Prism will throw an error if it finds a task function without these two parameters.

The technical specifications for the @task decorator can be found in the API reference.