👋Welcome to Prism!
These docs are current for version v0.3.0.
Prism is the easiest way to create data pipelines in Python. With it, users can break down their data flows into modular tasks, manage dependencies, and execute complex computations in sequence.
CHANGELOG
There are significant differences between this version and the previous version (v0.2.8). These include:
⚠️ Deprecations
CLI
The following CLI commands are deprecated:
prism agent [apply | build | run | delete]
prism compile
prism connect
prism create [agent | task | trigger]
prism spark-submit
The following CLI commands remain:
prism init
prism run
prism graph
Manifest
In previous versions, when a project was compiled, Prism would create a manifest.json that contained the project's targets, refs, and tasks.
Starting in v0.3.0, Prism uses a SQLite database to store project, task, target, and run-related information. prism graph still uses a manifest.json when serving the visualizer UI.
tasks and hooks in task arguments
In previous versions, task functions had two required arguments: tasks and hooks. tasks was used to reference the output of other tasks, and hooks was used to access adapters specified in profile.yaml.
Starting in v0.3.0, Prism has replaced both of these with a CurrentRun object. See more information below.
profile.yaml
In previous versions, users could connect to SQL databases with adapters defined in their profile.yaml.
Starting in v0.3.0, Prism uses Connector instances to connect to databases. See more information below.
In addition, we have deprecated the dbt adapter.
triggers.yaml
In previous versions, users could run custom code after a project succeeded or failed using triggers.yaml.
Starting in v0.3.0, Prism uses Callback instances to run custom code based on a project's status. See more information below.
✨ Enhancements
PrismProject entrypoint
In previous versions, users managed their project with a prism_project.py file and optional profile.yaml and triggers.yaml files. The entrypoint to a Prism project was the CLI: users had to run prism run to actually execute their project.
Starting in v0.3.0, the PrismProject class is the recommended entrypoint to a Prism project. This class has the following benefits:
Programmatic access to Prism projects, i.e., Prism projects can be instantiated and run via standard Python.
More fine-grained control over running Prism projects.
Projects as code, not as YAML. Instead of dealing with multiple files, Prism projects can be written as code that lives in a single file.
The PrismProject class has two methods: run and graph. As the names suggest, run is used to execute Prism projects, and graph is used to launch the Prism Visualizer UI.
Here's the PrismProject class in action:
    from pathlib import Path

    project = PrismProject(
        version="1.0",
        tasks_dir=Path.cwd() / "tasks",
        concurrency=2,
        ctx={
            "OUTPUT": Path.cwd() / "output"
        },
    )

    if __name__ == "__main__":
        project.run()
CurrentRun context object
Rather than separate tasks and hooks arguments, a single CurrentRun object stores information about the current project run.
Specific methods include:
    CurrentRun.ctx(key: str, default_value: Any) -> Any  # for grabbing variables from the PrismProject's base context or runtime context
    CurrentRun.conn(self, connector_id: str) -> Connector  # for grabbing a connector class defined in the PrismProject's instantiation
    CurrentRun.ref(self, task_id: str) -> Any  # for grabbing the output of a task
Here's how to use it in a task:
    # example_task.py
    from prism.decorators import task
    from prism.runtime import CurrentRun

    @task(id="example-task-id")
    def example_task():
        other_output = CurrentRun.ref(task_id="other_task_id")
        ...
This is a slightly cleaner user experience (one unified context object instead of two), and it enables users to take advantage of autocomplete functionality in their IDE.
Connectors
Instead of defining adapters in profile.yaml, connectors are defined in Python as instances of the Connector class.
There are six Connector subclasses:
BigQueryConnector
PostgresConnector
PrestoConnector
RedshiftConnector
SnowflakeConnector
TrinoConnector
PrismProject accepts a list of Connector objects via the connectors keyword argument, e.g.,
    snowflake = SnowflakeConnector(
        id="snowflake-connector",
        ...
    )
    project = PrismProject(
        ...,
        connectors=[snowflake],
        ...,
    )
These connector objects can be accessed via CurrentRun, e.g.,
    conn = CurrentRun.conn(connector_id="snowflake-connector")
    conn.execute_sql(...)
Callbacks
Instead of defining custom code to run after a project has succeeded or failed via the triggers.yaml file, users can specify custom code as Python functions.
PrismProject accepts lists of functions via the on_success and on_failure keyword arguments.
Callback functions should not accept any arguments, e.g.,
    def print_success():
        print("Success!")

    project = PrismProject(
        ...,
        on_success=[print_success],
        ...,
    )
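To make the zero-argument callback contract concrete, here is a minimal, self-contained sketch of how a runner could dispatch success and failure callbacks. It does not use Prism itself: run_with_callbacks, ZeroArgCallback, and the job functions are hypothetical names invented for illustration, and how Prism handles exceptions internally is not covered by these docs.

```python
from typing import Callable, List

# Hypothetical type alias: a callback takes no arguments and returns nothing.
ZeroArgCallback = Callable[[], None]


def run_with_callbacks(
    job: Callable[[], None],
    on_success: List[ZeroArgCallback],
    on_failure: List[ZeroArgCallback],
) -> bool:
    """Run job, then call the matching zero-argument callbacks in list order."""
    try:
        job()
    except Exception:
        for callback in on_failure:
            callback()
        return False
    for callback in on_success:
        callback()
    return True


events: List[str] = []


def record_success() -> None:
    events.append("success")


def record_failure() -> None:
    events.append("failure")


def failing_job() -> None:
    raise RuntimeError("task failed")


# A job that succeeds triggers only the on_success list...
ok = run_with_callbacks(lambda: None, [record_success], [record_failure])
# ...and a job that raises triggers only the on_failure list.
failed = run_with_callbacks(failing_job, [record_success], [record_failure])
print(events)
```

Because callbacks receive no arguments, anything they need (a Slack webhook URL, a logger, and so on) has to come from module-level state or a closure.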
🛠️ Other improvements
Prism uses rich to create beautiful logs for the user. This includes logs for events, tasks, and exceptions.
Updated triggers naming convention to callbacks. See above for more details.
Updated adapters naming convention to connectors. See above for more details.
Why use Prism?
Prism was built to streamline the development and deployment of complex data pipelines. Here are some of its main features:
Real-time dependency declaration: With Prism, users can declare dependencies using a simple function call. No need to explicitly keep track of the pipeline order — at runtime, Prism automatically parses the function calls and builds the dependency graph.
Intuitive logging: Prism automatically logs events for parsing the configuration files, compiling the tasks and creating the project, and executing the tasks. No configuration is required.
Flexible CLI: Users can instantiate, run, and visualize projects using a simple but powerful command-line interface.
“Batteries included”: Prism comes with all the essentials needed to get up and running quickly. Users can create and run their first project in less than 2 minutes.
Integrations: Prism integrates with several tools that are popular in the data community, including Snowflake, Google BigQuery, Redshift, Trino, and Presto. We're adding more integrations every day, so let us know what you'd like to see!
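The dependency declaration feature above can be pictured with a small conceptual sketch. This is not Prism's actual implementation: it assumes dependencies are discovered by scanning each task's source for CurrentRun.ref(task_id=...) calls, and it orders the tasks with the standard library's graphlib. The task IDs and task sources are hypothetical.

```python
import ast
from graphlib import TopologicalSorter

# Hypothetical task sources keyed by task ID. In a real project these would be
# modules in the tasks directory.
TASK_SOURCES = {
    "extract": "def run():\n    return [1, 2, 3]\n",
    "transform": (
        "def run():\n"
        "    rows = CurrentRun.ref(task_id='extract')\n"
        "    return [r * 2 for r in rows]\n"
    ),
    "load": (
        "def run():\n"
        "    return sum(CurrentRun.ref(task_id='transform'))\n"
    ),
}


def find_refs(source: str) -> set[str]:
    """Return the task IDs passed as task_id to CurrentRun.ref(...) calls."""
    refs: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_ref_call = (
            isinstance(func, ast.Attribute)
            and func.attr == "ref"
            and isinstance(func.value, ast.Name)
            and func.value.id == "CurrentRun"
        )
        if not is_ref_call:
            continue
        for kw in node.keywords:
            if kw.arg == "task_id" and isinstance(kw.value, ast.Constant):
                refs.add(kw.value.value)
    return refs


def execution_order(sources: dict[str, str]) -> list[str]:
    """Map each task to the tasks it references, then sort topologically."""
    graph = {task_id: find_refs(src) for task_id, src in sources.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(TASK_SOURCES)
print(order)
```

The source is only parsed here, never executed, so the order is known before any task runs. A cycle between tasks would make TopologicalSorter raise graphlib.CycleError.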
What is a Prism project?
The PrismProject class is the entrypoint into all Prism projects. This class allows for fine-grained control of project runs. In order to run a Prism project, two things are needed:
A Python module instantiating the PrismProject class.
A directory containing tasks to run. This should be supplied to the PrismProject instance via the tasks_dir keyword argument.
Note: both of the components listed above are supplied in the default project that is created via prism init!
Here's a simple example of what this could look like:
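As a rough sketch of the two required pieces, the script below writes a minimal layout to a scratch directory: one module that instantiates PrismProject and one tasks directory containing a single task. The file names main.py and example_task.py are illustrative, the prism.client import path is an assumption, and the layout that prism init actually generates may differ. The script only syntax-checks the generated files and never imports Prism.

```python
import tempfile
from pathlib import Path

# Entrypoint module. The PrismProject arguments mirror the example in the
# changelog above; the `prism.client` import path is an assumption.
MAIN_PY = '''\
from pathlib import Path

from prism.client import PrismProject  # assumed import path

project = PrismProject(
    version="1.0",
    tasks_dir=Path.cwd() / "tasks",
    concurrency=2,
    ctx={"OUTPUT": Path.cwd() / "output"},
)

if __name__ == "__main__":
    project.run()
'''

# One task module inside the tasks directory (file name is illustrative).
EXAMPLE_TASK_PY = '''\
from prism.decorators import task
from prism.runtime import CurrentRun


@task(id="example-task-id")
def example_task():
    # Read a value from the project's context, defaulting to None.
    return CurrentRun.ctx("OUTPUT", None)
'''


def scaffold(root: Path) -> list[Path]:
    """Write the two required pieces under root and return their paths."""
    files = {
        root / "main.py": MAIN_PY,
        root / "tasks" / "example_task.py": EXAMPLE_TASK_PY,
    }
    for path, source in files.items():
        compile(source, str(path), "exec")  # syntax check only; nothing is imported
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)
    return sorted(files)


root = Path(tempfile.mkdtemp())
created = scaffold(root)
for path in created:
    print(path.relative_to(root))
```

With Prism installed, running the generated main.py from the project root would execute the project, since tasks_dir is resolved relative to the current working directory.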
Guides: Jump right in
Follow our handy guides to get started on the basics as quickly as possible:
Getting Started
Fundamentals
CLI
API Reference
If you have any feedback about the product or the docs, please let us know!