Prism is the easiest way to create data pipelines in Python. With it, users can break down their data flows into modular tasks, manage dependencies, and execute complex computations in sequence.
CHANGELOG
There are significant differences between this version (v0.3.0) and the previous version (v0.2.8). These include:
Deprecations
CLI
The following CLI commands are deprecated:
prism agent [apply | build | run | delete]
prism compile
prism connect
prism create [agent | task | trigger]
prism spark-submit
The following CLI commands remain:
prism init
prism run
prism graph
Manifest
In previous versions, when a project was compiled, Prism would create a manifest.json that contained the project's targets, refs, and tasks.
Starting in v0.3.0, Prism uses a SQLite database to store project, task, target, and run-related information.
prism graph still uses a manifest.json when serving the visualizer UI.
tasks and hooks in task arguments
In previous versions, task functions had two required arguments: tasks and hooks. tasks was used to reference the output of other tasks, and hooks was used to access the adapters specified in profile.yaml.
Starting in v0.3.0, Prism has replaced both of these with a CurrentRun object. See more information below.
profile.yaml
In previous versions, users could connect to SQL databases with adapters defined in their profile.yaml.
Starting in v0.3.0, Prism uses Connector instances to connect to databases. See more information below.
In addition, we have deprecated the dbt adapter.
triggers.yaml
In previous versions, users could run custom code after a project succeeded or failed using triggers.yaml.
Starting in v0.3.0, Prism uses Callback instances to run custom code based on a project's status. See more information below.
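As a rough illustration of the pattern, a callback wraps a plain Python function that should run when the project finishes. The import path and the Callback constructor shown below are assumptions, not the documented API:

```python
# Sketch only: the import path and the Callback constructor signature are
# assumptions; consult the v0.3.0 documentation for the exact API.
from prism.callbacks import Callback


def notify_on_failure():
    # Any custom Python can run here, e.g. posting to Slack or sending an email.
    print("Prism project run failed!")


# The resulting Callback instance is then registered on the PrismProject
# (see the PrismProject examples below).
on_failure_callback = Callback(notify_on_failure)
```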
✨ Enhancements
PrismProject entrypoint
In previous versions, users managed their project with a prism_project.py file and optional profile.yaml and triggers.yaml files. The entrypoint to a Prism project was the CLI — users had to run prism run to actually execute their project.
Starting in v0.3.0, the PrismProject class is the recommended entrypoint to a Prism project. This class has the following benefits:
Programmatic access to Prism projects, i.e., Prism projects can be instantiated and run via standard Python
The PrismProject class provides more fine-grained control over running Prism projects.
Projects as code, not as YAML. Instead of dealing with multiple files, Prism projects can be written as code that lives in a single file.
The PrismProject class has two methods: run and graph. As the names suggest, run is used to execute Prism projects, and graph is used to launch the Prism Visualizer UI.
CurrentRun
Rather than the tasks and hooks arguments, the CurrentRun object stores information about the current project run.
Specific methods include:
CurrentRun.ctx(key: str, default_value: Any) -> Any # for grabbing variables from the PrismProject's base context or runtime context
CurrentRun.conn(connector_id: str) -> Connector # for grabbing a connector defined in the PrismProject's instantiation
CurrentRun.ref(task_id: str) -> Any # for grabbing the output of a task
This is a slightly cleaner user experience (one unified context object instead of two), and it enables users to take advantage of autocomplete functionality in their IDE.
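For example, a task body might use CurrentRun like the sketch below. The context key, connector ID, and upstream task ID are illustrative placeholders, and the exact way a task function is declared (decorator or base class) may differ from what is shown here.

```python
# Sketch of a task that consumes the CurrentRun object in place of the old
# tasks/hooks arguments. All IDs and keys below are illustrative placeholders.
def load_customers(current_run):
    # Grab a variable from the project's base or runtime context.
    schema = current_run.ctx("schema", default_value="analytics")

    # Grab a connector defined in the PrismProject's instantiation.
    warehouse = current_run.conn("snowflake-prod")

    # Grab the output of an upstream task.
    raw_customers = current_run.ref("extract_customers")

    # ...transform raw_customers and load it via the connector...
    return {"schema": schema, "rows": len(raw_customers)}
```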
Connectors
Instead of defining adapters in profile.yaml, users now define connectors in Python as instances of the Connector class.
There are six Connector subclasses:
BigQueryConnector
PostgresConnector
PrestoConnector
RedshiftConnector
SnowflakeConnector
TrinoConnector
PrismProject accepts a list of Connector objects via the connectors keyword argument.
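A minimal sketch of what this could look like is below. The import paths, the id keyword, and the connector constructor parameters are assumptions for illustration; only the Connector class names, the connectors keyword, and the tasks_dir keyword come from the notes in this document.

```python
# Sketch only: import paths, the id keyword, and the connector constructor
# parameters are assumptions; connectors= and tasks_dir= are documented above.
from prism.connectors import PostgresConnector, SnowflakeConnector
from prism.project import PrismProject

# Connector IDs are what tasks later pass to CurrentRun.conn(...).
snowflake = SnowflakeConnector(
    id="snowflake-prod",
    # account=..., user=..., password=..., etc. -- exact parameter names
    # are assumptions; see the connector documentation.
)
postgres = PostgresConnector(
    id="postgres-dev",
    # host=..., port=..., database=..., etc.
)

project = PrismProject(
    id="my-project",                   # assumed keyword
    connectors=[snowflake, postgres],  # list of Connector objects
    tasks_dir="tasks/",                # directory containing the task modules
)
```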
Prism uses rich to create beautiful logs for the user. This includes logs for events, tasks, and exceptions.
Renamed triggers to callbacks. See above for more details.
Renamed adapters to connectors. See above for more details.
Why use Prism?
Prism was built to streamline the development and deployment of complex data pipelines. Here are some of its main features:
Real-time dependency declaration: With Prism, users can declare dependencies using a simple function call. No need to explicitly keep track of the pipeline order — at runtime, Prism automatically parses the function calls and builds the dependency graph.
Intuitive logging: Prism automatically logs events for parsing the configuration files, compiling the tasks and creating the project, and executing the tasks. No configuration is required.
Flexible CLI: Users can instantiate, run, and visualize projects using a simple, but powerful command-line interface.
“Batteries included”: Prism comes with all the essentials needed to get up and running quickly. Users can create and run their first project in less than 2 minutes.
Integrations: Prism integrates with several tools that are popular in the data community, including Snowflake, Google BigQuery, Redshift, Trino, and Presto. We're adding more integrations every day, so let us know what you'd like to see!
What is a Prism project?
The PrismProject class is the entrypoint into all Prism projects. This class allows for fine-grained control of project runs. In order to run a Prism project, two things are needed:
A Python module instantiating the PrismProject class.
A directory containing tasks to run. This should be supplied to the PrismProject instance via the tasks_dir keyword argument.
Note: both of the components listed above are supplied in the default project that is created via prism init!
Here's a simple example of what this could look like:
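The module below is a sketch: the import path and the id keyword are illustrative assumptions, while tasks_dir and the run and graph methods are described above.

```python
# main.py -- sketch of a module that instantiates and runs a Prism project.
# The import path and the id keyword are assumptions; tasks_dir and the
# run()/graph() methods are described above.
from prism.project import PrismProject

project = PrismProject(
    id="hello-prism",       # assumed keyword
    tasks_dir="tasks/",     # directory containing the tasks to run
)

if __name__ == "__main__":
    project.run()           # execute the project's tasks
    # project.graph()       # optionally, launch the Prism Visualizer UI
```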