The PrismProject class is the entrypoint into all Prism projects.
Overview
Starting in v0.3.0, users can use the PrismProject class to instantiate, manage, and run their Prism projects. It lives in the prism.client module. Here's the class definition for a PrismProject:
Most of these arguments are fairly self-explanatory. However, it's worth spending a bit of extra time on package_lookups.
Accessing other modules via package_lookups
package_lookups allows you to modify your sys.path at runtime and access modules and functions in other directories. Note that sys.path is only temporarily modified during the course of your project run. When your project run finishes, extraneous items are deleted from your sys.path.
This usually isn't necessary if you're running an isolated project. However, this changes if you have an entire repository of Prism projects and are collaborating with other folks. Common functionality (e.g., helper or utility functions) used across projects may live outside any one project, so you'll need to ensure your project has access to those modules via package_lookups.
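The temporary sys.path behavior described above can be pictured with a small context manager. This is only an illustrative sketch in plain Python, not Prism's actual implementation; the helper name `temporary_sys_path` is hypothetical.

```python
import sys
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_sys_path(lookups):
    """Add each directory in `lookups` to sys.path, then remove it on exit.

    Hypothetical helper for illustration; not part of the Prism API.
    """
    added = []
    for lookup in lookups:
        entry = str(Path(lookup).resolve())
        if entry not in sys.path:
            sys.path.insert(0, entry)
            added.append(entry)
    try:
        yield
    finally:
        # Remove only the entries this helper added, leaving the rest untouched.
        for entry in added:
            if entry in sys.path:
                sys.path.remove(entry)
```

Inside a `with temporary_sys_path([...]):` block, modules in the listed directories can be imported; once the block exits, sys.path is back to its previous contents.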
For example, suppose your project directory looks like this:
By default, each project's tasks directory and its parent are added to package_lookups. That means that project1 can access utils1.py and project2 can access utils2.py. However, the parent directory (i.e., all_prism_projects/) is not added to the project's sys.path by default, so the modules in common are not accessible to either project.
Let's see how we can remedy this with package_lookups:
Now, a task in project1 can import common/ at runtime:
    # all_prism_projects/project1/tasks/example_task.py
    import common.utils  # this is OK since all_prism_projects/ is in package_lookups
    import prism.task

    class ExampleTask(prism.task.PrismTask):
        ...
Important: we recommend specifying paths as relative paths rather than absolute paths (i.e., hard-coded string paths). This enables reproducibility across different machines.
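One way to follow this recommendation is to derive lookup paths from the location of the project module rather than typing out a machine-specific string. The sketch below uses only pathlib; the helper name `sibling_of_project` is an illustrative assumption and not part of Prism.

```python
from pathlib import Path


def sibling_of_project(project_file, name):
    """Return the directory `name` that sits next to the project directory.

    `project_file` is the path of the module that defines the PrismProject
    (typically `__file__`). Hypothetical helper for illustration only.
    """
    project_dir = Path(project_file).resolve().parent
    return project_dir.parent / name


# For a module inside all_prism_projects/project1/, calling
# sibling_of_project(__file__, "common") points at all_prism_projects/common
# on whichever machine the code runs.
```

Because the path is computed from the module's own location, the same code works regardless of where the repository is checked out.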
Examples
Basic Example
    from prism.client import PrismProject

    project = PrismProject()

    if __name__ == "__main__":
        project.run()
In this basic example, we instantiate a PrismProject with all the default arguments. Prism will assume that tasks live in a directory called tasks in the same folder as this module, and it will execute these tasks in a single thread when you run this module.
Project with on_success and on_failure callbacks
    from prism.client import PrismProject

    def print_success():
        print("Success!")

    project = PrismProject(
        on_success=[print_success],
    )

    if __name__ == "__main__":
        project.run()
In this first example, we define our callback function print_success in the same module as our PrismProject. If the project run succeeds, Success! is printed to the console.
What if the callback doesn't live inside our project directory? Then we can use package_lookups and the callback's import path instead. For example:
    from pathlib import Path

    from prism.client import PrismProject

    project = PrismProject(
        package_lookups=[
            Path.cwd().parent / "common",  # some path outside our working directory
        ],
        on_success=[
            "callbacks.<some callback function>",  # callbacks.py exists within `common/`
        ],
    )

    if __name__ == "__main__":
        project.run()
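An import-path string names a module and an attribute inside it. How Prism resolves these strings internally is not shown on this page; the sketch below only illustrates the general technique using importlib, with a hypothetical helper called `resolve_import_path`.

```python
import importlib


def resolve_import_path(import_path):
    """Turn a "module.attribute" string into the object it names.

    Hypothetical helper for illustration; not part of the Prism API.
    """
    # Split on the last dot: everything before it is the module to import.
    module_name, _, attr_name = import_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {import_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)
```

The module has to be importable for this to work, which is why the directory containing it must be reachable through sys.path (here, via package_lookups).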
In this first example, we define our SnowflakeConnector in the same module as our PrismProject instance. Tasks can access this connector via CurrentRun.conn(id="snowflake-connector").
What if we want to share connectors across projects? We would ideally place these connectors in a separate module that can be imported by all relevant projects. Then, we can use package_lookups and the connectors' import path in each PrismProject instance:
    from pathlib import Path

    from prism.client import PrismProject

    project = PrismProject(
        package_lookups=[
            Path.cwd().parent / "connectors",  # some path outside our working directory
        ],
        connectors=[
            "snowflake_conn.SmallSnowflakeWH",  # `snowflake_conn.py` exists within `connectors/`
            "postgres_conn.HerokuProd",  # `postgres_conn.py` exists within `connectors/`
        ],
    )

    if __name__ == "__main__":
        project.run()
We can access these context variables via CurrentRun.ctx, for example:
    import prism.target
    from prism.decorators import task, target
    from prism.runtime import CurrentRun  # assumed import location for CurrentRun

    @task(
        targets=[
            target(
                type=prism.target.Txt,
                loc=CurrentRun.ctx("OUTPUT") / "some_example_task_output.txt",
            ),
        ],
    )
    def some_example_task():
        # do some stuff here
        ...
Methods
In addition, the class has two methods: run and graph.
unique ID to give this project. If not specified, then Prism will create an ID for you. This job ID will be in the format {project dir}-{version}.
name
human-readable name to give this project. If not specified, then Prism will create a name for you. This name will be the name of the project directory (i.e., the directory in which the PrismProject lives).
version
project version. Defaults to 1.0.
connectors
list of connectors to use in your project. These can be accessed at runtime via CurrentRun.conn(...). Using connectors saves you from having to define your connection class in each module. Connectors should be specified either as a Connector object or as a string representing the import path to the Connector object.
concurrency
number of threads to use when running tasks. Default is 1 (i.e., single-threaded).
tasks_dir
directory containing tasks. Default is the tasks folder in the current directory.
package_lookups
additional directories / modules to look within when importing modules and functions in your code. The tasks_dir and its parent are automatically added to this list.
on_success
list of callbacks to run when the job succeeds. Callbacks should be functions that do not accept any argument. Callbacks should either be specified as a function or as a string representing the import path to the function.
on_failure
list of callbacks to run when the job fails. Callbacks should be functions that do not accept any argument. Callbacks should either be specified as a function or as a string representing the import path to the function.
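To make the callback semantics concrete, here is a rough sketch of how zero-argument callbacks might be dispatched after a run. It is an illustration under stated assumptions, not Prism's code: `run_with_callbacks` and `job` are hypothetical names, and re-raising the error after the failure callbacks is a choice made for this sketch.

```python
def run_with_callbacks(job, on_success=(), on_failure=()):
    """Run `job`, then call each on_success or on_failure callback with no arguments.

    Hypothetical helper for illustration; not part of the Prism API.
    """
    try:
        job()
    except Exception:
        for callback in on_failure:
            callback()
        raise  # re-raise so the caller still sees the failure
    # Reached only when `job` finished without raising.
    for callback in on_success:
        callback()
```

Since callbacks are called without arguments, any state they need (a webhook URL, a logger) has to come from the enclosing module or a closure.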