spark-submit

Usage

spark-submit executes a PySpark-based Prism project. This is distinct from a pure Python-based Prism project, which is executed with the run command.

To use the spark-submit command, you must have a PySpark profile specified in profile.yml.
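For context, here is a minimal sketch of what a PySpark profile in profile.yml might look like. The profile name, adapter name, alias, and config keys below are illustrative assumptions, not a definitive schema; consult the profile.yml documentation for the exact format.

# profile.yml (sketch; names and keys are illustrative assumptions)
my_pyspark_profile:
  adapters:
    spark_adapter:              # hypothetical adapter name
      type: pyspark             # identifies this as a PySpark profile
      alias: spark              # assumed name for the SparkSession object
      config:                   # standard Spark configuration properties
        spark.driver.cores: 2
        spark.executor.memory: 4g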

Usage: prism spark-submit [-h] [--modules [X.py Y.py ...]] [--all-upstream] [--all-downstream] [--full-tb] [-l] [--vars [A=a B=b ...]] [--context '{}']

Execute your Prism project as a PySpark job

Options:
  -h, --help            show this help message and exit

Command Options:
  --modules [X.py Y.py ...]
                        Path to the script(s) that you want to run; if not specified, all modules in the project are run
  --all-upstream        Run all modules upstream of --modules
  --all-downstream      Run all modules downstream of --modules

General Options:
  --full-tb             Display the full traceback for errors in the project; default is False
  -l, --log-level       Log level; must be one of `info`, `warn`, `error`, or `debug`. Default is `info`
  --vars [A=a B=b ...]  Prism variables as key-value pairs `key=value`. These overwrite any variable definitions in `prism_project.py`. All values are read as strings.
  --context '{}'        Prism variables as JSON. Cannot be used together with --vars. These overwrite any variable definitions in `prism_project.py`.
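To make these options concrete, here are a few example invocations. The module names (extract.py, transform.py) and variable values are placeholders for your own project:

# Run every module in the project as a PySpark job
prism spark-submit

# Run two specific modules plus everything upstream of them
prism spark-submit --modules extract.py transform.py --all-upstream

# Override variables defined in prism_project.py (values are read as strings)
prism spark-submit --vars DATE=2023-01-01 ENV=dev

# The same overrides as JSON; --context cannot be combined with --vars
prism spark-submit --context '{"DATE": "2023-01-01", "ENV": "dev"}'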

Important: this command is identical to prism run, except that it should only be used to submit PySpark jobs.
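In other words, the flags behave the same; only the entry point changes. A quick contrast, with extract.py as a placeholder module:

# Pure Python Prism project
prism run --modules extract.py

# PySpark Prism project: same flags, executed as a PySpark job
prism spark-submit --modules extract.py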

