Prism
v0.2.0rc1

spark-submit

Last updated 1 year ago

Usage

spark-submit is used to execute a PySpark-based Prism project. This is distinct from a Python-based Prism project, for which you would use the prism run command instead.

In order to use the spark-submit command, you must have a PySpark profile specified in profile.yml.

Usage: prism spark-submit [OPTIONS]                                                                                                 
                                                                                                                                     
 Execute your Prism project as a PySpark job.                                                                                        
                                                                                                                                     
 Examples:                                                                                                                           
                                                                                                                                     
  • prism spark-submit                                                                                                               
  • prism spark-submit -m module01.py -m module02.py                                                                                 
  • prism spark-submit -m module01 --all-downstream                                                                                  
  • prism spark-submit -v VAR1=VALUE1 -v VAR2=VALUE2                                                                                 
  • prism spark-submit --context '{"hi": 1}'                                                                                         
                                                                                                                                     
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --module          -m  TEXT                     Modules to execute. You can specify multiple modules as follows: -m                │
│                                                <your_first_module> -m <your_second_module>.                                       │
│ --all-downstream                               Execute all tasks downstream of modules specified with --module.                   │
│ --all-upstream                                 Execute all tasks upstream of modules specified with --module.                     │
│ --log-level       -l  [info|warn|error|debug]  Set the log level.                                                                 │
│ --full-tb                                      Show the full traceback when an error occurs.                                      │
│ --vars            -v  TEXT                     Variables as key value pairs. These overwrite variables in prism_project.py. All   │
│                                                values are interpreted as strings.                                                 │
│ --context             TEXT                     Context as a dictionary. Must be a valid JSON. These overwrite variables in        │
│                                                prism_project.py.                                                                  │
│ --help                                         Show this message and exit.                                                        │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
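
The options above can also be assembled from a script. The following is a minimal sketch, not part of Prism itself: build_spark_submit_args is a hypothetical helper that only builds the argument list from the flags documented in the help text, and it serializes --context with json.dumps so that the value is valid JSON. Actually launching the command assumes that prism is installed and on your PATH.

```python
import json

# Log levels accepted by --log-level, per the help text above.
LOG_LEVELS = ("info", "warn", "error", "debug")


def build_spark_submit_args(modules=(), variables=None, context=None,
                            all_upstream=False, all_downstream=False,
                            log_level=None, full_tb=False):
    """Hypothetical helper: build the argv list for `prism spark-submit`.

    Only flags listed in the help text are used; nothing is executed here.
    """
    args = ["prism", "spark-submit"]
    # Each module gets its own -m flag.
    for module in modules:
        args += ["-m", module]
    if all_upstream:
        args.append("--all-upstream")
    if all_downstream:
        args.append("--all-downstream")
    if log_level is not None:
        if log_level not in LOG_LEVELS:
            raise ValueError(f"unsupported log level: {log_level}")
        args += ["--log-level", log_level]
    if full_tb:
        args.append("--full-tb")
    # Variables are passed as KEY=VALUE pairs, one -v flag per pair.
    for key, value in (variables or {}).items():
        args += ["-v", f"{key}={value}"]
    # The context must be valid JSON, so serialize it rather than hand-writing it.
    if context is not None:
        args += ["--context", json.dumps(context)]
    return args


if __name__ == "__main__":
    argv = build_spark_submit_args(
        modules=["module01.py"],
        all_downstream=True,
        variables={"VAR1": "VALUE1"},
        context={"hi": 1},
    )
    print(argv)
    # To actually run it (assumes `prism` is installed and on PATH):
    # import subprocess; subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the JSON string given to --context.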

Important: this command is identical to prism run, except that it should only be used to submit PySpark jobs.

Here's what the output looks like in Terminal:

$ prism spark-submit
--------------------------------------------------------------------------------
<HH:MM:SS> | INFO  | Running with prism v0.2.0rc1...
<HH:MM:SS> | INFO  | Found project directory at /Users/my_first_project

<HH:MM:SS> | INFO  | RUNNING EVENT 'parsing prism_project.py'................................................ [RUN]
22/06/28 21:01:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
<HH:MM:SS> | INFO  | FINISHED EVENT 'parsing prism_project.py'............................................... [DONE in 0.03s]
<HH:MM:SS> | INFO  | RUNNING EVENT 'module DAG'.............................................................. [RUN]
<HH:MM:SS> | INFO  | FINISHED EVENT 'module DAG'............................................................. [DONE in 0.01s]
<HH:MM:SS> | INFO  | RUNNING EVENT 'creating pipeline, DAG executor'......................................... [RUN]
<HH:MM:SS> | INFO  | FINISHED EVENT 'creating pipeline, DAG executor'........................................ [DONE in 0.01s]

<HH:MM:SS> | INFO  | ===================== tasks 'vermilion-hornet-Gyycw4kRWG' =====================
<HH:MM:SS> | INFO  | 1 of 1 RUNNING EVENT 'module01.py'...................................................... [RUN]
<HH:MM:SS> | INFO  | 1 of 1 FINISHED EVENT 'module01.py'..................................................... [DONE in 0.01s]

Done!
---------------------------------------------------------------------------------
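
If you capture this output in a script, the per-event timings can be pulled out of the FINISHED EVENT lines. The sketch below is an illustration only: finished_event_durations is a hypothetical helper, and the regular expression assumes the log layout shown in the sample output above, which may differ in other Prism versions.

```python
import re

# Matches lines such as:
#   ... | INFO  | FINISHED EVENT 'module DAG'........ [DONE in 0.01s]
# The layout is assumed from the sample output, not from a documented format.
FINISHED_RE = re.compile(
    r"FINISHED EVENT '(?P<name>[^']+)'\.*\s*\[DONE in (?P<secs>[0-9.]+)s\]"
)


def finished_event_durations(log_text):
    """Hypothetical helper: map each finished event name to its duration in seconds."""
    durations = {}
    for line in log_text.splitlines():
        match = FINISHED_RE.search(line)
        if match:
            durations[match.group("name")] = float(match.group("secs"))
    return durations


if __name__ == "__main__":
    sample = (
        "<HH:MM:SS> | INFO  | RUNNING EVENT 'module DAG'.......... [RUN]\n"
        "<HH:MM:SS> | INFO  | FINISHED EVENT 'module DAG'......... [DONE in 0.01s]\n"
        "<HH:MM:SS> | INFO  | 1 of 1 FINISHED EVENT 'module01.py'. [DONE in 0.01s]\n"
    )
    for name, secs in finished_event_durations(sample).items():
        print(f"{name}: {secs}s")
```

Lines that are not FINISHED EVENT lines, such as the Spark NativeCodeLoader warning, are skipped.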