PySpark

Configuration

The PySpark configurations are:

  • alias: the alias used to refer to the SparkSession within the modules. The default is spark.

  • loglevel: the log level to use for the console. The default value is WARN. Acceptable values are the standard Spark/Log4j log levels (ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN).

  • Any key, value pair that can be used in the config method of the SparkSession.builder class. Some examples are shown below!

# profile.yml

<profile name here>: # change this!
  adapters:
    <pyspark adapter name here>:  # change this!
      alias: spark
      loglevel: WARN
      config:
        spark.driver.cores:
        spark.driver.memory:
        spark.driver.memoryOverhead:
        spark.executor.cores:
        spark.executor.memory:
        spark.executor.memoryOverhead:
        spark.executor.instances:
        spark.task.cpus:
        spark.sql.broadcastTimeout:
        # Add additional config variables here!

Under the hood, prism takes care of parsing the configuration variables, constructing a SparkSession instance, and storing the instance in an attribute named after the alias within the hooks object.
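
For illustration, here is a rough sketch of how such a configuration maps onto the SparkSession.builder API. The app name and config values below are placeholders for illustration, not prism's actual internals.

# Illustrative sketch only; not prism's actual internals
from pyspark.sql import SparkSession

config = {
    "spark.driver.memory": "4g",      # example value
    "spark.executor.memory": "4g",    # example value
}

builder = SparkSession.builder.appName("prism")  # app name is a placeholder
for key, value in config.items():
    builder = builder.config(key, value)

spark = builder.getOrCreate()            # stored as hooks.<alias>
spark.sparkContext.setLogLevel("WARN")   # the `loglevel` configuration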

Users can access the SparkSession directly using hooks.{alias}.

def run(self, tasks, hooks):
    # Assuming the adapter alias is 'spark' in profile.yml
    df = hooks.spark.read.parquet('data_path/')
    return df
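
Putting it together, the following is a hedged sketch of a complete task that reads data through the PySpark adapter. The class name, column, and paths are placeholders; only prism.task.PrismTask and the run(self, tasks, hooks) signature come from the API reference.

# tasks/filter_active_customers.py (hypothetical file)
import prism.task


class FilterActiveCustomers(prism.task.PrismTask):

    def run(self, tasks, hooks):
        # Assuming the adapter alias is 'spark' in profile.yml
        df = hooks.spark.read.parquet('data_path/')    # hypothetical input path
        return df.filter(df['is_active'] == True)      # return the task's output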