Spark Pipes Resource

class ascii_library.orchestration.pipes.spark_pipes.SparkPipesResource

Bases: ConfigurableResource

Generic configurable spark-pipes resource.

Executes jobs in one of several modes: locally for quick development, or on scalable cloud backends such as Databricks and EMR. Pipelines may optionally apply sampling to speed up end-to-end runs.

Databricks authentication (environment variables)

DATABRICKS_HOST
DATABRICKS_CLIENT_ID
DATABRICKS_CLIENT_SECRET

EMR credentials (environment variables)

ASCII_AWS_ACCESS_KEY_ID
ASCII_AWS_SECRET_ACCESS_KEY
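
Because the credentials above are read from the environment, it can help to check them before launching a job. The following is a minimal sketch, not part of the library: the helper `missing_credentials`, the mapping `REQUIRED_ENV_VARS`, and the backend keys `"databricks"` and `"emr"` are hypothetical names chosen for illustration. Only the variable names come from this page.

```python
import os

# Variable names are taken from the documentation above. The backend keys and
# this helper are hypothetical; they are not part of SparkPipesResource.
REQUIRED_ENV_VARS = {
    "databricks": (
        "DATABRICKS_HOST",
        "DATABRICKS_CLIENT_ID",
        "DATABRICKS_CLIENT_SECRET",
    ),
    "emr": (
        "ASCII_AWS_ACCESS_KEY_ID",
        "ASCII_AWS_SECRET_ACCESS_KEY",
    ),
}


def missing_credentials(backend, environ=None):
    """Return the required variable names that are unset or empty.

    Backends without an entry in REQUIRED_ENV_VARS (for example a local run)
    are treated as needing no credentials.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS.get(backend, ()) if not env.get(name)]


if __name__ == "__main__":
    for backend in REQUIRED_ENV_VARS:
        missing = missing_credentials(backend)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{backend}: {status}")
```

Running the script lists, per backend, which variables still need to be exported. Passing a dictionary as `environ` lets the check be exercised without touching the real environment.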
  • Variables:
    • engine (Engine) – The default execution engine to use.
    • execution_mode (ExecutionMode) – The execution mode for the pipeline (e.g. debug, prod).

engine : Engine

execution_mode : ExecutionMode

get_spark_pipes_client(override_default_engine)