livy-user mailing list archives

From Shubham Gupta <>
Subject Use existing SparkSession in POST/batches request
Date Mon, 01 Oct 2018 01:30:16 GMT
I'm trying to use Livy to remotely submit several Spark *jobs*. Let's say I
want to perform the following *spark-submit task remotely* (with all the
options as they are)

spark-submit \
--class \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=1g \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
--conf "spark.executor.extraJavaOptions= -XX:+UseG1GC" \
--master yarn \
--deploy-mode cluster \
/home/hadoop/y2k-shubham/jars/jumbo-batch.jar \
--start=2012-12-21 \
--end=2012-12-21 \
--pipeline=db-importer

*NOTE: The options after the JAR (--start, --end etc.) are specific to
my Spark application. I'm using scopt <> for argument parsing.*
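For reference, the spark-submit above maps onto the JSON body of a Livy POST/batches request roughly as follows. This is a minimal sketch using Livy's documented batch fields (file, className, args, conf, driverCores, driverMemory); the class name and Livy host are placeholders, since both were elided in the message:

```python
import json

# Hypothetical Livy endpoint; replace with your Livy server's host/port.
LIVY_BATCHES_URL = "http://livy-host:8998/batches"

# Sketch of the JSON body for POST /batches mirroring the spark-submit
# command above (className is a placeholder -- it was elided in the email).
payload = {
    "file": "/home/hadoop/y2k-shubham/jars/jumbo-batch.jar",
    "className": "com.example.MyBatchApp",  # placeholder
    "driverCores": 1,
    "driverMemory": "1g",
    "conf": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
    },
    # Application-specific arguments parsed by scopt after the JAR
    "args": ["--start=2012-12-21", "--end=2012-12-21", "--pipeline=db-importer"],
}

body = json.dumps(payload)
print(body)
```

Note that --master and --deploy-mode have no counterpart in the body: Livy sets those server-side (livy.spark.master, livy.spark.deploy-mode in its configuration).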


   I'm aware that I can supply all the various options of the above
   spark-submit command using a Livy POST/batches request.

   But since I have to make over 250 spark-submits remotely, I'd like to
   exploit Livy's *session-management capabilities*; i.e., I want Livy to
   create a SparkSession once and then reuse it for all my spark-submit jobs.

   The POST/sessions request allows me to specify quite a few options for
   instantiating a SparkSession remotely. However, I see no *session
   argument* in the POST/batches request.
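To make the asymmetry concrete: POST/sessions accepts SparkSession-level options but no application JAR, while POST/batches accepts a JAR but no session reference. A minimal sketch of the two request bodies (field names from Livy's REST API; values are illustrative):

```python
import json

# POST /sessions -- creates a long-lived interactive session whose
# SparkSession is configured once, up front (illustrative values).
session_body = {
    "kind": "spark",
    "driverCores": 1,
    "driverMemory": "1g",
    "conf": {"spark.dynamicAllocation.enabled": "true"},
}

# POST /batches -- submits a self-contained application; there is no
# field in this body to reference an existing session.
batch_body = {
    "file": "/home/hadoop/y2k-shubham/jars/jumbo-batch.jar",
    "args": ["--pipeline=db-importer"],
}

assert "sessionId" not in batch_body  # no way to attach to a session
print(json.dumps({"session": session_body, "batch": batch_body}))
```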


My questions are

   1. How can I make use of the SparkSession that I created using a
   POST/sessions request for submitting my Spark job using a POST/batches
   request?
   2. In case it's not possible, why is that the case?
   3. Any workarounds?


I've referred to the following examples, but they only demonstrate supplying
(Python) *code* for the Spark job within Livy's POST request:

   - pi_app
   - rssanders3/airflow-spark-operator-plugin
   - livy/examples <>
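For context, those examples all drive a session by POSTing source code as statements, along the lines of the sketch below (endpoint path per Livy's REST API; the session id, host, and code snippet are illustrative):

```python
import json

# Illustrative: the id is returned in the JSON response of POST /sessions.
session_id = 0

# POST /sessions/{id}/statements -- the body carries source code, not a
# JAR, which is why these examples cannot express a spark-submit of a
# packaged application.
statement_url = f"http://livy-host:8998/sessions/{session_id}/statements"
statement_body = {"code": "sc.parallelize(range(100)).count()"}

print(statement_url)
print(json.dumps(statement_body))
```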


Here's the link <> to my
original question on StackOverflow

*Shubham Gupta*
Software Engineer
