livy-user mailing list archives

From: Jeff Zhang <zjf...@gmail.com>
Subject: Re: Use existing SparkSession in POST/batches request
Date: Mon, 01 Oct 2018 01:59:55 GMT
BTW, Zeppelin has integrated Livy's interactive mode to run Spark code. You
may try this as well.

https://zeppelin.apache.org/docs/0.8.0/interpreter/livy.html
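
"Interactive mode" here means a long-lived Livy session: you create it once with
POST /sessions, and every later POST /sessions/{id}/statements call runs against
the same SparkSession. A minimal sketch of that flow, assuming Livy's default
port (8998) on localhost; the session id and the code snippet are placeholders:

# Create an interactive session once; Livy starts a single SparkSession for it.
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"kind": "spark"}' \
  http://localhost:8998/sessions

# Once the session (say id 0) reports state "idle", submit code against that
# same SparkSession; repeat this call for as many snippets as you need.
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"code": "spark.range(10).count()"}' \
  http://localhost:8998/sessions/0/statements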



Jeff Zhang <zjffdu@gmail.com> wrote on Mon, Oct 1, 2018 at 9:58 AM:

>
> Have you tried the interactive mode?
>
> Shubham Gupta <y2k.shubhamgupta@gmail.com> wrote on Mon, Oct 1, 2018 at 9:30 AM:
>
>> I'm trying to use Livy to remotely submit several Spark *jobs*. Let's say
>> I want to perform the following *spark-submit task remotely* (with all the
>> options as such):
>>
>> spark-submit \
>> --class com.company.drivers.JumboBatchPipelineDriver \
>> --conf spark.driver.cores=1 \
>> --conf spark.driver.memory=1g \
>> --conf spark.dynamicAllocation.enabled=true \
>> --conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
>> --conf "spark.executor.extraJavaOptions= -XX:+UseG1GC" \
>> --master yarn \
>> --deploy-mode cluster \
>> /home/hadoop/y2k-shubham/jars/jumbo-batch.jar \
>> \
>> --start=2012-12-21 \
>> --end=2012-12-21 \
>> --pipeline=db-importer \
>> --run-spiders
>>
>> *NOTE: The options after the JAR (--start, --end, etc.) are specific to
>> my Spark application; I'm using scopt <https://github.com/scopt/scopt> for
>> this.*
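>>
>> For reference, the same submission expressed as a Livy POST /batches request
>> would look roughly like the sketch below (field names are taken from the Livy
>> REST API docs linked further down; the host is a placeholder, the rest is
>> copied from the command above, and the master / deploy mode come from the
>> Livy server's own configuration rather than from the request body):
>>
>> curl -s -X POST -H 'Content-Type: application/json' \
>>   -d '{
>>         "file": "/home/hadoop/y2k-shubham/jars/jumbo-batch.jar",
>>         "className": "com.company.drivers.JumboBatchPipelineDriver",
>>         "driverCores": 1,
>>         "driverMemory": "1g",
>>         "conf": {
>>           "spark.dynamicAllocation.enabled": "true",
>>           "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>>           "spark.executor.extraJavaOptions": "-XX:+UseG1GC"
>>         },
>>         "args": ["--start=2012-12-21", "--end=2012-12-21",
>>                  "--pipeline=db-importer", "--run-spiders"]
>>       }' \
>>   http://localhost:8998/batches
>>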
>> ------------------------------
>>
>>    - I'm aware that I can supply all the various options from the above
>>      spark-submit command using the Livy POST /batches request
>>      <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-batches>.
>>
>>    - But since I have to make over 250 spark-submits remotely, I'd like to
>>      exploit Livy's *session-management capabilities*; i.e., I want Livy to
>>      create a SparkSession once and then use it for all my spark-submit
>>      requests.
>>
>>    - The POST /sessions request
>>      <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-sessions>
>>      allows me to specify quite a few options for instantiating a
>>      SparkSession remotely (see the sketch just after this list). However,
>>      I see no *session argument* in the POST /batches request
>>      <https://livy.incubator.apache.org/docs/latest/rest-api.html#post-batches>.
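>>
>> A sketch of how the same configuration could be carried over when creating a
>> session (field names per the POST /sessions docs above; the jar is attached
>> via "jars" and the host is a placeholder); every later
>> POST /sessions/{id}/statements call then reuses that one SparkSession:
>>
>> curl -s -X POST -H 'Content-Type: application/json' \
>>   -d '{
>>         "kind": "spark",
>>         "driverCores": 1,
>>         "driverMemory": "1g",
>>         "conf": {
>>           "spark.dynamicAllocation.enabled": "true",
>>           "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>>           "spark.executor.extraJavaOptions": "-XX:+UseG1GC"
>>         },
>>         "jars": ["/home/hadoop/y2k-shubham/jars/jumbo-batch.jar"]
>>       }' \
>>   http://localhost:8998/sessions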
>>
>> ------------------------------
>>
>> My questions are:
>>
>>    1. How can I make use of the SparkSession that I created with the
>>       POST /sessions request when submitting my Spark job via the
>>       POST /batches request?
>>    2. If that isn't possible, why is that the case?
>>    3. Are there any workarounds?
>>
>> ------------------------------
>>
>> I've referred to the following examples, but they only demonstrate supplying
>> (Python) *code* for a Spark job within Livy's POST request:
>>
>>    - pi_app
>>    <https://github.com/apache/incubator-livy/blob/master/examples/src/main/python/pi_app.py>
>>    - rssanders3/airflow-spark-operator-plugin
>>    <https://github.com/rssanders3/airflow-spark-operator-plugin/blob/master/example_dags/livy_spark_operator_python_example.py>
>>    - livy/examples <https://livy.incubator.apache.org/examples/>
>>
>> ------------------------------
>>
>> Here's the link <https://stackoverflow.com/questions/51746286/> to my
>> original question on StackOverflow
>>
>> *Shubham Gupta*
>> Software Engineer
>>  zomato
>>
>
