livy-user mailing list archives

From "Harsch, Tim" <>
Subject Re: new user to livy/spark, basic use case questions
Date Wed, 16 May 2018 20:36:59 GMT
Results for any given script, or script segment, sent to Livy are retrievable in a subsequent
call to the statements endpoint.  The results are available as long as the session and the
server are still alive.  If you want results to be more permanent, you could have the script
write them to a datasource, and then access them there at any time in the future.  It's just
a design consideration.
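
Roughly, that flow looks like this (a minimal sketch, assuming a Livy server on the
default port 8998 and Python with the requests library; the submitted code is just
an illustration):

    import time
    import requests

    LIVY = "http://localhost:8998"  # assumed default Livy endpoint

    # Start an interactive PySpark session.
    session = requests.post(f"{LIVY}/sessions", json={"kind": "pyspark"}).json()
    session_url = f"{LIVY}/sessions/{session['id']}"

    # Wait for the session to come up before submitting code.
    while requests.get(session_url).json()["state"] != "idle":
        time.sleep(1)

    # Submit a script segment as a statement.
    stmt = requests.post(f"{session_url}/statements", json={"code": "1 + 1"}).json()
    stmt_url = f"{session_url}/statements/{stmt['id']}"

    # The result stays retrievable from this endpoint for as long as the
    # session (and server) lives.
    result = requests.get(stmt_url).json()
    while result["state"] != "available":
        time.sleep(1)
        result = requests.get(stmt_url).json()
    print(result["output"]["data"]["text/plain"])  # -> "2" (when status is "ok")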

The Livy Programmatic API is useful for batch jobs.  Examples are available on the web site.
 The REST API has more features, from what I've gathered.  With it you can run batch jobs or scripts.
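
For example, a script that already sits in HDFS can be submitted through POST /batches
without the client ever holding the job code (a rough sketch, same server assumption as
above; the paths and arguments are hypothetical):

    import requests

    LIVY = "http://localhost:8998"  # assumed default Livy endpoint

    # Submit a script that lives in HDFS; the client supplies only its
    # location and parameters, never the job code itself.
    batch = requests.post(f"{LIVY}/batches", json={
        "file": "hdfs:///jobs/my_algorithm.py",   # hypothetical path
        "args": ["--input", "hdfs:///data/in"],   # hypothetical arguments
    }).json()

    # Poll the batch state; driver output is available from the log endpoint.
    state = requests.get(f"{LIVY}/batches/{batch['id']}").json()["state"]
    log = requests.get(f"{LIVY}/batches/{batch['id']}/log").json()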

The REST API also supports a 'shared' session type.  In a 'shared' session (not yet documented)
you could run a Scala script, followed by Python or R, from within the same session.
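
A rough sketch of that, under the same assumptions as above (since this is not yet
documented, the details may shift; the per-statement "kind" field was added in Livy 0.5.0):

    import requests

    LIVY = "http://localhost:8998"  # assumed default Livy endpoint

    # Create a 'shared' session (Livy 0.5.0+); poll GET /sessions/{id}
    # until it reports "idle", as in the earlier sketch.
    sid = requests.post(f"{LIVY}/sessions", json={"kind": "shared"}).json()["id"]

    # Each statement carries its own "kind", so Scala and Python code
    # can run against the same Spark application.
    requests.post(f"{LIVY}/sessions/{sid}/statements",
                  json={"kind": "spark", "code": "val n = spark.range(10).count()"})
    requests.post(f"{LIVY}/sessions/{sid}/statements",
                  json={"kind": "pyspark", "code": "print(spark.range(10).count())"})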

From: Decker, Seth Andrew <>
Sent: Wednesday, May 16, 2018 7:16:56 AM
Subject: new user to livy/spark, basic use case questions


I’m new to Livy and Spark and have a question about how to properly use it.

I want to use Spark both for the interactive scripting side and for passing it data/parameters
to run a set of predefined algorithms/applications. I’m looking at using Livy to interface
with Spark RESTfully, but I’m not sure whether I can handle things the way I want (or whether
that’s the intended way to use them). I can pass a Python script into Spark through Livy,
which is great.

Is the intended way to get results to save them to HDFS/a database/a data store and then read
those results back in? I noticed that with the Java/Python clients you can create your job
through Livy and get the results back in the Livy HTTP message, which seems much simpler, but
I’m having trouble/concerns with using that path.

My first issue is that I don’t necessarily need the client to know the job. I’d rather the job
be saved to HDFS in Apache Hadoop, with Livy just telling Spark to run it with the given
parameters/input. Is this doable with the client, or should I just stick with the HTTP API?

If the previous is possible, is it also possible to run a Python script in Spark, called via
the Java client? From my sleuthing around the GitHub page, it looks like you upload/run/submit
JARs from Java and .py files from Python, and I would probably have use cases for running both
(such as TensorFlow .py scripts, or custom Java code). Is there a way to run both from the
same client?


Seth Decker
