livy-user mailing list archives

From "Harsch, Tim" <Tim.Har...@Teradata.com>
Subject Re: How to tune Livy for fast queries
Date Thu, 02 Aug 2018 16:16:48 GMT
I've looked a little deeper and now see my error: those parameters are for the Python and Java
clients (clearly).  I forgot there were clients in the code base.  Just wishful thinking on
my part, I guess...


In any case, I'm still hoping to understand where Livy overhead on queries is coming from.


________________________________
From: Harsch, Tim <Tim.Harsch@Teradata.com>
Sent: Thursday, August 2, 2018 8:28:58 AM
To: user@livy.incubator.apache.org
Subject: Re: How to tune Livy for fast queries


Thank you Saisai for your response.


    I did have a chance to investigate further, and I should give a little background on why
I feel network cost is not the issue:
    I added Livy to our application, Kylo (http://kylo.io), as an optional Spark server that can
be used as a replacement for our existing Spark server.  I noticed the performance issues when
I use Livy instead of our pre-existing server.  Kylo's spark-shell would consistently execute
queries quickly (e.g. <100ms), and the same queries would take longer (>1500ms) through Livy with
a 500ms polling (0ms initial query) interval.  This led me to write Python code that polls Livy
quickly (every 50ms) and to wrap the Scala code executed in Livy with a timer method that writes
the time taken to the Livy logs.  I noticed that my faster queries execute in Livy in <50ms,
yet Livy does not return the results for at least 350ms (7 requests for results made,
6 returned to the client as pending).  I feel fairly confident that Livy has some overhead other
than network.
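
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a fast polling loop. It is not the code I actually ran. It assumes the statement is fetched from Livy's REST endpoint GET /sessions/{id}/statements/{id} and that the JSON "state" field becomes "available" when the result is ready. The function names, the 50ms default, and the timeout are all just illustrative.

```python
import json
import time
import urllib.request


def fetch_statement(base_url, session_id, statement_id):
    """GET one statement from Livy's REST API and return the decoded JSON."""
    url = f"{base_url}/sessions/{session_id}/statements/{statement_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def poll_until_available(fetch, interval_s=0.05, timeout_s=10.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call fetch() repeatedly until the statement state is 'available'.

    fetch is any zero-argument callable returning the statement as a dict,
    so the loop can be exercised without a running Livy server.
    Returns (statement, pending_polls, elapsed_seconds).
    """
    start = clock()
    pending = 0
    while True:
        stmt = fetch()
        state = stmt.get("state")
        if state == "available":
            return stmt, pending, clock() - start
        if state in ("error", "cancelled"):
            raise RuntimeError(f"statement ended in state {state!r}")
        # Anything else (e.g. 'waiting' or 'running') counts as pending.
        pending += 1
        if clock() - start > timeout_s:
            raise TimeoutError(f"no result after {pending} polls")
        sleep(interval_s)
```

Against a real server this would be called as poll_until_available(lambda: fetch_statement(url, session_id, statement_id)), where url, session_id and statement_id are placeholders for your own setup. Comparing the returned elapsed time and pending count with the time the Scala timer logs on the Livy side shows how much of the delay is outside the Spark code itself.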


   I've since discovered these settings in livy-client.conf.template

# Initial interval before polling for Job results
# livy.client.http.job.initial-poll-interval = 100ms
# Maximum interval between successive polls
# livy.client.http.job.max-poll-interval = 5s

and I looked at the Livy source and noticed that it seems to use a geometric interval for polling:
https://github.com/cloudera/livy/blob/5de6cf21c61db4093646a23c65c37c8b52202dc8/client-http/src/main/java/com/cloudera/livy/client/http/JobHandleImpl.java#L266
I'm thinking that could be the source of my issue, but I need a chance to dive deeper.  Do
you think tuning those parameters could improve the situation?
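
To show why a geometric polling interval could matter for short jobs, here is a rough model of such a schedule. It is only a sketch: it assumes the first poll fires after the initial interval and that the interval doubles after every poll until it reaches the maximum. The doubling factor and the scheduling details are my assumptions, not something I have confirmed against JobHandleImpl, and this models client-side polling only. The defaults mirror the 100ms and 5s values from the template above.

```python
def poll_times_ms(initial_ms=100, max_ms=5000, factor=2, horizon_ms=10000):
    """Return the times (ms after submission) at which polls would fire.

    Assumes the first poll happens after initial_ms and that the wait
    before each later poll is the previous wait times factor, capped at max_ms.
    """
    times = []
    interval = initial_ms
    t = initial_ms
    while t <= horizon_ms:
        times.append(t)
        interval = min(interval * factor, max_ms)
        t += interval
    return times


def observed_latency_ms(job_done_ms, **schedule):
    """Return the first poll time at or after the job finishes, or None."""
    for t in poll_times_ms(**schedule):
        if t >= job_done_ms:
            return t
    return None
```

Under these assumptions the polls land at 100, 300, 700, 1500, 3100 and 6300 ms, so a job that finishes at 150 ms would not be seen until the 300 ms poll. Lowering the initial interval or the growth factor in the model shrinks that gap, which is the effect I would hope to get from tuning the two settings.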


Thanks,

Tim



________________________________
From: Saisai Shao <sai.sai.shao@gmail.com>
Sent: Wednesday, August 1, 2018 7:23:55 PM
To: user@livy.incubator.apache.org
Subject: Re: How to tune Livy for fast queries

Some network cost should probably also be counted in. There's no such configuration for tuning.
If you find a performance issue, you can create a JIRA or even a patch to fix Livy.

Harsch, Tim <Tim.Harsch@teradata.com> wrote on Wednesday, August 1, 2018 at 8:04 AM:

I have a Livy application that I'm trying to tune, as I'm seeing a performance issue when
the queries are fast.  I've wrapped my queries with a timer that logs the time taken.
The Spark code executed typically takes 50ms to 150ms.  I'm querying Livy every 500ms looking
for my response, and generally it doesn't succeed until the third check.  It seems Livy itself
is spending up to an extra 1000ms.  Where is Livy spending this time?  Are there any tuning
parameters I can adjust?


Also, I am having difficulty changing any of the settings in livy-client.conf.  I placed the
file in both /etc/hadoop/conf and the livy/conf folder, but my settings seem to be ignored.


Thanks

Tim
