livy-user mailing list archives

From Kim Hammar <...@logicalclocks.com>
Subject spark.files overrides spark.yarn.dist.files when hive is enabled
Date Wed, 31 Oct 2018 08:38:15 GMT
Hi,

We use Livy inside our multi-tenant data science platform, which runs on YARN and HDFS.
Recently we added support for SparkSQL on Hive by placing the necessary jar files in spark/jars,
adding hive-site.xml to spark/conf, and setting livy.repl.enableHiveContext=true in livy.conf.

However, yesterday I discovered that when Livy starts the Spark session, it overrides our
settings for spark.yarn.dist.files and spark.yarn.jars. This was never an issue before we
enabled Hive. Looking into the code, I found that when Hive is enabled, Livy appends
hive-site.xml (if it is not already present) to the list of files the user specified in the
spark.files property. It also appends the necessary Hive jars to the list of jars the user
request specified in spark.jars. See the related code snippet here:

https://github.com/apache/incubator-livy/blob/56c76bc2d4563593edce062a563603fe63e5a431/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala#L285
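To make the appending step concrete, here is a rough model of it in Python. It is not Livy's Scala code. The helper name `append_if_missing`, the comma-separated list handling and the example paths are hypothetical and only illustrate the behaviour described above.

```python
def append_if_missing(conf: dict, key: str, values: list) -> None:
    """Hypothetical helper: append each value to the comma-separated list
    stored in conf[key], skipping values that are already present."""
    existing = [v.strip() for v in conf.get(key, "").split(",") if v.strip()]
    for value in values:
        if value not in existing:
            existing.append(value)
    conf[key] = ",".join(existing)


# Hypothetical user request; the paths are placeholders.
conf = {"spark.files": "hdfs:///user/a/data.csv"}
hive_site = "file:///opt/spark/conf/hive-site.xml"
append_if_missing(conf, "spark.files", [hive_site])
append_if_missing(conf, "spark.files", [hive_site])  # second call adds no duplicate
append_if_missing(conf, "spark.jars", ["file:///opt/spark/jars/hive-exec.jar"])
```

After these calls, spark.files holds the user's file followed by hive-site.xml, and spark.jars is created because the request did not set it.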

What then seems to happen is this. Suppose spark.files, spark.jars, spark.yarn.dist.files and
spark.yarn.jars are all non-null when the job is submitted. In our case, Livy fills in spark.files
and spark.jars, and the user request from our platform fills in spark.yarn.dist.files and
spark.yarn.jars. Then spark.yarn.dist.files gets set to the value of spark.files, and
spark.yarn.jars gets set to the value of spark.jars.

For example, spark.files and spark.yarn.dist.files have the same semantics but are meant for
non-YARN and YARN deployments, respectively. So Spark simply overwrites spark.yarn.dist.files
with the contents of spark.files instead of merging them. In general, these configuration
properties should be treated as mutually exclusive. You should not mix them, because one is
designed for YARN mode and the other for non-YARN mode.
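The override described above can be pictured with a toy model. It is not Spark's implementation; it only mimics the behaviour reported in this message. The pairing of properties follows the message, and the function name `resolve_for_yarn` and the example paths are hypothetical.

```python
# Generic property -> YARN-specific property, paired as in this message.
OVERRIDES = {
    "spark.files": "spark.yarn.dist.files",
    "spark.jars": "spark.yarn.jars",
}


def resolve_for_yarn(conf: dict) -> dict:
    """Toy model: in YARN mode a non-empty generic property replaces
    (rather than merges with) its YARN-specific counterpart."""
    resolved = dict(conf)
    for generic, yarn_key in OVERRIDES.items():
        if resolved.get(generic):
            # The YARN value is replaced wholesale, so user entries are lost.
            resolved[yarn_key] = resolved[generic]
    return resolved


submitted = {
    "spark.yarn.dist.files": "hdfs:///user/a/data.csv",     # from the user request
    "spark.files": "file:///opt/spark/conf/hive-site.xml",  # appended by Livy
}
resolved = resolve_for_yarn(submitted)
```

Under this model the user's data.csv disappears from spark.yarn.dist.files, which is the symptom described above.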

My questions to you are: have you encountered this issue before? Is there some configuration
option in Livy that I am missing?

My current solution is to deploy a fork of Livy on our platform. In the fork, the code checks
whether the user request has populated the spark.yarn.X properties. If it has, I append all
Livy-generated properties to the YARN ones; otherwise, I append them to the regular spark.X
properties. See the code snippet here:

https://github.com/Limmen/incubator-livy/commit/aa06f896753ae9d6ce6aa66a80cca36a82f84202
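The selection logic of this workaround can be sketched as follows. It is written in Python rather than the Scala of the linked commit and does not reproduce that commit's code. The function names `append_unique` and `merge_livy_generated`, the property pairs (taken from this message) and the example paths are all hypothetical.

```python
# Generic property -> YARN-specific property, paired as in this message.
YARN_EQUIVALENTS = {
    "spark.files": "spark.yarn.dist.files",
    "spark.jars": "spark.yarn.jars",
}


def append_unique(conf: dict, key: str, values: list) -> None:
    """Append values to the comma-separated list in conf[key], skipping duplicates."""
    items = [v for v in conf.get(key, "").split(",") if v]
    for value in values:
        if value not in items:
            items.append(value)
    conf[key] = ",".join(items)


def merge_livy_generated(user_conf: dict, generated: dict) -> dict:
    """Hypothetical sketch of the workaround: if the user already set the
    YARN-specific property, append Livy's entries there; otherwise append
    them to the generic spark.X property."""
    conf = dict(user_conf)
    for generic, values in generated.items():
        yarn_key = YARN_EQUIVALENTS[generic]
        target = yarn_key if conf.get(yarn_key) else generic
        append_unique(conf, target, values)
    return conf


user_request = {"spark.yarn.dist.files": "hdfs:///user/a/data.csv"}
merged = merge_livy_generated(user_request, {
    "spark.files": ["file:///opt/spark/conf/hive-site.xml"],
    "spark.jars": ["file:///opt/spark/jars/hive-exec.jar"],
})
```

Here hive-site.xml lands in spark.yarn.dist.files next to the user's file, so the two properties are never mixed. The Hive jar goes to spark.jars because the request did not set spark.yarn.jars.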

If it helps, I can open a JIRA issue and a PR for this based on your feedback.

Best,

Kim