flume-user mailing list archives

From Michal Klempa <michal.kle...@gmail.com>
Subject HiveSink not multithreaded?
Date Thu, 26 Jan 2017 07:17:20 GMT
I have been working a lot with HiveSink to load data into Hive. Not
only did I discover this bug, but I also found that HiveSink differs
from HDFSEventSink in the way the thread pool for delayed operations
is created.

See this line in HDFSEventSink:
it uses the argument threadsPoolSize, which defaults to 10 but can be
configured as hdfs.threadsPoolSize in the Flume config.

In contrast, HiveSink creates the thread pool this way:
1 thread, with the note
// call timeout pool needs only 1 thd as sink is effectively single threaded
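
To make the difference concrete, here is a minimal sketch. It assumes
both sinks build the pool with Executors.newFixedThreadPool and differ
only in the size argument. It does not use any Flume classes;
PoolSizeSketch and peakConcurrency are hypothetical names made up for
this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSizeSketch {

    // Hypothetical helper: submits `tasks` short blocking jobs to a fixed
    // pool of `poolSize` threads and returns the highest number of jobs
    // that were observed running at the same time.
    static int peakConcurrency(int poolSize, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);

        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(100); // stand-in for a slow write call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }

        done.await();
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed HDFSEventSink-style pool: size taken from configuration (default 10).
        System.out.println("pool of 10, peak concurrency: " + peakConcurrency(10, 10));
        // Assumed HiveSink-style pool: size hard-coded to 1.
        System.out.println("pool of 1, peak concurrency: " + peakConcurrency(1, 10));
    }
}
```

With a pool of size 1 the submitted jobs can only run one after
another, regardless of how many are queued.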

Why is the Hive sink effectively single threaded? There is no mention
of this in the documentation (FlumeUserGuide), so how should I handle
this situation? For performance reasons, I would like multithreaded
writes into Hive. Do I have to fan out (multiplex or round-robin) and
configure multiple HiveSinks? Probably I do, but it is ugly.

What is the reason that HiveSink is single threaded?

Thanks, Michal
