incubator-general mailing list archives

From Ralph Goers
Subject Re: Impala commit policy
Date Thu, 03 Dec 2015 18:41:09 GMT
Virtually any project you look at is going to have portions that are fairly complex and portions
that are pretty straightforward.  In my opinion the correct approach is to identify the parts
of the code that a) seem to be most susceptible to bugs, b) are hard to understand well, or
c) are places where simple changes can have a huge impact on performance, and then use RTC for those areas.
In addition, you might want to require RTC for significant new feature additions. Although
I expect every project to have code that falls into these categories, the ratio of that code
will vary from project to project. As an example, I can honestly say that with Flume the File
Channel should always be RTC. But almost everything else could be done more effectively with


> On Dec 3, 2015, at 11:20 AM, Henry Robinson wrote:
> I'm happy to field technical questions about Impala. You seem to be
> conflating 'complexity' with 'severity of potential bugs' - I see the two
> as separate.
> Under the 'severity' heading, Impala both writes and reads data from a
> variety of data stores. So if there's a bug in Impala's write path, data
> can be lost. But because Impala also returns results to client
> applications, there's a significant risk of business impact if the *wrong*
> results are returned. I know, because I have dealt with situations where
> this has happened, and no-one is very happy about it. Our customers
> typically run business-critical analytic workloads through Impala; if it
> stops working correctly that's usually a big problem.
> As far as 'complexity' goes, I make no comparative claims about Impala's
> complexity vs any other project. But to give some indication of the moving
> parts inside Impala: there's a component which compiles highly optimised
> versions of each query operator at run time, there's a query planner which
> parses and plans a large portion of the SQL standard, there is the
> complexity of being a 'massively' distributed system (many deployments run
> in the high 100s of nodes), with the coordination and consistency
> guarantees that requires, and there is the further complexity of
> running highly concurrent workloads in a single process, with all the
> concurrency headaches that can bring. That's not to mention
> implementations of 'standard' SQL operators like joins, sorts and so on
> that are still the subject of active research in academia and industry.
> All this is in the context of Impala's main differentiator, which is that
> it is amongst the very fastest SQL engines for data stored in HDFS and
> friends. That means that small changes can have large unexpected
> consequences, since efficiency is a subtle and capricious thing. It has
> therefore always helped us to have more than one set of eyes on every
> change, to reduce the probability of introducing subtle performance and
> functional regressions. Automated testing
> plays a huge role here as well, but for us it's been most effective in
> concert with code review.
> (There are other reasons I vastly prefer RTC as well, but I'm addressing
> your specific points here so as not to kick off another RTCvsCTR thread :)).
>> In this case, the RTC seems to stem from the choice of Gerrit, rather than
>> some innate complexity.
> Gerrit does not mandate RTC, since you can just push to refs/heads/<branch>
> and bypass the review creation step.
> Historically, the Impala team at Cloudera has used at least three different
> review tools (including Review Board, which is used elsewhere at the ASF).
> The choice of review tool stems completely from pragmatism - we really did
> not like Review Board, and briefly used Rietveld before moving to Gerrit
> which we have preferred. At every step, we used RTC.
> Henry
>> I *do* note that possibly committers could choose to commit directly, or
>> choose to use Gerrit when they are unsure. Will the (P)PMC allow those
>> direct commits? Or mandate Gerrit for every commit?
>> Cheers,
>> -g
