flume-user mailing list archives

From Jeff Hansen <dsche...@gmail.com>
Subject Re: Flumes DSL syntax
Date Tue, 13 Sep 2011 16:00:20 GMT
I really haven't spent enough time looking through the object model,
but in terms of usage, it seems like it would be nice if each
source/sink/decorator had a consistent way of documenting its
intended usage in the code.  While it would be easy to propose adding
a getUsage() method to the sink interface, I think that would be
entirely inappropriate (the usage doc is more an aspect of command
line configurability than the function of the sink itself).  I'm not
sure if there is an appropriate hook on something like a
"configurable" interface that the source/sink/decos all happen to be
using, but if so, that would be a nice place to put consistent usage
information.
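To make that separation concrete, here is a minimal Python sketch. Flume itself is Java, and none of these names come from its code base: the hypothetical `Sink` contract carries only runtime behaviour, while a separate hypothetical `Configurable` aspect owns the usage text, so a getUsage()-style method never leaks into the sink interface.

```python
from abc import ABC, abstractmethod


class Sink(ABC):
    """Hypothetical sink contract: runtime behaviour only, no usage text."""

    @abstractmethod
    def append(self, event: str) -> None: ...


class Configurable(ABC):
    """Hypothetical command-line configurability aspect that owns the usage doc."""

    @classmethod
    @abstractmethod
    def usage(cls) -> str: ...


class ExampleSink(Sink, Configurable):
    """Hypothetical sink that opts into the configurability aspect."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def append(self, event: str) -> None:
        self.events.append(event)

    @classmethod
    def usage(cls) -> str:
        return "exampleSink[(path)]"


def describe(component: type) -> str:
    """Return usage text only for components that expose the configurability aspect."""
    if isinstance(component, type) and issubclass(component, Configurable):
        return component.usage()
    return "no usage information available"
```

Code that only appends events never sees `usage()`; only the configuration layer asks for it.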

For one thing, it would be nice if you could query the shell for usage
information rather than having to cause an error in the hopes of
getting it -- imagine being able to call

flume shell -e "usage collectorSink"

for any function and having it spit back the appropriate usage information.
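A sketch of how such a `usage` command could be answered from a lookup table instead of from an error path. The registry, the command format, and the usage strings are hypothetical placeholders; they are not Flume's real shell commands or messages.

```python
# Hypothetical registry mapping function names to usage text. The strings are
# placeholders, not Flume's real usage messages.
USAGE_REGISTRY: dict[str, str] = {
    "collectorSink": "<usage text for collectorSink>",
    "console": "<usage text for console>",
}


def run_shell_command(line: str) -> str:
    """Answer a hypothetical `usage <functionName>` command without raising."""
    parts = line.split()
    if len(parts) != 2 or parts[0] != "usage":
        return "expected: usage <functionName>"
    name = parts[1]
    if name not in USAGE_REGISTRY:
        known = ", ".join(sorted(USAGE_REGISTRY))
        return f"unknown function '{name}'; known functions: {known}"
    return f"Usage: {USAGE_REGISTRY[name]}"
```

For example, `run_shell_command("usage collectorSink")` returns `"Usage: <usage text for collectorSink>"`.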

Of course to do something like that you'd probably need a generic
FunctionBuilder which was used across the board to build functions, sinks,
sources, etc. (when they came from a function rather than some other
syntax).  In that case I could picture a config file that mapped
function names to classes -- that same config file would be ideal for
putting usage doc because you'd have a consolidated place containing
doc for all the functions rather than having to look in each class.
Then as long as the FunctionBuilder passed that standardized usage
information into the function that it built, the function could
spit out the consistent usage information as part of an exception
message when the usage specs were violated -- ideally with a nice
additional message describing what the particular violation was.

As for what notation to use in describing the usage spec...  I imagine
square brackets would be fine as long as they were accompanied by some
examples demonstrating that brackets are not actually part of the
syntax.

For instance:
Usage: functionName[(arg1,arg2)]
arg1 optionally specifies the number of times you want to....
arg2 optionally specifies the type of ....
functionName    //call the function without any arguments
functionName(1)    //call function with arg1 as 1 but allowing arg2 to default
functionName(arg2="bla")  // specify arg2, but allow arg1 to default
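The call forms in these examples can be captured by a small parser. This Python sketch is only an illustration, not Flume's ANTLR grammar (FlumeDeploy.g): it accepts a bare name, or a name followed by parenthesised literal arguments, with named arguments written name=value after any positional ones. It assumes double-quoted strings, integer numbers and `true`/`false` booleans, and it does not handle nested function arguments or escaped quotes.

```python
import re

# One token per match: integer, double-quoted string, identifier, or punctuation.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<num>-?\d+)|(?P<str>"[^"]*")|(?P<id>[A-Za-z_]\w*)|(?P<punct>[(),=]))'
)


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}: {text[pos:]!r}")
        kind = match.lastgroup  # name of the alternative that matched
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


def literal(token: tuple[str, str]):
    """Convert a literal token; booleans are assumed to be spelled true/false."""
    kind, value = token
    if kind == "num":
        return int(value)
    if kind == "str":
        return value[1:-1]
    if kind == "id" and value in ("true", "false"):
        return value == "true"
    raise SyntaxError(f"expected a literal, got {value!r}")


def parse_call(text: str):
    """Parse `name` or `name(args)` into (name, positional args, named args)."""
    tokens = tokenize(text)
    if not tokens or tokens[0][0] != "id":
        raise SyntaxError("expected a function name")
    name, rest = tokens[0][1], tokens[1:]
    if not rest:
        return name, [], {}  # bare name: no parentheses means no arguments
    if rest[0] != ("punct", "(") or rest[-1] != ("punct", ")"):
        raise SyntaxError("arguments must be wrapped in parentheses")
    inner = rest[1:-1]
    pieces, current = [], []
    for tok in inner:  # split the argument list on commas
        if tok == ("punct", ","):
            pieces.append(current)
            current = []
        else:
            current.append(tok)
    if inner:
        pieces.append(current)
    args, kwargs = [], {}
    for piece in pieces:
        if len(piece) == 1:
            if kwargs:
                raise SyntaxError("positional argument after a named argument")
            args.append(literal(piece[0]))
        elif len(piece) == 3 and piece[0][0] == "id" and piece[1] == ("punct", "="):
            kwargs[piece[0][1]] = literal(piece[2])
        else:
            raise SyntaxError("malformed argument")
    return name, args, kwargs
```

Applied to the three examples, `parse_call("functionName")` yields `("functionName", [], {})`, `parse_call("functionName(1)")` yields `("functionName", [1], {})`, and `parse_call('functionName(arg2="bla")')` yields `("functionName", [], {"arg2": "bla"})`.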

I realize this is kind of an ambitious proposal, but it sure would be nice =)

On Sat, Sep 10, 2011 at 3:22 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> Jeff,
> Thanks for digging into this.  I'm at the point where this syntax "feels
> natural" to me, so I'd love to hear your opinion on how to improve the
> documentation to make it easier to learn and understand.
> I could see how we could improve the manual by being more explicit than
> the terse syntax info.   What do you think would make sense for the usage
> warnings?
> Thanks,
> Jon.
> On Thu, Sep 8, 2011 at 2:14 PM, Jeff Hansen <dscheffy@gmail.com> wrote:
>> I think I finally get the syntax for the most part -- but it took
>> finding and skimming through a book on ANTLR to figure it out.  The
>> pertinent information is in FlumeDeploy.g
>> It seems to me that the syntax is relatively straightforward once you
>> get that sinks, sources and decorators are all just functions and they
>> all follow the same lexical pattern as functions.  Unfortunately the
>> "function" syntax is the one all important pattern that's missing from
>> the documentation (or if it's there it's hidden behind a jedi master
>> saying "these are not the specs you are looking for").
>> The key is explaining that function calls follow a syntax slightly
>> more like that of Ruby or Python than that of Java in that the
>> parentheses for arguments are optional -- except that they aren't
>> exactly optional because they're required if you actually want to pass
>> any arguments to the function.
>> Then it's just a matter of explaining that arguments are themselves
>> either functions or literals (string, numeric, boolean).  Further,
>> required arguments always come first and they may be followed by
>> optional arguments which (much like in Ruby and Python) can be passed
>> in as named arguments where argName=argValue -- this allows you to
>> skip over arguments you don't want to override if they happen to come
>> before arguments you do want to override.
>> Personally I'd avoid explaining any of this optionality with square
>> brackets, because square brackets are significant characters that show
>> up elsewhere (fan-out sources). In some cases it's relatively clear to
>> me that brackets indicate an argument is optional -- for instance
>> functionName(arg1[,arg2]*) is clear to me, but "Usage:
>> functionName[(arg1,arg2)]" in an error message telling me I've done
>> something wrong just makes me think, crap, did I need to put brackets
>> in there?
>> Does anybody else think that kind of explanation would have been
>> helpful when you were starting out?
>> Thanks,
>> Jeff
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
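The argument rules described in the quoted message (required arguments first, optional ones after, and optional ones passable by name so that earlier optional arguments can be skipped) amount to a binding step. A minimal Python sketch follows; the `Signature` type, the `UsageError` exception and the `exampleSink` signature are hypothetical, not part of Flume.

```python
from dataclasses import dataclass, field
from typing import Any


class UsageError(Exception):
    """Raised when arguments do not fit a function's signature."""


@dataclass
class Signature:
    name: str
    required: list[str]
    optional: dict[str, Any] = field(default_factory=dict)  # name -> default, in order

    def bind(self, args: list[Any], kwargs: dict[str, Any]) -> dict[str, Any]:
        params = self.required + list(self.optional)
        if len(args) > len(params):
            raise UsageError(
                f"{self.name} takes at most {len(params)} argument(s), got {len(args)}"
            )
        bound = dict(zip(params, args))  # positional args fill params in order
        for key, value in kwargs.items():  # named args may skip earlier optionals
            if key not in params:
                raise UsageError(f"{self.name}: unknown argument '{key}'")
            if key in bound:
                raise UsageError(f"{self.name}: argument '{key}' given twice")
            bound[key] = value
        missing = [p for p in self.required if p not in bound]
        if missing:
            raise UsageError(
                f"{self.name}: missing required argument(s): {', '.join(missing)}"
            )
        for key, default in self.optional.items():
            bound.setdefault(key, default)  # untouched optionals keep their defaults
        return bound


# Hypothetical signature: one required argument followed by two optional ones.
EXAMPLE = Signature("exampleSink", ["path"], {"retries": 3, "compress": False})
```

Here `EXAMPLE.bind(["/tmp/x"], {"compress": True})` skips over `retries` and returns `{"path": "/tmp/x", "retries": 3, "compress": True}`.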
