Hey all,

We're big users of Flume, but now we're looking to integrate a workflow engine to manage dependencies between data imports, scheduled reports, and intermittent data generation for a Hadoop-based data warehouse & analytics system.

I thought I'd reach out to get the community's opinions:

- Do you use Yahoo (Apache) Oozie?
* What do you think of it? (pros/cons) 
* Would you recommend it?

- Do you use something else? 
* What do you think of it? (pros/cons) 
* Would you recommend it?

Any suggestions or comments would be greatly appreciated.

I'm reaching out to the Flume list because I'd be especially interested to hear about any bespoke Flume integrations the community has built (e.g., checking that data from all machines is available before starting a job).

Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com | @rathboma | 4sq