Hey Guys,

I've made a decent amount of progress and now have the settings correct. For completeness, they look like this:
agent.sinks.s3Sink.type = hdfs
agent.sinks.s3Sink.hdfs.path = s3://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@BUCKET-NAME/
You can see the full setup at this gist: https://gist.github.com/crowdmatt/5256881


However, I've run into the following problem:

2013-03-29 19:05:28,954 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:460)] process failed
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/FlumeData.1364583927762.tmp' on Host 'mybucket.s3.amazonaws.com' @ 'Fri, 29 Mar 2013 19:05:28 GMT' -- ResponseCode: 404, ResponseStatus: Not Found, RequestId: 00864FE1DCD5AD95, HostId: 68AuSUe/XsP9zUiwe4yqhhDjETjVEnXVuTdZjYKQfj6VBKyACLH++MD1i8xgrEE4
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:122)


Does anyone have any pointers on how I can start debugging?
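
One possible starting point, offered as a sketch rather than a known fix: raise the log level for the classes named in the stack trace, so the requests made before the failing HEAD show up in the log. This assumes the agent reads a log4j 1.x conf/log4j.properties, as a stock Flume 1.3.x install does. The logger names are the package prefixes from the trace above.

```properties
# Assumption: appended to Flume's conf/log4j.properties (log4j 1.x syntax)
# Package prefixes below are taken from the stack trace in this message
log4j.logger.org.apache.flume.sink.hdfs = DEBUG
log4j.logger.org.apache.hadoop.fs.s3native = DEBUG
log4j.logger.org.jets3t = DEBUG
```

Restarting the agent with these set should show whether the .tmp object was ever written before its metadata was requested.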

Best,
Matt
--
Matthew Moore
Co-Founder & CTO, CrowdMob Inc.
Mobile: (650) 888-5962

Need to schedule a meeting?  Invite me via Google Calendar!  matt@crowdmob.com


On Fri, Mar 29, 2013 at 8:47 AM, Matthew Moore <matt@crowdmob.com> wrote:
Hey,

Thanks for the links to the JIRAs.  It seems like someone implemented an S3BufferedWriter, which might be helpful in the future.

However, I'm still not sure how to set up the configuration (flume.conf) to use S3 as a sink.  Has anyone done that?

Best,
Matt


On Fri, Mar 29, 2013 at 7:49 AM, Brock Noland <brock@cloudera.com> wrote:
Sorry, I don't know much about this, but here are two relevant JIRAs:

https://issues.apache.org/jira/browse/FLUME-1228


On Fri, Mar 29, 2013 at 9:44 AM, Matthew Moore <matt@crowdmob.com> wrote:
Hey there,

I know this is a really newbish question, but I'm hoping to get a little assistance here so I'm not stuck guess-and-checking.

I'm trying to configure Flume NG (1.3.1), but I can't figure out how to set up the hdfs sink to use the S3 implementations.

I'm keeping track of my progress on this gist I made: https://gist.github.com/crowdmatt/5256881

From what I've gathered, I should be using the hdfs type, which I'm setting up as such:

agent.sinks = s3Sink
agent.sinks.s3Sink.type = hdfs
agent.sinks.s3Sink.channel = recoverableMemoryChannel

... but that's where I end up hitting my head against the wall.  I know I should be specifying my S3 access key, secret, and bucket in this format: s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-hdfs/

However, I don't know where to specify that, or what dot notation to use.
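
For reference, the follow-up at the top of this thread puts the URI in the sink's hdfs.path property. A minimal sketch under that assumption is below. The credentials and bucket name are placeholders, and the fileType and rollInterval values are illustrative choices, not settings taken from the gist.

```properties
agent.sinks = s3Sink
agent.sinks.s3Sink.type = hdfs
agent.sinks.s3Sink.channel = recoverableMemoryChannel
# Placeholders: substitute the real access key, secret, and bucket name
agent.sinks.s3Sink.hdfs.path = s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@BUCKET-NAME/
# Illustrative values only (assumptions, not from the original setup)
agent.sinks.s3Sink.hdfs.fileType = DataStream
agent.sinks.s3Sink.hdfs.rollInterval = 300
```

The dot notation follows the same pattern as the type and channel lines: agent name, then sinks, then the sink name, then the property.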

Can anyone point me in the right direction?

Best,
Matt



--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org