flume-user mailing list archives

From Jean-Philippe Caruana ...@target2sell.com>
Subject Re: support for Google Storage ?
Date Mon, 01 Dec 2014 14:35:02 GMT
Hi,

I managed to write to GS from Flume [1], but it is not fully working yet:
- files are created in the expected directories, but they are empty
- Flume throws a java.lang.OutOfMemoryError: Java heap space:

java.lang.OutOfMemoryError: Java heap space
    at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:79)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:820)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
    at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:96)

(complete stack trace here: http://pastebin.com/i5iSgCM3)

Has anyone experienced this already?
Is it a bug in Google's gcs-connector-latest-hadoop2.jar?
Where should I look to find out what is wrong?

My configuration looks like this:
a1.sinks.hdfs_sink.hdfs.path = gs://bucket_name/%{env}/%{tenant}/%{type}/%Y-%m-%d
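
As a rough sketch, a fuller sink definition around this path could look like the following. Only the hdfs.path line is taken from the configuration above; the channel name ch1 and the values chosen for hdfs.maxOpenFiles and hdfs.idleTimeout are illustrative assumptions, not settings known to fix the problem. They are shown because the stack trace fails while allocating a write buffer for a newly created file, so the number of files held open at once may be relevant.

```properties
# Hypothetical sketch: only the hdfs.path line comes from the message above.
a1.sinks.hdfs_sink.type = hdfs
# "ch1" is an assumed channel name.
a1.sinks.hdfs_sink.channel = ch1
a1.sinks.hdfs_sink.hdfs.path = gs://bucket_name/%{env}/%{tenant}/%{type}/%Y-%m-%d
# The stack trace goes through HDFSSequenceFile, which matches the default file type.
a1.sinks.hdfs_sink.hdfs.fileType = SequenceFile
# %Y-%m-%d needs a timestamp header on every event, or the local clock:
a1.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
# Each distinct env/tenant/type/date combination opens its own file.
# Assumed values that limit how many files stay open at the same time:
a1.sinks.hdfs_sink.hdfs.maxOpenFiles = 50
a1.sinks.hdfs_sink.hdfs.idleTimeout = 60
```

The %{env}, %{tenant} and %{type} escapes are resolved from event headers of the same names, so every event has to carry those headers.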

I am running Flume in a Docker container.

[1] http://stackoverflow.com/questions/27174033/what-is-the-minimal-setup-needed-to-write-to-hdfs-gs-on-google-cloud-storage-wit

Thanks.


On 26/11/2014 17:05, Jean-Philippe Caruana wrote:
> Hi,
>
> I am a total newbie with Hadoop, so sorry if my questions sound
> stupid (pointers are welcome).
>
> I would like to use Flume to send data to HDFS on Google Cloud:
> - does GS (Google Storage) support exist? It would be great to use a
> path like gs://some_path
> - where does the Flume agent need to run? When I see
> hdfs://some_path/, I wonder why there is no server address in the path
>
> In fact, I am looking for feedback about sending data to a Google Cloud
> Hadoop cluster from my own (on-premises) servers.
>
> Thanks
> -- 
> Jean-Philippe Caruana 
> http://www.barreverte.fr

-- 
Jean-Philippe Caruana 
http://www.barreverte.fr

