kafka-commits mailing list archives

From jkr...@apache.org
Subject svn commit: r1339417 - /incubator/kafka/site/projects.html
Date Wed, 16 May 2012 23:05:02 GMT
Author: jkreps
Date: Wed May 16 23:05:02 2012
New Revision: 1339417

URL: http://svn.apache.org/viewvc?rev=1339417&view=rev
Log:
Update project page.


Modified:
    incubator/kafka/site/projects.html

Modified: incubator/kafka/site/projects.html
URL: http://svn.apache.org/viewvc/incubator/kafka/site/projects.html?rev=1339417&r1=1339416&r2=1339417&view=diff
==============================================================================
--- incubator/kafka/site/projects.html (original)
+++ incubator/kafka/site/projects.html Wed May 16 23:05:02 2012
@@ -1,34 +1,33 @@
 <!--#include virtual="includes/header.html" -->
 
-<h1>Current Work</h1>
-<p>
-  Here is a <a href="https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+12311720+AND+labels+%3D+newbie">list of JIRAs</a> you can work on to contribute some quick and easy patches in Kafka.
-</p>
-
 <p>
-  Below is a list of major projects we know people are currently pursuing. If you have thoughts on these or want to help, please <a href="mailto: kafka-dev@incubator.apache.org">let us know</a>.
+  We try to flag projects that are good for people getting started with the code base; you can find the list of those projects <a href="https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&amp;jqlQuery=project+%3D+12311720+AND+labels+%3D+newbie">here</a>.
 </p>
 
-<h3>Improved Stream Processing Libraries</h3>
+<h1>Current Work</h1>
 
 <p>
-We recently added the rich producer library that allows partitioned message production. This combined with the partition affinity of the consumers, gives the ability to do partitioned stream processing. One thing that is not very well developed is the patterns and libraries to support this. What we have in mind is a scala DSL to make it easy to group, aggregate, and otherwise transforms these infinite streams.
+Below is a list of major projects we know people are currently pursuing. If you have thoughts on these or want to help, please <a href="mailto:kafka-dev@incubator.apache.org">let us know</a>.
 </p>
 
 <h3>Replication</h3>
 
 <p>
-Messages are currently written to a single broker with no replication between brokers. We would like to provide replication between brokers and expose options to the producer to block until a configurable number of replicas have acknowledged the message to allow the client to control the fault-tolerance semantics.
+Replication is currently the major focus for a number of us. This will turn Kafka into a fully replicated message log.
+</p>
+<p>
+What is replication? Messages are currently written to a single broker with no replication between brokers. We would like to provide replication between brokers and expose options that let the producer block until a configurable number of replicas have acknowledged a message, so the client can control the fault-tolerance semantics.
 </p>
-
-<h3>Compression</h3>
-
 <p>
-We have a patch that provides end-to-end message set compression from producer to broker and broker to consumer with no need for intervening decompression. We hope to add this feature soon.
+You can see more details on this plan <a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication">here</a>.

 </p>
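(A rough sketch of the producer-side contract this implies. Everything below is a hypothetical illustration, not the actual replication patch; none of these names exist in the code base.)

    // Hypothetical acknowledgement levels a producer could ask for.
    sealed trait AckLevel
    case object NoAck extends AckLevel                  // fire and forget
    case object LeaderAck extends AckLevel              // the leader has written it
    case class ReplicaAck(count: Int) extends AckLevel  // `count` replicas have it

    class ReplicatedProducer(brokers: Seq[String], acks: AckLevel) {
      // send() would return only once the requested acknowledgements arrive,
      // surfacing a timeout as an exception; wire protocol elided.
      def send(topic: String, key: Array[Byte], message: Array[Byte]): Unit = {
      }
    }

    // A client trades latency for safety by requiring two replica acks:
    val producer = new ReplicatedProducer(Seq("broker1:9092"), ReplicaAck(2))
    producer.send("clicks", "user1".getBytes, "clicked".getBytes)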
 
 <h1>Project Ideas</h1>
 
+<h3>Improved Stream Processing Libraries</h3>
+
+<p>
+Kafka supports partitioning data by key and doing distributed stream consumption and publication. It would be nice to have a small library for common processing operations like joins, filtering, grouping, etc.
+</p>
+
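(For a flavor of what such a library might offer, here is a toy sketch over plain Scala iterators. The operations are invented for illustration; a real library would run over Kafka streams, not local iterators.)

    // Model an (effectively infinite) stream as an iterator of messages.
    case class Message(key: String, value: Long)
    def stream: Iterator[Message] =
      Iterator.from(0).map(i => Message("user" + (i % 3), i.toLong))

    // Filtering: drop messages we don't care about.
    val interesting = stream.filter(_.value % 2 == 0)

    // Grouping + aggregation: sum values per key over 100-message windows.
    val windowedSums: Iterator[Map[String, Long]] =
      interesting.grouped(100).map { window =>
        window.groupBy(_.key).map { case (k, msgs) => k -> msgs.map(_.value).sum }
      }

    println(windowedSums.next()) // one aggregated window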
 <p>
 Below is a list of projects which would be great to have but haven't yet been started. Ping the <a href="http://groups.google.com/group/kafka-dev">mailing list</a> if you are interested in working on any of these.
 </p>
@@ -44,12 +43,6 @@ We offer a JVM-based client for producti
 We have an Hadoop InputFormat and OutputFormat that were contributed and are in use at LinkedIn. This code is in Java, though, which means it doesn't quite fit in well with the project. It would be good to convert this code to Scala to keep things consistent.
 </p>
 
-<h3>Long Poll</h3>
-
-<p>
-The consumer currently uses a simple polling mechanism. The fetch request always returns immediately, yielding no data if no new messages have arrived, and using a simple backoff mechanism when there are no new messages to avoid to frequent requests to the broker. This is efficient enough, but the lowest possible latency of the consumer is given by the polling frequency. It would be nice to enhance the consumer API to allow an option in the fetch request to have the server block for a given period of time waiting for data to be available rather than immediately returning and then waiting to poll again. This would provide somewhat improved latency in the low-throughput case where the consumer is often waiting for a message to arrive.
-</p>
-
 <h3>Syslogd Producer</h3>
 
 <p>
@@ -72,44 +65,10 @@ In this model, partitions are naturally 
 Currently consumer offsets are persisted in Zookeeper which works well for many use cases. There is no inherent reason the offsets need to be stored here, however. We should expose a pluggable interface to allow alternate storage mechanisms.
 </p>
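(A minimal sketch of what a pluggable interface here could look like; the trait and method names are hypothetical.)

    // The current Zookeeper-backed behavior would be one implementation.
    trait OffsetStorage {
      def commit(group: String, topic: String, partition: Int, offset: Long): Unit
      def fetch(group: String, topic: String, partition: Int): Option[Long]
    }

    // A trivial in-memory implementation, e.g. for tests.
    class InMemoryOffsetStorage extends OffsetStorage {
      private val offsets = scala.collection.mutable.Map[(String, String, Int), Long]()
      def commit(group: String, topic: String, partition: Int, offset: Long): Unit =
        offsets((group, topic, partition)) = offset
      def fetch(group: String, topic: String, partition: Int): Option[Long] =
        offsets.get((group, topic, partition))
    }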
 
-<h1>Recently Completed Projects</h1>
+<h3>REST Proxy</h3>
 
-The following are some recently completed projects from this list.
-
-<h3>Hadoop Consumer</h3>
-<p>
-Provide an InputFormat for Hadoop to allow running Map/Reduce jobs on top of Hadoop data.
-</p>
-
-<h3>Hadoop Producer</h3>
-<p>
-Provide an OutputFormat for Hadoop to allow Map/Reduce jobs to publish data to Kafka.
-</p>
-
-<h3>Console Consumer</h3>
-<p>
-The interaction with zookeeper and complexity of the elastic load balancing of consumers makes implementing the equivalent of the rich consumer interface outside of the JVM somewhat difficult (implementing the low-level fetch api is quite easy). A simple approach to this problem could work similar to Hadoop Streaming and simply provide a consumer which dumps to standard output in some user-controllable format. This can be piped to another program in any language which simply reads from standard input to receive the data from the stream.
-</p>
-
-<h3>Rich Producer Interface</h3>
 <p>
-The current producer connects to a single broker and publishes all data there. This feature would add a higher-level api would allow a cluster aware producer which would semantically map messages to kafka nodes and partitions. This allows partitioning the stream of messages with some semantic partition function based on some key in the message to spread them over broker machines&mdash;e.g. to ensure that all messages for a particular user go to a particular partition and hence appear in the same stream for the same consumer thread.
+It would be great to have a REST proxy for Kafka to ease integration with languages that don't have first-class clients. It would also make it easier for web applications to produce data to Kafka.
 </p>
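(As a rough illustration, such a proxy could accept one HTTP POST per message and hand the bytes to a producer. The sketch below uses the JDK's built-in HTTP server with a stubbed-out produce call; nothing here is an agreed-upon API.)

    import com.sun.net.httpserver.{HttpExchange, HttpServer}
    import java.net.InetSocketAddress

    // Stand-in for handing the message to a real Kafka producer.
    def produce(topic: String, message: Array[Byte]): Unit = ()

    val server = HttpServer.create(new InetSocketAddress(8080), 0)
    // POST /topics/<name> with the raw message as the request body.
    server.createContext("/topics/", (exchange: HttpExchange) => {
      val topic = exchange.getRequestURI.getPath.stripPrefix("/topics/")
      val body = exchange.getRequestBody.readAllBytes() // Java 9+
      produce(topic, body)
      exchange.sendResponseHeaders(204, -1) // 204 No Content, empty body
      exchange.close()
    })
    server.start()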
 
-<h1>Project ideas for Scalathon</h1>
-
-The following are some smaller features that you can hack on and play with Kafka -
-
-<h3>Restful producer API</h3>
-We need to make the Kafka server support RESTful producer requests. This allows Kafka to be used in any programming language without implementing the wire protocol in each language. It also makes it easier for web applications to produce data to Kafka. Please refer to the <a href="http://linkedin.jira.com/browse/KAFKA-71">JIRA</a> to contribute.
-
-<h3>Pluggable decoder for the consumer</h3>
-Since 0.6, the <a href="http://sna-projects.com/kafka/javadoc/current/">producer</a> allows a user to plug in an Encoder that converts data of type T to a Kafka message. We need to do the same thing on the consumer side, by allowing the user to plug in a Decoder that converts a message into an object of type T. Please refer to the <a href="http://linkedin.jira.com/browse/KAFKA-70">JIRA</a> to contribute.
-
-<h3>Producer ACK</h3>
-Currently, the <a href="http://sna-projects.com/kafka/javadoc/current/">producer</a> does not wait for an acknowledgement (ACK) from the Kafka server. The producer just sends the data across and the server appends it to the appropriate log for a topic, but doesn't send an ACK back to the producer. Ideally, after handling the producer's request and writing the data to the log, the server should send back and ACK to the producer. And the producer should proceed sending the next request only after it receives the ACK from the server. Please refer to the <a href="http://linkedin.jira.com/browse/KAFKA-16">JIRA</a> to contribute.
-
-<h3>Size based retention policy</h3>
-The kafka server garbage collects logs according to a time-based retention policy (log.retention.hours). Ideally, the server should also support a size based retention policy (log.retention.size) to prevent any one topic from occupying too much disk space. Please refer to the <a href="http://linkedin.jira.com/browse/KAFKA-3">JIRA</a> to contribute.
-
 <!--#include virtual="includes/footer.html" -->


