kafka-commits mailing list archives

From jkr...@apache.org
Subject svn commit: r1584942 - /kafka/site/081/configuration.html
Date Fri, 04 Apr 2014 23:51:20 GMT
Author: jkreps
Date: Fri Apr  4 23:51:20 2014
New Revision: 1584942

URL: http://svn.apache.org/r1584942
Log:
Misc. tweaks to the producer config documentation.


Modified:
    kafka/site/081/configuration.html

Modified: kafka/site/081/configuration.html
URL: http://svn.apache.org/viewvc/kafka/site/081/configuration.html?rev=1584942&r1=1584941&r2=1584942&view=diff
==============================================================================
--- kafka/site/081/configuration.html (original)
+++ kafka/site/081/configuration.html Fri Apr  4 23:51:20 2014
@@ -718,21 +718,21 @@ We are working on a replacement for our 
 	<th>Description</th>
 	</tr>
 	<tr>
-	<td>bootstrap.servers</td><td>list</td><td></td><td>high</td><td>A
list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
Data will be load balanced over all servers irrespective of which servers are specified here
for bootstrapping&mdash;this list only impacts the initial hosts used to discover the
full set of servers. This list should be in the form <code>host1:port1,host2:port2,...</code>.
Since these servers are just used for the initial connection to discover the full cluster
membership (which may change dynamically), this list need not contain the full set of servers
(you may want more than one, though, in case a server is down).</td></tr>
+	<td>bootstrap.servers</td><td>list</td><td></td><td>high</td><td>A
list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
Data will be load balanced over all servers irrespective of which servers are specified here
for bootstrapping&mdash;this list only impacts the initial hosts used to discover the
full set of servers. This list should be in the form <code>host1:port1,host2:port2,...</code>.
Since these servers are just used for the initial connection to discover the full cluster
membership (which may change dynamically), this list need not contain the full set of servers
(you may want more than one, though, in case a server is down). If no server in this list
is available, sending data will fail until one becomes available.</td></tr>
 	<tr>
-	<td>acks</td><td>string</td><td>1</td><td>high</td><td>The
number of acknowledgments the producer requires before considering a request complete. This
controls the  durability of records that are sent. The following settings are commonly useful:
 <ul> <li><code>acks=0</code> If set to zero then the producer will
not wait for any acknowledgment from the server at all. The record will be immediately added
to the socket buffer and considered sent. No guarantee can be made that the server has received
the record in this case, and the <code>retries</code> configuration will not take
effect (as the client won't generally know of any failures). The offset given back for each
message will always be set to -1. <li><code>acks=1</code> This will mean
the leader will write the record to its local log but will respond without awaiting full acknowledgement
from all followers. In this case should the leader fail immediately after acknowledging the
record but before the followers have replicated it then the record will be lost.
<li><code>acks=all</code> This means the
leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees
that the record will not be lost as long as at least one in-sync replica remains alive. This
is the strongest available guarantee. <li>Other settings such as <code>acks=2</code>
are also possible, and will require the given number of acknowledgements but this is generally
less useful.</td></tr>
+	<td>acks</td><td>string</td><td>1</td><td>high</td><td>The
number of acknowledgments the producer requires the leader to have received before considering
a request complete. This controls the  durability of records that are sent. The following
settings are common:  <ul> <li><code>acks=0</code> If set to zero
then the producer will not wait for any acknowledgment from the server at all. The record
will be immediately added to the socket buffer and considered sent. No guarantee can be made
that the server has received the record in this case, and the <code>retries</code>
configuration will not take effect (as the client won't generally know of any failures). The
offset given back for each record will always be set to -1. <li><code>acks=1</code>
This will mean the leader will write the record to its local log but will respond without
awaiting full acknowledgement from all followers. In this case should the leader fail immediately
after acknowledging the record but before the followers
  have replicated it then the record will be lost. <li><code>acks=all</code>
This means the leader will wait for the full set of in-sync replicas to acknowledge the record.
This guarantees that the record will not be lost as long as at least one in-sync replica remains
alive. This is the strongest available guarantee. <li>Other settings such as <code>acks=2</code>
are also possible, and will require the given number of acknowledgements but this is generally
less useful.</td></tr>
 	<tr>
 	<td>buffer.memory</td><td>long</td><td>33554432</td><td>high</td><td>The
total bytes of memory the producer can use to buffer records waiting to be sent to the server.
If records are sent faster than they can be delivered to the server the producer will either
block or throw an exception based on the preference specified by <code>block.on.buffer.full</code>.
<p>This setting should correspond roughly to the total memory the producer will use,
but is not a hard bound since not all memory the producer uses is used for buffering. Some
additional memory will be used for compression (if compression is enabled) as well as for
maintaining in-flight requests.</td></tr>
 	<tr>
 	<td>compression.type</td><td>string</td><td>none</td><td>high</td><td>The
compression type for all data generated by the producer. The default is none (i.e. no compression).
Valid  values are <code>none</code>, <code>gzip</code>, or <code>snappy</code>.
Compression is of full batches of data,  so the efficacy of batching will also impact the
compression ratio (more batching means better compression).</td></tr>
 	<tr>
-	<td>retries</td><td>int</td><td>0</td><td>high</td><td>Setting
a value greater than zero will cause the client to resend any record whose send fails with
a potentially transient error. Note that this retry is no different than if the client resent
the message upon receiving the error. Allowing retries will potentially change the ordering
of messages because if two messages are sent to a single partition, and the first fails and
is retried but the second succeeds, then the second message may appear first.</td></tr>
+	<td>retries</td><td>int</td><td>0</td><td>high</td><td>Setting
a value greater than zero will cause the client to resend any record whose send fails with
a potentially transient error. Note that this retry is no different than if the client resent
the record upon receiving the error. Allowing retries will potentially change the ordering
of records because if two records are sent to a single partition, and the first fails and
is retried but the second succeeds, then the second record may appear first.</td></tr>
 	<tr>
-	<td>batch.size</td><td>int</td><td>16384</td><td>medium</td><td>The
producer will attempt to batch records together into fewer requests whenever multiple records
are being sent to the same partition. This helps performance on both the client and the server.
This configuration controls the default batch size in bytes. <p>No attempt will be made
to batch records larger than this size. <p>Requests sent to brokers will contain multiple
batches, one for each partition there is data for. <p>A small batch size will make batching
less common and may reduce throughput (a batch size of zero will disable batching entirely).
A very large batch size may use memory a bit more wastefully as we will always allocate a
buffer of the specified batch size in anticipation of additional messages.</td></tr>
+	<td>batch.size</td><td>int</td><td>16384</td><td>medium</td><td>The
producer will attempt to batch records together into fewer requests whenever multiple records
are being sent to the same partition. This helps performance on both the client and the server.
This configuration controls the default batch size in bytes. <p>No attempt will be made
to batch records larger than this size. <p>Requests sent to brokers will contain multiple
batches, one for each partition with data available to be sent. <p>A small batch size
will make batching less common and may reduce throughput (a batch size of zero will disable
batching entirely). A very large batch size may use memory a bit more wastefully as we will
always allocate a buffer of the specified batch size in anticipation of additional records.</td></tr>
 	<tr>
 	<td>client.id</td><td>string</td><td></td><td>medium</td><td>The
id string to pass to the server when making requests. The purpose of this is to be able to
track the source of requests beyond just ip/port by allowing a logical application name to
be included with the request. The application can set any string it wants as this has no functional
purpose other than in logging and metrics.</td></tr>
 	<tr>
-	<td>linger.ms</td><td>long</td><td>0</td><td>medium</td><td>The
producer groups together any records that arrive in between request sends. Normally this occurs
only under load when records arrive faster than they can be sent out. However in some circumstances
the client may want to reduce the number of requests even under moderate load. This setting
accomplishes this by adding a small amount of artificial delay&mdash;that is, rather than
immediately sending out a record the producer will wait for up to the given delay to allow
other records to be sent so that the sends can be batched together. This can be thought of
as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay
for batching: once we get <code>batch.size</code> worth of records for a partition
it will be sent immediately regardless of this setting, however if we have fewer than this
many bytes accumulated for this partition we will 'linger' for the specified time waiting
for more records to show up. This setting defaults to 0 (i.e. no delay).</td></tr>
+	<td>linger.ms</td><td>long</td><td>0</td><td>medium</td><td>The
producer groups together any records that arrive in between request transmissions into a single
batched request. Normally this occurs only under load when records arrive faster than they
can be sent out. However in some circumstances the client may want to reduce the number of
requests even under moderate load. This setting accomplishes this by adding a small amount
of artificial delay&mdash;that is, rather than immediately sending out a record the producer
will wait for up to the given delay to allow other records to be sent so that the sends can
be batched together. This can be thought of as analogous to Nagle's algorithm in TCP. This
setting gives the upper bound on the delay for batching: once we get <code>batch.size</code>
worth of records for a partition it will be sent immediately regardless of this setting, however
if we have fewer than this many bytes accumulated for this partition we will 'linger' for
the specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay).
Setting <code>linger.ms=5</code>, for example, would have the effect of reducing
the number of requests sent but would add up to 5ms of latency to records sent in the absence
of load.</td></tr>
 	<tr>
 	<td>max.request.size</td><td>int</td><td>1048576</td><td>medium</td><td>The
maximum size of a request. This is also effectively a cap on the maximum record size. Note
that the server has its own cap on record size which may be different from this. This setting
will limit the number of record batches the producer will send in a single request to avoid
sending huge requests.</td></tr>
 	<tr>
@@ -740,17 +740,15 @@ We are working on a replacement for our 
 	<tr>
 	<td>send.buffer.bytes</td><td>int</td><td>131072</td><td>medium</td><td>The
size of the TCP send buffer to use when sending data</td></tr>
 	<tr>
-	<td>timeout.ms</td><td>int</td><td>30000</td><td>medium</td><td>The
configuration controls the maximum amount of time the server will wait for acknowledgments
from followers to meet the acknowledgment requirements the producer has specified with the
<code>acks</code> configuration. If the requested number of acknowledgments are
not met when the timeout ellipses an error will be returned. This timeout is measured on the
server side and does not include the network latency of the request.</td></tr>
+	<td>timeout.ms</td><td>int</td><td>30000</td><td>medium</td><td>The
configuration controls the maximum amount of time the server will wait for acknowledgments
from followers to meet the acknowledgment requirements the producer has specified with the
<code>acks</code> configuration. If the requested number of acknowledgments are
not met when the timeout elapses an error will be returned. This timeout is measured on the
server side and does not include the network latency of the request.</td></tr>
 	<tr>
-	<td>block.on.buffer.full</td><td>boolean</td><td>true</td><td>low</td><td>When
our memory buffer is exhausted we must either stop accepting new records (block) or throw
errors. By default this setting is true and we block, however in some scenarios blocking is
not desirable and it is better to immediately give an error. Setting this to <code>false</code>
will accomplish that.</td></tr>
-	<tr>
-	<td>metadata.fetch.backoff.ms</td><td>long</td><td>50</td><td>low</td><td>The
minimum amount of time between metadata refreshes. The client refreshes metadata whenever
it realizes its internal metadata is out of sync with the actual leadership of partitions.
This configuration specifies a backoff to prevent metadata refreshes from happening too frequently.</td></tr>
+	<td>block.on.buffer.full</td><td>boolean</td><td>true</td><td>low</td><td>When
our memory buffer is exhausted we must either stop accepting new records (block) or throw
errors. By default this setting is true and we block, however in some scenarios blocking is
not desirable and it is better to immediately give an error. Setting this to <code>false</code>
will accomplish that: the producer will throw a BufferExhaustedException if a record is sent
and the buffer space is full.</td></tr>
 	<tr>
 	<td>metadata.fetch.timeout.ms</td><td>long</td><td>60000</td><td>low</td><td>The
first time data is sent to a topic we must fetch metadata about that topic to know which servers
host the topic's partitions. This configuration controls the maximum amount of time we will
block waiting for the metadata fetch to succeed before throwing an exception back to the client.</td></tr>
 	<tr>
-	<td>metadata.max.age.ms</td><td>long</td><td>300000</td><td>low</td><td>The
period of time in milliseconds after which we force a refresh of metadata even if we haven't
seen any leadership changes to proactively discover any new brokers or partitions.</td></tr>
+	<td>metadata.max.age.ms</td><td>long</td><td>300000</td><td>low</td><td>The
period of time in milliseconds after which we force a refresh of metadata even if we haven't
seen any  partition leadership changes to proactively discover any new brokers or partitions.</td></tr>
 	<tr>
-	<td>metric.reporters</td><td>list</td><td>[]</td><td>low</td><td>A
list of classes to use as metrics reporters. Implementing the <code>MetricReporter</code>
interface allows plugging in classes that will be notified of new metric creation.</td></tr>
+	<td>metric.reporters</td><td>list</td><td>[]</td><td>low</td><td>A
list of classes to use as metrics reporters. Implementing the <code>MetricReporter</code>
interface allows plugging in classes that will be notified of new metric creation. The JmxReporter
is always included to register JMX statistics.</td></tr>
 	<tr>
 	<td>metrics.num.samples</td><td>int</td><td>2</td><td>low</td><td>The
number of samples maintained to compute metrics.</td></tr>
 	<tr>
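
For readers who want to see how the settings documented in the diff above fit together, here is a minimal sketch that collects several of them into a java.util.Properties object. The class name ProducerConfigSketch, the host names, the port 9092, the client id, and the chosen values are all assumptions made for illustration; only the configuration keys come from the documentation being modified. The sketch deliberately stops short of creating a producer, since that requires the Kafka client library.

```java
import java.util.Properties;

// Hypothetical helper class; the name is illustrative and not part of Kafka.
class ProducerConfigSketch {

    // Builds a Properties object using configuration keys described in the
    // diff above. Hosts, port, client id, and values are placeholders.
    static Properties build() {
        Properties props = new Properties();
        // Only used for the initial connection; more than one host is listed
        // in case a server is down.
        props.setProperty("bootstrap.servers", "host1:9092,host2:9092");
        // Wait for the full set of in-sync replicas to acknowledge each record.
        props.setProperty("acks", "all");
        // Resend on potentially transient errors; this may reorder records.
        props.setProperty("retries", "3");
        // Default batch size in bytes, per the documented default.
        props.setProperty("batch.size", "16384");
        // Wait up to 5 ms for more records before sending a partial batch.
        props.setProperty("linger.ms", "5");
        // Throw an error instead of blocking when the buffer is exhausted.
        props.setProperty("block.on.buffer.full", "false");
        // Logical application name used in logging and metrics.
        props.setProperty("client.id", "example-app");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

A Properties object like this would then be handed to the producer client when it is constructed. Note the trade-off the documentation describes: linger.ms=5 reduces the number of requests but can add up to 5ms of latency when there is no load.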


