kafka-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jkr...@apache.org
Subject svn commit: r1158009 - in /incubator/kafka/site: coding-guide.html contributing.html includes/header.html styles.css
Date Mon, 15 Aug 2011 21:03:48 GMT
Author: jkreps
Date: Mon Aug 15 21:03:48 2011
New Revision: 1158009

URL: http://svn.apache.org/viewvc?rev=1158009&view=rev
KAFKA-96 Add coding style guide and contributing page to website.


Added: incubator/kafka/site/coding-guide.html
URL: http://svn.apache.org/viewvc/incubator/kafka/site/coding-guide.html?rev=1158009&view=auto
--- incubator/kafka/site/coding-guide.html (added)
+++ incubator/kafka/site/coding-guide.html Mon Aug 15 21:03:48 2011
@@ -0,0 +1,101 @@
+<!--#include virtual="includes/header.html" -->
+<h1>Coding Guidelines</h1>
+These guidelines are meant to encourage consistency and best practices amongst people working
on the Kafka code base. They should be observed unless there is a compelling reason to ignore
+<h2>Basic Stuff</h2>
+<li>Avoid cryptic abbreviations. Single letter variable names are fine in very short
methods with few variables, otherwise make them informative.</li>
+<li>Clear code is preferable to comments. When possible make your naming so good you
don't need comments. When that isn't possible comments should be thought of as mandatory,
write them to be <i>read</i>.</li>
+<li>Logging, configuration, and public APIs are our "UI". Make them pretty, consistent,
and usable.</li>
+<li>There is not a maximum line length (certainly not 80 characters, we don't work
on punch cards any more), but be reasonable.</li>
+<li>Don't be sloppy. Don't check in commented out code: we use version control, it
is still there in the history. Don't leave TODOs in the code or FIXMEs if you can help it.
Don't leave println statements in the code. Hopefully this is all obvious.</li>
+<li>We want people to use our stuff, which means we need clear, correct documentation.
User documentation should be considered a part of any user-facing the feature, just like unit
tests or performance results.</li>
+<li>Don't duplicate code (duh).</li>
+<li>Kafka is system software, and certain things are appropriate in system software
that are not appropriate elsewhere. Sockets, bytes, concurrency, and distribution are our
core competency which means we will have a more "from scratch" implementation of some of these
things then would be appropriate for software elsewhere in the stack. This is because we need
to be exceptionally good at these things. This does not excuse fiddly low-level code, but
it does excuse spending a little extra time to make sure that our filesystem structures, networking
code, threading model, are all done perfectly right for our application rather than just trying
to glue together ill-fitting off-the-shelf pieces (well-fitting off-the-shelf pieces are great
+<li>Scala is a very flexible language. Use restraint. Magic cryptic one-liners do not
impress us, readability impresses us.</li>
+<li>Use <code>val</code>s when possible.</li>
+<li>Use private when possible for member variables.</li>
+<li>Method and member variable names should be in camel case with an initial lower
case character like <code>aMethodName</code>.</li>
+<li>Constants should be camel case with an initial capital <code>LikeThis</code>
not <code>LIKE_THIS</code>.</li>
+<li>Prefer a single top-level class per file for ease of finding things.</li>
+<li>Do not use semi-colons unless required.</li>
+<li>Avoid getters and setters.</li>
+<li>Perfer Option to null in scala APIs.</li>
+<li>Use named arguments when passing in literal values if the meaning is at all unclear,
for example instead of <code>Utils.delete(true)</code> prefer <code>Utils.delete(recursive=true)</code>.
+<li>Indentation is 2 spaces and never tabs. One could argue the right amount of indentation,
but 2 seems to be standard for scala and consistency is best here since there is clearly no
"right" way.</li>
+<li>Include the optional paranthesis on a no-arg method only if the method has a side-effect,
otherwise omit them. For example <code>fileChannel.force()</code> and <code>fileChannel.size</code>.
This helps emphasize that you are calling the method for the side effect, which is changing
some state, not just getting the return value.</li>
+<li>Logging is one third of our "UI" and it should be taken seriously. Please take
the time to assess the logs when making a change to ensure that the important things are getting
logged and there is no junk there.</li>
+<li>Logging statements should be complete sentences with proper capitalization that
are written to be read by a person not necessarily familiar with the source code. It is fine
to put in hacky little logging statements when debugging, but either clean them up or remove
them before checking in. So logging something like "INFO: entering SyncProducer send()" is
not appropriate.</li>
+<li>Logging should not mention class names or internal variables.</li>
+<li>There are four levels of logging <code>TRACE</code>, <code>DEBUG</code>,
<code>INFO</code>, <code>WARN</code>, <code>ERROR</code>,
and <code>FATAL</code>, they should be used as follows.
+	<ul>
+    <li><code>INFO</code> is the level you should assume the software will
be run in. INFO messages are things which are not bad but which the user will definitely want
to know about every time they occur.
+    <li><code>TRACE</code> and <code>DEBUG</code> are both
things you turn on when something is wrong and you want to figure out what is going on. <code>DEBUG</code>
should not be so fine grained that it will seriously effect the performance of the server.
<code>TRACE</code> can be anything. Both <code>DEBUG</code> and <code>TRACE</code>
statements should be wrapped in an <code>if(logger.isDebugEnabled)</code> check
to avoid pasting together big strings all the time.
+    <li><code>WARN</code> and <code>ERROR</code> indicate something
that is bad. Use <code>WARN</code> if you aren't totally sure it is bad, and <code>ERROR</code>
if you are.
+    <li>Use <code>FATAL</code> only right before calling <code>System.exit()</code>.
+	</ul>
+<li>Monitoring is the second third of our "UI" and it should also be taken seriously.</li>
+<li>We use JMX for monitoring.</li>
+<li>Any new features should come with appropriate monitoring to know the feature is
working correctly. This is at least as important as unit tests as it verifies production.</li>
+<h2>Unit Tests</h2>
+<li>New patches should come with unit tests that verify the functionality being added.</li>
+<li>Unit tests are first rate code, and should be treated like it. They should not
contain code duplication, cryptic hackery, or anything like that.</li>
+<li>Be aware of the methods in <code>kafka.utils.TestUtils</code>, they
make a lot of the below things easier to get right.</li>
+<li>Unit tests should test the least amount of code possible, don't start the whole
server unless there is no other way to test a single class or small group of classes in isolation.</li>
+<li>Tests should not depend on any external resources, they need to set up and tear
down their own stuff. This means if you want zookeeper it needs to be started and stopped,
you can't depend on it already being there. Likewise if you need a file with some data in
it, you need to write it in the beginning of the test and delete it (pass or fail).</li>
+<li>It is okay to use the filesystem and network in tests since that is our business
but you need to clean up after yourself. There are helpers for this in <code>TestUtils</code>.</li>
+<li>Do not use sleep or other timing assumptions in tests, it is always, always, always
wrong and will fail intermittently on any test server with other things going on that causes
delays. Write tests in such a way that they are not timing dependent. Seriously. One thing
that will help this is to never directly use the system clock in code (i.e. <code>System.currentTimeMillis</code>)
but instead to use the <code>kafka.utils.Time</code>. This is a trait that has
a mock implementation allowing you to programmatically and deterministically cause the passage
of time when you inject this mock clock instead of the system clock.</li>
+<li>It must be possible to run the tests in parallel, without having them collide.
This is a practical thing to allow multiple branches to CI on a single CI server. This means
you can't hard code directories or ports or things like that in tests because two instances
will step on each other. Again <code>TestUtils</code> has helpers for this stuff
(e.g. <code>TestUtils.choosePort</code> will find a free port for you).</li>
+<li>Configuration is the final third of our "UI".</li>
+<li>Names should be thought through from the point of view of the person using the
config, but often programmers choose configuration names that make sense for someone reading
the code.</li>
+<li>Often the value that makes most sense in configuration is <i>not</i>
the one most useful to program with. For example, let's say you want to throttle I/O to avoid
using up all the I/O bandwidth. The easiest thing to implement is to give a "sleep time" configuration
that let's the program sleep after doing I/O to throttle down its rate. But notice how hard
it is to correctly use this configuration parameter, the user has to figure out the rate of
I/O on the machine, and then do a bunch of arithmetic to calculate the right sleep time to
give the desired rate of I/O on the system. It is much, much, much better to just have the
user configure the maximum I/O rate they want to allow (say 5MB/sec) and then calculate the
appropriate sleep time from that and the actual I/O rate. Another way to say this is that
configuration should always be in terms of the quantity that the user knows, not the quantity
you want to use.</li>
+<li>Configuration is the answer to problems we can't solve up front for some reason--if
there is a way to just choose a best value do that instead.</li>
+<li>Configuration should come from the instance-level properties file. No additional
sources of config (environment variables, system properties, etc) should be added as these
usually inhibit running multiple instances of a broker on one machine.</li>
+<li>Encapsulate synchronization. That is, locks should be private member variables
within a class and only one class or method should need to be examined to verify the correctness
of the synchronization strategy.</li>
+<li>Annotate things as <code>@threadsafe</code> when they are supposed
to be and <code>@notthreadsafe</code> when they aren't to help track this stuff.</li>
+<li>There are a number of gotchas with threads and threadpools: is the daemon flag
set appropriately for your threads? are your threads being named in a way that will distinguish
their purpose in a thread dump? What happens when the number of queued tasks hits the limit
(do you drop stuff? do you block?).</li>
+<li>Prefer the java.util.concurrent packages to either low-level wait-notify, custom
locking/synchronization, or higher level scala-specific primitives. The util.concurrent stuff
is well thought out and actually works correctly. There is a generally feeling that threads
and locking are not going to be the concurrency primitives of the future because of a variety
of well-known weaknesses they have. This is probably true, but they have the advantage of
actually being mature enough to use for high-performance software <i>right now</i>;
their well-known deficiencies are easily worked around by equally well known best-practices.
So avoid actors, software transactional memory, tuple spaces, or anything else not written
by Doug Lea and used by at least a million other productions systems. :-)</li>
+<h2>Backwards Compatibility</h2>
+<li>Our policy is that we should always support backwards compatibility for one release
to enable no-downtime upgrades. This means the server MUST be able to support requests from
both old and new clients simultaneously. The code handling the "old" path can be removed in
the next release. A typical upgrade sequence would be server, consumers, clients.</li>
+<li>There are three things which require this binary compatibility: request objects,
persistent data structure (messages and message sets), and zookeeper structures and protocols.
The message binary structure has a "magic" byte to allow it to be evolved, this number should
be incremented when the format is changed and the number can be checked to apply the right
logic and fill in defaults appropriately. Network requests have a request id which serve a
similar purpose, any change to a request object must be accompanied by a change in the request
id. Any change here should be accompanied by compatibility tests that save requests or messages
in the old format as a binary file which is tested for compatibility with the new code.</li>
+<h2>Client Code</h2>
+<p>There are a few things that need to be considered in client code that are not a
major concern on the server side.</p>
+	<li>Libraries needed by the client should be avoided whenever possible. Clients are
run in someone else's code and it is very possible that they may have the same library we
have, but a different and incompatible version. This will mean they can't use our client.
For this reason the client should not use any libraries that are not strictly necessary.</li>
+<li>We should attempt to maintain API compatibility when possible, though at this point
in the project's lifecycle it is more important to make things good rather avoid breakage.</li>
+<!--#include virtual="includes/footer.html" -->
\ No newline at end of file

Added: incubator/kafka/site/contributing.html
URL: http://svn.apache.org/viewvc/incubator/kafka/site/contributing.html?rev=1158009&view=auto
--- incubator/kafka/site/contributing.html (added)
+++ incubator/kafka/site/contributing.html Mon Aug 15 21:03:48 2011
@@ -0,0 +1,14 @@
+<!--#include virtual="includes/header.html" -->
+<h1>How To Contribute</h1>
+We are always very happy to have code contributions whether for trivial cleanups or big new
features. To submit a patch for inclusion please do the following:
+	<li>Create a patch that applies cleanly against trunk.</li>
+	<li>If the patch is non-trivial we would prefer it come with some unit tests that
cover the new functionality</li>
+	<li>Make sure you have observed the recommendations in the <a href="coding-guide.html">style
+	<li>Open a <a href="https://issues.apache.org/jira/browse/KAFKA">JIRA</a>
ticket describing the patch and attach your patch to the JIRA. Include in the JIRA information
about the issue and the approach you took in fixing it (if this isn't obvious). Creating the
JIRA will automatically send an email to the developer email list alerting us of the new issue.</li>
+	<li>It is our job to follow up on patches, if your patch has sat for more than a week,
please ping <code>kafka-dev@incubator.apache.org</code> in case it somehow got
+<!--#include virtual="includes/footer.html" -->
\ No newline at end of file

Modified: incubator/kafka/site/includes/header.html
URL: http://svn.apache.org/viewvc/incubator/kafka/site/includes/header.html?rev=1158009&r1=1158008&r2=1158009&view=diff
--- incubator/kafka/site/includes/header.html (original)
+++ incubator/kafka/site/includes/header.html Mon Aug 15 21:03:48 2011
@@ -26,18 +26,24 @@
 			<div class="lsidebar">
 					<li><a href="downloads.html">download</a></li>
-					<li><a href="code.html">code</a></li>
-					<li><a href="api-docs/0.6">api docs</a></li>
+					<li><a href="api-docs/0.6">api&nbsp;docs</a></li>
 					<li><a href="quickstart.html">quickstart</a></li>
 					<li><a href="design.html">design</a></li>
 					<li><a href="configuration.html">configuration</a></li>
 					<li><a href="performance.html">performance</a></li>
-					<li><a href="projects.html">projects</a></li>
 					<li><a href="faq.html">faq</a></li>
 					<li><a href="https://cwiki.apache.org/confluence/display/KAFKA">wiki</li>
 					<li><a href="https://issues.apache.org/jira/browse/KAFKA">bugs</a></li>
 					<li><a href="contact.html">mailing&nbsp;lists</a></li>
-					<li><a href="http://test.project-voldemort.com:8080/">unit&nbsp;tests</a></li>
+					<li>developers
+						<ul>
+							<li><a href="code.html">code</a></li>
+							<li><a href="projects.html">projects</a></li>
+							<li><a href="contributing.html">contributing</a></li>
+							<li><a href="coding-guide.html">coding&nbsp;guide</a></li>
+							<li><a href="http://test.project-voldemort.com:8080/">unit&nbsp;tests</a></li>
+						</ul>
+					</li>
 		<div class='content'>
\ No newline at end of file

Modified: incubator/kafka/site/styles.css
URL: http://svn.apache.org/viewvc/incubator/kafka/site/styles.css?rev=1158009&r1=1158008&r2=1158009&view=diff
--- incubator/kafka/site/styles.css (original)
+++ incubator/kafka/site/styles.css Mon Aug 15 21:03:48 2011
@@ -58,6 +58,9 @@ a {
 	text-decoration: none;
 	color: #2e4a8e;
+.lsidebar li li {
+	list-style-type: circle;
 .content {
 	width: 700px;
 	margin-left: 200px;

View raw message