sqoop-commits mailing list archives

From jar...@apache.org
Subject git commit: SQOOP-1155: Sqoop 2 documentation for connector development
Date Thu, 15 Aug 2013 15:24:31 GMT
Updated Branches:
  refs/heads/sqoop2 c07c85758 -> d140c5ddd


SQOOP-1155: Sqoop 2 documentation for connector development

(Masatake Iwasaki via Jarek Jarcec Cecho)


Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/d140c5dd
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/d140c5dd
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/d140c5dd

Branch: refs/heads/sqoop2
Commit: d140c5ddd06f7b0179044184098b3521ab6227d0
Parents: c07c857
Author: Jarek Jarcec Cecho <jarcec@apache.org>
Authored: Thu Aug 15 08:23:54 2013 -0700
Committer: Jarek Jarcec Cecho <jarcec@apache.org>
Committed: Thu Aug 15 08:23:54 2013 -0700

----------------------------------------------------------------------
 docs/src/site/sphinx/ConnectorDevelopment.rst | 250 +++++++++++++++++++++
 docs/src/site/sphinx/index.rst                |   1 +
 2 files changed, 251 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/sqoop/blob/d140c5dd/docs/src/site/sphinx/ConnectorDevelopment.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/ConnectorDevelopment.rst b/docs/src/site/sphinx/ConnectorDevelopment.rst
new file mode 100644
index 0000000..918ca00
--- /dev/null
+++ b/docs/src/site/sphinx/ConnectorDevelopment.rst
@@ -0,0 +1,250 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+
+=============================
+Sqoop 2 Connector Development
+=============================
+
+This document describes how to implement a connector for Sqoop 2.
+
+
+What is a Connector?
+++++++++++++++++++++
+
+A connector provides interaction with external databases.
+It reads data from databases for import
+and writes data to databases for export.
+Interaction with Hadoop is taken care of by common modules of the Sqoop 2 framework.
+
+
+Connector Implementation
+++++++++++++++++++++++++
+
+The SqoopConnector class defines the functionality
+that every connector must provide.
+Each connector must extend SqoopConnector and override the methods shown below.
+::
+
+  public abstract String getVersion();
+  public abstract ResourceBundle getBundle(Locale locale);
+  public abstract Class getConnectionConfigurationClass();
+  public abstract Class getJobConfigurationClass(MJob.Type jobType);
+  public abstract Importer getImporter();
+  public abstract Exporter getExporter();
+  public abstract Validator getValidator();
+  public abstract MetadataUpgrader getMetadataUpgrader();
+
+The getImporter method returns an Importer_ instance,
+which is a placeholder for the modules needed for import.
+
+The getExporter method returns an Exporter_ instance,
+which is a placeholder for the modules needed for export.
+
+Methods such as getBundle, getConnectionConfigurationClass,
+getJobConfigurationClass and getValidator
+are related to `Connector configurations`_.
+
+
+Importer
+========
+
+The Connector#getImporter method returns an Importer instance,
+which is a placeholder for the modules needed for import,
+such as Partitioner_ and Extractor_.
+The built-in GenericJdbcConnector defines its Importer like this.
+::
+
+  private static final Importer IMPORTER = new Importer(
+      GenericJdbcImportInitializer.class,
+      GenericJdbcImportPartitioner.class,
+      GenericJdbcImportExtractor.class,
+      GenericJdbcImportDestroyer.class);
+  
+  ...
+  
+  @Override
+  public Importer getImporter() {
+    return IMPORTER;
+  }
+
+
+Extractor
+---------
+
+Extractor (the E in ETL) extracts data from an external database and
+writes it to the Sqoop framework for import.
+
+An Extractor must override the extract method.
+::
+
+  public abstract void extract(ExtractorContext context,
+                               ConnectionConfiguration connectionConfiguration,
+                               JobConfiguration jobConfiguration,
+                               Partition partition);
+
+The extract method extracts data from the database in a connector-specific way and
+writes it to the DataWriter (provided by the context) in the `Intermediate representation`_.
+
+The extract method must iterate until the data from the database is exhausted.
+::
+
+  while (resultSet.next()) {
+    ...
+    context.getDataWriter().writeArrayRecord(array);
+    ...
+  }
+
+
+Partitioner
+-----------
+
+Partitioner creates Partition instances based on the configurations.
+The number of Partition instances is used as the number of map tasks.
+Each Partition instance is passed to Extractor_ as an argument of the extract method.
+Extractor_ uses the Partition to determine which portion of the data to extract.
+
+There is no convention for Partition classes
+other than being Writable and toString()-able.
+::
+
+  public abstract class Partition {
+    public abstract void readFields(DataInput in) throws IOException;
+    public abstract void write(DataOutput out) throws IOException;
+    public abstract String toString();
+  }
+
+Each connector can design its own Partition class.
+
+
+Initializer and Destroyer
+-------------------------
+
+Initializer is instantiated before the MapReduce job is submitted
+to do preparation such as adding dependent jar files.
+
+Destroyer is instantiated after the MapReduce job has finished, for cleanup.
+
+
+Exporter
+========
+
+The Connector#getExporter method returns an Exporter instance,
+which is a placeholder for the modules needed for export,
+such as Loader_.
+The built-in GenericJdbcConnector defines its Exporter like this.
+::
+
+  private static final Exporter EXPORTER = new Exporter(
+      GenericJdbcExportInitializer.class,
+      GenericJdbcExportLoader.class,
+      GenericJdbcExportDestroyer.class);
+  
+  ...
+  
+  @Override
+  public Exporter getExporter() {
+    return EXPORTER;
+  }
+
+
+Loader
+------
+
+Loader (the L in ETL) receives data from the Sqoop framework and
+loads it into an external database.
+
+A Loader must override the load method.
+::
+
+  public abstract void load(LoaderContext context,
+                            ConnectionConfiguration connectionConfiguration,
+                            JobConfiguration jobConfiguration) throws Exception;
+
+The load method reads data from the DataReader (provided by the context)
+in the `Intermediate representation`_ and loads it into the database in a connector-specific way.
+
+The load method must iterate until the data from the DataReader is exhausted.
+::
+
+  while ((array = context.getDataReader().readArrayRecord()) != null) {
+    ...
+  }
+
+
+Initializer and Destroyer
+-------------------------
+
+Initializer is instantiated before the MapReduce job is submitted
+to do preparation such as adding dependent jar files.
+
+Destroyer is instantiated after the MapReduce job has finished, for cleanup.
+
+
+Connector Configurations
+++++++++++++++++++++++++
+
+Configurations
+==============
+
+The definitions of the configurations are represented
+by models defined in the org.apache.sqoop.model package.
+
+
+ConnectionConfigurationClass
+----------------------------
+
+
+JobConfigurationClass
+---------------------
+
+
+ResourceBundle
+==============
+
+Resources for Configurations_ are stored in a properties file
+accessed by the getBundle method of the connector.
+
+
+Validator
+=========
+
+Validator validates configurations set by users.
+
+
+Internals of Sqoop 2 MapReduce Job
+++++++++++++++++++++++++++++++++++
+
+Sqoop 2 provides common MapReduce modules such as SqoopMapper and SqoopReducer
+for both import and export.
+
+- InputFormat creates splits using Partitioner.
+
+- SqoopMapper invokes Extractor's extract method.
+
+- SqoopReducer does no actual work.
+
+- OutputFormat invokes Loader's load method (via SqoopOutputFormatLoadExecutor).
+
+.. todo: sequence diagram like figure.
+
+For import, the Extractor provided by the connector extracts data from databases,
+and the Loader provided by Sqoop 2 loads data into Hadoop.
+
+For export, the Extractor provided by Sqoop 2 extracts data from Hadoop,
+and the Loader provided by the connector loads data into databases.
+
+
+.. _`Intermediate representation`: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation

http://git-wip-us.apache.org/repos/asf/sqoop/blob/d140c5dd/docs/src/site/sphinx/index.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/index.rst b/docs/src/site/sphinx/index.rst
index 15ddfbb..6b03e62 100644
--- a/docs/src/site/sphinx/index.rst
+++ b/docs/src/site/sphinx/index.rst
@@ -61,3 +61,4 @@ Developer Guide
 - `Building Sqoop2 <BuildingSqoop2.html>`_
 - `Development Environment Setup <DevEnv.html>`_
 - `Java Client API Guide <ClientAPI.html>`_
+- `Developing Connectors <ConnectorDevelopment.html>`_

