sqoop-commits mailing list archives

From build...@apache.org
Subject svn commit: r853642 [2/17] - in /websites/staging/sqoop/trunk/content: ./ docs/1.4.3/ docs/1.4.3/api/ docs/1.4.3/api/com/ docs/1.4.3/api/com/cloudera/ docs/1.4.3/api/com/cloudera/sqoop/ docs/1.4.3/api/com/cloudera/sqoop/lib/ docs/1.4.3/api/com/cloudera...
Date Fri, 08 Mar 2013 16:45:42 GMT
Added: websites/staging/sqoop/trunk/content/docs/1.4.3/SqoopUserGuide.html
==============================================================================
--- websites/staging/sqoop/trunk/content/docs/1.4.3/SqoopUserGuide.html (added)
+++ websites/staging/sqoop/trunk/content/docs/1.4.3/SqoopUserGuide.html Fri Mar  8 16:45:37 2013
@@ -0,0 +1,2673 @@
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Sqoop User Guide (v1.4.3)</title><link rel="stylesheet" type="text/css" href="docbook.css"><meta name="generator" content="DocBook XSL Stylesheets V1.76.1"></head><body><div style="clear:both; margin-bottom: 4px"></div><div align="center"><a href="index.html"><img src="images/home.png" alt="Documentation Home"></a></div><span class="breadcrumbs"><div class="breadcrumbs"><span class="breadcrumb-node">Sqoop User Guide (v1.4.3)</span></div></span><div lang="en" class="article" title="Sqoop User Guide (v1.4.3)"><div class="titlepage"><div><div><h2 class="title"><a name="idp200976"></a>Sqoop User Guide (v1.4.3)</h2></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#_introduction">1. Introduction</a></span></dt><dt><span class="section"><a href="#_supported_releases">2. Supported Releases</a></span></dt><dt><span class="section"><a h
 ref="#_sqoop_releases">3. Sqoop Releases</a></span></dt><dt><span class="section"><a href="#_prerequisites">4. Prerequisites</a></span></dt><dt><span class="section"><a href="#_basic_usage">5. Basic Usage</a></span></dt><dt><span class="section"><a href="#_sqoop_tools">6. Sqoop Tools</a></span></dt><dd><dl><dt><span class="section"><a href="#_using_command_aliases">6.1. Using Command Aliases</a></span></dt><dt><span class="section"><a href="#_controlling_the_hadoop_installation">6.2. Controlling the Hadoop Installation</a></span></dt><dt><span class="section"><a href="#_using_generic_and_specific_arguments">6.3. Using Generic and Specific Arguments</a></span></dt><dt><span class="section"><a href="#_using_options_files_to_pass_arguments">6.4. Using Options Files to Pass Arguments</a></span></dt><dt><span class="section"><a href="#_using_tools">6.5. Using Tools</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_literal">7. <code class="literal"
 >sqoop-import</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose">7.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax">7.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.5. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.6. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.7. Incremental Imports</a></span></dt><
 dt><span class="section"><a href="#_file_formats">7.2.8. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.9. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.10. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.11. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.12. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span class="section"><a href="#_example_invocations">7.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_all_tables_literal">8. <code class="literal">sqoop-import-all-tables</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_2">8.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_2">8.2. Syntax</a></span></dt><dt><span class="sec
 tion"><a href="#_example_invocations_2">8.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_export_literal">9. <code class="literal">sqoop-export</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_3">9.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_3">9.2. Syntax</a></span></dt><dt><span class="section"><a href="#_inserts_vs_updates">9.3. Inserts vs. Updates</a></span></dt><dt><span class="section"><a href="#_exports_and_transactions">9.4. Exports and Transactions</a></span></dt><dt><span class="section"><a href="#_failed_exports">9.5. Failed Exports</a></span></dt><dt><span class="section"><a href="#_example_invocations_3">9.6. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#validation">10. <code class="literal">validation</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_4">10.1. Purpose</a></span></dt><dt><span class="se
 ction"><a href="#_introduction_2">10.2. Introduction</a></span></dt><dt><span class="section"><a href="#_syntax_4">10.3. Syntax</a></span></dt><dt><span class="section"><a href="#_configuration">10.4. Configuration</a></span></dt><dt><span class="section"><a href="#_limitations">10.5. Limitations</a></span></dt><dt><span class="section"><a href="#_example_invocations_4">10.6. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_saved_jobs">11. Saved Jobs</a></span></dt><dt><span class="section"><a href="#_literal_sqoop_job_literal">12. <code class="literal">sqoop-job</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_5">12.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_5">12.2. Syntax</a></span></dt><dt><span class="section"><a href="#_saved_jobs_and_passwords">12.3. Saved jobs and passwords</a></span></dt><dt><span class="section"><a href="#_saved_jobs_and_incremental_imports">12.4. Saved jobs and i
 ncremental imports</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_metastore_literal">13. <code class="literal">sqoop-metastore</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_6">13.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_6">13.2. Syntax</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_merge_literal">14. <code class="literal">sqoop-merge</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_7">14.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_7">14.2. Syntax</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_codegen_literal">15. <code class="literal">sqoop-codegen</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_8">15.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_8">15.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocation
 s_5">15.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_create_hive_table_literal">16. <code class="literal">sqoop-create-hive-table</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_9">16.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_9">16.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_6">16.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_eval_literal">17. <code class="literal">sqoop-eval</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_10">17.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_10">17.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_7">17.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_list_databases_literal">18. <code class="literal">sqoop-list-dat
 abases</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_11">18.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_11">18.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_8">18.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_list_tables_literal">19. <code class="literal">sqoop-list-tables</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_12">19.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_12">19.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_9">19.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_help_literal">20. <code class="literal">sqoop-help</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_13">20.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_13">20.2. Syntax</a></
 span></dt><dt><span class="section"><a href="#_example_invocations_10">20.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_version_literal">21. <code class="literal">sqoop-version</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_14">21.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_14">21.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_11">21.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_compatibility_notes">22. Compatibility Notes</a></span></dt><dd><dl><dt><span class="section"><a href="#_supported_databases">22.1. Supported Databases</a></span></dt><dt><span class="section"><a href="#_mysql">22.2. MySQL</a></span></dt><dd><dl><dt><span class="section"><a href="#_zerodatetimebehavior">22.2.1. zeroDateTimeBehavior</a></span></dt><dt><span class="section"><a href="#_literal_unsigned_literal_columns">22.2.2
 . <code class="literal">UNSIGNED</code> columns</a></span></dt><dt><span class="section"><a href="#_literal_blob_literal_and_literal_clob_literal_columns">22.2.3. <code class="literal">BLOB</code> and <code class="literal">CLOB</code> columns</a></span></dt><dt><span class="section"><a href="#_importing_views_in_direct_mode">22.2.4. Importing views in direct mode</a></span></dt><dt><span class="section"><a href="#_direct_mode_transactions">22.2.5. Direct-mode Transactions</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql">22.3. PostgreSQL</a></span></dt><dd><dl><dt><span class="section"><a href="#_importing_views_in_direct_mode_2">22.3.1. Importing views in direct mode</a></span></dt></dl></dd><dt><span class="section"><a href="#_oracle">22.4. Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_dates_and_times">22.4.1. Dates and Times</a></span></dt></dl></dd><dt><span class="section"><a href="#_schema_definition_in_hive">22.5. Schema De
 finition in Hive</a></span></dt></dl></dd><dt><span class="section"><a href="#_notes_for_specific_connectors">23. Notes for specific connectors</a></span></dt><dd><dl><dt><span class="section"><a href="#_mysql_jdbc_connector">23.1. MySQL JDBC Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_upsert_functionality">23.1.1. Upsert functionality</a></span></dt></dl></dd><dt><span class="section"><a href="#_microsoft_sql_connector">23.2. Microsoft SQL Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments">23.2.1. Extra arguments</a></span></dt><dt><span class="section"><a href="#_schema_support">23.2.2. Schema support</a></span></dt><dt><span class="section"><a href="#_table_hints">23.2.3. Table hints</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql_connector">23.3. PostgreSQL Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments_2">23.3.1. Extra arguments</a></span></dt><d
 t><span class="section"><a href="#_schema_support_2">23.3.2. Schema support</a></span></dt></dl></dd><dt><span class="section"><a href="#_pg_bulkload_connector">23.4. pg_bulkload connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_15">23.4.1. Purpose</a></span></dt><dt><span class="section"><a href="#_requirements">23.4.2. Requirements</a></span></dt><dt><span class="section"><a href="#_syntax_15">23.4.3. Syntax</a></span></dt><dt><span class="section"><a href="#_data_staging">23.4.4. Data Staging</a></span></dt></dl></dd></dl></dd><dt><span class="section"><a href="#_getting_support">24. Getting Support</a></span></dt><dt><span class="section"><a href="#_troubleshooting">25. Troubleshooting</a></span></dt><dd><dl><dt><span class="section"><a href="#_general_troubleshooting_process">25.1. General Troubleshooting Process</a></span></dt><dt><span class="section"><a href="#_specific_troubleshooting_tips">25.2. Specific Troubleshooting Tips</a></span></
 dt><dd><dl><dt><span class="section"><a href="#_oracle_connection_reset_errors">25.2.1. Oracle: Connection Reset Errors</a></span></dt><dt><span class="section"><a href="#_oracle_case_sensitive_catalog_query_errors">25.2.2. Oracle: Case-Sensitive Catalog Query Errors</a></span></dt><dt><span class="section"><a href="#_mysql_connection_failure">25.2.3. MySQL: Connection Failure</a></span></dt><dt><span class="section"><a href="#_oracle_ora_00933_error_sql_command_not_properly_ended">25.2.4. Oracle: ORA-00933 error (SQL command not properly ended)</a></span></dt><dt><span class="section"><a href="#_mysql_import_of_tinyint_1_from_mysql_behaves_strangely">25.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely</a></span></dt></dl></dd></dl></dd></dl></div><pre class="screen">  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.</pre><div class="section" title="1. Introduction"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_introduction"></a>1. Introduction</h2></div></div></div><p>Sqoop is a tool designed to transfer data between Hadoop and
+relational databases. You can use Sqoop to import data from a
+relational database management system (RDBMS) such as MySQL or Oracle
+into the Hadoop Distributed File System (HDFS),
+transform the data in Hadoop MapReduce, and then export the data back
+into an RDBMS.</p><p>Sqoop automates most of this process, relying on the database to
+describe the schema for the data to be imported. Sqoop uses MapReduce
+to import and export the data, which provides parallel operation as
+well as fault tolerance.</p><p>This document describes how to get started using Sqoop to move data
+between databases and Hadoop and provides reference information for
+the operation of the Sqoop command-line tool suite. This document is
+intended for:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
+System and application programmers
+</li><li class="listitem">
+System administrators
+</li><li class="listitem">
+Database administrators
+</li><li class="listitem">
+Data analysts
+</li><li class="listitem">
+Data engineers
+</li></ul></div></div><div class="section" title="2. Supported Releases"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_supported_releases"></a>2. Supported Releases</h2></div></div></div><p>This documentation applies to Sqoop v1.4.3.</p></div><div class="section" title="3. Sqoop Releases"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_sqoop_releases"></a>3. Sqoop Releases</h2></div></div></div><p>Sqoop is an open source software product of the Apache Software Foundation.</p><p>Software development for Sqoop occurs at <a class="ulink" href="http://sqoop.apache.org" target="_top">http://sqoop.apache.org</a>.
+At that site you can obtain:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
+New releases of Sqoop as well as its most recent source code
+</li><li class="listitem">
+An issue tracker
+</li><li class="listitem">
+A wiki that contains Sqoop documentation
+</li></ul></div></div><div class="section" title="4. Prerequisites"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_prerequisites"></a>4. Prerequisites</h2></div></div></div><p>The following prerequisite knowledge is required for this product:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
+Basic computer technology and terminology
+</li><li class="listitem">
+Familiarity with command-line interfaces such as <code class="literal">bash</code>
+</li><li class="listitem">
+Relational database management systems
+</li><li class="listitem">
+Basic familiarity with the purpose and operation of Hadoop
+</li></ul></div><p>Before you can use Sqoop, a release of Hadoop must be installed and
+configured. Sqoop currently supports four major Hadoop releases: 0.20,
+0.23, 1.0 and 2.0.</p><p>This document assumes you are using a Linux or Linux-like environment.
+If you are using Windows, you may be able to use cygwin to accomplish
+most of the following tasks. If you are using Mac OS X, you should see
+few (if any) compatibility errors. Sqoop is predominantly operated and
+tested on Linux.</p></div><div class="section" title="5. Basic Usage"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_basic_usage"></a>5. Basic Usage</h2></div></div></div><p>With Sqoop, you can <span class="emphasis"><em>import</em></span> data from a relational database system into
+HDFS. The input to the import process is a database table. Sqoop
+will read the table row-by-row into HDFS. The output of this import
+process is a set of files containing a copy of the imported table.
+The import process is performed in parallel. For this reason, the
+output will be in multiple files. These files may be delimited text
+files (for example, with commas or tabs separating each field), or
+binary Avro or SequenceFiles containing serialized record data.</p><p>A by-product of the import process is a generated Java class which
+can encapsulate one row of the imported table. This class is used
+during the import process by Sqoop itself. The Java source code for
+this class is also provided to you, for use in subsequent MapReduce
+processing of the data. This class can serialize and deserialize data
+to and from the SequenceFile format. It can also parse the
+delimited-text form of a record. These abilities allow you to quickly
+develop MapReduce applications that use the HDFS-stored records in
+your processing pipeline. You are also free to parse the delimited
+record data yourself, using any other tools you prefer.</p><p>After manipulating the imported records (for example, with MapReduce
+or Hive) you may have a result data set which you can then <span class="emphasis"><em>export</em></span>
+back to the relational database. Sqoop&#8217;s export process will read
+a set of delimited text files from HDFS in parallel, parse them into
+records, and insert them as new rows in a target database table, for
+consumption by external applications or users.</p><p>Sqoop includes some other commands which allow you to inspect the
+database you are working with. For example, you can list the available
+database schemas (with the <code class="literal">sqoop-list-databases</code> tool) and tables
+within a schema (with the <code class="literal">sqoop-list-tables</code> tool). Sqoop also
+includes a primitive SQL execution shell (the <code class="literal">sqoop-eval</code> tool).</p><p>Most aspects of the import, code generation, and export processes can
+be customized. You can control the specific row range or columns imported.
+You can specify particular delimiters and escape characters for the
+file-based representation of the data, as well as the file format
+used.  You can also control the class or package names used in
+generated code. Subsequent sections of this document explain how to
+specify these and other arguments to Sqoop.</p></div><div class="section" title="6. Sqoop Tools"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_sqoop_tools"></a>6. Sqoop Tools</h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="#_using_command_aliases">6.1. Using Command Aliases</a></span></dt><dt><span class="section"><a href="#_controlling_the_hadoop_installation">6.2. Controlling the Hadoop Installation</a></span></dt><dt><span class="section"><a href="#_using_generic_and_specific_arguments">6.3. Using Generic and Specific Arguments</a></span></dt><dt><span class="section"><a href="#_using_options_files_to_pass_arguments">6.4. Using Options Files to Pass Arguments</a></span></dt><dt><span class="section"><a href="#_using_tools">6.5. Using Tools</a></span></dt></dl></div><p>Sqoop is a collection of related tools. To use Sqoop, you specify the
+tool you want to use and the arguments that control the tool.</p><p>If Sqoop is compiled from its own source, you can run Sqoop without a formal
+installation process by running the <code class="literal">bin/sqoop</code> program. Users
+of a packaged deployment of Sqoop (such as an RPM shipped with Apache Bigtop)
+will see this program installed as <code class="literal">/usr/bin/sqoop</code>. The remainder of this
+documentation will refer to this program as <code class="literal">sqoop</code>. For example:</p><pre class="screen">$ sqoop tool-name [tool-arguments]</pre><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The following examples that begin with a <code class="literal">$</code> character indicate
+that the commands must be entered at a terminal prompt (such as
+<code class="literal">bash</code>). The <code class="literal">$</code> character represents the prompt itself; you should
+not start these commands by typing a <code class="literal">$</code>. Commands also appear
+inline in the text of a paragraph; for example, <code class="literal">sqoop help</code>. These
+examples do not show a <code class="literal">$</code> prefix, but you should enter them the same
+way.  Don&#8217;t confuse the <code class="literal">$</code> shell prompt in the examples with the <code class="literal">$</code>
+that precedes an environment variable name. For example, the string
+literal <code class="literal">$HADOOP_HOME</code> includes a "<code class="literal">$</code>".</p></td></tr></table></div><p>Sqoop ships with a help tool. To display a list of all available
+tools, type the following command:</p><pre class="screen">$ sqoop help
+usage: sqoop COMMAND [ARGS]
+
+Available commands:
+  codegen            Generate code to interact with database records
+  create-hive-table  Import a table definition into Hive
+  eval               Evaluate a SQL statement and display the results
+  export             Export an HDFS directory to a database table
+  help               List available commands
+  import             Import a table from a database to HDFS
+  import-all-tables  Import tables from a database to HDFS
+  list-databases     List available databases on a server
+  list-tables        List available tables in a database
+  version            Display version information
+
+See 'sqoop help COMMAND' for information on a specific command.</pre><p>You can display help for a specific tool by entering: <code class="literal">sqoop help
+(tool-name)</code>; for example, <code class="literal">sqoop help import</code>.</p><p>You can also add the <code class="literal">--help</code> argument to any command: <code class="literal">sqoop import
+--help</code>.</p><div class="section" title="6.1. Using Command Aliases"><div class="titlepage"><div><div><h3 class="title"><a name="_using_command_aliases"></a>6.1. Using Command Aliases</h3></div></div></div><p>In addition to typing the <code class="literal">sqoop (toolname)</code> syntax, you can use alias
+scripts that specify the <code class="literal">sqoop-(toolname)</code> syntax. For example, the
+scripts <code class="literal">sqoop-import</code>, <code class="literal">sqoop-export</code>, etc. each select a specific
+tool.</p></div><div class="section" title="6.2. Controlling the Hadoop Installation"><div class="titlepage"><div><div><h3 class="title"><a name="_controlling_the_hadoop_installation"></a>6.2. Controlling the Hadoop Installation</h3></div></div></div><p>You invoke Sqoop through the program launch capability provided by
+Hadoop. The <code class="literal">sqoop</code> command-line program is a wrapper which runs the
+<code class="literal">bin/hadoop</code> script shipped with Hadoop. If you have multiple
+installations of Hadoop present on your machine, you can select the
+Hadoop installation by setting the <code class="literal">$HADOOP_COMMON_HOME</code> and
+<code class="literal">$HADOOP_MAPRED_HOME</code> environment variables.</p><p>For example:</p><pre class="screen">$ HADOOP_COMMON_HOME=/path/to/some/hadoop \
+  HADOOP_MAPRED_HOME=/path/to/some/hadoop-mapreduce \
+  sqoop import --arguments...</pre><p>or:</p><pre class="screen">$ export HADOOP_COMMON_HOME=/some/path/to/hadoop
+$ export HADOOP_MAPRED_HOME=/some/path/to/hadoop-mapreduce
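+$ # Optional: select a Hadoop configuration directory as well. The path below is a placeholder, not part of the original example.
+$ export HADOOP_CONF_DIR=/some/path/to/hadoop/conf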
+$ sqoop import --arguments...</pre><p>If either of these variables is not set, Sqoop will fall back to
+<code class="literal">$HADOOP_HOME</code>. If it is not set either, Sqoop will use the default
+installation locations for Apache Bigtop, <code class="literal">/usr/lib/hadoop</code> and
+<code class="literal">/usr/lib/hadoop-mapreduce</code>, respectively.</p><p>The active Hadoop configuration is loaded from <code class="literal">$HADOOP_HOME/conf/</code>,
+unless the <code class="literal">$HADOOP_CONF_DIR</code> environment variable is set.</p></div><div class="section" title="6.3. Using Generic and Specific Arguments"><div class="titlepage"><div><div><h3 class="title"><a name="_using_generic_and_specific_arguments"></a>6.3. Using Generic and Specific Arguments</h3></div></div></div><p>To control the operation of each Sqoop tool, you use generic and
+specific arguments.</p><p>For example:</p><pre class="screen">$ sqoop help import
+usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]
+
+Common arguments:
+   --connect &lt;jdbc-uri&gt;     Specify JDBC connect string
+   --connection-manager &lt;class-name&gt;     Specify connection manager class to use
+   --driver &lt;class-name&gt;    Manually specify JDBC driver class to use
+   --hadoop-mapred-home &lt;dir&gt;+      Override $HADOOP_MAPRED_HOME
+   --help                   Print usage instructions
+   -P                       Read password from console
+   --password &lt;password&gt;    Set authentication password
+   --username &lt;username&gt;    Set authentication username
+   --verbose                Print more information while working
+   --hadoop-home &lt;dir&gt;+     Deprecated. Override $HADOOP_HOME
+
+[...]
+
+Generic Hadoop command-line arguments:
+(must preceed any tool-specific arguments)
+Generic options supported are
+-conf &lt;configuration file&gt;     specify an application configuration file
+-D &lt;property=value&gt;            use value for given property
+-fs &lt;local|namenode:port&gt;      specify a namenode
+-jt &lt;local|jobtracker:port&gt;    specify a job tracker
+-files &lt;comma separated list of files&gt;    specify comma separated files to be copied to the map reduce cluster
+-libjars &lt;comma separated list of jars&gt;    specify comma separated jar files to include in the classpath.
+-archives &lt;comma separated list of archives&gt;    specify comma separated archives to be unarchived on the compute machines.
+
+The general command line syntax is
+bin/hadoop command [genericOptions] [commandOptions]</pre><p>You must supply the generic arguments <code class="literal">-conf</code>, <code class="literal">-D</code>, and so on after the
+tool name but <span class="strong"><strong>before</strong></span> any tool-specific arguments (such as
+<code class="literal">--connect</code>). Note that generic Hadoop arguments are preceded by a
+single dash character (<code class="literal">-</code>), whereas tool-specific arguments start
+with two dashes (<code class="literal">--</code>), unless they are single character arguments such as <code class="literal">-P</code>.</p><p>The <code class="literal">-conf</code>, <code class="literal">-D</code>, <code class="literal">-fs</code> and <code class="literal">-jt</code> arguments control the configuration
+and Hadoop server settings. For example, <code class="literal">-D mapred.job.name=&lt;job_name&gt;</code> sets
+the name of the MapReduce job that Sqoop launches. If it is not specified,
+the name defaults to the jar name for the job, which is derived from the
+table name.</p><p>The <code class="literal">-files</code>, <code class="literal">-libjars</code>, and <code class="literal">-archives</code> arguments are not typically used with
+Sqoop, but they are included as part of Hadoop&#8217;s internal argument-parsing
+system.</p></div><div class="section" title="6.4. Using Options Files to Pass Arguments"><div class="titlepage"><div><div><h3 class="title"><a name="_using_options_files_to_pass_arguments"></a>6.4. Using Options Files to Pass Arguments</h3></div></div></div><p>When using Sqoop, the command line options that do not change from
+invocation to invocation can be put in an options file for convenience.
+An options file is a text file in which each line identifies an option, in
+the order in which it would otherwise appear on the command line. An options
+file can spread a single option over multiple lines by ending each
+intermediate line with a backslash character. Options files also support
+comments, which begin with the hash character.
+Comments must be specified on a new line and may not be mixed with
+option text. All comments and empty lines are ignored when options
+files are expanded. Unless options appear as quoted strings, any
+leading or trailing spaces are ignored. A quoted string, if used, must
+not extend beyond the line on which it is specified.</p><p>Options files can be specified anywhere on the command line as long as
+the options within them follow the otherwise prescribed rules of
+options ordering. For instance, regardless of where the options are
+loaded from, they must follow the ordering such that generic options
+appear first, tool-specific options next, followed finally by options
+that are intended to be passed to child programs.</p><p>To specify an options file, simply create an options file in a
+convenient location and pass it to the command line via the
+<code class="literal">--options-file</code> argument.</p><p>Whenever an options file is specified, it is expanded on the
+command line before the tool is invoked. You can specify more than
+one options file within the same invocation if needed.</p><p>For example, the following Sqoop invocation for import can
+be specified alternatively as shown below:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
+
+$ sqoop --options-file /users/homer/work/import.txt --table TEST</pre><p>where the options file <code class="literal">/users/homer/work/import.txt</code> contains the following:</p><pre class="screen">import
+--connect
+jdbc:mysql://localhost/db
+--username
+foo</pre><p>The options file can have empty lines and comments for readability purposes.
+So the above example would work exactly the same if the options file
+<code class="literal">/users/homer/work/import.txt</code> contained the following:</p><pre class="screen">#
+# Options file for Sqoop import
+#
+
+# Specifies the tool being invoked
+import
+
+# Connect parameter and value
+--connect
+jdbc:mysql://localhost/db
+
+# Username parameter and value
+--username
+foo
+
+#
+# Remaining options should be specified in the command line.
+#</pre></div><div class="section" title="6.5. Using Tools"><div class="titlepage"><div><div><h3 class="title"><a name="_using_tools"></a>6.5. Using Tools</h3></div></div></div><p>The following sections will describe each tool&#8217;s operation. The
+tools are listed in the most likely order you will find them useful.</p></div></div><div class="section" title="7. sqoop-import"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_literal_sqoop_import_literal"></a>7. <code class="literal">sqoop-import</code></h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="#_purpose">7.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax">7.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.5. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.6. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.7. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.8. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.9. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.10. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.11. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.12. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span class="section"><a href="#_example_invocations">7.3. Example Invocations</a></span></dt></dl></div><div class="section" title="7.1. Purpose"><div class="titlepage"><div><div><h3 class="title"><a name="_purpose"></a>7.1. Purpose</h3></div></div></div><p>The <code class="literal">import</code> tool imports an individual table from an RDBMS to HDFS.
+Each row from a table is represented as a separate record in HDFS.
+Records can be stored as text files (one record per line), or in
+binary representation as Avro or SequenceFiles.</p></div><div class="section" title="7.2. Syntax"><div class="titlepage"><div><div><h3 class="title"><a name="_syntax"></a>7.2. Syntax</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.5. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.6. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.7. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.8. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.9. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.10. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.11. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.12. Additional Import Configuration Properties</a></span></dt></dl></div><pre class="screen">$ sqoop import (generic-args) (import-args)
+$ sqoop-import (generic-args) (import-args)</pre><p>While the Hadoop generic arguments must precede any import arguments,
+you can type the import arguments in any order with respect to one
+another.</p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>In this document, arguments are grouped into collections
+organized by function. Some collections are present in several tools
+(for example, the "common" arguments). An extended description of their
+functionality is given only on the first presentation in this
+document.</p></td></tr></table></div><div class="table"><a name="idp5975584"></a><p class="title"><b>Table 1. Common arguments</b></p><div class="table-contents"><table summary="Common arguments" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 0.5pt solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--connect &lt;jdbc-uri&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Specify JDBC connect string
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--connection-manager &lt;class-name&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Specify connection manager class to                                          use
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--driver &lt;class-name&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Manually specify JDBC driver class                                          to use
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--hadoop-mapred-home &lt;dir&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Override $HADOOP_MAPRED_HOME
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--help</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Print usage instructions
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">-P</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Read password from console
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--password &lt;password&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Set authentication password
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--username &lt;username&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Set authentication username
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--verbose</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Print more information while working
+    </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">
+    <code class="literal">--connection-param-file &lt;filename&gt;</code>
+    </td><td style="" align="left">
+    Optional properties file that                                          provides connection parameters
+    </td></tr></tbody></table></div></div><br class="table-break"><div class="section" title="7.2.1. Connecting to a Database Server"><div class="titlepage"><div><div><h4 class="title"><a name="_connecting_to_a_database_server"></a>7.2.1. Connecting to a Database Server</h4></div></div></div><p>Sqoop is designed to import tables from a database into HDFS. To do
+so, you must specify a <span class="emphasis"><em>connect string</em></span> that describes how to connect to the
+database. The <span class="emphasis"><em>connect string</em></span> is similar to a URL, and is communicated to
+Sqoop with the <code class="literal">--connect</code> argument. This describes the server and
+database to connect to; it may also specify the port. For example:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees</pre><p>This string will connect to a MySQL database named <code class="literal">employees</code> on the
+host <code class="literal">database.example.com</code>. It&#8217;s important that you <span class="strong"><strong>do not</strong></span> use the hostname
+<code class="literal">localhost</code> if you intend to use Sqoop with a distributed Hadoop
+cluster. The connect string you supply will be used on TaskTracker nodes
+throughout your MapReduce cluster; if you specify the
+literal name <code class="literal">localhost</code>, each node will connect to a different
+database (or more likely, no database at all). Instead, you should use
+the full hostname or IP address of the database host that can be seen
+by all your remote nodes.</p><p>You might need to authenticate against the database before you can
+access it. You can use the <code class="literal">--username</code> and <code class="literal">--password</code> or <code class="literal">-P</code> parameters
+to supply a username and a password to the database. For example:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \
+    --username aaron --password 12345</pre><div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th align="left">Warning</th></tr><tr><td align="left" valign="top"><p>The <code class="literal">--password</code> parameter is insecure, as other users may
+be able to read your password from the command-line arguments via
+the output of programs such as <code class="literal">ps</code>. The <span class="strong"><strong><code class="literal">-P</code></strong></span> argument will read
+a password from a console prompt, and is the preferred method of
+entering credentials. Credentials may still be transferred between
+nodes of the MapReduce cluster using insecure means.</p></td></tr></table></div><p>Sqoop automatically supports several databases, including MySQL.  Connect
+strings beginning with <code class="literal">jdbc:mysql://</code> are handled automatically in Sqoop.  (A
+full list of databases with built-in support is provided in the "Supported
+Databases" section. For some, you may need to install the JDBC driver
+yourself.)</p><p>You can use Sqoop with any other
+JDBC-compliant database. First, download the appropriate JDBC
+driver for the type of database you want to import from, and install the .jar
+file in the <code class="literal">$SQOOP_HOME/lib</code> directory on your client machine. (This will
+be <code class="literal">/usr/lib/sqoop/lib</code> if you installed from an RPM or Debian package.)
+Each driver <code class="literal">.jar</code> file also has a specific driver class which defines
+the entry-point to the driver. For example, MySQL&#8217;s Connector/J library has
+a driver class of <code class="literal">com.mysql.jdbc.Driver</code>. Refer to your database
+vendor-specific documentation to determine the main driver class.
+This class must be provided as an argument to Sqoop with <code class="literal">--driver</code>.</p><p>For example, to connect to a SQLServer database, first download the driver from
+microsoft.com and install it in your Sqoop lib path.</p><p>Then run Sqoop. For example:</p><pre class="screen">$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
+    --connect &lt;connect-string&gt; ...</pre><p>When connecting to a database using JDBC, you can optionally specify extra
+JDBC parameters via a property file using the option
+<code class="literal">--connection-param-file</code>. The contents of this file are parsed as standard
+Java properties and passed into the driver while creating a connection.</p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The parameters specified via the optional property file are only
+applicable to JDBC connections. Any fastpath connectors that use connections
+other than JDBC will ignore these parameters.</p></td></tr></table></div><div class="table"><a name="idp6025328"></a><p class="title"><b>Table 2. Validation arguments <a class="link" href="#validation" title="10. validation">More Details</a></b></p><div class="table-contents"><table summary="Validation arguments More Details" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 0.5pt solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--validate</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
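+    Enable validation of data copied; supports single table copy only.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--validator &lt;class-name&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Specify validator class to use.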
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--validation-threshold &lt;class-name&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Specify validation threshold class                                           to use.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">
+    <code class="literal">--validation-failurehandler &lt;class-name&gt;</code>
+    </td><td style="" align="left">
+    Specify validation failure handler class to use.
+    </td></tr></tbody></table></div></div><br class="table-break"><div class="table"><a name="idp6049968"></a><p class="title"><b>Table 3. Import control arguments:</b></p><div class="table-contents"><table summary="Import control arguments:" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 0.5pt solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--append</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Append data to an existing dataset                                  in HDFS
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--as-avrodatafile</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Imports data to Avro Data Files
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--as-sequencefile</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Imports data to SequenceFiles
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--as-textfile</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Imports data as plain text (default)
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--boundary-query &lt;statement&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Boundary query to use for creating splits
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--columns &lt;col,col,col&#8230;&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Columns to import from table
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--direct</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Use direct import fast path
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--direct-split-size &lt;n&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Split the input stream every <span class="emphasis"><em>n</em></span> bytes                                  when importing in direct mode
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--fetch-size &lt;n&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Number of entries to read from database                                  at once.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--inline-lob-limit &lt;n&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Set the maximum size for an inline LOB
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">-m,--num-mappers &lt;n&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Use <span class="emphasis"><em>n</em></span> map tasks to import in parallel
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">-e,--query &lt;statement&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Import the results of <span class="emphasis"><em><code class="literal">statement</code></em></span>.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--split-by &lt;column-name&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Column of the table used to split work                                  units
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--table &lt;table-name&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Table to read
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--target-dir &lt;dir&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    HDFS destination dir
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--warehouse-dir &lt;dir&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    HDFS parent for table destination
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--where &lt;where clause&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    WHERE clause to use during import
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">-z,--compress</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Enable compression
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--compression-codec &lt;c&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Use Hadoop codec (default gzip)
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--null-string &lt;null-string&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    The string to be written for a null                                  value for string columns
+    </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">
+    <code class="literal">--null-non-string &lt;null-string&gt;</code>
+    </td><td style="" align="left">
+    The string to be written for a null                                  value for non-string columns
+    </td></tr></tbody></table></div></div><br class="table-break"><p>The <code class="literal">--null-string</code> and <code class="literal">--null-non-string</code> arguments are optional.
+If not specified, then the string "null" will be used.</p></div><div class="section" title="7.2.2. Selecting the Data to Import"><div class="titlepage"><div><div><h4 class="title"><a name="_selecting_the_data_to_import"></a>7.2.2. Selecting the Data to Import</h4></div></div></div><p>Sqoop typically imports data in a table-centric fashion. Use the
+<code class="literal">--table</code> argument to select the table to import. For example, <code class="literal">--table
+employees</code>. This argument can also identify a <code class="literal">VIEW</code> or other table-like
+entity in a database.</p><p>By default, all columns within a table are selected for import.
+Imported data is written to HDFS in its "natural order"; that is, a
+table containing columns A, B, and C results in an import of data such
+as:</p><pre class="screen">A1,B1,C1
+A2,B2,C2
+...</pre><p>You can select a subset of columns and control their ordering by using
+the <code class="literal">--columns</code> argument. This should include a comma-delimited list
+of columns to import. For example: <code class="literal">--columns "name,employee_id,jobtitle"</code>.</p><p>You can control which rows are imported by adding a SQL <code class="literal">WHERE</code> clause
+to the import statement. By default, Sqoop generates statements of the
+form <code class="literal">SELECT &lt;column list&gt; FROM &lt;table name&gt;</code>. You can append a
+<code class="literal">WHERE</code> clause to this with the <code class="literal">--where</code> argument. For example: <code class="literal">--where
+"id &gt; 400"</code>. Only rows where the <code class="literal">id</code> column has a value greater than
+400 will be imported.</p><p>By default, Sqoop uses the query <code class="literal">select min(&lt;split-by&gt;), max(&lt;split-by&gt;) from
+&lt;table name&gt;</code> to find the boundaries for creating splits. In some cases this query
+is not optimal, so you can specify any arbitrary query returning two
+numeric columns using the <code class="literal">--boundary-query</code> argument.</p></div><div class="section" title="7.2.3. Free-form Query Imports"><div class="titlepage"><div><div><h4 class="title"><a name="_free_form_query_imports"></a>7.2.3. Free-form Query Imports</h4></div></div></div><p>Sqoop can also import the result set of an arbitrary SQL query. Instead of
+using the <code class="literal">--table</code>, <code class="literal">--columns</code> and <code class="literal">--where</code> arguments, you can specify
+a SQL statement with the <code class="literal">--query</code> argument.</p><p>When importing a free-form query, you must specify a destination directory
+with <code class="literal">--target-dir</code>.</p><p>If you want to import the results of a query in parallel, then each map task
+will need to execute a copy of the query, with results partitioned by bounding
+conditions inferred by Sqoop. Your query must include the token <code class="literal">$CONDITIONS</code>
+which each Sqoop process will replace with a unique condition expression.
+You must also select a splitting column with <code class="literal">--split-by</code>.</p><p>For example:</p><pre class="screen">$ sqoop import \
+  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
+  --split-by a.id --target-dir /user/foo/joinresults</pre><p>Alternatively, the query can be executed once and imported serially, by
+specifying a single map task with <code class="literal">-m 1</code>:</p><pre class="screen">$ sqoop import \
+  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
+  -m 1 --target-dir /user/foo/joinresults</pre><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>If you are issuing the query wrapped with double quotes ("),
+you will have to use <code class="literal">\$CONDITIONS</code> instead of just <code class="literal">$CONDITIONS</code>
+to prevent your shell from treating it as a shell variable.
+For example, a double quoted query may look like:
+<code class="literal">"SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"</code></p></td></tr></table></div><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>Support for free-form queries in the current version of Sqoop
+is limited to simple queries where there are no ambiguous projections and
+no <code class="literal">OR</code> conditions in the <code class="literal">WHERE</code> clause. Use of complex queries such as
+queries that have sub-queries or joins leading to ambiguous projections can
+lead to unexpected results.</p></td></tr></table></div></div><div class="section" title="7.2.4. Controlling Parallelism"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_parallelism"></a>7.2.4. Controlling Parallelism</h4></div></div></div><p>Sqoop imports data in parallel from most database sources. You can
+specify the number
+of map tasks (parallel processes) to use to perform the import by
+using the <code class="literal">-m</code> or <code class="literal">--num-mappers</code> argument. Each of these arguments
+takes an integer value which corresponds to the degree of parallelism
+to employ. By default, four tasks are used. Some databases may see
+improved performance by increasing this value to 8 or 16. Do not
+increase the degree of parallelism beyond that available within
+your MapReduce cluster; tasks will run serially and will likely
+increase the amount of time required to perform the import. Likewise,
+do not increase the degree of parallelism higher than that which your
+database can reasonably support. Connecting 100 concurrent clients to
+your database may increase the load on the database server to a point
+where performance suffers as a result.</p><p>When performing parallel imports, Sqoop needs a criterion by which it
+can split the workload. Sqoop uses a <span class="emphasis"><em>splitting column</em></span> to split the
+workload. By default, Sqoop will identify the primary key column (if
+present) in a table and use it as the splitting column. The low and
+high values for the splitting column are retrieved from the database,
+and the map tasks operate on evenly-sized components of the total
+range. For example, if you had a table with a primary key column of
+<code class="literal">id</code> whose minimum value was 0 and maximum value was 1000, and Sqoop
+was directed to use 4 tasks, Sqoop would run four processes which each
+execute SQL statements of the form <code class="literal">SELECT * FROM sometable WHERE id
+&gt;= lo AND id &lt; hi</code>, with <code class="literal">(lo, hi)</code> set to (0, 250), (250, 500),
+(500, 750), and (750, 1001) in the different tasks.</p><p>If the actual values for the primary key are not uniformly distributed
+across its range, then this can result in unbalanced tasks. In that case, you should
+explicitly choose a different column with the <code class="literal">--split-by</code> argument.
+For example, <code class="literal">--split-by employee_id</code>. Sqoop cannot currently split on
+multi-column indices. If your table has no index column, or has a
+multi-column key, then you must also manually choose a splitting
+column.</p></div><div class="section" title="7.2.5. Controlling the Import Process"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_the_import_process"></a>7.2.5. Controlling the Import Process</h4></div></div></div><p>By default, the import process will use JDBC which provides a
+reasonable cross-vendor import channel. Some databases can perform
+imports in a more high-performance fashion by using database-specific
+data movement tools. For example, MySQL provides the <code class="literal">mysqldump</code> tool
+which can export data from MySQL to other systems very quickly. By
+supplying the <code class="literal">--direct</code> argument, you are specifying that Sqoop
+should attempt the direct import channel. This channel may be
+higher performance than using JDBC. Currently, direct mode does not
+support imports of large object columns.</p><p>When importing from PostgreSQL in conjunction with direct mode, you
+can split the import into separate files after
+individual files reach a certain size. This size limit is controlled
+with the <code class="literal">--direct-split-size</code> argument.</p><p>By default, Sqoop will import a table named <code class="literal">foo</code> to a directory named
+<code class="literal">foo</code> inside your home directory in HDFS. For example, if your
+username is <code class="literal">someuser</code>, then the import tool will write to
+<code class="literal">/user/someuser/foo/(files)</code>. You can adjust the parent directory of
+the import with the <code class="literal">--warehouse-dir</code> argument. For example:</p><pre class="screen">$ sqoop import --connect &lt;connect-str&gt; --table foo --warehouse-dir /shared \
+    ...</pre><p>This command would write to a set of files in the <code class="literal">/shared/foo/</code> directory.</p><p>You can also explicitly choose the target directory, like so:</p><pre class="screen">$ sqoop import --connect &lt;connect-str&gt; --table foo --target-dir /dest \
+    ...</pre><p>This will import the files into the <code class="literal">/dest</code> directory. <code class="literal">--target-dir</code> is
+incompatible with <code class="literal">--warehouse-dir</code>.</p><p>When using direct mode, you can specify additional arguments which
+should be passed to the underlying tool. If the argument
+<code class="literal">--</code> is given on the command-line, then subsequent arguments are sent
+directly to the underlying tool. For example, the following adjusts
+the character set used by <code class="literal">mysqldump</code>:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
+    --direct -- --default-character-set=latin1</pre><p>By default, imports go to a new target location. If the destination directory
+already exists in HDFS, Sqoop will refuse to import rather than overwrite that
+directory&#8217;s contents. If you use the <code class="literal">--append</code> argument, Sqoop will import
+data to a temporary directory and then rename the files into the normal
+target directory in a manner that does not conflict with existing filenames
+in that directory.</p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>When using the direct mode of import, certain database client utilities
+are expected to be present in the shell path of the task process. For MySQL
+the utilities <code class="literal">mysqldump</code> and <code class="literal">mysqlimport</code> are required, whereas for
+PostgreSQL the utility <code class="literal">psql</code> is required.</p></td></tr></table></div></div><div class="section" title="7.2.6. Controlling type mapping"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_type_mapping"></a>7.2.6. Controlling type mapping</h4></div></div></div><p>Sqoop is preconfigured to map most SQL types to appropriate Java or Hive
+representations. However, the default mapping might not be suitable for
+everyone and can be overridden with <code class="literal">--map-column-java</code> (for changing
+mapping to Java) or <code class="literal">--map-column-hive</code> (for changing Hive mapping).</p><div class="table"><a name="idp6953328"></a><p class="title"><b>Table 4. Parameters for overriding mapping</b></p><div class="table-contents"><table summary="Parameters for overriding mapping" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 0.5pt solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--map-column-java &lt;mapping&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Override mapping from SQL to Java type                                  for configured columns.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">
+    <code class="literal">--map-column-hive &lt;mapping&gt;</code>
+    </td><td style="" align="left">
+    Override mapping from SQL to Hive type                                  for configured columns.
+    </td></tr></tbody></table></div></div><br class="table-break"><p>Sqoop expects a comma-separated list of mappings in the form &lt;name of column&gt;=&lt;new type&gt;. For example:</p><pre class="screen">$ sqoop import ... --map-column-java id=String,value=Integer</pre><p>Sqoop will raise an exception if any configured mapping is not used.</p></div><div class="section" title="7.2.7. Incremental Imports"><div class="titlepage"><div><div><h4 class="title"><a name="_incremental_imports"></a>7.2.7. Incremental Imports</h4></div></div></div><p>Sqoop provides an incremental import mode which can be used to retrieve
+only rows newer than some previously-imported set of rows.</p><p>The following arguments control incremental imports:</p><div class="table"><a name="idp6966736"></a><p class="title"><b>Table 5. Incremental import arguments:</b></p><div class="table-contents"><table summary="Incremental import arguments:" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 0.5pt solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--check-column (col)</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Specifies the column to be examined                               when determining which rows to import.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--incremental (mode)</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Specifies how Sqoop determines which                               rows are new. Legal values for <code class="literal">mode</code>                              include <code class="literal">append</code> and <code class="literal">lastmodified</code>.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">
+    <code class="literal">--last-value (value)</code>
+    </td><td style="" align="left">
+    Specifies the maximum value of the                               check column from the previous import.
+    </td></tr></tbody></table></div></div><br class="table-break"><p>Sqoop supports two types of incremental imports: <code class="literal">append</code> and <code class="literal">lastmodified</code>.
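+</p><p>As a minimal sketch of how the arguments in Table 5 combine, the following command
+would import only rows whose <code class="literal">id</code> value is greater than 1000. The table
+<code class="literal">foo</code>, the column <code class="literal">id</code>, and the value 1000 are
+placeholders chosen for illustration:</p><pre class="screen">$ sqoop import --connect &lt;connect-str&gt; --table foo \
+    --incremental append --check-column id --last-value 1000</pre><p>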
+You can use the <code class="literal">--incremental</code> argument to specify the type of incremental
+import to perform.</p><p>You should specify <code class="literal">append</code> mode when importing a table where new rows are
+continually being added with increasing row id values. You specify the column
+containing the row&#8217;s id with <code class="literal">--check-column</code>. Sqoop imports rows where the
+check column has a value greater than the one specified with <code class="literal">--last-value</code>.</p><p>An alternate table update strategy supported by Sqoop is called <code class="literal">lastmodified</code>
+mode. You should use this when rows of the source table may be updated, and
+each such update will set the value of a last-modified column to the current
+timestamp.  Rows where the check column holds a timestamp more recent than the
+timestamp specified with <code class="literal">--last-value</code> are imported.</p><p>At the end of an incremental import, the value which should be specified as
+<code class="literal">--last-value</code> for a subsequent import is printed to the screen. When running
+a subsequent import, you should specify <code class="literal">--last-value</code> in this way to ensure
+you import only the new or updated data. This is handled automatically by
+creating an incremental import as a saved job, which is the preferred
+mechanism for performing a recurring incremental import. See the section on
+saved jobs later in this document for more information.</p></div><div class="section" title="7.2.8. File Formats"><div class="titlepage"><div><div><h4 class="title"><a name="_file_formats"></a>7.2.8. File Formats</h4></div></div></div><p>You can import data in one of three file formats: delimited text,
+SequenceFiles, or Avro data files.</p><p>Delimited text is the default import format. You can also specify it
+explicitly by using the <code class="literal">--as-textfile</code> argument. This argument will write
+string-based representations of each record to the output files, with
+delimiter characters between individual columns and rows. These
+delimiters may be commas, tabs, or other characters. (The delimiters
+can be selected; see "Output line formatting arguments.") The
+following are the results of an example text-based import:</p><pre class="screen">1,here is a message,2010-05-01
+2,happy new year!,2010-01-01
+3,another message,2009-11-12</pre><p>Delimited text is appropriate for most non-binary data types. It also
+readily supports further manipulation by other tools, such as Hive.</p><p>SequenceFiles are a binary format that store individual records in
+custom record-specific data types. These data types are manifested as
+Java classes. Sqoop will automatically generate these data types for
+you. This format supports exact storage of all data in binary
+representations, and is appropriate for storing binary data
+(for example, <code class="literal">VARBINARY</code> columns), or data that will be principally
+manipulated by custom MapReduce programs (reading from SequenceFiles
+is higher-performance than reading from text files, as records do not
+need to be parsed).</p><p>Avro data files are a compact, efficient binary format that provides
+interoperability with applications written in other programming
+languages.  Avro also supports versioning, so that when, e.g., columns
+are added or removed from a table, previously imported data files can
+be processed along with new ones.</p><p>By default, data is not compressed. You can compress your data by
+using the deflate (gzip) algorithm with the <code class="literal">-z</code> or <code class="literal">--compress</code>
+argument, or specify any Hadoop compression codec using the
+<code class="literal">--compression-codec</code> argument. This applies to SequenceFile, text,
+and Avro files.</p></div><div class="section" title="7.2.9. Large Objects"><div class="titlepage"><div><div><h4 class="title"><a name="_large_objects"></a>7.2.9. Large Objects</h4></div></div></div><p>Sqoop handles large objects (<code class="literal">BLOB</code> and <code class="literal">CLOB</code> columns) in particular
+ways. If this data is truly large, then these columns should not be
+fully materialized in memory for manipulation, as most columns are.
+Instead, their data is handled in a streaming fashion. Large objects
+can be stored inline with the rest of the data, in which case they are
+fully materialized in memory on every access, or they can be stored in
+a secondary storage file linked to the primary data storage. By
+default, large objects less than 16 MB in size are stored inline with
+the rest of the data. At a larger size, they are stored in files in
+the <code class="literal">_lobs</code> subdirectory of the import target directory. These files
+are stored in a separate format optimized for large record storage,
+which can accommodate records of up to 2^63 bytes each. The size at
+which lobs spill into separate files is controlled by the
+<code class="literal">--inline-lob-limit</code> argument, which takes a parameter specifying the
+largest lob size to keep inline, in bytes. If you set the inline LOB
+limit to 0, all large objects will be placed in external
+storage.</p><div class="table"><a name="idp7006192"></a><p class="title"><b>Table 6. Output line formatting arguments:</b></p><div class="table-contents"><table summary="Output line formatting arguments:" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 0.5pt solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--enclosed-by &lt;char&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Sets a required field enclosing                                    character
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--escaped-by &lt;char&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Sets the escape character
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--fields-terminated-by &lt;char&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Sets the field separator character
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--lines-terminated-by &lt;char&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Sets the end-of-line character
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--mysql-delimiters</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Uses MySQL&#8217;s default delimiter set:                                   fields: <code class="literal">,</code>  lines: <code class="literal">\n</code>                                    escaped-by: <code class="literal">\</code>                                    optionally-enclosed-by: <code class="literal">'</code>
+    </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">
+    <code class="literal">--optionally-enclosed-by &lt;char&gt;</code>
+    </td><td style="" align="left">
+    Sets a field enclosing character
+    </td></tr></tbody></table></div></div><br class="table-break"><p>When importing to delimited files, the choice of delimiter is
+important. Delimiters which appear inside string-based fields may
+cause ambiguous parsing of the imported data by subsequent analysis
+passes. For example, the string <code class="literal">"Hello, pleased to meet you"</code> should
+not be imported with the end-of-field delimiter set to a comma.</p><p>Delimiters may be specified as:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
+a character (<code class="literal">--fields-terminated-by X</code>)
+</li><li class="listitem"><p class="simpara">
+an escape character (<code class="literal">--fields-terminated-by \t</code>). Supported escape
+  characters are:
+</p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem">
+<code class="literal">\b</code> (backspace)
+</li><li class="listitem">
+<code class="literal">\n</code> (newline)
+</li><li class="listitem">
+<code class="literal">\r</code> (carriage return)
+</li><li class="listitem">
+<code class="literal">\t</code> (tab)
+</li><li class="listitem">
+<code class="literal">\"</code> (double-quote)
+</li><li class="listitem">
+<code class="literal">\'</code> (single-quote)
+</li><li class="listitem">
+<code class="literal">\\</code> (backslash)
+</li><li class="listitem">
+<code class="literal">\0</code> (NUL) - This will insert NUL characters between fields or lines,
+  or will disable enclosing/escaping if used for one of the <code class="literal">--enclosed-by</code>,
+  <code class="literal">--optionally-enclosed-by</code>, or <code class="literal">--escaped-by</code> arguments.
+</li></ul></div></li><li class="listitem">
+The octal representation of a UTF-8 character&#8217;s code point. This
+  should be of the form <code class="literal">\0ooo</code>, where <span class="emphasis"><em>ooo</em></span> is the octal value.
+  For example, <code class="literal">--fields-terminated-by \001</code> would yield the <code class="literal">^A</code> character.
+</li><li class="listitem">
+The hexadecimal representation of a UTF-8 character&#8217;s code point. This
+  should be of the form <code class="literal">\0xhhh</code>, where <span class="emphasis"><em>hhh</em></span> is the hex value.
+  For example, <code class="literal">--fields-terminated-by \0x0D</code> would yield the carriage
+  return character.
+</li></ul></div><p>The default delimiters are a comma (<code class="literal">,</code>) for fields, a newline (<code class="literal">\n</code>) for records, no quote
+character, and no escape character. Note that this can lead to
+ambiguous/unparsable records if you import database records containing
+commas or newlines in the field data. For unambiguous parsing, both an
+enclosing character and an escape character must be enabled, for example via <code class="literal">--mysql-delimiters</code>.</p><p>If unambiguous delimiters cannot be presented, then use <span class="emphasis"><em>enclosing</em></span> and
+<span class="emphasis"><em>escaping</em></span> characters. The combination of (optional)
+enclosing and escaping characters will allow unambiguous parsing of
+lines. For example, suppose one column of a dataset contained the
+following values:</p><pre class="screen">Some string, with a comma.
+Another "string with quotes"</pre><p>The following arguments would provide delimiters which can be
+unambiguously parsed:</p><pre class="screen">$ sqoop import --fields-terminated-by , --escaped-by \\ --enclosed-by '\"' ...</pre><p>(Note that to prevent the shell from mangling the enclosing character,
+we have enclosed that argument itself in single-quotes.)</p><p>The result of the above arguments applied to the above dataset would
+be:</p><pre class="screen">"Some string, with a comma.","1","2","3"...
+"Another \"string with quotes\"","4","5","6"...</pre><p>Here the imported strings are shown in the context of additional
+columns (<code class="literal">"1","2","3"</code>, etc.) to demonstrate the full effect of enclosing
+and escaping. The enclosing character is only strictly necessary when
+delimiter characters appear in the imported text. The enclosing
+character can therefore be specified as optional:</p><pre class="screen">$ sqoop import --optionally-enclosed-by '\"' (the rest as above)...</pre><p>Which would result in the following import:</p><pre class="screen">"Some string, with a comma.",1,2,3...
+"Another \"string with quotes\"",4,5,6...</pre><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>Even though Hive supports escaping characters, it does not
+handle escaping of the newline character. Also, it does not support
+the notion of enclosing characters that may include field delimiters
+in the enclosed string.  It is therefore recommended that you choose
+unambiguous field and record-terminating delimiters without the help
+of escaping and enclosing characters when working with Hive; this is
+due to limitations of Hive&#8217;s input parsing abilities.</p></td></tr></table></div><p>The <code class="literal">--mysql-delimiters</code> argument is a shorthand argument which uses
+the default delimiters for the <code class="literal">mysqldump</code> program.
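+</p><p>As an illustration, assuming the defaults listed in Table 6, the following two
+commands should select the same output delimiters. The connect string and the table name
+<code class="literal">bar</code> are placeholders:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
+    --mysql-delimiters
+
+$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
+    --fields-terminated-by , --lines-terminated-by '\n' \
+    --escaped-by \\ --optionally-enclosed-by "'"</pre><p>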
+If you use the <code class="literal">mysqldump</code> delimiters in conjunction with a
+direct-mode import (with <code class="literal">--direct</code>), very fast imports can be
+achieved.</p><p>While the choice of delimiters is most important for a text-mode
+import, it is still relevant if you import to SequenceFiles with
+<code class="literal">--as-sequencefile</code>. The generated class' <code class="literal">toString()</code> method
+will use the delimiters you specify, so subsequent formatting of
+the output data will rely on the delimiters you choose.</p><div class="table"><a name="idp7068784"></a><p class="title"><b>Table 7. Input parsing arguments:</b></p><div class="table-contents"><table summary="Input parsing arguments:" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 0.5pt solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--input-enclosed-by &lt;char&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Sets a required field encloser
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--input-escaped-by &lt;char&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Sets the input escape                                          character
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--input-fields-terminated-by &lt;char&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Sets the input field separator
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--input-lines-terminated-by &lt;char&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Sets the input end-of-line                                          character
+    </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left">
+    <code class="literal">--input-optionally-enclosed-by &lt;char&gt;</code>
+    </td><td style="" align="left">
+    Sets a field enclosing                                          character
+    </td></tr></tbody></table></div></div><br class="table-break"><p>When Sqoop imports data to HDFS, it generates a Java class which can
+reinterpret the text files that it creates when doing a
+delimited-format import. The delimiters are chosen with arguments such
+as <code class="literal">--fields-terminated-by</code>; this controls both how the data is
+written to disk, and how the generated <code class="literal">parse()</code> method reinterprets
+this data. The delimiters used by the <code class="literal">parse()</code> method can be chosen
+independently of the output arguments, by using
+<code class="literal">--input-fields-terminated-by</code>, and so on. This is useful, for example, to
+generate classes which can parse records created with one set of
+delimiters, and emit the records to a different set of files using a
+separate set of delimiters.</p><div class="table"><a name="idp7088240"></a><p class="title"><b>Table 8. Hive arguments:</b></p><div class="table-contents"><table summary="Hive arguments:" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 0.5pt solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--hive-home &lt;dir&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Override <code class="literal">$HIVE_HOME</code>
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--hive-import</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Import tables into Hive (Uses Hive&#8217;s                               default delimiters if none are set.)
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--hive-overwrite</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Overwrite existing data in the Hive table.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--create-hive-table</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    If set, then the job will fail if the target Hive table exists. By default this property is false.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--hive-table &lt;table-name&gt;</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Sets the table name to use when importing                              to Hive.
+    </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
+    <code class="literal">--hive-drop-import-delims</code>
+    </td><td style="border-bottom: 0.5pt solid ; " align="left">
+    Drops <span class="emphasis"><em>\n</em></span>, <span class="emphasis"><em>\r</em></span>, and <span class="emphasis"><em>\01</em></span> from string                              fields when importing to Hive.

[... 1905 lines stripped ...]

