sqoop-commits mailing list archives

From mau...@apache.org
Subject svn commit: r1822913 [2/17] - in /sqoop/site/trunk: ./ content/ content/resources/docs/1.4.7/ content/resources/docs/1.4.7/api/ content/resources/docs/1.4.7/api/com/ content/resources/docs/1.4.7/api/com/cloudera/ content/resources/docs/1.4.7/api/com/cl...
Date Thu, 01 Feb 2018 23:36:11 GMT
Added: sqoop/site/trunk/content/resources/docs/1.4.7/SqoopUserGuide.html
URL: http://svn.apache.org/viewvc/sqoop/site/trunk/content/resources/docs/1.4.7/SqoopUserGuide.html?rev=1822913&view=auto
==============================================================================
--- sqoop/site/trunk/content/resources/docs/1.4.7/SqoopUserGuide.html (added)
+++ sqoop/site/trunk/content/resources/docs/1.4.7/SqoopUserGuide.html Thu Feb  1 23:36:09 2018
@@ -0,0 +1,4656 @@
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Sqoop User Guide (v1.4.7)</title><link rel="stylesheet" type="text/css" href="docbook.css"><meta name="generator" content="DocBook XSL Stylesheets V1.79.1"></head><body><div style="clear:both; margin-bottom: 4px"></div><div align="center"><a href="index.html"><img src="images/home.png" alt="Documentation Home"></a></div><span class="breadcrumbs"><div class="breadcrumbs"><span class="breadcrumb-node">Sqoop User Guide (v1.4.7)</span></div></span><div lang="en" class="article"><div class="titlepage"><div><div><h2 class="title"><a name="idp774385284144"></a>Sqoop User Guide (v1.4.7)</h2></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl class="toc"><dt><span class="section"><a href="#_introduction">1. Introduction</a></span></dt><dt><span class="section"><a href="#_supported_releases">2. Supported Releases</a></span></dt><dt><span class="section"><a href="#_sqoop_releases">3
 . Sqoop Releases</a></span></dt><dt><span class="section"><a href="#_prerequisites">4. Prerequisites</a></span></dt><dt><span class="section"><a href="#_basic_usage">5. Basic Usage</a></span></dt><dt><span class="section"><a href="#_sqoop_tools">6. Sqoop Tools</a></span></dt><dd><dl><dt><span class="section"><a href="#_using_command_aliases">6.1. Using Command Aliases</a></span></dt><dt><span class="section"><a href="#_controlling_the_hadoop_installation">6.2. Controlling the Hadoop Installation</a></span></dt><dt><span class="section"><a href="#_using_generic_and_specific_arguments">6.3. Using Generic and Specific Arguments</a></span></dt><dt><span class="section"><a href="#_using_options_files_to_pass_arguments">6.4. Using Options Files to Pass Arguments</a></span></dt><dt><span class="section"><a href="#_using_tools">6.5. Using Tools</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_literal">7. <code class="literal">sqoop-import</code></a></span><
 /dt><dd><dl><dt><span class="section"><a href="#_purpose">7.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax">7.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache">7.2.5. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.6. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_transaction_isolation">7.2.7. Controlling transaction isolation</a></span></dt><dt
 ><span class="section"><a href="#_controlling_type_mapping">7.2.8. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_schema_name_handling">7.2.9. Schema name handling</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.10. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.11. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.12. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.13. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.14. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo">7.2.15. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.16. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span c
 lass="section"><a href="#_example_invocations">7.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_all_tables_literal">8. <code class="literal">sqoop-import-all-tables</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_2">8.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_2">8.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_2">8.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_mainframe_literal">9. <code class="literal">sqoop-import-mainframe</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_3">9.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_3">9.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_mainframe">9.2.1. Connecting to a Mainframe</a></span></dt><dt><span class="section"><a href="#_selecting_the_files_t
 o_import">9.2.2. Selecting the Files to Import</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism_2">9.2.3. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache_2">9.2.4. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process_2">9.2.5. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_file_formats_2">9.2.6. File Formats</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive_2">9.2.7. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase_2">9.2.8. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo_2">9.2.9. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties_2">9.2.10. Additional Import Configuration Properties</a></span><
 /dt></dl></dd><dt><span class="section"><a href="#_example_invocations_3">9.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_export_literal">10. <code class="literal">sqoop-export</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_4">10.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_4">10.2. Syntax</a></span></dt><dt><span class="section"><a href="#_inserts_vs_updates">10.3. Inserts vs. Updates</a></span></dt><dt><span class="section"><a href="#_exports_and_transactions">10.4. Exports and Transactions</a></span></dt><dt><span class="section"><a href="#_failed_exports">10.5. Failed Exports</a></span></dt><dt><span class="section"><a href="#_example_invocations_4">10.6. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#validation">11. <code class="literal">validation</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_5">11.1. Purpose</
 a></span></dt><dt><span class="section"><a href="#_introduction_2">11.2. Introduction</a></span></dt><dt><span class="section"><a href="#_syntax_5">11.3. Syntax</a></span></dt><dt><span class="section"><a href="#_configuration">11.4. Configuration</a></span></dt><dt><span class="section"><a href="#_limitations">11.5. Limitations</a></span></dt><dt><span class="section"><a href="#_example_invocations_5">11.6. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_saved_jobs">12. Saved Jobs</a></span></dt><dt><span class="section"><a href="#_literal_sqoop_job_literal">13. <code class="literal">sqoop-job</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_6">13.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_6">13.2. Syntax</a></span></dt><dt><span class="section"><a href="#_saved_jobs_and_passwords">13.3. Saved jobs and passwords</a></span></dt><dt><span class="section"><a href="#_saved_jobs_and_incremental_import
 s">13.4. Saved jobs and incremental imports</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_metastore_literal">14. <code class="literal">sqoop-metastore</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_7">14.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_7">14.2. Syntax</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_merge_literal">15. <code class="literal">sqoop-merge</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_8">15.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_8">15.2. Syntax</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_codegen_literal">16. <code class="literal">sqoop-codegen</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_9">16.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_9">16.2. Syntax</a></span></dt><dt><span class="section"><a href="#_e
 xample_invocations_6">16.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_create_hive_table_literal">17. <code class="literal">sqoop-create-hive-table</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_10">17.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_10">17.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_7">17.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_eval_literal">18. <code class="literal">sqoop-eval</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_11">18.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_11">18.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_8">18.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_list_databases_literal">19. <code class="literal">sqo
 op-list-databases</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_12">19.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_12">19.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_9">19.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_list_tables_literal">20. <code class="literal">sqoop-list-tables</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_13">20.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_13">20.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_10">20.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_help_literal">21. <code class="literal">sqoop-help</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_14">21.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_14">21.2. Syntax</
 a></span></dt><dt><span class="section"><a href="#_example_invocations_11">21.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_version_literal">22. <code class="literal">sqoop-version</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_15">22.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_15">22.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_12">22.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_sqoop_hcatalog_integration">23. Sqoop-HCatalog Integration</a></span></dt><dd><dl><dt><span class="section"><a href="#_hcatalog_background">23.1. HCatalog Background</a></span></dt><dt><span class="section"><a href="#_exposing_hcatalog_tables_to_sqoop">23.2. Exposing HCatalog Tables to Sqoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_new_command_line_options">23.2.1. New Command Line Options</a></span></dt><dt
 ><span class="section"><a href="#_supported_sqoop_hive_options">23.2.2. Supported Sqoop Hive Options</a></span></dt><dt><span class="section"><a href="#_direct_mode_support">23.2.3. Direct Mode support</a></span></dt><dt><span class="section"><a href="#_unsupported_sqoop_options">23.2.4. Unsupported Sqoop Options</a></span></dt><dd><dl><dt><span class="section"><a href="#_unsupported_sqoop_hive_import_options">23.2.4.1. Unsupported Sqoop Hive Import Options</a></span></dt><dt><span class="section"><a href="#_unsupported_sqoop_export_and_import_options">23.2.4.2. Unsupported Sqoop Export and Import Options</a></span></dt></dl></dd><dt><span class="section"><a href="#_ignored_sqoop_options">23.2.5. Ignored Sqoop Options</a></span></dt></dl></dd><dt><span class="section"><a href="#_automatic_table_creation">23.3. Automatic Table Creation</a></span></dt><dt><span class="section"><a href="#_delimited_text_formats_and_field_and_line_delimiter_characters">23.4. Delimited Text Formats and F
 ield and Line Delimiter Characters</a></span></dt><dt><span class="section"><a href="#_hcatalog_table_requirements">23.5. HCatalog Table Requirements</a></span></dt><dt><span class="section"><a href="#_support_for_partitioning">23.6. Support for Partitioning</a></span></dt><dt><span class="section"><a href="#_schema_mapping">23.7. Schema Mapping</a></span></dt><dt><span class="section"><a href="#_support_for_hcatalog_data_types">23.8. Support for HCatalog Data Types</a></span></dt><dt><span class="section"><a href="#_providing_hive_and_hcatalog_libraries_for_the_sqoop_job">23.9. Providing Hive and HCatalog Libraries for the Sqoop Job</a></span></dt><dt><span class="section"><a href="#_examples">23.10. Examples</a></span></dt><dt><span class="section"><a href="#_import">23.11. Import</a></span></dt><dt><span class="section"><a href="#_export">23.12. Export</a></span></dt></dl></dd><dt><span class="section"><a href="#_compatibility_notes">24. Compatibility Notes</a></span></dt><dd><dl
 ><dt><span class="section"><a href="#_supported_databases">24.1. Supported Databases</a></span></dt><dt><span class="section"><a href="#_mysql">24.2. MySQL</a></span></dt><dd><dl><dt><span class="section"><a href="#_zerodatetimebehavior">24.2.1. zeroDateTimeBehavior</a></span></dt><dt><span class="section"><a href="#_literal_unsigned_literal_columns">24.2.2. <code class="literal">UNSIGNED</code> columns</a></span></dt><dt><span class="section"><a href="#_literal_blob_literal_and_literal_clob_literal_columns">24.2.3. <code class="literal">BLOB</code> and <code class="literal">CLOB</code> columns</a></span></dt><dt><span class="section"><a href="#_importing_views_in_direct_mode">24.2.4. Importing views in direct mode</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql">24.3. PostgreSQL</a></span></dt><dd><dl><dt><span class="section"><a href="#_importing_views_in_direct_mode_2">24.3.1. Importing views in direct mode</a></span></dt></dl></dd><dt><span class="sectio
 n"><a href="#_oracle">24.4. Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_dates_and_times">24.4.1. Dates and Times</a></span></dt></dl></dd><dt><span class="section"><a href="#_schema_definition_in_hive">24.5. Schema Definition in Hive</a></span></dt><dt><span class="section"><a href="#_parquet_support_in_hive">24.6. Parquet Support in Hive</a></span></dt><dt><span class="section"><a href="#_cubrid">24.7. CUBRID</a></span></dt></dl></dd><dt><span class="section"><a href="#connectors">25. Notes for specific connectors</a></span></dt><dd><dl><dt><span class="section"><a href="#_mysql_jdbc_connector">25.1. MySQL JDBC Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_upsert_functionality">25.1.1. Upsert functionality</a></span></dt></dl></dd><dt><span class="section"><a href="#_mysql_direct_connector">25.2. MySQL Direct Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_requirements">25.2.1. Requirements</a></span></dt><dt><spa
 n class="section"><a href="#_limitations_2">25.2.2. Limitations</a></span></dt><dt><span class="section"><a href="#_direct_mode_transactions">25.2.3. Direct-mode Transactions</a></span></dt></dl></dd><dt><span class="section"><a href="#_microsoft_sql_connector">25.3. Microsoft SQL Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments">25.3.1. Extra arguments</a></span></dt><dt><span class="section"><a href="#_allow_identity_inserts">25.3.2. Allow identity inserts</a></span></dt><dt><span class="section"><a href="#_non_resilient_operations">25.3.3. Non-resilient operations</a></span></dt><dt><span class="section"><a href="#_schema_support">25.3.4. Schema support</a></span></dt><dt><span class="section"><a href="#_table_hints">25.3.5. Table hints</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql_connector">25.4. PostgreSQL Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments_2">25.4.1. Extra argume
 nts</a></span></dt><dt><span class="section"><a href="#_schema_support_2">25.4.2. Schema support</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql_direct_connector">25.5. PostgreSQL Direct Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_requirements_2">25.5.1. Requirements</a></span></dt><dt><span class="section"><a href="#_limitations_3">25.5.2. Limitations</a></span></dt></dl></dd><dt><span class="section"><a href="#_pg_bulkload_connector">25.6. pg_bulkload connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_16">25.6.1. Purpose</a></span></dt><dt><span class="section"><a href="#_requirements_3">25.6.2. Requirements</a></span></dt><dt><span class="section"><a href="#_syntax_16">25.6.3. Syntax</a></span></dt><dt><span class="section"><a href="#_data_staging">25.6.4. Data Staging</a></span></dt></dl></dd><dt><span class="section"><a href="#_netezza_connector">25.7. Netezza Connector</a></span></dt><dd><dl><dt><spa
 n class="section"><a href="#_extra_arguments_3">25.7.1. Extra arguments</a></span></dt><dt><span class="section"><a href="#_direct_mode">25.7.2. Direct Mode</a></span></dt><dt><span class="section"><a href="#_null_string_handling">25.7.3. Null string handling</a></span></dt></dl></dd><dt><span class="section"><a href="#_data_connector_for_oracle_and_hadoop">25.8. Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_about">25.8.1. About</a></span></dt><dd><dl><dt><span class="section"><a href="#_jobs">25.8.1.1. Jobs</a></span></dt><dt><span class="section"><a href="#_how_the_standard_oracle_manager_works_for_imports">25.8.1.2. How The Standard Oracle Manager Works for Imports</a></span></dt><dt><span class="section"><a href="#_how_the_data_connector_for_oracle_and_hadoop_works_for_imports">25.8.1.3. How The Data Connector for Oracle and Hadoop Works for Imports</a></span></dt><dt><span class="section"><a href="#_data_connector_for_oracle_an
 d_hadoop_exports">25.8.1.4. Data Connector for Oracle and Hadoop Exports</a></span></dt></dl></dd><dt><span class="section"><a href="#_requirements_4">25.8.2. Requirements</a></span></dt><dd><dl><dt><span class="section"><a href="#_ensure_the_oracle_database_jdbc_driver_is_setup_correctly">25.8.2.1. Ensure The Oracle Database JDBC Driver Is Setup Correctly</a></span></dt><dt><span class="section"><a href="#_oracle_roles_and_privileges">25.8.2.2. Oracle Roles and Privileges</a></span></dt><dt><span class="section"><a href="#_additional_oracle_roles_and_privileges_required_for_export">25.8.2.3. Additional Oracle Roles And Privileges Required for Export</a></span></dt><dt><span class="section"><a href="#_supported_data_types">25.8.2.4. Supported Data Types</a></span></dt></dl></dd><dt><span class="section"><a href="#_execute_sqoop_with_data_connector_for_oracle_and_hadoop">25.8.3. Execute Sqoop With Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><
 a href="#_connect_to_oracle_oracle_rac">25.8.3.1. Connect to Oracle / Oracle RAC</a></span></dt><dt><span class="section"><a href="#_connect_to_an_oracle_database_instance">25.8.3.2. Connect to An Oracle Database Instance</a></span></dt><dt><span class="section"><a href="#_connect_to_an_oracle_rac">25.8.3.3. Connect to An Oracle RAC</a></span></dt><dt><span class="section"><a href="#_login_to_the_oracle_instance">25.8.3.4. Login to The Oracle Instance</a></span></dt><dt><span class="section"><a href="#_kill_data_connector_for_oracle_and_hadoop_jobs">25.8.3.5. Kill Data Connector for Oracle and Hadoop Jobs</a></span></dt></dl></dd><dt><span class="section"><a href="#_import_data_from_oracle">25.8.4. Import Data from Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_match_hadoop_files_to_oracle_table_partitions">25.8.4.1. Match Hadoop Files to Oracle Table Partitions</a></span></dt><dt><span class="section"><a href="#_specify_the_partitions_to_import">25.8.4.2. Specif
 y The Partitions To Import</a></span></dt><dt><span class="section"><a href="#_consistent_read_all_mappers_read_from_the_same_point_in_time">25.8.4.3. Consistent Read: All Mappers Read From The Same Point In Time</a></span></dt></dl></dd><dt><span class="section"><a href="#_export_data_into_oracle">25.8.5. Export Data into Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_insert_export">25.8.5.1. Insert-Export</a></span></dt><dt><span class="section"><a href="#_update_export">25.8.5.2. Update-Export</a></span></dt><dt><span class="section"><a href="#_merge_export">25.8.5.3. Merge-Export</a></span></dt><dt><span class="section"><a href="#_create_oracle_tables">25.8.5.4. Create Oracle Tables</a></span></dt><dt><span class="section"><a href="#_nologging">25.8.5.5. NOLOGGING</a></span></dt><dt><span class="section"><a href="#_partitioning">25.8.5.6. Partitioning</a></span></dt><dt><span class="section"><a href="#_match_rows_via_multiple_columns">25.8.5.7. Match Rows Via
  Multiple Columns</a></span></dt><dt><span class="section"><a href="#_storage_clauses">25.8.5.8. Storage Clauses</a></span></dt></dl></dd><dt><span class="section"><a href="#_manage_date_and_timestamp_data_types">25.8.6. Manage Date And Timestamp Data Types</a></span></dt><dd><dl><dt><span class="section"><a href="#_import_date_and_timestamp_data_types_from_oracle">25.8.6.1. Import Date And Timestamp Data Types from Oracle</a></span></dt><dt><span class="section"><a href="#_the_data_connector_for_oracle_and_hadoop_does_not_apply_a_time_zone_to_date_timestamp_data_types">25.8.6.2. The Data Connector for Oracle and Hadoop Does Not Apply A Time Zone to DATE / TIMESTAMP Data Types</a></span></dt><dt><span class="section"><a href="#_the_data_connector_for_oracle_and_hadoop_retains_time_zone_information_in_timezone_data_types">25.8.6.3. The Data Connector for Oracle and Hadoop Retains Time Zone Information in TIMEZONE Data Types</a></span></dt><dt><span class="section"><a href="#_data_con
 nector_for_oracle_and_hadoop_explicitly_states_time_zone_for_local_timezone_data_types">25.8.6.4. Data Connector for Oracle and Hadoop Explicitly States Time Zone for LOCAL TIMEZONE Data Types</a></span></dt><dt><span class="section"><a href="#_java_sql_timestamp">25.8.6.5. java.sql.Timestamp</a></span></dt><dt><span class="section"><a href="#_export_date_and_timestamp_data_types_into_oracle">25.8.6.6. Export Date And Timestamp Data Types into Oracle</a></span></dt></dl></dd><dt><span class="section"><a href="#_configure_the_data_connector_for_oracle_and_hadoop">25.8.7. Configure The Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_oraoop_site_template_xml">25.8.7.1. oraoop-site-template.xml</a></span></dt><dt><span class="section"><a href="#_oraoop_oracle_session_initialization_statements">25.8.7.2. oraoop.oracle.session.initialization.statements</a></span></dt><dt><span class="section"><a href="#_oraoop_table_import_where_clause_loca
 tion">25.8.7.3. oraoop.table.import.where.clause.location</a></span></dt><dt><span class="section"><a href="#_oracle_row_fetch_size">25.8.7.4. oracle.row.fetch.size</a></span></dt><dt><span class="section"><a href="#_oraoop_import_hint">25.8.7.5. oraoop.import.hint</a></span></dt><dt><span class="section"><a href="#_oraoop_oracle_append_values_hint_usage">25.8.7.6. oraoop.oracle.append.values.hint.usage</a></span></dt><dt><span class="section"><a href="#_mapred_map_tasks_speculative_execution">25.8.7.7. mapred.map.tasks.speculative.execution</a></span></dt><dt><span class="section"><a href="#_oraoop_block_allocation">25.8.7.8. oraoop.block.allocation</a></span></dt><dt><span class="section"><a href="#_oraoop_import_omit_lobs_and_long">25.8.7.9. oraoop.import.omit.lobs.and.long</a></span></dt><dt><span class="section"><a href="#_oraoop_locations">25.8.7.10. oraoop.locations</a></span></dt><dt><span class="section"><a href="#_sqoop_connection_factories">25.8.7.11. sqoop.connection.fac
 tories</a></span></dt><dt><span class="section"><a href="#_expressions_in_oraoop_site_xml">25.8.7.12. Expressions in oraoop-site.xml</a></span></dt></dl></dd><dt><span class="section"><a href="#_troubleshooting_the_data_connector_for_oracle_and_hadoop">25.8.8. Troubleshooting The Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_quote_oracle_owners_and_tables">25.8.8.1. Quote Oracle Owners And Tables</a></span></dt><dt><span class="section"><a href="#_quote_oracle_columns">25.8.8.2. Quote Oracle Columns</a></span></dt><dt><span class="section"><a href="#_confirm_the_data_connector_for_oracle_and_hadoop_can_initialize_the_oracle_session">25.8.8.3. Confirm The Data Connector for Oracle and Hadoop Can Initialize The Oracle Session</a></span></dt><dt><span class="section"><a href="#_check_the_sqoop_debug_logs_for_error_messages">25.8.8.4. Check The Sqoop Debug Logs for Error Messages</a></span></dt><dt><span class="section"><a href="#_expor
 t_check_tables_are_compatible">25.8.8.5. Export: Check Tables Are Compatible</a></span></dt><dt><span class="section"><a href="#_export_parallelization">25.8.8.6. Export: Parallelization</a></span></dt><dt><span class="section"><a href="#_export_check_oraoop_oracle_append_values_hint_usage">25.8.8.7. Export: Check oraoop.oracle.append.values.hint.usage</a></span></dt><dt><span class="section"><a href="#_turn_on_verbose">25.8.8.8. Turn On Verbose</a></span></dt></dl></dd></dl></dd></dl></dd><dt><span class="section"><a href="#_getting_support">26. Getting Support</a></span></dt><dt><span class="section"><a href="#_troubleshooting">27. Troubleshooting</a></span></dt><dd><dl><dt><span class="section"><a href="#_general_troubleshooting_process">27.1. General Troubleshooting Process</a></span></dt><dt><span class="section"><a href="#_specific_troubleshooting_tips">27.2. Specific Troubleshooting Tips</a></span></dt><dd><dl><dt><span class="section"><a href="#_oracle_connection_reset_error
 s">27.2.1. Oracle: Connection Reset Errors</a></span></dt><dt><span class="section"><a href="#_oracle_case_sensitive_catalog_query_errors">27.2.2. Oracle: Case-Sensitive Catalog Query Errors</a></span></dt><dt><span class="section"><a href="#_mysql_connection_failure">27.2.3. MySQL: Connection Failure</a></span></dt><dt><span class="section"><a href="#_oracle_ora_00933_error_sql_command_not_properly_ended">27.2.4. Oracle: ORA-00933 error (SQL command not properly ended)</a></span></dt><dt><span class="section"><a href="#_mysql_import_of_tinyint_1_from_mysql_behaves_strangely">27.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely</a></span></dt></dl></dd></dl></dd></dl></div><pre class="screen">  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.</pre><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_introduction"></a>1. Introduction</h2></div></div></div><p>Sqoop is a tool designed to transfer data between Hadoop and
+relational databases or mainframes. You can use Sqoop to import data from a
+relational database management system (RDBMS) such as MySQL or Oracle, or
+from a mainframe, into the Hadoop Distributed File System (HDFS),
+transform the data in Hadoop MapReduce, and then export the data back
+into an RDBMS.</p><p>Sqoop automates most of this process, relying on the database to
+describe the schema for the data to be imported. Sqoop uses MapReduce
+to import and export the data, which provides parallel operation as
+well as fault tolerance.</p><p>This document describes how to get started using Sqoop to move data
+between databases and Hadoop, or from a mainframe to Hadoop, and provides reference
+information for the operation of the Sqoop command-line tool suite. This
+document is intended for:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
+System and application programmers
+</li><li class="listitem">
+System administrators
+</li><li class="listitem">
+Database administrators
+</li><li class="listitem">
+Data analysts
+</li><li class="listitem">
+Data engineers
+</li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_supported_releases"></a>2. Supported Releases</h2></div></div></div><p>This documentation applies to Sqoop v1.4.7.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_sqoop_releases"></a>3. Sqoop Releases</h2></div></div></div><p>Sqoop is an open source software product of the Apache Software Foundation.</p><p>Software development for Sqoop occurs at <a class="ulink" href="http://sqoop.apache.org" target="_top">http://sqoop.apache.org</a>
+where you can obtain:
+New releases of Sqoop as well as its most recent source code
+</li><li class="listitem">
+An issue tracker
+</li><li class="listitem">
+A wiki that contains Sqoop documentation
+</li></ul></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_prerequisites"></a>4. Prerequisites</h2></div></div></div><p>The following prerequisite knowledge is required for this product:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
+Basic computer technology and terminology
+</li><li class="listitem">
+Familiarity with command-line interfaces such as <code class="literal">bash</code>
+</li><li class="listitem">
+Relational database management systems
+</li><li class="listitem">
+Basic familiarity with the purpose and operation of Hadoop
+</li></ul></div><p>Before you can use Sqoop, a release of Hadoop must be installed and
+configured. Sqoop currently supports four major Hadoop release lines: 0.20,
+0.23, 1.0, and 2.0.</p><p>This document assumes you are using a Linux or Linux-like environment.
+If you are using Windows, you may be able to use Cygwin to accomplish
+most of the following tasks. If you are using Mac OS X, you should see
+few (if any) compatibility errors. Sqoop is predominantly operated and
+tested on Linux.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_basic_usage"></a>5. Basic Usage</h2></div></div></div><p>With Sqoop, you can <span class="emphasis"><em>import</em></span> data from a relational database system or a
+mainframe into HDFS. The input to the import process is either a database table
+or a set of mainframe datasets. For databases, Sqoop will read the table row by row
+into HDFS.  For mainframe datasets, Sqoop will read records from each mainframe
+dataset into HDFS. The output of this import process is a set of files
+containing a copy of the imported table or datasets.
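As a purely illustrative example (the table, values, and layout are hypothetical), one comma-delimited output part file from such an import might hold records like:

```shell
# Illustrative only: the kind of comma-delimited records one output part
# file of an imported "employees" table might contain (one record per line).
cat <<'EOF'
0,joe,2009-03-31,engineering
1,sally,2009-04-20,sales
EOF
```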
+The import process is performed in parallel. For this reason, the
+output will be in multiple files. These files may be delimited text
+files (for example, with commas or tabs separating each field), or
+binary Avro or SequenceFiles containing serialized record data.</p><p>A by-product of the import process is a generated Java class which
+can encapsulate one row of the imported table. This class is used
+during the import process by Sqoop itself. The Java source code for
+this class is also provided to you, for use in subsequent MapReduce
+processing of the data. This class can serialize and deserialize data
+to and from the SequenceFile format. It can also parse the
+delimited-text form of a record. These abilities allow you to quickly
+develop MapReduce applications that use the HDFS-stored records in
+your processing pipeline. You are also free to parse the delimited
+record data yourself, using any other tools you prefer.</p><p>After manipulating the imported records (for example, with MapReduce
+or Hive) you may have a result data set which you can then <span class="emphasis"><em>export</em></span>
+back to the relational database. Sqoop&#8217;s export process will read
+a set of delimited text files from HDFS in parallel, parse them into
+records, and insert them as new rows in a target database table, for
+consumption by external applications or users.</p><p>Sqoop includes some other commands which allow you to inspect the
+database you are working with. For example, you can list the available
+database schemas (with the <code class="literal">sqoop-list-databases</code> tool) and tables
+within a schema (with the <code class="literal">sqoop-list-tables</code> tool). Sqoop also
+includes a primitive SQL execution shell (the <code class="literal">sqoop-eval</code> tool).</p><p>Most aspects of the import, code generation, and export processes can
+be customized. For databases, you can control the specific row range or
+columns imported.  You can specify particular delimiters and escape characters
+for the file-based representation of the data, as well as the file format
+used.  You can also control the class or package names used in
+generated code. Subsequent sections of this document explain how to
+specify these and other arguments to Sqoop.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_sqoop_tools"></a>6. Sqoop Tools</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="section"><a href="#_using_command_aliases">6.1. Using Command Aliases</a></span></dt><dt><span class="section"><a href="#_controlling_the_hadoop_installation">6.2. Controlling the Hadoop Installation</a></span></dt><dt><span class="section"><a href="#_using_generic_and_specific_arguments">6.3. Using Generic and Specific Arguments</a></span></dt><dt><span class="section"><a href="#_using_options_files_to_pass_arguments">6.4. Using Options Files to Pass Arguments</a></span></dt><dt><span class="section"><a href="#_using_tools">6.5. Using Tools</a></span></dt></dl></div><p>Sqoop is a collection of related tools. To use Sqoop, you specify the
+tool you want to use and the arguments that control the tool.</p><p>If Sqoop is compiled from its own source, you can run Sqoop without a formal
+installation process by running the <code class="literal">bin/sqoop</code> program. Users
+of a packaged deployment of Sqoop (such as an RPM shipped with Apache Bigtop)
+will see this program installed as <code class="literal">/usr/bin/sqoop</code>. The remainder of this
+documentation will refer to this program as <code class="literal">sqoop</code>. For example:</p><pre class="screen">$ sqoop tool-name [tool-arguments]</pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The following examples that begin with a <code class="literal">$</code> character indicate
+that the commands must be entered at a terminal prompt (such as
+<code class="literal">bash</code>). The <code class="literal">$</code> character represents the prompt itself; you should
+not start these commands by typing a <code class="literal">$</code>. You can also enter commands
+inline in the text of a paragraph; for example, <code class="literal">sqoop help</code>. These
+examples do not show a <code class="literal">$</code> prefix, but you should enter them the same
+way.  Don&#8217;t confuse the <code class="literal">$</code> shell prompt in the examples with the <code class="literal">$</code>
+that precedes an environment variable name. For example, the string
+literal <code class="literal">$HADOOP_HOME</code> includes a "<code class="literal">$</code>".</p></td></tr></table></div><p>Sqoop ships with a help tool. To display a list of all available
+tools, type the following command:</p><pre class="screen">$ sqoop help
+usage: sqoop COMMAND [ARGS]
+
+Available commands:
+  codegen            Generate code to interact with database records
+  create-hive-table  Import a table definition into Hive
+  eval               Evaluate a SQL statement and display the results
+  export             Export an HDFS directory to a database table
+  help               List available commands
+  import             Import a table from a database to HDFS
+  import-all-tables  Import tables from a database to HDFS
+  import-mainframe   Import mainframe datasets to HDFS
+  list-databases     List available databases on a server
+  list-tables        List available tables in a database
+  version            Display version information
+
+See 'sqoop help COMMAND' for information on a specific command.</pre><p>You can display help for a specific tool by entering: <code class="literal">sqoop help
+(tool-name)</code>; for example, <code class="literal">sqoop help import</code>.</p><p>You can also add the <code class="literal">--help</code> argument to any command: <code class="literal">sqoop import
+--help</code>.</p><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="_using_command_aliases"></a>6.1. Using Command Aliases</h3></div></div></div><p>In addition to typing the <code class="literal">sqoop (toolname)</code> syntax, you can use alias
+scripts that specify the <code class="literal">sqoop-(toolname)</code> syntax. For example, the
+scripts <code class="literal">sqoop-import</code>, <code class="literal">sqoop-export</code>, etc. each select a specific
+tool.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="_controlling_the_hadoop_installation"></a>6.2. Controlling the Hadoop Installation</h3></div></div></div><p>You invoke Sqoop through the program launch capability provided by
+Hadoop. The <code class="literal">sqoop</code> command-line program is a wrapper which runs the
+<code class="literal">bin/hadoop</code> script shipped with Hadoop. If you have multiple
+installations of Hadoop present on your machine, you can select the
+Hadoop installation by setting the <code class="literal">$HADOOP_COMMON_HOME</code> and
+<code class="literal">$HADOOP_MAPRED_HOME</code> environment variables.</p><p>For example:</p><pre class="screen">$ HADOOP_COMMON_HOME=/path/to/some/hadoop \
+  HADOOP_MAPRED_HOME=/path/to/some/hadoop-mapreduce \
+  sqoop import --arguments...</pre><p>or:</p><pre class="screen">$ export HADOOP_COMMON_HOME=/some/path/to/hadoop
+$ export HADOOP_MAPRED_HOME=/some/path/to/hadoop-mapreduce
+$ sqoop import --arguments...</pre><p>If either of these variables are not set, Sqoop will fall back to
+<code class="literal">$HADOOP_HOME</code>. If it is not set either, Sqoop will use the default
+installation locations for Apache Bigtop, <code class="literal">/usr/lib/hadoop</code> and
+<code class="literal">/usr/lib/hadoop-mapreduce</code>, respectively.</p><p>The active Hadoop configuration is loaded from <code class="literal">$HADOOP_HOME/conf/</code>,
+unless the <code class="literal">$HADOOP_CONF_DIR</code> environment variable is set.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="_using_generic_and_specific_arguments"></a>6.3. Using Generic and Specific Arguments</h3></div></div></div><p>To control the operation of each Sqoop tool, you use generic and
+specific arguments.</p><p>For example:</p><pre class="screen">$ sqoop help import
+usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]
+
+Common arguments:
+   --connect &lt;jdbc-uri&gt;     Specify JDBC connect string
+   --connect-manager &lt;class-name&gt;     Specify connection manager class to use
+   --driver &lt;class-name&gt;    Manually specify JDBC driver class to use
+   --hadoop-mapred-home &lt;dir&gt;      Override $HADOOP_MAPRED_HOME
+   --help                   Print usage instructions
+   --password-file          Set path for file containing authentication password
+   -P                       Read password from console
+   --password &lt;password&gt;    Set authentication password
+   --username &lt;username&gt;    Set authentication username
+   --verbose                Print more information while working
+   --hadoop-home &lt;dir&gt;     Deprecated. Override $HADOOP_HOME
+
+[...]
+
+Generic Hadoop command-line arguments:
+(must preceed any tool-specific arguments)
+Generic options supported are
+-conf &lt;configuration file&gt;     specify an application configuration file
+-D &lt;property=value&gt;            use value for given property
+-fs &lt;local|namenode:port&gt;      specify a namenode
+-jt &lt;local|jobtracker:port&gt;    specify a job tracker
+-files &lt;comma separated list of files&gt;    specify comma separated files to be copied to the map reduce cluster
+-libjars &lt;comma separated list of jars&gt;    specify comma separated jar files to include in the classpath.
+-archives &lt;comma separated list of archives&gt;    specify comma separated archives to be unarchived on the compute machines.
+
+The general command line syntax is
+bin/hadoop command [genericOptions] [commandOptions]</pre><p>You must supply the generic arguments <code class="literal">-conf</code>, <code class="literal">-D</code>, and so on after the
+tool name but <span class="strong"><strong>before</strong></span> any tool-specific arguments (such as
+<code class="literal">--connect</code>). Note that generic Hadoop arguments are preceded by a
+single dash character (<code class="literal">-</code>), whereas tool-specific arguments start
+with two dashes (<code class="literal">--</code>), unless they are single character arguments such as <code class="literal">-P</code>.</p><p>The <code class="literal">-conf</code>, <code class="literal">-D</code>, <code class="literal">-fs</code> and <code class="literal">-jt</code> arguments control the configuration
+and Hadoop server settings. For example, <code class="literal">-D mapred.job.name=&lt;job_name&gt;</code> can
+be used to set the name of the MR job that Sqoop launches. If it is not specified,
+the name defaults to the jar name for the job, which is derived from the
+table name used.</p><p>The <code class="literal">-files</code>, <code class="literal">-libjars</code>, and <code class="literal">-archives</code> arguments are not typically used with
+Sqoop, but they are included as part of Hadoop&#8217;s internal argument-parsing
+system.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="_using_options_files_to_pass_arguments"></a>6.4. Using Options Files to Pass Arguments</h3></div></div></div><p>When using Sqoop, the command line options that do not change from
+invocation to invocation can be put in an options file for convenience.
+An options file is a text file where each line identifies an option in
+the order that it would otherwise appear on the command line. Option files
+allow specifying a single option on multiple lines by using the
+backslash character at the end of intermediate lines. Also supported
+are comments within option files that begin with the hash character.
+Comments must be specified on a new line and may not be mixed with
+option text. All comments and empty lines are ignored when option
+files are expanded. Unless options appear as quoted strings, any
+leading or trailing spaces are ignored. Quoted strings, if used, must
+not extend beyond the line on which they are specified.</p><p>Option files can be specified anywhere in the command line as long as
+the options within them follow the otherwise prescribed rules of
+options ordering. For instance, regardless of where the options are
+loaded from, they must follow the ordering such that generic options
+appear first, tool specific options next, finally followed by options
+that are intended to be passed to child programs.</p><p>To specify an options file, simply create an options file in a
+convenient location and pass it to the command line via the
+<code class="literal">--options-file</code> argument.</p><p>Whenever an options file is specified, it is expanded on the
+command line before the tool is invoked. You can specify more than
+one options file within the same invocation if needed.</p><p>For example, the following Sqoop invocation for import can
+be specified alternatively as shown below:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
+
+$ sqoop --options-file /users/homer/work/import.txt --table TEST</pre><p>where the options file <code class="literal">/users/homer/work/import.txt</code> contains the following:</p><pre class="screen">import
+--connect
+jdbc:mysql://localhost/db
+--username
+foo</pre><p>The options file can have empty lines and comments for readability purposes.
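As an illustrative sketch only (not how Sqoop actually implements expansion), expanding such a file amounts to dropping comment lines and blank lines and treating each surviving line as one command-line argument:

```shell
# Sketch of options-file expansion (illustrative, not Sqoop's real code):
# write a commented options file, then strip comments and blank lines.
cat > /tmp/import.txt <<'EOF'
# Specifies the tool being invoked
import

# Connect parameter and value
--connect
jdbc:mysql://localhost/db
EOF
grep -v -e '^#' -e '^[[:space:]]*$' /tmp/import.txt
```

Each remaining line corresponds to exactly one option as it would appear on the command line.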
+So the above example would work exactly the same if the options file
+<code class="literal">/users/homer/work/import.txt</code> contained the following:</p><pre class="screen">#
+# Options file for Sqoop import
+#
+
+# Specifies the tool being invoked
+import
+
+# Connect parameter and value
+--connect
+jdbc:mysql://localhost/db
+
+# Username parameter and value
+--username
+foo
+
+#
+# Remaining options should be specified in the command line.
+#</pre></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="_using_tools"></a>6.5. Using Tools</h3></div></div></div><p>The following sections will describe each tool&#8217;s operation. The
+tools are listed in the most likely order you will find them useful.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_literal_sqoop_import_literal"></a>7. <code class="literal">sqoop-import</code></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="section"><a href="#_purpose">7.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax">7.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cac
 he">7.2.5. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.6. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_transaction_isolation">7.2.7. Controlling transaction isolation</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.8. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_schema_name_handling">7.2.9. Schema name handling</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.10. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.11. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.12. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.13. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.14. Importing Data Into
  HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo">7.2.15. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.16. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span class="section"><a href="#_example_invocations">7.3. Example Invocations</a></span></dt></dl></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="_purpose"></a>7.1. Purpose</h3></div></div></div><p>The <code class="literal">import</code> tool imports an individual table from an RDBMS to HDFS.
+Each row from a table is represented as a separate record in HDFS.
+Records can be stored as text files (one record per line), or in
+binary representation as Avro or SequenceFiles.</p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="_syntax"></a>7.2. Syntax</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache">7.2.5. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.6. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_transaction_isolation">7.2.7.
  Controlling transaction isolation</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.8. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_schema_name_handling">7.2.9. Schema name handling</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.10. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.11. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.12. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.13. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.14. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo">7.2.15. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.16. Additional Import Configur
 ation Properties</a></span></dt></dl></div><pre class="screen">$ sqoop import (generic-args) (import-args)
+$ sqoop-import (generic-args) (import-args)</pre><p>While the Hadoop generic arguments must precede any import arguments,
+you can type the import arguments in any order with respect to one
+another.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>In this document, arguments are grouped into collections
+organized by function. Some collections are present in several tools
+(for example, the "common" arguments). An extended description of their
+functionality is given only on the first presentation in this
+document.</p></td></tr></table></div><div class="table"><a name="idp774390820832"></a><p class="title"><b>Table 1. Common arguments</b></p><div class="table-contents"><table class="table" summary="Common arguments" style="border-collapse: collapse;border-top: 1px solid ; border-bottom: 1px solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 1px solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--connect &lt;jdbc-uri&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Specify JDBC connect string
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--connection-manager &lt;class-name&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Specify connection manager class to                                          use
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--driver &lt;class-name&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Manually specify JDBC driver class                                          to use
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--hadoop-mapred-home &lt;dir&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Override $HADOOP_MAPRED_HOME
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--help</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Print usage instructions
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--password-file</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Set path for a file containing the                                          authentication password
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">-P</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Read password from console
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--password &lt;password&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Set authentication password
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--username &lt;username&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Set authentication username
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--verbose</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Print more information while working
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--connection-param-file &lt;filename&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Optional properties file that                                          provides connection parameters
+    </td></tr><tr><td style="border-right: 1px solid ; " align="left">
+    <code class="literal">--relaxed-isolation</code>
+    </td><td style="" align="left">
+    Set connection transaction isolation                                          to read uncommitted for the mappers.
+    </td></tr></tbody></table></div></div><br class="table-break"><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="_connecting_to_a_database_server"></a>7.2.1. Connecting to a Database Server</h4></div></div></div><p>Sqoop is designed to import tables from a database into HDFS. To do
+so, you must specify a <span class="emphasis"><em>connect string</em></span> that describes how to connect to the
+database. The <span class="emphasis"><em>connect string</em></span> is similar to a URL, and is communicated to
+Sqoop with the <code class="literal">--connect</code> argument. This describes the server and
+database to connect to; it may also specify the port. For example:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees</pre><p>This string will connect to a MySQL database named <code class="literal">employees</code> on the
+host <code class="literal">database.example.com</code>. It&#8217;s important that you <span class="strong"><strong>do not</strong></span> use the URL
+<code class="literal">localhost</code> if you intend to use Sqoop with a distributed Hadoop
+cluster. The connect string you supply will be used on TaskTracker nodes
+throughout your MapReduce cluster; if you specify the
+literal name <code class="literal">localhost</code>, each node will connect to a different
+database (or more likely, no database at all). Instead, you should use
+the full hostname or IP address of the database host that can be seen
+by all your remote nodes.</p><p>You might need to authenticate against the database before you can
+access it. You can use the <code class="literal">--username</code> argument to supply a username to the database.
+Sqoop provides a couple of different ways, secure and non-secure,
+to supply a password to the database, as detailed below.</p><p><b>Secure way of supplying a password to the database. </b>You should save the password in a file in the user&#8217;s home directory with 400
+permissions and specify the path to that file using the <span class="strong"><strong><code class="literal">--password-file</code></strong></span>
+argument; this is the preferred method of entering credentials. Sqoop will
+then read the password from the file and pass it to the MapReduce cluster
+using secure means without exposing the password in the job configuration.
+The file containing the password can either be on the Local FS or HDFS.
+For example:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \
+    --username venkatesh --password-file ${user.home}/.password</pre><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th align="left">Warning</th></tr><tr><td align="left" valign="top"><p>Sqoop will read the entire content of the password file and use it as
+the password. This will include any trailing whitespace characters, such as
+newline characters, that most text editors add by default.
+You need to make sure that your password file contains only the characters
+that belong to your password. On the command line, you can use the
+<code class="literal">echo</code> command with the <code class="literal">-n</code> switch to store a password without any trailing whitespace
+characters. For example, to store the password <code class="literal">secret</code> you would call
+<code class="literal">echo -n "secret" &gt; password.file</code>.</p></td></tr></table></div><p>Another way of supplying passwords is using the <code class="literal">-P</code> argument which will
+read a password from a console prompt.</p><p><b>Protecting passwords from prying eyes. </b>Hadoop 2.6.0 provides an API to separate password storage from applications.
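Putting the earlier file-permission and trailing-newline advice together, a quick shell check (the path is illustrative) confirms the file holds exactly the password bytes:

```shell
# Create a password file with no trailing newline and owner-read-only
# permissions, per the recommendations above (path is illustrative).
rm -f /tmp/sqoop.password
echo -n "secret" > /tmp/sqoop.password
chmod 400 /tmp/sqoop.password
wc -c < /tmp/sqoop.password    # 6 bytes: the password only, no newline
```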
+This API is called the credential provider API and there is a new
+<code class="literal">credential</code> command line tool to manage passwords and their aliases.
+The passwords are stored with their aliases in a keystore that is password
+protected. The keystore password can be provided at a password prompt
+on the command line, via an environment variable, or defaulted to a
+software-defined constant. Please check the Hadoop documentation on the usage
+of this facility.</p><p>Once the password is stored using the Credential Provider facility and
+the Hadoop configuration has been suitably updated, all applications can
+optionally use the alias in place of the actual password and at runtime
+resolve the alias for the password to use.</p><p>Since the keystore or similar technology backing the credential
+provider is shared across components, passwords for various applications and
+databases can be securely stored in it, and only
+the aliases need to be exposed in configuration files, protecting the passwords
+from being visible.</p><p>Sqoop has been enhanced to allow usage of this functionality if it is
+available in the underlying Hadoop version being used.   One new option
+has been introduced to provide the alias on the command line instead of the
+actual password (--password-alias).  The value of this argument is
+the alias in the store associated with the actual password.
+Example usage is as follows:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \
+    --username dbuser --password-alias mydb.password.alias</pre><p>Similarly, if the command line option is not preferred, the alias can be saved
+in the file provided with the --password-file option.  Along with this, the
+Sqoop configuration parameter org.apache.sqoop.credentials.loader.class
+should be set to the classname that provides the alias resolution:
+<code class="literal">org.apache.sqoop.util.password.CredentialProviderPasswordLoader</code></p><p>Example usage is as follows (assuming .password.alias has the alias for
+the real password) :</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \
+    --username dbuser --password-file ${user.home}/.password-alias</pre><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th align="left">Warning</th></tr><tr><td align="left" valign="top"><p>The <code class="literal">--password</code> parameter is insecure, as other users may
+be able to read your password from the command-line arguments via
+the output of programs such as <code class="literal">ps</code>. The <span class="strong"><strong><code class="literal">-P</code></strong></span> argument is the preferred
+method over using the <code class="literal">--password</code> argument. Credentials may still be
+transferred between nodes of the MapReduce cluster using insecure means.
+For example:</p></td></tr></table></div><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \
+    --username aaron --password 12345</pre><p>Sqoop automatically supports several databases, including MySQL.  Connect
+strings beginning with <code class="literal">jdbc:mysql://</code> are handled automatically in Sqoop.  (A
+full list of databases with built-in support is provided in the "Supported
+Databases" section. For some, you may need to install the JDBC driver
+yourself.)</p><p>You can use Sqoop with any other
+JDBC-compliant database. First, download the appropriate JDBC
+driver for the type of database you want to import, and install the .jar
+file in the <code class="literal">$SQOOP_HOME/lib</code> directory on your client machine. (This will
+be <code class="literal">/usr/lib/sqoop/lib</code> if you installed from an RPM or Debian package.)
+Each driver <code class="literal">.jar</code> file also has a specific driver class which defines
+the entry-point to the driver. For example, MySQL&#8217;s Connector/J library has
+a driver class of <code class="literal">com.mysql.jdbc.Driver</code>. Refer to your database
+vendor-specific documentation to determine the main driver class.
+This class must be provided as an argument to Sqoop with <code class="literal">--driver</code>.</p><p>For example, to connect to a SQL Server database, first download the driver from
+microsoft.com and install it in your Sqoop lib path.</p><p>Then run Sqoop. For example:</p><pre class="screen">$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
+    --connect &lt;connect-string&gt; ...</pre><p>When connecting to a database using JDBC, you can optionally specify extra
+JDBC parameters via a property file using the option
+<code class="literal">--connection-param-file</code>. The contents of this file are parsed as standard
+Java properties and passed into the driver while creating a connection.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The parameters specified via the optional property file are only
+applicable to JDBC connections. Any fastpath connectors that use connections
+other than JDBC will ignore these parameters.</p></td></tr></table></div><div class="table"><a name="idp774390891360"></a><p class="title"><b>Table 2. Validation arguments <a class="link" href="#validation" title="11. validation">More Details</a></b></p><div class="table-contents"><table class="table" summary="Validation arguments More Details" style="border-collapse: collapse;border-top: 1px solid ; border-bottom: 1px solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 1px solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--validate</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Enable validation of data copied,                                             supports single table copy only.
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--validator &lt;class-name&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Specify validator class to use.
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--validation-threshold &lt;class-name&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Specify validation threshold class                                             to use.
+    </td></tr><tr><td style="border-right: 1px solid ; " align="left">
+    <code class="literal">--validation-failurehandler &lt;class-name&gt;</code>
+    </td><td style="" align="left">
+    Specify validation failure                                             handler class to use.
+    </td></tr></tbody></table></div></div><br class="table-break"><div class="table"><a name="idp774390917888"></a><p class="title"><b>Table 3. Import control arguments:</b></p><div class="table-contents"><table class="table" summary="Import control arguments:" style="border-collapse: collapse;border-top: 1px solid ; border-bottom: 1px solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    Argument
+    </th><th style="border-bottom: 1px solid ; " align="left">
+    Description
+    </th></tr></thead><tbody><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--append</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Append data to an existing dataset                                  in HDFS
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--as-avrodatafile</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Imports data to Avro Data Files
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--as-sequencefile</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Imports data to SequenceFiles
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--as-textfile</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Imports data as plain text (default)
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--as-parquetfile</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Imports data to Parquet Files
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--boundary-query &lt;statement&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Boundary query to use for creating splits
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--columns &lt;col,col,col&#8230;&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Columns to import from table
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--delete-target-dir</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Delete the import target directory                                  if it exists
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--direct</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Use direct connector if exists for the database
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--fetch-size &lt;n&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Number of entries to read from database                                  at once.
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--inline-lob-limit &lt;n&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Set the maximum size for an inline LOB
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">-m,--num-mappers &lt;n&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Use <span class="emphasis"><em>n</em></span> map tasks to import in parallel
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">-e,--query &lt;statement&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Import the results of <span class="emphasis"><em><code class="literal">statement</code></em></span>.
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--split-by &lt;column-name&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Column of the table used to split work                                  units.  Cannot be used with                                  <code class="literal">--autoreset-to-one-mapper</code> option.
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--split-limit &lt;n&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Upper Limit for each split size.                                  This only applies to Integer and Date columns.                                  For date or timestamp fields it is calculated in seconds.
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--autoreset-to-one-mapper</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Import should use one mapper if a table                                  has no primary key and no split-by column                                  is provided.  Cannot be used with                                  <code class="literal">--split-by &lt;col&gt;</code> option.
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--table &lt;table-name&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Table to read
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--target-dir &lt;dir&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    HDFS destination dir
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--temporary-rootdir &lt;dir&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    HDFS directory for temporary files created during import (overrides default "_sqoop")
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--warehouse-dir &lt;dir&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    HDFS parent for table destination
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--where &lt;where clause&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    WHERE clause to use during import
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">-z,--compress</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Enable compression
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--compression-codec &lt;c&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    Use Hadoop codec (default gzip)
+    </td></tr><tr><td style="border-right: 1px solid ; border-bottom: 1px solid ; " align="left">
+    <code class="literal">--null-string &lt;null-string&gt;</code>
+    </td><td style="border-bottom: 1px solid ; " align="left">
+    The string to be written for a null                                  value for string columns
+    </td></tr><tr><td style="border-right: 1px solid ; " align="left">
+    <code class="literal">--null-non-string &lt;null-string&gt;</code>
+    </td><td style="" align="left">
+    The string to be written for a null                                  value for non-string columns
+    </td></tr></tbody></table></div></div><br class="table-break"><p>The <code class="literal">--null-string</code> and <code class="literal">--null-non-string</code> arguments are optional.
+If not specified, then the string "null" will be used.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="_selecting_the_data_to_import"></a>7.2.2. Selecting the Data to Import</h4></div></div></div><p>Sqoop typically imports data in a table-centric fashion. Use the
+<code class="literal">--table</code> argument to select the table to import. For example, <code class="literal">--table
+employees</code>. This argument can also identify a <code class="literal">VIEW</code> or other table-like
+entity in a database.</p><p>By default, all columns within a table are selected for import.
+Imported data is written to HDFS in its "natural order"; that is, a
+table containing columns A, B, and C results in an import of data such
+as:</p><pre class="screen">A1,B1,C1
+A2,B2,C2
+...</pre><p>You can select a subset of columns and control their ordering by using
+the <code class="literal">--columns</code> argument. This should include a comma-delimited list
+of columns to import. For example: <code class="literal">--columns "name,employee_id,jobtitle"</code>.</p><p>You can control which rows are imported by adding a SQL <code class="literal">WHERE</code> clause
+to the import statement. By default, Sqoop generates statements of the
+form <code class="literal">SELECT &lt;column list&gt; FROM &lt;table name&gt;</code>. You can append a
+<code class="literal">WHERE</code> clause to this with the <code class="literal">--where</code> argument. For example: <code class="literal">--where
+"id &gt; 400"</code>. Only rows where the <code class="literal">id</code> column has a value greater than
+400 will be imported.</p><p>By default Sqoop will use the query <code class="literal">select min(&lt;split-by&gt;), max(&lt;split-by&gt;) from
+&lt;table name&gt;</code> to find boundaries for creating splits. In some cases this query
+is not optimal, so you can specify any arbitrary query returning two
+numeric columns using the <code class="literal">--boundary-query</code> argument.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="_free_form_query_imports"></a>7.2.3. Free-form Query Imports</h4></div></div></div><p>Sqoop can also import the result set of an arbitrary SQL query. Instead of
+using the <code class="literal">--table</code>, <code class="literal">--columns</code> and <code class="literal">--where</code> arguments, you can specify
+a SQL statement with the <code class="literal">--query</code> argument.</p><p>When importing a free-form query, you must specify a destination directory
+with <code class="literal">--target-dir</code>.</p><p>If you want to import the results of a query in parallel, then each map task
+will need to execute a copy of the query, with results partitioned by bounding
+conditions inferred by Sqoop. Your query must include the token <code class="literal">$CONDITIONS</code>
+which each Sqoop process will replace with a unique condition expression.
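</p><p>Conceptually, each map task runs the template query with <code class="literal">$CONDITIONS</code> replaced by its own bounding predicate. A small sketch of that substitution (an illustration only, not Sqoop's actual implementation):</p><pre class="screen">
```shell
# Template query as passed to --query; single quotes keep $CONDITIONS literal.
query='SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS'

# A bounding predicate such as Sqoop might generate for one map task.
cond='a.id >= 0 AND a.id < 250'

# The task's query is the template with its predicate substituted in.
task_query=${query/\$CONDITIONS/$cond}
echo "$task_query"
```
</pre><p>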
+You must also select a splitting column with <code class="literal">--split-by</code>.</p><p>For example:</p><pre class="screen">$ sqoop import \
+  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
+  --split-by a.id --target-dir /user/foo/joinresults</pre><p>Alternately, the query can be executed once and imported serially, by
+specifying a single map task with <code class="literal">-m 1</code>:</p><pre class="screen">$ sqoop import \
+  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
+  -m 1 --target-dir /user/foo/joinresults</pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>If you are issuing the query wrapped with double quotes ("),
+you will have to use <code class="literal">\$CONDITIONS</code> instead of just <code class="literal">$CONDITIONS</code>
+to prevent your shell from treating it as a shell variable.
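</p><p>The shell behavior is easy to verify directly:</p><pre class="screen">
```shell
# Single quotes pass $CONDITIONS through to Sqoop literally.
echo 'WHERE $CONDITIONS'
# Double quotes require the backslash escape to suppress shell expansion.
echo "WHERE \$CONDITIONS"
# Both commands print: WHERE $CONDITIONS
```
</pre><p>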
+For example, a double quoted query may look like:
+<code class="literal">"SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"</code></p></td></tr></table></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The facility of using free-form query in the current version of Sqoop
+is limited to simple queries where there are no ambiguous projections and
+no <code class="literal">OR</code> conditions in the <code class="literal">WHERE</code> clause. Use of complex queries such as
+queries that have sub-queries or joins leading to ambiguous projections can
+lead to unexpected results.</p></td></tr></table></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_parallelism"></a>7.2.4. Controlling Parallelism</h4></div></div></div><p>Sqoop imports data in parallel from most database sources. You can
+specify the number
+of map tasks (parallel processes) to use to perform the import by
+using the <code class="literal">-m</code> or <code class="literal">--num-mappers</code> argument. Each of these arguments
+takes an integer value which corresponds to the degree of parallelism
+to employ. By default, four tasks are used. Some databases may see
+improved performance by increasing this value to 8 or 16. Do not
+increase the degree of parallelism greater than that available within
+your MapReduce cluster; tasks will run serially and will likely
+increase the amount of time required to perform the import. Likewise,
+do not increase the degree of parallelism higher than that which your
+database can reasonably support. Connecting 100 concurrent clients to
+your database may increase the load on the database server to a point
+where performance suffers as a result.</p><p>When performing parallel imports, Sqoop needs a criterion by which it
+can split the workload. Sqoop uses a <span class="emphasis"><em>splitting column</em></span> to split the
+workload. By default, Sqoop will identify the primary key column (if
+present) in a table and use it as the splitting column. The low and
+high values for the splitting column are retrieved from the database,
+and the map tasks operate on evenly-sized components of the total
+range. For example, if you had a table with a primary key column of
+<code class="literal">id</code> whose minimum value was 0 and maximum value was 1000, and Sqoop
+was directed to use 4 tasks, Sqoop would run four processes which each
+execute SQL statements of the form <code class="literal">SELECT * FROM sometable WHERE id
+&gt;= lo AND id &lt; hi</code>, with <code class="literal">(lo, hi)</code> set to (0, 250), (250, 500),
+(500, 750), and (750, 1001) in the different tasks.</p><p>If the actual values for the primary key are not uniformly distributed
+across its range, then this can result in unbalanced tasks. You should
+explicitly choose a different column with the <code class="literal">--split-by</code> argument.
+For example, <code class="literal">--split-by employee_id</code>. Sqoop cannot currently split on
+multi-column indices. If your table has no index column, or has a
+multi-column key, then you must also manually choose a splitting
+column.</p><p>Users can override <code class="literal">--num-mappers</code> by using the <code class="literal">--split-limit</code> option.
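</p><p>The effect of <code class="literal">--split-limit</code> on the number of splits can be sketched as follows (a simplified illustration of the rule, not Sqoop's actual code; it reuses the <code class="literal">id</code> range from the earlier example):</p><pre class="screen">
```shell
# 4 mappers over id in [0, 1000] would give a split size of 250.
lo=0; hi=1000; n=4
size=$(( (hi - lo) / n ))                    # 250

# A --split-limit of 100 caps the split size, so the split count grows.
limit=100
need=$(( (hi - lo + limit - 1) / limit ))    # ceiling(1000/100) = 10
if (( need > n )); then n=$need; fi
size=$(( (hi - lo) / n ))                    # recomputed split size: 100

for (( i = 0; i < n; i++ )); do
  a=$(( lo + i * size ))
  if (( i == n - 1 )); then b=$(( hi + 1 )); else b=$(( a + size )); fi
  echo "WHERE id >= $a AND id < $b"
done
```
</pre><p>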
+Using the <code class="literal">--split-limit</code> parameter places a limit on the size of each
+split created. If the size of a split is larger than the size
+specified in this parameter, then the splits are resized to fit within
+this limit, and the number of splits changes accordingly. This
+affects the actual number of mappers. If the size of a split calculated from the
+provided <code class="literal">--num-mappers</code> parameter exceeds the <code class="literal">--split-limit</code> parameter, then the actual
+number of mappers is increased. If the value specified in the <code class="literal">--split-limit</code>
+parameter is 0 or negative, the parameter is ignored altogether and
+the split size is calculated according to the number of mappers.</p><p>If a table does not have a primary key defined and the <code class="literal">--split-by &lt;col&gt;</code>
+is not provided, then import will fail unless the number
+of mappers is explicitly set to one with the <code class="literal">--num-mappers 1</code> option
+or the <code class="literal">--autoreset-to-one-mapper</code> option is used.  The option
+<code class="literal">--autoreset-to-one-mapper</code> is typically used with the import-all-tables
+tool to automatically handle tables without a primary key in a schema.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_distributed_cache"></a>7.2.5. Controlling Distributed Cache</h4></div></div></div><p>Sqoop copies the jars in the $SQOOP_HOME/lib folder to the job cache every
+time a Sqoop job is started. When launched by Oozie this is unnecessary,
+since Oozie uses its own Sqoop share lib, which keeps the Sqoop dependencies
+in the distributed cache. Oozie performs the localization of the Sqoop
+dependencies on each worker node only once, during the first Sqoop
+job, and reuses the jars on the worker nodes for subsequent jobs. Passing the
+option <code class="literal">--skip-dist-cache</code> to the Sqoop command when launched by Oozie will
+skip the step in which Sqoop copies its dependencies to the job cache and save
+substantial I/O.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_the_import_process"></a>7.2.6. Controlling the Import Process</h4></div></div></div><p>By default, the import process will use JDBC, which provides a
+reasonable cross-vendor import channel. Some databases can perform
+imports in a more high-performance fashion by using database-specific
+data movement tools. For example, MySQL provides the <code class="literal">mysqldump</code> tool
+which can export data from MySQL to other systems very quickly. By
+supplying the <code class="literal">--direct</code> argument, you are specifying that Sqoop
+should attempt the direct import channel. This channel may be
+higher performance than using JDBC.</p><p>Details about use of direct mode with each specific RDBMS, installation requirements, available
+options and limitations can be found in <a class="xref" href="#connectors" title="25. Notes for specific connectors">Section 25, &#8220;Notes for specific connectors&#8221;</a>.</p><p>By default, Sqoop will import a table named <code class="literal">foo</code> to a directory named
+<code class="literal">foo</code> inside your home directory in HDFS. For example, if your
+username is <code class="literal">someuser</code>, then the import tool will write to
+<code class="literal">/user/someuser/foo/(files)</code>. You can adjust the parent directory of
+the import with the <code class="literal">--warehouse-dir</code> argument. For example:</p><pre class="screen">$ sqoop import --connect &lt;connect-str&gt; --table foo --warehouse-dir /shared \
+    ...</pre><p>This command would write to a set of files in the <code class="literal">/shared/foo/</code> directory.</p><p>You can also explicitly choose the target directory, like so:</p><pre class="screen">$ sqoop import --connect &lt;connect-str&gt; --table foo --target-dir /dest \
+    ...</pre><p>This will import the files into the <code class="literal">/dest</code> directory. <code class="literal">--target-dir</code> is
+incompatible with <code class="literal">--warehouse-dir</code>.</p><p>When using direct mode, you can specify additional arguments which
+should be passed to the underlying tool. If the argument
+<code class="literal">--</code> is given on the command-line, then subsequent arguments are sent
+directly to the underlying tool. For example, the following adjusts
+the character set used by <code class="literal">mysqldump</code>:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
+    --direct -- --default-character-set=latin1</pre><p>By default, imports go to a new target location. If the destination directory
+already exists in HDFS, Sqoop will refuse to import and overwrite that
+directory&#8217;s contents. If you use the <code class="literal">--append</code> argument, Sqoop will import
+data to a temporary directory and then rename the files into the normal
+target directory in a manner that does not conflict with existing filenames
+in that directory.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_transaction_isolation"></a>7.2.7. Controlling transaction isolation</h4></div></div></div><p>By default, Sqoop uses the read committed transaction isolation level in the mappers
+to import data.  This may not be ideal in all ETL workflows, and it may be
+desirable to reduce the isolation guarantees.   The <code class="literal">--relaxed-isolation</code> option
+can be used to instruct Sqoop to use the read uncommitted isolation level.</p><p>The <code class="literal">read-uncommitted</code> isolation level is not supported on all databases
+(for example, Oracle), so specifying the option <code class="literal">--relaxed-isolation</code>

[... 4043 lines stripped ...]

