mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bernd Mathiske" <be...@mesosphere.io>
Subject Re: Review Request 30774: Fetcher Cache
Date Fri, 15 May 2015 22:07:25 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated May 15, 2015, 3:07 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.


Changes
-------

Fixed remaining issues.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs
can now be cached in a cache directory designated by a slave flag. This only happens when
asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size
limit also given by a new slave flag. Cache-resident files are evicted as necessary to make
space for newly fetched ones. Concurrent attempts to cache the same URI leads to only one
download. The fetcher program remains external for safety reasons, but is now augmented with
more elaborate parameters packed into a JSON object to implement specific fetch actions for
all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the
new features.


Diffs (updated)
-----

  docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 967b1e3bbfb3f6b71d5a15d02cba7ed5ec21816f 
  include/mesos/type_utils.hpp 044637481e5405d4d6f61653a9f9386edd191deb 
  src/Makefile.am 93c7c8a807a33ab639be6289535bbd32022aa85b 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f 
  src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622

  src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603

  src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032 
  src/slave/flags.cpp d0932b04e3825abb6173efe0d1aee199aa356932 
  src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 
  src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 19db71217f0a3f1ab17a6fd4408f8251410d731d 
  src/tests/mesos.cpp bc082e8d91deb2c5dd64bbc3f0a8a50fa7d19264 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

-Replaces all other reviews for the fetcher cache except those related to stout: 30006, 30033,
30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions
of those. In dependency order:

30033: Removes the fetcher env tests since these won't be needed any more when the fetcher
uses JSON in a single env var as a parameter. They never tested anything that won't be covered
by other tests anyway.

30034: Makes the code structure of all fetcher tests the same. Instead of calling the run
method of the fetcher directly, calling through fetch(). Also removes all uses of I/O redirection,
which is not really needed for debugging, and thus the next patch can refactor fetch() and
run(). (The latter comes in two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will later
cause fetcher cache actions. Also introduces the notion of a cache directory to the fetcher
info protobuf. And then propagates these additions throughout the rest of the code base where
applicable. This includes passing the slave ID all the way down to the place where the cache
dir name is constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the cache,
fetch through the cache, retrieve from the cache). Switches the basis for dealing with uris
to "items", which contain the uri, the action, and potentially a cache file name. Refactors
fetch() and run(), so there is only one of each. Introduces about half of the actual cache
logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.


30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache. Reuse of download results in
the cache when multiple fetcher runs occur concurrently. 

30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS fetching.
Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related to fetching
from HDFS. Uses a mock HDFS client written in bash that acts just like a real "hadoop" command
if used in the right limited way.

30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements
recovery in an acceptable, yet most simple way.

30173: Created fetcher cache tests. Adds a new test source file containing a test fixture
and tests to find out if the fetcher cache works with a variety of settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains the
file size for the URI passed as argument. This is needed to determine the size of a file on
HDFS before downloading it to the fetcher cache (to ensure there is enough space).

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation code
(distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor,
since it is going to be reused by download size queries when we introduce fetcher cache management.
Also factored out URI validation, which will be used the same way by mesos-fetcher and the
fetcher process/actor.

30626: Fetcher cache eviction. This happens when the cache does not have enough space to accomodate
upcoming downloads to the cache. Necessary provisions included here:
- mesos-fetcher does not run until evictions have been successful
- Cache space is reserved while (async) waiting for eviction to succeed. If it fails, the
reservation gets undone.
- Reservations can be partly from available space, partly from evictions. All math included
:-)
- To find out how much space is needed, downloading has a prelude in which we query the download
size from the URI. This works for all URI types that mesos-fetcher currently supports, including
http and hdfs.
- Size-determination requests are now synchronized, too. Only one per URI in play happens.
- There is cleanup code for all kinds of error situations. At the very end of the fetch attempt,
each list is processed for undoing things like space reservations and eviction disabling.
- Eviction gets disabled for URIs that are currently in use, i.e. the related cache files
are. We use reference counting for this, since there may be concurrent fetch attempts using
the same cache files.


Thanks,

Bernd Mathiske


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message