s3tail


S3tail is a simple tool for reading log files stored in an S3 bucket, much as one might use the *nix tail command (with far fewer options and, most notably, no follow mode).

The simplest way to install is via pip install s3tail (see installation for other methods).

Features

S3tail downloads and displays the content of files stored in S3, optionally starting at a specific prefix. For example, the following dumps the contents of all log files found for August 4, in the order S3 provides them, from that prefix onward:

$ s3tail s3://my-logs/production-s3-access-2016-08-04
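
To make the bucket and prefix parts of that argument concrete, here is a minimal sketch of how an s3:// URL can be split using only the Python standard library. The helper name split_s3_url is hypothetical and is not taken from s3tail itself; the tool may parse its argument differently.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// URL into (bucket, key_prefix).

    Hypothetical helper for illustration; not part of s3tail's API.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected a URL of the form s3://bucket/prefix: %r" % url)
    # urlparse keeps the leading slash on the path; S3 keys do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_url("s3://my-logs/production-s3-access-2016-08-04")
print(bucket)  # the bucket name
print(prefix)  # the key prefix where listing would start
```

With the URL from the example above, the bucket is my-logs and the prefix is production-s3-access-2016-08-04.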

When s3tail is stopped or interrupted, it prints a bookmark that can be used to pick up at the exact spot following the last log line printed in the previous run. For example, a run might end like this:

$ s3tail s3://my-logs/production-s3-access-2016-08-04
...
...a-bunch-of-file-output...
...
Bookmark: production-s3-access-2016-08-04-00-20-31-61059F36E0DBF36E:706

This can then be used to pick up at line 707 later on, like this:

$ s3tail s3://my-logs/production-s3-access-2016-08-04 \
    --bookmark production-s3-access-2016-08-04-00-20-31-61059F36E0DBF36E:706
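
Judging only from the example above, a bookmark appears to be an S3 key and the number of the last line printed, joined by a colon. The following is a rough sketch of reading that format; parse_bookmark is a hypothetical name, and how s3tail handles bookmarks internally is not documented here.

```python
def parse_bookmark(bookmark):
    """Split a 'key:line' bookmark into (key, last_line_printed).

    Hypothetical helper; the format is inferred from the sample output only.
    """
    # Split on the last colon, so a colon inside the key would not break parsing.
    key, sep, line = bookmark.rpartition(":")
    if not sep or not key or not line.isdigit():
        raise ValueError("expected a bookmark of the form key:line: %r" % bookmark)
    return key, int(line)


key, last_line = parse_bookmark(
    "production-s3-access-2016-08-04-00-20-31-61059F36E0DBF36E:706"
)
print(key)
print(last_line + 1)  # the line a resumed run would start from
```

For the bookmark shown above, the resumed run starts at line 707, matching the description.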

Additionally, it’s often useful to let s3tail track where things were left off and pick up at that spot without needing to copy and paste the previous bookmark. This is where “named bookmarks” come in handy. The examples above could have been reduced to these operations:

$ s3tail --bookmark my-special-spot s3://my-logs/production-s3-access-2016-08-04
...
^C
$ s3tail --bookmark my-special-spot s3://my-logs/production-s3-access
Starting production-s3-access-2016-08-04-02-22-32-415AE699C8233AC3
Found production-s3-access-2016-08-04-02-22-32-415AE699C8233AC3 in cache
Picked up at line 707
...

It’s safe to rerun s3tail sessions when piping the output to commands that search the stream (e.g. grep). S3tail keeps files in a local file system cache (for 24 hours by default) and will always read and display from the cache before downloading from S3. This is done in a best-effort background thread to avoid impacting performance.

The file cache is stored in the user’s HOME directory, in an .s3tailcache subdirectory, where the file names are the S3 keys hashed with SHA-256. Cached files can be listed with the --cache-lookup option:

$ s3tail --cache-lookup s3://my-logs/production-s3-access-2016-08-04

my-logs/production-s3-access-2016-08-04-23-20-40-9935D31F89E5E38B
  => NOT IN CACHE
my-logs/production-s3-access-2016-08-04-23-20-45-D76C63A0478F829B
  => NOT IN CACHE
my-logs/production-s3-access-2016-08-04-23-20-51-C14A8D0980A9F562
  => NOT IN CACHE
...
my-logs/production-s3-access-2016-08-04-23-24-02-C9DF441E6B14EFBB
  => /Users/brad/.s3tailcache/05/0536db5ed3938c0b7fb8d2809bf8b4eb1a686ba14c9dc9b09aafc20780ef0528
my-logs/production-s3-access-2016-08-04-23-24-10-E9E55E9019AA46D0
  => /Users/brad/.s3tailcache/d1/d1c8b060d7c9a59c6387fc93b7a3d42db09ce90df2ed4eb71449e88e010ab4a8
my-logs/production-s3-access-2016-08-04-23-24-58-28FE2F9927BCBEA3
  => /Users/brad/.s3tailcache/46/46de81db7cd618074a8ff24cef938dca0d8353da3af8ccc67f517ba8600c3963
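
The cache layout described above can be sketched roughly as follows. This is an illustration under assumptions, not s3tail's actual code: the function name cache_path_for is hypothetical, the two-character subdirectory is inferred from the sample listing, and the exact string that gets hashed (for instance, whether the bucket name is included) is not stated, so digests produced here will not necessarily match the listing.

```python
import hashlib
import os


def cache_path_for(key, cache_root=None):
    """Return a cache file path for an S3 key.

    Hypothetical sketch: hashes the key with SHA-256 and nests the file
    under a directory named after the first two hex characters.
    """
    if cache_root is None:
        # Assumed default: an .s3tailcache directory under the user's home.
        cache_root = os.path.join(os.path.expanduser("~"), ".s3tailcache")
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return os.path.join(cache_root, digest[:2], digest)


print(cache_path_for("my-logs/production-s3-access-2016-08-04-23-24-02-C9DF441E6B14EFBB"))
```

Fanning files out by a short digest prefix keeps any single directory from growing too large, which is a common reason for this kind of layout.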

Check out usage for more details and examples (like how to leverage GoAccess to generate beautiful traffic reports!).

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.