Docker Distribution: Hosted Remote Storage with Amazon’s Simple Storage Service (S3)


What follows is a lightly modified excerpt from chapter 10 of Docker in Action. Chapter 10 covers the Docker Distribution project in depth.

Simple Storage Service (or S3) from AWS offers several features in addition to blob storage. You can configure blobs to be encrypted at rest, versioned, access audited, or made available through AWS’s content delivery network.

Use the “s3” storage property to adopt S3 as your hosted remote blob store. There are four required sub-properties: “accesskey,” “secretkey,” “region,” and “bucket.” These are required to authenticate your account and set the location where blob reads and writes will happen. Other sub-properties specify how the Distribution project should use the blob store. These include “encrypt,” “secure,” “v4auth,” “chunksize,” and “rootdirectory.”

Setting the “encrypt” property to true will enable encryption at rest for the data your registry saves to S3. This is a free feature that enhances the security of your service.

The “secure” property controls the use of HTTPS for communication with S3. The default is false, which results in the use of HTTP. If you are storing private image material, you should set this to true.

The “v4auth” property tells the registry to use version 4 of the AWS authentication protocol. In general this should be set to true, but it defaults to false.

Files greater than 5GB must be split into smaller files and reassembled on the service side in S3. However, chunked uploads are also available for files smaller than 5GB and should be considered for files greater than 100MB. File chunks can be uploaded in parallel, and individual chunk upload failures can be retried individually. The Distribution project and its S3 client perform file chunking automatically, but the “chunksize” property sets the size beyond which files should be chunked. The minimum chunk size is 5MB.
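
The chunksize value that appears in the configuration below is just that 5MB minimum expressed in bytes; a quick shell check (purely illustrative) confirms the arithmetic:

# 5MB expressed in bytes -- the minimum chunk size and the value used below
echo $((5 * 1024 * 1024))    # prints 5242880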

Finally, the “rootdirectory” property sets the directory within your S3 bucket where the registry data should be rooted. This is helpful if you want to run multiple registries from the same bucket. The following is a fork of the default configuration file and has been configured to use S3 (provided you fill in the blanks for your account).

# Filename: s3-config.yml
version: 0.1
log:
  level: debug
  fields:
    service: registry
    environment: development
storage:
  cache:
    layerinfo: inmemory
  s3:
    accesskey: <your awsaccesskey>
    secretkey: <your awssecretkey>
    region: <your bucket region>
    bucket: <your bucketname>
    encrypt: true
    secure: true
    v4auth: true
    chunksize: 5242880
    rootdirectory: /s3/object/name/prefix
  maintenance:
    uploadpurging:
      enabled: false
http:
  addr: :5000
  secret: asecretforlocaldevelopment
  debug:
    addr: localhost:5001

After you’ve provided your account access key, secret, bucket name, and region, you can pack the updated registry configuration into a new image with the following Dockerfile:


# Filename: s3-config.df
FROM registry:2
LABEL source=dockerinaction
LABEL category=infrastructure
# Set the default argument to specify the config file to use
# Setting it early will enable layer caching if the
# s3-config.yml changes.
CMD ["/s3-config.yml"]
COPY ["./s3-config.yml","/s3-config.yml"]


And build it with the following docker build command:

docker build -t dockerinaction/s3-registry -f s3-config.df .

Launch your new S3-backed registry with a simple “docker run” command:

docker run -d --name s3-registry dockerinaction/s3-registry
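
The configuration above binds the registry’s HTTP listener to port 5000. If you also publish that port when launching the container (for example, by adding -p 5000:5000 to the command above), a curl against the base V2 API endpoint makes a quick sanity check; a healthy registry answers with an HTTP 200 and an empty JSON body:

curl http://localhost:5000/v2/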

An alternative to building a full image would be to use bind-mount volumes to load the configuration file in a new container and set the default command. For example, you could use the following Docker command:

docker run -d --name s3-registry -v "$PWD"/s3-config.yml:/s3-config.yml registry:2 /s3-config.yml

If you wanted to get really fancy, you could inject the new configuration as environment variables and use a stock image. This is a good approach for handling secret material (like your account specifics). Running the container that way would look something like:


docker run -d --name s3-registry \
-e REGISTRY_STORAGE_S3_ACCESSKEY=$AWS_ACCESS_KEY \
-e REGISTRY_STORAGE_S3_SECRETKEY=$AWS_SECRET_KEY \
-e REGISTRY_STORAGE_S3_REGION=us-west-2 \
-e REGISTRY_STORAGE_S3_BUCKET=my_registry \
-e REGISTRY_STORAGE_S3_ENCRYPT=true \
-e REGISTRY_STORAGE_S3_SECURE=true \
-e REGISTRY_STORAGE_S3_V4AUTH=true \
-e REGISTRY_STORAGE_S3_CHUNKSIZE=5242880 \
-e REGISTRY_STORAGE_S3_ROOTDIRECTORY=/s3/object/name/prefix \
registry:2

In practice, a blended approach, where you provide environment-agnostic and non-sensitive material via the configuration file and inject the remaining values through environment variables, will likely meet your needs.
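
A sketch of that blended approach, assuming the dockerinaction/s3-registry image built earlier carries the non-sensitive settings in its s3-config.yml and only the account credentials are injected at run time:

docker run -d --name s3-registry \
-e REGISTRY_STORAGE_S3_ACCESSKEY=$AWS_ACCESS_KEY \
-e REGISTRY_STORAGE_S3_SECRETKEY=$AWS_SECRET_KEY \
dockerinaction/s3-registry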

S3 is offered under a use-based cost model. There is no upfront cost to get started, and many smaller registries will be able to operate within the AWS “free tier.”

If you are not interested in a hosted data service and are not deterred by some technical complexity, then you might alternatively consider running a Ceph storage cluster and using the RADOS blob storage backend.
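
For reference, a minimal sketch of what the storage section of the registry configuration might look like with the “rados” driver; the pool and user names here are illustrative placeholders, and the exact sub-properties are an assumption based on the driver’s documented parameters:

storage:
  rados:
    poolname: <your ceph pool>    # illustrative placeholder
    username: <your ceph user>    # illustrative placeholder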

About

Jeff Nickoloff is a software engineer and author. He is currently writing Docker in Action, consulting, working on a privacy conscious content discovery project, and mentoring a few new engineers. Until recently he worked with Amazon.com in engineering and leadership. He spent that time building and iterating on the high volume microservices powering the largest document processing workflow engine at Amazon. He has contributed libraries used by teams all over Amazon. Connect with Jeff via Twitter @allingeek or follow his blog at https://medium.com/@allingeek/.
