Originally posted on the Container Solutions blog
Things shifted slightly in the Cloud Native world recently, when the Docker Hub turned on rate limiting.
If you run a Kubernetes cluster, or make extensive use of Docker images, this
is something you need to be aware of as it could cause outages. In
particular, if you are suddenly finding a lot of Kubernetes pods failing with
ErrImagePull and event messages like:
Failed to pull image "ratelimitalways/test:latest":
rpc error: code = Unknown desc = Error response from daemon:
pull access denied for ratelimitalways/test, repository does not exist or may require 'docker login': denied: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: <https://www.docker.com/increase-rate-limit>
then you’ve probably been hit by the rate limiting. Essentially, in order to control costs, the Docker Hub now limits the rate at which images can be pulled. The rules are:
- Anonymous users can pull 100 images in six hours, counted per source IP address.
- Authenticated users can pull 200 images in six hours, counted per account.
- Paid users are not limited.
- One pull = one GET request for a manifest (GETs of image layers do not count).
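To see how close you are to these limits, the Hub reports them in response headers. The following is a minimal sketch, assuming the token endpoint, the `ratelimitpreview/test` repository, and the `ratelimit-limit` / `ratelimit-remaining` header names described in Docker's rate-limit documentation; it also assumes `curl` and `jq` are installed. The function names are made up for this example.

```shell
# Parse Docker Hub rate-limit headers from an HTTP response read on stdin.
# Assumed header format: "ratelimit-remaining: 76;w=21600" (w = window in seconds).
parse_ratelimit() {
  tr -d '\r' | awk -F': ' '
    tolower($1) == "ratelimit-limit"     { split($2, a, ";"); limit = a[1] }
    tolower($1) == "ratelimit-remaining" { split($2, a, ";"); remaining = a[1] }
    END { print "limit=" limit " remaining=" remaining }'
}

# Ask the Hub for the current limits as seen from this machine's IP address.
# Uses a HEAD request, which should not itself count as a pull.
check_hub_ratelimit() {
  token=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
  curl -s --head -H "Authorization: Bearer $token" \
    https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest \
    | parse_ratelimit
}
```

Running `check_hub_ratelimit` on a cluster node (rather than your laptop) gives the figures for the address the Hub actually sees for that node.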
Identifying Images from Docker Hub
Large clusters and CI/CD platforms that use the Hub are likely to hit these limits—in these situations you are likely to have multiple nodes pulling from the same IP address (or what appears to the Hub as the same address).
The first thing you might want to do is find out what images from the Docker Hub you’re using. Remember that the Docker Hub controls the ‘default namespace’ for container images, so it’s not always obvious where images come from.
If you run the following on a Kubernetes cluster, it should identify all images from the Docker Hub that use the normal naming convention:
$ kubectl get pods --all-namespaces \
-o jsonpath="{.items[*].spec.containers[*].image}" \
| tr -s '[[:space:]]' '\n' | grep -v "[^.]*[:.].*/" | sort | uniq
This won’t identify images that explicitly reference the Docker Hub, i.e., images like "docker.io/library/postgres:latest". You can find these with the rather simpler expression:
$ kubectl get pods --all-namespaces \
-o jsonpath="{.items[*].spec.containers[*].image}" \
| tr -s '[[:space:]]' '\n' | grep "^docker\.io/" | sort | uniq
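The two checks can also be folded into a single filter. This is a sketch rather than a definitive rule: it assumes the usual convention that the first path component is a registry host only if it contains a "." or ":" or is "localhost", and it treats `docker.io` and `index.docker.io` as the Hub. The function name `hub_images` is hypothetical.

```shell
# Read image names on stdin (one per line) and print those that resolve to
# the Docker Hub, whether implicitly (nginx, bitnami/redis) or explicitly
# (docker.io/...). Output is sorted and de-duplicated.
hub_images() {
  awk '{
    n = split($0, parts, "/")
    first = parts[1]
    # No "/" at all: a bare name such as "nginx:1.19" lives on the Hub.
    if (n == 1) { print; next }
    # Explicit references to the Hub.
    if (first == "docker.io" || first == "index.docker.io") { print; next }
    # First component looks like a registry host: not the Hub.
    if (first ~ /[.:]/ || first == "localhost") next
    # Otherwise it is a user/repo name in the default namespace.
    print
  }' | LC_ALL=C sort -u
}
```

For example, the image list produced above can be piped straight in: `kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | hub_images`. Note that this jsonpath only covers regular containers, so images used solely by init containers will not show up.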
Solving the Problem
So what’s the best way to solve this problem? It will depend on how quickly you need to get this sorted, but your options are:
- Pay for Docker Hub licenses. It’s not expensive, but Docker pricing is per team member, which can be a little confusing when what you actually want to license is a cluster of 100 Kubernetes nodes. To make sure you’re in the clear here, opt for the team membership unless it’s a very small cluster.
To use the new credentials, you will need to add image pull secrets to your deployments. Note that image pull secrets can be added to the default service account, so you don’t have to manually update every deployment.
- Set up the Docker Registry (part of Docker Distribution) as a pull through cache or mirror. This used to be a popular solution and should ensure your cluster is only pulling each image once. Unfortunately, it isn’t that easy to do and requires configuration changes to each node, so the best approach is dependent on how you installed and manage Kubernetes.
Be aware that the registry will delete cached images after seven days. Several clouds also run their own Docker mirrors, which avoid the need to run your own registry instance (but still require configuration).
- Install Trow (or another registry with proxy support) and configure it as a proxy-cache. Trow has a --proxy-docker-hub argument, which will configure it to automatically fetch any repos under f/docker from the Docker Hub, e.g. docker pull my-trow.myorg/f/docker/redis:latest will pull the redis:latest image from the Docker Hub and cache it in the Trow registry. This solution will require updating image names to reference the Trow registry, but doesn’t require any images to be moved manually.
- Switch all your images to point to a different registry. For example, you could install a local registry on your cluster and mandate that all images must come from that registry. This sounds like a lot of work, but it is arguably the most sustainable, maintainable, and secure way forward. With regard to enforcing the registry choice, this can be done with an Admission Controller (which can be installed with Trow) or Open Policy Agent.
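For the first option, attaching an image pull secret to the default service account might look roughly like the sketch below. It assumes `kubectl` is configured for the cluster; the secret name `dockerhub-creds` and both function names are hypothetical, and the Docker Hub server URL shown is the commonly used default rather than something taken from this post.

```shell
# Build the JSON patch that attaches a pull secret to a service account.
pull_secret_patch() {
  printf '{"imagePullSecrets": [{"name": "%s"}]}' "$1"
}

# Create a registry secret and attach it to the default service account.
# Arguments: namespace, Docker Hub username, Docker Hub access token.
# "dockerhub-creds" is a hypothetical secret name chosen for this sketch.
add_hub_credentials() {
  namespace=$1 user=$2 token=$3
  kubectl create secret docker-registry dockerhub-creds \
    --namespace "$namespace" \
    --docker-server=https://index.docker.io/v1/ \
    --docker-username="$user" \
    --docker-password="$token"
  kubectl patch serviceaccount default \
    --namespace "$namespace" \
    -p "$(pull_secret_patch dockerhub-creds)"
}
```

Service accounts are namespaced, so this would need repeating for each namespace that pulls from the Hub, and it should only affect pods created after the patch is applied.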
It’s worth pointing out that most of these aren’t mutually exclusive—you can pay for the Docker Hub to get you out of a bind, then move to a solution that uses both of the final two options.
In the long run, I would recommend that most clusters should be set up with their own registry and the cluster should only be allowed to run images from that registry. Any third-party images, such as Docker official images, can be proxy-cached.
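Moving to a proxy-cache means rewriting image names. As an illustration, the sketch below maps Docker Hub image names onto the f/docker path of a Trow registry, following the my-trow.myorg/f/docker/redis:latest pattern mentioned earlier. The function name is hypothetical, and how Trow treats names that include an explicit library/ prefix is an assumption you should verify against your own install.

```shell
# Rewrite Docker Hub image names (one per line on stdin) so they point at
# the proxy path of a Trow registry. Argument: the registry host name.
# An explicit docker.io/ or index.docker.io/ prefix is stripped first.
to_trow_proxy() {
  registry=$1
  sed -E \
    -e 's#^(index\.)?docker\.io/##' \
    -e "s#^#${registry}/f/docker/#"
}
```

For example, `echo redis:latest | to_trow_proxy my-trow.myorg` prints `my-trow.myorg/f/docker/redis:latest`. Only feed it names that really come from the Hub, since it does not check for other registry hosts.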
This will provide a fall-back in the case of remote outages: As well as having a local copy that can be used, the registry also provides a place where new images can be pushed, allowing updates to still take place when the remote registry can’t be reached. In a lot of cases it may be worth taking this further and ‘gating’ all third-party content to protect against bad upstream content. In this set-up, images are tested and verified before being added to the organisational registry.
To give an example of where this helps, imagine a bad image is pushed to the nginx:1.19 repo on Docker Hub (see this NodeJS Docker issue for a real world example). If your set-up pulls this version into the cache, you’ll be stuck until a fix is pushed, but if you used gating, it should never have hit you in the first place, and you should also have a history of old images in case you need to roll back.
So what’s the takeaway from all this? We need to be more careful and thoughtful with our software supply chains. I think this is going to be a big topic in the future, and we can already see hints of where things are going in the Notary and Grafeas projects.