Fixing Docker Layer Push Issues on GitLab Container Registry
As a DevOps Engineer, my main responsibility is to manage deployments, particularly publishing images to the GitLab Container Registry. Occasionally, the push of an image layer fails for no obvious reason, particularly with larger images.
I have run into this issue on both GitLab.com and self-hosted instances, and troubleshooting it can be hard, especially when the error log is vague and the push keeps retrying.
The most common error messages when pushing to the registry are:
- received unexpected HTTP status: 500 Internal Server Error
- unknown: Client Closed Request
- http: proxy error: context canceled
- write: connection reset by peer
This guide will help you troubleshoot and fix deployment issues, especially stuck or failed layer pushes to the GitLab Container Registry.
Troubleshooting #1: GitLab Nginx Upload Limit
When running GitLab on-premises, the default built-in nginx upload limit may be too small to handle large layers and needs to be increased. Open the GitLab configuration file and raise the client_max_body_size value as needed.
# /etc/gitlab/gitlab.rb
nginx['client_max_body_size'] = '256m'
After increasing the limit, run the reconfiguration script and restart the GitLab server.
gitlab-ctl reconfigure
gitlab-ctl restart
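To confirm that the new limit was picked up, you can search the rendered nginx configuration. This is a minimal sketch, assuming an Omnibus installation whose bundled nginx configuration is rendered under /var/opt/gitlab/nginx/conf. The NGINX_CONF_DIR variable and the show_body_limit helper are names made up for this example.

```shell
# Assumption: Omnibus GitLab renders its bundled nginx config under this directory.
NGINX_CONF_DIR="${NGINX_CONF_DIR:-/var/opt/gitlab/nginx/conf}"

# Print every client_max_body_size line found below the given directory.
show_body_limit() {
  grep -rn 'client_max_body_size' "$1" 2>/dev/null
}

show_body_limit "$NGINX_CONF_DIR" \
  || echo "No client_max_body_size found under $NGINX_CONF_DIR" >&2
```

If the value shown still differs from the one in gitlab.rb, the reconfigure step probably did not run or failed.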
Troubleshooting #2: GitLab Authentication Token Timeout
Simply put, if an image upload takes a considerable amount of time due to computational or bandwidth limitations, the registry token may expire midway through the upload process, causing it to fail.
This value cannot be changed on GitLab.com, so there you may need to either self-host your Docker registry or reduce the image size further. On a self-hosted GitLab installation, an administrator can open the page below and raise the registry token duration.
# with your GitLab admin account
https://your_gitlab_domain/admin/application_settings/ci_cd
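A rough way to judge whether the token lifetime is the problem is to compare it with the expected push time. The sketch below is only back-of-the-envelope arithmetic. The function name, the 2000 MB image size, the 40 Mbit/s uplink, and the 5-minute token lifetime are all illustrative assumptions, so substitute your own numbers and check the actual value on your instance.

```shell
# Estimate push time: size in megabytes, bandwidth in megabits per second.
# Integer arithmetic only; ignores compression, retries, and protocol overhead.
estimate_push_seconds() {
  size_mb="$1"
  bandwidth_mbps="$2"
  echo $(( size_mb * 8 / bandwidth_mbps ))
}

token_minutes=5   # assumed token lifetime; check the value on your instance
push_seconds="$(estimate_push_seconds 2000 40)"

if [ "$push_seconds" -gt $(( token_minutes * 60 )) ]; then
  echo "Estimated push time ${push_seconds}s exceeds the ${token_minutes}-minute token lifetime"
else
  echo "Estimated push time ${push_seconds}s fits within the token lifetime"
fi
```

If the estimate is close to or above the token lifetime, raising the duration or shrinking the image is worth trying first.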
Troubleshooting #3: Reverse Proxy Timeout
If you are running a reverse proxy in front of GitLab or the container registry, you need to increase its timeouts, especially the read and write timeouts, so that it can handle long-lived HTTP connections.
If your GitLab registry log file contains the error "client disconnected during blob PATCH", increasing the proxy timeout could help solve the issue.
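To check for that message, you can count its occurrences in the registry log. This is a minimal sketch, assuming an Omnibus installation that writes the registry log to /var/log/gitlab/registry/current. The REGISTRY_LOG variable and the helper name are made up for this example.

```shell
# Assumption: Omnibus GitLab writes the registry log to this file.
REGISTRY_LOG="${REGISTRY_LOG:-/var/log/gitlab/registry/current}"

# Count log lines that mention the disconnect error.
count_blob_patch_disconnects() {
  grep -c 'client disconnected during blob PATCH' "$1" 2>/dev/null
}

# grep exits non-zero when nothing matches or the file is missing, hence "|| true".
hits="$(count_blob_patch_disconnects "$REGISTRY_LOG")" || true
echo "Matching log lines: ${hits:-0}"
```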
On Nginx, increase the proxy timeouts in your configuration. The directives that usually matter for long uploads are proxy_read_timeout, proxy_send_timeout, and client_body_timeout.
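As a sketch of what such a change could look like, the snippet below writes a small drop-in file with raised timeouts. The file name, the 900-second values, and the idea of including it from the http context (for example via /etc/nginx/conf.d/) are assumptions for illustration, not values from a tested setup.

```shell
# Hypothetical output file; copy it to wherever your nginx includes
# http-level snippets from, for example /etc/nginx/conf.d/.
CONF_FILE="${CONF_FILE:-./registry-timeouts.conf}"

cat > "$CONF_FILE" <<'EOF'
# Example values only; tune them to your slowest expected layer upload.
proxy_read_timeout 900s;
proxy_send_timeout 900s;
client_body_timeout 900s;
send_timeout 900s;
EOF

echo "Wrote $CONF_FILE; after installing it, run: nginx -t && nginx -s reload"
```

Running nginx -t before the reload catches syntax errors without interrupting live traffic.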
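For Traefik, raise the responding timeouts (readTimeout and writeTimeout) of the entry point that serves the registry, in the static configuration.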
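A possible shape for the Traefik side, written as a YAML static-configuration fragment. The entry point name websecure, the port, the output file name, and the 900-second values are assumptions for this example. Merge the respondingTimeouts block into your existing static configuration rather than replacing it.

```shell
# Hypothetical output file holding a static-configuration fragment.
TRAEFIK_FILE="${TRAEFIK_FILE:-./traefik-timeouts.yml}"

cat > "$TRAEFIK_FILE" <<'EOF'
entryPoints:
  websecure:              # assumed entry point name
    address: ":443"
    transport:
      respondingTimeouts:
        readTimeout: 900s   # time allowed to read the whole request, body included
        writeTimeout: 900s  # time allowed to write the response
        idleTimeout: 900s   # keep-alive idle time
EOF

echo "Wrote $TRAEFIK_FILE; merge it into your static configuration and restart Traefik"
```

Static configuration is only read at startup, so Traefik has to be restarted for the change to apply.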
Troubleshooting #4: Decrease Docker Concurrent Uploads
By default, docker push uploads five layers concurrently. If the bandwidth between your runner and the registry is limited or slow, decreasing the number of concurrent layer uploads could help with the issue.
Open your daemon configuration file, usually /etc/docker/daemon.json on Linux (check the Docker documentation for other platforms). Update the concurrent upload/download values as needed, and restart the Docker daemon.
{
  "max-concurrent-uploads": 2,
  "max-concurrent-downloads": 2
}
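Because a syntax error in this file can keep the Docker daemon from starting, it is worth validating the JSON before restarting. This is a minimal sketch, assuming python3 is available on the host and that Docker is managed by systemd. The DAEMON_JSON variable and the helper name are made up for this example.

```shell
# Assumption: the usual Linux location of the daemon configuration file.
DAEMON_JSON="${DAEMON_JSON:-/etc/docker/daemon.json}"

# Succeeds only if the file exists and parses as JSON.
validate_daemon_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

if validate_daemon_json "$DAEMON_JSON"; then
  echo "OK: $DAEMON_JSON is valid JSON; restart with: sudo systemctl restart docker"
else
  echo "ERROR: $DAEMON_JSON is missing or not valid JSON" >&2
fi
```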
Troubleshooting #5: Insufficient Server Capacity
Finally, if you have exhausted all other troubleshooting options, the server's performance or capacity may be reaching its limits and need an upgrade, particularly on a high-activity GitLab server.
Begin by upgrading disk and network performance, as these are crucial for fast and efficient registry operations, or consider moving the registry to a server separate from GitLab itself.
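Before upgrading, a quick look at how full the registry's filesystem is can help confirm the suspicion. This is a minimal sketch, assuming the default Omnibus storage path for registry data. The REGISTRY_DIR variable, the helper name, and the 80% threshold are arbitrary choices for this example.

```shell
# Assumption: default Omnibus storage path for registry data.
REGISTRY_DIR="${REGISTRY_DIR:-/var/opt/gitlab/gitlab-rails/shared/registry}"

# Print the used percentage (without the % sign) of the filesystem holding a path.
disk_used_percent() {
  df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

used="$(disk_used_percent "$REGISTRY_DIR")" || true
if [ -z "$used" ]; then
  echo "Could not read disk usage for $REGISTRY_DIR" >&2
elif [ "$used" -ge 80 ]; then   # 80% is an arbitrary example threshold
  echo "Registry filesystem is ${used}% full; consider adding capacity"
else
  echo "Registry filesystem is ${used}% full"
fi
```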
Conclusion
Managing GitLab and a Docker registry can be hard, but hopefully this guide will help you troubleshoot and fix image push issues more easily. An efficient and stable deployment pipeline will make your work as a DevOps engineer easier.