We’ve open-sourced our bulk image downloader

pixolution
2 min read · May 10, 2019

Photo by Iwan Shimko on Unsplash

We at pixolution like to craft our own tools to simplify daily processes. When we needed an image downloader that takes a list of image URLs and stores large numbers of web images locally, we started coding. The result is a lightweight and fast bulk image downloader. It’s open source, and we are happy to make it available to you on GitHub.

When we offer services like de-duplication of image databases, we have to download images from our customers. Often, this image data is provided as a list of URLs pointing to thousands or millions of images that we need to copy to our local servers for further processing. This is where the image downloader comes into play. Its job is to fetch every image in the URL list and quickly save it to a local folder.

It’s simple, yet effective. Here are the features:

Multithreaded downloads

Downloading a large list of images takes a long time if you fetch only one image at a time. Therefore, our image downloader uses multiple threads to download several images in parallel.
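To make the idea concrete, here is a minimal Python sketch of parallel downloading with a thread pool. It is not taken from our downloader's source; the names `fetch`, `download_all`, `fetch_one` and `workers` are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return the raw bytes behind one URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_one=fetch, workers=8):
    """Fetch every URL on a pool of worker threads.

    Returns a dict mapping each URL to its bytes, or to the exception
    that was raised, so a single bad URL does not abort the batch.
    """
    urls = list(urls)  # allow generators; we iterate twice below

    def safe_fetch(url):
        try:
            return fetch_one(url)
        except Exception as error:  # record per-URL failures and keep going
            return error

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Threads suit this job because each worker spends most of its time waiting on the network rather than computing.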

Support for rate limits

Multithreaded downloading is cool. But only until you get blocked by the download server. To prevent this, you can set the maximum number of images to download per second. This way you control the download speed and avoid overloading the server.
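One simple way to enforce such a limit across threads is to hand out evenly spaced time slots. The sketch below assumes that approach; it is an illustration, not the downloader's actual code, and `RateLimiter`, `per_second` and `acquire` are hypothetical names.

```python
import threading
import time


class RateLimiter:
    """Space successive acquire() calls at least 1/per_second apart, across threads."""

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def acquire(self):
        """Reserve the next free time slot, then wait until it arrives."""
        with self._lock:
            now = self._clock()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        # Sleep outside the lock so other threads can reserve their slots.
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
```

Each worker thread would call `acquire()` right before starting a download, so the combined request rate stays near the configured limit no matter how many threads are running.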

Preserving the context path

The downloaded images are stored locally in the same folder structure as on the server. This is useful when images in different server folders share the same file name, because they would otherwise overwrite each other locally.
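As a rough illustration of mirroring the server path, the following Python sketch maps a URL to a local file path. The function name `local_path_for` and the choice to ignore the host name and query string are assumptions made for this example; the downloader itself may handle these differently.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def local_path_for(url, root):
    """Map a URL to a file under `root` that mirrors the URL's folder path."""
    parts = PurePosixPath(unquote(urlsplit(url).path)).parts
    # Drop the leading "/" and any ".." so the result cannot escape `root`.
    safe = [part for part in parts if part not in ("/", "..")]
    return Path(root).joinpath(*safe)
```

With this mapping, `https://example.com/cats/1.jpg` and `https://example.com/dogs/1.jpg` end up in `cats/1.jpg` and `dogs/1.jpg` under the output folder instead of colliding.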

Progress bar

The progress bar provides a quick overview of the download process and shows the estimated remaining time.
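A remaining-time estimate can be as simple as assuming the images still to come will take as long, on average, as those already finished. The sketch below shows that naive estimate; `progress_line` and its output format are invented for this illustration and do not reproduce the downloader's actual display.

```python
def progress_line(done, total, elapsed_seconds, width=20):
    """Render a one-line text progress bar with a naive remaining-time estimate."""
    fraction = done / total if total else 1.0
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    if done:
        # Average time per finished item, multiplied by the items still to go.
        eta = f"{elapsed_seconds / done * (total - done):.0f}s"
    else:
        eta = "?"
    return f"[{bar}] {done}/{total} ETA {eta}"
```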

Option to store images in a tar archive

Folders containing millions of image files are difficult to handle and slow to copy. To ease further processing, the downloader can write the images directly into tar archives.
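For readers curious how images can go straight into an archive without first creating millions of loose files, here is a small sketch using Python's standard `tarfile` module. The helpers `add_image` and `write_archive` are hypothetical and only illustrate the technique.

```python
import io
import tarfile


def add_image(archive, name, data):
    """Append one in-memory image to an open tar archive under `name`."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)  # tar needs the size before the payload is written
    archive.addfile(info, io.BytesIO(data))


def write_archive(path, images):
    """Write a mapping of archive name -> image bytes into one tar file."""
    with tarfile.open(path, "w") as archive:
        for name, data in images.items():
            add_image(archive, name, data)
```

Because the archive names can contain folders such as `cats/1.jpg`, the original folder structure survives inside the tar file.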

If you have similar tasks to do, we’d be happy for you to use our image downloader. Try it and share your experiences with us. We look forward to your feedback.

Originally published at pixolution.org/blog.

pixolution

Self-hosted AI image search & custom AI solutions. We turn data into actions, ultimately increasing workflow efficiency and productivity.