
ML Projects Done Right: Our Story with ClearML


Maciej Styk, MLOPS Engineer II 

If you’ve worked on early machine learning projects, you probably know the feeling. You have scripts scattered everywhere, various spreadsheets with scores and parameters written down, and randomly named files with model weights.

As our team got bigger and our AI projects became more ambitious, we realized the simple “spreadsheet” methods just wouldn’t work anymore. We needed to get organized. We needed reliability and a clear view of our work. This is the story of how, a few years back, we found a key tool to help us, ClearML, and have been using it ever since.

We needed a tool that could track our experiments and artifacts and, more importantly, bring order to our entire AI development process. After looking at a few options, ClearML stood out as the best choice. It had all the features we needed, was easy to set up, and promised to help us manage our work from start to finish. We decided to go for it, and the impact has been huge.

Here’s a closer look at the key ways ClearML has improved our workflow.

Pillar 1: Getting the Same Results Every Time (Reproducibility)

In machine learning, it is vital to make sure you can repeat your results. If you can’t, fixing problems is a nightmare, working together is hard, and you can’t really trust your own work.

The challenge before ClearML: Repeating an experiment used to be a long process. Our spreadsheets multiplied with every project we started, the parameters used to run a training had to be written down manually, and once the training was running, we had no easy way to verify which parameters were actually in use.

How ClearML solved it: ClearML acts like an automatic note-taker for every experiment. It automatically saves:

  • Exact Git commit: This includes the exact commit the run was started from and, amazingly, even the Git diff of uncommitted changes. No more wondering if a “small test” was part of the final run.
  • The training config: ClearML collects a full YAML dump of our training configuration.
  • All settings: It remembers all the settings and parameters used to run the experiment (e.g., the number of GPUs used, the GPU driver, or the Python version).
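
To make the list above concrete, here is a minimal sketch of the kind of snapshot an experiment tracker records when a run starts. It uses only the Python standard library and is not ClearML's implementation or API; the helper name `capture_run_snapshot` and the config keys are hypothetical, chosen for illustration.

```python
import json
import platform
import subprocess
import sys


def _git(*args):
    """Run a git command and return its output, or None if it fails."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or we are not inside a repository.
        return None
    return result.stdout.strip()


def capture_run_snapshot(config):
    """Collect what is needed to repeat a run (hypothetical helper)."""
    return {
        "git_commit": _git("rev-parse", "HEAD"),
        # Uncommitted changes, so a "small test" edit is not lost.
        "git_diff": _git("diff", "HEAD"),
        "python_version": platform.python_version(),
        "command": list(sys.argv),
        "config": dict(config),
    }


# Example values are made up for illustration.
snapshot = capture_run_snapshot({"learning_rate": 1e-3, "num_gpus": 2})
print(json.dumps(snapshot, indent=2))
```

Storing such a snapshot next to every run is what makes a one-click "clone" possible: everything needed to start the same run again is in one record.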

The impact: Now, repeating an experiment is as simple as clicking a “clone” button. This has been a huge help for:

  • Getting new team members started: They can easily look at old experiments and understand how they were done.
  • Finding bugs: If a model isn’t working right, we can go back to the exact experiment and see what went wrong.

Pillar 2: Seeing Exactly What Happened (Full Visibility)

To make better models, you need to understand why an experiment gave you a certain result. When you just put data in and get a result out with no idea of what the model metrics are, it’s like walking in the dark.

The challenge before ClearML: We tried to use our own, super simple tools for metrics comparison. With some practice, the tool got the job done, but it was a hassle, and trying to see performance trends over time was a lot of work. Figuring out which code and dataset created a specific result often felt like detective work.

[Image: ClearML experiment manager]

How ClearML solved it: ClearML gives us a single, easy-to-use dashboard that acts as a command center for all our ML training.

  • Automatic tracking: ClearML automatically saves all the important numbers, graphs, images and text output during our training (even raw training logs).
  • Easy comparison: We can see all of our experiments on a single page and compare them side by side, which makes even the tiniest changes easy to spot. We can also filter by metric to compare results and find the best ones.
  • Clear graphs: We can track our model’s performance with built-in graphs, which let us see how different experiments stack up against each other.
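
The comparison described above comes down to two operations: listing what differs between two runs, and ranking runs by a metric. The sketch below shows both in plain Python. It is an illustration under assumptions, not how ClearML implements its comparison view; the function names, the run dictionaries, and all values are made up for the example.

```python
def diff_params(run_a, run_b):
    """Return {param: (value_a, value_b)} for every parameter that differs."""
    params_a, params_b = run_a["params"], run_b["params"]
    keys = set(params_a) | set(params_b)
    return {
        key: (params_a.get(key), params_b.get(key))
        for key in sorted(keys)
        if params_a.get(key) != params_b.get(key)
    }


def best_run(runs, metric, higher_is_better=True):
    """Pick the run with the best value of a metric, skipping runs without it."""
    scored = [run for run in runs if metric in run["metrics"]]
    pick = max if higher_is_better else min
    return pick(scored, key=lambda run: run["metrics"][metric])


# Made-up runs for illustration only; these are not results from our projects.
runs = [
    {"name": "run_a", "params": {"lr": 0.01, "batch_size": 32},
     "metrics": {"val_accuracy": 0.81}},
    {"name": "run_b", "params": {"lr": 0.001, "batch_size": 32},
     "metrics": {"val_accuracy": 0.86}},
    {"name": "run_c", "params": {"lr": 0.001, "batch_size": 64, "augment": True},
     "metrics": {"val_accuracy": 0.84}},
]

print(diff_params(runs[0], runs[1]))           # only the learning rate differs
print(best_run(runs, "val_accuracy")["name"])  # run with the highest accuracy
```

A parameter that exists in only one run shows up with `None` on the other side, which is often exactly the "tiny change" that explains a different result.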

The impact: This transparency has allowed us to:

  • Fix problems faster: We can quickly spot what’s different between a failed run and a successful one.
  • Work together better: Team members can easily share their work with full context, so everyone is on the same page (a single link to ClearML is sometimes worth more than a thousand words).
  • Make smarter decisions: With great tools for tracking and comparison, we can confidently choose the best models to move forward with.

Pillar 3: Training Artifacts in the Right Place (Easy Management)

ML training creates a lot of files: model weights, graphs, debug data, samples, reports, and many more, depending on the type of ML training you run. Keeping all of them organized is key to working well.

The challenge before ClearML: Our files were hiding in the deep abyss of cloud buckets. Models were easy to lose since every training run produced dozens of files with cryptic names. This led to confusion and accidental mistakes, and made it hard to find the right file when we needed it.

How ClearML solved it: ClearML makes sure that your files are there when you need them.

  • Central storage: All of our artifacts are stored in one place – we can manage them easily through the ClearML UI.
  • Ease of browsing: Charts, time series, debug images, and more are all easy for us to browse. This gives us peace of mind and the ability to quickly jump to the most critical files.
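
The idea behind central storage can be sketched in a few lines: every file is saved under the run that produced it and listed in a manifest, so nobody has to guess from cryptic file names. This is a standard-library illustration only, not ClearML's storage layout or API; `register_artifact`, `list_artifacts`, the `manifest.json` file, and the run and file names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def register_artifact(root, run_id, name, payload):
    """Write payload to root/run_id/name and record it in the run's manifest."""
    run_dir = Path(root) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / name
    path.write_bytes(payload)

    manifest_path = run_dir / "manifest.json"
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())
    else:
        manifest = {}
    manifest[name] = {"path": str(path), "size_bytes": len(payload)}
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return path


def list_artifacts(root, run_id):
    """Return the manifest for a run, or an empty dict if the run has none."""
    manifest_path = Path(root) / run_id / "manifest.json"
    if not manifest_path.exists():
        return {}
    return json.loads(manifest_path.read_text())


# Placeholder files in a temporary directory, for illustration only.
with tempfile.TemporaryDirectory() as root:
    register_artifact(root, "run_001", "model_weights.bin", b"\x00" * 16)
    register_artifact(root, "run_001", "report.txt", b"placeholder report")
    artifacts = list_artifacts(root, "run_001")
    print(sorted(artifacts))
```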

The impact: This organized approach to file management has led to:

  • Less clutter and confusion: No more guessing which file is the right one.
  • Better teamwork: Everyone on the team can confidently find and use artifacts from training runs started by other engineers.

The Big Picture: Our Work is Now Clearer, Faster, and More Reliable

Using ClearML was more than just adding a new tool. It helped us build better work habits. Because the platform handles so much of the tracking and organization automatically, our team spends less time on manual record keeping and more time coming up with new ideas.

We now experience:

  • Faster development cycles.
  • Less time spent on fixing problems and searching for typos.
  • Better teamwork and knowledge sharing.
  • More confidence in our models and how we build them.

ClearML has become an essential tool for our ML team. It brings a level of clarity and efficiency to our work that we couldn’t achieve before. If your team is struggling with the messy parts of machine learning and you want to improve your process, we highly recommend taking a look at ClearML.
