DVC (Data Version Control) is an open-source version control system designed for machine learning projects and data science workflows. It extends Git so that users can track changes to large datasets that are impractical to store in a Git repository directly. The data itself is kept in a local cache and pushed to remote storage (for example, an S3 bucket or an SSH server), while Git stores only small DVC metafiles (such as `.dvc` files) that record a hash of each tracked file. This separation enables efficient collaboration and reproducibility in data science projects. In practice, DVC works alongside Git, so users can version not only code but also data and machine learning models. Its command-line interface (CLI) mirrors familiar Git commands (e.g., `dvc add`, `dvc push`, `dvc pull`), which makes it easy for Git users to adopt.
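To make the split between Git and data storage more concrete, here is a simplified Python sketch of the idea behind `dvc add`: the file's content is hashed, the bytes are copied into a content-addressed cache (the part DVC would later push to remote storage), and a small metafile recording the hash is written for Git. This is not DVC's implementation; the function names `track` and `checkout` are hypothetical, and the metafile layout is a reduced version of the `.dvc` files DVC actually writes.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Hypothetical, simplified analogue of `dvc add`.

    Copies the data into a content-addressed cache and writes a small
    metafile next to it; only the metafile would be committed to Git.
    """
    digest = md5_of(data_file)
    # Cache layout: first two hex chars as a directory, the rest as file name.
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_file, cached)
    meta = data_file.with_name(data_file.name + ".dvc")
    meta.write_text(f"outs:\n- md5: {digest}\n  path: {data_file.name}\n")
    return meta


def checkout(meta: Path, cache_dir: Path) -> Path:
    """Restore the data file described by a metafile from the cache."""
    lines = meta.read_text().splitlines()
    digest = lines[1].split("md5: ")[1]
    name = lines[2].split("path: ")[1]
    target = meta.with_name(name)
    shutil.copy2(cache_dir / digest[:2] / digest[2:], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data.csv"
        data.write_text("a,b\n1,2\n")
        meta = track(data, root / "cache")
        print(meta.read_text())  # the small file Git would version
        data.unlink()            # simulate a fresh clone without the data
        print(checkout(meta, root / "cache").read_text())
```

Because the metafile only changes when the data's hash changes, Git history stays small while still pinning each commit to an exact dataset version.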
In current iits projects, we mainly use DVC to manage datasets rather than the machine learning models themselves, as other tools, such as MLflow, are better suited for that purpose.