The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Updated Jun 7, 2024 - Python
The open-source tool for building high-quality datasets and computer vision models
A Doctor for your data
Interactively explore unstructured datasets from your dataframe.
fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at scale.
A curated, but incomplete, list of data-centric AI resources.
Curated list of open source tooling for data-centric AI on unstructured data.
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
Lesson guide and textbook for the "Data as a Science" course.
A web service for semi-automated conversion of raw imaging data to BIDS
Curation of BIDS (CuBIDS): A sanity-preserving software package for processing BIDS datasets.
A tool for downloading from public image boards (which allow scraping), previewing your images & tags, and editing your images & tags. Additional tabs let you download other code repositories as well as S.O.T.A. diffusion and auto-tag/caption models. Custom datasets can be added!
Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.
Package that builds a JSON inventory/manifest from public primary or derived datasets
tranSMART Arborist ETL toolkit
Client interface for all things Cleanlab Studio
Data Cleaning and Data Profiling Library for Python
Code for data linkage (curation of research database).
AqSolDB: A curated aqueous solubility dataset containing 9,982 unique compounds.