
Find Datasets for your Machine Learning Projects


Source and Credit: Rodolfo Mendes

The fundamental first step in a Machine Learning project is to state clearly the problem you want to solve. Usually, the problem definition comes from a business or a domain such as marketing, finance, medicine, or engineering. Thus, we must state the problem in terms of the business or the domain and then map it to a typical Machine Learning task. For example, predicting which customers will cancel a subscription maps to a classification task.

Once we are clear about which problem we are trying to solve, we need to get the data necessary to solve it. Data is the feedstock for Machine Learning models, and it often matters more than the algorithms you plan to use.

But building a dataset is not a trivial task. In a business environment, we need to reach out to other departments to get access to databases and documents, and then unify the different data sources. For personal projects, this is even more difficult because sometimes we don’t have access to the data we need at all.

So if you are working on a project for your portfolio, it is a good idea to work with real, public datasets. Besides saving you the time of collecting and processing data, your work on open datasets can provide useful insights on matters of public interest.

In this post, we list some public repositories from which you can download datasets to use in your projects.

Collaborative Repositories

Universities and companies maintain these repositories. Researchers and practitioners from different areas and domains make their data available on these repositories:

Government and political organizations

Governments and organizations maintain portals of public datasets about the economy, education, health, agriculture, and other areas of public interest. Below are data repositories from some international organizations:

Also, many countries maintain open data repositories:

Dataset search

Finally, you can search the Internet for the data you need. The Google Dataset Search tool lets you search the Web for datasets.


Good Machine Learning projects start with quality data. Use the lists and tools above to find the data you need for your project. Also, you can navigate and explore the repositories to find ideas for new projects.
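
Once you have downloaded a dataset, a quick look at its size and missing values helps you judge whether it is good enough for your project. Below is a minimal sketch using only the Python standard library, assuming the dataset is a CSV file with a header row. The `summarize_csv` helper and the sample data are hypothetical and only illustrate the idea; they do not come from any of the repositories above.

```python
import csv
import io


def summarize_csv(handle):
    """Return row count, column names, and per-column counts of empty cells.

    `handle` is any open text file-like object containing CSV data
    with a header row (an assumption of this sketch).
    """
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Treat absent or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"rows": rows, "columns": columns, "missing": missing}


# Hypothetical sample standing in for a downloaded file.
SAMPLE = """country,year,population
A,2019,1000
B,2019,
C,,3000
"""

summary = summarize_csv(io.StringIO(SAMPLE))
print(summary)
```

For a real download, you would pass an open file instead of the in-memory sample, for example `open("your_dataset.csv", newline="")`, where the file name is a placeholder.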
