My Stories

#1. MEFASH-WORLDβœ‚

MEFASH-WORLD is an acronym for MEN FASHION GENERATING SYSTEM. The whole concept of MEFASH-WORLD was born out of the need to solve a problem in the fashion industry. Am going to be walking you through the whole process of creating a system that can generate new efficient designs, yes, fashion designs for men.

Before we start building this system there are some basic concepts we need to know and the field of DATA SCIENCE AND ANALYTICS is a field that is vital in this development process.

WHAT IS DATA SCIENCE(DS)?

Data Science is a multi-disciplinary field that brings together or combine concepts from three(3) major fields: Computer Science, Statistics/Machine Learning, and Data Analysis to understand and extract insights from the ever-increasing amount of data. One very interesting thing is that data is in abundance today and we have moved into the era of BIG DATA. Everyday we accumulate thousands of data, for instance, everyday you generate data, yes you, are you surprised how?. Ok, from the time you wake, take your bath, eat, the food you eat, the time you leave for work/school, all the activities you carry out within the day, these are all data πŸ˜ƒπŸ€“.

PARADIGMS OF DATA RESEARCH:

There are actually two paradigms of data research, namely;

  1. HYPOTHESIS-DRIVEN: In this approach the problem is given or the problem is identified but the data to be used is not given. So you are meant to source the data yourself.
  2. DATA-DRIVEN: Here the data is given or already at hand but no problem is identified yet. For example, a company that already have registered customer and have their data, can use the data with the goal of serving their customers better.

But for our MEFASH GENERATING SYSTEM, we will be making use of the Hypothesis-Driven approach and this is because we have identified a problem and we are to source for the dataset to be used in building our system. And the dataset is going to be images.

Which paradigm of Data Research do you think is the best? And why?
Leave your answers for me in the comment session below πŸ‘‡πŸ‘‡.Bye πŸ₯°


#2. MEFASH GENERATING SYSTEM: ALGORITHM CLASSIFICATION METHOD

Hello guys, welcome back to another session on the how to create a FASHION GENERATING SYSTEM. A quick flashback to my first post (INTRODUCTION TO MEFASH-WORLD: Here ),I said we will be creating a system that can generate new efficient designs, yes, fashion designs for men. And yesterday I stated that we will be using the HYPOTHESIS-DRIVEN paradigm of Data Research.
We will also be making use of Machine Learning algorithms too but before that we need to know the two main types of machine learning algorithm, they are four actually.

CLASSIFICATIONS OF MACHINE LEARNING ALGORITHMS

  1. SUPERVISED LEARNING: This is when algorithms are provided with training data and correct answers. All the materials are Labeled to tell the machine the corresponding value to make it predict the correct value. The task of the Machine Learning algorithm is to learn based on the training data and apply the knowledge to real data. For example, a Machine Learning algorithm to identify Happy and Unhappy pictures, You will have to feed or train your model with photos of you in your happy and unhappy mood, and then later your model will be able to identify new happy and Unhappy photos of you πŸ˜‰. Hence this involves manual labeling or classification of data.
  2. UNSUPERVISED LEARNING: These machine learning algorithms do not have a training dataset and no material is labeled. They are presented with some data about the real world and have to learn from that data on their own and the machine classifies materials itself by detecting the characteristics of the data. Unsupervised learning algorithms are mostly focused on finding hidden patterns in data. For example, a Machine Learning algorithm to identify if an image is of a dog or a cat. the machine will have to decide which of the picture(let's say from 100 pictures) from the data is a dog or a cat, by itself, and also classify them at the same time. Hence this does not involve manual labeling or classification of data πŸ€“.
  3. SEMI-SUPERVISED LEARNING: Here only a few of the datasets are classified or labeled, serving as a hint to the computer. The computer finds features from the already labeled data and then classifies other data accordingly. For example, 10 out of 100 images are labeled manually and the computer labels the rest.πŸ˜ƒ
  4. REINFORCEMENT LEARNING: Here the algorithm learns based on external feedback given by the environment or a thinking entity. This is applied in a machine learning algorithm that learns how to play games against an opponent. The moves that lead to victories are learned and repeated while moves that lead to failures are avoided. So here the computer is learning from experience.😁

But we are going to be making use of the UNSUPERVISED LEARNING approach for MEFASH GENERATING SYSTEM.
We will be stopping here for today, I hope you gained a thing or two from this session.
You can leave your questions about anything you are not clear about or comment on in the comment section below πŸ‘‡. ByeπŸ₯°


#3. MEFASH GENERATING SYSTEM: DATA IDENTIFICATION

Hello guys πŸ€“ πŸ‘‹, I warmly welcome you to another session on how to build a FASHION GENERATING SYSTEM, in this case, MEFASH GENERATING SYSTEM 😁. Sit tight as today on this episode we will be looking at how to identify our data, that is, the kind of data that will make up our dataset and how to prepare them😊.

DATA IDENTIFICATION

In order for you to Identify the data to be used, you need to first identify the problem, in this case, our problem is how to create new fashion styles and we will be using machine learning to solve this problem. So if we are solving the problem of how to use machine learning to come up with a model that can generate new styles for customers, what do you think our dataset will look like? Now there are five types of data, that is, five forms in which our data can appear, namely:

  1. STRUCTURED DATA: Data is said to be structured data if it can fit into a table, spreadsheet, or relational database. These are data that have predefined structures.
  2. UNSTRUCTURED DATA: This is the opposite of structured data, they are data with no predefined structure, they come in any size or form and cannot be easily stored in a table. Examples: audio, video, images.
  3. QUANTITATIVE DATA: Are mainly numerical data. Example: weight, height.
  4. CATEGORICAL DATA: These are data that can be divided into groups. Example: gender, sex, tribe.
  5. BIG DATA: These are very large or massive datasets and cannot fit into the memory of a single machine. They are characterized by three VS (variety, volume, velocity).

But for our MEFASH GENERATING SYSTEM, we will be making use of the Hypothesis-Driven approach and this is because we have identified a problem and we are to source for the dataset to be used in building our system. And the dataset is going to be images.To answer the question I asked earlier, we are going to be making use of the UNSTRUCTURED DATA. This is because our dataset is going to be made up of images of previous fashion styles which we will use to feed our model and train the model. This is getting interesting πŸ˜ŠπŸ€“.
OK, that will be all for today guys, see you tomorrow for the next session πŸ˜‰πŸ‘.

Which paradigm of Data Research do you think is the best? And why?
You can leave your questions about anything you are not clear about or comment on in the comment section below πŸ‘‡. ByeπŸ₯°


#4. MEFASH GENERATING SYSTEM: DATA PREPARATION

Hello people, hope you are having a great day 😊?. We will be making progress on our way to building our MEFASH GENERATING SYSTEM. Since we already know we will be working with images as our dataset, we will be talking about Data Preparation today.
The fact is, machine learning models depend on data, but before bringing your data into your machine learning model it is important to make sure it is clean, accurate, and consistent. To create a machine learning model, it is crucial that you are able to Train, Test and Validate the data before production. Hence data preparation is used to produce Clean, Accurate, and Consistent data (CAC) that can be fed to your machine learning model.
Once our data is identified and collected, data preparation is carried out to assess the condition of the data which includes dropping incomplete records, outlier detection, random value filling, heuristic guess, interpolation. But these techniques are mostly used for structured data, but since we are dealing with unstructured data we will be looking at how to classify our data into Training data, Testing data, and Validation Data.

TRAINING DATA

In machine learning, training data is the data you feed to your machine learning algorithm for the purpose of training your model. Since we are using the supervised learning approach our training data is going to require human involvement to analyze and process for the computer to use, which means we are going to have to label the data. Training data is usually 80-90% of the dataset.

TESTING DATA

The test data is usually 10-20% of the dataset. It is a set of observations used to evaluate the effectiveness or performance of the model. The testing data are examples that the machine has not seen before, if it contains data that the machine has seen before or training data, it will be difficult to know or assess whether the machine has learned and this could lead to OVER-FITTING (which means memorizing the training data).

VALIDATION DATA

Validation data is required in some cases. It is used to tune variables called HYPERPARAMETERS which are responsible for controlling how the model is learned.

This is where we call it a wrap for today. I will discuss how to collect data(images) in a very easy way subsiquentlyπŸ˜‰πŸ˜.
You can leave your questions about anything you are not clear about or comment on in the comment section below πŸ‘‡. ByeπŸ₯°


#5. MEFASH GENERATING SYSTEM: HOW TO COLLECT DATA FROM THE INTERNET USING A CRAWLER

There are several crawlers and frameworks for scrapping data on the World Wide Web(www), but today we are going to focus on ICRAWLER and we will be making use of the python programming language.
Good day and welcome to another exciting session of MEFASH GENERATING SYSTEM 😊😁. Today I will be teaching you how to crawl or scrape data from the internet using ICRAWLER and Google Image Crawler. Don't worry, it's very easy and I will be walking you through every stepπŸ˜‰.

WHAT IS A CRAWLER?

A crawler is a computer program that automatically searches documents on the web. A web crawler is an internet bot that systematically browses the World Wide Web (www), typically operated by search engines like Google, Bing for the purpose of web indexing.

REQUIREMENTS

For you to be able to use ICRAWLER you will need the following on your computer system:

  1. ANACONDA NAVIGATOR: Anaconda Navigator is a desktop graphical user interface (GUI) that allows you to launch applications and easily manage conda packages, environments, and channels without using command-line commands.
  2. JUPYTER NOTEBOOK: This is a web computational notebook, it is open source so it is free. It can be used to combine software code, computational output, explanatory text, and multimedia resources in a single document.
  3. Python 2.7 and above You can download this Here.

Install ICRAWLER You will need to open your Anaconda prompt and type the following pip install ICRAWLER or conda install -c hellock icrawler. Then you type and run the following lines of code on your jupyter notebook :

  1. From icrawler.builtin import GoogleImageCrawler
  2. For keyword in ['plain styles for men']:
  3. google_crawler =GoogleImageCrawler( Parser_threads =2, downloader_threads=4, storage={'root_dir' : 'image_dataset/{}'.format(keyword) } )
  4. google_crawler.crawl( keyword=keyword,max_num=10, min_size=(200, 200))

NOTE: In line 2, keyword is the type of data you want ICRAWLER to scrap for you. In line 6, storage is where you put the path to the folder where you want ICRAWLER to store the dataset for you.

Check Documentation for more information on how to use ICRAWLER.
OK, we stop here for today guys πŸ€“ πŸ‘‹.
You can leave your questions about anything you are not clear about or comment on in the comment section below πŸ‘‡. ByeπŸ₯°

Leave a comment


Events

Events Featured In

Facilitator at AI Invasion 2022
April 25-29, 2022

Data Scientists Network (DSN) is a registered non-profit oranization with a vision to build a world-class artifitial intelligence knowledge, research and innovation ecosystem that delivers high impact and transformational research, business use applications, AI-first start-ups, employability and social good use cases; such that in 10 years, they will raise 1 million AI tallents and build AI solutions that improves the quality of life and wellbeing of 2 billion people in emergiing market. This vision is avhieved through renewed focus on 3 core areas;

  • Community for learning and research
  • Product development for social impact
  • Partnerships for solution delivery

I partnered with DSN as a facilitator in the 2022 AI Invasion, an annual 6 weeks free AI class and training organized by DSN. This training features introductory Python, Data Science and Machine Learning.

  • Free AI classes in 80 cities across Nigeria
  • 30+ Hours of Learning
  • 100+ Tutors