Orchids – Paula Vasconcelos

ML classification of orchid species

Analysis and Classification of Orchid Types

Overview

This project analyzes a dataset of orchid species with two objectives: classifying samples whose type is known and predicting the type of unlabeled (unknown) samples, using a combination of exploratory analysis, dimensionality reduction, and machine learning techniques.

Data Preparation

The analysis began by loading and preprocessing the datasets. The data consist of features describing different orchid species, split into samples of known type and samples of unknown type. Preprocessing involved handling missing values, standardizing features, and encoding categorical variables so that the data were suitable for analysis.
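The preprocessing steps above can be sketched with scikit-learn, although the source does not say which library was used. In this minimal sketch, the column names (`petal_length`, `petal_width`, `sepal_length`, `habitat`) and the toy rows are hypothetical, and median/most-frequent imputation is an assumed choice. The key point is that the imputer, scaler, and encoder are fitted on the known data only and then reused unchanged on the unknown samples.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature names: the real dataset's columns are not given.
NUMERIC = ["petal_length", "petal_width", "sepal_length"]
CATEGORICAL = ["habitat"]


def build_preprocessor() -> ColumnTransformer:
    """Impute and scale numeric columns; impute and one-hot encode categoricals."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill gaps with column median
        ("scale", StandardScaler()),                     # zero mean, unit variance
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # fill with mode
        ("encode", OneHotEncoder(handle_unknown="ignore")),   # unseen -> all zeros
    ])
    return ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])


# Made-up rows with missing values in both numeric and categorical columns.
known = pd.DataFrame({
    "petal_length": [4.1, np.nan, 5.0, 3.8],
    "petal_width": [1.2, 1.5, np.nan, 1.0],
    "sepal_length": [6.0, 6.3, 5.8, np.nan],
    "habitat": ["epiphyte", "terrestrial", np.nan, "epiphyte"],
})
unknown = pd.DataFrame({
    "petal_length": [4.5],
    "petal_width": [np.nan],
    "sepal_length": [6.1],
    "habitat": ["lithophyte"],  # category never seen in the known data
})

pre = build_preprocessor()
X_known = pre.fit_transform(known)    # learn medians, means, scales, categories
X_unknown = pre.transform(unknown)    # apply the same fitted statistics
print(X_known.shape, X_unknown.shape)
```

Fitting only on the known set avoids leaking information from the unknown samples into the preprocessing statistics.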

Exploratory Data Analysis (EDA)

An exploratory data analysis was conducted to understand the distributions of the variables and the relationships among them. This included visualizing each variable's distribution, examining correlations between features, and identifying outliers or patterns that could inform the later modeling steps.
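A minimal sketch of the numeric side of such an EDA, using pandas on synthetic data (the feature names and values are hypothetical, and the 1.5 × IQR rule is an assumed outlier criterion, not one stated in the project). It summarizes each distribution, computes pairwise correlations, and flags values outside the interquartile fences.

```python
import numpy as np
import pandas as pd


def iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR], column by column."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series index on the columns.
    return (df < q1 - k * iqr) | (df > q3 + k * iqr)


# Synthetic stand-in for two orchid measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "petal_length": rng.normal(4.0, 0.5, 50),
    "petal_width": rng.normal(1.3, 0.2, 50),
})
df.loc[0, "petal_length"] = 12.0  # plant an obvious outlier

print(df.describe())            # count, mean, std, quartiles per feature
print(df.corr())                # pairwise Pearson correlations
mask = iqr_outliers(df)
print(mask.sum())               # number of flagged values per feature
```

Histograms or pair plots would normally accompany these tables; they are omitted here to keep the sketch dependency-light.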

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) was applied to reduce the dimensionality of the dataset and make it easier to visualize. When the samples were projected onto the first two principal components, which together capture a large share of the variance, they formed distinct clusters corresponding to the different orchid species.
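A minimal PCA sketch with scikit-learn, assuming synthetic data from `make_blobs` as a stand-in for the orchid features (three hypothetical "species", six features). The features are standardized first because PCA is sensitive to feature scale; `explained_variance_ratio_` then reports how much of the variance each retained component captures, which is how a claim like "the first two components capture a large share of the variance" can be checked.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 150 samples, 6 features, 3 groups.
X, y = make_blobs(n_samples=150, n_features=6, centers=3, random_state=42)

X_std = StandardScaler().fit_transform(X)   # put features on a common scale
pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)           # coordinates on PC1 and PC2

ratios = pca.explained_variance_ratio_
print(ratios)          # fraction of total variance for each component
print(ratios.sum())    # fraction captured by the first two together
```

Plotting `scores[:, 0]` against `scores[:, 1]`, coloured by `y`, gives the kind of 2-D view in which species clusters become visible.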

Machine Learning Models

The following machine learning methods were applied; the first two are supervised classifiers, while K-Means is an unsupervised clustering algorithm:

Logistic Regression
Random Forest Classifier
K-Means Clustering

The models were fitted to the known data, and the classifiers were then used to predict the types of the unknown orchid samples. Performance was evaluated with accuracy, confusion matrices, and cross-validation scores to check that the predictions were robust.
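The supervised workflow described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the data come from scikit-learn's `make_classification` (three hypothetical classes), the last 50 rows play the role of "unknown" samples, and the 5-fold CV, 25% hold-out split, and hyperparameters are assumed choices. Each classifier is scored by cross-validation and on a hold-out set with a confusion matrix; the better model by mean CV accuracy is then refitted on all known samples and used to predict the unknown ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for labeled ("known") and unlabeled ("unknown") samples.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_known, y_known = X[:250], y[:250]
X_unknown = X[250:]  # their labels are treated as unavailable

models = {
    # Logistic regression benefits from scaled inputs, so scale inside the pipeline.
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

X_tr, X_te, y_tr, y_te = train_test_split(
    X_known, y_known, test_size=0.25, stratify=y_known, random_state=0)

results = {}
for name, model in models.items():
    cv = cross_val_score(model, X_known, y_known, cv=5)  # 5-fold accuracy
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[name] = {
        "cv_mean": cv.mean(),
        "holdout_acc": accuracy_score(y_te, pred),
        "confusion": confusion_matrix(y_te, pred),  # rows: true, cols: predicted
    }
    print(name, results[name]["cv_mean"], results[name]["holdout_acc"])
    print(results[name]["confusion"])

# Refit the best model (by mean CV accuracy) on all known samples, then predict.
best = max(results, key=lambda n: results[n]["cv_mean"])
final = models[best].fit(X_known, y_known)
unknown_pred = final.predict(X_unknown)
print(best, unknown_pred[:10])
```

K-Means is left out of this loop because it never uses the labels; it is evaluated differently, by comparing its clusters with the known types.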

Results and Conclusions

The PCA and clustering analyses identified clear separations between the orchid types. The classifiers achieved high accuracy on the known orchid species, which supports the reliability of their predictions for the unknown samples. The analysis also highlighted the features that best distinguish each orchid type.
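One way to quantify how well clustering "identifies clear separations" is to compare K-Means clusters with the known types. This minimal sketch uses synthetic `make_blobs` data and assumes k equals the number of known types (3 here, hypothetical). Because cluster IDs are arbitrary, the adjusted Rand index (ARI) is used: it compares partitions rather than raw label values. The silhouette score measures separation without using the labels at all.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 3 groups in 4 features, with known labels y.
X, y = make_blobs(n_samples=150, n_features=4, centers=3,
                  cluster_std=1.0, random_state=7)
X_std = StandardScaler().fit_transform(X)

# K-Means never sees y; k=3 is assumed to match the number of known types.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = km.fit_predict(X_std)

ari = adjusted_rand_score(y, clusters)   # 1.0 = same partition, ~0 = chance
sil = silhouette_score(X_std, clusters)  # in [-1, 1], higher = better separated
print(ari, sil)
```

A high ARI indicates that the unsupervised clusters largely reproduce the known species grouping.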

This project showcases the effective application of data preprocessing, dimensionality reduction, and classification techniques to solve a real-world problem in species classification.



The source code is available in the project's GitHub repository.