38 docs tagged with "IIT Madras"

Alphabeta

Two modifications are made to the agent: the minimax algorithm is optimized with alpha-beta pruning and complexity is added to the heuristic.
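
A minimal sketch of minimax with alpha-beta pruning, assuming a generic game interface; evaluate, legal_moves and apply_move are placeholders for the agent's actual heuristic, move generator and state transition, not the project's real code.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing,
              evaluate, legal_moves, apply_move):
    """Minimax with alpha-beta pruning over a generic game interface."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for move in moves:
            value = max(value, alphabeta(apply_move(state, move), depth - 1,
                                         alpha, beta, False,
                                         evaluate, legal_moves, apply_move))
            alpha = max(alpha, value)
            if alpha >= beta:  # beta cutoff: the minimizer will never allow this branch
                break
        return value
    value = math.inf
    for move in moves:
        value = min(value, alphabeta(apply_move(state, move), depth - 1,
                                     alpha, beta, True,
                                     evaluate, legal_moves, apply_move))
        beta = min(beta, value)
        if beta <= alpha:  # alpha cutoff: the maximizer already has a better option
            break
    return value
```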

Behavioral Finance

First, our team interviewed a number of professionals and surveyed the public with a questionnaire to understand behavioural finance.

Certificate of Distinction

I obtained a Certificate of Distinction for completing both the Diploma in Data Science and the Diploma in Programming with a CGPA above 9.5 (9.64).

Cleaning

There are features like the name of the passenger and the cabin which cannot be used for analysis directly.
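
As a rough illustration of how such text columns can be turned into usable features, the sketch below extracts a title from Name and a deck letter from Cabin (standard Titanic column names); the derived Title and Deck columns are illustrative assumptions, not necessarily what the original notebook does.

```python
import pandas as pd

df = pd.read_csv("train.csv")  # standard Kaggle Titanic file name (assumption)

# 'Braund, Mr. Owen Harris' -> 'Mr'
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# 'C85' -> 'C'; missing cabins become 'Unknown'
df["Deck"] = df["Cabin"].str[0].fillna("Unknown")

# The raw text columns themselves are then dropped.
df = df.drop(columns=["Name", "Cabin"])
```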

Data

The data provided includes the following, with the names and ids (join key) of 4668 stocks as common columns in all of them:

Data

The dataset used for the competition consisted of features indicating the amount of time the visitor spent on various pages of the site, personal details of the visitor such as gender, marital status and education, and the OS/search engine used by the visitor.

Data

The Titanic Survival dataset is simple: it contains personal details of each passenger (name, gender, age, family) and passenger details (class, cabin, port of embarkation, ticket fare) as input features, and whether the passenger survived as the target feature.

Estimation

Various models were tried for this problem, with the exception of deep neural networks, since TensorFlow and PyTorch were forbidden for the project/competition.

Estimation

To start with, I will use the RandomForest estimator and see how it does.
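
A minimal sketch of that step, assuming x and y are the feature matrix and target built earlier; the hyperparameters and the 5-fold cross-validation shown here are illustrative choices, not the notebook's exact settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# x, y are assumed to come from the cleaning/feature-engineering steps above.
clf = RandomForestClassifier(n_estimators=200, random_state=42)
scores = cross_val_score(clf, x, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

clf.fit(x, y)  # refit on the full training data before predicting on the test set
```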

Feature Engineering

Let's divide the train data into x and y now that it has been cleaned and preprocessed.
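
In pandas terms this is just dropping the target column, as sketched below; Survived is the standard Titanic target name and df the cleaned train dataframe (both assumptions about the notebook's variable names).

```python
# df is the cleaned, preprocessed train dataframe (assumed from the previous steps).
x = df.drop(columns=["Survived"])  # input features
y = df["Survived"]                 # target feature
```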

Genetic Algorithm

Next, the Genetic Algorithm is applied to this initial population of tours for a number of generations to make it 'fitter'.
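
A minimal sketch of one generation of such a GA over a population of tours, assuming a distance matrix dist; the ordered crossover, swap mutation, tournament selection and elitism used here are common TSP choices and stand in for whatever operators the project actually uses.

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour; dist[i][j] is the distance between cities i and j."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ordered_crossover(p1, p2):
    """Copy a random slice from p1, then fill the remaining cities in p2's order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    rest = [c for c in p2 if c not in child[a:b]]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child

def swap_mutation(tour, rate=0.02):
    """Swap each city with a random other city with a small probability."""
    tour = tour[:]
    for i in range(len(tour)):
        if random.random() < rate:
            j = random.randrange(len(tour))
            tour[i], tour[j] = tour[j], tour[i]
    return tour

def next_generation(population, dist, elite=2, tournament=3):
    """One generation: keep the elite, breed the rest from tournament-selected parents."""
    ranked = sorted(population, key=lambda t: tour_length(t, dist))
    new_pop = ranked[:elite]
    while len(new_pop) < len(population):
        p1 = min(random.sample(ranked, tournament), key=lambda t: tour_length(t, dist))
        p2 = min(random.sample(ranked, tournament), key=lambda t: tour_length(t, dist))
        new_pop.append(swap_mutation(ordered_crossover(p1, p2)))
    return new_pop
```

Calling next_generation repeatedly for the desired number of generations yields the 'fitter' population described above.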

Improvements

Since this was my first-ever Kaggle competition and Machine Learning project, I was familiar with and could implement only the basics detailed here. There was a lot more I could have done.

Initial Population

The initial population is generated with the Nearest Neighbor heuristic, starting once from each city.
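
A minimal sketch of that construction, again assuming a distance matrix dist indexed by city number; the function names are illustrative.

```python
def nearest_neighbor_tour(start, dist):
    """Greedy tour: from the current city, always move to the closest unvisited city."""
    n = len(dist)
    tour, visited = [start], {start}
    while len(tour) < n:
        current = tour[-1]
        nearest = min((c for c in range(n) if c not in visited),
                      key=lambda c: dist[current][c])
        tour.append(nearest)
        visited.add(nearest)
    return tour

def initial_population(dist):
    """One nearest-neighbor tour per starting city, as described above."""
    return [nearest_neighbor_tour(city, dist) for city in range(len(dist))]
```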

Introduction

This is the Kaggle competition on Game AI and Reinforcement Learning.

Introduction

This series began with an introductory DSA course that taught the common data structures used, graph algorithms, greedy algorithms, divide-and-conquer algorithms, etc.

Introduction

This project was for the course AI: Search Methods for Problem Solving.

Introduction

This was a series of mandatory and elective courses, including Deep Learning (basic framework), Computer Vision, Introduction to NLP, Speech Technology and Large Language Models.

Introduction

Although named Finance, this category included both Finance and Economics courses.

Introduction

This was the project for the course Financial Forensics. We were given the financial statements and ratios of over 4000 stocks, along with their prices at two points in time, and were required to build an investment portfolio within a given budget.

Introduction

This was a group project done by a team of five. It involved researching aspects of behavioural finance and portfolio management and coming up with a strategy for managing personal finances.

Introduction

I started the BS degree in Data Science and Applications at IIT Madras in September 2021. The program ran on trimesters (three four-month terms per year) instead of semesters.

Introduction

This project was for the course Machine Learning Practice.

Introduction

There were multiple courses on Machine Learning, including Machine Learning Foundations (linear algebra and basic algorithms like PCA), Machine Learning Techniques (the detailed mathematics of Support Vector Machines, ensembling, etc.) and Machine Learning Practice (implementation with scikit-learn, xgboost, etc.).

Introduction

This is the introductory Kaggle competition that every new Kaggle member does. Since I had learned a lot of new techniques at the time, I decided to apply them all to this dataset as practice.

Iteration

After a new population has been created through Simulated Annealing, the Genetic Algorithm can once again be applied to it to improve its fitness.

Minimax

The agent follows this algorithm to decide its next move deterministically:

MLOps

The problem is that every step (cleaning, imputation, encoding, feature engineering, etc.) is done separately, so if a new test sample is given, one cannot directly make a prediction and has to carry out every step all over again. To solve this, I am going to create a 'preprocessor' class with a transform method that does everything I have done until now, and build a pipeline with this preprocessor as the first step and the trained model clf as the second step.
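
A minimal sketch of that idea with scikit-learn; the Preprocessor class here is only a stand-in for "everything done until now", and clf is assumed to be the already-trained model from the earlier steps.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

class Preprocessor(BaseEstimator, TransformerMixin):
    """Stand-in for all cleaning, imputation, encoding and feature-engineering steps."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        # ... apply exactly the same transformations that were applied to the train data ...
        return X

# clf is the model trained earlier (assumption). The pipeline lets a raw test
# sample go straight from its original columns to a prediction in one call.
pipe = Pipeline([("preprocess", Preprocessor()), ("model", clf)])
# prediction = pipe.predict(raw_test_df)
```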

Model

Now that the dataset is fully preprocessed and has the right features, the Ridge estimator is fitted and feature importance is computed.
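
A minimal sketch of that step, assuming X is the preprocessed feature dataframe and y the target; reading importance from the absolute coefficients is one common convention for linear models and is only an assumption about how the original computation was done.

```python
import pandas as pd
from sklearn.linear_model import Ridge

# X (preprocessed feature dataframe) and y (target) are assumed from the earlier steps.
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)

# Feature importance read off the coefficient magnitudes (one common convention).
importance = pd.Series(ridge.coef_, index=X.columns).abs().sort_values(ascending=False)
print(importance.head(10))
```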

Portfolio

A diversified portfolio is made through proportionate allocation of the budget across the shortlists.

Portfolio Optimization

A notebook was written to build a diversified portfolio for an individual, considering investment instruments such as mutual funds, ETFs, etc.

Preprocessing

The data was first cleaned and preprocessed to handle missing values, categorical features, outliers, class imbalance and redundant features.

Preprocessing

Now that we have cleaned the data into an organized format, we can proceed with preprocessing, i.e., imputing missing values, encoding categorical features and scaling the data if required.
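
A minimal sketch of those three steps with scikit-learn; the column lists are placeholders (standard Titanic names) rather than the notebook's exact feature sets.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["Age", "Fare"]          # placeholder column names
categorical_cols = ["Sex", "Embarked"]  # placeholder column names

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])
# X_processed = preprocess.fit_transform(df)
```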

Scoring

The fitted model is now used to predict the t_2 prices, and each stock is scored based on prediction error and growth.

Shortlisting

Three different strategies are used to shortlist stocks. df is the dataframe, built earlier, containing all the features.

Simulated Annealing

Now that we have a fit population, we can try to arrive at an optimal solution starting from each member and traversing the solution space.
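
A minimal sketch of such a walk from a single starting tour, assuming a distance matrix dist; the segment-reversal neighbour move, starting temperature and cooling schedule are illustrative choices, not the project's exact parameters.

```python
import math
import random

def simulated_annealing(tour, dist, t_start=100.0, t_end=1e-3, cooling=0.995):
    """Start from one member of the population and walk the solution space:
    always accept improvements, accept worse tours with probability exp(-delta/T)."""
    def length(t):
        return sum(dist[t[i]][t[(i + 1) % len(t)]] for i in range(len(t)))

    current, best = tour[:], tour[:]
    temp = t_start
    while temp > t_end:
        i, j = sorted(random.sample(range(len(current)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]  # reverse a segment
        delta = length(candidate) - length(current)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
            if length(current) < length(best):
                best = current[:]
        temp *= cooling
    return best
```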