Work Experience


1. Razorpay | Data Scientist
- Developing statistical and probabilistic models.

2. ShareChat & Moj | Machine Learning Engineer
- My latest project was with ShareChat's camera AI team, where I worked on optimising deep neural networks for face landmark prediction on low-end mobile devices.
- In a previous project, I worked with the team to design a video commerce pipeline that suggests apparel similar to what appears in short videos. I prepared a hierarchy for Indian apparel data annotation, did image tagging, and implemented tag-based and image-similarity-based search systems.

Internships


3. Entrupy | Machine Learning Intern
- Trained deep Siamese networks for point cloud registration in 3D. Learned synthetic data creation in Blender.

4. Ultrainstinct | Machine Learning Intern
- Re-trained and reproduced existing state-of-the-art multi-model deep learning pipelines on ActivityNet for video captioning, as part of exploring automatic theft detection from surveillance camera feeds.

Past projects in Machine Learning


1. Video captioning on ActivityNet
Abstract
Video captioning is a popular task that challenges models to describe events in videos using natural language. In this work, we investigate the ability of various visual feature representations derived from state-of-the-art convolutional neural networks to capture high-level semantic context. We introduce the Weighted Additive Fusion Transformer with Memory Augmented Encoders (WAFTM), a captioning model that incorporates memory in a transformer encoder and uses a novel feature fusion method that gives due importance to the more significant representations. We illustrate the gain in performance realized by applying Word-Piece Tokenization and the popular REINFORCE algorithm. Finally, we benchmark our model on two datasets and obtain a CIDEr of 92.4 on MSVD and a METEOR of 0.091 on the ActivityNet Captions Dataset.

2. BTech Project on Visual Question Answering
Abstract
While a lot of work has been done on developing models to tackle the problem of Visual Question Answering, the ability of these models to relate the question to the image features remains less explored. We present an empirical study of different feature extraction methods with different loss functions. We propose a new dataset for the task of Visual Question Answering with multiple image inputs and only one ground truth, and benchmark our results on it. Our final model, which uses ResNet + RCNN image features and BERT embeddings and is inspired by the stacked attention network, achieves 39% word accuracy and 99% image accuracy on the CLEVER+TinyImagenet dataset.


Books/reports/articles I have been reading

Why Nations Fail: The Origins of Power, Prosperity, and Poverty by __
Understanding Power by Noam Chomsky
India After Gandhi by Ramachandra Guha
Our Common Future - Report of the World Commission on Environment and Development (~1987)
The Age of Reform by Richard Hofstadter
Vision 2020 | India by APJ Abdul Kalam with Y S Rajan
The Extreme Future by James Canton
A Random Walk Down Wall Street by Burton G. Malkiel
The Intelligent Investor by Benjamin Graham
Getting Things Done by David Allen
Newsman: Tracking India in the Modi Era by Rajdeep Sardesai
The Coalition Years by Pranab Mukherjee
Superpower?: The Amazing Race Between China's Hare and India's Tortoise by Raghav Bahl
GST by Arun Kumar
The End of Poverty: Economic Possibilities for Our Time by Jeffrey Sachs
India UnInc (unincorporated)
Unleashing the Innovators: How Mature Companies Find New Life with Startups by Jim Stengel and Tom Post
Making Globalization Work by Joseph Stiglitz
Rebuild
Play Bigger: How Pirates, Dreamers and Innovators Create and Dominate Markets by Al Ramadan, Christopher Lochhead, Dave Peterson, Kevin Maney
Ancient and Medieval History by Hayes and Moon
An Inquiry into the Nature and Causes of the Wealth of Nations
The Value of Nothing by Raj Patel