Interview Preparation


Hey there! I'm Chirag, a student currently pursuing a BTech in Artificial Intelligence and Data Science. I'm really passionate about two things: data science and teaching. I want to be a great data scientist.

I also love to teach and help others learn. It's my dream to be really good at both things.

I'm always learning and growing, working hard to achieve my goals. I'm excited for the future and all the possibilities it holds.

Join me on this journey as I explore the world of data and education. Let's make great things happen together!

Tell me about yourself

Hi, I’m Chirag Sanadhya, currently pursuing my B.Tech in Computer Science with a specialization in Artificial Intelligence and Data Science from Maharaja Agrasen Institute of Technology. I’ve always been fascinated by how machines can learn, and over time, that curiosity turned into a passion for building AI systems that actually work in the real world.

Most recently, I worked as a Research Intern at IIIT Naya Raipur under Dr. Santosh Kumar, where I focused on fine-tuning transformer models like BERT and RoBERTa for domain-specific NLP tasks such as text classification and question answering. I built custom data pipelines using spaCy and NLTK, and automated the entire training workflow using modular Python scripts.

Alongside my academic and research work, I’ve built several impactful projects—like ModelMatic, an end-to-end AutoML app, and a Multi-Modal OCR System using FastAPI, EasyOCR, and Streamlit. I’m also developing AI School, a learning platform that uses RAG and LLMs to make AI/ML education more interactive.

Beyond projects, I’ve qualified GATE CS and DA with AIR 1911 and mentor students through the Google Developer Student Clubs, where I conduct workshops and support AI initiatives.

Overall, I enjoy building practical solutions that bridge research and real-world applications, and I’m actively looking to grow in roles that combine machine learning, backend development, and AI research.

About the Research Internship

During my research internship at IIIT Naya Raipur, I worked under Dr. Santosh Kumar on a project focused on domain-specific Natural Language Processing tasks—primarily text classification and question answering using transformer models.

My main contribution was fine-tuning pre-trained language models like BERT and RoBERTa using Hugging Face Transformers, adapting them to a custom dataset specific to our domain. I also compared several other pre-trained transformer models, such as IndoBERT, DistilBERT, and ALBERT, evaluating them on accuracy and suitability for our task. This comparison helped identify the most effective model for our use case.

To prepare the data for these models, I built custom data preprocessing pipelines using spaCy, NLTK, and pandas, ensuring that the text was properly tokenized and cleaned. I also evaluated model performance using key metrics such as precision, recall, F1-score, and confusion matrices, helping us gauge how well each model was performing and where improvements were needed.
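As a rough illustration of the metrics mentioned above, here is a minimal sketch that computes a confusion matrix and per-class precision, recall, and F1 using only the standard library. The labels are hypothetical and the helper names (`confusion_counts`, `per_class_metrics`) are made up for this example; in practice, scikit-learn's `confusion_matrix` and `precision_recall_fscore_support` do the same job.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for label in labels:
        tp = counts[(label, label)]
        # Predicted as `label` but actually something else.
        fp = sum(counts[(other, label)] for other in labels if other != label)
        # Actually `label` but predicted as something else.
        fn = sum(counts[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical labels, for illustration only.
y_true = ["pos", "pos", "neg", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "pos"]

counts = confusion_counts(y_true, y_pred)
metrics = per_class_metrics(y_true, y_pred)
for label, (p, r, f1) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reading the counts per class, rather than a single accuracy number, is what shows where a model confuses one class for another.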

Overall, my contributions focused on selecting and fine-tuning the right models for our task and optimizing the data processing pipeline to ensure high-quality results.

About Projects

ModelMatic

One of the projects I'm particularly proud of is ModelMatic, which I built to simplify the machine learning process for users, especially those without a coding background. The idea behind ModelMatic was to create a no-code tool that automates every step of the machine learning pipeline. Whether the task is regression or classification, users can upload a dataset, select the relevant columns, and train and test a model with just a few clicks. This no-code approach makes machine learning accessible to people without programming experience.

The tool does the hard work behind the scenes: it automates preprocessing, handles missing values, and takes care of feature engineering. It also lets users visualize model performance in real time, so they can see how their model is doing without writing code. I deployed ModelMatic on Streamlit, which provides an interactive, user-friendly interface. The goal was to let anyone, regardless of technical expertise, get hands-on with machine learning, and I believe ModelMatic achieves that by keeping the workflow simple and approachable.


AI School: Interactive Learning Platform

Another exciting project I’ve been working on is AI School, which is still in its early stages but holds a lot of potential. The vision for AI School is to create an AI-driven educational platform that removes the traditional need for a teacher. Instead, the platform leverages the power of Large Language Models (LLMs) to guide students through chapters, explain concepts, and answer any questions they may have. It’s designed to make learning interactive and self-paced, where users can chat with the chapters themselves and get instant, personalized feedback.

One of the standout features of AI School is the ability to take mock tests based on custom syllabi, allowing users to prepare for exams at their own pace. It also includes a progress tracker so students can monitor their learning across different subjects. The backend is written in FastAPI, chosen for its performance and so that the platform can scale as we add new features. The frontend still needs polish to make it user-friendly and engaging, but the backend is fully functional and ready for further development.

I’ve also integrated Supabase for data storage, which handles both vector and non-vector data efficiently. The idea is to create an educational experience that feels more like interacting with a tutor than just following along with static materials. We plan to regularly update the platform as new learning methods and tools become available, keeping it relevant for a wide range of users.


Multi-Modal OCR System

One of my most recent projects is the Multi-Modal OCR System, a real-time platform designed to extract text from both printed and handwritten sources. This project is particularly exciting because it combines several cutting-edge technologies. At the heart of the system is EasyOCR, which powers the text recognition. The idea was to create a tool that could handle a wide variety of text styles, from neatly printed words to messy handwritten notes, with high accuracy.

The system also includes a YOLO-powered text detection module, which allows the platform to not only recognize the text but also detect the precise location of the text within an image, making it more effective for complex documents. I built the backend using FastAPI, which provides a solid foundation for exposing RESTful endpoints to interact with the OCR system. The Streamlit frontend offers an intuitive user interface, allowing users to upload images and instantly see the extracted text along with its corresponding bounding box and confidence scores.

One of the standout features of this OCR system is its multi-language support, which allows users to select different languages for text extraction and adjust the confidence threshold for each. This makes it highly adaptable for a global audience, whether you’re working with English, Spanish, or any other supported language. It’s a project that bridges the gap between research and real-world applications, and I’m excited to continue developing and refining it to handle more complex use cases in the future.


These projects reflect my passion for solving real-world problems through technology. Whether it’s creating accessible AI tools for non-programmers, building an educational platform powered by the latest AI advancements, or developing an OCR system that extracts text with precision and accuracy, I’m always focused on building systems that have a lasting impact and enhance everyday life.

ModelMatic

ModelMatic is a no-code tool I built to automate the machine learning pipeline and make it accessible to users with little to no programming experience. The idea was to enable anyone, regardless of their technical background, to upload a dataset, select the relevant columns, and train a model with just a few clicks.

Technologies Used:

  • Python: The core programming language used for building the logic and integrating the machine learning algorithms.

  • Streamlit: Used for deploying the interactive web application, providing a clean and user-friendly interface for users to interact with the tool.

  • Scikit-learn: Applied for model training and evaluation, handling both regression and classification tasks.

  • Pandas: Used for preprocessing the data, such as handling missing values, encoding categorical features, and normalizing the dataset.

  • NumPy: Used for numerical operations, especially for mathematical computations during data manipulation and model evaluation.

The platform’s simplicity makes it easy to take a dataset, select features, and train/test a machine learning model—whether it’s for predicting a continuous value or classifying data into categories. Scikit-learn handles the heavy lifting of training models, and Streamlit offers an intuitive way for users to interact with the tool.
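The preprocessing steps listed above (filling missing values, encoding categorical features, normalizing) can be sketched as follows. This is a standard-library illustration under assumed toy data, not ModelMatic's actual code, which uses Pandas and Scikit-learn; the function names and the `age` and `city` columns are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if observed else 0.0
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Scale values to the range [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def one_hot(values):
    """Return (categories, rows) with one 0/1 indicator per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical toy columns, for illustration only.
age = [20, None, 40, 30]
city = ["Delhi", "Pune", "Delhi", "Jaipur"]

age_scaled = min_max_scale(impute_mean(age))
categories, city_encoded = one_hot(city)

# Final feature matrix: one scaled numeric column plus the indicator columns.
features = [[a] + c for a, c in zip(age_scaled, city_encoded)]
print(categories)
print(features)
```

Imputing before scaling matters here: scaling a column that still contains `None` would fail, and the fill value should be computed from the raw, unscaled data.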


AI School: Interactive Learning Platform

AI School is an educational platform I’m developing that uses Large Language Models (LLMs) to provide personalized learning experiences without the need for a teacher. Users can chat with chapters, get explanations on complex concepts, and take mock tests based on a custom syllabus.

Technologies Used:

  • FastAPI: Built the backend using FastAPI, which allowed for high-performance, scalable server-side functionality. It’s ideal for handling multiple requests efficiently as users interact with the platform.

  • Supabase: Used for vector and non-vector data storage, storing both structured data (like user progress) and unstructured data (like the actual content of chapters and quizzes).

  • LangChain: Utilized for enabling retrieval-augmented generation (RAG), allowing users to query the chapters in real-time and receive intelligent, context-based responses from the LLM.

  • Ollama: Used Ollama, a tool for running LLMs locally, to serve the model that generates personalized quizzes and explains concepts. The model delivers detailed, contextually relevant explanations based on the chapters users are studying.

The platform is designed to evolve continuously, with new updates planned regularly based on user feedback and advancements in AI. The FastAPI backend is built to scale, while Supabase ensures that all user data and content are stored efficiently. LangChain helps create a highly interactive and dynamic learning experience by leveraging retrieval-augmented generation for personalized content delivery.
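To make the retrieval-augmented generation idea concrete, here is a minimal sketch of the retrieval step: rank chapter chunks by similarity to the question, then place the best ones in the prompt. It is not the AI School implementation. The word-count `embed` function is a toy stand-in for a real embedding model, the chunks are made up, and the prompt wording is an assumption; the actual platform uses LangChain with Supabase for storage.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble an LLM prompt from the retrieved context and the question."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical chapter chunks, for illustration only.
chunks = [
    "Gradient descent updates weights by moving against the gradient of the loss.",
    "A confusion matrix counts correct and incorrect predictions per class.",
    "Overfitting happens when a model memorizes training data and fails to generalize.",
]
question = "What does gradient descent do to the weights?"

top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, chunks, k=1)
print(prompt)
```

The resulting prompt would then be sent to the LLM, which answers from the retrieved chapter text instead of from its general training data alone.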


Multi-Modal OCR System

The Multi-Modal OCR System is designed to extract both printed and handwritten text from images. The main goal is to make text recognition as accurate and versatile as possible, even for complex, noisy documents.

Technologies Used:

  • FastAPI: The backend of the system is powered by FastAPI, providing RESTful APIs for seamless communication between the frontend and the OCR engine.

  • Streamlit: Deployed the frontend using Streamlit, creating an interactive interface where users can upload images and see the extracted text along with bounding boxes and confidence scores.

  • EasyOCR: The core of the text recognition process relies on EasyOCR, which uses deep learning models to detect and recognize text in images, both printed and handwritten.

  • YOLO: YOLO (You Only Look Once) is used for text detection, pinpointing the exact location of text within an image. This is especially useful for documents with multiple regions of text.

  • OpenCV: Utilized OpenCV for image processing tasks like resizing, cropping, and adjusting image quality to improve OCR accuracy.

  • NumPy: Employed for numerical computations in image transformations and text detection processes.

The system’s backbone is EasyOCR, which handles text recognition, while YOLO detects the exact locations of text within the images. Together, these technologies allow for robust and precise OCR, even on complex handwritten documents. The FastAPI backend processes requests, and Streamlit makes it easy for users to interact with the platform and view results in real time. The platform also supports multiple languages, giving users flexibility when extracting text.
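The adjustable confidence threshold can be sketched as a small post-processing step. The example assumes detections shaped like the output of EasyOCR's `readtext`, that is, a list of (bounding box, text, confidence) entries where the box is four corner points. The sample detections and the `filter_detections` helper are hypothetical and are not taken from the project.

```python
def filter_detections(detections, min_confidence=0.5):
    """Keep detections at or above the threshold, in reading order.

    Each detection is (bbox, text, confidence), where bbox is a list of
    four [x, y] corner points starting at the top-left corner.
    """
    kept = [d for d in detections if d[2] >= min_confidence]
    # Sort by the top-left corner: top to bottom, then left to right.
    return sorted(kept, key=lambda d: (d[0][0][1], d[0][0][0]))


# Hypothetical detections, for illustration only.
detections = [
    ([[10, 50], [120, 50], [120, 80], [10, 80]], "World", 0.91),
    ([[10, 10], [100, 10], [100, 40], [10, 40]], "Hello", 0.97),
    ([[130, 10], [180, 10], [180, 40], [130, 40]], "~~", 0.21),
]

kept = filter_detections(detections, min_confidence=0.5)
text = " ".join(d[1] for d in kept)
print(text)
```

Raising the threshold trades recall for precision: fewer spurious fragments reach the user, at the cost of dropping faint but valid text.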

Tools and Technologies in My Resume

  • Python: Core programming language for data manipulation, machine learning, and backend development.

  • C++: Used for system-level programming and algorithmic problem-solving.

  • SQL: Essential for working with relational databases, querying, and data management.

  • Bash: Used for scripting and automating repetitive tasks in development and deployment pipelines.

  • Scikit-learn: Python library for implementing machine learning algorithms for regression, classification, and model evaluation.

  • TensorFlow: Deep learning framework for building and training neural networks, especially for large-scale ML tasks.

  • PyTorch: Deep learning library focused on flexibility and ease of use, commonly used for research and production models.

  • NLP: Natural language processing techniques such as text classification, tokenization, and named entity recognition (NER).

  • LLMs: Fine-tuning and applying pre-trained language models (such as BERT and RoBERTa) to various NLP tasks.

  • CNNs: Convolutional Neural Networks used for image processing and computer vision tasks.

  • RAG: Retrieval-augmented generation for creating more dynamic, context-aware AI responses in NLP tasks.

  • OpenCV: Open-source library for computer vision tasks, such as image processing, feature extraction, and video capture.

  • LangChain: A framework for building applications powered by language models, particularly used for retrieval-augmented generation.

  • Groq: AI inference platform built on custom accelerator hardware (LPUs), used for low-latency model inference.

  • Docker: Containerization tool for creating, deploying, and running applications in isolated environments.

  • FastAPI: Modern, high-performance framework for building APIs with Python, used in backend development for scalable applications.

  • Supabase: Backend-as-a-service platform for database management and real-time features, used in AI School for vector and non-vector storage.

  • Streamlit: Framework for building interactive, user-friendly web apps, primarily used for deploying machine learning models.

  • Flask: Micro web framework for Python, used for building lightweight backend applications.

  • Git: Version control system for managing code changes, collaborating, and maintaining project history.

  • GitHub: Platform for hosting and sharing code, collaborating on projects, and version control.

  • CI/CD: Continuous integration and continuous delivery pipelines for automating software testing and deployment.

  • AWS: Cloud computing platform for hosting applications, data storage, and scalable resources.

  • Google Cloud: Cloud services for machine learning, storage, and infrastructure, often used for AI/ML projects.

  • Hugging Face Hub: Platform for sharing and deploying pre-trained NLP models and integrating them into projects.

  • ONNX: Open Neural Network Exchange format for representing machine learning models across different frameworks.

  • PostgreSQL: Open-source relational database management system used for storing structured data.

  • MySQL: Popular relational database system used for backend data storage and retrieval.

  • Firebase: Real-time database and backend platform for building mobile and web applications.

  • Pinecone: Managed vector database used for machine learning applications involving large-scale vector search.

Some Questions and Answers

1. Can you describe a situation where you had to collaborate with a team to achieve a common goal?

In one of my projects, we worked as a team to build an AI-driven educational platform. My role involved backend development and integrating machine learning models. We collaborated closely to ensure the models performed well while the frontend team created a user-friendly interface. We had regular meetings to align our progress, which helped us quickly address any issues. By working together and sharing expertise, we successfully built the platform and made sure it met the project requirements.

2. Tell me about a time when you faced a technical roadblock. How did you overcome it?

While working on the Multi-Modal OCR System, I encountered challenges in extracting text from noisy or poorly scanned images. Initially, the OCR results were inconsistent, but I tackled the problem by fine-tuning the model using a custom dataset and optimizing the preprocessing pipeline. I also researched and implemented advanced image processing techniques like contrast adjustment and noise reduction. After iterating through different solutions, the OCR results became much more accurate and reliable.
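For reference, the two preprocessing ideas named in this answer, contrast adjustment and noise reduction, can be sketched on a tiny grayscale image stored as nested lists. This is a standard-library illustration with made-up pixel values, not the project's code; a real pipeline would use OpenCV on full images, and the function names here are hypothetical.

```python
from statistics import median


def stretch_contrast(image):
    """Linearly rescale pixel values to the full 0-255 range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy.
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


def median_filter(image):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


# Hypothetical 3x3 image with one bright "salt" noise pixel in the centre.
noisy = [
    [100, 100, 100],
    [100, 255, 100],
    [100, 100, 150],
]
denoised = median_filter(noisy)

# Hypothetical low-contrast 2x2 image.
stretched = stretch_contrast([[50, 100], [150, 200]])
print(denoised)
print(stretched)
```

A median filter suits scanned documents because it removes isolated specks while keeping character edges sharper than an averaging blur would.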

3. Describe a time when you had to quickly learn a new tool or technology for a project. How did you approach it?

When I started working on the AI School project, I had to learn FastAPI quickly for the backend development. Since I had prior experience with Flask, I started by reading the official documentation and exploring tutorials. I also created small projects to practice using FastAPI’s features, like routing and database integration with Supabase. I consulted with online communities and participated in discussions to get deeper insights. By focusing on hands-on learning, I was able to build the backend efficiently.

4. How do you manage your time when working on multiple projects or tasks?

I manage my time by breaking down projects into smaller, manageable tasks and prioritizing them based on deadlines and complexity. I use tools like Trello and Google Calendar to plan my daily and weekly activities, ensuring I allocate enough time for each task. I also set short-term goals for each day to maintain focus and track progress. If I encounter unforeseen challenges, I adjust my schedule while making sure I don't compromise on the quality of my work. Regular reflection on my progress helps me stay organized and efficient.

5. What has been your biggest achievement in your academic or professional journey so far?

One of my biggest achievements was securing a Research Internship at IIIT Naya Raipur, where I worked on NLP models like BERT and RoBERTa for text classification and question answering. This opportunity allowed me to work on cutting-edge technologies, gain deep technical knowledge, and contribute to research in a meaningful way. I also published a blog on my work and learned a lot about automating workflows and fine-tuning LLMs. This experience helped me refine my skills and set the foundation for my future career in AI/ML.

6. Have you ever had to deal with failure or setbacks in your projects? How did you handle them?

In one of my machine learning projects, I initially faced issues with model accuracy, as the results didn’t meet expectations. I took the setback as a learning opportunity and reanalyzed the data and model architecture. I experimented with different algorithms, tuned hyperparameters, and incorporated additional features into the dataset. By iterating and learning from mistakes, I was able to significantly improve the model's performance, and it taught me the importance of persistence and experimentation in AI/ML.