OKBase: Document matching (Lead Developer):
Tech Stack: Python, LangChain, LLMs (Opus, Haiku, Sonnet-3.5, GPT-4o), Graph Database (Neo4j), Cosine Similarity, FastAPI, Pandas, LlamaIndex, etc.
Description:
- Converted PDF documents into structured JSON data for analysis and storage.
- Utilized Neo4j to store and manage complex relationships between document entities.
- Developed efficient methods to fetch and query data from the Neo4j graph database.
- Implemented document similarity matching using LLM-generated embeddings and cosine similarity.
- Created RESTful APIs with FastAPI to handle document processing, storage, and retrieval.
- Leveraged Pandas for data manipulation and analysis to support document processing and matching.
- Employed LlamaIndex to extract text from PDFs for downstream processing.
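The similarity-matching step above can be illustrated with a minimal sketch. It assumes each document has already been turned into an embedding vector. The document IDs and the 3-dimensional vectors are made up for illustration, and the `best_matches` helper is hypothetical rather than the project's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def best_matches(query_vec, doc_vecs, top_k=3):
    """Rank stored documents by similarity to the query embedding."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical 3-dimensional embeddings; real LLM embeddings have far more dimensions.
documents = {
    "invoice_a": [1.0, 0.0, 0.0],
    "invoice_b": [0.9, 0.1, 0.0],
    "contract_c": [0.0, 1.0, 0.0],
}
ranked = best_matches([1.0, 0.0, 0.0], documents, top_k=2)
```

Here `ranked` lists the two closest documents with their scores, best first.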
Text to Image:
Tech Stack: Python, FastAPI, Streamlit, OpenAI (DALL-E 3), Fine-Tuning, Cloud Services, etc.
Description:
- Environment Setup: Install the required libraries including FastAPI, Streamlit, and OpenAI's API library. Ensure that you have access to DALL-E 3 through OpenAI's API.
- Create a FastAPI Backend: Set up a FastAPI application to handle image generation requests. This application will take in text input, call the DALL-E 3 API to generate images, and return the images as responses.
- Integrate with DALL-E 3: Write a function in the FastAPI app that communicates with the OpenAI API, sending the text input to DALL-E 3 and receiving the generated image.
- Develop a Streamlit Frontend: Build a user-friendly interface using Streamlit, where users can input text and see the generated images. Streamlit will send the text input to the FastAPI backend and display the returned images.
- Deployment: Deploy the FastAPI and Streamlit applications on a cloud platform or server. Ensure that they are properly connected so that the frontend can interact seamlessly with the backend.
- Testing: Test the complete system by entering various text inputs and verifying that the images generated by DALL-E 3 match the descriptions.
- Optimization: If needed, fine-tune the interaction between text inputs, prompts, and image outputs by adjusting parameters or refining the text descriptions sent to DALL-E 3.
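The DALL-E 3 integration step can be sketched as follows. This is a minimal illustration using only the Python standard library instead of OpenAI's client library and FastAPI, so it is a stand-in for the described backend function, not the project's code. The helper names and the placeholder key are assumptions.

```python
import json
import urllib.request

# Public OpenAI image generation endpoint.
OPENAI_IMAGES_URL = "https://api.openai.com/v1/images/generations"

def build_image_request(prompt, api_key, size="1024x1024"):
    """Build (but do not send) the HTTP request for one DALL-E 3 image."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    payload = {"model": "dall-e-3", "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        OPENAI_IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def generate_image_url(prompt, api_key):
    """Send the request and return the URL of the generated image."""
    request = build_image_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["data"][0]["url"]

# Building the request needs no network access; "sk-placeholder" is not a real key.
request = build_image_request("a red bicycle on a beach", "sk-placeholder")
```

In a backend like the one described, a route handler would call something like `generate_image_url` and return the result to the Streamlit frontend.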
Image to Image with prompt:
Tech Stack: Python, FastAPI, Streamlit, OpenAI (DALL-E 2), Fine-Tuning, Cloud Services, etc.
Description:
- Python: Core programming language for integrating various components and services.
- FastAPI: Utilized as the backend framework for handling API requests and managing the generation process.
- Streamlit: Frontend interface for user interaction, allowing users to input prompts and view generated images.
- Masking: Created an image mask so that only a specific part of the image is changed according to the prompt.
- OpenAI (DALL-E 2): Image generation model used to transform the input image based on the provided prompt.
- Fine-Tune: Applied to enhance or specialize the model for specific image generation tasks.
- Cloud Services: Employed for scalable processing and storage, ensuring efficient handling of image data and model operations.
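The masking step can be sketched with the standard library alone. The sketch assumes the mask convention of OpenAI's image edit endpoint as I understand it: fully transparent pixels mark the region to regenerate, and the mask has the same dimensions as the input image. The 4x4 size, the box coordinates, and the helper names are illustrative assumptions.

```python
import struct
import zlib

def make_mask(width, height, box):
    """Return RGBA rows: opaque black everywhere, fully transparent inside box.

    box is (left, top, right, bottom) with right and bottom exclusive.
    """
    left, top, right, bottom = box
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            inside = left <= x < right and top <= y < bottom
            row.append((0, 0, 0, 0) if inside else (0, 0, 0, 255))
        rows.append(row)
    return rows

def _chunk(kind, data):
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data)
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

def encode_png(rows):
    """Encode RGBA rows as an 8-bit, non-interlaced PNG."""
    height, width = len(rows), len(rows[0])
    # Bit depth 8, color type 6 (RGBA), default compression/filter, no interlace.
    header = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    # Each scanline starts with filter byte 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(v for px in row for v in px) for row in rows)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))

mask = make_mask(4, 4, box=(1, 1, 3, 3))
png_bytes = encode_png(mask)
```

The resulting `png_bytes` could be uploaded as the mask alongside the source image and the prompt. An imaging library would normally replace the hand-written encoder.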
Stock Price Prediction and Decision-Making:
Tech Stack: Data pre-processing, ML models, LSTM, ARIMA, SARIMA, etc.
Description:
- This project aims to develop a prediction model that forecasts future stock prices based on historical open prices and provides actionable investment decisions: whether to buy, sell, or hold stocks.
- Explore various algorithms such as Linear Regression, Decision Trees, Random Forests, Gradient Boosting Machines, and Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) networks, for time series forecasting.
- Use evaluation metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared. Perform hyperparameter tuning and cross-validation to enhance model performance.
- Develop an application or API to provide users with predictions and actionable recommendations.
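The evaluation metrics named above, together with one possible buy/sell/hold rule, can be sketched as follows. The 2% threshold and the `decide` helper are assumptions made for illustration, since the decision logic is not specified here. The price values are invented.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant actual values")
    return 1.0 - ss_res / ss_tot

def decide(current_price, predicted_price, threshold=0.02):
    """Hypothetical rule: act only if the forecast change exceeds the threshold."""
    change = (predicted_price - current_price) / current_price
    if change > threshold:
        return "buy"
    if change < -threshold:
        return "sell"
    return "hold"

# Invented open prices and forecasts, for illustration only.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 101.0, 105.0, 105.0]
mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
signal = decide(current_price=100.0, predicted_price=105.0)
```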
Roles & Responsibilities:
- Client communication (timely updates)
- Code quality, code review, and version control on GitHub
- Pre-Processing (ETL pipeline) of Data
- Split data into train & test dataset
- Implementing several ML/DL models depending on requirements and R&D (e.g., Linear Regression, ARIMA, Decision Tree, ANN, CNN, RNN, etc.)
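For time series such as stock prices, the train/test split is usually chronological rather than random, so the model is never evaluated on data older than what it was trained on. A minimal sketch under that assumption follows. The 20% test fraction, the window length, and both helper names are illustrative.

```python
def chronological_split(series, test_fraction=0.2):
    """Keep the earliest points for training and the latest for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(len(series) * test_fraction))
    split_at = len(series) - n_test
    return series[:split_at], series[split_at:]

def make_windows(series, window):
    """Turn a series into (inputs, target) pairs for sequence models such as LSTM."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets

prices = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
train, test = chronological_split(prices, test_fraction=0.2)
x_train, y_train = make_windows(train, window=3)
```

Each entry in `x_train` holds three consecutive prices and the matching entry in `y_train` is the price that follows them.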
SQLify: Next-Gen Natural Language Query Engine (Lead Developer):
Tech Stack: Python, LangChain, Google PaLM (LLM), Vector Database (Chroma), HuggingFace Embeddings, etc.
Description:
- The project integrates an LLM (Google PaLM) to interpret natural language queries and generate fluent, accurate SQL queries.
- These models are fine-tuned on large datasets of SQL queries and corresponding textual descriptions, ensuring robust performance across the electronic devices and tools domains.
- The system employs a sophisticated query generation pipeline that transforms natural language input into executable SQL queries.
- This pipeline involves the LLM, a database connection, sentence embedding, storage of the embedded vectors, syntactic parsing, semantic analysis, a custom query template, etc., ensuring the coherence and correctness of generated queries.
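The prompt-assembly and query-execution parts of such a pipeline can be sketched with the standard library. Several pieces are assumptions made for illustration: word overlap stands in for embedding similarity from a vector store, the LLM call is replaced by a hard-coded SQL string, and the table, example questions, and helper names are all invented.

```python
import sqlite3

PROMPT_TEMPLATE = """You are a SQL assistant. Use only the tables below.

Schema:
{schema}

Examples:
{examples}

Question: {question}
SQL:"""

def token_overlap(a, b):
    """Jaccard overlap of lowercase word sets; a stand-in for embedding similarity."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def build_prompt(question, schema, examples, top_k=1):
    """Fill the template with the schema and the most similar example pairs."""
    ranked = sorted(examples, key=lambda ex: token_overlap(question, ex[0]), reverse=True)
    shots = "\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in ranked[:top_k])
    return PROMPT_TEMPLATE.format(schema=schema, examples=shots, question=question)

def run_read_only(connection, sql):
    """Execute a query only if it is a single SELECT statement."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")
    return connection.execute(statement).fetchall()

SCHEMA = "CREATE TABLE devices (name TEXT, price REAL, stock INTEGER)"
EXAMPLES = [
    ("How many devices are in stock?", "SELECT SUM(stock) FROM devices"),
    ("What is the price of the drill?", "SELECT price FROM devices WHERE name = 'drill'"),
]
prompt = build_prompt("What is the price of the laptop?", SCHEMA, EXAMPLES)

connection = sqlite3.connect(":memory:")
connection.execute(SCHEMA)
connection.executemany(
    "INSERT INTO devices VALUES (?, ?, ?)",
    [("laptop", 900.0, 4), ("drill", 80.0, 10)],
)
# In a real pipeline the LLM would produce this string from `prompt`.
generated_sql = "SELECT price FROM devices WHERE name = 'laptop';"
rows = run_read_only(connection, generated_sql)
```

Restricting execution to a single SELECT is one simple guard against a generated query modifying the database.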
Soccer video analysis (Lead Developer):
Tech Stack: Python, Django, Computer Vision, Deep Learning (YOLO), HTML, CSS, JavaScript, GCP (GCS, VM instance)
Description:
- Analysis and tagging of players in video, designed to improve understanding of each player's interactions with other players.
- The system analyzes the behavior and actions of players in video, extracting insights that can be used to coach players and improve their actions.
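One common way to keep a player's tag stable across frames is to match each new detection to the previous frame's boxes by intersection over union (IoU). The sketch below assumes a detector such as YOLO has already produced boxes as `(x1, y1, x2, y2)` tuples. The greedy matching, the 0.3 threshold, and the helper names are illustrative assumptions, not a description of this project's actual tracker.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def assign_ids(tracked, detections, next_id, min_iou=0.3):
    """Greedy matching: a detection inherits the ID of the best-overlapping
    unclaimed box from the previous frame, otherwise it gets a new ID."""
    assigned = {}
    free = dict(tracked)  # previous-frame boxes not yet claimed
    for box in detections:
        best_id, best_score = None, min_iou
        for player_id, previous_box in free.items():
            score = iou(previous_box, box)
            if score >= best_score:
                best_id, best_score = player_id, score
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        else:
            del free[best_id]
        assigned[best_id] = box
    return assigned, next_id

# Invented boxes: two known players move slightly, one new player appears.
previous_frame = {1: (0, 0, 10, 20), 2: (50, 0, 60, 20)}
current_detections = [(51, 0, 61, 20), (1, 0, 11, 20), (100, 0, 110, 20)]
current_frame, next_id = assign_ids(previous_frame, current_detections, next_id=3)
```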
Recommender System (Content-Based):
Tech Stack: Python, Pandas, Linear Regression, Correlation, Vectorization, Feature Engineering, Word Embedding
Description:
- A fully functional personalized recommender system capable of generating accurate and relevant recommendations based on user preferences and behavior.
- Improved user engagement and satisfaction through personalized content discovery, leading to increased user retention and loyalty.
- Insights into the effectiveness of different recommendation algorithms and strategies in various contexts, informing future enhancements and iterations of the system.
Electronic and Fashion Recommendation to the user on E-commerce site (Lead Developer):
Tech Stack: Cosine similarity, F1 Score, collaborative filtering technique
Description:
- Developed a system which recommends electronic and fashion items to users based on their previous preferences.
- The recommendation system is built using cosine similarity and F1 score to ensure accuracy. The Amazon dataset is used for this purpose, and data preprocessing is carried out to clean and transform the data into a usable format.
- The data preprocessing step involves cleaning the Amazon dataset by removing irrelevant and duplicate data, handling missing values, and transforming the data into a format that can be used for modeling.
- The recommendation system employs a collaborative filtering technique that analyzes the user's past behavior and the behavior of similar users to generate recommendations.
- Specifically, the system computes the cosine similarity between the user and the items, and recommends items that are most similar to the ones the user has already liked.
- To evaluate the accuracy of the system, the F1 score is used as a performance metric.
- The F1 score combines precision and recall to measure the system's ability to correctly recommend items that the user is likely to be interested in.
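The similarity and evaluation steps above can be sketched as follows. This is an item-based reading of the description: each item is represented by the users who liked it, and unseen items are scored by their cosine similarity to the items the user already liked. The tiny ratings table and the helper names are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(ratings, user, top_k=2):
    """Score each unseen item by its summed similarity to the user's liked items."""
    users = sorted(ratings)
    items = sorted({item for prefs in ratings.values() for item in prefs})
    # One vector per item: which users liked it.
    vectors = {item: [ratings[u].get(item, 0) for u in users] for item in items}
    liked = [item for item in items if ratings[user].get(item, 0) > 0]
    scores = {
        candidate: sum(cosine(vectors[candidate], vectors[item]) for item in liked)
        for candidate in items if candidate not in liked
    }
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_k]

def f1_score(recommended, relevant):
    """Harmonic mean of precision and recall over recommended vs. relevant items."""
    hits = len(set(recommended) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(set(recommended))
    recall = hits / len(set(relevant))
    return 2 * precision * recall / (precision + recall)

# Invented interaction data: 1 means the user liked the item.
ratings = {
    "alice": {"phone": 1, "case": 1},
    "bob": {"phone": 1, "case": 1, "charger": 1},
    "carol": {"jacket": 1, "scarf": 1},
    "dave": {"phone": 1, "charger": 1},
}
suggestions = recommend(ratings, "alice", top_k=2)
score = f1_score(suggestions, relevant=["charger"])
```

In practice the relevant set would come from held-out interactions rather than being listed by hand.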
Topic Modelling using Machine Learning:
Tech Stack: LDA, NMF, NLP, Stop Words
Description:
- Predicted categories from the provided dataset.
- Applied Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) models to the text to identify the most important topics or themes in the text data.
- To visualize the topics, word clouds were generated for each topic.
- Word clouds represented the most frequent words in a topic, with larger font size indicating higher frequency.
- Also used the K-Means clustering algorithm to group similar categories based on the predictions.
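The word cloud sizing described above can be sketched with the standard library. This covers only the frequency counting and font scaling. The LDA and NMF topic models are not reproduced here. The stop word list, the size range, the sample sentences, and the helper names are illustrative assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; real projects use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}

def word_frequencies(documents):
    """Count lowercase alphabetic tokens, skipping stop words."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts

def font_sizes(counts, min_size=10, max_size=40):
    """Scale font size linearly with frequency between min_size and max_size."""
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for word, count in counts.items():
        ratio = (count - low) / span if span else 1.0
        sizes[word] = round(min_size + ratio * (max_size - min_size))
    return sizes

# Invented documents standing in for the texts assigned to one topic.
topic_documents = [
    "The battery of the phone is great",
    "Phone battery and phone screen",
]
counts = word_frequencies(topic_documents)
sizes = font_sizes(counts)
```

The most frequent word receives the largest font size and the least frequent words receive the smallest.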