GSoC

GSoC 2026

Welcome to the C2SI Google Summer of Code (GSoC) 2026 project ideas page.

🚀 We’re excited to participate in GSoC for the 11th time, providing students and contributors with an opportunity to learn, collaborate, and contribute to impactful open-source projects.

🌟 Become a GSoC Contributor
Are you new to open source and looking for exciting projects to contribute to? Google Summer of Code (GSoC) is the perfect opportunity! With the guidance of experienced mentors, you’ll gain hands-on experience working on real-world projects.

👉 Why should you engage early?
It’s very important to connect with organizations as soon as possible. The more you interact with mentors and the community before submitting your proposal, the better your chances of being selected for GSoC!

🎥 Want to learn more about Google Summer of Code?

🔹Who Can Contribute?

Anyone interested is welcome to participate—whether you’re a GSoC student, mentor, or simply passionate about open-source development!

🔹 How to Contribute to a Project

  1. Select a project idea from the list below.
  2. Engage with mentors and explore the project code.
  3. Submit a small contribution to demonstrate your understanding.
  4. Interact with mentors for feedback and improvements.
  5. Prepare your proposal and submit it to Google Summer of Code.

📢 Join Our Community

đź’¬ Slack: C2SI Slack Workspace
📝 Proposal Template: View Here
đź’» Explore Our Projects: C2SI GitHub Repository

Let’s build something great together! 🚀


Idea List for 2026

1. B0Bot

Brief explanation
B0Bot is a CyberSecurity News API tailored for automated bots on social media platforms. It is a cutting-edge Flask-based API that grants seamless access to the latest cybersecurity and hacker news. Users can effortlessly retrieve news articles either through specific keywords or without, streamlining the information acquisition process. Once a user requests our API, it retrieves news data from our knowledge base and feeds it to the LLM. After the LLM processes the data, the API obtains the response and returns it in JSON format. The API is powered by LangChain and a Huggingface endpoint, ensuring that users receive accurate and up-to-date information.

Expected results
This year, we are planning to integrate the following features into b0bot:

  • Implement CDC via RSS feed readers or debezium connectors with kafka bus.
  • Implement caching mechanisms (e.g. Redis) to reduce response time for frequent requests.
  • Add a subscription feature for users to receive daily or weekly summaries, over email.
  • Create an agentic AI framework using Langchain/LangGraph to create planner and executor agents. Example of possible agents can be scraper agent, responder agent, notification agent, analyzer agent. Thorough research is expected from the contributor before deciding the agentic framework.
  • Extend the LLM to support multi-turn dialogue, allowing users to engage in conversational interactions with the API.
  • Extend data sources to various social media websites by using their APIs.
  • Creating tests for the API and proper error handling.
  • Improved UI, possibly creating a dashboard.

Knowledge Prerequisite
Python, Large Language Models, Huggingface, LangChain, Database management, Pinecone, Flask, Agentic Frameworks

Mentors
Hardik Jindal (hardik1408), Nipuna

Estimate Project Length
350 hours

Github URL
https://github.com/c2siorg/b0bot

Difficulty
Hard

Slack channel
#b0bot

2. WebiU

Brief explanation
WebiU is a dynamic organization website built using reusable component architecture that fetches project data in real time from GitHub repositories, ensuring live updates without manual intervention. It provides configurable templates to showcase project details such as title, description, technology stack, demo links, and organization updates.

The project aims to further improve the platform by optimizing APIs for faster and lighter responses, exploring serverless backend solutions for scalable real-time data handling, integrating CI/CD workflows, and introducing lightweight AI features to enhance project presentation and discoverability without increasing system complexity.

Key Objectives
  1. API Optimization
    • Refactor APIs to reduce response times and payload sizes.
    • Implement in-memory caching and compression (e.g., GZIP).
    • Explore GraphQL for efficient data fetching.

  2. Alternative Backend Strategies
    • Leverage serverless architectures for scalable real-time data processing.

  3. CI/CD Integration
    • Automate testing, building, and deployment with tools like GitHub Actions.
    • Enable safe deployments with rollback and error handling.

  4. Admin Features
    • Extend admin controls with project analytics.
    • Provide manual API and AI content refresh options.

  5. AI Enhancements
    • Generate concise project summaries from GitHub README and metadata
    • Detect technology stack automatically for accurate tech badges and filtering
    • Enable optional natural-language project search mapped to existing metadata
    • Cache AI outputs and refresh only on repository updates

Expected Results
By the end of the summer, WebiU will be production-ready with optimized APIs, a scalable real-time architecture, automated CI/CD workflows, and selective AI enhancements. These improvements will reduce manual effort, improve project discovery, and deliver a cleaner, faster, and more maintainable platform.

Skills Required:
REST/GraphQL, GitHub APIs, Node.js & Serverless, Angular, CI/CD (GitHub Actions), Basic API-based AI integration.

Mentor
Mahender Goud Thanda (Maahi10001), Charith

Estimate Project Length
350 hours

Github URL
https://github.com/c2siorg/Webiu

Difficulty
Medium

Slack channel
#WebiU

3. GDB UI

Brief explanation
GDB-UI is a modern, web-based interface for the GNU Debugger (GDB), designed to simplify the debugging process for developers working with C, and C++. It provides real-time interaction with GDB, enabling features like monitoring program execution, inspecting variables, setting breakpoints, and more, all through an intuitive web application.

GDB-UI enhances the debugging workflow by offering a sleek, user-friendly UI, replacing the traditional command-line experience with a visual and accessible alternative. It supports both Docker-based and manual setups, allowing seamless integration into various development environments.

Key Objectives
  1. First Deployment:
    • Deploy the project for initial use, ensuring the application is accessible and functional for all users.

  2. CI/CD Integration:
    • Automate the testing, building, and deployment processes using tools like GitHub Actions.
    • Ensure smooth deployment pipelines with robust rollback mechanisms and proper error handling.

  3. Session Management for Multiuser Support:
    • Implement a system to store debugging sessions uniquely for each user to enable multiuser functionality.
    • Ensure session persistence and isolation to prevent interference between users.

  4. Real-Time Debugging Results:
    • Design the application to display debugger results in real time without requiring page refreshes.
    • Use WebSockets or similar technologies to handle live updates efficiently.

Expected Results
By the completion of the project, the application will be fully deployed with multiuser support, persistent session management, real-time debugging results, and a robust CI/CD pipeline. These enhancements will provide a seamless debugging experience, improve scalability, and simplify the development workflow for contributors.

Skills Required:
Proficiency in REST, Flask, React, WebSockets, Docker, CI/CD tools (e.g., GitHub Actions), session management, and real-time data handling

Mentor
Shubh Mehta (Shubh942), Nipuna, EMSDV

Github URL
https://github.com/c2siorg/GDB-UI

Estimate Project Length
350 hours

Difficulty
Medium

Slack channel
#gdb-ui

4. CodeLabz

Brief Explanation
CodeLabz is an interactive, cloud-based learning platform designed to facilitate engagement with online tutorials. It enables organizations to create, manage, and share structured learning resources with users. The platform is built with a ReactJS frontend, complemented by a scalable backend powered by Google Cloud Firestore and Firebase Realtime Database, ensuring seamless real-time data synchronization and an intuitive user experience.

This project focuses on optimizing learning workflows by integrating an enhanced UI, efficient data management, and dynamic real-time updates. CodeLabz serves as a centralized solution for tutorial creation, consumption, and collaboration, ensuring an effective and scalable educational experience. It currently requires the following improvements.

Key Objectives
  1. Backend and Deployment:
    • The app is temporarily deployed on the client side using Firebase Hosting, so we need a no-cost solution for serverless deployment.
    • Find a free serverless hosting solution and migrate APIs from Firebase Functions while ensuring scalability, reliability, and minimal maintenance.
  2. Implement Real-Time Notification System (No Cost):
    • Research and implement a real-time notification system integrated with in-app notifications.
    • Ensure functionality for push and in-app notifications within free-tier limits.
  3. Managing Org-Setting (Including Roles) and Implement Admin Features:
    • Develop functionalities for managing organization settings.
    • Implement role-based access control to ensure secure and controlled access.
    • Enhance the admin dashboard with real-time API refresh and analytics.
  4. Optimize NoSQL Queries for Performance:
    • Identify key indexes, optimize slow queries, and implement caching or denormalization strategies to improve retrieval times.
  5. Containerization & Dockerization:
    • Implement Docker for consistent development and production environments..
  6. UI/UX Refinement:
    • Improve design consistency, usability, and responsiveness across all devices with proper design principles.
    • The codebase contains partial migrations and multiple MUI versions, so make it consistent and update it to the latest versions.
  7. Ensure Proper Migration and Consistency:
    • The codebase includes deprecated npm libraries and modules, update or replace them.
    • Convert the existing JavaScript codebase to TypeScript, ensuring type safety and maintainability.

Expected results
By the completion of this project, CodeLabz will achieve:
  • Optimized API performance: Faster response times, reduced server load, and enhanced data retrieval, improving overall API performance.
  • Better looking UI/UX: A visually appealing and responsive interface that works seamlessly across all devices, improving user satisfaction.
  • Scalable real-time backend: A scalable, serverless backend that handles real-time data synchronization and notifications efficiently.
  • Refined CI/CD pipelines and faster deployment cycles: Improved consistency and reproducibility of environments through Docker.
  • Enhanced data security and privacy: Improved data security with role-based access control, ensuring only authorized users access sensitive information.
  • Improved admin functionality: Advanced analytics, data monitoring, and administrative control tools.
  • Updated and consistent codebase: Fully migrated, consistent, scalable and maintainable codebase.

Knowledge Prerequisite
Proficiency in React.js, Redux, Material-UI, TypeScript, Node.js, Express.js, Firebase, API design, Docker, CI/CD (GitHub Actions), Figma, NoSQL design patterns, query optimization, caching strategies, OAuth, and Role-Based Access Control (RBAC).

Mentor
Mallepally Lokeshwar Reddy(lokeshwar777), Utkarsh Raj(rajutkarsh07)

Github URL
https://github.com/c2siorg/codelabz

Estimate Project Length
350 hours

Difficulty
Medium

Slack channel
#codelabz

5. ImageLab

Brief explanation
ImageLab is a block-based image processing tool built on Blockly and OpenCV. Students drag operators into a pipeline and run it against an uploaded image. This project transforms ImageLab into a genuine learning environment by adding per-step intermediate previews, histogram analysis, pipeline persistence, batch processing, and custom composite operators (macros).

How it works
  • Input: An uploaded image and a Blockly-based operator pipeline designed by the user
  • Processing: OpenCV operators are executed step-by-step on the backend, returning the image state after every operator
  • Output: Intermediate images at each step, per-channel RGB histograms with statistics, saved/shareable pipelines, batch-processed ZIP results, and reusable macro blocks in the Blockly sidebar

Expected Results
  • Students can inspect every operator’s effect step-by-step instead of seeing only the final image
  • Quantitative analysis (histogram, statistics) is available for every intermediate step
  • Pipelines can be saved, loaded, and shared via URL, enabling collaboration and teaching workflows
  • ImageLab’s first database usage sets the foundation for future features (user accounts, pipeline ratings)
  • Batch processing allows applying saved pipelines to multiple images concurrently with progress tracking and ZIP download
  • Custom composite operators (macros) teach abstraction by letting users package sub-chains as reusable blocks in the Blockly sidebar

Tech stack / Tools
FastAPI, OpenCV, PostgreSQL, SQLModel, Alembic, React, TypeScript, Blockly, Zustand

Knowledge Prerequisite
  • Proficiency in Python and TypeScript/React
  • Familiarity with OpenCV or willingness to learn image processing basics
  • Understanding of REST API design and relational databases
  • Experience with Blockly is a plus but not required

Mentor
Oshan Mudannayake

Github URL
https://github.com/c2siorg/imagelab

Estimate Project Length
350 hours

Difficulty
Hard

Slack channel
#imagelab

6. DataLoom

Brief explanation
DataLoom is a web-based data wrangling tool with a unique git-like checkpoint/revert system. This project expands DataLoom into a complete data preparation platform by adding multi-format upload, data profiling, merge/join operations, formula columns, data visualization, an automated data quality engine, and multi-format export with quality reports.

How it works
  • Input: Data files in multiple formats (CSV, xlsx, json, parquet, tsv)
  • Processing: pandas-powered transforms with a checkpoint system, including profiling, merge/join, formula columns, quality assessment (duplicate detection, outlier flagging, pattern validation), and visualization generation
  • Output: Transformed datasets, column-level statistical profiles, composite quality scores with one-click fixes, interactive charts (histogram, bar, scatter, time series), and multi-format exports with downloadable HTML/PDF quality reports

Expected Results
  • Users can upload xlsx, json, parquet, and tsv files in addition to CSV
  • Automatic data profiling gives instant insight into column statistics on upload
  • All transform forms use dropdown column selectors instead of free-text input
  • Users can merge/join data across projects
  • Computed formula columns and reusable transformation pipelines are supported
  • Basic visualization (histogram, bar, scatter, time series) is available in-app
  • Automated data quality assessment detects duplicates, outliers, and pattern violations with a composite quality score and one-click fixes
  • Data can be exported in multiple formats (xlsx/parquet/json/tsv), and comprehensive quality reports can be generated as HTML or PDF

Tech stack / Tools
FastAPI, pandas, PostgreSQL, SQLModel, Alembic, React, TypeScript, scikit-learn, matplotlib

Knowledge Prerequisites
  • Strong Python skills with pandas experience
  • React/JavaScript/TypeScript proficiency
  • Understanding of SQL joins and data transformation concepts
  • Familiarity with charting libraries (recharts or similar)

Mentor
Oshan Mudannayake

Github URL
https://github.com/c2siorg/dataloom

Estimate Project Length
350 hours

Difficulty
Hard

Slack channel
#dataLoom

7. TensorMap

Brief explanation
TensorMap is a visual neural network builder where users drag layer nodes onto a ReactFlow canvas, connect them, and train Keras models. This project transforms TensorMap into a complete neural network design studio by adding a data-driven layer registry, 11 new layer types, real-time training visualization, model export in industry-standard formats, post-training interpretability tools, and automated hyperparameter tuning.

How it works
  • Input: A model architecture defined as ReactFlow nodes and edges (layer types with parameters) plus a training dataset
  • Processing: Keras model generation from the visual graph, training with structured Socket.IO metric callbacks, hyperparameter search (grid/random), and post-training analysis (confusion matrix, feature importance)
  • Output: Trained models with real-time interactive training charts, exported model files (SavedModel/ONNX/TFLite), comparison dashboards, interpretability reports, and tuning results with one-click best-parameter application

Expected Results
  • Adding a new layer type requires only a registry entry (data-driven) instead of 6+ file edits
  • 15 layer types available (up from 4), covering CNNs, RNNs, and regularization
  • Training progress shown as real-time interactive charts with structured metrics
  • Models can be exported in SavedModel, ONNX, and TFLite formats
  • Multiple training runs can be visually compared in a dashboard
  • Post-training analysis tools (confusion matrix, classification report, feature importance, prediction explorer) help users understand model behavior and diagnose issues
  • Automated hyperparameter tuning with grid and random search strategies eliminates manual trial-and-error, with real-time progress tracking and one-click application of best parameters

Tech stack / Tools
FastAPI, TensorFlow/Keras, PostgreSQL, SQLModel, Alembic, scikit-learn, Socket.IO, React, TypeScript, ReactFlow

Knowledge Prerequisite
  • Python proficiency with TensorFlow/Keras experience
  • React/JavaScript/TypeScript proficiency with component-based UI patterns
  • Understanding of neural network architectures and training workflows
  • Familiarity with Socket.IO or WebSocket-based real-time communication

Mentor
Oshan Mudannayake

Github URL
https://github.com/c2siorg/tensormap

Estimate Project Length
350 hours

Difficulty
Hard

Slack channel
#tensormap

8. Honeynet

Brief explanation
Develop a scalable, cloud-native honeypot deployment framework that leverages Terraform to provision and manage honeypot instances across multiple geographic regions. This platform will help security teams gather threat intelligence, understand attacker methodologies, and improve defensive postures by simulating realistic targets in various cloud environments.

Expected Results
Check project GeoDNSScanner (https://github.com/c2siorg/GeoDnsScanner), you will create something similar to this but to deploy honeypots

Key Objectives
  • Automated Deployment: Use Terraform to automate the provisioning, configuration, and decommissioning of honeypot infrastructure across multiple regions and potentially multiple cloud providers.
  • Distributed Architecture: Deploy honeypots in various regions (e.g., North America, Europe, Asia-Pacific) to capture a diverse range of attack vectors and adapt to region-specific threat landscapes.
  • Data Enrichment: Integrate logging, monitoring, and analytics to enrich raw data, correlating attack patterns with global threat intelligence feeds.
  • Scalability and Flexibility: Implement modular Terraform configurations and cloud-native services to enable rapid scaling, dynamic resource allocation, and easy modifications.

Knowledge Prerequisite
Cloud deployment, Terraform, bash

Mentor
Danushka V, WiztaMax, Keneth

Github URL
https://github.com/c2siorg/honeynet

Estimate Project Length
350 hours

Difficulty
Easy

Slack channel
#Honeynet

9. RustCloud

Brief explanation
RustCloud is a rust library which hides the difference between different APIs provided by varied cloud providers (AWS, GCP, Azure etc.) and allows you to manage different cloud resources through a unified and easy to use API.

Expected Results
  • By the end of the project, API for BigQuery, Vertex AI, GenAI for AWS, GCP, Azure
  • Documentation - Improve and maintain documentation related to the development areas, ensuring clarity for future contributors.

Knowledge Prerequisite
Rust, AWS, GCP, Azure

Mentor
Pratik Dhanave, Mohit Bhat

Github URL
https://github.com/c2siorg/RustCloud

Estimate Project Length
350 hours

Difficulty
Medium

Slack channel
#rust-cloud

10. Agentic Cognitive Firewall SDK

Brief explanation
The Cognitive Firewall is a Zero Trust control layer for agentic systems that protects the reasoning control plane of large language model (LLM) agents. It prevents prompt injection, context manipulation, memory poisoning, and unsafe tool-output re-injection by enforcing policy-driven validation before any input enters model context. The system acts as a programmable admission controller for agent cognition, ensuring that prompts, retrieved documents, memory writes, and tool outputs are continuously verified. It enables secure-by-design agent development through contextual integrity enforcement and cognitive telemetry.

How it works:
Input:
  • User prompts
  • Retrieved RAG documents
  • Tool outputs
  • Memory write operations
  • System prompts
Processing:
  • SDK intercepts agent inputs/outputs
  • Sends payload to Firewall API
  • Injection pattern detection (rule-based + heuristic scoring)
  • Context sanitization & instruction stripping
  • Policy-as-code evaluation
  • Risk scoring engine
  • Allow / Sanitize / Block decision
  • Logging & cognitive telemetry storage
Output:
  • Sanitized context (if allowed)
  • Block decision with reason (if denied)
  • Risk score metadata
  • Audit logs for governance

Expected Results
Milestone 1 (Weeks 1–4): Core Firewall MVP
  • FastAPI-based Cognitive Firewall service
  • Python SDK middleware for agent interception
  • Prompt injection detection engine (rule-based)
  • Document sanitization module
  • Policy-as-code YAML engine
Milestone 2 (Weeks 5–8): Context Integrity Controls
  • Tool output inspection with DLP scanning
  • Memory write validation with TTL enforcement
  • Risk scoring engine
  • Provenance tagging for context elements
  • Logging + audit trail storage
Milestone 3 (Weeks 9–12): Observability & Enterprise Readiness
  • Cognitive telemetry dashboard
  • Agent-level risk aggregation
  • Policy versioning & configuration management
  • Example integration with LangGraph agent
  • Documentation + deployment scripts (Docker)

12-Week Implementation Plan:
Weeks 1–2: Architecture & Foundations
  • Define system architecture (SDK + Firewall Service)
  • Design data model for context validation
  • Create FastAPI skeleton
  • Define validation schema (Pydantic models)
  • Create GitHub repository structure
Weeks 3–4: Prompt & Document Protection
  • Implement prompt injection detection rules
  • Implement document sanitization module
  • Add pattern detection (override attempts, encoded payloads)
  • Build policy-as-code YAML loader
  • Integrate SDK with validation endpoint
Weeks 5–6: Tool Output & Memory Controls
  • Implement DLP scanning module
  • Add secret detection patterns
  • Implement memory write validation rules
  • Add TTL enforcement & size constraints
  • Risk scoring engine (weighted scoring)
Weeks 7–8: Context Provenance & Logging
  • Add metadata tagging (source, trust score)
  • Implement logging to Postgres
  • Create audit trail schema
  • Add decision trace logging
Weeks 9–10: Observability Layer
  • Build basic dashboard (Streamlit or simple UI)
  • Display risk scores & blocked events
  • Add agent-level summaries
  • Add exportable security reports
Weeks 11–12: Hardening & Deployment
  • Dockerize service
  • Add basic authentication for firewall API
  • Write integration example with LangGraph
  • Write documentation
  • Final testing + injection scenario testing
  • Prepare proposal/demo presentation

Tech stack / Tools
  • Backend: FastAPI
  • LLM/AI: LangChain or LangGraph, OpenAI / HuggingFace
  • Storage/DB: PostgreSQL (audit logs), optional SQLite for MVP
  • Messaging/Streaming (if any): Not required for MVP
  • Caching (if any): Redis
  • Containerization: Docker
  • Dashboard: Streamlit or lightweight React frontend

Knowledge Prerequisite
Python, FastAPI, REST APIs, LLM fundamentals, Prompt engineering, Security basics, Regex, YAML parsing, Basic DevOps (Docker)

Mentor
Tharindu Ranathunga, Kavishka Fernando ([email protected])

Github URL
TBA

Estimate Project Length
120–150 hours (approx. 10–12 hours/week for 12 weeks)

Difficulty
Medium (security + AI systems design + architecture integration)

Slack channel
TBA

11. LensMint Web3 Camera

Brief explanation
LensMint Camera is a Web3-enabled camera system built on Raspberry Pi that captures media with cryptographic provenance and signed metadata and mints photos as nft on the blockchain. This project will rewrite and strengthen the camera OS and interface in Rust, and build a secure device key generation and identity layer to improve authenticity, tamper resistance, and integration with future decentralized systems.

How it works:
Inputs:
  • Camera sensor image/video data via Raspberry Pi CSI interface.
  • User interaction via touchscreen/UI or hardware buttons.
Processing:
  • Rust-based/Python-based firmware/OS controls board peripherals.
  • Secure entropy collection for device keypair generation.
  • Camera capture and processing pipeline.
  • Cryptographic signing and metadata generation.
  • Blockchain Processing for NFT generation.
  • ZK proof generation.
Output:
  • Signed photo/video data.
  • Unique device identity for provenance.
  • Stable camera UI and control interface.
  • An NFT that can be shared.

Expected Results
  • Rust-native core OS/firmware and improved camera interface
  • Secure device cryptographic key generation using hardware entropy
  • Safe key storage abstraction (flash/secure enclave)
  • Rust API for camera control and capture pipeline
  • Modular identity layer for verifiable device signatures
  • Performance optimizations and reliability improvements
  • Improve blockchain processing
  • Add AI features - Agentic camera support
  • OS/UI design improvement
  • Note: Not all features need to be implemented; you can work on a subset of features or all, depending on what you can accomplish during gsoc period. Also, you can suggest more improvements on your side that are not listed here

Tech stack / Tools
Hardware
  • Raspberry Pi (4, Zero, etc.) with CSI camera modules (e.g., Raspberry Pi Camera Module) Raspberry Pi
Programming languages:
  • Rust (embedded where applicable)
  • Python
  • Javascript
OS / Firmware:
  • Rust OS / embedded crate stack (embedded-hal, rp-pico SDK etc.)
  • Camera control via libcamera ecosystem (Linux stack) (GitHub)
Cryptography:
  • Rust crypto crates (e.g., ed25519-dalek, rand)
  • ZK Proof
Build & tooling:
  • cargo, cross-compile toolchains
  • Embedded flashing tools (OpenOCD / Pi Imager) (Wikipedia)
Technology:
  • Blockchain

Knowledge Prerequisite
  • Rust programming and systems-level coding
  • Embedded/Linux systems (Raspberry Pi environment)
  • Cryptographic key management and signatures
  • Understanding of camera hardware interfaces
  • Blockchain Knowledge

Mentor
Mohit Bhat

Github URL
https://github.com/c2siorg/lensmint-camera

Estimate Project Length
350 hours

Difficulty
Hard

Slack channel
#gsoc-lensmint

12. PII-Safe – Privacy Guard for Agentic AI & MCP Workflows

Brief explanation
As AI agents increasingly process security logs, chat transcripts, and incident reports, they are often exposed to sensitive personal information such as emails, usernames, IP addresses, and customer identifiers. PII-Safe is a middleware and MCP compatible privacy plugin that automatically detects, redacts, or pseudonymizes personal data before it reaches an LLM or is stored in memory. The project aims to make agentic AI deployments safer, more compliant, and production-ready without compromising analytical value.

How it works:
Policy Representation & Enforcement: PII-Safe represents privacy controls as policy-as-code, keeping privacy rules separate from application logic. Each incoming piece of data (such as a log entry or text document) is converted into a structured input that includes context like the operation type (e.g., analysis or export) and the types of sensitive information detected (e.g., email, IP address, phone number). This structured input is evaluated against declarative policies that return a clear decision such as allow, redact, pseudonymize, or block.
For example, a simple policy might state that during internal analysis, email addresses can be replaced with consistent placeholders (e.g., USER_01) to preserve context. However, if the same data is being exported outside the system, all email addresses and IP addresses must be fully redacted. If someone attempts to export raw, unsanitized data, the policy engine would return a deny decision and stop the operation.

Inputs:
  • Prompts,security logs, incident case data, tool-call payloads, or unstructured text containing potential personal information.
Processing:
  • Schema-aware PII detection (structured JSON + free text).
  • Policy-driven sanitization (redaction, pseudonymization, allowlists).
  • Context-preserving token mapping for case-level consistency.
  • Privacy exposure scoring and audit logging.
  • MCP tool interface for integration into agent workflows.
Output:
  • Sanitized data safe for AI processing, a privacy risk score, and an auditable transformation report (with secure mapping for authorized re-identification).

Tech stack
  • Backend: FastAPI (Python)
  • LLM/AI: spaCy or HuggingFace (for lightweight entity detection), Optional integration examples with LangChain / LangGraph, OLLAMA for local testing
  • Storage/DB: SQLite (default) or Postgres (optional extension)
  • Caching: Redis

Expected Results
  • Policy-based PII detection and configurable redaction engine
  • Stable pseudonymization system (consistent within incident scope)
  • FastAPI-based middleware + MCP server mode
  • CLI tool for batch dataset sanitization
  • Audit logging and privacy exposure scoring
  • Dockerized deployment with example datasets
  • Unit tests and integration tests with real world agentic AI framework

Knowledge Prerequisite
Python, REST API development, Basic cybersecurity knowledge, Understanding of JSON and data schemas, Foundational understanding of data privacy principles

Mentor
Tharindu Ranathunga, Kavishka Fernando ([email protected])

Github URL
TBA

Estimate Project Length
175–350 hours

Difficulty
Medium

Slack channel
TBA