RAJAT SINGH
Data Engineer
Building Scalable Data Infrastructure
PROFESSIONAL EXPERIENCE
Data Engineer
Chegg India
Jul 2024 – Present
- Engineered and maintained robust data pipelines using Apache Airflow, Databricks, PySpark, and AWS Redshift, processing 1M+ records daily for large-scale reporting and analytics workloads with 99.9% uptime.
- Automated API ingestion workflows from Google Ad Manager, AdSense, and AdMob for ad revenue tracking and financial reporting, integrated with NetSuite and Braintree payment systems. Reduced manual effort by 80%, improved data freshness from 24h to near real-time, and cut operational costs by 30%.
- Led the migration of 20+ legacy ETLFM pipelines to Airflow, implementing a modern lakehouse architecture on AWS S3, Delta Lake, and Delta Live Tables, which reduced query latency by 40% and improved scalability.
- Established a data quality framework with anomaly detection and validation checks in partnership with analytics teams, achieving 99.5% data accuracy and reducing data incidents by 60%.
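As an illustration of the kind of validation and anomaly checks mentioned above, here is a minimal Python sketch using only the standard library. It is not the production framework: the function names, the required-field check, and the z-score threshold of 3.0 are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def validate_rows(rows, required_fields):
    """Return the rows that are missing (or have empty) required fields.

    `rows` is a list of dicts; the field names are hypothetical.
    """
    return [
        row
        for row in rows
        if any(row.get(field) in (None, "") for field in required_fields)
    ]


def is_volume_anomaly(history, today, threshold=3.0):
    """Flag today's row count if it deviates from the historical mean
    by more than `threshold` sample standard deviations (a simple z-score test).
    """
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu  # any change from a constant history is unusual
    return abs(today - mu) / sigma > threshold
```

A check like this would typically run as a pipeline step after ingestion, failing or alerting when `validate_rows` returns any rows or `is_volume_anomaly` returns `True`.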
Data Engineer Intern
Chegg India
Jan 2024 – Jul 2024
- Onboarded 15+ diverse data sources to the Databricks lakehouse and architected Airflow DAGs for scheduled ingestion with SLA monitoring and alerting, reducing onboarding time by 50%.
- Conducted rigorous QA and data validation for RIO event tracking, resolving 30+ data quality issues and supporting cross-functional teams in issue triaging, improving overall pipeline reliability by 35%.
- Collaborated on instrumentation of IRD documents and monitored New Relic events for newly launched features, ensuring data integrity.
TECHNICAL PROJECTS
Real-Time Song Recommender
AI-powered emotion detection system built with Python, OpenCV, and CNN models that analyzes facial expressions in real time with 85% accuracy. Integrates the Spotify API to dynamically curate personalized playlists based on detected moods.
Phonebook Directory System
High-performance C++ CLI application with full CRUD operations, built on a doubly linked list for O(1) insertion/deletion at a known node and bidirectional traversal. Implemented binary search, multiple sorting algorithms, and comprehensive input validation.
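The doubly linked list idea behind this project can be sketched briefly. The example below is in Python for brevity rather than the project's C++, and the class and method names are hypothetical, not taken from the repository. It shows why removal is O(1) once a node reference is held: only the neighbouring pointers are rewired.

```python
class Node:
    """One phonebook entry with links to both neighbours."""

    def __init__(self, name, number):
        self.name = name
        self.number = number
        self.prev = None
        self.next = None


class Phonebook:
    """Doubly linked list of contacts (illustrative only)."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, name, number):
        """Insert at the tail in O(1) and return the new node."""
        node = Node(name, number)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def remove(self, node):
        """Unlink a known node in O(1) by rewiring its neighbours."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def forward(self):
        """Yield names from head to tail."""
        node = self.head
        while node is not None:
            yield node.name
            node = node.next

    def backward(self):
        """Yield names from tail to head."""
        node = self.tail
        while node is not None:
            yield node.name
            node = node.prev
```

Note that finding a contact by name is still a linear scan; the O(1) bound applies only to inserting or deleting at a node that has already been located.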
TECHNICAL SKILLS
Cloud & Infrastructure
AWS
S3
Redshift
Lambda
EC2
GCP
Data Processing
Apache Airflow
Databricks
PySpark
Delta Lake
SQL
Programming
Python
SQL
C++
C
Tools & Concepts
Docker
Git
PostgreSQL
ETL/ELT
Data Lakehouse
CI/CD
EDUCATION
Delhi Technological University
Bachelor of Technology in Computer Science and Engineering
Aug 2020 – Jun 2024
CGPA: 8.56/10.0
CERTIFICATIONS
Gremlin Certified Chaos Engineering Practitioner
Gremlin
C for Everyone: Programming Fundamentals
Coursera
Text Mining and Analytics
Coursera