Matteo Corain

mcorain • mattecora

Senior Data Engineer at Data Reply IT, focusing on the design and implementation of big data processing solutions, both on-premises and cloud-based (AWS, GCP).
Master of Science graduate in Computer Engineering (Data Science orientation) from Politecnico di Torino, Italy.

Work experience

Data Reply srl

2024-now

Senior data engineer

Main duties and responsibilities:

Data Reply srl

2021-24

Data engineer

Politecnico di Torino

2017-20

Laboratory assistant

Education

Politecnico di Torino

2021

2nd Level Specializing Master in Artificial Intelligence and Cloud Computing: Hands-on Innovation

Thesis title: Engineering ETL processes on AWS: a framework-based solution
Final mark: 110/110 cum laude

Politecnico di Torino

2018-20

Master of Science (Laurea Magistrale) in Computer Engineering, Data Science orientation

Thesis title: A density-based method for scalable outlier detection in large datasets
Final mark: 110/110 cum laude

Politecnico di Milano, Politecnico di Torino

2018-20

Alta Scuola Politecnica

Project title: EnerChainge: Blockchain for smart energy applications

Alta Scuola Politecnica is an honors double degree program restricted to the top 150 students of Politecnico di Torino and Politecnico di Milano. ASP students are required to attend additional, ad hoc courses and to develop a final, year-long multidisciplinary project in collaboration with academic and industrial tutors.

University of Illinois at Chicago

2018-20

Master of Science in Computer Science (TOP-UIC)

TOP-UIC is an MS-level double degree program offered by Politecnico di Torino and the University of Illinois at Chicago (UIC). TOP-UIC students are required to attend a semester of courses in Chicago and to develop their final thesis in collaboration with advisors from both universities.

Politecnico di Torino

2015-18

Bachelor of Science (Laurea) in Computer Engineering

Final mark: 110/110 cum laude

Part of the Percorso per Giovani Talenti project, restricted to the top 200 students of the university, which supplements the standard curriculum with additional courses and activities.

Liceo Scientifico C. Cattaneo (Torino)

2010-15

High school diploma (Maturità Scientifica)

Final mark: 100/100 cum laude

Certifications

Amazon Web Services

2024

AWS Certified Data Engineer - Associate (DEA-C01)

Amazon Web Services

2024

AWS Certified DevOps Engineer - Professional (DOP-C02)

Amazon Web Services

2024

AWS Certified Solutions Architect - Professional (SAP-C02)

Databricks

2023

Databricks Certified Data Engineer Associate (V3)

HashiCorp

2023

HashiCorp Certified: Terraform Associate (003)

MongoDB

2022

MongoDB SI Architect Certification

Databricks

2021

Databricks Certified Associate Developer for Apache Spark 3.0

Publications

M. Corain, P. Garza and A. Asudeh

2021

DBSCOUT: A Density-based Method for Scalable Outlier Detection in Very Large Datasets

2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021, pp. 37-48, doi: 10.1109/ICDE51399.2021.00011.

Language knowledge

Mother tongue. Italian

Other languages. English level C1 (CEFR)

Certificates. IELTS Academic 8.0 (April 14th, 2018)

Technical knowledge

Databases. Experience in the use and administration of relational (PostgreSQL, Oracle, MySQL) and NoSQL (MongoDB, Elasticsearch, DynamoDB) database management systems.

Data processing. Strong experience with the Apache Spark ecosystem (including extensions like Delta Lake), as well as traditional data processing libraries such as NumPy and pandas.

DevOps and IaC. Deep knowledge of Git-based version control systems and their integration into CI/CD pipelines on Jenkins and CodePipeline, and of infrastructure-as-code tools such as Terraform and CloudFormation.

Containers. Experience building and running Docker-based containers; working knowledge of Kubernetes and Helm.

Amazon Web Services. Extensive experience with most foundational and analytical services, including: S3, IAM, KMS, EC2, ECR, ECS, EKS, Fargate, Lambda, EMR, Glue, Athena, RDS, Aurora, DynamoDB, Redshift, Lake Formation, DMS, Kinesis, SageMaker, QuickSight, Step Functions, SNS, SQS, EventBridge, API Gateway, CloudFront, Cognito, Amplify, CloudWatch, CodeCommit, CodeBuild, CodePipeline, CloudFormation.

Google Cloud Platform. Basic working knowledge of the main foundational and analytical services, including: Cloud Storage, Compute Engine, Cloud Functions, Dataflow, BigQuery, Firestore, Composer, Pub/Sub, Cloud Scheduler, Cloud Logging.

Databricks. Experience using Apache Spark and Delta Lake to implement medallion architectures on the Lakehouse Platform, including components such as Delta Live Tables and Unity Catalog.

Known languages. C, Java, Scala, Python, JavaScript, HTML, CSS, SQL