Hua Cao

📧 [email protected] | 🌍 East Palo Alto, CA
🔗 GitHub - sagemaker studio dataengineering sessions
🔗 GitHub - sagemaker studio dataengineering extensions
🔗 GitHub - CorfuDB


💼 Experience

Amazon Web Services (AWS) – Software Engineer

Sep 2022 – Present · East Palo Alto, CA

  • Early engineer on the Amazon SageMaker Unified Studio team (launched in 2025, previewed at re:Invent 2024), redefining end-to-end experiences for Generative AI, ML workflows, and governance. Led design and development of foundational components from inception to launch.
  • Core contributor to open-source projects: sagemaker studio dataengineering sessions and sagemaker studio dataengineering extensions.
  • Developed key capabilities in AWS EMR Studio, a notebook-based environment for Apache Spark and Trino, enhancing real-time debugging and development in Python, Scala, and SQL for big data workloads.
  • Delivered core features for AWS EMR’s notebook ecosystem (Jupyter Enterprise Gateway, JupyterHub, Livy, Zeppelin, Hue), improving scalability, performance, and usability for petabyte-scale data processing and interactive analytics.

VMware – Software Engineer

Jul 2021 – Aug 2022 · Palo Alto, CA

  • Contributed to CorfuDB, an open-source distributed log-based database supporting strong consistency.
  • Led development of IDAS (ID Allocation Service) on Corfu, powering distributed IP/ID allocation for NSX network virtualization.
  • Re-architected IDAS to support expanded ID ranges (from 2³² to 2¹²⁸), unlocking IPv6 and future-proof scalability.
  • Increased test coverage from ~40% to 80%+ and introduced a Corfu-wide reusable testing framework.

Morningstar – Software Engineer

Jul 2016 – Aug 2018 · Shenzhen, China

  • Member of the Presentation Studio team — core to Morningstar’s most profitable product line (>50% revenue contribution).
  • Migrated client-side computation to backend microservices in preparation for AWS cloud migration.
  • Designed and built a high-availability RESTful service using ASP.NET and C#, supporting 20K+ users and >1000 QPS.
  • Implemented an async polling module, reducing user wait time by 15% and improving throughput by 10%.
  • Added a caching layer, reducing DB pressure and boosting performance by 20%.
  • Optimized web request pre-processing logic, reducing latency by 80% via predictive behavior modeling.

🎓 Education

North Carolina State University
Master of Computer Science
Aug 2019 – May 2021

Hunan University
Bachelor of Computer Science
Sep 2012 – Jun 2016


🧠 Skills

Languages:
Python, Java, Scala, TypeScript, JavaScript, Go, C#, C/C++, SQL, Bash, HTML/CSS

Frameworks & Tools:
Jupyter (Hub, Enterprise Gateway), Spark, Trino, Zeppelin, Hue, React, Django, Spring Boot, ASP.NET, FastAPI, Flask, Protobuf, gRPC, JUnit, PyTest

Cloud & DevOps:
AWS (SageMaker, EMR, S3, Glue, Athena, Redshift), Docker, Kubernetes, CI/CD, GitHub Actions, CloudFormation, Linux/Unix

Big Data & ML:
Apache Spark, Hive, Trino, Airflow, Pandas, scikit-learn, TensorFlow, PyTorch, LLMs, LangChain, MLflow, Data Governance

Other:
Open Source Contributor, Agile/Scrum, System Design, Technical Leadership, Security & Compliance


🛠 Projects

AWS

VMware

IDAS

  • Led development of IDAS (ID Allocation Service), a critical system built on top of CorfuDB to enable distributed, unique ID and IP address allocation across the NSX platform.
  • Re-architected IDAS to expand its address space from 2³² to 2¹²⁸, enabling support for IPv6 and future-proof scalability.
  • Improved test reliability and maintainability by increasing unit test coverage from ~40% to over 80%, and by introducing a reusable test framework adopted across other Corfu projects.
  • Collaborated closely with infrastructure and networking teams to ensure performance, fault tolerance, and correctness in distributed environments.

Morningstar Inc

Backend calculation service

Built a RESTful service that moved the calculation services to the server, in support of agile development and the AWS migration.

  • Built a low-level asynchronous programming model to support 20K+ users with heavy calculation workloads
  • Designed the interface to support asynchronous requests and client-side polling, improving QPS by 10%
  • Implemented a cache layer that works with the polling mechanism, improving performance by 20%
  • Migrated the calculation business logic in 2 components with no performance loss and no change noticeable to customers

Portfolio/Office Workflow

Built two new workflows, in addition to the 4 existing ones, to support the company's new strategic plan.

  • Designed the new processes and implemented the UI and business logic to fit the existing framework
  • Added 5 new components under the new workflows and moved their calculation logic into our backend services
  • Optimized the preprocessing logic, reducing the time cost per web request by 80%, from minute-level to second-level
  • Used a NoSQL database to store report information
  • Worked with our batch services to generate PDF reports from XML files