Research
My primary research focuses on designing and building resilient systems for critical infrastructure. Many sectors of society depend on critical infrastructure such as the power grid, making it essential to the functioning of society. Supervisory Control and Data Acquisition (SCADA) systems provide automated control and remote monitoring of such infrastructure. Traditionally, SCADA systems (equipment and software) were designed to operate in an air-gapped environment without network interconnectivity or exposure. However, over time, factors such as cost-effectiveness and scalability drove vendors and utilities to adopt standard Information Technology (IT) platforms, increasing the interconnectivity of SCADA systems with other networks beyond the traditional limits of Operational Technology (OT). As a result, sophisticated attacks increasingly target power grid SCADA systems. These rapidly growing threats drive the need to build intrusion-tolerant techniques into resilient SCADA systems.
Real-Time Byzantine Resilience
In a world of increasing cyber threats, a compromised protection relay can put power grid resilience at risk by
irreparably damaging costly power assets (tens of thousands of dollars), causing significant disruptions, or leading to an inconsistent state.
This project develops a Byzantine fault-tolerant architecture and protocols
to protect bulk power system components (e.g., a 345kV transformer) even when some of the protection relays guarding those assets are compromised.
While developing Byzantine fault-tolerant protocols is challenging in itself, a power grid substation's worst-case latency requirement is a quarter of a power cycle,
i.e., about four milliseconds, which adds a real-time response challenge.
Spire of the Substation, the first real-time Byzantine resilient system for power grid substations, is built to maintain correct operation
while satisfying the performance and latency requirements, even in the face of successful compromises and network attacks
in power grid substations. The work uses proactive recovery and diversity to allow the system to survive an unbounded number of
compromises over the system lifetime, as long as the number of simultaneous compromises does not exceed a certain threshold.
Finally, machine learning-based situational awareness modules supplement the intrusion-tolerant system.
The system has been developed and delivered for deployment in testbeds at General Electric, Siemens, and Hitachi Energy.
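To make the quarter-cycle latency requirement above concrete, the following minimal Python sketch computes the budget from the nominal grid frequency. The function name and the 60 Hz and 50 Hz inputs are illustrative assumptions, not part of the system itself. At 60 Hz a quarter cycle is roughly 4.17 ms, which matches the four-millisecond figure cited above.

```python
def quarter_cycle_ms(grid_frequency_hz: float) -> float:
    """Return the duration of a quarter power cycle in milliseconds.

    Hypothetical helper for illustration only: one full cycle lasts
    1000 / f milliseconds, and the latency budget is a quarter of that.
    """
    if grid_frequency_hz <= 0:
        raise ValueError("grid frequency must be positive")
    full_cycle_ms = 1000.0 / grid_frequency_hz
    return full_cycle_ms / 4.0


if __name__ == "__main__":
    # Assumed nominal frequencies: 60 Hz (e.g., North America), 50 Hz (e.g., Europe).
    for hz in (60.0, 50.0):
        print(f"{hz:.0f} Hz grid: quarter cycle = {quarter_cycle_ms(hz):.2f} ms")
```

Under these assumptions, every step of the protocol (message exchange among relays, agreement, and the trip action) has to fit inside this budget in the worst case, not just on average.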
Severe Impact Resilience
The joint threat of increasingly frequent severe natural disasters and follow-on sophisticated malicious
cyberattacks is becoming more realistic and seriously threatens critical infrastructure systems.
This novel threat model and the impact of such threats on critical infrastructure are not well understood.
The project defines the threat model and develops a framework to assess
the impact of novel compound threats on critical infrastructure, with the aim of developing severe-impact-resilient control systems.
One interesting outcome of initial work is that existing architectures are not resilient to compound threats.
This has led to many interesting research directions in Resilient Systems for Critical Infrastructure, where
we are exploring architectures that are more dynamic and flexible than traditional ones.
Open Source Software Releases
The following open-source software was released as part of my work in the DSN Lab during my Ph.D.
Spire
Spire is an open-source intrusion-tolerant SCADA system for the power grid. Spire is designed to withstand attacks and compromises at both the system level and the network level, while meeting the timeliness requirements of power grid monitoring and control systems (on the order of 100-200ms update latency).
Spines
Spines is a generic messaging infrastructure that provides transparent unicast, multicast, and anycast communication over dynamic, multi-hop networking environments without the need for expensive router programming environments or low-level router coding. It provides the automatic reconfiguration and network flexibility required for research and production deployments.
Prime
Prime is a Byzantine fault-tolerant replication engine that provides meaningful performance guarantees even after some of the replication servers have been compromised.
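As a rough illustration of what tolerating "some" compromised servers means in practice, the sketch below computes replica counts from bounds commonly cited in the Byzantine fault tolerance literature: 3f + 1 replicas to tolerate f compromised replicas, and 3f + 2k + 1 when up to k additional replicas may be unavailable at the same time, for example while undergoing proactive recovery. The function name is hypothetical and this is not Prime's API; the exact configuration of any given deployment is an assumption here.

```python
def min_replicas(f: int, k: int = 0) -> int:
    """Minimum replica count under commonly cited BFT bounds.

    f: number of simultaneously compromised (Byzantine) replicas to tolerate.
    k: number of replicas that may be unavailable at the same time,
       e.g. because they are being proactively recovered (assumption).
    With k == 0 this reduces to the classic 3f + 1 bound.
    """
    if f < 0 or k < 0:
        raise ValueError("f and k must be non-negative")
    return 3 * f + 2 * k + 1


if __name__ == "__main__":
    print(min_replicas(1))        # classic bound for one compromise
    print(min_replicas(1, k=1))   # one compromise plus one recovering replica
```

The design trade-off this illustrates is that each additional tolerated compromise costs three more replicas, which is part of why diversity and proactive recovery matter: they keep the number of simultaneous compromises below the threshold without growing the system indefinitely.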
Publications
Tolerating Compound Threats in Critical Infrastructure Control System
Sahiti Bommareddy, Maher Khan, Huzaifah Nadeem, Benjamin Gilby, Imes Chiu, John W. van de Lindt, Omar Nofal, Mathaios Panteli, Linton Wells II, Yair Amir, Amy Babay
In Proceedings of the 43rd International Symposium on Reliable Distributed Systems (SRDS 2024), Charlotte, USA, September 2024. Best Paper Award.
Real-Time Byzantine Resilient Power Grid Infrastructure: Evaluation and Tradeoffs
Sahiti Bommareddy, Maher Khan, David J Sebastian Cardenas, Carl Miller, Christopher Bonebrake, Yair Amir, Amy Babay
Accepted at the International Workshop on Explainability of Real-time Systems and their Analysis at the IEEE Real-Time Systems Symposium (RTSS 2022)
Real-Time Byzantine Resilience for Power Grid Substations
Sahiti Bommareddy, Daniel Qian, Christopher Bonebrake, Paul Skare, Yair Amir
International Symposium on Reliable Distributed Systems, Vienna, Austria, September 2022, pp. 134-144
Data-Centric Analysis of Compound Threats to Critical Infrastructure Control Systems
Sahiti Bommareddy, Benjamin Gilby, Maher Khan, Imes Chiu, Mathaios Panteli, John W. van de Lindt, Yair Amir, Amy Babay
Workshop on Data-Centric Dependability and Security (co-located with IEEE/IFIP DSN), Baltimore, USA, 2022
Teaching
I had the great pleasure and opportunity to serve as a Teaching Assistant and provide special help for the following courses:
Introduction to Machine Learning (JHU EN.601.475/675):
This course serves as a survey introduction to the field of machine learning. The course will cover major classes of problems in machine learning: supervised, semi-supervised, and unsupervised learning, prediction problems (regression and classification), graphical models, dimension reduction, clustering, missing data, reinforcement learning, and causal machine learning. Fall 2024
Advanced Distributed Systems (JHU EN.601.717):
The course is managed as a few discussion groups, each focused on a selected research topic. Each group investigates far-reaching ideas, and designs and implements a useful semester-long project related to the topic. Spring 2022
Distributed Systems (JHU EN.601.417/617):
The course teaches how to design and implement efficient tools, protocols, and systems in a distributed environment. Fall 2021, Fall 2019
Software for Resilient Communities (JHU EN.601.310):
This is a project-based course focusing on the design and implementation of practical software systems. Students will work in small teams to design and develop useful open-source software products that support our communities. Students will be paired with community partners and will aim to develop software that can be used after the course ends to solve real problems facing those partners today. Spring 2021
Intermediate Programming (JHU EN.601.220):
Programming in C and C++. Fall 2020
Introduction to AI (Duke University, CS270):
The course covers algorithms and representations used in artificial intelligence, including an introduction to and implementation of algorithms for search, planning, decision theory, logic, Bayesian networks, robotics, and machine learning. Spring 2019
Professional Experience
Specialized in application load analysis, performance evaluation, network performance analysis and optimization.
2013 - 2016 @ Aktrix Technologies: Co-Founder and Software Engineer
Led the performance engineering team on multiple client projects.
Ensured application performance met the Service-Level Agreement (SLA) by identifying and resolving performance bottlenecks.
Initiated and led development of an automated performance monitoring system enabling machine learning-based analysis, which reduced the cost and time spent on root cause analysis (RCA) of performance degradation and bottleneck identification by 2x to 5x per issue.
2011 - 2012 @ Deloitte: Performance Engineering Analyst
Reduced application transaction latency to bring it within the Service-Level Agreement (SLA) window by identifying performance bottlenecks.
Received the Applause Award at Deloitte in recognition of my application performance optimization and Root Cause Analysis (RCA) work, which drew on traffic profiling and CPU and memory utilization analysis.
Ensured guaranteed performance (backed by SLAs) in geo-distributed systems.