HIMSS Cloud Computing Forum
Follow-up Thoughts from Joey Liaw
A few weeks ago, I had the honor of co-presenting with Jess Kahn at the HIMSS Cloud Computing Forum - a convening of leaders in health technology around the challenges and opportunities of using the cloud in health care. Jess and I shared some of our collective learnings from building the next-generation Medicaid data warehouse on Amazon Web Services.
Audience members and HIMSS attendees asked us great questions about our choice to use AWS and its implications for security and deployment. These are important questions, so I wanted to expand my responses and share them here.
Caveat: I am not a Federal employee; I work at Nuna, subcontracted to QSSI to do this work. These opinions are my own and not the opinions of CMS, QSSI or Nuna.
Is consumer cloud really secure enough to store my org's PHI?
The Centers for Medicare & Medicaid Services (CMS) has very stringent security compliance requirements, the Acceptable Risk Safeguards (ARS), which are closely modeled on NIST 800-53. Amazon publishes a handy compliance matrix showing which of its services are covered by the compliance standards you need for your purposes.
The fact that Medicaid has decided to host its data on Amazon Web Services can hopefully set an example for other organizations considering migrating from on-prem to cloud deployments. The performance is there, the security is there, the compliance is there, the total cost of ownership savings are there, the enterprise support is there.
Amazon hosts the internal infrastructure for many thousands of corporations and organizations, including the Department of Defense and the Department of Health and Human Services. If Amazon ever were to have a breach, it would be devastating to all of its customers and shake the foundations of belief in Amazon as a secure cloud provider. All of these organizations are able to focus on their core competencies by leveraging the huge investment Amazon has made in physical and virtual security. The dollars and person-hours that Amazon and other consumer cloud providers spend on physical security and cybersecurity will almost certainly far exceed what a healthcare services company would ever be willing to budget for an on-prem datacenter.
Note: No consumer cloud provider will automatically be secure and/or compliant for any purpose at the push of a button. General cloud computing infrastructure is flexible enough to meet a huge range of performance/security/convenience tradeoffs. No matter which cloud provider you use, significant work and investment is required to make a production system secure, compliant, and perform to spec.
How did we select a cloud provider?
To meet the challenges of the project, we needed a cloud service provider with the following properties:
- support automated network+machine configuration and deployment
- spin up massive (100-CPU) on-demand compute clusters within minutes
- offer unlimited, durable, low-cost, encrypted-at-rest data storage
- offer managed database services that could scale to big data needs
- sign a HIPAA Business Associate Agreement (BAA) to share legal responsibility for PHI protection
- satisfy FedRAMP compliance needs for HHS/CMS
Amazon Web Services was the clear choice, from a compliance and feature set standpoint, when we started the project in early 2015. In today's competitive consumer cloud landscape, every organization should do their own due diligence on which cloud offering from Amazon, Google, or Microsoft (or even a multicloud strategy) is the best fit for each project.
In more detail:
There should be a programmatic API for everything, including defining the network topology, provisioning machines and resources, and preparing images. Otherwise, it's impossible to automate your deployment process and verify its correctness.
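As an illustrative sketch (the spec format and helper function here are hypothetical, not part of any AWS SDK), an infrastructure definition can be ordinary data that code review and automated tests inspect before anything is provisioned:

```python
# Hypothetical sketch: a declarative topology spec that a deploy tool would
# translate into cloud API calls. Because it is plain data, it can be
# code-reviewed, diffed, and verified by automated tests.

def provisioning_plan(spec):
    """Expand a topology spec into an ordered list of provisioning steps."""
    steps = [("create_vpc", spec["vpc"]["cidr"])]
    for subnet in spec["vpc"]["subnets"]:
        steps.append(("create_subnet", subnet["cidr"]))
    for sg in spec.get("security_groups", []):
        steps.append(("create_security_group", sg["name"]))
    return steps

spec = {
    "vpc": {"cidr": "10.0.0.0/16",
            "subnets": [{"cidr": "10.0.1.0/24"}, {"cidr": "10.0.2.0/24"}]},
    "security_groups": [{"name": "web"}, {"name": "db"}],
}

plan = provisioning_plan(spec)
# The VPC is created before its subnets and security groups.
assert plan[0] == ("create_vpc", "10.0.0.0/16")
```

A real deploy tool would hand each step to the provider's API, but the point is that the desired state lives in version control where it can be tested like any other code.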
It should be possible to provision dozens to hundreds of machines on demand and be charged only for the resources that you use for the time that you use them. This gives you maximum flexibility for batch or burst periods without requiring you to pay for over-provisioned machines that sit idle most of the year.
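A rough back-of-the-envelope comparison makes the economics concrete (the rates below are made-up placeholders for illustration, not real AWS pricing):

```python
# Hypothetical rates for illustration only -- not real AWS pricing.
on_demand_rate = 0.50       # $/hour per machine, charged only while running
owned_cost_per_year = 2000  # $/year per always-on, over-provisioned machine

machines = 100              # size of the burst cluster
burst_hours_per_year = 200  # cluster runs only for periodic batch jobs

# Pay-per-use: 0.50 * 100 * 200 = $10,000/year.
on_demand_annual = on_demand_rate * machines * burst_hours_per_year

# Owning idle capacity year-round: 2000 * 100 = $200,000/year.
owned_annual = owned_cost_per_year * machines
```

Under these assumptions the bursty on-demand cluster costs a twentieth of keeping equivalent capacity idle most of the year.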
There should be no limit to the amount of storage space available; it should be durable against failure, have a good disaster recovery story, and cost a reasonable amount, with you paying only for what you use.
It should be very easy for all sensitive data to be encrypted at rest, protected by HSM-backed keys the client never needs to see, using industry-standard encryption algorithms. All data and requests to cloud provider APIs and services should be encrypted in transit using industry-standard protocols and best practices. For example, the most recent ELB security policy presets are quite good, disallowing deprecated/insecure SSL/TLS protocol versions.
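As a toy sketch of what such a preset enforces (the protocol list is illustrative; consult the current ELB policy documentation for the real one), a deploy-time check can refuse any listener configuration that still allows deprecated protocols:

```python
# Deprecated/insecure protocol versions that a modern load-balancer
# security policy should disallow (illustrative list, not the ELB preset).
DISALLOWED = {"SSLv2", "SSLv3", "TLSv1", "TLSv1.1"}

def policy_violations(enabled_protocols):
    """Return the subset of enabled protocols that should be disabled."""
    return sorted(DISALLOWED & set(enabled_protocols))

# A policy still allowing SSLv3 fails the check; TLS 1.2+ alone passes.
assert policy_violations(["SSLv3", "TLSv1.2"]) == ["SSLv3"]
assert policy_violations(["TLSv1.2", "TLSv1.3"]) == []
```

Running a check like this in CI turns "use good presets" from a guideline into an enforced property of every deployment.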
Having cloud-managed database/cluster services such as RDS, Redshift, and EMR allows the engineering and devops team to focus on application business logic and configuration instead of database management.
Signing a HIPAA BAA to share the responsibility of protecting PHI is table stakes for even considering a cloud provider's services.
We needed a FedRAMP-compliant, HIPAA-BAA-covered cloud service provider that also satisfied the above. This eliminated most of the cloud services out there. Before starting this project, I had just spent six months getting AWS approved for healthcare.gov, so I knew it would be possible.
AWS Services Used
- Virtual Private Cloud: software-defined network topology which provides a secure basic boundary for your cluster
- Elastic Compute Cloud (EC2) key features:
- Security Groups: more effective than a traditional enterprise firewall; no SPOF, no performance bottleneck, and human-readable firewall rules based on named roles and broken down into small chunks. Used correctly, security groups are vastly easier to secure, configure, understand, and maintain than a monolithic configuration block of CIDR-based rules, complex triggers, and hierarchical priorities. Of course, you can still use CIDR rules if you really need them.
- Dedicated Tenancy: increased VM cost but guarantees single-tenant hardware, mitigating certain classes of VM hypervisor attacks from other accounts
- Simple Storage Service (S3): Unlimited durable data storage with low TCO, disaster recovery made simple with cross-region replication, and encryption at rest easily applied and policy-enforced on all objects.
- Elastic Map Reduce (EMR): managed service for creating massive Hadoop MapReduce or Spark clusters on demand.
- Redshift: instant-on, SQL-queryable data warehouse scaling up to 2 PB. Good for many things, but not a silver bullet for everything. For example, Redshift does not use indexes like a traditional RDBMS, so careful table design with respect to distribution and sort keys, and an understanding of the underlying technology hidden by the SQL abstraction, is even more important.
- Not used: GovCloud, which only adds a few ITAR-related controls that were not needed for this project (and most likely are not needed for yours either), and has the disadvantage of slightly lower instance availability and delayed access to newer services.
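The role-based security group rules mentioned above can be sketched with a toy model (hypothetical data structures, not the EC2 API) to show why they read so much better than CIDR blocks:

```python
# Hypothetical role-based firewall rules: each rule names roles, not IP ranges.
rules = [
    {"from": "load-balancer", "to": "web", "port": 443},
    {"from": "web", "to": "db", "port": 5432},
]

def allowed(src_role, dst_role, port, ruleset):
    """Check whether traffic is permitted between two named roles."""
    return any(r["from"] == src_role and r["to"] == dst_role and r["port"] == port
               for r in ruleset)

# Reviewers can read intent directly: web servers may reach the database,
# but nothing permits the load balancer to talk to the database.
assert allowed("web", "db", 5432, rules)
assert not allowed("load-balancer", "db", 5432, rules)
```

In EC2, the same effect comes from security group rules that reference other security groups as their source, so membership in a role, not an IP range, grants access.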
Philosophy of deployment
- Since everything in AWS can be configured by API, devops can tie the configuration of the system to the code lifecycle process. In this way, feature changes, bug fixes, and firewall+network topology changes alike go through the same stringent code review, test automation, and certification process.
- As much as possible, avoid upgrading or applying security updates to OS's in-place or deploy code changes to long-running servers. Instead, terminate old or insecure instances, build new machine images from known clean state as part of CI process, and bring up completely new instances. Creating immutable infrastructure makes it much easier for the devops team to reason about the state of the entire cluster and raises the bar for attackers attempting to create a persistent beachhead.
- Write code that is efficient enough to make it feasible to reprocess all of the data with every code change. Keeping the generated (ETL-processed) data in lockstep with code results in a huge reduction of complexity in queries for analysts, the real end-users.
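The terminate-and-replace idea can be sketched minimally (the fleet model here is hypothetical): given the fleet's current state and a freshly built machine image, compute which instances are replaced rather than patched in place:

```python
# Hypothetical fleet state: instance id -> machine image it was booted from.
fleet = {"i-001": "ami-old", "i-002": "ami-old", "i-003": "ami-new"}

def replacement_plan(fleet, desired_image):
    """Instances not built from the desired image are terminated, never
    patched in place; the same number of fresh instances is launched."""
    terminate = sorted(i for i, img in fleet.items() if img != desired_image)
    return terminate, len(terminate)

terminate, launch = replacement_plan(fleet, "ami-new")
assert terminate == ["i-001", "i-002"]
assert launch == 2
```

Because every running instance is traceable to a known clean image, "what state is this server in?" always has a definite answer.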
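One simple way to keep generated data in lockstep with code (a sketch; the bucket name and path scheme are invented for illustration) is to derive every output location from the code version, so each code change writes a distinct, fully reprocessed dataset:

```python
import hashlib

def output_prefix(dataset, code_version):
    """Derive an output path from the code version, so a code change
    produces a fresh, fully reprocessed copy instead of mutating old output."""
    tag = hashlib.sha256(code_version.encode()).hexdigest()[:8]
    return f"s3://warehouse/{dataset}/code={tag}/"

a = output_prefix("claims", "v1.4.0")
b = output_prefix("claims", "v1.4.1")
# Different code versions never collide, so analysts always query data
# that matches the code that produced it.
assert a != b
```

Old prefixes can be garbage-collected once no one depends on them, and queries never mix outputs from two different versions of the ETL logic.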
The above takes significant effort, investment in strong engineering and devops teams, and a disciplined code lifecycle process. As much as possible, we try to reduce the iteration feedback cycle between end-users and developers. Some folks call this agile. I think that labels are less useful than following good common-sense guidelines such as the USDS Playbook.
Configuration as code and heavy use of devops certainly take a different mindset than configuration through manually-processed tickets, and may require an organizational shift away from traditional system and database administrators. Good processes, and enforcement of those processes, are also vitally important, since the security of the system now relies on the rigor of the software lifecycle itself. The safety armor isn't removed so much as changed in shape; done correctly, overall protection increases thanks to automation and better processes.
On a long enough time scale, there is no such thing as perfect security. Therefore, it isn't a matter of if, but when, a breach will occur. Your organization's response to a breach will then be measured by the structural controls in place to contain the damage and the speed and accuracy of the human response to prevent catastrophic failure. Continuous integration, deployment automation, test automation, and immutable infrastructure can greatly improve the effectiveness of those controls and allow your team to concentrate on things that only humans can do, and let robots be good at being robots.
Final thought for CIOs and CISOs in the audience
If you haven't already, please schedule a Red Team exercise with a reputable security firm and close as many findings as you can. Usually, a compliance security auditor will run a Nessus and Fortify scan and check that your paperwork and procedures say you are compliant. Beyond that, an effective Red Team will spend weeks infiltrating your physical security, sprinkling USB keys in your parking lot, disappearing a few unencrypted hard drives, viciously phishing your executives' assistants, and probing for 0-days on every open port of every corporate server, laptop, HVAC system, badge scanner, and security camera connected to the internet, actually stress-testing your organization's and infrastructure's ability to prevent unauthorized access to sensitive data.
My PHI data is probably on your systems. Your family's PHI data is probably on each other's systems. Every passing day, the gap between certified/compliant security and actual security grows. Nowadays it takes a script kiddie only a few minutes to scan the entire IPv4 address space for open vulnerabilities, yet some organizations still treat hiding IP addresses as an effective form of security through obscurity. Imagine if cars kept getting faster but we relied only on safety checklists and stopped using crash test dummies. If healthcare organizations do not regularly test the security of systems that contain PHI with realistic attack simulations, then our PHI is being collectively protected only by the power of belief.
Great appreciation to Anna Fuller, Michelle Mills, Clint Talbert, and Bob Wood for review and suggestions for this post.