
Automating the Deployment of a Secure, Multi-user HIPAA Enabled Spark Cluster using Sahara
Michael Le, Jayaram Radhakrishnan, Daniel Dean, Shu Tao
OpenStack Summit, Austin, 2016

Analytics on Sensitive Data

Personal sensitive data:
• Health information, e.g., lab reports, medication records, hospital bills, etc.
• Financial information, e.g., income, investment portfolios, bank accounts, etc.

Regulations:
• Numerous governmental regulations ensure data privacy, integrity, and access control, e.g., HIPAA/HITECH, PCI, etc.
• They require generation of audit trails: "who has access to what data, and when?"

(Diagram: Secure Hadoop YARN.)

Challenging to Deploy a Secure Analytics Platform
• Mapping regulations to security mechanisms is not always straightforward
• Security mechanisms have lots of "things" to configure:
  • User accounts, access control lists, keys, certificates, key/certificate databases, security zones, etc.
• Deployment must be fast and repeatable

Approach: use Sahara to expose security building blocks and automate deployment
• Prototype with Spark on YARN
• Focus on HIPAA

Outline
• Introduction
• HIPAA and Spark platform
• Deploying with Sahara
  • Automating security enablement
  • Supporting multi-user clusters
  • Supporting both Spark and Hadoop jobs through the Sahara API
  • Supporting deployment on SoftLayer
• Summary, lessons, and improvements

Health Insurance Portability and Accountability Act (HIPAA)
• What data is protected?
  • Any Protected Health Information (PHI), e.g., medications, diseases, health-related measurements, etc.
• Who does this apply to?
  • Entities that access or store PHI, e.g., health care clearinghouses, health care providers, and business associates
• HIPAA has two rules: the Privacy Rule and the Security Rule
  • The Privacy Rule protects the privacy of PHI, access rights, and uses of PHI
  • The Security Rule operationalizes the Privacy Rule for electronic PHI (e-PHI)
• In general, with respect to PHI, covered entities must:
  • Maintain confidentiality, integrity, and availability
  • Protect against anticipated threats to security and integrity
  • Protect against anticipated impermissible uses or disclosures
  • Ensure compliance by the workforce
• Perform continuous risk analysis and management (extended by HITECH)
  • Mechanisms to log and audit access to PHI (Breach Notification Rule)
  • Continuous evaluation and maintenance of appropriate security protections (Enforcement Rule)

HIPAA Safeguards for the Security Rule
• Administrative Safeguards
• Physical Safeguards
• Technical Safeguards
  • Access controls: ensure only authorized persons can access PHI
  • Audit controls: record and examine access and other activities in information systems containing PHI
  • Integrity controls: ensure and confirm PHI is not improperly altered or destroyed
  • Transmission security: prevent unauthorized access to PHI being transmitted over the network

Secure Spark Service
• Goal: allow Spark to be pluggable into a HIPAA-enabled platform
• Assumptions:
  • Two types of users: trusted admins and Spark users
  • Multi-user (e.g., multiple doctors in Hospital A running analytics on multiple patients)
  • One YARN cluster per customer (e.g., Hospital A, Institution B)
  • YARN clusters created by different customers are not necessarily separated by VLANs
  • External data sources should provide drivers to encrypt data ingestion and outputs

Secure Spark Service – Design

Spark cluster:
• Each user has a separate account on the cluster
• Spark executors run inside secure containers created under the credentials of the authenticated user
• Data files of different users are isolated from each other
• Compute resources are isolated among containers, with min/max CPU and RAM commitments

Security among Spark components (see the configuration sketch below):
• Communication among Spark executors is encrypted using SSL, and data shuffling via SASL
• Spark components authenticate to each other using shared secrets distributed by YARN
• Kerberos credentials for HDFS access are renewed automatically

Secure access to Spark:
• Access to YARN components must go through Kerberos; this prevents unauthorized VM joins, impersonation, resource access, and job submissions

(Diagram: Spark cluster of secure containers running Spark executors on YARN + HDFS (ResourceManager, NodeManager, NameNode, DataNode), a Kerberos realm, and Sahara as the Spark service manager exposing cluster and job management APIs.)
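To make the component-level security above concrete, the following is a minimal sketch of the Spark properties involved, rendered into spark-defaults.conf the way a provisioning step might do it. The property names are standard Spark 1.x settings; the keystore/truststore paths, password values, and output path are placeholders, not values from the deployment.

    # Sketch: append the security settings discussed above to spark-defaults.conf.
    SPARK_SECURITY_CONF = {
        # Shared-secret authentication among Spark components
        # (YARN distributes the secret in yarn-cluster mode)
        "spark.authenticate": "true",
        # SASL encryption for shuffle (block transfer) traffic
        "spark.authenticate.enableSaslEncryption": "true",
        # SSL for communication among Spark components
        "spark.ssl.enabled": "true",
        "spark.ssl.keyStore": "/etc/security/ssl/node-keystore.jks",         # placeholder
        "spark.ssl.keyStorePassword": "KEYSTORE_PASSWORD",                   # placeholder
        "spark.ssl.trustStore": "/etc/security/ssl/cluster-truststore.jks",  # placeholder
        "spark.ssl.trustStorePassword": "TRUSTSTORE_PASSWORD",               # placeholder
    }

    def write_spark_defaults(path="/etc/spark/conf/spark-defaults.conf"):
        # Append each property as "key value", the spark-defaults.conf format
        with open(path, "a") as conf:
            for key, value in sorted(SPARK_SECURITY_CONF.items()):
                conf.write("%s %s\n" % (key, value))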

Mapping HIPAA to the Secure Spark Cluster

HIPAA Technical Safeguard: Access controls
• Mechanisms: Kerberos authentication, YARN secure containers, HDFS file permissions, HDFS security zones
• Description: prevents unauthorized access by principals, machines, and platform/framework components; analytics jobs (and generated outputs) of different users are isolated from each other

HIPAA Technical Safeguard: Audit controls
• Mechanisms: record job submissions and file accesses by Spark apps (Spark extension); IBM Guardium for log collection and audit report generation
• Description: records activities on the cluster and sends them to IBM Guardium for storage, sorting, and generation of audit reports

HIPAA Technical Safeguard: Integrity controls
• Mechanisms: access control and data encryption
• Description: ensures PHI is not improperly altered or modified

HIPAA Technical Safeguard: Transmission security
• Mechanisms: encryption within the cluster (RPC calls and control messages; shuffle and web-console interaction); encryption from the data source (relies on SSL)
• Description: protection of data at rest and in motion

Outline
• Introduction
• HIPAA and Spark platform
• Deploying with Sahara
  • Automating security enablement
  • Supporting multi-user clusters
  • Supporting both Spark and Hadoop jobs through the Sahara API
  • Supporting deployment on SoftLayer
• Summary, lessons, and improvements

Sahara Provisioning Flow
0. Create a base VM image, upload it to Glance, and register it with Sahara
1. Define VM (node) and cluster templates through the Sahara interface
2. Launch the cluster: the Sahara provisioning engine generates a Heat template
3. Heat issues commands to OpenStack components (Nova, Neutron, Glance)
4. The VMs are instantiated
5. Sahara SSHes to the VMs to start daemons and configure the nodes

A hedged REST sketch of step 2 follows below.
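For illustration: once the image is registered and the templates exist, step 2 is a single REST call to Sahara. A minimal sketch against the Sahara v1.1 API; the endpoint, token, and all IDs below are placeholders:

    import json
    import requests

    SAHARA = "http://controller:8386/v1.1/PROJECT_ID"  # placeholder endpoint
    HEADERS = {
        "X-Auth-Token": "KEYSTONE_TOKEN",  # placeholder token
        "Content-Type": "application/json",
    }

    # Launch a cluster from a previously defined cluster template (step 2)
    body = {
        "name": "secure-spark-cluster",
        "plugin_name": "vanilla",
        "hadoop_version": "2.7.1",
        "cluster_template_id": "CLUSTER_TEMPLATE_ID",  # from step 1
        "default_image_id": "REGISTERED_IMAGE_ID",     # from step 0
        "user_keypair_id": "admin-keypair",
    }
    resp = requests.post(SAHARA + "/clusters", headers=HEADERS, data=json.dumps(body))
    resp.raise_for_status()
    print(resp.json()["cluster"]["id"])  # poll this cluster until it is Active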

Sahara on SoftLayer Portal
(Two screenshots of the portal UI.)

Sahara on SoftLayer – Creating Node/Cluster Templates
(Screenshot: template creation form with security options (HDFS encryption, secure mode, Kerberos server location, Kerberos KMS) and services such as iPython and the Spark Job Server.)

Sahara on SoftLayer – Launching Cluster
(Screenshot: cluster launch form taking IaaS credentials and the base VM image to use.)

PoC – HIPAA Enabled Spark Service
(Diagram: Keystone and Horizon front Sahara, which provisions the analytics cluster as SoftLayer CCIs through Heat with a SoftLayer resource extension; SoftLayer Swift provides object storage.)

Sahara Provisioning Flow – SoftLayer
0. Create a base VM image and upload it to SoftLayer
1. Define VM and cluster templates through the Sahara interface
2. Launch the cluster: the Sahara provisioning engine generates a Heat template
3-4. The SL::Compute::VM Heat resource plugin calls the SoftLayer Python client bindings
5. The SoftLayer VMs are instantiated
6. Sahara SSHes to the VMs to start daemons and configure the nodes

Automating Secure Spark/YARN Deployment
1. Create a VM image with the base binaries
2. Create VM node and cluster templates
3. Instantiate VMs using the IaaS API
4. Configure security options for YARN and Spark
5. Create Kerberos credentials and SSL certificates
6. Create user accounts and per-user local/HDFS folders

(Diagram: Sahara drives a master VM, worker VMs, and a Kerberos server.)

Preparing the VM Image for Deployment
VM image for the Spark cluster contains:
• Hadoop YARN 2.7.1
• Spark 1.5.2
• Kerberos
• Spark Job Server
• Jupyter (iPython)

Currently, creating the VM image for SoftLayer is a manual process.
Future: extend Disk Image Builder and automate uploading to SoftLayer.

Authentication using Kerberos
1. Ticket request to the Authentication Server
2. User lookup in the key manager (KDC)
3. TGS session key and TGT returned to the user or client process
4. Service request to the Ticket Granting Server
5. Service lookup
6. Service session ticket returned
7. Access the YARN/Hadoop service with the service ticket

Users and YARN/Hadoop processes:
• Authenticated with keytabs (encrypted keys associated with principal passwords)
• Keytabs are created while installing YARN/Hadoop and stored on the local file system

A hedged keytab-creation sketch follows below.
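Principal and keytab creation can be scripted against kadmin.local on the KDC. A minimal sketch with an illustrative realm and host name (both placeholders), assuming it runs as root on the Kerberos server:

    import subprocess

    REALM = "EXAMPLE.COM"        # placeholder realm
    HOST = "master.example.com"  # placeholder FQDN of the node

    def kadmin(query):
        # Run one kadmin.local query locally on the KDC
        subprocess.check_call(["kadmin.local", "-q", query])

    # Create a randomly keyed principal per service/host pair, then export
    # its keytab for the service to authenticate with
    for service in ("yarn", "hdfs", "HTTP"):
        principal = "%s/%s@%s" % (service, HOST, REALM)
        kadmin("addprinc -randkey %s" % principal)
        kadmin("ktadd -k /etc/security/keytabs/%s.keytab %s" % (service, principal))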

Kerberos Server Configuration
(Diagram: Sahara alongside a Kerberos server; each Spark cluster, 1 through N, comprises a master VM, worker VMs, and a Kerberos server.)

Security Configuration Operations

Create cluster, per node:
• Configure for Kerberos:
  • Fix up the Kerberos configuration: /etc/krb5.conf
  • Create principals: host and each service (RM, NN, DN, KMS)
  • Create keytabs for each service/host pair
• Configure secure containers:
  • Fix up the Hadoop configuration
  • Set correct permissions on Hadoop files and folders

Create cluster, per user:
• Configure for Kerberos:
  • Create a principal for each user in the cluster
  • Create keytabs for each principal
  • Copy keytabs to the node that interacts with YARN
• Create the user account on all nodes
• Create the user's HDFS directory
• Generate an HDFS encryption key and upload it to the KMS

Create cluster, per cluster:
• Configure SSL, on the 'master' node:
  • Create an SSL keystore for each node by obtaining a certificate from a CA (self-signed for testing)
  • Create a single SSL truststore for the entire cluster
  • Copy each keystore to its respective node
  • Copy the truststore to all nodes

Scale out:
• Configure Kerberos: perform the "create cluster, per node" operations; copy user keytabs to the node if it needs to interact with YARN
• Configure SSL, per node being added: create an SSL keystore for the node and copy it there

Scale in:
• Configure Kerberos, per node being removed: remove all principals associated with the node

A keytool sketch of the per-cluster SSL step follows below.
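The per-cluster SSL step can be scripted with the JDK's keytool. A minimal sketch using self-signed certificates (for testing only, as noted above); the hostnames, paths, and password are placeholders:

    import subprocess

    NODES = ["master.example.com", "worker1.example.com"]  # placeholder hostnames
    STOREPASS = "STORE_PASSWORD"                           # placeholder password
    TRUSTSTORE = "/tmp/cluster-truststore.jks"             # one truststore per cluster

    for node in NODES:
        keystore = "/tmp/%s-keystore.jks" % node
        cert = "/tmp/%s.cer" % node
        # Generate a self-signed key pair for this node
        subprocess.check_call([
            "keytool", "-genkeypair", "-alias", node,
            "-keyalg", "RSA", "-keysize", "2048",
            "-dname", "CN=%s" % node,
            "-keystore", keystore,
            "-storepass", STOREPASS, "-keypass", STOREPASS,
        ])
        # Export the node's certificate ...
        subprocess.check_call([
            "keytool", "-exportcert", "-alias", node, "-file", cert,
            "-keystore", keystore, "-storepass", STOREPASS,
        ])
        # ... and import it into the single cluster-wide truststore
        subprocess.check_call([
            "keytool", "-importcert", "-noprompt", "-alias", node, "-file", cert,
            "-keystore", TRUSTSTORE, "-storepass", STOREPASS,
        ])
    # Each keystore is then copied to its node and the truststore to all nodes.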

Multi-user Cluster Requirements
• A single 'spark' user is not sufficient for privacy, since an analytics job could potentially access other users' data
• Expose a new API to make on-boarding of new users easier:
  • Create a user account on each node of the cluster
  • Create per-user local directories and HDFS secure directories (see the sketch below)
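Per-user HDFS secure directories map naturally onto HDFS encryption zones backed by the KMS. A minimal sketch of that on-boarding step, with a placeholder user name and directory layout; hadoop key and hdfs crypto are the stock Hadoop CLIs:

    import subprocess

    def create_user_secure_dir(user):
        key = "%s-key" % user      # per-user zone key (placeholder naming)
        path = "/user/%s" % user   # placeholder layout: one zone per user
        # Create the user's encryption key in the KMS
        subprocess.check_call(["hadoop", "key", "create", key])
        # Create the directory and turn it into an encryption zone
        subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", path])
        subprocess.check_call(["hdfs", "crypto", "-createZone",
                               "-keyName", key, "-path", path])
        # Restrict the directory to its owner
        subprocess.check_call(["hdfs", "dfs", "-chown", "%s:%s" % (user, user), path])
        subprocess.check_call(["hdfs", "dfs", "-chmod", "700", path])

    create_user_secure_dir("alice")  # illustrative user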

Multi-user Cluster Support
• Extend the REST API in v10.py: /clusters/add_user/<user_name>
• This operation creates a new user account on the specified cluster:
  • Create a new Unix user account on each host
  • Create the HDFS directory and secure zone key
  • Create and distribute the Kerberos keytab
• Specific to our use case, /clusters/add_user/<user_name> also:
  • Starts a new iPython server process for the user
  • Returns the IP of the node running iPython
• Challenge: Sahara hardcodes the 'hadoop' user!

A hedged sketch of this API extension follows below.
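The extension follows the pattern of the existing handlers in sahara/api/v10.py. A minimal sketch, assuming Sahara's REST helper decorators; add_cluster_user is a hypothetical backend helper (not an upstream Sahara function) that performs the account, keytab, and HDFS steps listed above:

    # Sketch of an addition to sahara/api/v10.py
    from sahara.service import api
    from sahara.utils import api as u

    rest = u.Rest('v10', __name__)  # as defined in the existing module

    @rest.post('/clusters/add_user/<user_name>')
    def cluster_add_user(user_name, data):
        # Create the Unix account on every node, the Kerberos principal and
        # keytab, and the per-user HDFS secure directory; for our use case,
        # also start an iPython server process for the user and return the
        # IP of the node running it.
        result = api.add_cluster_user(data['cluster_id'], user_name)  # hypothetical
        return u.render(ipython_ip=result['ipython_ip'])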

Submit Spark Jobs on YARN Through Sahara
• Extend Sahara to allow Spark and Hadoop jobs to be submitted to a vanilla Hadoop cluster
• No changes to the REST API in Sahara

Implementation: detect the Spark job type and construct the spark-submit string accordingly, based on the plugin "engine" name
• On cluster creation: configure spark_env.sh to point to Hadoop's configuration files
• Modify plugins/vanilla/vx_x_x/versionhandler.py:
  • Return the Spark EDP engine when the job type is Spark
• Modify run_job() in the Spark EDP engine.py:
  • Check the cluster type (plugin name, e.g., vanilla or stand-alone Spark, or even Mesos!)
  • Craft the spark-submit string to match the platform (see the sketch below)
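A minimal sketch of the run_job() change described above; the plugin-name check mirrors the slide, while the exact option strings are illustrative rather than Sahara's own:

    def build_spark_submit(plugin_name, job_class, app_jar, app_args):
        # Craft the spark-submit string based on the cluster's plugin name
        cmd = ["spark-submit", "--class", job_class]
        if plugin_name == "vanilla":
            # Vanilla Hadoop cluster: run on YARN, so executors land in
            # secure YARN containers
            cmd += ["--master", "yarn-cluster"]
        else:
            # Stand-alone Spark master (placeholder host and port)
            cmd += ["--master", "spark://master.example.com:7077"]
        return " ".join(cmd + [app_jar] + list(app_args))

    # e.g., build_spark_submit("vanilla", "org.example.App", "app.jar", ["in", "out"])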

Supporting Deployment on SoftLayer
• Modify Sahara at Heat template creation time
• Create a new Heat plugin, SL_compute:
  • Creates a new resource type, SL::Compute::VM
  • Communicates with SoftLayer's (SL) REST API to create CloudLayer Computing Instances (CCIs)
  • Translates OpenStack VM properties to their SL equivalents
• Generate the SL Heat template
• Add new fields in Sahara to accept and use SoftLayer credentials, data centers, object stores, etc.
• Add new Heat template resource files

A hedged skeleton of the resource plugin follows below.
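Heat resource plugins subclass heat.engine.resource.Resource and register themselves via resource_mapping(). A skeleton sketch of what SL::Compute::VM might look like; the property names follow the generated template on the next slide, the schema is abbreviated, and the SoftLayer client calls are a sketch rather than the plugin's exact code:

    import SoftLayer  # SoftLayer Python client bindings

    from heat.engine import properties
    from heat.engine import resource

    class SLComputeVM(resource.Resource):
        # Abbreviated schema; the real plugin also covers disk_type,
        # nic_speed, VLANs, ssh_keys, user_data, etc.
        properties_schema = {
            'host_name': properties.Schema(properties.Schema.STRING, required=True),
            'domain': properties.Schema(properties.Schema.STRING, required=True),
            'cpus': properties.Schema(properties.Schema.INTEGER),
            'memory': properties.Schema(properties.Schema.INTEGER),
            'datacenter': properties.Schema(properties.Schema.STRING),
            'sl_user': properties.Schema(properties.Schema.STRING),
            'sl_apikey': properties.Schema(properties.Schema.STRING),
        }

        def handle_create(self):
            # Translate OpenStack-style properties into a SoftLayer CCI
            # order and submit it over SoftLayer's REST API
            client = SoftLayer.create_client_from_env(
                username=self.properties['sl_user'],
                api_key=self.properties['sl_apikey'])
            instance = SoftLayer.VSManager(client).create_instance(
                hostname=self.properties['host_name'],
                domain=self.properties['domain'],
                cpus=self.properties['cpus'],
                memory=self.properties['memory'],
                datacenter=self.properties['datacenter'],
                hourly=True)
            self.resource_id_set(str(instance['id']))

    def resource_mapping():
        # Makes the type available in templates as SL::Compute::VM
        return {'SL::Compute::VM': SLComputeVM}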

Example Sahara Heat Template Output

    {
      "AWSTemplateFormatVersion" : "2010-09-09",
      "Description" : "Data Processing Cluster by Sahara on SoftLayer",
      "Resources" : {
        "test-spark1-6-0-1-allwkrbsec-001" : {
          "Type" : "SL::Compute::VM",
          "Properties" : {
            "host_name" : "test-spark1-6-0-1-allwkrbsec-001",
            "domain" : "whc.softlayer.com",
            "cpus" : "2",
            "memory" : "4096",
            "hourly" : "True",
            "disk_type" : "SAN",
            "datacenter" : "hou02",
            "nic_speed" : "1000",
            "image_id" : "ccca6a66-f8ce-4ff1-a647-a42bacfxdadsf",
            "public_vlan" : "Spark-FE-1",
            "private_vlan" : "Spark-BE-1",
            "ssh_keys" : [],
            "sl_user" : "user1",
            "sl_apikey" : "xxxxxxxyyyyyzzzzzz",
            "user_data" : { "Fn::Join" : ["\n", [
              "#!/bin/bash",
              "echo \"ssh-rsa AAAAB3NzaC1yc2EAAAADAQAB... Generated by Sahara\" >> /root/.ssh/authorized_keys",
              ""]]}
          }
        }
      },
      "Outputs" : {
        "test-spark1-6-0-1-allwkrbsec-001_ip" : {
          "Description" : "IP address of the instance",
          "Value" : { "Fn::GetAtt" : ["test-spark1-6-0-1-allwkrbsec-001", "PublicIp"] }
        },
        "test-spark1-6-0-1-allwkrbsec-001_priv_ip" : {
          "Description" : "Private IP address of the instance",
          "Value" : { "Fn::GetAtt" : ["test-spark1-6-0-1-allwkrbsec-001", "PrivateIp"] }
        },
        "test-spark1-6-0-1-allwkrbsec-001_password" : {
          "Description" : "login password for the instance",
          "Value" : { "Fn::GetAtt" : ["test-spark1-6-0-1-allwkrbsec-001", "passwords"] }
        }
      }
    }

Summary
• Showed how HIPAA requirements map to the security mechanisms needed in an analytics cluster
• Extended Sahara to provide an easy way to deploy a HIPAA-enabled Spark cluster on the cloud
  • Automated security configuration (authentication, encryption, access control)
  • Provided the ability to add new users to a running system
• Extended Sahara to submit both Spark and Hadoop jobs to a single vanilla YARN cluster
• Extended Sahara to make use of the SoftLayer Heat extension

Lessons Learned
• It is not necessary to enable "all" security mechanisms to satisfy regulations
  • Enabling all security features drastically reduces performance
  • It is not always clear what the minimum "security" needed for compliance is
• Componentization of security mechanisms is a useful thing!
  • Can we break down security features even further, to allow fine-grained selection and comparison of security mechanisms?
  • Benefit: allows easy, fast testing of different combinations of security components and assessment of trade-offs (performance vs. additional security)
• Providing a few "desirable" cluster templates is very useful
  • Users are daunted by myriad options (even with a GUI)

Further Improvements
• Sahara:
  • Further componentize security features and expose them as cluster creation options, e.g., no authentication but all data on disk encrypted, a configurable amount of logs collected, etc.
  • Add new user account management mechanisms: delete users, change permissions, etc.
  • Remove the assumption about the hardcoded 'hadoop' user
  • API to retrieve data from an analytics run
  • Better support for other clouds like SoftLayer
• Security mechanisms:
  • Confirm integrity of data, e.g., automated checksums and verification
  • Manage keytab files better

Thank you! Questions?