Machine Learning with Apache Hama

Machine Learning with Apache Hama
Tommaso Teofili, tommaso [at] apache [dot] org

About me
- ASF member having fun with:
  - Lucene / Solr
  - Hama
  - UIMA
  - Stanbol
  - … some others
- SW engineer @ Adobe R&D

Agenda
- Apache Hama and BSP
- Why machine learning on BSP
- Some examples
- Benchmarks

Apache Hama
- Bulk Synchronous Parallel computing framework on top of HDFS for massive scientific computations
- TLP since May 2012
- 0.6.0 release out soon
- Growing community

BSP supersteps
- A BSP algorithm is composed of a sequence of "supersteps"

BSP supersteps
- Each task
  - Superstep 1
    - Do some computation
    - Communicate with other tasks
    - Synchronize
  - Superstep 2
    - …
  - …
  - Superstep N
    - Do some computation
    - Communicate with other tasks
    - Synchronize
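
The compute / communicate / synchronize cycle above can be sketched in plain Java. This is only an illustration and does not use the Hama API: the class name `SuperstepSketch`, the "sum everything you received" computation, and the sequential loop that stands in for the barrier are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the BSP superstep cycle (hypothetical, not Hama code).
class SuperstepSketch {

    // Simulates `supersteps` rounds over `numTasks` tasks and returns each task's final value.
    static long[] run(int numTasks, int supersteps) {
        long[] value = new long[numTasks];
        for (int t = 0; t < numTasks; t++) {
            value[t] = t + 1; // arbitrary starting state per task
        }
        for (int step = 0; step < supersteps; step++) {
            // Communication: every task sends its current value to all tasks (itself included).
            List<List<Long>> inbox = new ArrayList<>();
            for (int t = 0; t < numTasks; t++) {
                inbox.add(new ArrayList<>());
            }
            for (int sender = 0; sender < numTasks; sender++) {
                for (int receiver = 0; receiver < numTasks; receiver++) {
                    inbox.get(receiver).add(value[sender]);
                }
            }
            // Synchronization: no task reads its inbox until all sends are done.
            // A real BSP runtime uses a barrier; here the loop ordering plays that role.
            // Computation: each task replaces its value with the sum of received messages.
            for (int t = 0; t < numTasks; t++) {
                long sum = 0;
                for (long message : inbox.get(t)) {
                    sum += message;
                }
                value[t] = sum;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(3, 2)));
    }
}
```

With three tasks starting at 1, 2 and 3, every task holds 6 after one superstep and 18 after two, because each round every task sees the same set of messages.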

Why BSP
- Simple programming model
- Superstep semantics are easy to follow
- Preserves data locality
- Improves performance
- Well suited for iterative algorithms

Apache Hama architecture
- BSP program execution flow

Apache Hama architecture

Apache Hama
- Features
  - BSP API
  - M/R-like I/O API
  - Graph API
  - Job management / monitoring
  - Checkpoint recovery
  - Local & (pseudo) distributed run modes
  - Pluggable message transfer architecture
  - YARN supported
  - Running in Apache Whirr

Apache Hama BSP API
- public abstract class BSP<K1, V1, K2, V2, M> …
  - K1, V1 are the key and value types for inputs
  - K2, V2 are the key and value types for outputs
  - M is the type of the messages used for task communication

Apache Hama BSP API
- public void bsp(BSPPeer<K1, V1, K2, V2, M> peer) throws …
- public void setup(BSPPeer<K1, V1, K2, V2, M> peer) throws …
- public void cleanup(BSPPeer<K1, V1, K2, V2, M> peer) throws …

Machine learning on BSP
- Lots (most?) of ML algorithms are inherently iterative
- Hama ML module currently includes
  - Collaborative filtering
  - Clustering
  - Gradient descent

Benchmarking architecture
[Diagram labels: Node, Hama, Solr, DBMS, Lucene, Mahout, HDFS]

Collaborative filtering
- Given user preferences on movies
- We want to find users "near" to some specific user
- So that user can "follow" them
- And/or see what they like (which he/she could like too)

Collaborative filtering BSP
- Given a specific user
- Iteratively (for each task)
  - Superstep 1*i
    - Read a new user preference row
    - Find how near that user is to the current user
      - That is, find how near their preferences are
      - Since preferences are given as vectors, we may use vector distance measures such as Euclidean or cosine distance
    - Broadcast the measure output to other peers
  - Superstep 2*i
    - Aggregate measure outputs
    - Update most relevant users
- Still to be committed (HAMA-612)

Collaborative filtering BSP
- Given user ratings about movies
  - "john" -> 0, 0, 0, 9.5, 4.5, 9.5, 8
  - "paula" -> 7, 3, 8, 2, 8.5, 0, 0
  - "jim" -> 4, 5, 0, 5, 8, 0, 1.5
  - "tom" -> 9, 4, 9, 1, 5, 0, 8
  - "timothy" -> 7, 3, 5.5, 0, 9.5, 6.5, 0
- We ask for the 2 nearest users to "paula" and we get "timothy" and "tom"
  - User recommendation
- We can extract movies highly rated by "timothy" and "tom" that "paula" didn't see
  - Item recommendation

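The nearest-user lookup on the ratings from the slide above can be reproduced with a small single-process sketch. It assumes Euclidean distance (the slides also mention cosine) and is plain Java rather than the HAMA-612 implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process sketch of "find the k users nearest to a given user".
class NearestUsersSketch {

    // Euclidean distance between two rating vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the names of the k users whose ratings are closest to the target's.
    static List<String> nearest(Map<String, double[]> ratings, String target, int k) {
        double[] targetRow = ratings.get(target);
        List<String> others = new ArrayList<>(ratings.keySet());
        others.remove(target);
        others.sort(Comparator.comparingDouble(
                (String user) -> euclidean(ratings.get(user), targetRow)));
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }

    // The rating rows shown on the slide.
    static Map<String, double[]> slideRatings() {
        Map<String, double[]> ratings = new LinkedHashMap<>();
        ratings.put("john", new double[] {0, 0, 0, 9.5, 4.5, 9.5, 8});
        ratings.put("paula", new double[] {7, 3, 8, 2, 8.5, 0, 0});
        ratings.put("jim", new double[] {4, 5, 0, 5, 8, 0, 1.5});
        ratings.put("tom", new double[] {9, 4, 9, 1, 5, 0, 8});
        ratings.put("timothy", new double[] {7, 3, 5.5, 0, 9.5, 6.5, 0});
        return ratings;
    }

    public static void main(String[] args) {
        System.out.println(nearest(slideRatings(), "paula", 2)); // prints [timothy, tom]
    }
}
```

In the BSP version each task would compute these distances for its own rows and broadcast them, and the sorting would happen in the aggregation superstep.
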
Benchmarks
- Fairly simple algorithm
- Highly iterative
- Comparing to Apache Mahout
  - Behaves better than ALS-WR
  - Behaves similarly to RecommenderJob and ItemSimilarityJob

K-Means clustering
- We have a bunch of data (e.g. documents)
- We want to group those docs in k homogeneous clusters
- Iteratively for each cluster
  - Calculate new cluster center
  - Add doc nearest to new center to the cluster

K-Means clustering

K-Means clustering BSP
- Iteratively
  - Superstep 1*i: assignment phase
    - Read vectors splits
    - Sum up temporary centers with assigned vectors
    - Broadcast sum and ingested vectors count
  - Superstep 2*i: update phase
    - Calculate the total sum over all received messages and average
    - Replace old centers with new centers and check for convergence
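
One assignment/update round as described above might look like the following plain-Java sketch. It is not the Hama ML code: 2-dimensional points, the `{sumX, sumY, count}` message layout, and all names are assumptions chosen to keep the example short.

```java
import java.util.Arrays;

// Hypothetical sketch of one k-means round split into the two BSP phases.
class KMeansStepSketch {

    // Assignment phase for one task: per-center sums and counts over its split.
    // Layout of the returned "message": partial[c] = {sumX, sumY, count}.
    static double[][] partialSums(double[][] split, double[][] centers) {
        double[][] partial = new double[centers.length][3];
        for (double[] point : split) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centers.length; c++) {
                double dx = point[0] - centers[c][0];
                double dy = point[1] - centers[c][1];
                double dist = dx * dx + dy * dy; // squared distance is enough for comparison
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            partial[best][0] += point[0];
            partial[best][1] += point[1];
            partial[best][2] += 1;
        }
        return partial;
    }

    // Update phase: total the messages received from all tasks and average them.
    static double[][] updateCenters(double[][][] messages, double[][] oldCenters) {
        double[][] next = new double[oldCenters.length][2];
        for (int c = 0; c < oldCenters.length; c++) {
            double sumX = 0, sumY = 0, count = 0;
            for (double[][] message : messages) {
                sumX += message[c][0];
                sumY += message[c][1];
                count += message[c][2];
            }
            // Keep the old center when no vector was assigned to it.
            next[c] = count == 0
                    ? oldCenters[c].clone()
                    : new double[] {sumX / count, sumY / count};
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {10, 10}};
        double[][] splitA = {{0, 0}, {1, 0}};
        double[][] splitB = {{10, 10}, {11, 10}, {0, 1}};
        double[][][] messages = {partialSums(splitA, centers), partialSums(splitB, centers)};
        System.out.println(Arrays.deepToString(updateCenters(messages, centers)));
    }
}
```

Broadcasting sums and counts instead of raw vectors keeps the messages small: each task sends one fixed-size record per center regardless of how many vectors its split holds.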

Benchmarks
- One rack (16 nodes, 256 cores) cluster
- 10G network
- On average faster than Mahout's implementation

Gradient descent
- Optimization algorithm
- Finds a (local) minimum of some function
- Used for
  - solving linear systems
  - solving non-linear systems
  - machine learning tasks
    - linear regression
    - logistic regression
    - neural networks backpropagation
    - …

Gradient descent
- Minimize a given (cost) function
- Give the function a starting point (set of parameters)
- Iteratively change parameters in order to minimize the function
- Stop at the (local) minimum
- There's some math but intuitively:
  - evaluate derivatives at a given point in order to choose where to "go" next
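
The "where to go next" intuition corresponds to the usual gradient descent update rule. The notation is an assumption, since the slides do not name these symbols: θ_j is a parameter, J is the cost function and α is the step size (learning rate).

```latex
\theta_j \leftarrow \theta_j - \alpha \, \frac{\partial J(\theta)}{\partial \theta_j}
\qquad \text{for every parameter } \theta_j
```

Each parameter moves against the sign of its partial derivative, so a small enough α lowers the cost at every iteration until the derivatives vanish at a (local) minimum.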

Gradient descent BSP
- Iteratively
  - Superstep 1*i
    - each task calculates and broadcasts portions of the cost function with the current parameters
  - Superstep 2*i
    - aggregate and update cost function
    - check the aggregated cost and iterations count
      - cost should always decrease
  - Superstep 3*i
    - each task calculates and broadcasts portions of (partial) derivatives
  - Superstep 4*i
    - aggregate and update parameters
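
The four supersteps can be mimicked in one process by treating each data split as a task. The sketch below assumes one-feature linear regression with a squared-error cost and rows laid out as `{x, y}`; these choices, the class name and the method names are illustrative and are not taken from the Hama ML module.

```java
// Hypothetical sketch: batch gradient descent with per-split partial sums.
class GradientDescentSketch {

    // Superstep 1: one task's portion of the cost (sum of squared errors over its split).
    static double partialCost(double[][] split, double[] theta) {
        double sum = 0;
        for (double[] row : split) {
            double error = theta[0] + theta[1] * row[0] - row[1];
            sum += error * error;
        }
        return sum;
    }

    // Superstep 3: one task's portion of the partial derivatives (not yet divided by m).
    static double[] partialGradient(double[][] split, double[] theta) {
        double g0 = 0, g1 = 0;
        for (double[] row : split) {
            double error = theta[0] + theta[1] * row[0] - row[1];
            g0 += error;
            g1 += error * row[0];
        }
        return new double[] {g0, g1};
    }

    static double[] fit(double[][][] splits, double alpha, int maxIterations) {
        int m = 0;
        for (double[][] split : splits) {
            m += split.length;
        }
        double[] theta = {0, 0};
        double previousCost = Double.MAX_VALUE;
        for (int i = 0; i < maxIterations; i++) {
            // Superstep 2: aggregate the cost and check it (it should always decrease).
            double cost = 0;
            for (double[][] split : splits) {
                cost += partialCost(split, theta);
            }
            cost /= 2.0 * m;
            if (cost > previousCost) {
                break;
            }
            previousCost = cost;
            // Superstep 4: aggregate the derivatives and update the parameters.
            double g0 = 0, g1 = 0;
            for (double[][] split : splits) {
                double[] g = partialGradient(split, theta);
                g0 += g[0];
                g1 += g[1];
            }
            theta[0] -= alpha * g0 / m;
            theta[1] -= alpha * g1 / m;
        }
        return theta;
    }

    public static void main(String[] args) {
        // Two splits sampled from y = 1 + 2x.
        double[][][] splits = {{{0, 1}, {1, 3}}, {{2, 5}, {3, 7}}};
        double[] theta = fit(splits, 0.1, 5000);
        System.out.println(theta[0] + " " + theta[1]);
    }
}
```

On this toy data the fitted parameters approach intercept 1 and slope 2. The stopping test only looks at whether the cost went up; a real implementation would probably also stop when the decrease falls below a threshold.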

Gradient descent BSP
- Simplistic example
  - Linear regression
  - Given real estate market dataset
  - Estimate new houses' prices given known houses' size, geographic region and prices
  - Expected output: actual parameters for the (linear) prediction function

Gradient descent BSP
- Generate a different model for each region
- House item vectors
  - price -> size
  - 150k -> 80
- 2-dimensional space
- ~1.3M vectors dataset

Gradient descent BSP
- Dataset and model fit

Gradient descent BSP
- Cost checking

Gradient descent BSP
- Classification
  - Logistic regression with gradient descent
  - Real estate market dataset
  - We want to find which estate listings belong to agencies
    - To avoid buying from them
- Same algorithm
  - With different cost function and features
- Existing items are tagged or not as "belonging to agency"
- Create vectors from items' text
- Sample vector
  - 1 -> 1 3 0 0 5 3 4 1
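
"Same algorithm with a different cost function" can be made concrete with a sketch of the logistic hypothesis and its cross-entropy cost for a single labeled item. The sigmoid and cross-entropy are the standard choices for logistic regression; whether the Hama module uses exactly this form is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the logistic regression hypothesis and per-item cost.
class LogisticCostSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Predicted probability that the item carries the label 1 ("belongs to agency").
    static double predict(double[] theta, double[] features) {
        double z = 0;
        for (int j = 0; j < features.length; j++) {
            z += theta[j] * features[j];
        }
        return sigmoid(z);
    }

    // Cross-entropy cost of one item; label must be 0 or 1.
    static double cost(double[] theta, double[] features, int label) {
        double h = predict(theta, features);
        return label == 1 ? -Math.log(h) : -Math.log(1.0 - h);
    }

    public static void main(String[] args) {
        // Sample vector from the slide: label 1, features 1 3 0 0 5 3 4 1.
        double[] features = {1, 3, 0, 0, 5, 3, 4, 1};
        double[] theta = new double[features.length]; // all-zero starting parameters
        System.out.println(cost(theta, features, 1));
    }
}
```

With all-zero parameters the prediction is 0.5 for any item, so the cost is ln 2 (about 0.693) whatever the label; gradient descent then lowers it by adjusting the parameters.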

Gradient descent BSP
- Classification

Benchmarks
- Not directly comparable to Mahout's regression algorithms
  - Both SGD and CGD are inherently better than plain GD
- But Hama GD had on average the same performance as Mahout's SGD / CGD
- Next step is implementing SGD / CGD on top of Hama

Wrap up
- Even if the ML module is still "young" / work in progress and tools like Apache Mahout have better "coverage"
- Apache Hama can be particularly useful in certain "highly iterative" use cases
- Interesting benchmarks

Thanks!