EC Project 257859

Co-funded by the European Union

TUB aims to process a 700-million-interaction dataset for ROBUST

TU Berlin (TUB) is continuing its work on distributed link prediction and collaborative filtering algorithms. At the heart of these approaches is a pairwise similarity comparison of the columns of an enormous sparse matrix.
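To give a feel for what such a pairwise column comparison involves, here is a minimal single-machine sketch in Python. It is not TUB's implementation: the function name `column_cosine_similarities`, the choice of cosine similarity, and the row-as-dictionary input format are all assumptions made for illustration. It processes the matrix one row at a time and accumulates a partial dot product for every pair of columns that share a nonzero entry in that row, which is a common way to organise this computation so that it could be split across machines.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def column_cosine_similarities(rows):
    """Cosine similarity for every pair of columns that co-occur in a row.

    rows: iterable of dicts mapping column index -> nonzero value,
          one dict per row of the sparse matrix (hypothetical input format).
    Returns a dict mapping (col_a, col_b) with col_a < col_b to a similarity.
    """
    dot = defaultdict(float)      # partial dot products per column pair
    sq_norm = defaultdict(float)  # squared Euclidean norm per column

    for row in rows:
        for col, val in row.items():
            sq_norm[col] += val * val
        # Only columns that share a row can have a nonzero dot product,
        # so each row contributes just its own pairs of nonzero entries.
        for (a, va), (b, vb) in combinations(sorted(row.items()), 2):
            dot[(a, b)] += va * vb

    return {
        (a, b): d / sqrt(sq_norm[a] * sq_norm[b])
        for (a, b), d in dot.items()
    }


# Toy example: 4 rows (e.g. users) and 3 columns (e.g. items).
toy_rows = [
    {0: 1.0, 1: 1.0},
    {0: 1.0, 2: 1.0},
    {1: 1.0, 2: 1.0},
    {0: 1.0, 1: 1.0},
]
similarities = column_cosine_similarities(toy_rows)
```

Because each row is handled independently, the per-row pair emission could serve as a map step and the summation as a reduce step. The cost grows with the square of the number of nonzero entries per row, which is why very active users make datasets of this size hard to process.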

TUB is running numerous experiments on its new Hadoop/Stratosphere cluster, which consists of 6 machines, each with two 8-core AMD Opteron CPUs, 32 GB of memory, and four 1 TB disk drives. The datasets of the use-case partners SAP and IBM, which consist of graphs representing community interactions, could both already be processed in less than an hour. As a first benchmark, TUB is currently working on processing one of the largest freely available rating datasets: a Yahoo Webscope dataset that consists of 700 million ratings given by 1.8 million users to 136 thousand songs. We will keep you updated on the final results once we've accomplished this!
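For a rough sense of scale, the figures quoted above can be turned into a few back-of-the-envelope numbers. The short Python sketch below uses only the rounded values from this post, so its results are approximations, and the helper names `cluster_totals` and `rating_matrix_stats` are hypothetical.

```python
def cluster_totals(machines=6, cpus_per_machine=2, cores_per_cpu=8,
                   memory_gb_per_machine=32, disks_per_machine=4,
                   tb_per_disk=1):
    """Aggregate resources of the cluster described in the post."""
    return {
        "cores": machines * cpus_per_machine * cores_per_cpu,
        "memory_gb": machines * memory_gb_per_machine,
        "disk_tb": machines * disks_per_machine * tb_per_disk,
    }


def rating_matrix_stats(ratings=700_000_000, users=1_800_000, songs=136_000):
    """Density and average row/column fill of the user-by-song rating matrix."""
    return {
        # Fraction of all possible (user, song) cells that hold a rating.
        "density": ratings / (users * songs),
        "avg_ratings_per_user": ratings / users,
        "avg_ratings_per_song": ratings / songs,
    }


totals = cluster_totals()        # 96 cores, 192 GB memory, 24 TB disk
stats = rating_matrix_stats()    # density is roughly 0.3%
```

With these rounded inputs, fewer than 3 in 1,000 user-song cells hold a rating, while the average user has rated several hundred songs. This combination of extreme sparsity and long rows is what makes the column comparison demanding.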
