Towards MapReduce for Desktop Cloud Computing

Time: Tuesday, September 24, 2013, 10:00-11:30 a.m.
Venue: Conference Room 446

Abstract
MapReduce is an emerging programming model for data-intensive applications, proposed by Google, that has attracted a lot of attention recently. MapReduce borrows ideas from functional programming: the programmer defines Map and Reduce tasks to process a large set of distributed data. In this presentation, we show an implementation of the MapReduce programming model.
We present the architecture of the prototype, which is based on BitDew, a middleware for large-scale data management on Desktop Cloud. Dr. Haiwu HE, one of its two co-authors, developed the large-scale distributed data management software BitDew (www.bitdew.net). BitDew provides automatic data management in a distributed computing environment, multi-protocol file transfer, and a transparent data scheduling mechanism for data distribution. For developers, BitDew provides large-scale data management for computation tasks, and its P2P transfer protocols give Desktop Cloud better scalability. As of February 2013, it had been downloaded more than 4,600 times.
We also present a performance evaluation of the prototype against both micro-benchmarks and a real MapReduce application. The scalability test shows that we achieve linear speedup on the classic Word Count benchmark. Several scenarios involving lagging hosts and host crashes demonstrate that the prototype can cope with an experimental context similar to the real-world Internet.
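
For readers unfamiliar with the model, the Map and Reduce tasks mentioned above can be illustrated with the Word Count benchmark. The following is only a minimal single-process sketch in plain Python, not the BitDew-based prototype or its API; the function names (map_task, shuffle, reduce_task, word_count) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(chunks):
    """Run Map over every chunk, group by word, then Reduce each group."""
    pairs = [pair for chunk in chunks for pair in map_task(chunk)]
    return dict(reduce_task(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Each string stands in for a data chunk held by a different host (assumption).
    chunks = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(chunks))
```

In a distributed setting, each call to map_task and reduce_task would run on a different host, and the shuffle step would involve transferring data between hosts; here everything runs locally to keep the example self-contained.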

About the Speaker
Dr. Haiwu HE received his M.Sc. and Ph.D. degrees in computing science from the University of Sciences and Technologies of Lille (Lille I), France, in 2002 and 2005, respectively. He was a postdoctoral researcher at INRIA Saclay, France, in 2007. Currently, he is a research engineer at ENS-Lyon in Lyon, France. He has published about 20 refereed journal and conference papers. He was a ChunHui Scholar of the Ministry of Education of China in 2013. He is also the president of SFSA (Sino-French Software Association). His research and development interests cover HPC, Big Data, cloud computing, scientific computing, and Desktop Grid.