Papers
2015 – 2016
2015 – Facebook – One Trillion Edges: Graph Processing at Facebook-Scale.
2013 – 2014
2014 – Stanford – Mining of Massive Datasets.
2013 – AMPLab – Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
2013 – AMPLab – MLbase: A Distributed Machine-learning System.
2013 – AMPLab – Shark: SQL and Rich Analytics at Scale.
2013 – AMPLab – GraphX: A Resilient Distributed Graph System on Spark.
2013 – Google – HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.
2013 – Microsoft – Scalable Progressive Analytics on Big Data in the Cloud.
2013 – Metamarkets – Druid: A Real-time Analytical Data Store.
2013 – Google – Online, Asynchronous Schema Change in F1.
2013 – Google – F1: A Distributed SQL Database That Scales.
2013 – Google – MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
2013 – Facebook – Scuba: Diving into Data at Facebook.
2013 – Facebook – Unicorn: A System for Searching the Social Graph.
2013 – Facebook – Scaling Memcache at Facebook.
2011 – 2012
2012 – Twitter – The Unified Logging Infrastructure for Data Analytics at Twitter.
2012 – AMPLab – Blink and It’s Done: Interactive Queries on Very Large Data.
2012 – AMPLab – Fast and Interactive Analytics over Hadoop Data with Spark.
2012 – AMPLab – Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
2012 – Microsoft – Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
2012 – Microsoft – Paxos Made Parallel.
2012 – AMPLab – BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.
2012 – Google – Processing a Trillion Cells per Mouse Click.
2012 – Google – Spanner: Google’s Globally-Distributed Database.
2011 – AMPLab – Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters.
2011 – AMPLab – Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
2011 – Google – Megastore: Providing Scalable, Highly Available Storage for Interactive Services.
2001 – 2010
2010 – Facebook – Finding a needle in Haystack: Facebook’s photo storage.
2010 – AMPLab – Spark: Cluster Computing with Working Sets.
2010 – Google – Storage Architecture and Challenges.
2010 – Google – Pregel: A System for Large-Scale Graph Processing.
2010 – Google – Large-scale Incremental Processing Using Distributed Transactions and Notifications (the basis of Percolator and Caffeine).
2010 – Google – Dremel: Interactive Analysis of Web-Scale Datasets.
2010 – Yahoo – S4: Distributed Stream Computing Platform.
2009 – HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.