\section{Knowledge Transfer and Prototype Systems}
We plan to run this research project as a lab, similar to the AMP Lab. We call the new lab {\em RISE (Real-time Intelligent Secure Evaluation)}.
{\bf Knowledge Transfer:} Over the years, the systems group at Berkeley has had great success transferring its technology by distributing open-source software (\eg BSD and Postgres) and engaging with companies to promote radical new technologies (e.g., RAID and RISC). More recently, the AMP and ASPIRE labs have distributed and transferred a wide range of technologies, including Apache Spark~\cite{spark}, Apache Mesos~\cite{mesos}, and RISC-V~\cite{risc-v}. Apache Spark is the most popular big data framework today, with over 1,000 contributors and over 60,000 meetup members worldwide. IBM has called it ``the Most Significant Open Source Project of the Next Decade'' and ``the OS of Big Data'', and last year announced a \$300 million investment to develop Spark technologies. Spark is included in virtually every big data software distribution, and it is used in production by thousands of organizations. Apache Mesos is the first operating system for datacenters, and it is now in use at major companies, including Twitter, Apple, and GE, to manage thousand-node datacenters. RISC-V, the first open source instruction set architecture, is already disrupting the CPU and custom ASIC markets. Google, HP Labs, and Oracle, to name just a few, are members of the RISC-V Foundation, and many companies have already started to develop chips based on RISC-V. Finally, Berkeley students and faculty have founded companies to support all of these open source projects.
The RISE Lab will continue this tradition by developing a suite of software tools and algorithms that will dramatically lower the barrier for organizations and individuals alike to build real-time decision and predictive analytics applications. We hope this platform will have a significant impact on industry, and will lead to the creation of new companies and the development of new applications that were not possible before.
%\cite{late-osdi,delay-scheduling-eurosys,mesos-hotcloud,mesos-techreport,drf-techreport,spark-hotcloud,perf-prediction-icde,scads-cidr,piql-socc,replay-sosp,log-mining-sosp,online-log-mining-icdm,fingerprinting-eurosys,spikes-socc,xtrace-nsdi,ml-security-sigmetrics,ml-security-leet,policy-aware-sigcomm,ml-security-raid,stat-debugging-icml,ethane,of-ccr,nox}
{\bf Industry Participation:} As with the AMP Lab (which had over 30 industrial partners over its five years), industrial participation in the new project will be crucial to its success. Only through close and constant interactions with industry can we learn about the problems that arise in large production systems. We are in the early stages of involving industry in this project, but the early signs are very promising. Starting in 2017, we expect to raise \$1.5M to \$2M from these partners. This funding will provide the additional resources necessary to run the RISE Lab (which we expect will be 2/3 the size of the AMP Lab), and it will be critical to cementing the commitment of our industrial partners to a long-term collaborative relationship.
\section{Management and Collaboration}
Creating an effective collaborative climate among an interdisciplinary team of researchers is a daunting task. However, most of the team has worked together closely over the past five years as part of Berkeley's AMP and ASPIRE Labs. These brought a diverse group of researchers (with expertise ranging from machine learning to networking to operating systems to security to computer architecture) together to explore the new field of datacenter computation. Almost all publications from the AMP Lab involved interdisciplinary teams, with strong input from industry experts.
The AMP and ASPIRE Labs used a variety of measures to facilitate collaboration, and we expect to leverage these same practices for the RISE Lab. To foster collaboration, we will have:
\begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt]
\item Weekly all-hands meetings for talks on recent progress and to discuss future plans.
\item Weekly meetings for each of the smaller projects (we expect roughly a dozen such meetings weekly).
\item Monthly three-hour faculty-only meetings over dinner, for in-depth technical discussions and long-term planning.
\item Biannual three-day retreats, where we present results to outside experts
%(particularly from industry)
and discuss their feedback.
\end{itemize}
In addition, the core members of the team will work in the laboratory developed for the AMP Lab, where both students and faculty share a large open space. This facilitates ongoing communication among all members of the project and increases faculty-student interactions.
%The lead investigator of the project is Ion Stoica, and he will be responsible for overseeing the effort. He will be assisted by Mike Jordan and Joe Hellerstein, who will help Stoics ensure that all areas of the project are moving forward. They will be assisted by two administrative staff.
\section{Broader Impacts}
As with the AMP Lab, we will inspire students from under-represented groups to pursue rewarding careers in computer and information science and engineering through the following steps:
{\bf Leading by Example:} Of the 10 members of the team, one is a woman, and two others belong to under-represented minorities. While we cannot predict which students will work on the next project, we note that out of the 50 postdocs and students currently sitting in the AMP Lab (excluding those belonging to a faculty member not participating in this effort), 14 are women. Moreover, the PIs have been leading the EECS departmental efforts for diversity hiring, which led to the hiring of three women faculty over the past two years (one of them, Raluca Ada Popa, is participating in this project as a co-PI).
{\bf Broadening Training and Academic Impact:} Over the past few years, we have trained over 40K students on Spark and thousands on Mesos and Tachyon. We have done this via annual boot camps, training classes at big data conferences (such as Strata, Hadoop Summit, and Spark Summit), and MOOCs. Apache Spark is currently used in data science classes at dozens of universities across the globe. The AMP Lab has also produced leaders in their fields in both academia and industry. The AMP Lab alone has produced students who are now faculty members at top universities, including MIT (3 faculty members), Stanford University (3 faculty members), UIUC, and University of Michigan. In addition, Matei Zaharia won the ACM Doctoral Dissertation Award and John Duchi received an Honorable Mention for the same award, both in 2014. Similarly, several students have gone on to start successful companies to support the open source software we developed, including Databricks (Spark), Mesosphere (Mesos), and Alluxio (Tachyon).
{\bf Leveraging Local Programs:} This project will leverage two ongoing campus diversity programs, BFOIT and SUPERB. The Berkeley Foundation for Opportunities in Information Technology (BFOIT)~\cite{bfoit} supports historically underrepresented ethnic minorities and women in their desire to become leaders in the fields of computer science, engineering, and information technology.
%The intent is to provide youth with knowledge, resources, practical programming skills and guidance in their pursuit of higher education and production of technology.
Berkeley's College of Engineering has been a leader in offering underrepresented undergraduates opportunities to work on research projects through the Summer Undergraduate Program in Engineering Research at Berkeley (SUPERB)~\cite{superb}.
{\bf Broadening Research Participation:} The SRDS system will put a set of sophisticated data processing tools in the hands of all interested researchers. Researchers will be able to use SRDS directly to explore new applications, or to contribute their own algorithmic modifications. In addition, for researchers who cannot afford to pay for cloud-based computational services, we will make our own computational resources available for them to run SRDS. We have already done this in the case of Spark. One of the startups we founded, Databricks, now provides a hosted version of Spark that allows everyone to learn Spark for free.
{\bf Broadening the Information Economy:} If our project is successful, it will increase people's participation in the data economy. This will provide employment opportunities for any individual with Internet access, and give those individuals opportunities for more expansive roles as they gain experience and expertise.
%. And more than just providing a job, this will provide entry-level positions in the information economy that require little training, but yet hold the potential for more expansive roles as they gain experience and expertise.