<!DOCTYPE html>
<html lang="en">
<head>
<!-- Basic Page Needs
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<meta charset="utf-8">
<title>The Blaze Ecosystem</title>
<meta name="description" content="The Blaze ecosystem is a set of libraries that help users store, describe, query and process data.">
<meta name="author" content="Blaze Developers">
<!-- Mobile Specific Metas
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- FONT
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<link href='//fonts.googleapis.com/css?family=Raleway:100,200,300,400,600' rel='stylesheet' type='text/css'>
<link href='//fonts.googleapis.com/css?family=Droid+Serif:400,700' rel='stylesheet' type='text/css'>
<!-- CSS
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<link rel="stylesheet" href="./theme/css/normalize.css">
<link rel="stylesheet" href="./theme/css/skeleton.css">
<link rel="stylesheet" href="./theme/css/custom.css">
<link rel="stylesheet" href="./theme/css/pygments.css">
<!-- Scripts
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<!-- Favicon
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<link rel="icon" type="image/png" href="./theme/images/favicon.ico">
</head>
<body class="code-snippets-visible">
<div class="navbar-spacer"></div>
<nav class="navbar">
<div class="container">
<ul class="navbar-list">
<li class="navbar-item"><a class="navbar-link" href=".">Home</a></li>
<li class="navbar-item"><a class="navbar-link" href="./pages/overview">Overview</a></li>
<li class="navbar-item"><a class="navbar-link" href="./pages/projects">Projects</a></li>
<li class="navbar-item"><a class="navbar-link" href="./pages/talks">Talks</a></li>
<!--li class="navbar-item"><a class="navbar-link" href="./pages/examples">Examples</a></li-->
<!--li class="navbar-item"><a class="navbar-link" href="./pages/team">Team</a></li-->
<li class="navbar-item"><a class="navbar-link" href="./archives">Blog</a></li>
</ul>
<div class="sponsor">
<small> Sponsored by:</small>
<a href="https://www.continuum.io/"><img src="./images/cio_logo.png" class="cio-icon"></a>
</div>
</div>
</nav>
<!-- Primary Page Layout
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
<div class="container">
<section class="site-header">
<div><img class="header-logo" src="./images/blaze.png"><h2 class="title">The Blaze Ecosystem</h2></div>
</section>
<section class="home-section">
<div class="row">
<p>The Blaze ecosystem is a set of libraries that help users store, describe,
query and process data. It is composed of the following core projects:</p>
</div>
<div class="row">
<ul>
<li class="talk-header"><a href="http://blaze.readthedocs.org/en/latest/index.html">Blaze</a>: An interface to query data on different storage systems</li>
<li class="talk-header"><a href="http://dask.readthedocs.org/en/latest/">Dask</a>: Parallel computing through task scheduling and blocked algorithms</li>
<li class="talk-header"><a href="http://datashape.readthedocs.org/en/latest/">Datashape</a>: A data description language</li>
<li class="talk-header"><a href="https://github.com/libdynd/dynd-python">DyND</a>: A C++ library for dynamic, multidimensional arrays</li>
<li class="talk-header"><a href="http://odo.readthedocs.org/en/latest/">Odo</a>: Data migration between different storage systems</li>
</ul>
</div>
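<div class="row">
<p>As a rough illustration of the "task scheduling and blocked algorithms" idea mentioned in the Dask entry, here is a toy sketch, not Dask's own scheduler. It assumes a graph written as a Python dictionary that maps each key to either a literal value or a task tuple <code>(function, *arguments)</code>, where arguments naming other keys are dependencies. The graph sums a list in two blocks, and the hypothetical <code>get</code> helper evaluates it serially by recursing through dependencies. Real Dask graphs often use tuple keys for chunks and are run by parallel schedulers.</p>
<pre><code class="python">from operator import add


def get(graph, key, cache=None):
    """Toy serial scheduler: compute `key` after computing its dependencies."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # Arguments that name other keys in the graph are computed first.
        values = [get(graph, a, cache) if isinstance(a, str) and a in graph else a
                  for a in args]
        result = func(*values)
    else:
        # Anything else is treated as a literal value.
        result = task
    cache[key] = result
    return result


# A blocked sum: split the data into two chunks, sum each chunk, then combine.
graph = {
    "chunk-0": list(range(0, 5)),
    "chunk-1": list(range(5, 10)),
    "sum-0": (sum, "chunk-0"),
    "sum-1": (sum, "chunk-1"),
    "total": (add, "sum-0", "sum-1"),
}

print(get(graph, "total"))  # 45
</code></pre>
</div>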
</section>
<section class="home-section">
<h3 class="home-header">Recent blog posts</h3>
<article class="home-post">
<header>
<time datetime="2016-02-17T00:00:00+00:00" title="2016-02-17T00:00:00+00:00" pubdate style="float:right" class="blogpost-date">Wed 17 February 2016</time>
<br>
<h5><a href="./blog/2016/02/17/dask-distributed-1/">Introducing Dask Distributed</a></h5>
<h6>by Matthew Rocklin</h6>
</header>
<div class="article_content">
<p>We analyze GitHub data on a cluster using Dask.</p>
<div>
<a class="button button-primary button-read" href="./blog/2016/02/17/dask-distributed-1/">Read</a>
</div>
</div>
<div class="meta">
<div>
<a class="label-default" href="./tag/dask.html">Dask</a>
<a class="label-default" href="./tag/distributed-computing.html">distributed computing</a>
</div>
</div>
</article>
<div class="separator"></div>
<article class="home-post">
<header>
<time datetime="2015-11-13T00:00:00+00:00" title="2015-11-13T00:00:00+00:00" pubdate style="float:right" class="blogpost-date">Fri 13 November 2015</time>
<br>
<h5><a href="./blog/2015/11/13/distributed-array/">Distributed Array Experiment</a></h5>
<h6>by Matthew Rocklin</h6>
</header>
<div class="article_content">
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Distributed-Arrays">Distributed Arrays<a class="anchor-link" href="#Distributed-Arrays">¶</a></h2><p>We use <a href="http://dask.pydata.org/en/latest/array.html"><code>dask.array</code></a>, a small cluster on EC2, and <a href="http://distributed.readthedocs.org/en/latest/"><code>distributed</code></a>…</p>
</div>
</div>
</div>
<div>
<a class="button button-primary button-read" href="./blog/2015/11/13/distributed-array/">Read</a>
</div>
</div>
<div class="meta">
<div>
<a class="label-default" href="./tag/dask.html">dask</a>
<a class="label-default" href="./tag/distributed.html">distributed</a>
</div>
</div>
</article>
<div class="separator"></div>
<article class="home-post">
<header>
<time datetime="2015-10-28T00:00:00+00:00" title="2015-10-28T00:00:00+00:00" pubdate style="float:right" class="blogpost-date">Wed 28 October 2015</time>
<br>
<h5><a href="./blog/2015/10/28/distributed-hdfs/">PyData on HDFS without Java</a></h5>
<h6>by Matthew Rocklin</h6>
</header>
<div class="article_content">
<p>We use snakebite and distributed to run Pandas on CSV data in HDFS.</p>
<div>
<a class="button button-primary button-read" href="./blog/2015/10/28/distributed-hdfs/">Read</a>
</div>
</div>
<div class="meta">
<div>
<a class="label-default" href="./tag/hdfs.html">hdfs</a>
<a class="label-default" href="./tag/snakebite.html">snakebite</a>
<a class="label-default" href="./tag/distributed.html">distributed</a>
<a class="label-default" href="./tag/pandas.html">pandas</a>
</div>
</div>
</article>
<div class="separator"></div>
<article class="home-post">
<header>
<time datetime="2015-10-27T00:00:00+00:00" title="2015-10-27T00:00:00+00:00" pubdate style="float:right" class="blogpost-date">Tue 27 October 2015</time>
<br>
<h5><a href="./blog/2015/10/27/distributed-ad-hoc/">Ad-hoc Distributed Computation</a></h5>
<h6>by Matthew Rocklin</h6>
</header>
<div class="article_content">
<p>Ad-hoc distributed computations with a <code>concurrent.futures</code> interface.</p>
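<p>For readers unfamiliar with the <code>concurrent.futures</code> interface named in this post, the sketch below uses the standard library's <code>ThreadPoolExecutor</code> on a single machine, not the distributed executor from the post: <code>submit</code> returns a future immediately, <code>result</code> blocks until the value is ready, and <code>map</code> applies a function across inputs. The functions <code>inc</code> and <code>add</code> are placeholders chosen for illustration.</p>
<pre><code class="python">from concurrent.futures import ThreadPoolExecutor


def inc(x):
    return x + 1


def add(x, y):
    return x + y


with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() schedules a call and immediately returns a Future.
    a = executor.submit(inc, 1)
    b = executor.submit(inc, 2)
    # result() blocks until the underlying call has finished.
    total = executor.submit(add, a.result(), b.result())
    print(total.result())  # 5

    # map() runs a function over many inputs and yields results in input order.
    squares = list(executor.map(lambda v: v * v, range(5)))
    print(squares)  # [0, 1, 4, 9, 16]
</code></pre>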
<div>
<a class="button button-primary button-read" href="./blog/2015/10/27/distributed-ad-hoc/">Read</a>
</div>
</div>
<div class="meta">
<div>
<a class="label-default" href="./tag/distributed.html">distributed</a>
</div>
</div>
</article>
<div class="separator"></div>
<article class="home-post">
<header>
<time datetime="2015-10-19T00:00:00+00:00" title="2015-10-19T00:00:00+00:00" pubdate style="float:right" class="blogpost-date">Mon 19 October 2015</time>
<br>
<h5><a href="./blog/2015/10/19/dask-learn/">Pipelines and Reuse with dask</a></h5>
<h6>by Matthew Rocklin</h6>
</header>
<div class="article_content">
<img class="blog-image" src="./images/dasklearn.png">
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><strong>tl;dr: We use <a href="http://dask.pydata.org/en/latest/">dask</a> to accelerate parameter searches over machine learning pipelines by naming consistently.</strong></p>
</div>
</div>
</div>
<div>
<a class="button button-primary button-read" href="./blog/2015/10/19/dask-learn/">Read</a>
</div>
</div>
<div class="meta">
<div>
<a class="label-default" href="./tag/dask.html">dask</a>
<a class="label-default" href="./tag/sklearn.html">sklearn</a>
<a class="label-default" href="./tag/dasklearn.html">dasklearn</a>
</div>
</div>
</article>
<div class="separator"></div>
<article class="home-post">
<header>
<time datetime="2015-09-16T00:00:00+00:00" title="2015-09-16T00:00:00+00:00" pubdate style="float:right" class="blogpost-date">Wed 16 September 2015</time>
<br>
<h5><a href="./blog/2015/09/16/reddit-impala/">Analyzing 1.7 Billion Reddit Comments with Blaze and Impala</a></h5>
<h6>by Daniel Rodriguez and Kristopher Overholt</h6>
</header>
<div class="article_content">
<img class="blog-image" src="./images/reddit-impala.png">
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><a href="http://blaze.pydata.org">Blaze</a> is a Python library and interface to query data on different storage systems. Blaze works by translating a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze gives Python users a familiar interface to query data living in other data storage systems such as SQL databases, NoSQL data stores, Spark, Hive, Impala, and raw data files such as CSV, JSON, and HDF5. <a href="https://hive.apache.org/">Hive</a>…</p>
</div>
</div>
</div>
<div>
<a class="button button-primary button-read" href="./blog/2015/09/16/reddit-impala/">Read</a>
</div>
</div>
<div class="meta">
<div>
<a class="label-default" href="./tag/blaze.html">blaze</a>
<a class="label-default" href="./tag/impala.html">impala</a>
<a class="label-default" href="./tag/hive.html">hive</a>
<a class="label-default" href="./tag/reddit.html">reddit</a>
</div>
</div>
</article>
<div class="separator"></div>
<article class="home-post">
<header>
<time datetime="2015-09-08T00:00:00+00:00" title="2015-09-08T00:00:00+00:00" pubdate style="float:right" class="blogpost-date">Tue 08 September 2015</time>
<br>
<h5><a href="./blog/2015/09/08/reddit-comments/">Analyzing Reddit Comments with Dask and Castra</a></h5>
<h6>by Jim Crist</h6>
</header>
<div class="article_content">
<img class="blog-image" src="./images/reddit-dask-castra.png">
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The scientific Python ecosystem is great for doing data analysis. Packages like NumPy and Pandas provide an excellent interface for doing complicated computations on datasets. With only a few lines of code, one can load some data into a Pandas DataFrame, run some analysis, and generate a plot of the results. However, this workflow starts to falter when working with data that's larger than the RAM on your computer. At this point people often move their workflow from a Python-based one into some other larger system like Spark or Hadoop. These are great at what they do, but for small problems they are <a href="http://research.microsoft.com/pubs/179615/msrtr-2013-2.pdf">a bit overkill</a>…</p>
</div>
</div>
</div>
<div>
<a class="button button-primary button-read" href="./blog/2015/09/08/reddit-comments/">Read</a>
</div>
</div>
<div class="meta">
<div>
<a class="label-default" href="./tag/dask.html">dask</a>
<a class="label-default" href="./tag/castra.html">castra</a>
<a class="label-default" href="./tag/reddit.html">reddit</a>
</div>
</div>
</article>
<div class="separator"></div>
</section>
<br>
<section class="home-section">
<h3 class="home-header">Talks and Tutorials</h3>
<div class="row">
<ul>
<li><p class="talk-header"><a href="./pages/talks/ep2015-blaze/">Scale your data, not your process. Welcome to the Blaze ecosystem.</a>
<br><b>EuroPython 2015</b>, Christine Doig</p> </li>
<li><p class="talk-header"><a href="./pages/talks/pygotham2015-dask/">Going Parallel and Larger-than-memory with Graphs</a>
<br><b>PyGotham 2015</b>, Blake Griffith</p> </li>
<li><p class="talk-header"><a href="./pages/talks/scipy2015-dask/">Dask: Out-of-Core NumPy and Pandas through Task Scheduling</a>
<br><b>SciPy 2015</b>, James Crist</p> </li>
</ul>
</div>
<div class="row">
<a class="button button-primary button-more" href="./pages/talks">More</a>
</div>
</section>
</div>
<br>
<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-67320551-1', 'auto');
ga('send', 'pageview');
</script>
<!-- End Document
–––––––––––––––––––––––––––––––––––––––––––––––––– -->
</body>
</html>