Bryan W. Lewis
I work with a very talented group of friends at a start-up called
Paradigm4 on SciDB,
a free and open source array-oriented database.
I prefer to forage, and I enjoy many mushrooms,
wild foods, and living simply.
Everyone working on scientific computing problems should consider using R,
a wonderfully powerful and expressive system for computation and visualization.
Send electronic mail to me at:
- Check out the new http://www.htmlwidgets.org/
framework for easy integration of web technologies and R.
web pages. And you can use htmlwidgets in knitr documents.
Three.js-based 3-d visualization widgets for R and Shiny using htmlwidgets:
- I've been learning about clustering methods recently. Here is a link
to a simple hierarchical clustering implementation (<50 lines) that is
written only in R so it's easy to experiment with:
The native R hclust function in the stats package is faster,
but includes lots of Fortran code.
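For readers who would rather experiment outside R, the same idea can be sketched in a few lines of Python. This is not the R implementation mentioned above, only a naive illustration of agglomerative clustering under stated assumptions: points are tuples of floats, distance is Euclidean, and the function name `hclust` and its `linkage` argument are made up for this sketch. It repeatedly merges the two closest clusters and records the merge height.

```python
import math


def hclust(points, linkage=max):
    """Naive agglomerative clustering (illustrative sketch only).

    points  : list of equal-length tuples of floats
    linkage : max for complete linkage, min for single linkage
    Returns a list of merges (id_a, id_b, height). Original points have
    ids 0..n-1; each merge creates a new cluster id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Examine every pair of current clusters and keep the closest pair.
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                d = linkage(
                    math.dist(points[p], points[q])
                    for p in clusters[a]
                    for q in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the closest pair into a new cluster and record the height.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (5.0,), (7.0,), (20.0,)]
    for a, b, height in hclust(pts):
        print(a, b, height)
```

Recomputing every pairwise distance at each step keeps the code short but makes it slow for anything beyond small data sets. That is fine for experimenting and a poor fit for real work.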
- I wrote up a trivially simple implementation of Gene Golub's SVD subset
selection algorithm, along with examples illustrating it.
Mike and I are using it in one of our GLM implementations
(see below). But it's a cool method and deserves more attention.
Mike and I have been writing down our working notes on generalized linear
models. Still incomplete and a bit rough, but maybe interesting to somebody...
Our focus is on numerics and performance. See http://bwlewis.github.io/GLM and the
associated project https://github.com/bwlewis/GLM.
- This is cool: http://xkcd.r-forge.r-project.org/
a fun little interactive illustration of Gerschgorin's circles and Brauer's
Please feel free to fork and use the code available on Github here:
- I asked some questions about ill-posed problems and regularization
at Kent State recently. Here are the slides:
The slides include a simple R program that
applies regularization to stock returns in order to cluster stocks
by a relevance network graph.
- I gave a talk with Jake VanderPlas about SciDB at PyData 2013 NYC. Here is a link to a Wakari notebook: http://goo.gl/ovGaHS
The Redis client for R was recently updated! R package here on CRAN:
Source code here on GitHub:
And the package vignette (PDF):
- Here are some relatively recent papers I really like:
Network analysis via partial spectral factorization and Gauss quadrature
In Search of an Understandable Consensus Algorithm
A Scalable Bootstrap for Massive Data
Quadrature Rule-Based Bounds for Functions of Adjacency Matrices
Augmented Implicitly Restarted Lanczos Bidiagonalization Methods
OK, those last two are not so new, but they're super-cool.
- I gave a talk on tips and tricks for performance computing with R at the Cleveland R meet-up on Wednesday, August 7th. Here are the slides: http://goo.gl/gcPezs. Perhaps the most interesting part shows that it's pretty easy to install the commercial but freely available AMD BLAS and LAPACK libraries for R on Windows and Linux.
- I gave a talk at the Boston PyData conference (http://pydata.org/) about SciDB-Py -- Jake VanderPlas' new interface between SciDB and Python. The interface defines a numpy/scipy-like array class for Python backed by SciDB arrays. Install the package directly from GitHub with pip install git+ssh://github.com/jakevdp/scidb-py.git.
- I've just been reading Patrick Burns' book, http://www.burns-stat.com/documents/books/tao-te-programming/, and really enjoy it.
- I gave a talk on SciDB and R and Python at JSM on Sunday, August 4th. Here are the slides: http://goo.gl/A2RPkn.
- So you like Python muthaph*kkahz!?! You got it: https://github.com/bwlewis/irlbpy. This is the fastest
partial SVD and PCA routine for dense and sparse matrices available in Python.
It is restricted right now to real-valued matrices and is still under active
development. Mike Kane will present
our work in progress at the SciPy Conference next week in Austin, June 24--28.
- I get a lot of questions about using the fast truncated SVD
irlba package, especially for large problems.
So, I've started a page of miscellaneous tips here:
Whit Armstrong and I ran a seminar on high performance computing with R at the
R/Finance conference in May.
We emphasized elastic computing using 0MQ and Redis with R,
and a bit of parallel linear algebra with SciDB. Here
are the slides we used:
doRedis, a parallel back end for the R language that uses Redis and foreach.
Here is the vignette documentation:
The irlba package for
R provides a state of the art fast partial singular value decomposition. It's
suitable for very large scale problems and supports sparse and dense matrices.
To give you an idea how fast it is, one can compute a five-dimensional
principal components analysis (PCA) on the Netflix data set
(480,189 user IDs and 17,770 movies) in a few minutes on a dual-core notebook
(using R's sparse Matrix package).
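To make "partial" concrete: a partial SVD computes only a few leading singular triplets instead of the full decomposition. The sketch below is not the irlba algorithm. It swaps in plain power iteration, which is far simpler and far slower to converge, purely to show that a leading triplet can be found using nothing but matrix-vector products with A and its transpose. That same property is what lets methods of this kind work on large sparse matrices. All function names here are hypothetical, the matrix is a dense list of rows, and the code assumes A is nonzero with a gap between its two largest singular values.

```python
import math
import random


def matvec(A, x):
    """Return A x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Return A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def leading_singular_triplet(A, iters=200, seed=0):
    """Power iteration on A^T A (illustrative sketch, not irlba).

    Returns (sigma, u, v) approximating the largest singular value of A
    and its left and right singular vectors. Assumes A is nonzero.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in A[0]]
    nv = norm(v)
    v = [x / nv for x in v]
    for _ in range(iters):
        u = matvec(A, v)           # u <- A v
        nu = norm(u)
        u = [x / nu for x in u]
        v = rmatvec(A, u)          # v <- A^T u
        nv = norm(v)
        v = [x / nv for x in v]
    u = matvec(A, v)
    sigma = norm(u)                # ||A v|| estimates the top singular value
    u = [x / sigma for x in u]
    return sigma, u, v


if __name__ == "__main__":
    A = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    sigma, u, v = leading_singular_triplet(A)
    print(sigma, u, v)
```

Because the loop touches A only through `matvec` and `rmatvec`, a sparse storage format could be substituted by replacing those two functions alone.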
- My lightning talk on SciDB and R for the Boston R meetup on 22-Jan-2013: goo.gl/btioG.
- I gave a talk at JSM about R and websockets. Here it is:
And, here is a nifty application of websockets and R in quant. finance:
Here is a silly cool "chat" script for R using websockets (many web clients can share
a super basic R session):
Joe Cheng over at RStudio has taken over active development
of the package.
- Slides about the R bigmemory package, parallel linear algebra in R, and a preview of what I'm working on with R and SciDB from a recent talk at the Boston R Meetup:
- One new idea and one old idea that should be better known on the SVD and cointegration
(from a recent talk at R/Finance 2012):
- If you need to document anything, you should really consider
- A data frame promise for R that very quickly extracts subsets directly from raw delimited text files:
A native HTML 5 Websocket library for R:
I discussed some methods other than Hadoop for analyzing large data with
the New York CTO club. My notes are available here:
R is popular!
If you like R, or think you might, you should check out
http://rstudio.org. I highly recommend it.
Outlaw talk: "The Betfair Package" at
R/Finance 2011: Applied Finance with R
Betfair is the world's largest betting exchange with more than three million
global clients. The BetfaiR package implements the Betfair Sports API in the R
language, providing direct access to the Betfair sports exchange from R. All
of the Betfair Sports API functions are available, including functions for real
time market data and user account access. The package also provides a number of
high-level functions for sports betting analysis, modeling and graphics.
This was the first talk I ever gave where running the examples live would
require breaking the law.
An experimental new Rserve binary R server designed for improved functionality on Windows systems.
- Talk: "How good are Krylov methods for discrete ill-posed problems?," March 25--28 AMS meeting in Lexington, KY: http://www.ms.uky.edu/~corso/amsmaa2010/.
Here are some slides:
pvshm: A Linux filesystem that
provides a memory mapping overlay for PVFS2 or other file systems lacking
memory mapping capability.
esperr, a package for streaming event processing for R.
http://github.com/bwlewis/fls, an implementation of Kalaba-Tesfatsion flexible least squares method for R.
R4P, an R library for Processing.
is an R-language interface package to the DTN IQ Feed API.
I've been playing with Google APIs:
A Particularly Silly Dictionary
GNU/Linux utilities for IQFeed (download: iqfeedutils.tar.bz2). A set of basic utilities to get DTN IQFeed up and running in Windows-free GNU/Linux environments, as well as facilitate communication between Linux quant boxes and Windows IQFeed boxes. A new Redis-based utility is included that is very effective at processing level 1 market data.
bars: A companion stream processor for DTN IQ Feed that builds real-time minute bars from streaming market quote data.
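As a rough illustration of what building minute bars from a stream involves, here is a minimal Python sketch. It is not the bars program and does not parse the IQ Feed message format. It assumes a hypothetical input of time-ordered (epoch_seconds, price, size) tuples and emits one open/high/low/close/volume record per minute that contains at least one tick.

```python
def minute_bars(ticks):
    """Aggregate time-ordered ticks into one-minute OHLCV bars.

    ticks : iterable of (epoch_seconds, price, size) tuples (assumed format)
    Yields a dict per minute; minutes without ticks produce no bar.
    """
    current = None
    for ts, price, size in ticks:
        start = int(ts // 60) * 60  # start of the minute containing this tick
        if current is not None and start != current["start"]:
            yield current           # the minute rolled over, emit finished bar
            current = None
        if current is None:
            current = {
                "start": start,
                "open": price,
                "high": price,
                "low": price,
                "close": price,
                "volume": size,
            }
        else:
            current["high"] = max(current["high"], price)
            current["low"] = min(current["low"], price)
            current["close"] = price
            current["volume"] += size
    if current is not None:
        yield current               # flush the last, possibly partial, bar


if __name__ == "__main__":
    stream = [(0, 10.0, 100), (30, 11.0, 50), (59, 9.5, 10), (60, 9.7, 20), (125, 9.9, 5)]
    for bar in minute_bars(stream):
        print(bar)
```

Writing it as a generator means a bar is emitted as soon as the first tick of the next minute arrives, which suits a streaming source. A real-time version would also need a timer to close a bar when no further ticks arrive.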
Ratlab, tools for foolin' with R and Octave (or Matlab) together.
http://etna.math.kent.edu/vol.30.2008/pp128-143.dir/zeros/index.html A newer Java applet illustrating the dynamical motion of the zeros of the partial sums of the exponential function (from work with Richard Varga and Amos Carpenter).