I like Python and I like Jupyter Notebook very much. I’m not a typical programmer, but I use this framework frequently as a replacement for Excel. I got to know some basics of Pandas, Matplotlib and Plotly; they are great tools for data processing and visualization. But they are developed at great speed, which causes trouble when you want to keep your working environment up to date. It is quite easy to break something by installing some fancy-but-not-well-tested plugin or package.
Again, Docker is our ally. It is not only a way to avoid trouble with dependencies but also a good platform for presenting examples on a blog, because they become easy to reproduce in other environments. So, here is a short description of how I run Jupyter Notebook inside Docker:
Create docker image
I use the official Anaconda3 image, which is available in the Docker registry. Anaconda is a company which maintains and supports an entire stack of Python and R packages used in data science on most modern operating systems. Here is the simple Dockerfile I use for now:
FROM continuumio/anaconda3
EXPOSE 8888/tcp
RUN /opt/conda/bin/conda install jupyter pandas-datareader -y
RUN pip install plotly cufflinks
Building the image is very simple. Let’s create an image called ppp:
michal@sunman:~$ docker build -t ppp:latest .
Prepare directories
To be able to easily exchange files between my host and Docker, I create some directories on my host:
michal@sunman:~$ mkdir -p /export/docker/in
michal@sunman:~$ mkdir -p /export/docker/out
michal@sunman:~$ mkdir -p /export/docker/notebooks
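If you prefer to prepare the same layout from Python (for example from a setup script), a minimal sketch could look like the one below. The function name make_exchange_dirs is my own invention for illustration; only the three directory names come from the commands above.

```python
from pathlib import Path


def make_exchange_dirs(base, names=("in", "out", "notebooks")):
    """Create the exchange directories under `base` and return their paths.

    `make_exchange_dirs` is a hypothetical helper, not part of any library.
    """
    base = Path(base)
    created = []
    for name in names:
        path = base / name
        # parents=True creates missing parent directories,
        # exist_ok=True makes repeated runs harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling make_exchange_dirs("/export/docker") should give the same three directories as the mkdir commands, assuming your user is allowed to write there.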
Start container
michal@sunman:~$ docker run -it --rm \
    --name "ppp" \
    -p 127.0.0.1:8888:8888 \
    -v /export/docker/notebooks:/notebooks \
    -v /export/docker/in:/in \
    -v /export/docker/out:/out \
    ppp \
    /bin/bash -c "jupyter notebook --notebook-dir=/notebooks --NotebookApp.token='' --ip='0.0.0.0' --allow-root --no-browser"
The container is started interactively (-it) and will be removed completely after it finishes (--rm). I gave it a simple name, ppp, the same as the image used to create it. I also mount the directories from my host and start Jupyter Notebook inside it. The notebook is started with an empty token (normally a unique one is generated), but because I bind the container's port only to my localhost address, I disable this security feature. Eliminating the token allows me to blindly use the http://127.0.0.1:8888 address to access it from the host instead of copying the generated token into my browser.
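The docker run command is long enough that it can be handy to assemble it in a script. Below is a rough sketch that builds the same argument list in Python; the helper name docker_run_args and its parameters are assumptions of mine, while the flags, mount points and the Jupyter command line are taken from the command above.

```python
import shlex


def docker_run_args(image="ppp", name="ppp", host_base="/export/docker", port=8888):
    """Build the argument list for the `docker run` command used in this post.

    `docker_run_args` is a hypothetical helper written for illustration.
    """
    jupyter_cmd = (
        "jupyter notebook --notebook-dir=/notebooks "
        "--NotebookApp.token='' --ip='0.0.0.0' --allow-root --no-browser"
    )
    args = [
        "docker", "run", "-it", "--rm",
        "--name", name,
        # Publish the container port on the host's loopback address only.
        "-p", f"127.0.0.1:{port}:8888",
    ]
    # Mount each host directory at the same name inside the container.
    for directory in ("notebooks", "in", "out"):
        args += ["-v", f"{host_base}/{directory}:/{directory}"]
    args += [image, "/bin/bash", "-c", jupyter_cmd]
    return args


# Print a shell-quoted version for inspection (shlex.join needs Python 3.8+).
print(shlex.join(docker_run_args()))
```

To actually start the container from Python you could pass the list to subprocess.run(docker_run_args(), check=True). Because the arguments are a list, no extra shell quoting is needed.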
Verify the container by opening a browser at http://127.0.0.1:8888
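The same check can be done without a browser. The sketch below uses only the Python standard library and assumes the notebook answers plain HTTP on the address from this post with the token disabled; notebook_is_up is a made-up helper name, not something Jupyter provides.

```python
import urllib.error
import urllib.request


def notebook_is_up(url="http://127.0.0.1:8888", timeout=3.0):
    """Return True if `url` answers with HTTP 200, False otherwise.

    `notebook_is_up` is a hypothetical helper written for illustration.
    """
    # The target is on localhost, so skip any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # Redirects are followed automatically, so the final status is checked.
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or an HTTP error status.
        return False
```

Running print(notebook_is_up()) on the host should print True once the container has finished starting, and False if nothing is listening on that port.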
And now you can start working on a new notebook. Below is a simple example of such work, showing a plot of Oracle stock closing prices:
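# Remote data readers from pandas-datareader
import pandas_datareader.data as pddr
# cufflinks adds the Plotly-based .iplot() method to pandas objects
import cufflinks as cf
cf.go_offline()  # render plots locally, inside the notebook
%matplotlib inline
# Download historical ORCL quotes from the Stooq data source
orcl = pddr.DataReader('ORCL', 'stooq')
# Interactive line plot of the closing prices
orcl.Close.iplot()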
Summary
This post is not related to Oracle (ok, the stock plot is related ;> ). But soon (hopefully), I’m going to prepare some examples of using Jupyter Notebook to analyze and visualize performance data, so this post is a kind of preparation for that. You can find all files necessary to run the examples here.