Data Analytics Reference Stack
This guide explains how to use DARS and, optionally, how to build your own DARS container image.
Any system that supports Docker* containers can be used with DARS. The steps in this guide use Clear Linux* OS as the host system.
The Data Analytics Reference Stack release
The Data Analytics Reference Stack (DARS) provides developers and enterprises a straightforward, highly optimized software stack for storing and processing large amounts of data. More detail is available on the DARS architecture and performance benchmarks.
The Data Analytics Reference Stack provides two pre-built Docker images, available on Docker Hub:
- A Clear Linux OS-derived DARS with OpenBLAS stack optimized for OpenBLAS
- A Clear Linux OS-derived DARS with Intel® MKL stack optimized for MKL
We recommend you view the latest component versions for each image in the
README found in the Data Analytics Reference Stack GitHub*
repository. Because Clear Linux OS is a rolling distribution, the package version numbers
in the Clear Linux OS-based containers may not be the latest released by Clear Linux OS.
Note
The Data Analytics Reference Stack is a collective work, and each piece of software within the work has its own license. Please see the DARS Terms of Use for more details about licensing and usage of the Data Analytics Reference Stack.
Using the Docker images
To immediately start using the latest stable DARS images, pull an image directly from Docker Hub. This example uses the DARS with Intel® MKL Docker image.
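The pull step might look like the sketch below. The image name `clearlinux/stacks-dars-mkl` is an assumption, not something this guide states, so confirm the exact repository name on Docker Hub before relying on it.

```shell
# Assumed image name for the MKL variant; confirm it on Docker Hub before use.
IMAGE="clearlinux/stacks-dars-mkl"

if command -v docker >/dev/null 2>&1; then
    # Download the image, then list it to confirm it is available locally.
    docker pull "$IMAGE" && docker images "$IMAGE"
else
    echo "docker is not installed; skipping pull of $IMAGE" >&2
fi
```

Keeping the name in a variable makes it easy to switch to the OpenBLAS image without editing the commands.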
Once you have downloaded the image, you can run it with:
docker run -it --ulimit nofile=1000000:1000000 --name mkl <name of image>
This will launch the image and drop you into a bash shell inside the container. From there, running spark-shell produces output similar to the following:
root@fd5155b89857 /root # spark-shell
Config directory: /usr/share/defaults/spark/
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.12.7 (OpenJDK 64-Bit Server VM, Java 1.8.0-internal)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
The --ulimit nofile parameter is currently required to raise the limit on open files, because the Spark engine opens a large number of files at certain points.
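As a quick check, the effective limits can be inspected from the bash shell inside the container. This is a minimal sketch that uses only the shell's built-in `ulimit`; the variable names are illustrative.

```shell
# Read the soft and hard limits on open file descriptors for this shell.
soft_limit=$(ulimit -Sn)
hard_limit=$(ulimit -Hn)

echo "soft nofile limit: $soft_limit"
echo "hard nofile limit: $hard_limit"
```

In a container started with --ulimit nofile=1000000:1000000, both values should match the number passed on the command line. Lower values suggest the option was omitted or mistyped.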
Building DARS images
If you choose to build your own DARS container images, you can customize them as needed. Use the provided Dockerfile as a baseline.
To construct images with Clear Linux OS, start with a Clear Linux OS development platform that has the containers-basic-dev bundle installed. Learn more about bundles and installing them by using swupd.
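On a Clear Linux OS host, installing the bundle could look like the following sketch. It assumes `sudo` access, and the guard around `swupd` is only there so the snippet exits cleanly on other systems.

```shell
BUNDLE="containers-basic-dev"

if command -v swupd >/dev/null 2>&1; then
    # Install the container development bundle with swupd.
    sudo swupd bundle-add "$BUNDLE"
else
    echo "swupd not found; this step applies to Clear Linux OS hosts only" >&2
fi
```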
Clone the Data Analytics Reference Stack GitHub* repository.
git clone https://github.com/clearlinux/dockerfiles.git -b master
Inside the DARS directory (stacks/dars in the cloned repository), run make to build the OpenBLAS and MKL images.
make
Run make baseline to build the baseline CentOS image. Depending on the system, it may take a while to finish building.
make baseline
Once completed, check the resulting images with Docker:
docker images | grep dars
You can use any of the resulting images to launch fully functional containers. If you need to customize the containers, you can edit the provided Dockerfile.
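To try one of the locally built images, something like the sketch below could be used. The `DARS_IMAGE` variable and the choice of the first match are illustrative assumptions, because the image names produced by the build are not listed here. The `--ulimit` value is the one used earlier in this guide.

```shell
if command -v docker >/dev/null 2>&1; then
    # Pick the first local image whose name contains "dars".
    DARS_IMAGE=$(docker images --format '{{.Repository}}:{{.Tag}}' | grep dars | head -n 1)
fi

if [ -n "${DARS_IMAGE:-}" ]; then
    # Start an interactive container with the raised open-file limit.
    docker run -it --ulimit nofile=1000000:1000000 "$DARS_IMAGE"
else
    echo "no local DARS image found; build one with make first" >&2
fi
```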