CLML Datasets Tutorial

(C) 2015 Mike Maul -- CC-BY-SA 3.0

This document is part of a series of tutorials illustrating the use of CLML.

Datasets, what and why

CLML datasets are two-dimensional tabular data structures. In CLML, datasets are used for (not to sound recursive) storing datasets. Datasets may contain numerical and categorical data. Datasets also carry column metadata (dimensions) and provide facilities for extracting columns, cleaning, and splitting. Datasets in CLML are similar to data frames in R or Pandas DataFrames in Python.

Let's get started by loading the systems necessary for this tutorial and creating a namespace to work in.

In [1]:
(ql:quickload '(:clml.utility ; Need clml.utility.data to get data from the net
                :clml.hjs ; Need clml.hjs.read-data for dataset
                :iolib
                :clml.extras.eazy-gnuplot
                :eazy-gnuplot
            ))
To load "clml.utility":
  Load 1 ASDF system:
    clml.utility

; Loading "clml.utility"
....
To load "clml.hjs":
  Load 1 ASDF system:
    clml.hjs

; Loading "clml.hjs"

To load "iolib":
  Load 1 ASDF system:
    iolib

; Loading "iolib"
.....
To load "clml.extras.eazy-gnuplot":
  Load 1 ASDF system:
    clml.extras.eazy-gnuplot

; Loading "clml.extras.eazy-gnuplot"

To load "eazy-gnuplot":
  Load 1 ASDF system:
    eazy-gnuplot

; Loading "eazy-gnuplot"

Out[1]:
(:CLML.UTILITY :CLML.HJS :IOLIB :CLML.EXTRAS.EAZY-GNUPLOT :EAZY-GNUPLOT)
In [2]:
(defpackage #:datasets-tutorial
  (:use #:cl
        #:cl-jupyter-user ; Not needed unless using iPython notebook
        #:clml.hjs.read-data
        #:clml.hjs.meta ; util function
        #:clml.extras.eazy-gnuplot))
Out[2]:
#<PACKAGE "DATASETS-TUTORIAL">
In [3]:
(in-package :datasets-tutorial)
Out[3]:
#<PACKAGE "DATASETS-TUTORIAL">

Let's load some data that we will use as we learn about datasets.

In [4]:
(defparameter dataset (read-data-from-file 
                       (clml.utility.data:fetch "https://mmaul.github.io/clml.data/sample/cars.csv") 
                       :type :csv :csv-type-spec '(integer integer)))
Out[4]:
DATASET

Data and Datasets

CLML has a number of different specializations of datasets, such as:

  • unspecialized-dataset: untyped, unspecialized data
  • numeric-dataset: a dataset containing numeric (double-float) data
  • category-dataset: a dataset for categorical (string) data
  • numeric-and-category-dataset: a dataset containing a mixture of numeric and categorical data
  • numeric-matrix-dataset: a dataset where numeric values are stored as a matrix
  • numeric-matrix-and-category-dataset: a dataset where numeric values are stored as a matrix, alongside categorical data

Data representation

All datasets except the matrix datasets represent data as a vector of vectors. Each inner vector contains the columns of one row. For datasets with categories, numeric and category data are stored in separate vectors.

We can see below how the data is represented.

In [5]:
(dataset-points dataset)
Out[5]:
#(#(4 2) #(4 10) #(7 4) #(7 22) #(8 16) #(9 10) #(10 18) #(10 26) #(10 34)
  #(11 17) #(11 28) #(12 14) #(12 20) #(12 24) #(12 28) #(13 26) #(13 34)
  #(13 34) #(13 46) #(14 26) #(14 36) #(14 60) #(14 80) #(15 20) #(15 26)
  #(15 54) #(16 32) #(16 40) #(17 32) #(17 40) #(17 50) #(18 42) #(18 56)
  #(18 76) #(18 84) #(19 36) #(19 46) #(19 68) #(20 32) #(20 48) #(20 52)
  #(20 56) #(20 64) #(22 66) #(23 54) #(24 70) #(24 92) #(24 93) #(24 120)
  #(25 85))

It may not be convenient to display the whole dataset just to take a look at it. We could have used subseq, but there is a helper method called head-points.

In [6]:
(head-points dataset)
Out[6]:
#(#(4 2) #(4 10) #(7 4) #(7 22) #(8 16))

Dimensions

All datasets have a dimensions slot which contains the column metadata: a sequence of dimension instances. Each dimension instance contains the following slots (accessor prefix is dimension-):

  • name: column name
  • type: type of data in the column (e.g. :category, :numeric, :unknown)
  • index: index of the column in the point vectors
  • metadata: an alist that can contain useful information, such as equality tests for category data
In [7]:
(dataset-dimensions dataset)
Out[7]:
#(#<CLML.HJS.READ-DATA::DIMENSION NAME: speed, TYPE: UNKNOWN, INDEX: 0.>
  #<CLML.HJS.READ-DATA::DIMENSION NAME: distance, TYPE: UNKNOWN, INDEX: 1.>)
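The individual slots can be read with the dimension- accessors. As a sketch (this assumes the dimension-name and dimension-index accessors are reachable via the clml.hjs.read-data package, where the dimension class lives):

```lisp
;; Collect (name . index) pairs from the column metadata.
;; dimension-name / dimension-index follow the "dimension" accessor
;; prefix noted above; adjust the package qualifier if they are exported.
(map 'list
     (lambda (d)
       (cons (clml.hjs.read-data::dimension-name d)
             (clml.hjs.read-data::dimension-index d)))
     (dataset-dimensions dataset))
```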

Creating datasets

Datasets can be created directly, or they can be read from a file. The supported data formats are CSV and SEXP. Earlier we used the read-data-from-file function to read a dataset from a CSV file. In that case the file was obtained with fetch from the clml.utility system, which downloads a file from a URL and caches it on the local file system. Datasets can also be created programmatically.

In [11]:
(make-numeric-and-category-dataset  
 '("cat 1" "num 1")                           ; <-- Column names 
 (vector (v2dvec #(1.0d0)) (v2dvec #(2.0d0))) ; <-- Numeric data 
          '(1)                                ; <-- Indexes of numeric column
          #(#("a") #("b"))                    ; <-- Category Data
          '(0)                                ; <-- Indexes of category data
)
Out[11]:
#<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: cat 1 | num 1
TYPES:      CATEGORY | NUMERIC
NUMBER OF DIMENSIONS: 2
CATEGORY DATA POINTS: 2 POINTS
NUMERIC DATA POINTS: 2 POINTS

Specializing datasets

The dataset we loaded is currently unspecialized; we haven't told CLML much about it yet. We can use the pick-and-specialize-data method to fill in the details.

In [8]:
(pick-and-specialize-data dataset :data-types '(:numeric :numeric))
Out[8]:
#<NUMERIC-DATASET >
DIMENSIONS: speed | distance
TYPES:      NUMERIC | NUMERIC
NUMBER OF DIMENSIONS: 2
NUMERIC DATA POINTS: 50 POINTS

We can see that pick-and-specialize-data returned a numeric dataset based on the supplied :data-types specification. pick-and-specialize-data takes two additional parameters, :range and :except, both dealing with column selection: :range specifies a range of columns (as a list) to use in the new dataset, while :except specifies a list of columns to exclude from the new dataset. We also mentioned the matrix datasets; pick-and-specialize-data can change the representation from a vector of vectors to a matrix as well.
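For example, :except can be used to drop a column while specializing. A sketch (column indexes are zero-based, so excluding index 1 drops distance and keeps only speed):

```lisp
;; Specialize the cars dataset, excluding the "distance" column.
(pick-and-specialize-data dataset
                          :data-types '(:numeric)
                          :except '(1))
```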

In [56]:
(let ((ds (pick-and-specialize-data dataset :data-types '(:numeric :numeric) 
           :store-numeric-data-as-matrix t)))
           (print ds)
           (dataset-numeric-points ds))
#<NUMERIC-MATRIX-DATASET >
DIMENSIONS: speed | distance

TYPES:      NUMERIC | NUMERIC

NUMBER OF DIMENSIONS: 2

NUMERIC-MATRIX DATA POINTS: 50 POINTS
 
Out[56]:
#2A((4.0d0 2.0d0)
    (4.0d0 10.0d0)
    (7.0d0 4.0d0)
    (7.0d0 22.0d0)
    (8.0d0 16.0d0)
    (9.0d0 10.0d0)
    (10.0d0 18.0d0)
    (10.0d0 26.0d0)
    (10.0d0 34.0d0)
    (11.0d0 17.0d0)
    (11.0d0 28.0d0)
    (12.0d0 14.0d0)
    (12.0d0 20.0d0)
    (12.0d0 24.0d0)
    (12.0d0 28.0d0)
    (13.0d0 26.0d0)
    (13.0d0 34.0d0)
    (13.0d0 34.0d0)
    (13.0d0 46.0d0)
    (14.0d0 26.0d0)
    (14.0d0 36.0d0)
    (14.0d0 60.0d0)
    (14.0d0 80.0d0)
    (15.0d0 20.0d0)
    (15.0d0 26.0d0)
    (15.0d0 54.0d0)
    (16.0d0 32.0d0)
    (16.0d0 40.0d0)
    (17.0d0 32.0d0)
    (17.0d0 40.0d0)
    (17.0d0 50.0d0)
    (18.0d0 42.0d0)
    (18.0d0 56.0d0)
    (18.0d0 76.0d0)
    (18.0d0 84.0d0)
    (19.0d0 36.0d0)
    (19.0d0 46.0d0)
    (19.0d0 68.0d0)
    (20.0d0 32.0d0)
    (20.0d0 48.0d0)
    (20.0d0 52.0d0)
    (20.0d0 56.0d0)
    (20.0d0 64.0d0)
    (22.0d0 66.0d0)
    (23.0d0 54.0d0)
    (24.0d0 70.0d0)
    (24.0d0 92.0d0)
    (24.0d0 93.0d0)
    (24.0d0 120.0d0)
    (25.0d0 85.0d0))

We should also show an example of a dataset with categories.

In [10]:
(pick-and-specialize-data (read-data-from-file 
                           (clml.utility.data:fetch "https://mmaul.github.io/clml.data/sample/UKgas.sexp"))
     :data-types '(:category :numeric))
Out[10]:
#<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas
TYPES:      CATEGORY | NUMERIC
NUMBER OF DIMENSIONS: 2
CATEGORY DATA POINTS: 108 POINTS
NUMERIC DATA POINTS: 108 POINTS

Datasets can be created and combined. Generally, the dataset creation methods take the form make-<dataset type> and accept either vectors containing data or other datasets, creating a new dataset.
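Following that naming convention, a numeric-only dataset could be built as sketched below (this assumes make-numeric-dataset takes column names and a vector of double-float row vectors, mirroring the make-numeric-and-category-dataset call shown earlier):

```lisp
;; Hypothetical construction of a two-column numeric dataset.
(make-numeric-dataset '("speed" "distance")
                      (vector (v2dvec #(4.0d0 2.0d0))
                              (v2dvec #(4.0d0 10.0d0))
                              (v2dvec #(7.0d0 4.0d0))))
```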

Missing Values

CLML datasets support missing values. Missing values are represented as follows in the dataset-points:

  • category: CLML.HJS.MISSING-VALUE:+C-NAN+
  • numeric: CLML.HJS.MISSING-VALUE:+NAN+
  • unspecialized: :NA

There are also the following predicates available to detect missing values:

  • CLML.HJS.MISSING-VALUE:C-NAN-P
  • CLML.HJS.MISSING-VALUE:NAN-P
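For example, the numeric predicate makes it easy to count missing values in a column vector (a small sketch using the constants listed above):

```lisp
;; Count missing values in a numeric column; +NAN+ marks a missing point.
(count-if #'clml.hjs.missing-value:nan-p
          (vector 1.0d0 clml.hjs.missing-value:+nan+ 3.0d0))
```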

read-data-from-file also supports mapping representations of missing values in data files into datasets. The missing-values-list keyword argument specifies the character sequences that will be recognized as missing values.

To illustrate missing-value support, let's read in a CSV file containing the following:

a,   b,   c
1.0, 2.0, x
NA,  3.0, NA

Here missing values are represented in the CSV file by NA. For read-data-from-file to recognize the missing values, we must set the :missing-values-list parameter as shown below:

In [12]:
(let ((ds (read-data-from-file 
            (clml.utility.data:fetch "https://mmaul.github.io/clml.data/sample/simple1.csv") 
            :type :csv 
            :csv-type-spec '(double-float double-float string) 
            :missing-values-list '("NA")
          )))
            (format nil "~A~%~A~%" ds (dataset-points ds)))
Out[12]:
"#<UNSPECIALIZED-DATASET >
DIMENSIONS: a | b | c
TYPES:      UNKNOWN | UNKNOWN | UNKNOWN
NUMBER OF DIMENSIONS: 3
DATA POINTS: 2 POINTS

#(#(1.0d0 2.0d0 x) #(NA 3.0d0 NA))
"

We can also see how missing values are represented in a specialized dataset:

In [13]:
(let ((ds (pick-and-specialize-data
             (read-data-from-file 
               (clml.utility.data:fetch "https://mmaul.github.io/clml.data/sample/simple1.csv") 
               :type :csv 
               :csv-type-spec '(double-float double-float string) 
               :missing-values-list '("NA")
              )
              :data-types '(:numeric :numeric :category)
           )))
     (format nil "~A~%~A~%~A~%" ds  (dataset-numeric-points ds) (dataset-category-points ds))
)
Out[13]:
"#<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: a | b | c
TYPES:      NUMERIC | NUMERIC | CATEGORY
NUMBER OF DIMENSIONS: 3
CATEGORY DATA POINTS: 2 POINTS
NUMERIC DATA POINTS: 2 POINTS

#(#(1.0d0 2.0d0) #(#<DOUBLE-FLOAT quiet NaN> 3.0d0))
#(#(x) #(0))
"

Dataset manipulation

The following operations can be performed on datasets:

  • copying
  • splitting and sampling
  • cleaning
  • shuffling
  • deduplication
  • storing

We will use the UK Gas dataset to illustrate these operations.

In [14]:
(defparameter ukgas (pick-and-specialize-data (read-data-from-file 
                           (clml.utility.data:fetch "https://mmaul.github.io/clml.data/sample/UKgas.sexp"))
                           :data-types '(:category :numeric)))
Out[14]:
UKGAS
Copying

The simplest operation is copying. copy-dataset makes a deep copy of the contents of a dataset.

In [15]:
(copy-dataset ukgas)
Out[15]:
#<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas
TYPES:      CATEGORY | NUMERIC
NUMBER OF DIMENSIONS: 2
CATEGORY DATA POINTS: 108 POINTS
NUMERIC DATA POINTS: 108 POINTS
Splitting and Sampling

Datasets can be subdivided by two similar methods: make-bootstrap-sample-datasets and divide-dataset.

divide-dataset returns a dataset split into two parts based on the :divide-ratio parameter. Like pick-and-specialize-data, divide-dataset can limit the columns selected with the :range and :except parameters. It can also assign points to the new datasets in a pseudo-random manner with the :random parameter.

In [17]:
(multiple-value-list (divide-dataset ukgas :divide-ratio '(3 1) :random t))
Out[17]:
(#<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas

TYPES:      CATEGORY | NUMERIC

NUMBER OF DIMENSIONS: 2

CATEGORY DATA POINTS: 81 POINTS

NUMERIC DATA POINTS: 81 POINTS

 #<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas

TYPES:      CATEGORY | NUMERIC

NUMBER OF DIMENSIONS: 2

CATEGORY DATA POINTS: 27 POINTS

NUMERIC DATA POINTS: 27 POINTS
)

make-bootstrap-sample-datasets, on the other hand, resamples a dataset into a specified number of datasets, each of equal length to the original dataset. The :number-of-datasets parameter defaults to 10.

In [18]:
(make-bootstrap-sample-datasets ukgas :number-of-datasets 3)
Out[18]:
(#<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas

TYPES:      CATEGORY | NUMERIC

NUMBER OF DIMENSIONS: 2

CATEGORY DATA POINTS: 108 POINTS

NUMERIC DATA POINTS: 108 POINTS

 #<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas

TYPES:      CATEGORY | NUMERIC

NUMBER OF DIMENSIONS: 2

CATEGORY DATA POINTS: 108 POINTS

NUMERIC DATA POINTS: 108 POINTS

 #<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas

TYPES:      CATEGORY | NUMERIC

NUMBER OF DIMENSIONS: 2

CATEGORY DATA POINTS: 108 POINTS

NUMERIC DATA POINTS: 108 POINTS
)
Cleaning

One of the nice features of CLML is its dataset-cleaning capabilities. The dataset-cleaning method provides the following:

  • outlier detection for numeric points:
    • standard deviation
    • mean deviation
    • user-provided function
    • Smirnov-Grubbs
  • outlier detection for categorical points:
    • frequency based
    • user-provided function
  • outlier and missing-value interpolation using:
    • zero
    • min
    • max
    • mean
    • median
    • mode
    • spline

To illustrate, we will perform dataset cleaning where outliers are points that exceed one standard deviation; they will be replaced by zero.

In [19]:
(dataset-cleaning ukgas :outlier-types-alist '(("UKgas" . :std-dev)) 
                         :outlier-values-alist '((:std-dev . 1)) 
                         :interp-types-alist '(("UKgas" . :zero)))
Out[19]:
#<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas
TYPES:      CATEGORY | NUMERIC
NUMBER OF DIMENSIONS: 2
CATEGORY DATA POINTS: 108 POINTS
NUMERIC DATA POINTS: 108 POINTS
Adding dimensions and Concatenating Datasets
Adding dimensions

In some cases you may want to add a computed column, or add a column to a dataset to hold the product of a computation. The add-dim method accomplishes this easily. It can add an existing column of points with the :points parameter, or create a column filled with an initial value with the :initial-value parameter. The three mandatory parameters are the dataset to add the dimension to, the name of the new dimension, and the type. If the dataset is a category-only or numeric-only dataset, add-dim will create a numeric-and-category-dataset when a column of a different type is added.

In [20]:
(add-dim ukgas "mpg" :numeric :initial-value 0.0d0)
Out[20]:
#<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas | mpg
TYPES:      CATEGORY | NUMERIC | NUMERIC
NUMBER OF DIMENSIONS: 3
CATEGORY DATA POINTS: 108 POINTS
NUMERIC DATA POINTS: 108 POINTS
Concatenating datasets

Two datasets with matching dimensions can be concatenated, or glued together vertically. concatenate-datasets takes two datasets as parameters and returns a dataset with the points of the first dataset stacked on top of the points of the second. The dimension names of the first dataset are retained in the new dataset.

In [21]:
(concatenate-datasets ukgas ukgas)
Out[21]:
#<NUMERIC-AND-CATEGORY-DATASET >
DIMENSIONS: year season | UKgas
TYPES:      CATEGORY | NUMERIC
NUMBER OF DIMENSIONS: 2
CATEGORY DATA POINTS: 216 POINTS
NUMERIC DATA POINTS: 216 POINTS
Deduplicating datasets

Datasets can be deduplicated in place with the dedup-dataset! method. This functionality is currently implemented only for numeric, category, and unspecialized datasets.
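As a sketch, duplicating a numeric dataset and then deduplicating it in place would look like this (using the cars dataset loaded earlier; note that dedup-dataset! mutates its argument):

```lisp
;; Build a numeric dataset, double its rows, then remove duplicates in place.
(let* ((cars (pick-and-specialize-data dataset
                                       :data-types '(:numeric :numeric)))
       (doubled (concatenate-datasets cars cars)))
  (dedup-dataset! doubled))
```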

Storing datasets

Datasets can be saved to a file in csv format with the write-dataset method.

In [24]:
(write-dataset ukgas "gasgas.csv")
Out[24]:
(CLML.UTILITY.CSV::OK)

Working with dataset points

Columns and values can be accessed and extracted from datasets using the !! macro. Given a column name or a list of column names, this macro returns the values as a vector of vectors when multiple column names are specified, or as a single vector when a single column name is specified.

In [23]:
(!! ukgas "UKgas")
Out[23]:
#(160.1d0 129.7d0 84.8d0 120.1d0 160.1d0 124.9d0 84.8d0 116.9d0 169.7d0 140.9d0
  89.7d0 123.3d0 187.3d0 144.1d0 92.9d0 120.1d0 176.1d0 147.3d0 89.7d0 123.3d0
  185.7d0 155.3d0 99.3d0 131.3d0 200.1d0 161.7d0 102.5d0 136.1d0 204.9d0
  176.1d0 112.1d0 140.9d0 227.3d0 195.3d0 115.3d0 142.5d0 244.9d0 214.5d0
  118.5d0 153.7d0 244.9d0 216.1d0 188.9d0 142.5d0 301.0d0 196.9d0 136.1d0
  267.3d0 317.0d0 230.5d0 152.1d0 336.2d0 371.4d0 240.1d0 158.5d0 355.4d0
  449.9d0 286.6d0 179.3d0 403.4d0 491.5d0 321.8d0 177.7d0 409.8d0 593.9d0
  329.8d0 176.1d0 483.5d0 584.3d0 395.4d0 187.3d0 485.1d0 669.2d0 421.0d0
  216.1d0 509.1d0 827.7d0 467.5d0 209.7d0 542.7d0 840.5d0 414.6d0 217.7d0
  670.8d0 848.5d0 437.0d0 209.7d0 701.2d0 925.3d0 443.4d0 214.5d0 683.6d0
  917.3d0 515.5d0 224.1d0 694.8d0 989.4d0 477.1d0 233.7d0 730.0d0 1087.0d0
  534.7d0 281.8d0 787.6d0 1163.9d0 613.1d0 347.4d0 782.8d0)

Dataset points can also be accessed with slot accessors. Since category and numeric data are stored separately in heterogeneous datasets, separate accessors are used to access the points.

The list below shows which accessor is applicable to each dataset type.

  • dataset-points: unspecialized-dataset
  • dataset-numeric-points: numeric-dataset numeric-and-category-dataset numeric-matrix-dataset numeric-matrix-and-category-dataset
  • dataset-category-points: category-dataset numeric-and-category-dataset numeric-matrix-and-category-dataset
In [25]:
(dataset-numeric-points ukgas)
Out[25]:
#(#(160.1d0) #(129.7d0) #(84.8d0) #(120.1d0) #(160.1d0) #(124.9d0) #(84.8d0)
  #(116.9d0) #(169.7d0) #(140.9d0) #(89.7d0) #(123.3d0) #(187.3d0) #(144.1d0)
  #(92.9d0) #(120.1d0) #(176.1d0) #(147.3d0) #(89.7d0) #(123.3d0) #(185.7d0)
  #(155.3d0) #(99.3d0) #(131.3d0) #(200.1d0) #(161.7d0) #(102.5d0) #(136.1d0)
  #(204.9d0) #(176.1d0) #(112.1d0) #(140.9d0) #(227.3d0) #(195.3d0) #(115.3d0)
  #(142.5d0) #(244.9d0) #(214.5d0) #(118.5d0) #(153.7d0) #(244.9d0) #(216.1d0)
  #(188.9d0) #(142.5d0) #(301.0d0) #(196.9d0) #(136.1d0) #(267.3d0) #(317.0d0)
  #(230.5d0) #(152.1d0) #(336.2d0) #(371.4d0) #(240.1d0) #(158.5d0) #(355.4d0)
  #(449.9d0) #(286.6d0) #(179.3d0) #(403.4d0) #(491.5d0) #(321.8d0) #(177.7d0)
  #(409.8d0) #(593.9d0) #(329.8d0) #(176.1d0) #(483.5d0) #(584.3d0) #(395.4d0)
  #(187.3d0) #(485.1d0) #(669.2d0) #(421.0d0) #(216.1d0) #(509.1d0) #(827.7d0)
  #(467.5d0) #(209.7d0) #(542.7d0) #(840.5d0) #(414.6d0) #(217.7d0) #(670.8d0)
  #(848.5d0) #(437.0d0) #(209.7d0) #(701.2d0) #(925.3d0) #(443.4d0) #(214.5d0)
  #(683.6d0) #(917.3d0) #(515.5d0) #(224.1d0) #(694.8d0) #(989.4d0) #(477.1d0)
  #(233.7d0) #(730.0d0) #(1087.0d0) #(534.7d0) #(281.8d0) #(787.6d0)
  #(1163.9d0) #(613.1d0) #(347.4d0) #(782.8d0))

A little something extra, R-datasets

One thing that I've always found handy in R is its standard, curated, extensive, and documented collection of datasets. Wouldn't it be nice to have access to these directly as datasets in CLML? The R-datasets system in clml.extras provides this capability. A particularly good use case for these datasets is being able to follow along in CLML with examples and tutorials written for R.

Note

The clml.extras systems are not currently part of Quicklisp, so if you are following along with this tutorial and expecting just to (quickload :clml.extras.Rdatasets), you can't until you have cloned the clml.extras repository (http://github.com/mmaul/clml.extras.git) into your quicklisp/local-projects directory.
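Assuming a default Quicklisp installation under the home directory, the clone looks like:

```shell
# Clone clml.extras into Quicklisp's local-projects so ASDF can find it.
cd ~/quicklisp/local-projects
git clone http://github.com/mmaul/clml.extras.git
```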

Using Rdatasets

The Rdatasets package makes datasets included with the R language distribution available as CLML datasets. The R datasets are obtained as CSV files from Vincent Arel-Bundock's github repository. More information on these datasets can be found at http://vincentarelbundock.github.com/Rdatasets

Because type information is not included, it may be necessary to provide a csv-type-spec for the columns in the CSV file.

In [49]:
(ql:quickload :clml.r-datasets)
To load "clml.r-datasets":
  Load 1 ASDF system:
    clml.r-datasets

; Loading "clml.r-datasets"

Out[49]:
(:CLML.R-DATASETS)
In [27]:
(use-package :clml.r-datasets)
Out[27]:
T
In [28]:
(defparameter dd (get-r-dataset-directory))
Out[28]:
DD
In [54]:
(subseq (inventory dd :stream nil) 0 505)
Out[54]:
"Package                   Item                      Title                     
------------------------- ------------------------- ------------------------- 
datasets                  AirPassengers             Monthly Airline Passenger Numbers 1949-1960 

datasets                  BJsales                   Sales Data with Leading Indicator 

datasets                  BOD                       Biochemical Oxygen Demand 

datasets                  Formaldehyde              Determination of Formaldehyde"
In [57]:
(subseq (dataset-documentation  dd  "datasets" "BOD" :stream nil) 0 200)
Out[57]:
"\"
R: Biochemical Oxygen Demand




BODR Documentation

 Biochemical Oxygen Demand 

Description

The BOD data frame has 6 rows and 2 columns giving the
biochemical oxygen demand versus time in an eval"
In [38]:
(defparameter bod (get-dataset dd "datasets" "BOD" :csv-type-spec '(double-float double-float double-float)))
Out[38]:
BOD
In [39]:
bod
Out[39]:
#<UNSPECIALIZED-DATASET >
DIMENSIONS:  | Time | demand
TYPES:      UNKNOWN | UNKNOWN | UNKNOWN
NUMBER OF DIMENSIONS: 3
DATA POINTS: 6 POINTS
In [40]:
(pick-and-specialize-data bod :data-types '(:numeric :numeric :numeric))
Out[40]:
#<NUMERIC-DATASET >
DIMENSIONS:  | Time | demand
TYPES:      NUMERIC | NUMERIC | NUMERIC
NUMBER OF DIMENSIONS: 3
NUMERIC DATA POINTS: 6 POINTS

Conclusion

The iPython notebook and source for this tutorial can be found in the clml.tutorials github repository (https://github.com/mmaul/clml.tutorials.git).

Stay tuned to the clml.tutorials blog or RSS feed for more CLML tutorials.