Introducing ZetaMachina.

Machine Learning in R

What is machine learning? Wikipedia describes it this way: "Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence." In other words, it uses previously observed data to predict outcomes for new data. It can also help us uncover correlations between different events or variables in the world.

There are multiple types of machine learning; two of the most common are regression and classification. In both, we train a model on previously collected data whose outcomes are known (the training data) and then use that model to predict outcomes for the testing data (data whose outcomes the model has not seen). Regression is used when the output we want to predict is numerical. Classification is used when the output is categorical. In this post, we will focus mainly on classification.

We have built ZetaMachina, an implementation of classification prediction in R. This post does not show the full code; please check the GitHub repo for it.

In this project we created our own data frame of people as a demo.

We create a female data frame and a male data frame and then merge them. The combined data frame describes people with three attributes: race, hair color, and gender. The two categorical input variables are race and hair color; the output variable, gender, is also categorical.

female <- data.frame(race = femaleRace, hair = femaleHair, gender = "F")
# the male data frame is created the same way, with gender = "M" (not shown)
people <- rbind(female, male)
count <- 1:1000
ind <- sample(count, 800, replace = FALSE)  # 800 distinct row numbers for training
training <- people[ind, ]
testing <- people[-ind, ]                   # the remaining 200 rows

We have now created our people data frame, as well as the training data for our model and our testing data.
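
For readers more familiar with Python, the same 800/200 split can be sketched with NumPy. This only illustrates the sampling idea and is not part of ZetaMachina; the `rows` array is a hypothetical stand-in for the row numbers of the people data frame.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the split is reproducible

n_rows = 1000                    # total rows in the (hypothetical) people table
rows = np.arange(n_rows)         # stand-in for the data frame's row numbers

# Draw 800 distinct row numbers for training (sampling without replacement).
train_idx = rng.choice(rows, size=800, replace=False)

# Every row that was not drawn goes to the test set.
test_idx = np.setdiff1d(rows, train_idx)

print(len(train_idx), len(test_idx))
```

Sampling without replacement matters here: it guarantees that no row appears in both the training and the testing set.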

We then create our model using ZetaNaiveBayes. Naive Bayes assumes that the input variables are independent of one another given the class.
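
To make the independence assumption concrete, here is a minimal sketch of a categorical Naive Bayes classifier in plain Python. It is not ZetaMachina's implementation: the function names `fit_naive_bayes` and `predict_one`, the add-one smoothing, and the tiny data set are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies within each class."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # values_seen[feature_index] -> set of distinct values of that feature
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict_one(model, row):
    """Return the label with the highest (unnormalized) posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(label)
        for i, value in enumerate(row):
            # Independence assumption: multiply the per-feature likelihoods.
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = value_counts[(i, label)][value]
            score *= (count + 1) / (n_label + len(values_seen[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up data: (race, hair color) -> gender.
rows = [("A", "black"), ("A", "brown"), ("B", "black"),
        ("B", "blond"), ("A", "blond"), ("B", "blond")]
labels = ["M", "M", "M", "F", "F", "F"]

model = fit_naive_bayes(rows, labels)
print(predict_one(model, ("A", "black")))
print(predict_one(model, ("B", "blond")))
```

Because each feature contributes its own factor to the score, the model never needs to count how often a particular combination of race and hair color occurs, which keeps training simple even with few examples.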

# Arguments reconstructed from the original listing; see the GitHub repo for the exact signature.
model <- ZetaNaiveBayes(training, c("race", "hair"), "gender")

Now we create a prediction based on the model and our testing data.

prediction <- ZetaPredict(model, testing)
prediction <- factor(prediction, levels = c("M", "F"))

If we print out the result we get our prediction:
#console
  [1] M M M M F M M M M M M M M M M M M M M M M F M F M M F M M M F M M M M F M M M M M M M M F M M M M M F M M M M M
 [57] M M M M M M M M M M M M M M M F M M M M M M M M M M M M F M F M M M M M M M M M M M M M F M F F F M F F F M M F
[113] F F F F F F M F M M F F F F M F F F F F F M F F M M F F F F F F M F F F M F F F M F F F F F F M F F F F F M F F
[169] F F F F M F F F M F F F M F F M M F F F F M M F F M F M F F F F
Levels: M F

If you enjoyed this project, visit our other projects on our GitHub Page

NumPy is a fundamental Python package for scientific computing. It provides n-dimensional arrays, element-wise operations, broadcasting, linear algebra routines, random number generation, and more. This tutorial walks through some basic examples of those features.

Create An Array with NumPy

In [26]:

import numpy as np  # import the NumPy library
a = np.array([1, 2, 3, 4, 5])
print(a)
print(a * 2)  # element-wise: every entry is doubled

Out[26]:

[1 2 3 4 5]
[ 2  4  6  8 10]

Create A One-Dimensional Array

In [27]:

a = np.arange(20)  # one dimension
print(a)
print(a.shape)
print(a.ndim)
print(a.dtype)

Out[27]:

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
(20,)
1
int32

Create A Two-Dimensional Array

In [28]:

a = np.array([[1, 2, 3], [4, 5, 6]])  # two dimensions
print(a)
print(a.shape)
print(a.ndim)
print(a.dtype)

Out[28]:

[[1 2 3]
 [4 5 6]]
(2, 3)
2
int32

Create A Three-Dimensional Array

In [29]:

a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])  # three dimensions
print(a)
print(a.shape)
print(a.ndim)
print(a.dtype)

Out[29]:

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
(2, 2, 3)
3
int32
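
The dtype reported by these cells depends on the platform and NumPy version (the default integer type may be int32 or int64). As a small sketch, the dtype can also be chosen explicitly, and arrays of a given shape can be created without listing every value:

```python
import numpy as np

zeros = np.zeros((2, 3))                      # 2x3 array filled with 0.0
ones = np.ones((2, 3), dtype=np.int64)        # 2x3 array of integer ones
as_float = np.array([1, 2, 3], dtype=float)   # force floating-point storage

print(zeros.shape, zeros.dtype)
print(ones.dtype)
print(as_float)
```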

How to Generate an Array from a Sequence

In [21]:

print(np.arange(20, 30, 2))   # evenly spaced integers from 20 up to (but not including) 30, in steps of 2
print(np.linspace(0, 1, 20))  # 20 evenly spaced floats from 0 to 1, both endpoints included
print(np.random.rand(5))      # 5 random floats drawn uniformly from [0, 1)

Out[21]:

[20 22 24 26 28]
[ 0.          0.05263158  0.10526316  0.15789474  0.21052632  0.26315789
  0.31578947  0.36842105  0.42105263  0.47368421  0.52631579  0.57894737
  0.63157895  0.68421053  0.73684211  0.78947368  0.84210526  0.89473684
  0.94736842  1.        ]
[ 0.05481854  0.83857248  0.31190602  0.30903261  0.17243025]
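
Two details of the cell above are easy to miss, so here is a short sketch of them: `np.linspace` includes both endpoints, which makes the spacing 1 / (20 - 1), and random draws change on every run unless a seed is fixed. The seed value 42 is arbitrary.

```python
import numpy as np

points = np.linspace(0, 1, 20)
spacing = points[1] - points[0]
print(spacing)                     # 1 / (20 - 1), because both endpoints count

rng = np.random.default_rng(42)    # seeded generator for reproducible draws
draws = rng.random(5)              # five floats in the half-open interval [0, 1)
print(draws)
```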

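Indexing and Slicing

Indexing and slicing are very important when dealing with large amounts of data. They let you cut an array down to just the part of the data you want to analyze. In the example below, a[0:2, 0:2, 2] takes both blocks and both rows of the three-dimensional array a created earlier and keeps only the element at index 2 of each row.

In [39]:

print(a)
print(a[0:2, 0:2, 2])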

Out[39]:

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
[[ 3  6]
 [ 9 12]]
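
A few more slicing forms, sketched on an array with the same values as the three-dimensional one above (rebuilt here so the snippet runs on its own):

```python
import numpy as np

a = np.arange(1, 13).reshape(2, 2, 3)  # same values as the 3-D array above

first_block = a[0]          # the first 2x3 block
last_column = a[:, :, -1]   # last element of every row, same as a[0:2, 0:2, 2]
every_other = a[0, 0, ::2]  # every second element of the first row

print(first_block)
print(last_column)
print(every_other)
```

A bare `:` selects everything along an axis, a negative index counts from the end, and a third slice component sets the step.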

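Manipulating Array Shape

Changing an array's shape is useful when it has to be combined with other multi-dimensional arrays whose dimensions must match. Here a copy of the three-dimensional array is flattened into 12 elements.

In [54]:

b = np.copy(a)  # copy first so that a keeps its original shape
b.shape = (12,)
print(b)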

Out[54]:

[ 1  2  3  4  5  6  7  8  9 10 11 12]
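
As an alternative to assigning to `.shape`, the `reshape` method returns a reshaped array and leaves the original untouched. A minimal sketch, rebuilding the same 12 values so it runs on its own:

```python
import numpy as np

a = np.arange(1, 13).reshape(2, 2, 3)

flat = a.reshape(-1)     # -1 lets NumPy infer the length (12 here)
grid = a.reshape(3, 4)   # the same 12 values laid out as 3 rows of 4

print(flat.shape)
print(grid)
print(a.shape)           # a itself still has its original shape
```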

Boolean Masking

Boolean masking filters an array by a condition. In this case "b" is an array of boolean values that are True wherever the corresponding value of "a" is a multiple of 5. Indexing "a" with "b" then returns only those values.

In [72]:

a = np.arange(0, 105) + 1
b = a % 5 == 0
print(a[b])

Out[72]:

[  5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90
  95 100 105]
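
Masks can also be combined. The sketch below keeps the multiples of 5 that are also greater than 90, and counts how many multiples of 5 there are in total:

```python
import numpy as np

a = np.arange(1, 106)

# Parentheses are required: & binds more tightly than the comparisons.
mask = (a % 5 == 0) & (a > 90)
selected = a[mask]
print(selected)

# np.count_nonzero counts the True entries in a mask.
n_multiples = np.count_nonzero(a % 5 == 0)
print(n_multiples)
```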

Element-wise Operations

Element-wise operations apply an operation to each pair of corresponding elements of two arrays at once, without an explicit Python loop.

In [77]:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)

Out[77]:

[5 7 9]
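
The same idea extends to other operators and, through broadcasting, to operands of different shapes. A short sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

product = a * b    # element-wise product, not a dot product
squares = a ** 2   # each element squared
shifted = a + 10   # the scalar 10 is broadcast to every element

m = np.array([[1, 2, 3], [4, 5, 6]])
row_sum = m + a    # the 1-D array a is broadcast across both rows of m

print(product)
print(squares)
print(shifted)
print(row_sum)
```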

Matrix Multiplication

In [78]:

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8], [9, 10], [11, 12]])
a.dot(b)

Out[78]:

array([[ 58,  64],
       [139, 154]])
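
For two-dimensional arrays the `@` operator gives the same result as `dot`. The sketch below also checks one entry by hand: each entry of the product is a row of a multiplied by a column of b and summed.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
b = np.array([[7, 8], [9, 10], [11, 12]])   # shape (3, 2)

product = a @ b    # equivalent to a.dot(b) for 2-D arrays
print(product.shape)

# Row 0 of a times column 0 of b.
top_left = 1 * 7 + 2 * 9 + 3 * 11
print(top_left == product[0, 0])
```

The inner dimensions must agree: a (2, 3) array times a (3, 2) array yields a (2, 2) result.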

Logical Operations

In [86]:

c = a > 2
d = a > 8
print(c)
print(d)
print(np.logical_and(c, d))

Out[86]:

[[False False  True]
 [ True  True  True]]
[[False False False]
 [False False False]]
[[False False False]
 [False False False]]
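
The other logical functions work the same way. A brief sketch on the same two masks (the array is rebuilt so the snippet runs on its own):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
c = a > 2
d = a > 8

either = np.logical_or(c, d)   # True where at least one condition holds
negated = np.logical_not(c)    # flips every entry of c

print(either)
print(negated)
print(c.any(), d.any())        # is at least one entry True?
print(c.all())                 # are all entries True?
```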

Basic Reductions

In [106]:

a.shape = (6,)  # one dimension
print(a)
print(np.sum(a))
print(np.mean(a))
print(np.std(a))
print(np.size(a))

Out[106]:

[1 2 3 4 5 6]
21
3.5
1.70782512766
6
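
Note that `np.std` computes the population standard deviation by default. As a sketch, the sample version and a few other common reductions look like this:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

population_std = np.std(a)       # divides by n (ddof=0, the default)
sample_std = np.std(a, ddof=1)   # divides by n - 1

print(population_std)
print(sample_std)
print(a.min(), a.max())          # smallest and largest element
print(np.argmax(a))              # index of the largest element
```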

In [125]:

a = np.arange(0, 105) + 1
a.shape = (3, 5, 7)
print(a)
np.sum(np.sum(a, axis=2), axis=0)  # sum over the last axis, then over the first

Out[125]:

[[[  1   2   3   4   5   6   7]
  [  8   9  10  11  12  13  14]
  [ 15  16  17  18  19  20  21]
  [ 22  23  24  25  26  27  28]
  [ 29  30  31  32  33  34  35]]

 [[ 36  37  38  39  40  41  42]
  [ 43  44  45  46  47  48  49]
  [ 50  51  52  53  54  55  56]
  [ 57  58  59  60  61  62  63]
  [ 64  65  66  67  68  69  70]]

 [[ 71  72  73  74  75  76  77]
  [ 78  79  80  81  82  83  84]
  [ 85  86  87  88  89  90  91]
  [ 92  93  94  95  96  97  98]
  [ 99 100 101 102 103 104 105]]]

array([ 819,  966, 1113, 1260, 1407])

In [126]:

sum(list(range(1, 8)) + list(range(36, 43)) + list(range(71, 78)))  # first entry of the result above, computed by hand

Out[126]:

819
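
The two nested `np.sum` calls can also be written as a single reduction over a tuple of axes. A minimal sketch that rebuilds the same 3 x 5 x 7 array and checks that both forms agree:

```python
import numpy as np

a = np.arange(1, 106).reshape(3, 5, 7)

nested = np.sum(np.sum(a, axis=2), axis=0)   # the two-step version
combined = a.sum(axis=(0, 2))                # both axes in a single call

print(combined)
print(np.array_equal(nested, combined))
```

Only axis 1 (of length 5) survives the reduction, which is why the result has five entries.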

The examples above cover the basics of working with the NumPy library. Its vectorized array operations can make numerical code on large data sets considerably more efficient than equivalent pure-Python loops. For more information on NumPy, see the NumPy documentation.

Programming Fundamentals

Sunday, May 22, 2016

ZetaMachina - Easy Machine Learning in R