Command Line Machine Learning - GNU Octave
In my previous note, I compiled a C program that generates a dataset for a simple classification problem. More complex patterns can be obtained with this MATLAB/Octave function.
It's rather straightforward to turn this function into a script that I can call from the command line. I cloned this repository and then added the code below at the top and at the bottom of the generateData.m file.
#! /c/Users/Seve/workplace/octave-6.4.0-w64/mingw64/bin/octave-cli.exe -qf
% here's the path to the octave-cli.exe
% see https://tessarinseve.pythonanywhere.com/nws/2022-01-10.wiki.html
arg_list = argv ();
% the script needs exactly nine parameters; fewer would fail on arg_list{n}
if numel (arg_list) != 9
  disp ("Expected 9 input parameters");
  exit (1);
else
  % angles are passed as divisors of pi (see the generateData call below)
  angleMean    = str2num (arg_list{1});
  angleStd     = str2num (arg_list{2});
  numClusts    = str2num (arg_list{3});
  xClustAvgSep = str2num (arg_list{4});
  yClustAvgSep = str2num (arg_list{5});
  lengthMean   = str2num (arg_list{6});
  lengthStd    = str2num (arg_list{7});
  lateralStd   = str2num (arg_list{8});
  totalPoints  = str2num (arg_list{9});
endif
%generateData.m
...
[data cp idx] = generateData(pi/angleMean,pi/angleStd,numClusts,xClustAvgSep,yClustAvgSep, ...
lengthMean,lengthStd,lateralStd,totalPoints);
numpoints = sum(cp);
for i = 1:numpoints
  printf ("%f, %f, %d\n", data(i,1), data(i,2), idx(i));
endfor
The generated 2D clusters are written to standard output and piped to the csvgui.py script. The standard error stream, which carries any GNU Octave warnings, can be suppressed by redirecting it to the null device (/dev/null).
$ ./generateData.m 2 8 4 0.4 0.4 0.1 0.04 0.06 500 2>/dev/null | ./csvgui.py
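The csvgui.py script is not listed in this note. As a minimal sketch of the receiving end of the pipe, assuming only that the rows arrive as "x, y, index" text, the stream could be parsed as below. The function name parse_clusters is hypothetical, and the field names x1, x2 and c are borrowed from the plotting script at the end of this note.

```python
import csv
import io


def parse_clusters(stream):
    """Parse 'x, y, index' rows into (x1, x2, c) tuples."""
    rows = []
    # skipinitialspace drops the blank that follows each comma
    for record in csv.reader(stream, skipinitialspace=True):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        x1, x2, c = record
        rows.append((float(x1), float(x2), int(c)))
    return rows


if __name__ == "__main__":
    # illustrative values only, not real generateData.m output
    sample = io.StringIO("0.5, 0.25, 1\n1.5, 2.0, 3\n")
    for row in parse_clusters(sample):
        print(row)
```

In a real pipeline sys.stdin would take the place of the io.StringIO sample.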
PandasGui displays the data, which aggregate into four clusters, as shown below.
The last column, which holds the cluster index, can be filtered out with awk and the result saved to a comma-separated values file:
$ ./generateData.m 2 8 4 0.4 0.4 0.1 0.04 0.06 500 2>/dev/null |awk -F"," '{print $1", " $2}'>testmlpack.csv
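For comparison, the same column filter can be sketched in Python. This is only an illustration of what the awk one-liner does for three-column input; the function name drop_last_column is hypothetical.

```python
def drop_last_column(lines):
    """Yield each comma-separated line without its final field."""
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 2:
            continue  # nothing left to keep on blank or one-field lines
        yield ", ".join(fields[:-1])


if __name__ == "__main__":
    # illustrative rows in the "x, y, index" layout printed by the script
    for row in drop_last_column(["0.1, 0.2, 3\n", "0.4, 0.5, 1\n"]):
        print(row)
```

Unlike the awk command, which always prints the first two fields, this version drops whatever the last field is, so it gives the same result only when each row has exactly three columns.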
The number of centroids and their positions, unknown a priori, can be obtained from the command line with the mlpack_mean_shift binary. The -C option names the file that receives the centroids and -m caps the number of iterations:
$ ~/workplace/vimfastml/mlpack/mlpack_mean_shift.exe -i ./testmlpack.csv -C centroids.csv -m 5000 -v
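Mean shift does not need the number of clusters in advance: every point is moved repeatedly to the mean of its neighbours until it settles on a local density peak, and the distinct peaks become the centroids. The sketch below illustrates the idea with a flat kernel in plain Python. It is not mlpack's implementation, and the bandwidth parameter and the toy points are assumptions made for the example.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift; returns one centroid per detected mode."""
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            # neighbours within the bandwidth of the current estimate
            near = [(px, py) for px, py in points
                    if math.hypot(px - x, py - y) <= bandwidth]
            nx = sum(p[0] for p in near) / len(near)
            ny = sum(p[1] for p in near) / len(near)
            moved = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if moved < tol:
                break  # the estimate has stopped moving
        modes.append((x, y))
    # merge modes that ended up closer than one bandwidth to each other
    centroids = []
    for mx, my in modes:
        if all(math.hypot(mx - cx, my - cy) > bandwidth
               for cx, cy in centroids):
            centroids.append((mx, my))
    return centroids


if __name__ == "__main__":
    # two well-separated toy groups (assumed data)
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(mean_shift(pts, bandwidth=1.0))
```

The bandwidth controls how many centroids come out: a larger value merges neighbouring groups, a smaller one splits them.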
Finally, this script plots the centroids (large red dots) on top of each group of data points.
# filename: showcentroids.py
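import pandas as pd
import matplotlib.pyplot as plt

# data points: pickled DataFrame with the columns x1, x2 and c
df = pd.read_pickle("last.pkl")
# centroids written by mlpack_mean_shift, one per row, no header
centroids = pd.read_csv("./centroids.csv", header=None)
# scatter plot of the points, coloured by the column c
ax = df.plot.scatter(x="x1", y="x2", c="c", colormap="viridis")
# overlay the centroids as large red dots on the same axes
centroids.plot.scatter(ax=ax, x=0, y=1, s=75, c="r")
plt.show()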