Delphi Fast Base64 encoding/decoding


The library uses AVX2 instructions to speed up encoding and decoding. It was benchmarked against Indy, the Delphi reference implementation, and achieved a speedup of up to 10 times.
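For orientation, the scalar algorithm that such a library accelerates fits in a few lines. The Python sketch below is not the library's code or API (the name `b64_encode` is made up for illustration); it only shows the mapping of 3 input bytes to 4 output characters that a vectorized implementation applies to many bytes at once.

```python
# Scalar Base64 encoder sketch (illustration only, not the library's API).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)
        # Pack up to three bytes into one 24-bit group (zero-filled at the end).
        group = int.from_bytes(chunk + b"\x00" * pad, "big")
        # Split the group into four 6-bit indices, most significant first.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that only encode fill bytes are replaced by '='.
        if pad:
            chars[4 - pad:] = "=" * pad
        out.append("".join(chars))
    return "".join(out)
```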

Delphi artificial intelligence library


* Added flags that enable multithreaded routines to speed up the SVM calculations.
* Extended the test app to reflect that.


bugfixes for the SVM classifier and the threaded decision stump learner.

updated some unit names (Fischer -> Fisher) and extended the decision tree classifier by a threaded model (+bugfixes)

updated quite a few confidence calculations

* Added confidence calculations for SVM, neural nets, LDA, naive Bayes, kmeans, RBF
* The test application now provides the option to show the confidence map by coloring the ground

updates to include the changed interface in the mrMath library

* SVD output is now different - it returns V transposed.
* The test routine now also shows the usage of LDA for simple data sets.

updates on the neural network classifier

The momentum approach has been added to the neural network learner. In addition a certain percentage of the training set can now be selected to be a distinct validation set. This functionality has been moved into the base class so it may be used for other custom algorithms as well.
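As a rough illustration of the two ideas mentioned here, the following Python sketch shows a classical momentum update and a random hold-out split. It is not taken from the library: the function names, the default learning rate and momentum factor, and the use of a single scalar weight are all assumptions chosen to keep the example small.

```python
import random


def train_with_momentum(grad, w0, lr=0.1, momentum=0.9, steps=300):
    """Gradient descent with a classical momentum term on one weight."""
    w = w0
    velocity = 0.0
    for _ in range(steps):
        # Keep a fraction of the previous step and add the new gradient step.
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w


def split_validation(samples, fraction, seed=0):
    """Set aside a random fraction of the samples as a validation set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * fraction))
    # Returns (training set, validation set).
    return shuffled[n_val:], shuffled[:n_val]


# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = train_with_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
train_set, validation_set = split_validation(range(10), fraction=0.2)
```

The validation set is typically used to monitor the error on unseen data during training, for example to stop early; whether the library uses it that way is not stated above.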

new neural network classifier

A simple feed forward neural network has been added to the AI library which includes linear, tanh and exponential neuron activation. For the learning step a simple backpropagation algorithm has been added.

The library now utilizes the new random engine provided by the mrMath library

new kmeans classifier

The kmeans classifier features normal or median update steps as well as kmeans++ initial center search.
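The kmeans++ initial center search can be sketched as follows. This is a generic Python version of the published seeding scheme, not the library's implementation; `kmeans_pp_init` and `sq_dist` are hypothetical names, and the points are assumed to be distinct tuples of numbers.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with the kmeans++ seeding scheme."""
    rng = random.Random(seed)
    # The first center is drawn uniformly from the data.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance of every point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        # Draw the next center with probability proportional to d2, so
        # points far away from the existing centers are preferred.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0), (9.8, 0.3)]
initial_centers = kmeans_pp_init(data, k=3)
```

Already chosen points have distance zero and therefore zero weight, so they are never drawn twice.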

Extended the Library by a Radial Basis Function classifier

The Radial Basis Classifier features different kernels (Gauss, Quad, Inverse Multiquad, Multiquad) as well as different radial basis extractors. The centers are either a randomly selected subset of the learning set or the mean/median of the class centers.

The first version of the Delphi AI library includes:

* Fisher Linear Discriminant Analysis (FLDA) with a face recognition example
* FLDA in two different robust (against outlying pixels or areas) flavors.
* Incremental FLDA
* C4.5 learning algorithm
* Naive Bayes
* Decision Stumps
* Support Vector Machines with Least Squares and Lagrangian Learning
* Ensemble Classification Algorithms: AdaBoost, GentleBoost and Bagging
* All the above classifiers may be used in the Ensemble classification tasks.
* Feature Extractors: Haar1D, Haar2D and the Integral Image approach.
* A simple version of the Viola Jones Face detection algorithm is available as a unit test.
* Persistence for all classifiers.

You can download the library from the download section or fetch it from GitHub.
That's all for now - if you like the library you may of course drop a donation to the author ;)

Delphi image processing library

active appearance models framework

* Active Appearance Models based on the paper of Cootes et al.
* Pyramid approach for better stability
* RGB Color models are optional
* A simple image annotation tool included.
* Testing application to show how to use and build AAMs.
* Linear and thin plate splines image warping.
* Models can be saved and loaded from files.

adjustments for the mrAI library

I needed to make a few more adjustments to fit the needs of the mrAI library. There is also a neat image filter class.

A new image processing library has arrived

The new image processing library adds a few small but nice features to the mrMath mathematical library. It includes a simple conversion routine that takes any input graphic format known to Delphi's TPicture and converts it from RGB to either grayscale images or RGB planes in mrMath's matrix format/class (and vice versa).
Based on that format the library implements a few nice image transformation algorithms, e.g. to allow warping images. You can choose between a simple triangulation-based warping method and a more sophisticated method based on Thin Plate Splines, which generally creates smoother warping results but needs more computational power. Note that there is a 32-bit assembler implementation of the triangle-based warping method, which makes it lightning fast!
As a neat add-on there is a Delaunay triangulation class and a class for Procrustes analysis, which is helpful for determining the global affine transformation properties from one point cloud to another.
To get a feeling for the library's capabilities, check out the attached unit test project, which nicely shows most of the implemented features.

Delphi matrix library

New rolling median functions

Besides bugfixes, there is a new rolling median function that works like a moving average but uses the median instead of the mean.
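To show why a rolling median can be preferable to a moving average, here is a small Python sketch. It is only an illustration: the function names are hypothetical, and the edge handling (only full windows are evaluated) is an assumption that may differ from the library's behaviour.

```python
from statistics import mean, median


def rolling_median(values, window):
    """Median of every full window of the given length."""
    return [median(values[i:i + window])
            for i in range(len(values) - window + 1)]


def rolling_mean(values, window):
    """Moving average over the same windows, for comparison."""
    return [mean(values[i:i + window])
            for i in range(len(values) - window + 1)]


signal = [1, 1, 100, 1, 1]            # one outlier in otherwise flat data
smoothed_median = rolling_median(signal, 3)
smoothed_mean = rolling_mean(signal, 3)
```

With a window of 3 the median ignores the single outlier completely, while every window of the moving average is pulled towards it.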

changes in the symmetric Eigenvalue problem

The routines were changed such that the block-wise LAPACK algorithm is now used, including SSE/AVX optimized routines and multithreaded operations that speed up the calculations.


A new routine for Hessenberg decomposition is available as well as a ChaCha based random number generator.

* Hessenberg decomposition using optimized matrix operations
* Multithreaded version of that decomposition
* 3 versions of ChaCha (Salsa20): pure Pascal, SSE, AVX. Note that the AVX version needs an AVX2 compatible processor and the SSE version needs an SSE3 compatible one.
* The AVX version can create 2 ChaCha matrices in one go.
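The core building block of ChaCha is the quarter round, which mixes four 32-bit words of the 4x4 state matrix using only additions, XORs and rotations. The Python sketch below follows the public ChaCha specification (RFC 7539) and is not the library's Pascal or assembler code; the function names are made up for illustration. The values at the end are the quarter round test vector from that RFC.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """One ChaCha quarter round on four 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 7539, section 2.1.1.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
```

Because every step works on independent 32-bit lanes, several quarter rounds (or several state matrices) can be processed side by side in SIMD registers, which is the idea behind SSE and AVX variants.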

New subspace method: Singular spectrum analysis

In conjunction with the new method there are a few new sub matrix selector routines available.

A new thread pool model based on IO Completion Ports

added tsne algorithm for matrices

The library now includes the base (exact) t-Distributed Stochastic Neighbor Embedding algorithm in an extra class.

added B-Spline support and expectation maximization

* Added a new class that calculates B-splines in a robust way.
* Periodic extension support
* Least squares approach: the input may be overdetermined.
* Added the expectation maximization algorithm.
* Added demo applications for EM and spline algorithms.

new assembler optimization -> FMA instruction set support

* Where appropriate I added FMA (Fused Multiply and Add) support. Actually on all multiplication routines.

new assembler optimization -> AVX instruction set support

* All SSE optimized functions have been converted to AVX routines with further optimizations.
* AVX is originally implemented on FPC and converted by an automatic process to Delphi by converting AVX opcodes to DB instructions.
* Added a new project that does the automatic conversion.

a new global subspace method has been added: Partial Least Squares

Thanks to Gustav Kaiser, who actually implemented the algorithm, the library now includes an algorithm for partial least squares.
* Projection function to map new examples to the output space, i.e. for regression tasks
* Persistence

added a debugger visualizer

For all Delphi 2010 (and later) users out there: there is now a debugger visualizer available, so check out the download area. It allows you to:
* examine some properties of the evaluated matrix
* examine and edit the matrix data in a grid (up to 65536x65536)
* colored visualization of submatrices
* configurable appearance

new functions added

* A new least squares solver based on QR decomposition
* Auxiliary functions: diff and cumulative sum, row- and column-wise
* Repeat matrix

speed update and bug fixes

* SVD speed improvement of about 40% due to assembler optimized plane rotations
* Most of the routines use some working memory via getMemory. This memory could have been used before and left in an uninitialized state. This is not very troublesome, but if this previously used memory contains NaN values, the QR, Cholesky, LU and SVD decompositions may raise exceptions.
* Implemented a fast mem alloc (init to 0)

panel update and multithreaded SVD

* Blockwise SVD
* SVD now is able to use multithreaded QR, Mult and matrix rotations routines.
* SVD interface changed: The V matrix is now returned as V transposed.
* Updated the simple thread pool and removed the suspend and resume calls. Events seem to be way faster in waking a thread than the resume method.
* Revisited the multithreaded calling semantics. The current thread also executes one block so we spare one thread wake up and the suspension of the current thread.

matrix sorting

* Quicksort algorithm to sort matrices column or row wise
* Multithreaded version of that which splits the matrix into independent columns or rows and sorts these
* New "TakeOver" routine that grabs the memory from a different matrix and clears that. This avoids quite a few matrix copying calls

block based cholesky decomposition

* new block based cholesky decomposition implementation.
* threaded cholesky decomposition (matrix multiplication step).
* the recursive and single line implementations have also been implemented for comparison.

random number generation

* Mersenne Twister random number generation.
* Thread-safe standard Delphi random number generator (got rid of the global RandSeed variable).
* Intel RDRAND assembler instruction.
* Operating system supported random number generation via the CryptGenRandom (win32/win64) function and an (untested) version that reads a file stream from /dev/random when compiled for OSX.
* New utility functions for least common multiple and greatest common divisor

new subspace method: independent component analysis

* The implementation is based on the matlab fast ica package including pca preprocessing.
* Some new matrix initialization functions (random init)
* One new assembler optimized matrix multiplication that speeds up the QR decomposition a bit.

new features

* Vector access property
* Column or row wise Median (one vector is processed in O(n) time!)
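A median in linear time is usually obtained with a selection algorithm instead of a full sort. The Python sketch below uses quickselect, which runs in expected O(n) time; which selection algorithm the library actually uses is not stated here, and the function names are hypothetical.

```python
import random


def quickselect(values, k, seed=0):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    rng = random.Random(seed)
    items = list(values)
    while True:
        pivot = rng.choice(items)
        lower = [v for v in items if v < pivot]
        n_equal = sum(1 for v in items if v == pivot)
        if k < len(lower):
            # The wanted element is among the smaller values.
            items = lower
        elif k < len(lower) + n_equal:
            return pivot
        else:
            # Skip the smaller and equal values and search the rest.
            k -= len(lower) + n_equal
            items = [v for v in items if v > pivot]


def median_of_vector(values):
    """Median of one row or column without sorting it completely."""
    n = len(values)
    if n % 2:
        return quickselect(values, n // 2)
    return 0.5 * (quickselect(values, n // 2 - 1) + quickselect(values, n // 2))
```

Each pass discards a part of the data, so on average the total work stays proportional to the vector length, whereas sorting costs O(n log n).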

full cache oblivious QR decomposition

* Block wise Q and R generation from the economy size qr matrix
* Threaded version of Q and R generation.
* Added a lot of QR test cases also for asymmetric input matrices.
* The resulting class type is now always the same as that of the object on which the operation took place.
* Changed an NNMF test case such that no floating point exception occurs any more.

new global subspace method added: non-negative matrix factorization

* Implementation of an iterative update scheme which leads to the NNMF matrices.
* Schemes are: Alternating Least Squares, Divergence Updates and Euclidean updates.
* Since the NNMF is not unique and also depends on the randomized input you see different matrices H and W in the tests.
* Optimized the memory handling in the blocked matrix/vector multiplication routine.
* Fixed a problem on platforms with many CPUs (> 10) where the break criterion was met too early, leading to an incomplete result.

new cache oblivious QR decomposition

* Implementation of LAPACK's dgeqrf and dgeqr2 functions.
* Multithreaded version where the big matrix updates use more cores.
* Fixed a problem in the variance calculation on Delphi XE6 - the constants are unaligned.
* The x64 stack juggling now always uses movupd for saving and storing xmm registers.
* Some code cleaning - removed non referenced units.

Adjustments for the mrAI library

* Made a few functions public for the mrAI library

New functions

* Added variance and standard deviation calculation (row or column wise) to the matrix class
* Implemented these as SSE3 optimized function (x86 and x64)
* Some tidy up work (removed unnecessary unit references).

freepascal x64 compatibility

* Freepascal compatibility for x64 has been added and tested with Lazarus 1.2.0
* Fixed a few unit testing problems with wrongly used indices.
* Rewrote x64 stackhandling to mimic the .savemm and .pushenv Delphi x64 asm pseudoinstructions.
* Delphi x64 seems to not differentiate between param names and the used registers. FPC does.
* Found a few SegFaults due to the different handling of Delphi and FPC memory allocations and checks.

freepascal compatibility

* For 32 bit targets the library is now working with FreePascal (Lazarus).
* A unit test project was added which is runnable with the latest Lazarus package.
* A memory leak has been detected and fixed in the threaded LU decomposition version.
* Reduced the threaded LU unit test size since FreePascal seems to do floating point a bit differently than Delphi. The accuracy seems to be a bit too low for very large matrices.
* The matrix size can now be set with one call to "SetWidthHeight" in the matrix class (made public).
* The initialization of some arrays is now explicitly done. FreePascal seems to be a bit allergic to these arrays.

New Subspace Method: CCA

* A new class TCCA is available in the matrix library which implements the Canonical Correlation between two matrices utilizing a SVD approach.
* The matrix class has been extended with an elementwise ABS (absolute value) and an elementwise addition.
* Instead of asserts the matrix class will now always throw an error on bad input.

MacOS support

* Thanks to William Cantrall there is now support for the MacOS platform.
* Rewrote the thread pooling code to make it compatible with both platforms.
* Windows now supports either native Win thread pools or the original home-brewed one.
* Some minor changes within the assembler routine interfaces to use more nativeInt declarations instead of integer.

Incremental PCA

* There are now two classes to calculate the PCA incrementally according to a paper of Danijel Skocaj: "Robust Subspace Approaches to Visual Learning and Recognition". I also implemented a fast robust version of that which is basically the same as used in the standard PCA calculation.
* Unfortunately the threaded implementations weren't fully fit for more than 2 cores. On my new testing PC I ran into a few issues there and fixed them in this library.
* There were quite a few Pascal implementations in the ASMMatrixOperations.pas unit - these were moved to the SimpleMatrixOperations.pas unit.
* The unit ASMConsts.pas was deleted and its content was moved to MatrixConst.pas


* The multiplication of x*y' (transposed) could cause an AV if the resulting matrix width is 1. The transposed matrix multiplication is also used in the normal matrix multiplication.
* The elementwise matrix multiplication can fail on x64 systems for odd matrix widths since the mulpd operation does not support unaligned direct memory accesses.

Bugfixes for the mrMath library

* A problem was found in the element wise multiplication. In case of an odd width the last element was added instead of multiplied.
* Added some unit tests which check for this problem.
* The matrix thread pool initializes now with less threads but these threads are created in the constructor.
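The odd-width problem described above is typical for routines that process two doubles per step: the main loop handles pairs and a separate tail step handles a leftover element. The following Python sketch mimics that structure and includes the kind of check a unit test would make. It is a hypothetical illustration, not the library's assembler code.

```python
def elementwise_mul_row(x, y):
    """Multiply two rows element by element, two elements per loop step."""
    assert len(x) == len(y)
    n = len(x)
    out = [0.0] * n
    # Main loop: handles the elements in pairs, similar to one packed
    # multiply on two doubles in an SSE register.
    for i in range(0, n - 1, 2):
        out[i] = x[i] * y[i]
        out[i + 1] = x[i + 1] * y[i + 1]
    # Tail: with an odd width one element is left over. It has to be
    # multiplied as well (the bug described above added it instead).
    if n % 2:
        out[n - 1] = x[n - 1] * y[n - 1]
    return out


# Check odd and even widths against the straightforward definition.
for width in range(0, 8):
    row_x = [float(i + 2) for i in range(width)]
    row_y = [float(2 * i + 5) for i in range(width)]
    assert elementwise_mul_row(row_x, row_y) == [a * b for a, b in zip(row_x, row_y)]
```

Test values where the sum and the product differ, as used here, make sure that an accidental addition in the tail is actually detected.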

Recursive LU decomposition algorithm

* Changed the core algorithm for calculating the LU decomposition (check InternalRecursiveMatrixLUDecompInPlace in LinearAlgebraicEquations). Original algorithm from here.
* This affected the outcome of: Matrix inversion, Matrix determinant, Matrix solve
* New row exchange methods.
* Multithreaded version of the LU decomposition algorithm.
* Fixed a problem in the 64bit assembler version of matrixScaleAdd
* Extended the persistence interface a bit.

Bugfix update and a new Non Linear Fitting class

There have been problems in:
ASMMatrixelementWiseMultOperations.pas - An assert popped up although everything was fine.
ASMMatrixNormOperations.pas - Fixed problems with larger matrices. Some elements were wrongly scaled.

New Features:
Matrix.pas - A new cloning function was introduced.
NonLinearFit.pas - A non-linear fitting class based on the "Levenberg-Marquardt" algorithm. The algorithm is basically a gradient descent algorithm where the gradient is estimated from the data points themselves - a calculated gradient function is missing.

New Features for the Delphi Matrix Library

There is a new update available for the matrix library. I'm proud to announce that there is a new class for Principal Component Analysis available, featuring even a fast robust implementation when mapping e.g. an image into the PCA feature space. This algorithm checks for outlying pixels (e.g. occlusion) which can completely disrupt the result when mapping into the feature space. Note that this algorithm can handle up to 40 to 50% of occluded regions and is still fast. The test project was updated and shows the handling of images and matrices.
Another step forward is the implementation of a current BLAS3 algorithm in the matrix multiplication, basically implementing the Strassen multiplication algorithm. This algorithm can be faster than a conventional algorithm in most cases - up to 20% on large multiplications - but can also suffer from round-off errors in cases of badly scaled matrices. The library thus allows you to choose between a conventional and a fast multiplication.
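The idea behind the Strassen algorithm can be shown on a single 2x2 product: it needs 7 multiplications instead of 8, at the price of extra additions and subtractions. The Python sketch below works on scalars and is only an illustration with a hypothetical function name; in a real implementation the entries are matrix blocks and the scheme is applied recursively, which is not shown here.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices with 7 instead of 8 multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


def naive_2x2(a, b):
    """Conventional product for comparison (8 multiplications)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


product = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The additional sums and differences of entries are also where the round-off sensitivity comes from: with badly scaled matrices, terms of very different magnitude are added and subtracted before the result is formed.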

New Delphi Matrix Library

This library contains numerous hand-optimized assembler base matrix functions as well as many higher order functions like singular value decomposition, matrix inversion, pseudo inversion, LU decomposition and many others. Note that even x64 assembler code is available to grant maximum compatibility. To install the downloaded package simply compile the attached package and add the folder to the library path. You can find a few examples on how to use the matrix class (or the plain functions) in the test project.
I'm open for ideas to enhance this library as much as possible so feel free to contact me.

The library is released under the Apache license meaning that it may also be integrated into commercial products.

If you think that the software is useful and want to support the development then feel free to make a donation.

Update: The matrix class had a bug in some of the 64bit pointer arithmetic code. This has been fixed.