
Scilab matrix norm optimization

I had used Scilab while learning Digital Image Processing. It is a good open-source alternative to Matlab. Although it has a solid foundation, it was clear that there was still room for improvement.

At a recent image processing workshop, I learnt that the Scilab community had open-sourced the code on GitHub.

Keeping in mind that the software is continually evolving, I knew I could put my optimization skills to good use.

So, I dug into the codebase and found a few programs that could be optimized. I decided to introduce parallelism into the matrix norm calculation routine, which is quite simple yet is used by many other subprograms and external function calls.

https://github.com/opencollab/scilab/blob/master/scilab/modules/linear_algebra/src/c/norm.c

There were many potential areas for optimization:
  • Elimination of redundant comparisons
  • Use of the else-if construct to remove redundant checks
  • Removal of floating-point equality comparisons
  • An embarrassingly parallel (perfectly parallel) for loop with reductions (see the sketch after this list)
  • Use of min/max reductions on for loops, supported from OpenMP 3.1 onwards
  • Moving loop invariants outside the loops and reusing variables
  • Factoring the FREE call (Scilab's version of the C memory deallocation function free) out of conditional statements
  • Intermediate variables to store repeatedly computed values
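
To make the reduction and loop-invariant points concrete, here is a minimal sketch (not the actual norm.c code; the function name and the Frobenius-style norm are just for illustration) of a for loop parallelized with an OpenMP '+' reduction:

    #include <math.h>

    /* A minimal sketch, not the actual Scilab norm.c: the Frobenius-style
       norm of an n-element array. The sum is accumulated with an OpenMP
       '+' reduction, and the square root is taken once, outside the loop. */
    static double frobenius_norm_sketch(const double *x, int n)
    {
        double sum = 0.0;
        int i;

        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++)
        {
            sum += x[i] * x[i];   /* each thread accumulates a private partial sum */
        }

        return sqrt(sum);         /* loop-invariant work stays outside the loop */
    }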
The following link leads to the optimized version of norm:

https://github.com/varun-manjunath/scilab/blob/master/scilab/modules/linear_algebra/src/c/norm.c


I'll send a pull request to the Scilab repository soon.
Update: I've issued the pull request!

A floating-point equality comparison can easily become a harbinger of erroneous code. It is not portable, because the precision and representation of floating-point numbers depend on the hardware.

http://stackoverflow.com/questions/19837576/comparing-floating-point-number-to-zero
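
As a hedged illustration (the variable name and the tolerance below are mine, not taken from norm.c), an equality test against zero can be replaced with a tolerance-based check:

    #include <math.h>
    #include <float.h>

    /* A minimal sketch: instead of 'if (value == 0.0)', compare the magnitude
       against a small tolerance. The DBL_EPSILON threshold is an illustrative
       choice; the right tolerance depends on the computation. */
    static int is_effectively_zero(double value)
    {
        return fabs(value) < DBL_EPSILON;
    }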

I've also extensively used OpenMP reductions, especially with the + operator. The min reduction has also been used. Here is a link explaining min and max reductions with for loops.

http://www.techdarting.com/2013/06/openmp-min-max-reduction-code.html
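
For completeness, here is a small sketch of how such a reduction looks in C with OpenMP 3.1 or later (the function and the infinity-norm example are illustrative, not lifted from the Scilab sources):

    #include <math.h>

    /* A minimal sketch of an OpenMP 'max' reduction (OpenMP 3.1 onwards):
       the infinity norm of an n-element array, i.e. its largest absolute
       value. A 'min' reduction is written the same way with 'min'. */
    static double infinity_norm_sketch(const double *x, int n)
    {
        double maxval = 0.0;
        int i;

        #pragma omp parallel for reduction(max:maxval)
        for (i = 0; i < n; i++)
        {
            double a = fabs(x[i]);
            if (a > maxval)
            {
                maxval = a;
            }
        }

        return maxval;
    }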

There were some pieces of code whose utility perplexed me: idempotent equality comparisons and trivial integral divisions. I have left them unchanged, on the assumption that there is some fine detail not visible from a simple inspection. I will update this point as soon as I understand what those pieces of code are meant for! Please comment on this post if you know what is happening. :)

The optimized code also decreased in size, owing to the elimination of unnecessary conditional statements and redundant expression calculations!
