The 21st century is facing an unprecedented situation. A vast amount of data is being, or has already been, collected, yet only a small portion of it has been analysed. A large proportion of it has therefore yet to reveal its hidden knowledge and augment our intelligence.
It is true that not all data can be easily analysed. Personal data, secrecy acts and other legislation undoubtedly raise some barriers. Unstructured and enormous datasets can also be difficult to transfer, and the complex pre-processing they require before analysis brings barriers of its own.
Many advances have been made through the development of mainframes and other high-performance computing facilities. However, we also need to consider the algorithms that transform data into information. Each of these algorithms needs to work efficiently with ever larger amounts of data. Even as hardware improves, energy consumption and running time need to be kept low; otherwise, analysis becomes too costly and impractical.
In a previous post, Algorithms past, present, and future, I argued that symbolic AI and programming languages should be integrated. In another post, I discussed scalability in machine learning. Today, I would like to discuss the need for scalability in simpler algorithms, relying on mathematical ideas. I believe that not only neural and symbolic AI suffer from scalability predicaments, but simpler techniques do too.
Let’s use a simple example: factorial
In case you are not aware, the factorial helps us approximate Euler’s number, e. e is used in many analytical methods and in neural AI. So, if we could use an effective and economical method to find its value, we could save money on computation and reduce resource consumption. We could help save the planet too.
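To make the link concrete, here is a minimal sketch (not the notebook’s code) assuming the classic series expansion e = Σ 1/k!, truncated after a fixed number of terms:

```python
# Minimal sketch: approximate Euler's number e with the truncated series
# e = sum over k >= 0 of 1 / k!.
def approximate_e(terms: int = 20) -> float:
    total = 0.0
    factorial = 1  # running value of k!
    for k in range(terms):
        if k > 0:
            factorial *= k  # update k! incrementally instead of recomputing it
        total += 1.0 / factorial
    return total

print(approximate_e(20))  # ~2.718281828459045
```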
Four algorithms compute the factorial. All of them are written in plain Python, with no additional libraries. Computing the factorial of 10 or 15 is quick with any of them. However, how long would it take to compute the factorial of 10,000?
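The four variants themselves live in the notebook; purely as an illustration of the kind of “simpler algorithms” being compared, here are two standard ways to compute a factorial in plain Python:

```python
# Illustrative only: two textbook factorial implementations in plain Python.
# These are not necessarily the four variants compared in the notebook.

def factorial_iterative(n: int) -> int:
    """Multiply 1 * 2 * ... * n with a simple loop."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

def factorial_recursive(n: int) -> int:
    """Naive recursion; hits Python's recursion limit for large n."""
    return 1 if n < 2 else n * factorial_recursive(n - 1)

assert factorial_iterative(10) == 3628800
```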
This Jupyter notebook demonstrates how quickly each of these algorithms computes the factorial of 10,000. The simpler algorithms can take nearly 20 milliseconds, which is quite a long time for a computer. The most complex one takes roughly half that time: its additional pre-computations reduce the cost of execution.
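Your numbers will differ from the notebook’s depending on the machine, but a quick way to reproduce this kind of measurement is Python’s built-in timeit module:

```python
# Sketch of the kind of timing the notebook performs (results are machine-dependent).
import timeit

def factorial_iterative(n: int) -> int:
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

# Average time per call for 10,000!, reported in milliseconds.
runs = 20
seconds = timeit.timeit(lambda: factorial_iterative(10_000), number=runs) / runs
print(f"factorial(10000): {seconds * 1000:.2f} ms per call")
```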
Complexity and execution time
Mathematics can model execution time and predict how an algorithm will behave as we increase the size of its input. By counting the operations an algorithm performs, we obtain an algebraic expression, and that expression describes the algorithm’s behaviour as the input grows large. Plotting these expressions gives some pretty curves showing how fast the use of resources increases. The least wanted behaviour is an algorithm whose resource and time consumption grows rapidly: we would then wait a very long time to find our answer. Think of The Hitchhiker’s Guide to the Galaxy, where the answer 42 took millions of years to compute. In the real world, an optimised tour visiting every city in China was computed using the equivalent of more than 100 years of computation, had no parallel HPC been used.
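As a quick illustration (not taken from the notebook), the snippet below prints how a few common operation counts grow with the input size n; the exponential one becomes unusable almost immediately:

```python
# Illustrative sketch: growth of common complexity classes with input size n.
import math

for n in (10, 100, 1_000, 10_000):
    linearithmic = round(n * math.log2(n))
    quadratic = n ** 2
    # 2**n becomes astronomically large very quickly, so only show small n.
    exponential = 2 ** n if n <= 30 else "astronomical"
    print(f"n={n:>6}  n*log n={linearithmic:>9}  n^2={quadratic:>10}  2^n={exponential}")
```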
What do we need to do…
Our aim is to write algorithms with a flatter complexity curve, so that we can unearth more knowledge from our data. We need to continue to explore, and to equip computer scientists with tools that enable them to design, implement and test such algorithms. We need to enable them to understand how and why these algorithms are effective with large datasets. We need to develop the analytical skills and curiosity to understand the tools we may be taking for granted. And we need to innovate in how algorithms are created, whether by automating their design or through human endeavour.



