Data, Data, Data, Data, Data, Data, Data, Data, Data, Data… Well, not so long ago it was machine learning, machine learning… Now that we have woken up to some of the limits of data-driven AI, one possible way forward is to focus on the data itself and look for explanations of why data-driven AI works well in some cases and not in others.
Our society, companies and researchers may be starved of data. This is something of a conundrum, considering how much data is captured through our use of social media, multimedia, physical activities, careers and other platforms. The IoT has also been opening doors to capturing data about many aspects of our environment, society, and voting habits. However, captured data needs to be stored adequately, cleaned and merged before it is in a state ready for analysis.

2022 and future skills
Jobs in data science and data engineering currently require individuals with skills including SQL, database tools, data visualisation, some statistical techniques (including data-driven AI) and statistical programming. In addition, there is a hint of data governance and security. It is worth noting that the decentralisation of data across various cloud and other types of storage sustains the need for distributed databases and other methods of analysing the data, i.e., bringing the computation to the data or the data to the computation.
It has been acknowledged that the future is data skills. The focus has moved from desperately applying some machine learning algorithms to gathering, storing, and bringing some structure to the data. The humongous size and decentralisation of the web are also throwing a spanner in the works.
Nonetheless, there is some effort to upskill our workforce; the UK government is currently introducing some initiatives to address this issue. This is a good start towards building skills for the future and upskilling our current workforce. Preparing future generations also requires upskilling teachers, trainers, and other educationalists.
One example of such needs
Data skills encompass many aspects of data, from its capture to its analysis and governance. Amazon has placed a high value on its innovative approach to e-commerce and its use of data; it may hold the largest dataset about customers, goods, and shopping habits. Data is translated into monetary value.
In some cases, human bias may have driven how data was captured. For example, women may not be represented enough in data on martial arts sports injuries. Capturing data for one gender only would limit findings and valuable research. More importantly, the learning and research outcomes should focus on eliciting whether treatments and the perception of injury vary by gender. It would show a lack of leadership and skills to suggest that writing appropriate forms and choosing the questions is sufficient to produce findings. For a school leaver that may be suitable; at a higher level it is not.
What is needed
A conservative approach would suggest relational databases, SQL, visualisation, statistical analysis, machine learning and statistical programming. Another viewpoint would look at what is hiding behind those skills: set theory, statistics, computation, differentiation, integration, matrix algebra, optimisation, model fitting, … We need good mathematical skills. It is not enough to call some commands without appreciating how and why the outcome is appropriate and correct. Think about the parametrisation of Keras methods: a good understanding of the choice of architectures and compilation settings requires reading the initial journal paper at least…
As the technology becomes even more complex and distributed, upskilling is likely to become a continuous part of our working life. The cloud, and the integration of learning algorithms within analytical frameworks alongside other software, bring new challenges for upskilling.
Upskilling should also take into consideration that not all data is stored in relational databases; there are also object-oriented databases, images, videos, genomes, texts, and other types of specialised storage.
Upskilling should also encourage individuals to better appreciate the limitations and expectations of all stakeholders. Large datasets may not be easily transferable or manipulated in memory. One example is the Covid 19 data lake. Downloading all the raw data breaks data frames in R. The solution is to use smaller downloads and then incrementally store the data in a data table. The latter has access to paging for memory management. Ultimately, something more sophisticated will be required. Another example is importing data into Excel that exceeds the limits of the spreadsheet.
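The incremental approach described above, for data too large to hold in memory at once, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python and an SQLite table rather than R and a data table, and the table name `cases`, its columns, the function `load_in_chunks` and the generated sample rows are all hypothetical stand-ins, not the Covid 19 data lake itself.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_in_chunks(rows, conn, chunk_size=1000):
    """Insert rows into SQLite in fixed-size batches; return the number stored."""
    rows = iter(rows)  # make sure we consume the source progressively
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases (region TEXT, day TEXT, count INTEGER)"
    )
    total = 0
    while True:
        # Hold at most chunk_size rows in memory at any time.
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break
        conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", chunk)
        conn.commit()
        total += len(chunk)
    return total


# Hypothetical stand-in for a large raw file; a real source would be
# a file on disk or a download stream read line by line.
raw = io.StringIO(
    "region,day,count\n"
    + "".join(f"R{i % 3},2021-01-{i % 28 + 1:02d},{i}\n" for i in range(2500))
)
reader = csv.reader(raw)
next(reader)  # skip the header row

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data on disk
stored = load_in_chunks(reader, conn, chunk_size=1000)

# Aggregate inside the database rather than loading every row back into memory.
per_region = dict(
    conn.execute("SELECT region, SUM(count) FROM cases GROUP BY region")
)
print(stored, per_region)
```

The summary is computed where the data lives, so only the small result comes back into memory. The chunk size is a tuning choice that depends on the memory available.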
Upskilling should not be about clicking on some graphical user interfaces. Access lets people create relational databases without any theoretical knowledge. Down the line, many limitations were experienced. Tableau mixes Excel and Access; it does not guarantee the quality of analytics and visualisation either. Only domain and mathematical knowledge can.