The big question(s)
One may well ask: why bring statistics, six sigma, data analytics and all this jargon into my day-to-day engineering work? I know my subject and have years of experience behind me. Can a statistical model interpret or predict better than I can? Don't we lose focus by doing all this? Won't it require extra effort? This article may not answer every question a traditional engineer like many of us may have, but it will try to address the concerns and apprehensions at a broader level.
What is data analytics and where is it applied?
I do not want to repeat a standard textbook or classroom definition of data analytics; one can always find it by googling or referring to the many good resources available online. Many engineers are aware of the term by now, and many still are not.
Whether or not we call it that, analytics is applied everywhere. Given numerical data, an engineer will at least find its mean and its range (maximum and minimum).
This is a very elementary application of data analytics. As you probe further, you will find other parameters, such as the standard deviation, median and mode, which statistically describe your data. For large populations, you will sample. Then come the probability distributions associated with the data, the inferences drawn from them, and the journey continues...
From the above examples, it is evident that the concept of data analytics is not at all new. Analytics at its most elementary level does not require a computer at all!
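As a minimal sketch of those elementary summaries, here is how the mean, range, standard deviation and median of a small data set can be computed in Python. The readings themselves are invented for illustration:

```python
import statistics

# Hypothetical sample of measurements (say, ten strength readings)
readings = [23.1, 24.8, 22.9, 25.2, 23.7, 24.1, 22.5, 24.9, 23.3, 24.0]

mean_value = statistics.mean(readings)
data_range = max(readings) - min(readings)
std_dev = statistics.stdev(readings)      # sample standard deviation
median_value = statistics.median(readings)

print(f"mean   = {mean_value:.2f}")
print(f"range  = {data_range:.2f}")
print(f"stdev  = {std_dev:.2f}")
print(f"median = {median_value:.2f}")
```

A spreadsheet does the same job, of course; the point is that these few numbers are already analytics.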
I am a non-IT engineer. Why do I need it?
In engineering, a lot of data gets generated; traffic studies are one good example. A structured review that draws meaningful conclusions from such data requires analytics.
"Well, I design a structure or a machine. I have my formulas and my software. If I use the formulas correctly, I will get the correct output. Why do I need analytics?" one may say.
Many engineering formulas are either complex or empirical. When we use such formulas, it is essential to study how the output varies as the inputs vary.
When a formula is not derived from first principles but arrived at empirically, from observations, regression and the like, it is to one's advantage to study and analyze the data that formula generates. This gives one control over one's designs and supports informed decisions.
If an equation was arrived at empirically using regression, studying it further generates additional insight: one can build new models, re-validate the equation, or even arrive at a better one!
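Re-validating an empirical relation can be as simple as re-fitting it against fresh observations. The sketch below fits a hypothetical straight-line relation by ordinary least squares, from first principles; all the numbers are made up for illustration:

```python
# Re-fit a hypothetical empirical relation y = a*x + b from new observations
# using ordinary least squares (all numbers are illustrative, not real data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 11.1]   # roughly follows y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(f"fitted: y = {slope:.3f} x + {intercept:.3f}")
```

If the fitted coefficients drift away from the published ones, that is exactly the kind of insight worth investigating further.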
Such empirical equations are found often in fluid mechanics, for example. Unfortunately, in most of the textbooks I studied during my engineering degree in the 90s, the rationale behind the empirical formulas was almost never explained.
Even a derived, complex formula rests on several assumptions, and inputs such as material properties carry their own confidence levels. A statistical study bails us out of the complexities of the equation itself: we get to study the numerical data and the trends it follows.
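One simple way of studying that numerical data is Monte Carlo sampling: perturb the inputs within their assumed uncertainty and look at the spread of the output. The formula below is the elementary bending stress relation, but the input uncertainties are invented purely for illustration:

```python
import random
import statistics

random.seed(42)  # reproducible sketch

def bending_stress(moment, section_modulus):
    """Elementary bending stress: sigma = M / Z (units assumed consistent)."""
    return moment / section_modulus

# Assumed (illustrative) input uncertainties for moment and section modulus
samples = []
for _ in range(10_000):
    m = random.gauss(50.0, 2.5)   # moment: mean 50, s.d. 2.5
    z = random.gauss(10.0, 0.3)   # section modulus: mean 10, s.d. 0.3
    samples.append(bending_stress(m, z))

print(f"output mean  = {statistics.mean(samples):.2f}")
print(f"output stdev = {statistics.stdev(samples):.2f}")
```

The histogram of such samples, not the closed-form equation, is what tells you how confident to be in the design value.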
Needless to say, data from experimentation and observation needs to be studied statistically, and many do so, with or without calling it analytics. Findings that skip such statistical study stand on shaky ground, and it is high time that was recognized.
In any case, one can always find out which inputs influence the output the most. This can be achieved through design of experiments and sensitivity analyses.
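A one-at-a-time sensitivity study is the simplest version of this idea: bump each input in turn and see how much the output moves. The sketch below uses the textbook midspan deflection formula for a simply supported beam; the base values and the ±10% perturbation are illustrative choices:

```python
def deflection(load, length, e_mod, inertia):
    """Midspan deflection of a simply supported beam under a central
    point load: delta = P * L**3 / (48 * E * I), units assumed consistent."""
    return load * length ** 3 / (48 * e_mod * inertia)

base = {"load": 10.0, "length": 4.0, "e_mod": 200e9, "inertia": 8e-6}
base_out = deflection(**base)

# One-at-a-time: bump each input by +10% and report the % change in output
for name in base:
    perturbed = dict(base)
    perturbed[name] *= 1.10
    change = (deflection(**perturbed) - base_out) / base_out * 100
    print(f"{name:8s}: +10% input -> {change:+.1f}% output")
```

Even this crude screening immediately exposes the cubic influence of the span, which is precisely the kind of insight a formal design of experiments then quantifies properly.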
Two types of engineering data
Without getting into the standard technical terms of data analytics, I would classify numerical data into two types from an engineer's perspective. The first is visible data, abundantly available everywhere. The second is invisible data, hidden, for example, inside formulas, and generated only when you vary the inputs and calculate the outputs.
Visible data needs to be handled using the relevant data analytics techniques.
Invisible data can be handled the same way: generate the data, arrange it in tables, plot it for visualization, and then analyze it.
Sources of visible data include experimentation, published and validated data, data gathered from observations, and so on.
Sources of invisible data include not only equations but also computer simulations; the output generated by a finite element program, for example, falls in this category.
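To make the invisible-data idea concrete, here is a sketch that sweeps the inputs of Manning's empirical open-channel formula, V = (1/n) R^(2/3) S^(1/2), over a small grid and tabulates the outputs. The roughness and grid values are illustrative, not a real design:

```python
from itertools import product

def manning_velocity(n, r, s):
    """Manning's empirical formula (SI): V = (1/n) * R**(2/3) * S**0.5."""
    return (1.0 / n) * r ** (2.0 / 3.0) * s ** 0.5

# Sweep two inputs over a grid; roughness n held fixed (values illustrative)
radii = [0.5, 1.0, 1.5]         # hydraulic radius R (m)
slopes = [0.001, 0.002, 0.004]  # channel slope S

print(f"{'R (m)':>6} {'S':>7} {'V (m/s)':>8}")
for r, s in product(radii, slopes):
    v = manning_velocity(0.013, r, s)
    print(f"{r:>6.2f} {s:>7.4f} {v:>8.3f}")
```

The resulting table is data that did not exist until we generated it, yet it is exactly as analyzable as any measured data set.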
Should I test my data when there is a clear formula?
Contrary to the saying "seeing is believing", seeing is not always believing. Hence the data needs to be put to statistical tests. This aids decision making for those unsure of which direction to take, and even those who are "sure" by experience can use it to re-validate their experience-based knowledge.
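Putting data to a statistical test can be as simple as a one-sample t-test: does the measured mean differ meaningfully from what a formula predicts? A bare-bones sketch with made-up numbers, computing only the t-statistic (the critical value would come from t-tables):

```python
import math
import statistics

# Hypothetical: a formula predicts 25.0; do the measurements agree?
predicted = 25.0
measured = [24.1, 23.8, 24.6, 24.0, 23.5, 24.3, 24.2, 23.9]

n = len(measured)
t_stat = (statistics.mean(measured) - predicted) / (
    statistics.stdev(measured) / math.sqrt(n))

print(f"sample mean = {statistics.mean(measured):.3f}")
print(f"t-statistic = {t_stat:.2f}  (compare with t-tables at df = {n - 1})")
```

A t-statistic far beyond the tabulated critical value is the statistical way of saying that the gap between formula and measurement is too large to be chance.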
Examples of some applications
In manufacturing engineering, one often uses the principles of six sigma. There are many statistical techniques that are common between six sigma and data analytics. By exposing oneself to data analytics, one will have a wider range of tools to handle one's data.
Design of experiments is often conducted in engineering, particularly mechanical engineering.
I often see the term sensitivity analysis used in the analysis of high-rise buildings, and some use it loosely just because the standards use it. The standards, and the industry leaders who develop them, may need to elaborate on what is meant by sensitivity analysis in line with its standard definition. Including a standardized definition in the Indian tall concrete buildings code, IS 16700, would be of great help.
As one is aware, advanced concepts like artificial intelligence and machine learning are now used across industries.
Epilogue
One may say that what I have written is nothing but statistics, and hence old wine in a new bottle! I would rather call it old nectar in a new bottle, often left untasted.
Researchers in engineering have traditionally used the techniques discussed here, and far more complex ones. But knowledge and application of the basic concepts at the working level will be of great help, as discussed, and will elevate us from a computer-literate society to a data-literate society.
Path forward
Below is the path forward I would suggest for promoting data literacy among engineers.
- Engineers should consider refreshing their undergraduate level statistics knowledge as part of their Continuous Professional Development.
- Guidelines should be framed to include a minimum number of questions on statistics and probability in exams like Professional Engineer's exams.
- A fixed number of minimum Professional Development Units (PDUs) in statistics and probability/data analytics/data science should be prescribed for retaining a Professional Engineer's or a project manager's certification.
- Non-IT engineering students interested in analytics with the intention of using it in their core field or making it their career, can consider doing a project in analytics (wait for my next article).
- Authors of engineering textbooks should include brief notes on how the empirical formulas have been arrived at, and provide reference to the original sources.