Engineering musings......: September 2019

Wednesday, September 25, 2019

To attempt or not to attempt - A risk management approach toward negative marking

1.0 Introduction

Some competitive examinations have negative marking. When a wrong answer carries the same weightage as a right answer, the risk of losing marks by giving a wrong answer is high. In multiple choice questions, there is always some probability of answering the question correctly, and it increases when there is more than one right answer.

In risk analysis, we have the following formula for the expected value of a variable X.

E[X] = \sum_i p_i X_i, summed over all possible values X_i of X,

where

E[X]=expected value of X.

p_i= probability of the value of X being X_i

2.0 Joint Entrance Exam Main (JEE Main) - A case study

2.1 Introduction - JEE Main

In India, we have the famous Joint Entrance Examination (JEE) for admission into various engineering, architecture and planning courses across the nation offered by various academic institutions run by or funded by the Government of India. This is conducted in two phases i.e. JEE Main and JEE Advanced. JEE Main is the basic qualifying examination for the said courses.

For 2020, the information bulletin for JEE Main can be found at the following link:

https://jeemain.nta.nic.in/WebInfo/Handler/FileHandler.ashx?i=File&ii=58&iii=Y

From the above bulletin (see pages 6 and 7), we find that every correct answer is awarded 4 marks and every wrong answer is awarded one negative mark (-1).

2.2 Definition: Uncertain question

Let us define an uncertain question as an MCQ for which the examinee doesn't know the right answer.

2.3 Answering uncertain questions at random

Going by the question paper for 2019, there are four options for each multiple choice question (MCQ).

The probability of choosing the right answer at random is 1/4=0.25.

The probability of choosing the wrong answer at random is 3/4=0.75.

So, the expected value of attempting a question by randomly choosing an answer is 0.25*4 - 0.75*1 = 0.25.

This means, one may still score positively by taking a chance. However, the probability of the answer being wrong is very high.
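The calculation above can be checked with a few lines of Python. This is a minimal sketch: the function name expected_marks and its parameter names are illustrative and not part of the original analysis.

```python
from fractions import Fraction

def expected_marks(p_correct, marks_right=4, marks_wrong=-1):
    """Expected marks from attempting one question: the sum of p_i * X_i."""
    return p_correct * marks_right + (1 - p_correct) * marks_wrong

# Pure random guess among four options under the +4 / -1 scheme.
random_guess = expected_marks(Fraction(1, 4))
print(float(random_guess))
```

Using Fraction keeps the arithmetic exact, so the result can be compared directly with the hand calculation.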

When four out of five questions are answered wrongly and one is answered rightly, the outcome is 4*(-1)+4=0. From this, it is evident that one loses marks only when fewer than one-fifth of the attempted uncertain questions are answered correctly.

When p is the probability of answering a question correctly, 1-p is the probability of answering it wrong. In this case, p and 1-p are 0.25 and 0.75 respectively.

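The number of correct answers follows a binomial distribution, so the probability of answering r or fewer questions correctly out of n questions is

P(X \le r) = \sum_{i=0}^{r} \binom{n}{i} p^i (1-p)^{n-i}

where i is the number of correct answers.

When n = 1 to 5, negative marks will be scored only when i = 0, so the probability reduces to (1-p)^n = 0.75^n.
The probability of obtaining negative marks for n = 1 to 5 is as follows: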

n	P
5	0.237
4	0.316
3	0.422
2	0.563
1	0.750

Given that there are 20 MCQs in each question paper, answering all of them at random results in negative marks when fewer than 4 answers are right (4r - (20 - r) < 0), i.e. when the number of right answers is 3 or less.

The probability of obtaining negative marks when 20 questions are answered at random is P=0.225.
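The figures for n = 1 to 5 and for 20 questions can be reproduced from the binomial distribution. The sketch below assumes every question is answered independently with the same probability p of being right; prob_negative is an illustrative name, not something defined in the original post.

```python
from math import comb

def prob_negative(n, p=0.25, right=4, wrong=1):
    """P(total marks < 0) when n questions are each answered correctly with probability p."""
    total = 0.0
    for k in range(n + 1):                      # k = number of correct answers
        score = right * k - wrong * (n - k)     # net marks for k right, n - k wrong
        if score < 0:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

for n in (1, 2, 3, 4, 5, 20):
    print(n, round(prob_negative(n), 4))
```

Because the function tests the net score directly, it can be reused for other values of n without working out the cut-off number of right answers by hand.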

From this, the probability of scoring zero or positive marks on the uncertain questions is higher than that of scoring negative marks, except when the number of MCQs is one or two. Even so, a good amount of risk remains. Whether one can take this much risk depends on one's future plans.

2.4 Making intelligent guesses for MCQs

The risk of scoring negative can be minimized by guesstimating the answers that appear to be the most right.

Assumption: Let us assume the probability of choosing the right answer increases to p=0.5 when one makes an intelligent guess combined with subject matter knowledge.

In this case, the expected value of attempting one uncertain question is 0.5*4 - 0.5*1 = 1.5, which is much higher than before.

Now, a negative score on 8 uncertain questions requires at most one right answer (one right and seven wrong gives 4 - 7 = -3). The probability of this is

P = (\binom{8}{0} + \binom{8}{1}) 0.5^8 = 9/256 = 0.035,

which is below the 0.05 threshold that corresponds to the commonly used confidence level of 95%. This means we can say with a confidence level of 96.5% (1 - 0.035) that making intelligent guesses will either improve one's score or at least do no harm, when the number of uncertain questions is 8 or more.

Even when there is only one uncertain question, the probability of scoring a negative mark is only 0.5, provided the assumption made above holds. For a higher number of uncertain questions, it will be less than 0.5, which means the probability of a net benefit or no loss from the whole exercise is higher.
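The intelligent-guess case can be worked out exactly in the same way. The sketch below keeps the assumption p = 0.5 from this section and uses exact fractions; prob_negative_exact is an illustrative helper name.

```python
from fractions import Fraction
from math import comb

def prob_negative_exact(n, p, right=4, wrong=1):
    """Exact P(total marks < 0) over n independent guesses, each right with probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(n + 1)                   # k = number of right answers
        if right * k - wrong * (n - k) < 0      # keep only outcomes with a negative total
    )

p_guess = Fraction(1, 2)                        # assumed success rate of an intelligent guess
expected_per_question = 4 * p_guess - 1 * (1 - p_guess)
print("expected marks per question:", float(expected_per_question))
for n in (1, 2, 4, 8, 12):
    print(n, float(prob_negative_exact(n, p_guess)))
```

Changing p_guess shows how sensitive the conclusion is to the assumed quality of the guess.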

3.0 Applicability of the case study

The above case study is applicable to similar cases when the following conditions are met:

Four options are provided, of which only one is correct. When more than one option is correct, however, the risk comes down.
The ratio of positive marking to negative marking is 4:1.

4.0 Conclusions

For the above case study, it is shown that the expected value from taking the risk of attempting uncertain questions is always positive.
The probability of scoring positive is higher than the probability of scoring negative, when more than two multiple choice questions are answered at random.
The probability of scoring positive is higher than the probability of scoring negative when more than one multiple choice question is answered by making an intelligent guess, when one can guesstimate correctly with a probability of 0.5.
The risk of scoring negative marks comes down when one makes an intelligent guess of the answer rather than selecting the answer at random.
When 8 or more questions are answered by an intelligent guess, there is a very good possibility of scoring positive or suffering no loss from the guess work, when the probability of guesstimating the right answer is 0.5.

5.0 Path forward

One can do similar math when the number of options and/or the marking scheme are different.
The confidence level of scoring no negative mark can be recalculated by varying the probability of success for an intelligent guess.
Coaching centers can find out the probability of their students guessing the answer right in each subject separately and advise the students individually. As they have math and stat experts, they can even come up with better models (Good if they are already doing it, but I have not heard of any).
Coaching centers also need to teach and demonstrate how to make better guesses. Those preparing themselves for competitive exams on their own need to develop their own strategies.
When one has multiple options for pursuing higher studies, or multiple career options, one can take bigger risks, as bigger rewards may come from bigger risks.
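For a different marking scheme or number of options, a useful first number is the break-even probability: the value of p at which the expected marks from an attempt are exactly zero, which solves p*r - (1-p)*w = 0 and gives p = w/(r+w). The sketch below is only an illustration; the function names are made up here, and the second scheme (+1 for right, -1/3 for wrong) is a hypothetical example, not one taken from any particular exam.

```python
from fractions import Fraction

def break_even_probability(marks_right, marks_wrong):
    """Probability of a right answer at which the expected marks are zero.

    marks_wrong is the penalty expressed as a positive number.
    """
    r, w = Fraction(marks_right), Fraction(marks_wrong)
    return w / (r + w)

def random_guess_pays(options, marks_right, marks_wrong):
    """True when a blind guess among `options` choices has positive expected marks."""
    return Fraction(1, options) > break_even_probability(marks_right, marks_wrong)

print(break_even_probability(4, 1))               # +4 / -1 scheme
print(random_guess_pays(4, 4, 1))                 # four options, +4 / -1
print(random_guess_pays(4, 1, Fraction(1, 3)))    # hypothetical +1 / -1/3 scheme
```

Any guessing strategy whose success rate exceeds the break-even probability has a positive expected value; the binomial calculation is still needed to judge the risk of a negative total.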

Tuesday, September 17, 2019

Importance of data literacy for non-IT engineers - Some random thoughts

The big question(s)

One may often ask: why should I get into statistics, six sigma, data analytics and all such jargon to do my day-to-day engineering work? I know my stuff and have years of experience behind me. Can a statistical model interpret or predict better than I can? Won't we lose focus by doing all this? Won't it require extra effort?

Well, my article may not answer each and every question that a traditional engineer like many of us may have, but it will try to address the concerns and apprehensions at a broader level.

What is data analytics and where is it applied?

I do not want to provide a standard textbook or classroom definition of data analytics; one can always find it by googling or by referring to the resources available aplenty online. Many are aware of it by now, and many are not!

Said or unsaid, analytics is applied everywhere. When you have numerical data, as an engineer, you will at least try to find its mean and its range (maximum and minimum).
This is a very elementary application of data analytics. As you probe further, you will find other parameters like standard deviation, median, mode etc. which statistically describe your data. For large populations, you will do sampling. This is followed by finding the probability distributions associated with your data, making inferences, and the journey continues...
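As a small illustration of these elementary descriptive statistics, the sketch below uses Python's standard statistics module. The sample values are hypothetical (seven imagined strength test results in MPa) and serve only to show the calculation.

```python
import statistics

# Hypothetical sample: seven strength test results in MPa (illustrative values only)
strengths = [27.5, 30.1, 28.4, 31.2, 29.0, 30.1, 26.8]

summary = {
    "mean": statistics.mean(strengths),
    "minimum": min(strengths),
    "maximum": max(strengths),
    "median": statistics.median(strengths),
    "mode": statistics.mode(strengths),
    "sample standard deviation": statistics.stdev(strengths),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```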

From the above examples, it is evident that the concept of data analytics is not at all new. Analytics at its most elementary level does not require a computer at all!

I am a non-IT engineer. Why do I need it?

In engineering, a lot of data gets generated. One good example is traffic studies. For a structured review and for drawing meaningful conclusions, one needs to apply analytics.

One may say: "Well, I design a structure or a machine. I have my formulas and software. When I use the formulas rightly, I will get the right output. Why do I need analytics?"

In engineering, many of the formulas are either complex or empirical. When we use such formulas, it is always essential to study the variation in the output as the inputs vary.

When the formulas are not derived from first principles but were empirically arrived at from observations, regression etc., it will be to one's own advantage to study and analyze the data generated by those formulas. This enables one to have control over one's designs and take informed decisions.

If an equation is empirically arrived at using regression, then studying it further generates additional insights. One can make new models, re-validate the equation, or may even come up with a better equation!

We often find empirical equations of this type in fluid mechanics, for example. Unfortunately, most of the textbooks I studied during my engineering course in the 90s almost never explained the rationale behind the empirical formulas.

Even when you have a derived complex formula, the formula will be based on several assumptions. Also, inputs such as material properties have their own confidence levels attached to them. A statistical study bails us out of the complexities of the equation, as we get to study the numerical data and the trends it follows.

Needless to say, data from experimentation and observations needs to be studied statistically, which many do, with or without calling it analytics. If a statistical study is not performed, the findings cannot be relied upon, and it is high time one realized that.

In any case, one can always find out which inputs influence the output most. This can be achieved by design of experiments and sensitivity analyses.
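A very simple form of sensitivity analysis is to change one input at a time by a fixed percentage and record the change in the output. The sketch below is an illustration only: the formula chosen (midspan deflection of a simply supported beam under uniform load, 5wL^4/(384EI)) and all input values are assumptions made for this example and do not come from the article.

```python
def midspan_deflection(w, L, E, I):
    """Midspan deflection of a simply supported beam under uniform load: 5wL^4 / (384EI)."""
    return 5 * w * L**4 / (384 * E * I)

# Hypothetical base inputs: w in kN/m, L in m, E in kN/m^2, I in m^4
base = {"w": 10.0, "L": 6.0, "E": 25e6, "I": 2.0e-3}

def one_at_a_time(func, base_inputs, bump=0.10):
    """Percentage change in the output when each input alone is increased by `bump`."""
    base_out = func(**base_inputs)
    changes = {}
    for name in base_inputs:
        bumped = dict(base_inputs)
        bumped[name] = base_inputs[name] * (1 + bump)
        changes[name] = (func(**bumped) - base_out) / base_out * 100
    return changes

sensitivity = one_at_a_time(midspan_deflection, base)
for name, pct in sorted(sensitivity.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {pct:+.1f}% change in deflection")
```

The span L dominates because it enters the formula to the fourth power, which is the kind of insight a quick sweep like this brings out. A full design of experiments would also vary inputs jointly, which this sketch does not do.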

Two types of engineering data

Without getting into the standard technical terms in data analytics, I would classify the numerical data into two types from an engineer's perspective. The first one is visible data, abundantly available everywhere. The second type of data is invisible data, an example of which is hidden in the form of formulas and gets generated only when you calculate the output using those formulas and varying the inputs.

The visible data needs to be handled, using relevant data analytics techniques.

The invisible data can also be handled the same way, by generating data, arranging it in tables, plotting it graphically for visualization and then analyzing.

Sources of visible data include data from experimentation, published and validated data, data gathered from observations and so on.

Sources of invisible data include not only equations but also computer simulations. For example, the output data generated by a finite element software can be put in this category.

Should I test my data when there is a clear formula?

People say "seeing is believing", but seeing is not always believing. Hence the data needs to be put to statistical tests. This helps in decision making for those who are unsure of the direction in which to go. Even for those who are "sure" by experience, it helps in re-validating their experience-based knowledge.

Examples of some applications

In manufacturing engineering, one often uses the principles of six sigma. There are many statistical techniques that are common between six sigma and data analytics. By exposing oneself to data analytics, one will have a wider range of tools to handle one's data.

Design of experiments is often conducted in engineering, particularly mechanical engineering.

I often see the term sensitivity analysis being used in the analysis of high rise buildings. Some use this term loosely just because the standards use it. Maybe the standards, and the industry leaders who develop them, need to elaborate more on what is meant by sensitivity analysis, in line with its standard definition. It will be of great help if a standardized definition is included in the Indian tall concrete buildings code IS 16700.

As one is aware, advanced concepts like artificial intelligence, machine learning etc. are used in all industries.

Epilogue

One may say what I have written is nothing but statistics and hence old wine in new bottle! I would rather call it old nectar in new bottle, often left untasted by many.

The techniques that I have discussed, and even more complex ones, have traditionally been used by researchers in engineering. But knowledge and application of the basic concepts at the working level will be of great help, as discussed, and will elevate society from a computer literate society to the next level of a data literate society.

Path forward

Below is the path forward I would suggest for promoting data literacy among engineers.

Engineers should consider refreshing their undergraduate level statistics knowledge as part of their Continuous Professional Development.
Guidelines should be framed to include a minimum number of questions on statistics and probability in exams like Professional Engineer's exams.
A fixed number of minimum Professional Development Units (PDUs) in statistics and probability/data analytics/data science should be prescribed for retaining a Professional Engineer's or a project manager's certification.
Non-IT engineering students interested in analytics with the intention of using it in their core field or making it their career, can consider doing a project in analytics (wait for my next article).
Authors of engineering textbooks should include brief notes on how the empirical formulas have been arrived at, and provide reference to the original sources.

Saturday, September 7, 2019

Higher concrete grade or more reinforcement? - A data driven approach for optimal shear strength

1. Introduction

Many a time the structural engineer has to choose between higher concrete grade and higher percentage of reinforcement. Some believe that increasing the concrete grade can result in savings and some intuitively view a high concrete grade to be very uneconomical for a small project.

The shear strength of a concrete section varies with the % of reinforcement and the grade of concrete.

In this article, let us examine whether a higher concrete grade or a higher reinforcement % makes a significant difference to the shear strength of the section, by referring to the Indian concrete code IS 456:2000.

2.0 Data collection

2.1 Shear strength:

The below table (table 1) from IS 456:2000 is our data to be examined.

The parameters A_s, b and d in table 1 are defined as below:

A_s=area of tension reinforcement.
b=width of the concrete member
d=effective depth of the concrete member (overall depth minus distance between reinforcement centroid and extreme concrete fiber on tension side).

Table 1

2.2 Material price

The below material prices have been assumed.

Table 2

Grade of Conc.	Price (Rs/cum)	Shuttering (Rs/Sqm)
M15	4500	500
M20	4800	500
M25	5000	500
M30	5500	500
M35	5800	500
M40	6000	500

Reinforcement cost (Fe500): Rs. 60000/- per ton

2.3 Data clean-up

In the absence of an Excel file for the shear strength table, the above picture was converted into Excel by using the online character recognition tool below.

https://www.onlineocr.net/

The data converted into Excel had some decimal points missing and some numbers wrongly recognized. The data table being small, it was cleaned up by visual inspection.

3.0 Case study

Let us consider a one-way slab of 200 mm thickness. Assume 20 mm clear cover and 16 mm dia rebar. Then the effective depth of the section is d = 200 - 20 - 16/2 = 172 mm.

Our shear strength studies will be based on this slab.

4.0 Assumptions

The following assumptions have been made.

Maximum concrete grade is M40.
Minimum % of reinforcement is 0.15.
Maximum % of reinforcement is 3.0
The depth of the section is constrained to be constant.
Costing data is assumed.
The slab requires no shear links.

5.0 Exclusion

The following is excluded:

Distribution reinforcement is not considered in estimation and costing.

6.0 Data generation

For the slab under consideration, the following data is further generated from the shear strength and material pricing data collected.

6.1 Material price of reinforced concrete

The following material prices have been arrived at per cu m.

Table 3

		Cost of reinforced concrete in Rs. per cu m
		M15	M20	M25	M30	M35	M40
100A_s/bd	0.15	7608	7908	8108	8608	8908	9108
	0.25	8013	8313	8513	9013	9313	9513
	0.5	9025	9325	9525	10025	10325	10525
	0.75	10038	10338	10538	11038	11338	11538
	1	11051	11351	11551	12051	12351	12551
	1.25	12063	12363	12563	13063	13363	13563
	1.5	13076	13376	13576	14076	14376	14576
	1.75	14089	14389	14589	15089	15389	15589
	2	15101	15401	15601	16101	16401	16601
	2.25	16114	16414	16614	17114	17414	17614
	2.5	17127	17427	17627	18127	18427	18627
	2.75	18139	18439	18639	19139	19439	19639
	3	19152	19452	19652	20152	20452	20652
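The article does not spell out how the figures in Table 3 were computed. The sketch below reproduces them under an inferred breakdown, which should be read as an assumption: concrete price per cu m, plus shuttering on the slab soffit only (area per cu m = 1/thickness), plus steel taken as the stated percentage of b*d with an assumed steel density of 7.85 tonne per cu m. The function and constant names are illustrative.

```python
# Prices from Table 2 (Rs per cu m of concrete)
CONCRETE_PRICE = {"M15": 4500, "M20": 4800, "M25": 5000,
                  "M30": 5500, "M35": 5800, "M40": 6000}
SHUTTERING_RATE = 500     # Rs per sq m
STEEL_RATE = 60000        # Rs per tonne (Fe500)
STEEL_DENSITY = 7.85      # tonne per cu m (assumed, not stated in the article)
SLAB_THICKNESS = 0.200    # m
EFFECTIVE_DEPTH = 0.172   # m

def rc_cost_per_cum(grade, pt):
    """Cost of reinforced concrete in Rs per cu m for reinforcement pt = 100*A_s/(b*d)."""
    concrete = CONCRETE_PRICE[grade]
    shuttering = SHUTTERING_RATE / SLAB_THICKNESS                   # soffit area per cu m of slab
    steel_volume = (pt / 100) * EFFECTIVE_DEPTH / SLAB_THICKNESS    # cu m of steel per cu m of slab
    steel = steel_volume * STEEL_DENSITY * STEEL_RATE
    return concrete + shuttering + steel

for pt in (0.15, 0.25, 0.5, 1, 3):
    print(pt, [round(rc_cost_per_cum(g, pt)) for g in CONCRETE_PRICE])
```

With these assumptions the printed values agree with the corresponding rows of Table 3, which suggests the breakdown is close to the one actually used.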

6.2 Normalized material price of reinforced concrete

In table 3 above, M15 @ 0.15% reinforcement costs the least. Let us normalize the reinforced concrete costs with respect to the same, presented in the below table.

Table 4

		Cost of reinforced concrete per cu m (normalized w.r.t. M15 @ 0.15% A_s)
		M15	M20	M25	M30	M35	M40
100A_s/bd	0.15	1.00	1.04	1.07	1.13	1.17	1.20
	0.25	1.05	1.09	1.12	1.18	1.22	1.25
	0.5	1.19	1.23	1.25	1.32	1.36	1.38
	0.75	1.32	1.36	1.39	1.45	1.49	1.52
	1	1.45	1.49	1.52	1.58	1.62	1.65
	1.25	1.59	1.63	1.65	1.72	1.76	1.78
	1.5	1.72	1.76	1.78	1.85	1.89	1.92
	1.75	1.85	1.89	1.92	1.98	2.02	2.05
	2	1.99	2.02	2.05	2.12	2.16	2.18
	2.25	2.12	2.16	2.18	2.25	2.29	2.32
	2.5	2.25	2.29	2.32	2.38	2.42	2.45
	2.75	2.38	2.42	2.45	2.52	2.56	2.58
	3	2.52	2.56	2.58	2.65	2.69	2.71

6.3 Shear strength per unit normalized price of reinforced concrete

By dividing the shear strength in table 1 by the unit normalized price of reinforced concrete in table 4, let us arrive at the strength we are achieving per unit concrete price (normalized), as in table 5 below.

Table 5

		Shear strength per unit normalized price (MPa)
		M15	M20	M25	M30	M35	M40
100A_s/bd	0.15	0.28	0.27	0.27	0.26	0.25	0.25
	0.25	0.33	0.33	0.32	0.31	0.30	0.30
	0.5	0.39	0.39	0.39	0.38	0.37	0.37
	0.75	0.41	0.41	0.41	0.41	0.40	0.40
	1	0.41	0.42	0.42	0.42	0.41	0.41
	1.25	0.40	0.41	0.42	0.41	0.42	0.42
	1.5	0.40	0.41	0.41	0.41	0.41	0.41
	1.75	0.38	0.40	0.41	0.40	0.41	0.41
	2	0.36	0.39	0.40	0.40	0.40	0.40
	2.25	0.34	0.38	0.39	0.39	0.39	0.40
	2.5	0.32	0.36	0.38	0.38	0.38	0.39
	2.75	0.30	0.34	0.37	0.37	0.38	0.38
	3	0.28	0.32	0.36	0.36	0.37	0.37

7.0 Data visualization

From table 5, we see that the shear strength achieved per unit price initially increases as the % of reinforcement increases and then starts decreasing. This happens for all concrete grades, from M15 through M40. On the other hand, the strength per unit price decreases as the concrete grade increases at lower values of 100A_s/bd. This reverses at higher values of 100A_s/bd, with the strength per unit price increasing with concrete grade.

Since the strength per unit price follows the same rise-and-fall trend with increasing 100A_s/bd for all concrete grades, let us plot the mean shear strength per unit price across all grades considered against 100A_s/bd as in the chart below.

8.0 Data analytics

8.1 Correlation

From the above data visualization, we understand that there is a steep rise in the shear strength per unit price as the reinforcement increases up to 1%, followed by a relatively mild fall.

The correlation coefficient calculated in Excel works out to be around 0.208, which indicates a weak positive linear correlation.
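The correlation can be recomputed outside Excel. The sketch below assumes that the coefficient was calculated between 100A_s/bd and the mean strength per unit price across grades (the quantity plotted in the chart); it uses the rounded values of Table 5, so the result will differ slightly from the figure obtained from unrounded data. The helper name pearson is illustrative.

```python
from math import sqrt

PT = [0.15, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3]
# Table 5: one row per reinforcement %, columns M15 to M40
TABLE_5 = [
    [0.28, 0.27, 0.27, 0.26, 0.25, 0.25],
    [0.33, 0.33, 0.32, 0.31, 0.30, 0.30],
    [0.39, 0.39, 0.39, 0.38, 0.37, 0.37],
    [0.41, 0.41, 0.41, 0.41, 0.40, 0.40],
    [0.41, 0.42, 0.42, 0.42, 0.41, 0.41],
    [0.40, 0.41, 0.42, 0.41, 0.42, 0.42],
    [0.40, 0.41, 0.41, 0.41, 0.41, 0.41],
    [0.38, 0.40, 0.41, 0.40, 0.41, 0.41],
    [0.36, 0.39, 0.40, 0.40, 0.40, 0.40],
    [0.34, 0.38, 0.39, 0.39, 0.39, 0.40],
    [0.32, 0.36, 0.38, 0.38, 0.38, 0.39],
    [0.30, 0.34, 0.37, 0.37, 0.38, 0.38],
    [0.28, 0.32, 0.36, 0.36, 0.37, 0.37],
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

row_means = [sum(row) / len(row) for row in TABLE_5]
r = pearson(PT, row_means)
peak_pt = PT[row_means.index(max(row_means))]
print("correlation:", round(r, 3), "| peak at 100A_s/bd =", peak_pt)
```

A low coefficient here does not mean reinforcement has little effect; a linear measure simply cannot capture a curve that rises and then falls.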

8.2 ANOVA

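Though ANOVA is ideally meant for categorical independent variables, it can also be used with continuous independent variables as in this case, by treating each tabulated concrete grade and each tabulated % of reinforcement as a separate group.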

By performing single factor ANOVA across the rows and across the columns in Excel, below are the results obtained.

Table 6

Table 7
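For readers without Excel, the F statistics of the two single factor ANOVAs can be sketched in plain Python from the rounded values of Table 5. This is an illustration under stated assumptions, not a reproduction of Tables 6 and 7: it computes only the F ratio (mean square between groups divided by mean square within groups) and not the P-value, and one_way_f is an illustrative name.

```python
# Table 5: one row per reinforcement %, columns M15 to M40
TABLE_5 = [
    [0.28, 0.27, 0.27, 0.26, 0.25, 0.25],
    [0.33, 0.33, 0.32, 0.31, 0.30, 0.30],
    [0.39, 0.39, 0.39, 0.38, 0.37, 0.37],
    [0.41, 0.41, 0.41, 0.41, 0.40, 0.40],
    [0.41, 0.42, 0.42, 0.42, 0.41, 0.41],
    [0.40, 0.41, 0.42, 0.41, 0.42, 0.42],
    [0.40, 0.41, 0.41, 0.41, 0.41, 0.41],
    [0.38, 0.40, 0.41, 0.40, 0.41, 0.41],
    [0.36, 0.39, 0.40, 0.40, 0.40, 0.40],
    [0.34, 0.38, 0.39, 0.39, 0.39, 0.40],
    [0.32, 0.36, 0.38, 0.38, 0.38, 0.39],
    [0.30, 0.34, 0.37, 0.37, 0.38, 0.38],
    [0.28, 0.32, 0.36, 0.36, 0.37, 0.37],
]

def one_way_f(groups):
    """F ratio of a single factor ANOVA: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_by_reinforcement = one_way_f(TABLE_5)                          # 13 groups (rows)
f_by_grade = one_way_f([list(col) for col in zip(*TABLE_5)])     # 6 groups (columns)
print("F across % reinforcement:", round(f_by_reinforcement, 2))
print("F across concrete grades:", round(f_by_grade, 2))
```

A large F ratio corresponds to a small P-value and a ratio below 1 to a large one, so the two numbers can be compared qualitatively with the Excel output.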

9.0 Interpretation of results

The weak positive correlation in the chart plotted is due to the initial steep increase in the value of strength per unit price followed by a relatively mild decline.
From the ANOVA in Table 6, the P-value is very high across the columns i.e. for varying concrete grades. So we fail to reject the null hypothesis and infer that the difference in the mean values of shear strength per unit price across varying concrete grades is not statistically significant.
From the ANOVA in Table 7, the P-value is very low and nearly zero across the rows i.e. for varying % reinforcement. So we reject the null hypothesis and infer that the difference in the mean values of shear strength per unit price across varying % reinforcement is statistically significant.

10.0 Conclusions

The below conclusions are drawn from this study.

It is the % of reinforcement, more than the grade of concrete, that influences the shear strength achieved per unit of currency.
From the data visualization, it can be concluded that increasing the % of reinforcement improves the cost effectiveness of the cross-section with respect to shear strength only up to a certain % of reinforcement.

11.0 Practical Applications

Reinforcement % increased up to some percentage (1% in the present study) in a flexural member will improve the cost efficiency of the section in terms of its shear strength.
Where the section design is governed by shear strength, it will be judicious to increase the reinforcement only up to some percentage. Beyond that, other options like revising the section dimensions/slab depth etc. need to be explored.
