## NPTEL Data Science for Engineers Assignment 3 Answers 2023:

#### Q.1. Sumit wants to contact one of his friends, but he remembers only the first 9 of the 10 digits of the contact number. He is sure that the last digit of the contact number is an odd number. He selects an odd number randomly. If the random variable X denotes the last digit of the contact number, then calculate Var(X).
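
A minimal sketch of the calculation, assuming the last digit is equally likely to be any of the five odd digits 1, 3, 5, 7, 9 (the variable names are illustrative):

```python
from fractions import Fraction

# Assumption: each odd digit is chosen with equal probability 1/5.
odd_digits = [1, 3, 5, 7, 9]
p = Fraction(1, len(odd_digits))

# E(X) = sum of x * P(X = x)
mean_x = sum(p * d for d in odd_digits)

# Var(X) = sum of (x - E(X))^2 * P(X = x)
var_x = sum(p * (d - mean_x) ** 2 for d in odd_digits)

print(mean_x, var_x)  # 5 8
```

Under that assumption, E(X) = 5 and Var(X) = 8.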

#### Q.2. Suppose X∼Normal(µ,4). For n=20 iid samples of X, the observed sample mean is 5.2. What conclusion would a z-test reach if the null hypothesis assumes µ=5 (against an alternative hypothesis µ≠5) at a significance level of α=0.05? Use F_Z^{-1}(0.025) = −1.9599

- **a. Accept H0**
- b. Reject H0
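
The marked answer can be checked with a short sketch. It assumes the second parameter of Normal(µ,4) is the variance, so σ = 2, and it compares |z| against the critical value 1.9599 given in the question. The function name `z_statistic` is illustrative.

```python
import math

def z_statistic(sample_mean, mu0, variance, n):
    """z = (sample mean - hypothesised mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / math.sqrt(variance / n)

# Assumption: Normal(mu, 4) means variance 4.
z = z_statistic(sample_mean=5.2, mu0=5.0, variance=4.0, n=20)
z_crit = 1.9599  # |F_Z^{-1}(0.025)| from the question

reject_h0 = abs(z) > z_crit
print(round(z, 4), reject_h0)  # 0.4472 False
```

Since |z| is well below 1.9599, the test does not reject H0 at α = 0.05.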

#### Q.3. A box contains 8 items out of which 2 are defective. A sample of 5 items is to be selected randomly (without replacement) from the box. If the random variable X represents the number of defective items in a selection of 5 items, then find E(X). (Enter the answer correct to 2 decimal places.)
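
Sampling without replacement makes X hypergeometric. The sketch below builds the probability mass function directly and sums k · P(X = k). The symbols `N`, `K`, and `n` (population size, defectives, sample size) are illustrative names, and `math.comb` needs Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

N, K, n = 8, 2, 5  # items in the box, defective items, items drawn

# P(X = k) = C(K, k) * C(N - K, n - k) / C(N, n)
pmf = {
    k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
    for k in range(0, min(K, n) + 1)
}

expected = sum(k * p for k, p in pmf.items())
print(float(expected))  # 1.25
```

The result agrees with the closed form E(X) = n·K/N = 5·2/8 = 1.25.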

#### Q.4. Suppose X∼Normal(μ,9). For n=100 iid samples of X, the observed sample mean is 11.8. What conclusion would a z-test reach if the null hypothesis assumes μ=10.5 (against an alternative hypothesis μ≠10.5)?

- a. Accept H0 at a significance level of 0.10.
- b. Reject H0 at a significance level of 0.10.
- c. Accept H0 at a significance level of 0.05.
- **d. Reject H0 at a significance level of 0.05.**
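
As a rough check, the two-sided p-value can be computed with the standard library. The sketch assumes Normal(μ,9) means variance 9 (σ = 3). The helper `two_sided_p_value` is an illustrative name.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

# Assumption: variance 9, so the standard error is sqrt(9 / 100) = 0.3.
z = (11.8 - 10.5) / math.sqrt(9 / 100)
p_value = two_sided_p_value(z)

print(round(z, 2), p_value < 0.05, p_value < 0.10)  # 4.33 True True
```

The p-value is far below 0.05, so H0 is rejected at the 0.05 level. Rejection at 0.05 also implies rejection at 0.10.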

#### Q.5. Let X and Y be two independent random variables with Var(X)=9 and Var(Y)=3, find Var(4X−2Y+6).

- a. 100
- b. 140
- **c. 156**
- d. None of the above
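
For independent X and Y, Var(aX + bY + c) = a²·Var(X) + b²·Var(Y). The constant c shifts the distribution without changing its spread. A small sketch (the function name is illustrative):

```python
def var_linear_combination(a, var_x, b, var_y):
    """Var(aX + bY + c) for independent X and Y; the constant c drops out."""
    return a ** 2 * var_x + b ** 2 * var_y

# Var(4X - 2Y + 6) with Var(X) = 9 and Var(Y) = 3
result = var_linear_combination(a=4, var_x=9, b=-2, var_y=3)
print(result)  # 156
```

That is, 16·9 + 4·3 = 144 + 12 = 156.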

#### Q.6. The correlation coefficient of two random variables X and Y is 1/4, and their variances are 3 and 5. Compute Cov(X,Y).
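
Covariance follows from the definition of correlation: Cov(X,Y) = ρ·√(Var(X)·Var(Y)). The sketch below assumes the correlation coefficient is 1/4, since a correlation must lie in [−1, 1]. That value is an assumption about the intended question.

```python
import math

rho = 1 / 4               # assumed correlation coefficient
var_x, var_y = 3, 5

# Cov(X, Y) = rho * sigma_X * sigma_Y
cov_xy = rho * math.sqrt(var_x * var_y)
print(round(cov_xy, 3))  # 0.968
```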

#### Q.7. When will you reject the Null hypothesis?

- a. p value greater than α
- **b. p value less than α**
- c. p value equal to α
- d. None of the above

#### Q.8. A sample of N observations is independently drawn from a normal distribution. The sample variance follows

- a. Normal distribution
- b. Chi-square with N degrees of freedom
- **c. Chi-square with N−1 degrees of freedom**
- d. t-distribution with N−1 degrees of freedom

#### Q.9. A car manufacturer purchases car batteries from two different suppliers. Supplier X provides 55% of the batteries and supplier Y provides the rest. 5% of all batteries from supplier X are defective and 4% of all batteries from supplier Y are defective. You select a battery from the bulk and find it to be defective. What is the probability that it is from Supplier X?
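
This is a direct application of Bayes' rule: P(X | defective) = P(defective | X)·P(X) / P(defective). A sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

p_x = Fraction(55, 100)           # share of batteries from supplier X
p_y = 1 - p_x                     # the rest come from supplier Y
p_def_given_x = Fraction(5, 100)  # defect rate for supplier X
p_def_given_y = Fraction(4, 100)  # defect rate for supplier Y

# Total probability of drawing a defective battery
p_def = p_x * p_def_given_x + p_y * p_def_given_y

# Bayes' rule
p_x_given_def = p_x * p_def_given_x / p_def
print(p_x_given_def, round(float(p_x_given_def), 4))  # 55/91 0.6044
```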

#### Q.10. Which one of the following is the best measure of central tendency for categorical data?

#### About NPTEL Data Science for Engineers Course:

**Learning Objectives:**

- Introduce R as a programming language
- Introduce the mathematical foundations required for data science
- Introduce the first level data science algorithms
- Introduce a data analytics problem solving framework
- Introduce a practical capstone case study

**Learning Outcomes:**

- Describe a flow process for data science problems (Remembering)
- Classify data science problems into standard typology (Comprehension)
- Develop R codes for data science solutions (Application)
- Correlate results to the solution approach followed (Analysis)
- Assess the solution approach (Evaluation)
- Construct use cases to validate approach and identify modifications required (Creating)

##### Course Layout:

**Week 1:** Course philosophy and introduction to R

**Week 2:** Linear algebra for data science

- 1. Algebraic view – vectors, matrices, product of matrix & vector, rank, null space, solution of over-determined set of equations and pseudo-inverse
- 2. Geometric view – vectors, distance, projections, eigenvalue decomposition

**Week 3:** Statistics (descriptive statistics, notion of probability, distributions, mean, variance, covariance, covariance matrix, understanding univariate and multivariate normal distributions, introduction to hypothesis testing, confidence interval for estimates)

**Week 4:** Optimization

**Week 5:**

- 1. Optimization
- 2. Typology of data science problems and a solution framework

**Week 6:**

- 1. Simple linear regression and verifying assumptions used in linear regression
- 2. Multivariate linear regression, model assessment, assessing importance of different variables, subset selection

**Week 7:** Classification using logistic regression

**Week 8:** Classification using kNN and k-means clustering

**CRITERIA TO GET A CERTIFICATE**:

Average assignment score = 25% of average of best 8 assignments out of the total 12 assignments given in the course.

Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

**YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >=10/25 AND EXAM SCORE >= 30/75. If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.**
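
The scoring rule above can be sketched as a small function. The inputs are assumptions for illustration: `assignment_avg_pct` is the average (out of 100) of the best assignments counted, and `exam_pct` is the proctored exam score out of 100. The function name is hypothetical.

```python
def certificate_result(assignment_avg_pct, exam_pct):
    """Return (final_score, eligible) under the criteria stated above."""
    assignment_score = 0.25 * assignment_avg_pct  # contributes up to 25
    exam_score = 0.75 * exam_pct                  # contributes up to 75
    final_score = assignment_score + exam_score

    # Both thresholds must be met; a high total alone is not enough.
    eligible = assignment_score >= 10 and exam_score >= 30
    return final_score, eligible

# Example: strong assignments, but the exam component falls below 30/75.
print(certificate_result(80, 36))  # (47.0, False)
```

In this example the final score is 47/100, yet no certificate is awarded because the exam component is only 27/75.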

