
Q1. Which of the following is the most appropriate data-cleaning strategy before performing clustering analysis when the number of available data points is smaller than desired?

1. Capping and flooring of variables

2. Removal of outliers

1 only

2 only

both 1 and 2

None
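As a side note, the two strategies in Q1 can be contrasted in a few lines of NumPy. The data, percentile cutoffs, and variable names below are illustrative assumptions, not part of the quiz:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.append(rng.normal(50, 5, 100), [500.0])  # one extreme outlier

# Strategy 1: capping and flooring (winsorizing) at the 1st/99th percentiles
lo, hi = np.percentile(x, [1, 99])
x_capped = np.clip(x, lo, hi)

# Strategy 2: removing points outside the same percentile band entirely
x_removed = x[(x >= lo) & (x <= hi)]

print(len(x), len(x_capped), len(x_removed))
```

Note that capping preserves the sample size while removal shrinks it, which matters when data points are already scarce.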

Q2. What is the minimum number of variables/features required to perform clustering?

0

1

2

more than 2

Q3. For two runs of K-Means clustering, is it expected to get the same clustering results?

Yes

No
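One way to explore Q3 empirically is to run scikit-learn's KMeans twice with different random seeds and compare the resulting partitions. The toy data below is a made-up illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Toy data: two well-separated blobs (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(5.0, 0.5, (50, 2))])

# Two runs that differ only in their random initialization
labels_a = KMeans(n_clusters=2, n_init=1, random_state=1).fit_predict(X)
labels_b = KMeans(n_clusters=2, n_init=1, random_state=2).fit_predict(X)

# Cluster ids are arbitrary, so compare partitions with a
# label-permutation-invariant score (1.0 means identical partitions)
ari = adjusted_rand_score(labels_a, labels_b)
print(ari)
```

On harder data, runs with different initializations can land in different local optima; scikit-learn's default of several restarts (`n_init`) mitigates this.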

Q4. Is it possible that the assignment of observations to clusters does not change between successive iterations of K-Means?

Yes

No

Can't Say
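For intuition on Q4: K-Means declares convergence precisely when assignments (equivalently, centroids) stop changing between iterations, and scikit-learn exposes the iteration count at which this happened. The data below is an invented sketch:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-3.0, 0.3, (30, 2)),
               rng.normal(3.0, 0.3, (30, 2))])

km = KMeans(n_clusters=2, n_init=1, random_state=0, max_iter=300).fit(X)
# On easy data, assignments stabilize after very few iterations;
# once they stop changing, the algorithm has converged and halts.
print(km.n_iter_)
```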

Q5. How can clustering (unsupervised learning) be used to improve the accuracy of a linear regression model (supervised learning)?

1. Creating different models for different cluster groups.

2. Creating an input feature with cluster IDs as an ordinal variable.

3. Creating an input feature with cluster centroids as a continuous variable.

4. Creating an input feature with cluster size as a continuous variable.

1 only

1 and 2

1 and 4

3 only

2 and 4

All of the above
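As a hedged sketch of strategy 1 from this question (one regression per cluster), the synthetic data below is invented for illustration. Fitting a separate line inside each cluster can never do worse, in total squared error, than one global line restricted to the same points:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Three groups with a cluster-dependent intercept (made-up data)
centers, offsets = [-6.0, 0.0, 6.0], [0.0, 10.0, 0.0]
X = np.concatenate([rng.normal(c, 0.5, 50) for c in centers]).reshape(-1, 1)
y = 2 * X[:, 0] + np.repeat(offsets, 50) + rng.normal(0, 0.2, 150)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

def sse(model, X, y):
    return float(np.sum((model.predict(X) - y) ** 2))

# Baseline: a single global regression
global_sse = sse(LinearRegression().fit(X, y), X, y)

# Strategy 1: one regression per K-Means cluster
per_cluster_sse = sum(
    sse(LinearRegression().fit(X[labels == k], y[labels == k]),
        X[labels == k], y[labels == k])
    for k in range(3))

print(global_sse, per_cluster_sse)
```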

Q6. In which of the following cases will K-Means clustering fail to give good results?

1. Data points with outliers

2. Data points with different densities

3. Data points with round shapes

4. Data points with non-convex shapes

1 and 2

2 and 3

2 and 4

1, 2 and 4
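A hedged illustration related to the non-convex case in Q6: on the classic two-moons dataset, K-Means cuts the shapes with a straight boundary, while a density-based method such as DBSCAN can follow them. The parameters below (`eps`, `min_samples`, noise level) are illustrative choices, not prescriptions:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

# Two interleaved, non-convex clusters
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)

# Agreement with the true moons, invariant to label permutations
ari_km = adjusted_rand_score(y_true, km_labels)
ari_db = adjusted_rand_score(y_true, db_labels)
print(ari_km, ari_db)
```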

Q7. What should be the best choice for the number of clusters based on the following results?

1

2

3

4

Q8. Which of the following are valid iterative strategies for treating missing values before clustering analysis?

Imputation with mean

Nearest Neighbor assignment

Imputation with Expectation Maximization algorithm

All of the above
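Two of the imputation strategies from Q8 are available directly in scikit-learn; the tiny matrix below is a made-up example. (For EM-style iterative imputation, scikit-learn also offers the experimental `IterativeImputer`, not shown here.)

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# Toy matrix with missing entries (illustrative only)
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Strategy: imputation with the column mean
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# Strategy: nearest-neighbour assignment of missing values
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

print(X_mean)
print(X_knn)
```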

Q9. If two variables, V1 and V2, are used for clustering, which of the following is true for K-Means clustering with k = 3?

1. If V1 and V2 have a correlation of 1, the cluster centroids will lie on a straight line.

2. If V1 and V2 have a correlation of 0, the cluster centroids will lie on a straight line.

1 only

2 only

1 and 2

None
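Statement 1 of Q9 can be checked numerically: when V2 is an exact linear function of V1 (correlation 1), every point lies on one line, and each centroid, being a mean of such points, lies on that same line. The data below is an invented sketch:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
v1 = rng.normal(0, 1, 200)
X = np.column_stack([v1, 2 * v1 + 1])  # V2 = 2*V1 + 1, correlation exactly 1

centers = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X).cluster_centers_
# Each centroid is a mean of points on the line y = 2x + 1,
# so the centroids themselves satisfy the same equation.
print(centers)
```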

Q10. Feature scaling is an important step before applying k-means algorithm. What is the reason behind this?

In distance calculations, it gives the same weight to all features

You always get the same clusters whether or not you use feature scaling

It is an important step for Manhattan distance but not for Euclidean distance

None
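For intuition on Q10, consider how a feature with a large numeric range dominates Euclidean distance until the features are standardized. The numbers below are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales: income (~1e4) vs. age (~1e1)
X = np.array([[30000.0, 25.0],
              [31000.0, 60.0],
              [90000.0, 30.0]])

# Raw Euclidean distance is dominated by the income column
d_raw = np.linalg.norm(X[0] - X[1])

# After standardization, both features contribute comparably
X_scaled = StandardScaler().fit_transform(X)
d_scaled = np.linalg.norm(X_scaled[0] - X_scaled[1])

print(d_raw, d_scaled)
```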

Q11. Which of the following methods is used for finding the optimal number of clusters in the k-means algorithm?

Silhouette score

Gap statistic

Elbow Method

All of the above
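Two of the techniques named in Q11 can be tried in a few lines: the elbow method tracks within-cluster sum of squares (`inertia_` in scikit-learn) as k grows, and the silhouette score peaks near a good k. The three synthetic blobs below are an illustrative assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(7)
# Three well-separated blobs, so the "right" k should be 3
X = np.vstack([rng.normal(c, 0.4, (40, 2)) for c in (0.0, 5.0, 10.0)])

inertias, silhouettes = {}, {}
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_          # elbow: drops sharply until true k
    silhouettes[k] = silhouette_score(X, km.labels_)  # peaks near true k
    print(k, round(inertias[k], 1), round(silhouettes[k], 3))
```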

Q12. Which of the following are the lower and upper bounds of the F-score?

[0,1]

(0,1)

[-1,1]

None
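Recall that the F-score is the harmonic mean of precision and recall, F = 2PR / (P + R), which constrains its range. A minimal check with made-up labels:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

# Precision = 3/3, recall = 3/4, so F1 = 2*(1*0.75)/(1.75) = 6/7
f = f1_score(y_true, y_pred)
print(f)
```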

Great job!
