Q1. Which of the following is the most appropriate data-cleaning strategy before performing clustering analysis when fewer data points are available than desired?

  1. Capping and flooring of variables

  2. Removal of outliers

Q2. What is the minimum number of variables/features required to perform clustering?
Q3. For two runs of K-Means clustering, is it expected to get the same clustering results?
Q4. Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?
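
For Q3 and Q4, a minimal sketch of the K-Means loop may help. It is a plain-Python illustration, not a library implementation; the function name `kmeans`, the `seed` parameter, and the sample points are assumptions made for this example. The initial centroids are drawn at random from the data, and the loop stops once the assignments repeat between successive iterations.

```python
import random


def kmeans(points, k, seed, max_iter=100):
    """Plain Lloyd-style K-Means on 2-D points (illustrative sketch only).

    Returns (labels, centroids, iterations_used).
    """
    rng = random.Random(seed)
    # Random initialisation: pick k data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Stop when no assignment changed since the previous iteration.
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids, max_iter


# Hypothetical sample data: two loose groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

for seed in (0, 1):
    labels, centroids, iterations = kmeans(points, k=2, seed=seed)
    print(f"seed={seed} labels={labels} iterations={iterations}")
```

Running it with different seeds shows how the starting centroids, and therefore the cluster numbering and possibly the grouping itself, depend on the random initialisation, while a fixed seed reproduces the same run.
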
Q5. How can clustering (unsupervised learning) be used to improve the accuracy of a linear regression model (supervised learning)?

  1. Creating different models for different cluster groups.

  2. Creating an input feature for cluster IDs as an ordinal variable.

  3. Creating an input feature for cluster centroids as a continuous variable.

  4. Creating an input feature for cluster size as a continuous variable.

Q6. In which of the following cases will K-Means clustering fail to give good results?

  1. Data points with outliers

  2. Data points with different densities

  3. Data points with round shapes

  4. Data points with non-convex shapes

Q7. What should be the best choice for the number of clusters based on the following results:


Q8. Which of the following is/are valid iterative strategies for treating missing values before clustering analysis?
Q9. If two variables, V1 and V2, are used for clustering, which of the following is true for K-Means clustering with k = 3?

  1. If V1 and V2 have a correlation of 1, the cluster centroids will be in a straight line

  2. If V1 and V2 have a correlation of 0, the cluster centroids will be in a straight line

Q10. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this?
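
For Q10, a small sketch of how feature scale affects Euclidean distance may be useful. It assumes standardisation (mean 0, standard deviation 1) as the scaling method; the helper name `standardize` and the age/income rows are hypothetical values chosen for illustration.

```python
import math
import statistics


def standardize(rows):
    """Rescale each column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


# Hypothetical rows: (age in years, income in dollars).
rows = [(25, 50_000), (60, 50_500), (26, 52_000), (58, 90_000)]

# Raw distances from row 0: the income column dominates because of its units.
raw_d01 = math.dist(rows[0], rows[1])
raw_d02 = math.dist(rows[0], rows[2])

# After standardising, both columns contribute on a comparable scale.
scaled = standardize(rows)
scaled_d01 = math.dist(scaled[0], scaled[1])
scaled_d02 = math.dist(scaled[0], scaled[2])

print(f"raw:    d(0,1)={raw_d01:.1f}  d(0,2)={raw_d02:.1f}")
print(f"scaled: d(0,1)={scaled_d01:.3f}  d(0,2)={scaled_d02:.3f}")
```

On these rows the nearest neighbour of the first person changes once the columns are standardised, which is the kind of effect a distance-based method such as K-Means is sensitive to.
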
Q11. Which of the following methods is used for finding the optimal number of clusters in the K-Means algorithm?
Q12. Which of the following are the upper and lower bounds of the F-Score?
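
For Q12, it may help to see how the F-Score is computed. The sketch below assumes the common F-beta definition in terms of precision and recall, each of which lies between 0 and 1; the function name `f_beta` and the convention of returning 0 when both inputs are 0 are choices made for this example.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    Assumes precision and recall are both in [0, 1].
    """
    if precision == 0 and recall == 0:
        # Convention for this sketch: avoid dividing by zero.
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Evaluate the extremes and one intermediate case.
for precision, recall in [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
    print(precision, recall, f_beta(precision, recall))
```
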

