Question: 1159


What is the primary purpose of following a machine learning workflow?


  1. To eliminate the need for data preprocessing.

  2. To minimize the amount of data required for training.

  3. To facilitate a systematic approach to model development and evaluation.

  4. To ensure that the model is deployed without testing.


Answer: C


Explanation: The machine learning workflow provides a systematic approach to developing, evaluating, and deploying models. This structure helps ensure that all critical steps are followed, enhancing the quality and reliability of the outcomes.


Question: 1160


When constructing a multi-layer perceptron (MLP), which of the following activation functions is commonly used to introduce non-linearity into the model?


  1. Softmax

  2. Sigmoid

  3. ReLU

  4. Linear


Answer: B,C


Explanation: Both the ReLU (Rectified Linear Unit) and Sigmoid functions introduce non-linearity into the model, which is essential for MLPs to learn complex patterns. ReLU is particularly favored in hidden layers due to its ability to mitigate the vanishing gradient problem, while Sigmoid is often used in the output layer for binary classification tasks.
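
To make the non-linearity concrete, here is a minimal sketch of the activations mentioned above. The helper `is_additive` is a hypothetical name introduced for illustration: it checks whether f(a + b) equals f(a) + f(b) for one pair of inputs, which a linear map satisfies and ReLU and Sigmoid generally do not.

```python
import math


def relu(x: float) -> float:
    """Rectified Linear Unit: keeps positive inputs, maps the rest to 0."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(x: float) -> float:
    """Identity activation: adds no non-linearity."""
    return x


def is_additive(f, a: float, b: float) -> bool:
    """Hypothetical helper: does f(a + b) equal f(a) + f(b) for this pair?"""
    return math.isclose(f(a + b), f(a) + f(b))


print(is_additive(linear, -1.0, 2.0))   # linear passes the check
print(is_additive(relu, -1.0, 2.0))     # ReLU fails it
print(is_additive(sigmoid, -1.0, 2.0))  # Sigmoid fails it
```

A stack of layers that uses only the linear activation collapses into a single linear map, which is why option 4 does not help an MLP learn complex patterns.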


Question: 1161


Which of the following scenarios would be considered a Multi-Label Classification problem?


  1. Predicting whether a transaction is "Fraudulent" or "Legitimate."

  2. Classifying an image of a fruit as either an "Apple," "Banana," or "Cherry."

  3. Estimating the exact temperature of a room based on sensor data.

  4. Assigning a news article to the categories "Politics," "Economy," and "Middle East" simultaneously.


Answer: D


Explanation: Assigning multiple categories to a single article is a multi-label problem because the categories are not mutually exclusive; an article about a political summit regarding oil prices would naturally fall into both "Politics" and "Economy." Options A and B are binary and multi-class classification, respectively (both with mutually exclusive classes). Option C is a regression task.
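
The difference between the two decision rules can be sketched as follows. The scores are made-up values, the "Sports" label is added only for illustration, and the 0.5 threshold is an assumed default rather than a prescribed one.

```python
LABELS = ["Politics", "Economy", "Middle East", "Sports"]  # "Sports" is illustrative


def multiclass_decision(scores):
    """Mutually exclusive classes: return the single highest-scoring label."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best]


def multilabel_decision(probabilities, threshold=0.5):
    """Non-exclusive labels: return every label whose probability clears the threshold."""
    return [label for label, p in zip(LABELS, probabilities) if p >= threshold]


# Hypothetical per-label probabilities for one news article.
article_scores = [0.91, 0.78, 0.66, 0.04]

print(multiclass_decision(article_scores))
print(multilabel_decision(article_scores))
```

The multi-class rule is forced to pick one category, while the multi-label rule can return several categories for the same article.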


Question: 1162


In a random forest for multi-class customer segmentation, feature selection benefits are observed. Which benefit is most pronounced when deploying the model in a resource-constrained environment?


  1. Lower memory usage for storing simpler trees

  2. Reduced training time due to fewer features considered per split

  3. Elimination of irrelevant features improving interpretability

  4. Increased robustness to noisy features


Answer: B


Explanation: Setting max_features below the total number of features reduces the computation needed at each split, an effect that grows with the number of features. This shortens training in resource-limited settings, while ensemble averaging maintains accuracy.


Question: 1163


Following guidelines for building MLPs in a credit risk assessment model using customer financial data, which practices ensure robustness? (Select all that apply)


  1. Hyperparameter tuning via grid search

  2. Early stopping to prevent overfitting

  3. Regularization techniques like L2 penalty

  4. Using fixed learning rates without adaptation


Answer: A,B,C


Explanation: Early stopping halts training when validation performance degrades to prevent overfitting, hyperparameter tuning via grid search optimizes model settings, and regularization techniques like L2 penalty reduce model complexity for better generalization in credit risk models.
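
As a rough sketch of two of these practices, the snippet below implements a patience-based early-stopping rule and an L2 penalty term. The `patience` parameter, the function names, and the loss history are assumptions made for illustration; real frameworks expose their own equivalents.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # rule never triggered: all epochs ran


def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


# Hypothetical validation losses: improving at first, then degrading.
history = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
print(early_stopping_epoch(history))
print(l2_penalty([3.0, 4.0], lam=0.01))
```

With this history the best loss occurs at epoch 2, and training stops two epochs later, once the loss has failed to improve twice in a row.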


Question: 1164


When configuring a machine learning toolset for edge deployment on IoT devices, which hardware requirements are essential?

  1. At least 8GB RAM for model inference

  2. High-core CPU like Ryzen 9 7950X

  3. NVIDIA Jetson with TensorRT support

  4. 64GB VRAM GPUs


Answer: A,C


Explanation: Edge ML demands low-power hardware like Jetson modules with TensorRT for optimized inference on constrained RAM, prioritizing efficiency over high-end server specs unsuitable for battery-powered devices.


Question: 1165


When training a classification model, what is the primary purpose of a "Validation Set" as opposed to a "Test Set"?


  1. The validation set is used to tune hyperparameters, such as the value of 'k' in k-NN or the learning rate.

  2. The validation set is used for final performance reporting after the model is fully trained.

  3. The validation set is used to check for data leakage between the features and the target labels.

  4. The validation set provides the data for the initial gradient descent updates.


Answer: A


Explanation: The training set is used to learn the weights. The validation set is used during the development phase to compare different model configurations and tune hyperparameters without biasing the model toward the test set. The test set is held back until the very end to provide an unbiased estimate of how the final model will perform on completely unseen data.
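
The three roles can be sketched as below. The split fractions, the seed, and the validation accuracies for each k are made-up values used only to show the mechanics; the test set is created but deliberately left untouched during tuning.

```python
import random


def train_val_test_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then carve out disjoint validation and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def pick_best(candidates, val_score):
    """Choose the hyperparameter value with the highest validation score."""
    return max(candidates, key=val_score)


train, val, test = train_val_test_split(range(100))

# Hypothetical validation accuracies for different k in k-NN.
val_accuracy = {1: 0.81, 3: 0.86, 5: 0.88, 7: 0.85}
best_k = pick_best(val_accuracy, val_accuracy.get)
print(len(train), len(val), len(test), best_k)
```

Only after `best_k` is fixed would the test set be used, once, to report final performance.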


Question: 1166


Scenario: Healthcare data with patient trajectories forming spiral recovery patterns. k-Means yields a silhouette score of 0.42 (poor). When building hierarchical agglomerative clustering instead, which linkage best preserves the spiral structure?


  1. Complete linkage

  2. Average linkage

  3. Single linkage

  4. Ward linkage (minimum variance)


Answer: C


Explanation: Single linkage excels at chaining interconnected points in non-convex manifolds like spirals, detecting elongated or density-varying clusters that k-Means misses. Ward linkage enforces compact, roughly spherical clusters, while complete and average linkage merge more conservatively and split chains prematurely.
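
The chaining behaviour can be illustrated with a toy example. The points below are invented: one elongated chain split into two halves, plus a compact group nearby. This is a sketch of the two linkage distances only, not a full agglomerative clustering implementation.

```python
import math
from itertools import product


def single_linkage(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(a, b))


# Two halves of one elongated chain along the x-axis, plus a compact blob.
chain_left = [(float(x), 0.0) for x in range(0, 5)]    # x = 0..4
chain_right = [(float(x), 0.0) for x in range(5, 10)]  # x = 5..9
blob = [(2.0, 3.0), (2.5, 3.0), (2.0, 3.5)]

print(single_linkage(chain_left, chain_right), single_linkage(chain_left, blob))
print(complete_linkage(chain_left, chain_right), complete_linkage(chain_left, blob))
```

Under single linkage the two chain halves are the closest pair of clusters, so the chain is merged first. Under complete linkage the left half is closer to the blob than to the right half, so the chain is broken up.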


Question: 1167


What is a "Machine Learning Pipeline"?


  1. A series of automated steps that take raw data and turn it into a trained, deployable model.

  2. A specific type of GPU used for training.

  3. A dataset consisting only of images.

  4. A physical pipe that carries data between servers.


Answer: A


Explanation: An ML pipeline automates the workflow of an AI project, typically including data collection, cleaning, feature engineering, model training, evaluation, and deployment. This ensures that the process is repeatable and scalable.
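
A toy sketch of the idea: each stage is a plain function, and the pipeline feeds the output of one stage into the next. The stage names, the data, and the stand-in "model" (which simply predicts the mean training target) are all hypothetical.

```python
def clean(rows):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in rows if None not in r]


def add_features(rows):
    """Feature engineering: append the product of the two inputs."""
    return [(x1, x2, x1 * x2, y) for x1, x2, y in rows]


def train(rows):
    """Stand-in 'model': predicts the mean target seen during training."""
    targets = [r[-1] for r in rows]
    mean = sum(targets) / len(targets)
    return lambda features: mean


def run_pipeline(raw, steps):
    """Apply each step in order, passing the result to the next step."""
    data = raw
    for step in steps:
        data = step(data)
    return data


# Hypothetical raw records: (x1, x2, target); one has a missing value.
raw = [(1.0, 2.0, 10.0), (2.0, None, 12.0), (3.0, 4.0, 20.0)]
model = run_pipeline(raw, [clean, add_features, train])
print(model((1.0, 1.0, 1.0)))
```

Because the steps are listed in one place, rerunning the same sequence on new raw data is a single call, which is what makes the process repeatable.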


Question: 1168


During the building of an SVM model for regression in a manufacturing scenario to predict machine failure times based on sensor readings, you encounter high-dimensional data with correlated features. What step in the process would involve using principal component analysis (PCA) to reduce dimensionality before fitting the SVR?


  1. During the kernel selection phase to modify the RBF function

  2. As part of feature engineering prior to model training

  3. In the post-processing stage to interpret model coefficients

  4. After hyperparameter tuning but before final evaluation


Answer: B


Explanation: Using principal component analysis (PCA) to reduce dimensionality is a key step in feature engineering prior to model training, especially for high-dimensional data with correlated features in scenarios like predicting machine failure times, to improve computational efficiency and model accuracy in SVM regression.


Question: 1169


Consider a financial institution where an AI model for fraud detection has achieved high accuracy in testing, but the team must finalize it by addressing potential biases identified in the feature importance analysis. What advanced technique should be employed to ensure the model's fairness and reliability before sign-off? (Select the best answer)

  1. Conduct adversarial debiasing by training a secondary model to predict protected attributes and adjust accordingly

  2. Ignore minor biases since high accuracy overrides fairness concerns in production environments

  3. Use only synthetic data generation to balance the dataset without evaluating impact on model performance

  4. Rely solely on post-hoc explanations like SHAP values without modifying the model


Answer: A


Explanation: Conducting adversarial debiasing by training a secondary model to predict protected attributes and adjust accordingly is essential for finalizing a model with fairness, as it actively removes correlations with sensitive features during training, unlike merely interpreting explanations or using unbalanced synthetic data, which may not fully address embedded biases.


Question: 1170


For a linear model with multiple parameters predicting traffic flow from time of day, weather, and events, using Y = β0 + β1x1 + β2x2 + β3x3 + ε, what challenge emerges with correlated x1 and x2?


  1. Inflated variance of β estimates

  2. Heteroscedasticity

  3. Biased estimates of β

  4. Nonlinearity in residuals


Answer: A


Explanation: Multicollinearity between predictors like time and weather increases the standard errors of coefficients, making it hard to isolate individual effects in urban planning scenarios.
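
The size of the effect is commonly summarized by the variance inflation factor, VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others. The sketch below makes a simplifying assumption: only x1 and x2 are considered (x3 is ignored), in which case R^2 reduces to the squared Pearson correlation. The sample values are invented.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def vif_two_predictors(r):
    """VIF = 1 / (1 - R^2); with exactly two predictors, R^2 = r^2."""
    return 1.0 / (1.0 - r * r)


# Hypothetical, strongly correlated predictor values.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.1, 1.9, 3.2, 3.8]

r = pearson_r(x1, x2)
print(r, vif_two_predictors(r))
```

A VIF of 1 means no inflation. As the correlation approaches 1, the VIF grows without bound and the variance of the affected coefficient estimates grows with it, even though the estimates remain unbiased.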


Question: 1171


What does the Mean Absolute Error (MAE) measure in a regression model?


  1. The average of the absolute differences between predicted and actual values.

  2. The ratio of the total errors to the number of observations.

  3. The average of the squared differences between predicted and actual values.

  4. The percentage of variance explained by the model.


Answer: A


Explanation: The Mean Absolute Error (MAE) measures the average absolute differences between predicted values and actual values, providing a straightforward interpretation of prediction accuracy.
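
A small worked example, with invented values, contrasting MAE (option 1) with the mean squared error described in option 3:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

print(mean_absolute_error(actual, predicted))  # (1 + 0 + 2 + 1) / 4 = 1.0
print(mean_squared_error(actual, predicted))   # (1 + 0 + 4 + 1) / 4 = 1.5
```

MAE stays in the units of the target, while squaring makes MSE weight the single error of 2 more heavily.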

Question: 1172


When planning a machine learning workflow, which of the following guidelines should be prioritized? (Select two)


  1. Define clear objectives and success metrics for the project.

  2. Use the most complex algorithms available from the start.

  3. Involve stakeholders to ensure alignment with business goals.

  4. Limit the use of data preprocessing techniques to speed up the process.


Answer: A,C


Explanation: Defining clear objectives and success metrics is crucial for guiding the project, while involving stakeholders ensures that the machine learning initiative aligns with overall business goals. This collaborative approach enhances the likelihood of success.


Question: 1173


In applying hierarchical clustering to a spiral dataset visualized in 2D (5,000 points), the algorithm uses Ward's linkage. How does this differ from k-means on the same data, and what visualization aids interpretation? (Select two)


  1. Handles arbitrary shapes better

  2. Elbow point for cut height

  3. Builds bottom-up mergers unlike k-means partitions

  4. Dendrogram shows merger levels


Answer: C,D


Explanation: Hierarchical clustering merges points bottom-up, whereas k-means partitions the data around centroids. The dendrogram visualizes the hierarchy and shows the height of each merger, so clusters can be formed by cutting it at a chosen height. Ward's linkage minimizes the increase in within-cluster variance and therefore favors compact clusters, much as k-means does, so it does not handle arbitrary shapes such as spirals better; single linkage is the usual choice for that.


Question: 1174


In a facial recognition system, high bias is observed as the model consistently underperforms on both training and validation sets. What are the potential causes and remedies in a scenario with limited diverse training data? (Select all that apply)


  1. Increasing model complexity with deeper architectures

  2. Too much regularization; reduce it to allow better fitting

  3. Insufficient model capacity; switch to a more complex algorithm

  4. Data augmentation techniques to increase variety


Answer: A,B,C,D


Explanation: High bias, shown by underperformance on both sets, can stem from insufficient model capacity, remedied by switching to a more complex algorithm or increasing model complexity with deeper architectures. Too much regularization also raises bias by over-constraining the model, so reducing it is a valid remedy. Limited diverse data can be addressed with data augmentation techniques to simulate variety.


Question: 1175


In the context of building an MLP, which of the following is a common strategy for selecting the number of hidden layers?


  1. Use a fixed number based on the input size

  2. Experiment with different architectures and validate performance

  3. Always use a single hidden layer

  4. Select the number based on the output classes


Answer: B


Explanation: A common strategy for selecting the number of hidden layers in an MLP is to experiment with different architectures and validate their performance on a validation dataset. This empirical approach allows practitioners to find the optimal architecture that balances complexity and generalization.