What is the primary purpose of following a machine learning workflow?
To eliminate the need for data preprocessing.
To minimize the amount of data required for training.
To facilitate a systematic approach to model development and evaluation.
To ensure that the model is deployed without testing.
Explanation: The machine learning workflow provides a systematic approach to developing, evaluating, and deploying models. This structure helps ensure that all critical steps are followed, enhancing the quality and reliability of the outcomes.
When constructing a multi-layer perceptron (MLP), which of the following activation functions is commonly used to introduce non-linearity into the model?
Softmax
Sigmoid
ReLU
Linear
Explanation: Both the ReLU (Rectified Linear Unit) and Sigmoid functions introduce non-linearity into the model, which is essential for MLPs to learn complex patterns. ReLU is particularly favored in hidden layers due to its ability to mitigate the vanishing gradient problem, while Sigmoid is often used in the output layer for binary classification tasks.
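Example: the two activations named above can be sketched in a few lines of NumPy. This is a minimal illustration rather than any particular library's implementation; the function names relu and sigmoid and the sample input z are chosen here for illustration.

```python
import numpy as np


def relu(z):
    # ReLU keeps positive values and clips negatives to zero.
    return np.maximum(0.0, z)


def sigmoid(z):
    # Sigmoid squashes any real value into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))


z = np.array([-2.0, 0.0, 3.0])  # illustrative pre-activation values
relu_out = relu(z)
sigmoid_out = sigmoid(z)
print(relu_out)
print(sigmoid_out)
```

Both functions are non-linear in z, which is what lets stacked layers represent more than a single linear map.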
Which of the following scenarios would be considered a Multi-Label Classification problem?
Predicting whether a transaction is "Fraudulent" or "Legitimate."
Classifying an image of a fruit as either an "Apple," "Banana," or "Cherry."
Estimating the exact temperature of a room based on sensor data.
Assigning a news article to the categories "Politics," "Economy," and "Middle East" simultaneously.
Explanation: Assigning multiple categories to a single article is a multi-label problem because the categories are not mutually exclusive; an article about a political summit regarding oil prices would naturally fall into both "Politics" and "Economy." Options A and B are binary and multi-class classification, respectively, where each example receives exactly one label. Option C is a regression task.
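Example: one common way to represent multi-label targets is a binary indicator matrix with one column per category. The sketch below uses scikit-learn's MultiLabelBinarizer; the two sample articles are made up for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical label sets: the first article carries three labels at once.
articles = [
    {"Politics", "Economy", "Middle East"},
    {"Economy"},
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(articles)  # one row per article, one column per label

print(list(mlb.classes_))
print(Y)
```

A row may contain several ones, which is exactly what distinguishes multi-label targets from multi-class targets.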
In a random forest for multi-class customer segmentation, feature selection benefits are observed. Which benefit is most pronounced when deploying the model in a resource-constrained environment?
Lower memory usage for storing simpler trees
Reduced training time due to fewer features considered per split
Elimination of irrelevant features improving interpretability
Increased robustness to noisy features
Explanation: Setting max_features below the total number of features reduces the number of candidate features evaluated at each split. With many features this cuts training computation substantially, which matters in resource-limited settings, while ensemble averaging helps maintain accuracy.
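Example: a minimal sketch of limiting the features considered per split with scikit-learn's RandomForestClassifier. The synthetic dataset (40 features, 3 classes) and the forest size are assumptions made for illustration, not values from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a multi-class segmentation dataset.
X, y = make_classification(
    n_samples=300,
    n_features=40,
    n_informative=8,
    n_classes=3,
    random_state=0,
)

# max_features="sqrt" evaluates only about sqrt(40) candidate features per split.
forest = RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=0)
forest.fit(X, y)

features_per_split = forest.estimators_[0].max_features_
print(features_per_split)
```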
Following guidelines for building MLPs in a credit risk assessment model using customer financial data, which practices ensure robustness? (Select all that apply)
Hyperparameter tuning via grid search
Early stopping to prevent overfitting
Regularization techniques like L2 penalty
Using fixed learning rates without adaptation
Explanation: Early stopping halts training when validation performance degrades to prevent overfitting, hyperparameter tuning via grid search optimizes model settings, and regularization techniques like L2 penalty reduce model complexity for better generalization in credit risk models.
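Example: the three practices can be combined in one scikit-learn sketch. The data here is synthetic (not real credit data), and the network size and the grid of alpha values (the L2 penalty strength) are assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for customer records.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

mlp = MLPClassifier(
    hidden_layer_sizes=(16,),
    early_stopping=True,      # hold out part of the training data and stop when it stalls
    validation_fraction=0.2,
    n_iter_no_change=5,
    max_iter=200,
    random_state=0,
)
model = make_pipeline(StandardScaler(), mlp)

# Grid search over the L2 penalty strength (alpha) with 3-fold cross-validation.
param_grid = {"mlpclassifier__alpha": [1e-4, 1e-2, 1.0]}
search = GridSearchCV(model, param_grid, cv=3)
search.fit(X, y)

best_alpha = search.best_params_["mlpclassifier__alpha"]
print(best_alpha, search.best_score_)
```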
When configuring a machine learning toolset for edge deployment on IoT devices, which hardware requirements are essential?
At least 8GB RAM for model inference
High-core CPU like Ryzen 9 7950X
NVIDIA Jetson with TensorRT support
64GB VRAM GPUs
Explanation: Edge ML demands low-power hardware like Jetson modules with TensorRT for optimized inference on constrained RAM, prioritizing efficiency over high-end server specs unsuitable for battery-powered devices.
When training a classification model, what is the primary purpose of a "Validation Set" as opposed to a "Test Set"?
The validation set is used to tune hyperparameters, such as the value of 'k' in k-NN or the learning rate.
The validation set is used for final performance reporting after the model is fully trained.
The validation set is used to check for data leakage between the features and the target labels.
The validation set provides the data for the initial gradient descent updates.
Explanation: The training set is used to learn the weights. The validation set is used during the development phase to compare different model configurations and tune hyperparameters without biasing the model toward the test set. The test set is held back until the very end to provide an unbiased estimate of how the final model will perform on completely unseen data.
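Example: a minimal sketch of the three-way split, using the Iris dataset and k-NN purely for illustration. The split ratios and the candidate values of k are assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# First hold back the test set, then carve a validation set out of the remainder.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=0.25, stratify=y_dev, random_state=0
)

# Tune k on the validation set only.
best_k, best_val_acc = None, -1.0
for k in (1, 3, 5, 7, 9):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_val_acc:
        best_k, best_val_acc = k, acc

# Touch the test set once, at the very end.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_acc = final_model.score(X_test, y_test)
print(best_k, best_val_acc, test_acc)
```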
Scenario: Healthcare data with patient trajectories forming spiral recovery patterns. k-Means achieves a silhouette score of 0.42 (poor). When building hierarchical agglomerative clustering, which linkage best preserves the spiral structure?
Complete linkage
Average linkage
Single linkage
Ward linkage (minimum variance)
Explanation: Single linkage merges clusters based on their closest pair of points, so it chains together neighboring points along non-convex shapes like spirals and detects elongated clusters that k-Means misses. Ward linkage favors compact, roughly spherical clusters, and complete and average linkage merge more conservatively, which tends to split chains prematurely.
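Example: the linkage comparison can be tried on a toy dataset. The sketch below builds two noise-free interleaved spiral arms (an idealized stand-in for the patient data, which is not available here) and compares single and Ward linkage by adjusted Rand index against the true arm labels.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Two interleaved Archimedean spiral arms; the second is the first rotated by 180 degrees.
theta = np.linspace(0.5, 3 * np.pi, 200)
arm = np.column_stack((theta * np.cos(theta), theta * np.sin(theta)))
X = np.vstack((arm, -arm))
y_true = np.repeat([0, 1], len(theta))

scores = {}
for linkage in ("single", "ward"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    scores[linkage] = adjusted_rand_score(y_true, labels)

print(scores)
```

On this clean toy data, neighboring points along an arm are closer to each other than to the other arm, so single linkage can follow each arm. With noisy real data, single linkage may chain through stray points, so the result should be checked rather than assumed.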
What is a "Machine Learning Pipeline"?
A series of automated steps that take raw data and turn it into a trained, deployable model.
A specific type of GPU used for training.
A dataset consisting only of images.
A physical pipe that carries data between servers.
Explanation: An ML pipeline automates the workflow of an AI project, typically including data collection, cleaning, feature engineering, model training, evaluation, and deployment. This ensures that the process is repeatable and scalable.
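Example: a small sketch of the idea using scikit-learn's Pipeline, which chains preprocessing and model fitting into one repeatable object. The dataset (Iris) and the two steps shown are assumptions for illustration; a production pipeline would typically also cover data collection and deployment.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each step is applied in order; fit() learns the scaler and the model together.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
print(accuracy)
```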
During the building of an SVM model for regression in a manufacturing scenario to predict machine failure times based on sensor readings, you encounter high-dimensional data with correlated features. What step in the process would involve using principal component analysis (PCA) to reduce dimensionality before fitting the SVR?
During the kernel selection phase to modify the RBF function
As part of feature engineering prior to model training
In the post-processing stage to interpret model coefficients
After hyperparameter tuning but before final evaluation
Explanation: Applying principal component analysis (PCA) is a feature engineering step performed before model training. For high-dimensional data with correlated features, such as sensor readings used to predict machine failure times, it reduces the number of inputs to the SVR, which improves computational efficiency and can improve accuracy.
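Example: a minimal sketch of placing PCA before an SVR in a scikit-learn pipeline. The data is synthetic: 30 correlated "sensor" columns generated from 3 hidden factors, an assumption made so the effect of PCA is visible. The SVR settings are illustrative as well.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# 30 correlated features driven by 3 hidden factors, plus a little noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 30))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 30))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=200)

# PCA keeps enough components to explain 95% of the variance, then SVR is fit on them.
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVR(kernel="rbf", C=10.0))
model.fit(X, y)

n_kept = model.named_steps["pca"].n_components_
print(n_kept)
```

Keeping PCA inside the pipeline means it is refit on each training fold during cross-validation, which avoids leaking information from held-out data.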
Consider a financial institution where an AI model for fraud detection has achieved high accuracy in testing, but the team must finalize it by addressing potential biases identified in the feature importance analysis. What advanced technique should be employed to ensure the model's fairness and reliability before sign-off? (Select the best answer)
Conduct adversarial debiasing by training a secondary model to predict protected attributes and adjust accordingly
Ignore minor biases since high accuracy overrides fairness concerns in production environments
Use only synthetic data generation to balance the dataset without evaluating impact on model performance
Rely solely on post-hoc explanations like SHAP values without modifying the model
Explanation: Adversarial debiasing trains a secondary model to predict protected attributes and adjusts the primary model accordingly. It is the strongest option for finalizing the model with fairness in mind because it actively removes correlations with sensitive features during training. Merely interpreting post-hoc explanations, or generating synthetic data without evaluating its impact on performance, may not fully address embedded biases.
For a linear model with multiple parameters predicting traffic flow from time of day, weather, and events, using Y = β0 + β1x1 + β2x2 + β3x3 + ε, what challenge emerges with correlated x1 and x2?
Inflated variance of β estimates
Heteroscedasticity
Biased estimates of β
Nonlinearity in residuals
Explanation: Multicollinearity between predictors like time and weather increases the standard errors of coefficients, making it hard to isolate individual effects in urban planning scenarios.
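Example: the variance inflation can be quantified with the variance inflation factor (VIF), which for standardized predictors equals the diagonal of the inverse correlation matrix. The sketch below uses synthetic predictors, with x2 deliberately constructed to be almost a copy of x1; these are not real traffic data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # strongly correlated with x1 by construction
x3 = rng.normal(size=n)             # independent of the other two

X = np.column_stack((x1, x2, x3))
corr = np.corrcoef(X, rowvar=False)

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(corr))
print(vif)
```

A VIF near 1 indicates little inflation, while large values for x1 and x2 signal that their coefficient estimates will have wide standard errors.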
What does the Mean Absolute Error (MAE) measure in a regression model?
The average of the absolute differences between predicted and actual values.
The ratio of the total errors to the number of observations.
The average of the squared differences between predicted and actual values.
The percentage of variance explained by the model.
Explanation: The Mean Absolute Error (MAE) measures the average absolute differences between predicted values and actual values, providing a straightforward interpretation of prediction accuracy.
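Example: MAE and, for contrast, the mean squared error from option C, computed by hand with NumPy. The four sample values are made up for illustration.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # illustrative actual values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # illustrative predictions

errors = y_pred - y_true
mae = np.mean(np.abs(errors))  # average absolute difference
mse = np.mean(errors ** 2)     # average squared difference

print(mae)  # 0.5
print(mse)  # 0.375
```

MAE stays in the units of the target, while MSE penalizes the single largest error (1.0 here) more heavily.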
When planning a machine learning workflow, which of the following guidelines should be prioritized? (Select two)
Define clear objectives and success metrics for the project.
Use the most complex algorithms available from the start.
Involve stakeholders to ensure alignment with business goals.
Limit the use of data preprocessing techniques to speed up the process.
Explanation: Defining clear objectives and success metrics is crucial for guiding the project, while involving stakeholders ensures that the machine learning initiative aligns with overall business goals. This collaborative approach enhances the likelihood of success.
In applying hierarchical clustering to a spiral dataset visualized in 2D (5,000 points), the algorithm uses Ward's linkage. How does this differ from k-means on the same data, and what visualization aids interpretation? (Select two)
Handles arbitrary shapes better
Elbow point for cut height
Builds bottom-up mergers unlike k-means partitions
Dendrogram shows merger levels
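Explanation: Hierarchical clustering merges points bottom-up, whereas k-means partitions the data around centroids. Ward's linkage merges the pair of clusters that gives the smallest increase in within-cluster variance. The dendrogram visualizes the merge hierarchy, and cutting it at a chosen height forms the clusters. Note that Ward's linkage, like k-means, favors compact clusters; within hierarchical clustering, arbitrary shapes such as spirals are better captured by single linkage.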
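Example: a small sketch of bottom-up merging and the dendrogram using SciPy. For brevity it uses 40 points in two toy blobs rather than a 5,000-point spiral; the data and the choice of two clusters are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)

# Two well-separated toy blobs of 20 points each.
X = np.vstack(
    (
        rng.normal(0.0, 0.3, size=(20, 2)),
        rng.normal(5.0, 0.3, size=(20, 2)),
    )
)

# Each row of Z records one merge: the two clusters joined, the merge height, and the new size.
Z = linkage(X, method="ward")

# Cut the tree so that at most two clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")

# no_plot=True returns the dendrogram layout without requiring a plotting backend.
tree = dendrogram(Z, no_plot=True)
print(Z.shape, sorted(set(labels)), len(tree["leaves"]))
```

Dropping no_plot=True draws the dendrogram when matplotlib is available, which makes the merge heights easy to inspect before choosing a cut.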
In a facial recognition system, high bias is observed as the model consistently underperforms on both training and validation sets. What are the potential causes and remedies in a scenario with limited diverse training data? (Select all that apply)
Increasing model complexity with deeper architectures
Too much regularization; reduce it to allow better fitting
Insufficient model capacity; switch to a more complex algorithm
Data augmentation techniques to increase variety
Explanation: High bias from underperformance on both sets can stem from insufficient model capacity, remedied by switching to a more complex algorithm or increasing model complexity with deeper architectures. Limited diverse data can be addressed with data augmentation techniques to simulate variety. Too much regularization exacerbates bias by overly simplifying the model.
In the context of building an MLP, which of the following is a common strategy for selecting the number of hidden layers?
Use a fixed number based on the input size
Experiment with different architectures and validate performance
Always use a single hidden layer
Select the number based on the output classes
Explanation: A common strategy for selecting the number of hidden layers in an MLP is to experiment with different architectures and validate their performance on a validation dataset. This empirical approach allows practitioners to find the optimal architecture that balances complexity and generalization.
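Example: a minimal sketch of comparing architectures on a validation set with scikit-learn's MLPClassifier. The dataset (make_moons) and the candidate layer sizes are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Candidate architectures: one, two, and three hidden layers.
candidates = [(8,), (16,), (16, 16), (32, 16, 8)]

val_scores = {}
for sizes in candidates:
    mlp = MLPClassifier(hidden_layer_sizes=sizes, max_iter=2000, random_state=0)
    val_scores[sizes] = mlp.fit(X_train, y_train).score(X_val, y_val)

best = max(val_scores, key=val_scores.get)
print(val_scores)
print(best)
```

The architecture with the best validation score is a reasonable starting point; a separate test set would still be needed for the final performance estimate.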