- Model Accuracy/Performance Metrics
Specify target metrics like precision (minimizing false positives), recall (minimizing false negatives), F1-score (harmonic mean of precision and recall), AUC (Area Under the ROC Curve for binary classification), RMSE (Root Mean Squared Error for regression), and acceptable error rates. Define how these metrics will be measured (e.g., on specific datasets, with cross-validation) and under what conditions (e.g., varying data distributions, class imbalances).
Example: “The object detection model shall achieve a mean Average Precision (mAP) of at least 0.8 on a held-out validation set with a 95% confidence interval, evaluated using the COCO evaluation metrics.”
Scikit-learn Model Evaluation
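As an illustrative (not prescriptive) starting point, the snippet below computes several of the metrics named above with scikit-learn; the label and score arrays are toy placeholders standing in for a real validation set.

```python
# Minimal sketch: computing common evaluation metrics with scikit-learn.
# y_true, y_pred, and y_score are placeholders for real validation labels,
# hard predictions, and predicted probabilities.
import numpy as np
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score, mean_squared_error)

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1])
y_score = np.array([0.1, 0.9, 0.4, 0.2, 0.8, 0.3, 0.7, 0.95])

print("Precision:", precision_score(y_true, y_pred))   # false-positive control
print("Recall:   ", recall_score(y_true, y_pred))      # false-negative control
print("F1-score: ", f1_score(y_true, y_pred))          # harmonic mean of the two
print("AUC:      ", roc_auc_score(y_true, y_score))    # ranking quality

# For a regression model, RMSE would be computed against continuous targets:
y_reg_true = np.array([2.5, 0.0, 2.1, 7.8])
y_reg_pred = np.array([3.0, -0.1, 2.0, 7.2])
print("RMSE:     ", mean_squared_error(y_reg_true, y_reg_pred) ** 0.5)
```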
- Inference Latency
The time it takes for the deployed AI/ML model to generate a prediction or output after receiving input data. Define acceptable latency for different use cases, considering real-time applications (e.g., autonomous driving, fraud detection) vs. near real-time (e.g., recommendations) vs. batch processing. Consider the impact of model complexity, input data size, and underlying hardware (CPU, GPU, specialized accelerators).
Example: “The fraud detection model deployed for real-time transaction monitoring shall have an average inference latency of no more than 50 milliseconds to avoid impacting user experience.”
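A rough sketch of how such a latency budget might be checked, assuming a synchronous model.predict interface and the 50 millisecond target from the example:

```python
# Minimal sketch: measuring average and rough 95th-percentile inference latency.
# `model.predict`, the sample inputs, and the 50 ms budget are assumptions.
import time
import statistics

LATENCY_BUDGET_MS = 50

def measure_latency(model, samples, runs=200):
    timings_ms = []
    for i in range(runs):
        x = samples[i % len(samples)]
        start = time.perf_counter()
        model.predict(x)                                  # single-input inference call
        timings_ms.append((time.perf_counter() - start) * 1000)
    avg = statistics.mean(timings_ms)
    p95 = statistics.quantiles(timings_ms, n=20)[-1]      # rough 95th percentile
    return avg, p95

# avg, p95 = measure_latency(fraud_model, sample_transactions)
# assert avg <= LATENCY_BUDGET_MS, f"average latency {avg:.1f} ms exceeds budget"
```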
- Training Time
The time required to train or fine-tune the AI/ML model. Specify acceptable training times based on resource availability (computational power, data storage), development cycles, and the frequency of retraining required. Consider the trade-off between training time and model performance.
Example: “The initial training of the large language model on the full multi-billion token dataset shall complete within 7 days using the allocated cluster of 128 GPUs.”
- Scalability of Training and Inference
The ability of the system to handle increasing data volumes for training and higher request loads for inference without significant performance degradation or increased cost. Define how the system should scale (horizontally by adding more instances or vertically by increasing resources per instance) and the expected performance (e.g., throughput, latency) under increased load. Consider cloud infrastructure, distributed computing frameworks (e.g., Spark, Dask), and model optimization techniques.
Example: “The image recognition inference service shall be able to scale horizontally to handle a 10x increase in concurrent requests during peak hours with no more than a 20% increase in average latency by automatically provisioning additional GPU-backed instances.”
AWS Machine Learning Scalability
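One lightweight way to probe this requirement is a comparative load test. The sketch below assumes a hypothetical send_request helper that calls the deployed endpoint, and compares average latency at baseline and 10x concurrency:

```python
# Minimal load-test sketch: compare average latency at 1x and 10x concurrency,
# mirroring the "no more than 20% latency increase" example above.
import time
from concurrent.futures import ThreadPoolExecutor

def send_request():
    ...  # placeholder: e.g., POST an image to the inference service and wait

def avg_latency(concurrency, requests_per_worker=50):
    def worker():
        timings = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            send_request()
            timings.append(time.perf_counter() - start)
        return timings
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        flat = [t for f in futures for t in f.result()]
    return sum(flat) / len(flat)

# baseline = avg_latency(concurrency=10)
# peak = avg_latency(concurrency=100)          # 10x load
# assert peak <= 1.2 * baseline, "latency grew by more than 20% under 10x load"
```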
- Resource Efficiency (Computational Cost)
The amount of computational resources (CPU, GPU, memory, storage, network bandwidth) required for training and inference. Set limits on resource consumption to manage operational costs, energy consumption, and environmental impact. Consider model optimization techniques (e.g., quantization, pruning, knowledge distillation) and hardware selection.
Example: “The deployed mobile vision model’s inference process shall not exceed an average CPU utilization of 60% and memory footprint of 500MB to ensure smooth operation on target devices.”
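As one example of the optimization techniques mentioned, the sketch below applies PyTorch's post-training dynamic quantization to a stand-in network and compares serialized sizes; the layer sizes and file path are arbitrary.

```python
# Minimal sketch: dynamic quantization of Linear layers to int8 to reduce the
# memory and CPU footprint. The network here is a placeholder, not a real model.
import os
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8     # replace Linear weights with int8
)

def size_mb(m, path="/tmp/_model.pt"):
    torch.save(m.state_dict(), path)
    return os.path.getsize(path) / 1e6

print("fp32 model:", size_mb(model), "MB")
print("int8 model:", size_mb(quantized), "MB")
```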
- Model Robustness
The ability of the model to maintain performance and provide correct predictions even when faced with noisy, incomplete, out-of-distribution, or adversarial input data. Define the types of data variations or attacks the model should be robust against (e.g., image perturbations, spelling errors, synonym substitutions, specific adversarial attack algorithms) and the acceptable degradation in performance (e.g., drop in accuracy, increase in error rate).
Example: “The autonomous driving perception system shall maintain an object detection accuracy of at least 95% even in adverse weather conditions like heavy rain or fog, and shall be resilient to common sensor noise.”
Adversarial Examples in Machine Learning (Research Paper)
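A minimal robustness probe, assuming a generic scikit-learn-style model and held-out arrays X_val/y_val; Gaussian input noise stands in here for the sensor noise mentioned in the example:

```python
# Minimal sketch: measure how much accuracy degrades under Gaussian input noise,
# a simple proxy for sensor-noise robustness. All names are placeholders.
import numpy as np
from sklearn.metrics import accuracy_score

def accuracy_under_noise(model, X_val, y_val, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    clean_acc = accuracy_score(y_val, model.predict(X_val))
    X_noisy = X_val + rng.normal(0.0, noise_std, size=X_val.shape)
    noisy_acc = accuracy_score(y_val, model.predict(X_noisy))
    return clean_acc, noisy_acc

# clean, noisy = accuracy_under_noise(perception_model, X_val, y_val)
# assert noisy >= 0.95, "accuracy under noise fell below the 95% requirement"
```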
- Data Quality and Integrity
Ensuring the training and inference data is accurate, consistent, free from errors and biases, and adheres to defined schemas and formats. Define data validation rules (e.g., range checks, type checks, consistency checks), data lineage requirements (tracking data sources and transformations), and mechanisms for detecting and handling data anomalies (e.g., missing values, outliers).
Example: “The customer churn prediction model shall be trained on data with less than 0.5% missing values, and all customer demographic features shall adhere to predefined valid categories and formats.”
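A minimal validation sketch in pandas reflecting the example above; the column names (customer_segment, age) and the allowed categories are assumptions for illustration:

```python
# Minimal sketch of the validation rules from the example: cap the fraction of
# missing values and restrict demographic features to known categories/ranges.
import pandas as pd

VALID_SEGMENTS = {"consumer", "small_business", "enterprise"}   # assumed categories
MAX_MISSING_FRACTION = 0.005                                    # 0.5% from the example

def validate_training_data(df: pd.DataFrame) -> None:
    missing_fraction = df.isna().mean().mean()
    if missing_fraction > MAX_MISSING_FRACTION:
        raise ValueError(f"{missing_fraction:.2%} missing values exceeds the 0.5% limit")

    bad_segments = set(df["customer_segment"].dropna()) - VALID_SEGMENTS
    if bad_segments:
        raise ValueError(f"unexpected categories: {bad_segments}")

    if not df["age"].between(18, 120).all():
        raise ValueError("age values outside the allowed range")
```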
- Reproducibility
The ability to consistently obtain similar results (model weights, evaluation metrics) when the training process is repeated with the same data, hyperparameters, and random seeds. Specify the level of reproducibility required, considering factors like the stochastic nature of training algorithms, variations in hardware, and software dependencies. Implement mechanisms for fixing random seeds and managing the training environment.
Example: “The model training process shall be reproducible with a variance of less than 0.1 in the final F1-score across three independent runs conducted on the same hardware and software configuration.”
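A common first step toward this requirement is pinning random seeds across libraries, as in the sketch below (PyTorch seeding is attempted only if the library is installed):

```python
# Minimal sketch: fix the random seeds that commonly introduce run-to-run variance.
# Full determinism may also require pinned hardware, library versions, and
# deterministic kernels, as noted above.
import os
import random
import numpy as np

def fix_seeds(seed: int = 42) -> None:
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.use_deterministic_algorithms(True)   # opt in to deterministic kernels
    except ImportError:
        pass   # PyTorch not installed; stdlib/NumPy seeding still applies

fix_seeds(42)
```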
- Concept Drift Handling
The ability of the system to detect and adapt to changes in the underlying data distribution or relationships over time, which can lead to model performance degradation. Define how concept drift will be monitored (e.g., tracking statistical properties of input data and model predictions), the triggers for retraining or model updates (e.g., significant drop in performance metrics, statistical drift exceeding a threshold), and the acceptable performance degradation before adaptation.
Example: “The recommendation system shall continuously monitor user interaction patterns and trigger a model retraining process if the click-through rate drops by more than 5% over a rolling 7-day window.”
TensorFlow Data Validation for Drift Detection
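One simple drift check applies a two-sample Kolmogorov-Smirnov test from SciPy to a single feature; the significance threshold and the trigger_retraining_pipeline hook below are illustrative assumptions:

```python
# Minimal drift-check sketch: compare a feature's recent distribution against a
# training-time reference and flag drift when the p-value is small.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(reference: np.ndarray, recent: np.ndarray, alpha: float = 0.01) -> bool:
    statistic, p_value = ks_2samp(reference, recent)
    return p_value < alpha   # distributions differ significantly -> possible drift

# if feature_drifted(train_click_rates, last_7_days_click_rates):
#     trigger_retraining_pipeline()   # hypothetical hook into the retraining job
```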
- Explainability (Interpretability)
The degree to which humans can understand the reasoning behind the AI/ML model’s predictions or decisions. Specify the level of explainability required for different user groups (e.g., end-users, data scientists, domain experts) and use cases (e.g., low-stakes recommendations vs. high-stakes medical diagnoses). Consider different explainability techniques (e.g., feature importance, Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), attention mechanisms, counterfactual explanations).
Example: “For high-stakes loan approval decisions, the system shall provide feature importance scores indicating the top three factors that most positively or negatively influenced the approval outcome, along with their values for the specific applicant.”
Interpretable Machine Learning Book
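A minimal SHAP sketch on a synthetic tree-based regressor, showing how per-prediction attributions could back a “top three factors” requirement; the data and model are placeholders, not the loan-approval system itself:

```python
# Minimal sketch: per-prediction feature attributions with SHAP for a tree model.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 5))                               # synthetic applicant features
y = X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.standard_normal(200)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])             # contributions for one applicant

# Rank features by absolute contribution and report the top three.
top_three = np.argsort(np.abs(shap_values[0]))[::-1][:3]
print("Most influential feature indices:", top_three)
```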
- Trust and Transparency
Building user confidence in the AI/ML system by providing insights into its behavior, limitations, and potential biases. Include requirements for clear model documentation, error reporting mechanisms (explaining why a prediction might be wrong), and mechanisms for user feedback and recourse.
Example: “The content moderation system shall provide users with a confidence score associated with each content flag and allow users to appeal decisions with clear explanations of the appeal process.”
- Fairness and Bias Mitigation
Ensuring that the AI/ML system does not discriminate unfairly against certain groups based on sensitive attributes (e.g., race, gender, religion). Define fairness metrics (e.g., demographic parity, equal opportunity, predictive parity) and acceptable thresholds for disparity. Specify methods for bias detection (in data and models) and mitigation (e.g., data re-balancing, adversarial debiasing, post-processing).
Example: “The hiring recommendation system shall achieve equal opportunity for gender, with no statistically significant difference in positive recommendation rates between male and female candidates with similar qualifications.”
Fairlearn: A Python Package for Fairness in AI
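A minimal Fairlearn sketch on synthetic data, computing per-group selection rates and a demographic parity gap; a real assessment would use the project's own predictions and its chosen fairness metric (e.g., equal opportunity, as in the hiring example):

```python
# Minimal sketch: compare positive-recommendation rates across a sensitive
# attribute with Fairlearn. The data here is synthetic and illustrative.
import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
gender = np.array(["F", "M", "F", "F", "M", "M", "F", "M"])

frame = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_pred,
                    sensitive_features=gender)
print(frame.by_group)                        # positive-recommendation rate per group

gap = demographic_parity_difference(y_true, y_pred, sensitive_features=gender)
print("Demographic parity difference:", gap)
```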
- User Interaction with AI
Designing intuitive and effective user interfaces for interacting with AI-powered features. Consider different interaction modalities (e.g., natural language interfaces, visual dashboards, interactive visualizations of AI outputs). Focus on clarity, ease of use, and providing relevant information to the user. Adhere to user experience (UX) principles and conduct user testing.
Example: “The AI-powered coding assistant shall provide code suggestions in real-time within the IDE and offer clear explanations for the suggested code snippets.”
- Adversarial Attack Resilience
Protecting the AI/ML model against malicious inputs (adversarial examples) designed to fool it into making incorrect predictions. Define the types of adversarial attacks the model should be resilient to (e.g., fast gradient sign method (FGSM), projected gradient descent (PGD), specific real-world attack scenarios) and the acceptable performance degradation (e.g., maximum allowable drop in accuracy) under such attacks. Implement defense mechanisms (e.g., adversarial training, input sanitization, gradient masking).
Example: “The autonomous vehicle’s object detection system shall maintain a critical object detection accuracy of at least 98% even when subjected to realistic adversarial weather conditions and minor physical perturbations on road signs.”
Google AI Blog on Adversarial Robustness
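For illustration, a minimal FGSM attack in PyTorch, one of the attack types named above; the model, epsilon, and loss choice are placeholder assumptions:

```python
# Minimal FGSM sketch: perturb an input in the direction of the loss gradient's
# sign and re-check the prediction on the perturbed input.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the direction that increases the loss, clipped to a valid image range.
    x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()
    return x_adv

# x_adv = fgsm_attack(detector, image_batch, labels)
# robust_accuracy = (detector(x_adv).argmax(dim=1) == labels).float().mean()
```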
- Data Privacy and Security
Protecting the sensitive data used for training and inference from unauthorized access, disclosure, modification, or destruction, adhering to relevant data privacy regulations (e.g., GDPR, CCPA, HIPAA). Implement data anonymization techniques (e.g., differential privacy, federated learning), encryption (at rest and in transit), strict access controls, and secure data handling practices throughout the AI/ML lifecycle.
Example: “All medical image data used for training the diagnostic AI shall be anonymized using differential privacy with a defined epsilon budget to protect patient confidentiality.”
Data Privacy Studio
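A toy illustration of the differential privacy idea: releasing an aggregate count with Laplace noise calibrated to an epsilon budget. Production systems would rely on a vetted DP library and careful budget accounting; the numbers below are made up.

```python
# Minimal sketch: the Laplace mechanism for a differentially private count.
import numpy as np

def laplace_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    scale = sensitivity / epsilon          # more privacy (smaller epsilon) -> more noise
    return true_count + np.random.laplace(loc=0.0, scale=scale)

# Example: report a cohort count with an assumed epsilon of 0.5.
noisy = laplace_count(true_count=1234, epsilon=0.5)
print("Differentially private count:", round(noisy))
```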
- Model Security
Protecting the trained AI/ML models (model weights, architecture) from unauthorized access, copying, reverse engineering (to extract sensitive information or intellectual property), or tampering. Implement model encryption, access controls, watermarking techniques, and secure deployment environments.
Example: “The proprietary trading algorithm model shall be encrypted using AES-256 encryption and deployed in a secure enclave with strict access controls to prevent unauthorized access to the model weights.”
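A minimal sketch of encrypting serialized model bytes with AES-256-GCM via the cryptography package; key management and the secure enclave are the real work and are only hinted at in the comments:

```python
# Minimal sketch: encrypt model bytes with AES-256-GCM. The model bytes and file
# name are placeholders; the key would come from a KMS or secret manager.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)          # fetch from a KMS in practice
nonce = os.urandom(12)                             # must be unique per encryption

model_bytes = b"...serialized model weights..."    # e.g., a pickled state_dict
ciphertext = AESGCM(key).encrypt(nonce, model_bytes, None)

with open("model_weights.enc", "wb") as f:
    f.write(nonce + ciphertext)                    # store nonce alongside ciphertext

# Decryption at load time (inside the secure deployment environment):
blob = open("model_weights.enc", "rb").read()
restored = AESGCM(key).decrypt(blob[:12], blob[12:], None)
assert restored == model_bytes
```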
- Bias Introduction through Attacks
Preventing malicious actors from manipulating training data or model update processes to intentionally introduce harmful biases into the AI/ML system. Implement robust data validation and monitoring mechanisms, anomaly detection for data poisoning attacks, and secure and authenticated model update channels.
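One possible line of defense is screening incoming training batches for statistical outliers before they enter the pipeline. The sketch below uses scikit-learn's IsolationForest; the thresholds and the quarantine hook are assumptions:

```python
# Minimal sketch: flag anomalous rows in an incoming training batch as a simple
# check against data-poisoning attempts.
import numpy as np
from sklearn.ensemble import IsolationForest

def suspicious_rows(reference_features: np.ndarray, incoming_features: np.ndarray):
    detector = IsolationForest(contamination=0.01, random_state=0)
    detector.fit(reference_features)               # fit on trusted historical data
    flags = detector.predict(incoming_features)    # -1 marks outliers
    return np.where(flags == -1)[0]

# bad_idx = suspicious_rows(trusted_X, new_batch_X)
# if len(bad_idx) > 0.05 * len(new_batch_X):
#     quarantine_batch_for_review(new_batch_X)     # hypothetical escalation hook
```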
- Model Versioning and Management
Tracking different versions of models, datasets, hyperparameters, training code, and deployment configurations throughout the AI/ML lifecycle. Implement a robust version control system (e.g., Git for code, specialized experiment trackers such as MLflow for models and artifacts), along with clear documentation of each version and the rationale for changes. Facilitate easy rollback to previous versions.
Example: “The system shall use MLflow to track all model training runs, automatically logging parameters, metrics, and artifacts, allowing for easy comparison and rollback to previous model versions.”
MLflow: An Open Source Machine Learning Platform
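A minimal MLflow tracking sketch in the spirit of the example above; the model, parameters, and the logged F1 value are placeholders:

```python
# Minimal sketch: log parameters, metrics, and the trained model with MLflow so
# runs can be compared and rolled back.
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

with mlflow.start_run(run_name="churn-model-v2"):
    params = {"C": 0.5, "max_iter": 200}
    mlflow.log_params(params)

    model = LogisticRegression(**params)
    # model.fit(X_train, y_train)                # training on the project's own data

    mlflow.log_metric("f1_score", 0.87)          # placeholder evaluation result
    mlflow.sklearn.log_model(model, "model")     # versioned artifact for rollback
```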
- Pipeline Maintainability
Ensuring the end-to-end AI/ML pipeline (data ingestion, preprocessing, feature engineering, model training, evaluation, deployment, monitoring) is well-documented, modular, and easy to understand, modify, and update. Follow software engineering best practices for pipeline development, including clear separation of concerns, reusable components, and automated testing.
Example: “The data preprocessing steps in the NLP pipeline shall be implemented as independent, reusable components using a workflow management system like Apache Airflow, with clear input and output specifications for each component.”
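A minimal sketch of such a modular pipeline, assuming Airflow 2.x, with each preprocessing step as an independent task; the task bodies and schedule are placeholders:

```python
# Minimal Airflow sketch: independent preprocessing tasks with explicit ordering.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_raw_text(**context): ...      # placeholder task bodies
def clean_and_tokenize(**context): ...
def build_features(**context): ...

with DAG(
    dag_id="nlp_preprocessing",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_raw_text", python_callable=ingest_raw_text)
    clean = PythonOperator(task_id="clean_and_tokenize", python_callable=clean_and_tokenize)
    features = PythonOperator(task_id="build_features", python_callable=build_features)

    ingest >> clean >> features   # explicit dependencies between reusable components
```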
- Monitoring and Logging of Model Performance and Behavior
Continuously monitoring the performance of deployed AI/ML models (using relevant metrics defined in performance NFRs), detecting performance degradation, and logging predictions, errors, and relevant input features for debugging and analysis. Implement alerting mechanisms to notify stakeholders of performance issues or anomalies. Utilize specialized AI monitoring tools.
Example: “The fraud detection system shall continuously monitor the model’s precision and recall in production and trigger an alert if either metric drops by more than 10% compared to the baseline performance on a recent validation set.”
whylogs: AI Data Logging and Monitoring
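A bare-bones monitoring check matching the example: recompute precision and recall on recently labelled production data and alert on a relative drop of more than 10%. The baseline numbers and send_alert callback are assumptions:

```python
# Minimal sketch: compare production precision/recall against a stored baseline
# and alert when either drops by more than the allowed relative amount.
from sklearn.metrics import precision_score, recall_score

BASELINE = {"precision": 0.92, "recall": 0.88}   # placeholder baseline values
MAX_RELATIVE_DROP = 0.10                          # 10% from the example

def check_model_health(y_true, y_pred, send_alert) -> None:
    current = {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
    }
    for name, baseline_value in BASELINE.items():
        if current[name] < baseline_value * (1 - MAX_RELATIVE_DROP):
            send_alert(f"{name} dropped to {current[name]:.3f} "
                       f"(baseline {baseline_value:.3f})")
```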
- Retrainability
The ease with which the AI/ML model can be retrained with new data, updated with new knowledge, or fine-tuned for specific use cases. Design the training process to be efficient, automated (where possible), and configurable. Consider the infrastructure and resources required for retraining, and the strategies for continuous learning or periodic updates.
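A minimal sketch of an automated retrain-and-promote step, assuming scikit-learn-style models and hypothetical load_new_data/promote hooks supplied by the project:

```python
# Minimal sketch: retrain on fresh data, evaluate, and promote the new model
# only if it matches or beats the current one.
from sklearn.base import clone
from sklearn.metrics import f1_score

def retrain_and_maybe_promote(current_model, load_new_data, promote):
    X_train, y_train, X_val, y_val = load_new_data()

    candidate = clone(current_model).fit(X_train, y_train)

    current_f1 = f1_score(y_val, current_model.predict(X_val))
    candidate_f1 = f1_score(y_val, candidate.predict(X_val))

    if candidate_f1 >= current_f1:
        promote(candidate)          # e.g., register a new model version
    return candidate_f1, current_f1
```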