Be The Match

AI/ML Engineer (Finance)



Job Description

The AI/ML Engineer will play a key role within the AI Center of Excellence (CoE), focusing on building, scaling, and maintaining robust ML and GenAI operational infrastructure. This position is responsible for developing and automating end-to-end machine learning pipelines, deploying models into production, and ensuring their performance and stability over time. The ideal candidate is a hands-on engineer with strong experience in MLOps, LLMOps, and cloud-native tools, and a passion for reliable, scalable, and efficient AI systems.

ACCOUNTABILITIES: (The primary functions, scope and responsibilities of the role)

Engineering and Operations:

  • Develop, deploy, and maintain production-grade ML/GenAI pipelines using AWS cloud-native and open-source MLOps tools.
  • Automate model training, evaluation, testing, deployment, and monitoring workflows.
  • Implement LLMOps practices for prompt versioning, model tracking, and continuous evaluation of GenAI systems.
  • Integrate ML systems with CI/CD pipelines and infrastructure-as-code tools.
  • Support model inference at scale via APIs, containers, and microservices.
  • Work closely with data engineering to ensure high-quality, real-time, and batch data availability for ML workflows.
  • Ensure high availability, reliability, and performance of AI services in production environments.
  • Maintain robust monitoring and observability across the AWS, Snowflake, Salesforce, and Oracle ecosystems.
  • Implement feature stores and data versioning systems to ensure reproducible ML experiments and deployments.
  • Deploy and optimize vector databases and embedding models for semantic search and RAG applications.
  • Configure GPU-enabled cloud infrastructure and implement monitoring solutions to optimize resource utilization, costs, and performance for ML training and inference workloads.
  • Establish automated model validation, testing, and rollback procedures for safe production deployments.

Tooling and Infrastructure:

  • Build and manage model registries, feature stores, and metadata tracking systems.
  • Leverage containerization (e.g., Docker) and orchestration (e.g., Kubernetes, Airflow, Kubeflow) for scalable deployment.
  • Implement role-based access control, auditing, and governance for ML infrastructure.
  • Manage cost-effective cloud infrastructure using AWS.
  • Build and maintain data quality monitoring systems with automated alerting for data drift and anomalies.
  • Implement cost optimization strategies including auto-scaling, spot instances, and resource right-sizing for ML workloads.
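
The data-drift alerting described above can be reduced, in its simplest form, to a statistical check on incoming batches. The `drift_alert` function and its z-score test are illustrative assumptions; production systems typically use richer tests (PSI, Kolmogorov-Smirnov) and wire alerts into a paging or messaging channel:

```python
# Toy drift check: flag a feature when a new batch's mean drifts more
# than `threshold` standard errors from a reference window.

import math
import statistics


def drift_alert(reference: list[float], batch: list[float],
                threshold: float = 3.0) -> bool:
    """Return True when `batch` looks drifted relative to `reference`."""
    ref_mean = statistics.mean(reference)
    ref_sd = statistics.stdev(reference)
    stderr = ref_sd / math.sqrt(len(batch))        # std error of a batch mean
    z = abs(statistics.mean(batch) - ref_mean) / stderr
    return z > threshold


reference = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02, 0.08, -0.08, 0.0]
stable = [0.01, -0.03, 0.04, -0.02, 0.0]           # in-distribution batch
drifted = [0.9, 1.1, 1.0, 0.95, 1.05]              # shifted batch
```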

Collaboration and Support:

  • Partner with data engineers, data scientists, ML engineers, architects, software engineers, infrastructure and security teams to support scalable and efficient AI/ML workflows.
  • Contribute to incident response, performance tuning, and continuous improvement of ML pipelines.
  • Provide guidance and documentation to promote reproducibility and best practices across teams.
  • Work as part of an agile development team and participate in planning and code reviews.

REQUIRED QUALIFICATIONS: (Minimum qualifications needed for this position including education, experience, certification, knowledge and/or physical requirements)

Knowledge of:

  • Cloud-native AI/ML development with AWS.
  • MLOps/LLMOps frameworks and lifecycle tools on AWS.
  • Monitoring and observability platforms on AWS.
  • ML model deployment strategies (e.g., batch, real-time, streaming).
  • Feature stores and data versioning tools on AWS and Snowflake.
  • Model serving frameworks like AWS SageMaker and AWS Bedrock for scalable inference deployment.
  • Vector databases and embedding deployment (e.g., Pinecone, Weaviate, FAISS, pgvector) for LLM and RAG applications.
  • LLMOps-specific tools including prompt management platforms and LLM serving optimization on AWS.
  • Docker registries and artifact management.
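
As a rough illustration of the vector-search idea behind the RAG applications listed above: documents are embedded as vectors and the nearest one is retrieved by cosine similarity. The vectors and document ids below are made-up stand-ins; a real deployment would use an embedding model plus a store such as FAISS or pgvector:

```python
# Toy semantic search: retrieve the document whose embedding is most
# similar (by cosine similarity) to the query embedding.

import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: list[float], index: dict[str, list[float]]) -> str:
    """Return the document id whose vector is most similar to the query."""
    return max(index, key=lambda doc_id: cosine(query, index[doc_id]))


index = {
    "donor-faq": [0.9, 0.1, 0.0],
    "match-process": [0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05]                          # semantically near "donor-faq"
```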

Required Skills and Abilities:

  • Strong Python programming and scripting skills.
  • Hands-on experience deploying and managing ML/GenAI models in production.
  • Experience with Docker, Kubernetes, and workflow orchestration tools like Airflow or Kubeflow.
  • Proficiency in infrastructure-as-code tools (e.g., Terraform, CloudFormation).
  • Ability to debug, troubleshoot, and optimize AI/ML pipelines and systems.
  • Comfortable working in agile teams and collaborating cross-functionally.
  • Proven ability to automate processes and build reusable ML operational frameworks.
  • Experience with A/B testing frameworks and canary deployments for ML models in production environments.
  • Knowledge of GPU resource management and optimization for training and inference workloads.
  • Understanding of data pipeline quality monitoring, drift detection, and automated retraining triggers.
  • Experience with secrets management, role-based access control, and secure credential handling for ML systems.
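
The canary-deployment skill above often comes down to a deterministic traffic splitter: hash the user id so each user consistently sees the same variant while the canary bakes. The `route` function and the 10% fraction are illustrative assumptions:

```python
# Hash-based canary routing: a stable fraction of users is sent to the
# candidate model, and a given user always lands in the same bucket.

import hashlib


def route(user_id: str, canary_fraction: float = 0.1) -> str:
    """Deterministically assign a user to 'canary' or 'stable'."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32   # uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"


assignments = [route(f"user-{i}") for i in range(1000)]
canary_share = assignments.count("canary") / len(assignments)
```

Sticky assignment matters for A/B evaluation: because routing is a pure function of the user id, no session state is needed and the split is reproducible.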

Education and/or Experience:

  • Bachelor's degree in Computer Science, Engineering, or a related field (master's preferred).
  • 2-3 years of experience in ML engineering, DevOps, or MLOps roles.
  • Demonstrated experience managing production AI/ML workloads and systems.

PREFERRED QUALIFICATIONS: (Additional qualifications that may make a person even more effective in the role, but are not required for consideration)

  • Experience with LLMOps and GenAI pipeline monitoring.
  • Cloud certifications in AWS, Azure, or GCP.
  • Experience supporting AI applications in regulated industries (e.g., healthcare, finance).
  • Contributions to open-source MLOps tools or infrastructure projects.
  • Experience with edge deployment and model optimization techniques (quantization, pruning, distillation).
  • Knowledge of compliance frameworks (SOC2, GDPR, HIPAA) and security best practices for AI/ML systems.
  • Experience with real-time streaming data pipelines (Kafka, Kinesis) and event-driven ML architectures.

© 2025 Native American Careers