Providence is seeking an exceptional Principal Cybersecurity Architect to lead the research, development, and deployment of Small Language Models (SLMs) specialized for healthcare and cybersecurity applications. This role combines cutting-edge machine learning research with practical implementation, requiring deep expertise in model architecture design, advanced fine-tuning techniques, graph-based knowledge systems, and vector-based retrieval augmentation.
The ideal candidate will architect and train efficient, domain-specialized language models using state-of-the-art techniques including LoRA/QLoRA, graph neural networks, vector embeddings, and hybrid retrieval systems. This position requires both depth in machine learning and hands-on implementation skills to deliver high-impact solutions for healthcare operations, clinical workflows, and security applications.
- Design, develop, and optimize Small Language Models for healthcare and cybersecurity domains using advanced architectures (efficient transformers, MoE, sparse attention), compression techniques (quantization: INT8/INT4/GPTQ/AWQ, pruning, knowledge distillation), and emerging SLM architectures (Phi-3, Gemma, Mistral 7B variants) with rigorous ablation studies balancing model size, inference speed, accuracy, and resource requirements
- Implement parameter-efficient fine-tuning (PEFT) techniques including LoRA, QLoRA, prefix tuning, prompt tuning, and adapter layers; design instruction tuning pipelines aligning SLMs with clinical terminology and security protocols; develop multi-task learning frameworks; and deploy continuous learning strategies using DPO, RLHF, and active learning while optimizing hyperparameters to prevent overfitting on limited healthcare datasets
- Implement knowledge graphs capturing domain relationships and build graph-augmented language models integrating Neo4j/Cosmos DB with unstructured text processing, leveraging GNN components (PyTorch Geometric, DGL, GraphSAGE) for entity relationship modeling and complex graph traversal reasoning
- Create entity extraction and relation classification pipelines that automatically populate and maintain knowledge graphs from clinical documents, security logs, and operational data to enable graph-based reasoning across patient journeys, care coordination workflows, and security incident chains
- Optimize model inference using vLLM, TensorRT-LLM, llama.cpp, GGML, and Optimum for maximum throughput/minimum latency; implement scalable model serving architectures (Triton, TorchServe, FastAPI) with batching, caching, and load balancing; design A/B testing frameworks and champion-challenger patterns for safe production model updates
- Design and implement vector database architectures for semantic search and RAG applications using specialized embeddings optimized for cyber text, with hybrid search combining dense vector retrieval, sparse BM25 matching, and graph-based context expansion
- Build advanced RAG architectures implementing multi-vector retrieval, reranking strategies (Cohere, Cross-Encoders), context compression, and query decomposition; optimize vector indexing (HNSW, IVF, Product Quantization) balancing accuracy, latency, and memory footprint; and develop chunking pipelines for diverse cyber data formats
- Build end-to-end data pipelines for processing cyber text corpora; implement data quality frameworks covering de-identification, synthetic data generation (Synthea), bias detection, and representativeness metrics; and build experiment tracking systems (MLflow) for versioning and reproducibility
- Develop distributed training workflows using DeepSpeed, FSDP, or Megatron-LM for efficient multi-GPU/multi-node training on Azure ML; create automated data augmentation pipelines (back-translation, paraphrasing, entity substitution, synthetic generation) to expand limited healthcare training datasets while maintaining clinical validity
- Design comprehensive evaluation frameworks measuring healthcare-specific task performance (NER, relation extraction, summarization, QA, reasoning) with custom benchmark datasets aligned to clinical accuracy requirements, regulatory compliance, and operational impact; implement automated robustness testing for adversarial attacks, edge cases, OOD detection, and hallucination detection
- Conduct comparative analysis of SLM variants across accuracy, latency, throughput, memory, and cost metrics; build interpretability tools using attention visualization, SHAP values, and integrated gradients to ensure clinical safety and explainability for healthcare stakeholders and regulatory requirements
- Understand monitoring and observability dashboards that track model performance drift, data distribution shifts, embedding quality degradation, and retrieval accuracy in production; collaborate with MLOps teams on containerization (Docker), Kubernetes/AKS deployment, and CI/CD pipelines for automated testing and deployment workflows
- Build security-focused SLMs for threat detection, log analysis, vulnerability assessment, incident response automation, and policy compliance checking
- Partner with clinical stakeholders, security teams, and product managers to identify high-impact use cases and translate domain requirements into technical specifications; mentor junior data scientists on SLM development, fine-tuning techniques, and healthcare AI; contribute to knowledge sharing through documentation, tech talks, and research communities; stay current with emerging research through literature review, conference participation, and experimentation
- Expert-level proficiency in Python and core ML libraries including PyTorch, Hugging Face Transformers, PEFT, TRL (Transformer Reinforcement Learning), and BitsAndBytes
- Deep hands-on experience with parameter-efficient fine-tuning: LoRA, QLoRA, AdaLoRA, prefix tuning, prompt tuning, P-Tuning, IA3, and adapter methods
- Strong knowledge of model quantization techniques and tools: GPTQ, AWQ, GGUF/GGML, bitsandbytes (4-bit/8-bit quantization), and post-training quantization strategies
- Expertise in transformer architectures including attention mechanisms (multi-head, grouped-query, sliding window), positional encodings (RoPE, ALiBi), and architectural variants (encoder-only, decoder-only, encoder-decoder)
- Practical experience optimizing model training: mixed precision (FP16, BF16), gradient checkpointing, gradient accumulation, distributed training (DDP, FSDP, DeepSpeed ZeRO), and memory-efficient techniques
- Strong foundation in machine learning fundamentals including optimization algorithms, regularization techniques, loss functions, evaluation metrics, and experimental design
Graph & Vector Technologies
- Hands-on experience with graph databases: Neo4j, Amazon Neptune, Azure Cosmos DB (Gremlin API), or TigerGraph including Cypher/Gremlin query languages and graph data modelling
- Proficiency in graph neural networks using PyTorch Geometric, DGL (Deep Graph Library), Spektral, or GraphSAGE for node classification, link prediction, and graph classification tasks
- Deep experience with vector databases and similarity search: Pinecone, Weaviate, Qdrant, Milvus, Chroma, FAISS, or Annoy including index optimization and hybrid search implementations
- Expertise in embedding models and techniques: sentence transformers, bi-encoders, cross-encoders, domain-adapted embeddings, and multi-modal embeddings
- Strong knowledge of RAG architectures: naive RAG, advanced RAG (with reranking, query transformation), modular RAG, and evaluation frameworks (RAGAS, TruLens)
- Understanding of knowledge graph construction techniques including entity extraction, relation extraction, entity linking, and ontology alignment
Data & Infrastructure Skills
- Proficiency with cloud ML platforms, preferably Azure ML, Azure Databricks, or equivalent (AWS SageMaker, Google Vertex AI) for model training and deployment
- Experience with experiment tracking and model management: MLflow, Weights & Biases, Comet.ml, Neptune.ai, or similar platforms
- Knowledge of data processing frameworks: Pandas, Polars, Dask, Spark (PySpark) for large-scale data manipulation and feature engineering
- Understanding of healthcare data standards (HL7, FHIR), clinical terminologies (SNOMED CT, ICD-10, LOINC, RxNorm), and EHR systems (Epic, Cerner, Oracle Health)
- Understanding of responsible AI principles including bias detection, fairness metrics, model interpretability, and ethical considerations for healthcare AI
Professional Competencies
- Strong collaboration skills with experience working across data engineering, ML engineering, product, and clinical teams
- PhD in Machine Learning, Natural Language Processing, Computer Science, or related field with publications in top-tier conferences (NeurIPS, ICML, ACL, EMNLP, ICLR)
- Research contributions in efficient language models, parameter-efficient fine-tuning, knowledge graphs, or retrieval-augmented generation
- Experience with instruction tuning datasets and methodologies including self-instruct, Alpaca-style datasets, FLAN, and human preference alignment (DPO, PPO)
- Hands-on experience with inference optimization frameworks: vLLM, TensorRT-LLM, text-generation-inference, CTranslate2, or ONNX Runtime for production deployment
- Familiarity with agentic AI frameworks including LangChain, LangGraph, LlamaIndex, AutoGen, CrewAI, or Semantic Kernel for building multi-step AI workflows
- Experience with prompt engineering techniques, prompt optimization frameworks (DSPy, PromptSource), and evaluation of prompt effectiveness
- Knowledge of clinical NLP benchmarks and datasets: i2b2, MIMIC, n2c2, BioASQ, PubMedQA, and clinical outcome prediction tasks
Success Metrics & Objectives
Performance in this role will be evaluated based on the following key metrics aligned with Providence's AI Data Science and Engineering OKR framework:
Model Performance & Quality
- Achieve target accuracy, F1 scores, and domain-specific metrics for SLMs on healthcare tasks (e.g., >85% F1 for clinical NER, >90% accuracy for classification tasks)
Collaboration & Knowledge Sharing
This role requires hands-on expertise with the following technology stack:
Core ML Frameworks & Libraries
PyTorch, Hugging Face (Transformers, PEFT, TRL, Accelerate, Datasets, Tokenizers, Diffusers), TensorFlow/Keras, scikit-learn, XGBoost, LightGBM, BitsAndBytes, Optimum, ONNX Runtime
Fine-Tuning & Optimization
LoRA, QLoRA, AdaLoRA, Prefix Tuning, P-Tuning v2, IA3, Adapters, GPTQ, AWQ, GGUF/GGML, AutoGPTQ, llama.cpp, vLLM, TensorRT-LLM, DeepSpeed, FSDP, Flash Attention, Axolotl, UnslothAI
Neo4j, Amazon Neptune, Azure Cosmos DB (Gremlin), PyTorch Geometric, DGL (Deep Graph Library), NetworkX, GraphSAGE, Node2Vec, Graph Attention Networks (GAT), Knowledge Graph Embeddings (TransE, DistMult, ComplEx)
Vector Stores & Embeddings
Pinecone, Weaviate, Qdrant, Milvus, Chroma, FAISS, Annoy, Sentence Transformers, all-MiniLM, BGE Embeddings, E5 Embeddings, Instructor Embeddings, BioBERT, ClinicalBERT, PubMedBERT, SciBERT
RAG & Retrieval Frameworks
LangChain, LlamaIndex, Haystack, RAGAS, TruLens, Cohere Rerank, Cross-Encoders, ColBERT, BM25 (rank-bm25, Elasticsearch), Hybrid Search, Query Expansion, HyDE (Hypothetical Document Embeddings)
LangGraph, LangChain Agents, AutoGen, CrewAI, Semantic Kernel, LlamaIndex Workflows, DSPy, Guidance, LMQL, ReAct (Reasoning + Acting), Chain-of-Thought prompting
Azure ML, Azure Databricks, Azure OpenAI Service, Azure Cognitive Services, Azure Data Lake, Azure Kubernetes Service (AKS), Snowflake, Docker, Kubernetes, Ray, Terraform
MLOps & Experiment Tracking
MLflow, Weights & Biases, Comet.ml, Neptune.ai, DVC (Data Version Control), ClearML, Kubeflow, Airflow, Prefect, Great Expectations, Evidently AI
Model Serving & Inference
vLLM, TensorRT-LLM, Text Generation Inference (TGI), Triton Inference Server, TorchServe, FastAPI, Ray Serve, BentoML, Seldon Core
spaCy, NLTK, Gensim, Pandas, Polars, Dask, PySpark, Unstructured.io, LangChain Document Loaders, PDFPlumber, Tesseract OCR, Regex, Beautiful Soup
FHIR (Python FHIR Client), HL7 (python-hl7), ClinicalBERT, BioBERT, MedCAT, ScispaCy, QuickUMLS, UMLS APIs, SNOMED CT, ICD-10, LOINC, RxNorm, Synthea (synthetic patient data)