Role Purpose & Context
Role Summary
The Generative AI Engineer is responsible for taking generative AI models from concept to production, making sure they're robust, performant, and deliver real value. You'll be knee-deep in code, building out features and fixing the inevitable quirks that come with working with LLMs. This role sits right at the heart of our product development, translating complex AI research into practical, user-facing applications. When you do this well, our products feel genuinely intelligent and intuitive, which means happier customers and new revenue streams. If it's not done right, we risk shipping features that don't quite work, leading to frustration and wasted effort. The tricky part is keeping up with the insane pace of AI innovation while still delivering stable, reliable systems. But honestly, the reward is seeing your work directly impact how people use our products, making them genuinely better.
Reporting Structure
- Reports to: Senior Generative AI Engineer
- Direct reports:
- Matrix relationships:
AI/ML Engineer (Generative Focus), LLM Developer, Applied AI Engineer (Generative), Machine Learning Engineer (Generative AI)
Key Stakeholders
Internal:
- Product Managers (for feature requirements)
- Software Engineering Teams (for integration)
- Data Scientists (for model insights and evaluation)
- UX/UI Designers (for user experience of AI features)
External:
- Cloud AI Platform Vendors (for technical support)
- Open-source AI Communities (for learning and contribution)
Organisational Impact
Scope: This role directly impacts our product's intelligence and our ability to deliver innovative features. Your work will shape how our customers interact with our platform, driving user engagement and, ultimately, our market position. Get it right, and we're seen as leaders; get it wrong, and we're just another company dabbling in AI.
Performance Metrics
Quantitative Metrics
- Metric: Model Inference Latency (P95)
- Desc: The time it takes for our generative AI models to respond to a request.
- Target: Maintain P95 latency below 500ms for user-facing features.
- Freq: Weekly, monitored via production dashboards.
- Example: If a user asks a question, we want the AI to answer within half a second, 95% of the time. If it's consistently hitting 700ms, that's a problem we need to fix.
- Metric: Task-specific Evaluation Score
- Desc: How accurately and appropriately our models generate content for specific tasks (e.g., summarisation, code generation, question answering).
- Target: Achieve 85% accuracy/relevance on internal evaluation benchmarks for new features.
- Freq: Per feature release and monthly re-evaluation.
- Example: For our new content generation tool, if 17 out of 20 outputs are considered 'good' by human evaluators, we're hitting our 85% target. If it drops to 12, we know something's off.
- Metric: Deployment Frequency
- Desc: How often you're able to push new or updated generative AI features/models to production.
- Target: Deploy new model versions or feature updates at least once every two weeks.
- Freq: Tracked through CI/CD pipelines.
- Example: If we're only pushing code once a month, we're moving too slowly. The goal is to iterate quickly, so getting new improvements out every fortnight is key.
- Metric: Cost-per-inference Optimisation
- Desc: The computational cost associated with each model request.
- Target: Reduce average cost-per-inference by 10% quarter-over-quarter without sacrificing quality.
- Freq: Monthly cost reports.
- Example: If each API call to our LLM costs £0.01, we're looking to get that down to £0.009 next quarter. Small savings add up quickly when you're running millions of inferences.
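The targets above are simple enough to check in a few lines. Here's a minimal sketch, assuming latencies arrive as a plain list of millisecond values and human evaluations as a list of booleans. The sample data, the function names, and the choice of the nearest-rank percentile definition are all illustrative assumptions; production dashboards may compute P95 differently.

```python
import math


def p95(samples_ms):
    """Nearest-rank 95th percentile of a list of latencies in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest-rank position
    return ordered[rank - 1]


def pass_rate(labels):
    """Share of outputs that human evaluators marked as good (True)."""
    return sum(labels) / len(labels)


def qoq_reduction(previous_cost, current_cost):
    """Fractional cost reduction versus the previous quarter."""
    return (previous_cost - current_cost) / previous_cost


# Hypothetical sample data, for illustration only.
latencies_ms = [120, 180, 210, 250, 300, 320, 350, 410, 450, 700]
eval_labels = [True] * 17 + [False] * 3   # 17 of 20 outputs judged 'good'
cost_prev, cost_now = 10.0, 9.0           # pounds per 1,000 inferences

report = {
    "p95_ms": p95(latencies_ms),
    "latency_ok": p95(latencies_ms) < 500,                       # target: below 500ms
    "eval_score": pass_rate(eval_labels),
    "eval_ok": pass_rate(eval_labels) >= 0.85,                   # target: 85%
    "cost_reduction": qoq_reduction(cost_prev, cost_now),
    "cost_ok": qoq_reduction(cost_prev, cost_now) >= 0.10,       # target: 10% per quarter
}
print(report)
```

With this sample data the single 700ms request pushes P95 to 700ms, so the latency check fails even though most requests are fast. That's the point of tracking a tail percentile rather than an average.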
Qualitative Metrics
- Metric: Code Quality & Maintainability
- Desc: How clean, well-documented, and easy-to-understand your code is, making it simple for others to pick up or debug.
- Evidence: Positive feedback in code reviews, fewer bugs reported in your modules, clear and concise documentation (docstrings, READMEs), and your solutions being adopted as patterns by others.
- Metric: Problem-Solving Effectiveness
- Desc: Your ability to diagnose and fix issues with generative AI models, especially when things go wrong in unexpected ways.
- Evidence: Successfully debugging complex model behaviours (e.g., hallucinations, unexpected outputs), identifying root causes quickly, and proposing robust solutions that prevent recurrence. You're the person others come to when they're stuck on a tricky model issue.
- Metric: Collaboration & Knowledge Sharing
- Desc: How well you work with other teams and share what you've learned, helping everyone else get better at GenAI.
- Evidence: Actively participating in team discussions, offering helpful advice in code reviews, contributing to internal knowledge bases or wikis, and helping junior team members get unstuck. You're seen as someone who makes the team smarter.
- Metric: Proactive Issue Identification
- Desc: Spotting potential problems with models or pipelines before they become big, user-facing issues.
- Evidence: Raising concerns about data drift, potential model biases, or scaling challenges before they impact production. You're not just reacting to fires; you're trying to prevent them.
Primary Traits
- Trait: Systematic Problem-Solver
- Manifestation: You're the sort of person who, when faced with 'the model is acting weird,' doesn't just randomly tweak parameters. Instead, you'll break it down: 'Is it the data? The prompt? The model architecture? The inference settings?' You'll set up controlled experiments, document your findings, and methodically rule out possibilities. If something fails after a long training run, you'll want to know exactly why, not just move on.
- Benefit: Generative AI is a bit of a black box, and it can be incredibly frustrating. If you're not systematic, you'll spend weeks chasing your tail, burning through expensive GPU hours, and never really understanding why something works or doesn't. We need people who can bring order to the chaos and learn from every experiment, whether it succeeds or fails.
- Trait: Pragmatic Precision
- Manifestation: You understand that a model that's 99% accurate in a research paper might be unusable in production if it costs a fortune or takes too long to respond. You'll obsess over the details in the data pipeline because you know 'garbage in, garbage out' is even more true with LLMs. When you write code, it's clean, well-tested, and reproducible, because you know someone else (or future you) will have to read it.
- Benefit: In GenAI, the last 1% of performance can easily cost 90% of the budget and add significant latency. This role isn't about chasing academic perfection; it's about making smart trade-offs. You need a precise understanding of how every component affects quality, speed, and cost to deliver solutions that are actually useful and sustainable for the business.
- Trait: Insatiably Curious & Self-Directed
- Manifestation: You're the one who's already read the latest ArXiv papers on RAG or LoRA, not because you were told to, but because you genuinely wanted to know. You've probably messed around with a new open-source LLM on your personal machine over a weekend. When a new framework comes out, you're figuring out how to use it before it's even on our roadmap.
- Benefit: The generative AI space is moving at an astonishing pace. What's 'cutting-edge' today is old news next month. If you're waiting for someone to tell you what to learn, you'll be left behind. We need people who have an internal drive to constantly explore, experiment, and bring new knowledge to the team without needing constant direction. It's the only way we'll stay competitive.
Supporting Traits
- Trait: Resilience
- Desc: You'll need to bounce back quickly when a promising model architecture fails completely after a 48-hour training run, or when a stakeholder changes their mind about a feature you've just built. It's not for the faint-hearted.
- Trait: Sceptical Optimism
- Desc: You believe in the truly transformative potential of generative AI, but you also maintain a healthy scepticism about the hype. You'll question unsubstantiated claims and push for real evidence, rather than just jumping on every new bandwagon.
- Trait: Collaborative Communicator
- Desc: You can explain complex concepts like 'temperature sampling' or 'context windows' to a product manager or a non-technical colleague without making them feel stupid. You're happy to share your knowledge and help others learn.
- Trait: Adaptable Learner
- Desc: The tools and techniques change constantly. You're comfortable picking up new programming paradigms, frameworks, or cloud services quickly, often on the fly, because that's just the reality of this space.
Primary Motivators
- Motivator: Solving Hard, Novel Technical Problems
- Daily: You'll spend your days grappling with tricky issues like reducing model hallucinations, optimising inference speed for a new architecture, or figuring out how to get a RAG pipeline to work reliably with really messy data. It's a constant puzzle.
- Motivator: Direct Impact on Product & Users
- Daily: Your code and models won't just sit in a research paper; they'll be integrated into our products, and you'll see users interacting with them. You'll get feedback (good and bad) and know your work is making a difference.
- Motivator: Continuous Learning in a Rapidly Evolving Field
- Daily: You'll be expected (and encouraged) to spend time exploring new models, frameworks, and research papers. This isn't a static role; you'll constantly be learning and applying the latest techniques.
Potential Demotivators
Honestly, this role isn't for everyone. If you need a perfectly stable, predictable environment, you'll probably struggle. You'll rerun the same analysis three times because stakeholders keep changing the question. The 'urgent' request that disrupted your Thursday will get deprioritised on Friday. You'll build a beautiful model that never gets deployed because the business moved on, or a new, better open-source model came out last week. If you need to see every piece of work make it to production, or if you get frustrated by ambiguity and constant change, you'll find this tough. We won't pretend it's easy.
Common Frustrations
- The unreasonable effectiveness of 'magic': Stakeholders often treat LLMs as magical black boxes, leading to wildly unrealistic expectations and feature requests that defy the current laws of AI physics.
- The data janitor reality: 80% of building a good RAG system is the unglamorous work of cleaning, chunking, and preparing messy, unstructured source documents.
- Chasing a moving target: You spend three months fine-tuning a model for a specific task, only for GPT-5 to be released, making your work obsolete overnight. The pace is relentless.
- The GPU budget scrutiny: Your requests for more A100/H100 compute are scrutinised by finance like a capital expense, forcing you to justify every pound spent on experimental model training.
What the Role Doesn't Offer
- A slow, predictable pace with clearly defined, unchanging requirements.
- A role where you only work on greenfield projects; there's plenty of existing code to maintain and improve.
- A role where you're handed perfectly clean datasets; you'll be doing a lot of data wrangling.
- A role where every single model or feature you build makes it to production; some experiments just won't pan out.
ADHD Positives
- The fast-paced nature and constant novelty of generative AI can be highly engaging for those with ADHD, providing continuous stimulation and new challenges.
- The ability to hyper-focus on complex technical problems, like optimising a model or debugging a tricky RAG pipeline, can be a significant strength.
- The need for rapid iteration and experimentation aligns well with a preference for dynamic, hands-on work rather than long, monotonous tasks.
ADHD Challenges and Accommodations
- Managing multiple concurrent tasks and shifting priorities can be challenging; we use structured project management tools and daily stand-ups to help keep things on track.
- Detailed documentation can feel tedious; we encourage using AI tools for initial drafts and pair programming for review to make it less of a solo burden.
- Maintaining focus during long, uninterrupted coding sessions might be difficult; we support flexible work patterns and regular breaks to help manage energy levels.
Dyslexia Positives
- Strong spatial reasoning and pattern recognition, often associated with dyslexia, are incredibly valuable for understanding complex model architectures and identifying trends in data.
- The emphasis on logical problem-solving and abstract thinking in AI engineering can be a great fit, as these strengths are often pronounced.
- Visual tools for model architecture design (e.g., flowcharts, diagrams) and data visualisation are heavily used, playing to visual processing strengths.
Dyslexia Challenges and Accommodations
- Reading and writing extensive documentation or research papers can be demanding; we provide access to text-to-speech tools and encourage the use of AI summarisation for long documents.
- Careful attention to syntax in code is crucial; we use robust IDEs with strong auto-completion, linting, and pair programming for code reviews to catch errors collaboratively.
- Proofreading written communications (emails, reports) might take more effort; we encourage using grammar checkers and peer review for important documents.
Autism Positives
- A deep focus on logic, systems, and detail is highly beneficial for debugging models, optimising algorithms, and ensuring the precision of AI systems.
- The preference for clear, direct communication and objective data analysis aligns well with the technical nature of the role.
- The opportunity to specialise in complex technical areas, becoming a subject matter expert in specific generative AI techniques, can be very rewarding.
Autism Challenges and Accommodations
- Navigating ambiguous requirements or rapidly changing stakeholder expectations can be difficult; we strive for clear, written specifications and provide a Senior Engineer as a consistent point of contact.
- Unplanned social interactions or noisy open-plan environments can be overwhelming; we offer options for focused work in quieter spaces or remote work, and schedule meetings with clear agendas.
- Interpreting subtle social cues in team dynamics might be a challenge; we foster a culture of direct, respectful feedback and provide clear expectations for collaboration.
Sensory Considerations
Our main office is a modern, open-plan space that can get lively. That said, we have quiet zones, noise-cancelling headphones available, and plenty of flexibility to work remotely a few days a week. We're happy to discuss specific needs to make sure you're comfortable and can do your best work.
Flexibility Notes
We believe in output, not hours. We offer flexible start and end times, and a hybrid working model. If you need specific adjustments, let's talk about them – we're committed to making this a great place to work for everyone.
Key Responsibilities
Responsibilities by Experience Level
- Level: Mid-Level Generative AI Engineer
- Responsibilities: Independently build and deploy core components of our generative AI features, like a new RAG pipeline for customer support or a text summarisation module for our internal tools. You'll own it from start to finish for smaller features.
- Take ownership of optimising existing generative models for performance, latency, and cost. This means digging into inference settings, trying out different PEFT methods (like LoRA), and making sure we're not burning money on GPUs.
- Identify and debug complex issues in our generative AI systems, whether it's models hallucinating, RAG pipelines returning irrelevant context, or unexpected API errors. You'll be the one figuring out 'why did it say that?'
- Propose and implement improvements to our prompt engineering strategies, constantly refining how we talk to our LLMs to get better, more reliable outputs for specific tasks. This is a continuous process.
- Collaborate closely with product managers and other engineering teams to understand requirements and integrate generative AI features seamlessly into our existing products. You'll be the technical voice in those discussions.
- Contribute to our internal documentation and knowledge sharing, making sure that what you've built and learned is clearly explained for others. Yes, it's boring sometimes, but future-you (and everyone else) will thank you.
- Participate in code reviews, offering constructive feedback to peers and learning from their approaches. We all get better together.
- Supervision: You'll typically have weekly check-ins with a Senior Generative AI Engineer or your manager. For routine tasks, you'll work independently, but for anything novel or particularly tricky, you're expected to flag it and get guidance.
- Decision: You'll make routine technical decisions within the scope of your assigned projects, like choosing an appropriate embedding model or fine-tuning technique. Budget decisions above £2,000 or significant architectural changes will need approval from a Senior Engineer or your manager. You'll inform stakeholders about progress and potential roadblocks, but consult on major changes to timelines or scope.
- Success: Success looks like reliably delivering high-quality, performant generative AI features that meet product requirements, proactively identifying and solving technical challenges, and contributing positively to the team's overall knowledge and capabilities. Basically, you're building stuff that works, and you're getting better at it every day.
Decision-Making Authority
- Type: Technical Approach for a New Feature
- Entry: Proposes options, requires full review and approval by Senior Engineer.
- Mid: Proposes and justifies a specific approach, consults with Senior Engineer, proceeds with agreement. Can independently choose specific models/libraries for well-defined tasks.
- Senior: Defines the overall technical approach and architecture, consults with Lead/Staff Engineer for strategic alignment, makes final technical decisions within project scope.
- Type: Production Deployment of a Model
- Entry: Prepares deployment artefacts under close supervision, deployment executed by Senior Engineer.
- Mid: Independently deploys minor model updates or new, small features after peer review and manager approval. Requires sign-off for major changes.
- Senior: Leads deployment strategy for major systems, approves deployment plans for projects, accountable for production stability.
- Type: GPU Resource Allocation for Training
- Entry: Requests specific resources for defined tasks, approved by Senior Engineer.
- Mid: Estimates and requests resources for project-level training runs (e.g., a few A100 hours), requires manager approval for significant or long-running jobs.
- Senior: Manages resource allocation across multiple projects, optimises usage, consults with Lead/Staff on large-scale cluster needs.
- Type: Vendor/Tool Selection (e.g., new Vector DB)
- Entry: Researches options and provides summaries to Senior Engineer.
- Mid: Evaluates 2-3 options for a specific project need, makes a recommendation with a clear justification, requires manager approval.
- Senior: Evaluates and recommends strategic tools/vendors for a workstream, budget approval up to £5K, consults with Lead/Staff for larger commitments.
- Tool: Boilerplate Code Generation
- Benefit: Use a code-generation model like GitHub Copilot or similar tools to instantly create data loading scripts, model class skeletons, unit tests, and even complex API integrations. This means less time writing repetitive code and more time on core logic and innovation.
- Tool: Research Paper Summariser
- Benefit: Feed the latest ArXiv papers, technical blogs, or internal documentation into an LLM to get concise summaries of key innovations, methodologies, and results. Stay current with the rapidly evolving GenAI landscape without having to read every 20-page paper in full detail.
- Tool: Interactive Debugging Assistant
- Benefit: Paste complex error messages, confusing code blocks, or even a tricky prompt into a chat interface and ask an LLM to explain potential causes, suggest fixes, or refactor the code for clarity. It's like having a senior engineer on standby 24/7.
- Tool: Automated Documentation Writer
- Benefit: Use an AI tool to automatically generate docstrings, README files, API documentation, and even internal wikis directly from your code. Ensure your projects are always well-documented without the manual grind, freeing you up for more impactful work.
- Weekly time savings potential: Roughly 10-15 hours per week on routine tasks.
- Typical tool investment: You'll have access to our internal AI tools and external subscriptions, typically costing around £50-£150 per month, paid for by us.
Competency Requirements
Foundation Skills (Transferable)
Beyond the technical wizardry, we need people who can think clearly, work well with others, and adapt when things inevitably go sideways. These are the bedrock skills that make a great engineer.
- Category: Communication & Collaboration
- Skills: Explaining complex technical concepts clearly to non-technical colleagues (e.g., Product Managers).
- Writing clear, concise technical documentation and code comments.
- Actively participating in code reviews and team discussions, offering constructive feedback.
- Working effectively within an agile team, contributing to sprint planning and daily stand-ups.
- Category: Problem-Solving & Critical Thinking
- Skills: Breaking down ambiguous problems into manageable, testable components.
- Debugging complex systems, identifying root causes rather than just symptoms.
- Evaluating trade-offs between different technical solutions (e.g., performance vs. cost).
- Applying a scientific method to experimentation and model evaluation.
- Category: Adaptability & Learning Agility
- Skills: Quickly learning new programming languages, frameworks, and tools as the industry evolves.
- Adapting to changing project requirements and priorities without getting flustered.
- Being comfortable with ambiguity and iterating on solutions when initial attempts don't work.
- Proactively seeking out new knowledge and best practices in generative AI.
- Category: Ownership & Initiative
- Skills: Taking responsibility for tasks and seeing them through to completion, even when challenging.
- Identifying opportunities for improvement in existing systems or processes.
- Working independently on assigned tasks, knowing when to ask for help but not waiting to be told what to do next.
- Demonstrating a strong sense of accountability for the quality and impact of your work.
Functional Skills (Role-Specific Technical)
This is where the rubber meets the road. You'll need solid hands-on experience with the tools and techniques that make generative AI actually work in a product setting.
Technical Competencies
- Skill: Fine-Tuning Methodologies (PEFT, LoRA)
- Desc: Deep, practical knowledge of various techniques from full fine-tuning to parameter-efficient methods (PEFT) like LoRA, QLoRA, and prompt tuning. You'll need to understand when to apply each based on data availability, compute budget, and performance requirements for a given task.
- Level: Advanced
- Skill: Retrieval-Augmented Generation (RAG) Architecture
- Desc: Expertise in designing, building, and optimising RAG pipelines. This includes understanding chunking strategies, selecting appropriate embedding models, optimising vector stores, and implementing advanced retrieval techniques (e.g., hybrid search, re-ranking).
- Level: Advanced
- Skill: Prompt Engineering & Optimisation
- Desc: A scientific approach to crafting, testing, and refining prompts for specific tasks. Includes knowledge of techniques like chain-of-thought, self-consistency, and tree-of-thought to maximise model performance and reliability in production.
- Level: Advanced
- Skill: Model Evaluation & Hallucination Mitigation
- Desc: Developing robust evaluation frameworks beyond simple accuracy scores. Includes implementing metrics for factuality, toxicity, and bias, and using techniques like RAG and knowledge grounding to reduce model hallucination in real-world applications.
- Level: Intermediate
- Skill: AI Safety & Ethics (Practical Application)
- Desc: Implementing technical safeguards against misuse, including prompt injection defence, content moderation filters, and data anonymisation. Applying frameworks to assess and mitigate ethical risks and model bias in your daily work.
- Level: Intermediate
- Skill: MLOps Principles (Deployment & Monitoring)
- Desc: Understanding the principles of deploying, monitoring, and maintaining machine learning models in production. This includes version control for models, automated testing, and setting up alerts for performance degradation or drift.
- Level: Intermediate
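To make the fine-tuning trade-off above concrete, here's a rough sketch of the arithmetic behind LoRA's appeal. For a single weight matrix of shape d_out x d_in, full fine-tuning updates every entry, while LoRA freezes it and trains two low-rank factors of shapes d_out x r and r x d_in. The 4096 x 4096 layer size and the ranks below are assumptions picked for illustration, not figures from any model we run.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a LoRA adapter on the same matrix.

    LoRA trains B (d_out x rank) and A (rank x d_in); the base weight stays frozen.
    """
    return rank * (d_out + d_in)


# Hypothetical layer size, for illustration only.
d_out = d_in = 4096
full = full_finetune_params(d_out, d_in)

for rank in (4, 8, 16, 64):
    adapter = lora_params(d_out, d_in, rank)
    share = adapter / full
    print(f"rank={rank:>2}: {adapter:>9,} trainable vs {full:,} ({share:.2%} of full)")
```

At rank 8 the adapter for this one matrix has 65,536 trainable parameters against 16,777,216 for a full update. Whether that small adapter is enough for a given task is an empirical question, which is why the role asks for judgement about when to apply each method.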
Digital Tools
- Tool: Python (PyTorch, TensorFlow, Hugging Face Transformers)
- Level: Expert
- Usage: Building custom model architectures, fine-tuning LLMs, implementing RAG pipelines, and developing evaluation scripts. You'll be living in Python.
- Tool: LLM Frameworks (LangChain, LlamaIndex)
- Level: Advanced
- Usage: Building applications using these frameworks based on documented patterns, implementing basic RAG pipelines, and developing agentic workflows for specific features.
- Tool: Vector Databases (Pinecone, Weaviate, Milvus)
- Level: Intermediate
- Usage: Performing CRUD operations, similarity searches, and optimising indexing strategies for specific RAG applications. You'll be managing the knowledge base for our LLMs.
- Tool: Cloud AI Platforms (AWS SageMaker, Google Vertex AI, Azure OpenAI Service)
- Level: Intermediate
- Usage: Deploying and running models using managed services, managing fine-tuning jobs, and using platform SDKs for basic MLOps tasks. We tend to use AWS, but familiarity with others is a bonus.
- Tool: Experiment Tracking (Weights & Biases, MLflow)
- Level: Advanced
- Usage: Logging metrics, parameters, and artefacts for your experiments, comparing runs, and reproducing results to ensure scientific rigour in model development.
- Tool: Containerisation (Docker, Kubernetes)
- Level: Intermediate
- Usage: Using Docker to containerise your generative AI applications and manage dependencies. You'll be deploying these containers, probably onto Kubernetes, so understanding how that works is key.
- Tool: Version Control (Git)
- Level: Expert
- Usage: Collaborating on code, managing branches, resolving conflicts, and maintaining a clean and traceable codebase for all AI projects.
Industry Knowledge
- Area: Latest LLM Architectures & Capabilities
- Desc: A good understanding of the current state-of-the-art LLM architectures (e.g., Transformers, MoE) and their respective strengths and weaknesses, including awareness of emerging models and techniques.
- Area: Ethical AI Principles
- Desc: Awareness of the ethical considerations surrounding generative AI, including bias, fairness, transparency, and responsible deployment. You'll need to think about how your models might be misused.
- Area: Cloud Computing Fundamentals
- Desc: Basic understanding of cloud infrastructure concepts (e.g., compute, storage, networking) as they relate to deploying and scaling AI workloads. We're primarily on AWS.
Regulatory Compliance
- Reg: GDPR (General Data Protection Regulation)
- Usage: Understanding how to handle personal data when training or fine-tuning models, ensuring data anonymisation, and being aware of data residency requirements for AI services. You'll need to know what you can and can't do with data.
- Reg: AI Act (EU - emerging)
- Usage: Keeping an eye on emerging AI regulations, particularly around high-risk AI systems, and understanding the potential implications for model transparency, robustness, and human oversight in our products. This is still evolving, but it's good to be aware.
Essential Prerequisites
- Solid programming skills in Python, including experience with data structures, algorithms, and object-oriented programming.
- A good grasp of machine learning fundamentals: supervised/unsupervised learning, model evaluation metrics, bias-variance trade-off.
- Practical experience with at least one major deep learning framework (PyTorch or TensorFlow) for building and training models.
- Familiarity with cloud platforms (AWS, GCP, or Azure) for deploying and managing ML workloads.
- Experience with version control systems, specifically Git, in a collaborative team environment.
- A genuine curiosity about generative AI and a track record of self-learning in this rapidly evolving field (e.g., personal projects, online courses, contributions to open source).
Career Pathway Context
These are the foundational skills we expect you to bring to the table. If you've got these locked down, you're in a great position to grow into this role and beyond. We're looking for someone who can hit the ground running on the technical side, not someone who needs to learn Python from scratch.
Qualifications & Credentials
Emerging Foundation Skills
- Skill: Advanced Prompt Engineering & Agentic Workflows
- Why: Simply writing a good prompt isn't enough anymore. As LLMs become more capable, the real value comes from orchestrating them into complex agents that can reason, plan, and execute multi-step tasks autonomously. This is where the big productivity gains (and challenges) will be.
- Concepts: Multi-agent systems: Designing and coordinating multiple LLM agents that specialise in different tasks to achieve a larger goal.
- Tool use & function calling: Enabling LLMs to interact with external APIs and tools to retrieve information or perform actions.
- Self-correction & reflection: Building systems where LLMs can evaluate their own outputs and refine their approach.
- Memory management for agents: Designing how agents retain and recall information over long conversations or tasks.
- Prepare: This month: Experiment with LangChain or LlamaIndex's agent capabilities, build a simple agent that uses a tool (e.g., a search engine).
- Next quarter: Design and implement a multi-step agent for an internal process, focusing on error handling and robustness.
- Month 3-6: Explore advanced agentic patterns like 'Tree of Thought' or 'Self-Refine' in a personal project or a small team initiative.
- Ongoing: Read papers on agent architectures and participate in relevant open-source projects.
- QuickWin: Start by getting comfortable with function calling in OpenAI or similar APIs. It's the first step to building more capable agents.
- Skill: Cost-Aware Model Selection & Optimisation
- Why: As GenAI scales, compute costs can skyrocket. Knowing how to pick the right model (open-source vs. proprietary, small vs. large) for a specific task, and then optimising its inference and training costs, will become absolutely critical for business viability.
- Concepts: Tokenomics & API pricing models: Understanding how different LLM providers charge for input/output tokens and context windows.
- Quantisation & distillation: Techniques to reduce model size and inference cost while maintaining performance.
- Batching & parallelisation: Optimising inference requests to make the most efficient use of GPU resources.
- Benchmarking cost vs. quality: Developing frameworks to compare different models based on both their performance and their operational cost.
- Prepare: This month: Deep dive into the pricing models of AWS, Azure, and OpenAI. Understand how our current usage translates to costs.
- Next quarter: Take ownership of optimising the cost of one existing production model, experimenting with different inference parameters or smaller models.
- Month 3-6: Research and prototype a quantisation or distillation technique for a specific model to see the real-world impact on cost and performance.
- Ongoing: Advocate for cost-conscious decisions in model selection and deployment within the team.
- QuickWin: Start monitoring the token usage and cost of your development experiments. You'll quickly see where the money goes.
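As a starting point for that kind of monitoring, here is a minimal sketch of a per-experiment cost tracker. It assumes the common pricing shape of separate input and output rates per 1,000 tokens. The `Pricing` and `CostTracker` names are hypothetical, and the rates in the example are placeholders, not any provider's real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1,000 tokens. Fill these in from your provider's price list."""
    input_per_1k: float
    output_per_1k: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request, charging input and output tokens at separate rates."""
    return (input_tokens / 1000) * pricing.input_per_1k + (
        output_tokens / 1000
    ) * pricing.output_per_1k


class CostTracker:
    """Accumulates token counts across requests for one experiment."""

    def __init__(self, pricing: Pricing) -> None:
        self.pricing = pricing
        self.input_tokens = 0
        self.output_tokens = 0
        self.requests = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.requests += 1

    @property
    def total_cost(self) -> float:
        return request_cost(self.input_tokens, self.output_tokens, self.pricing)


if __name__ == "__main__":
    # Placeholder rates, chosen only to make the arithmetic easy to follow.
    tracker = CostTracker(Pricing(input_per_1k=0.5, output_per_1k=1.5))
    tracker.record(input_tokens=2000, output_tokens=1000)
    tracker.record(input_tokens=1000, output_tokens=1000)
    print(f"{tracker.requests} requests, total ${tracker.total_cost:.2f}")
```

Most hosted LLM APIs report token counts in the response metadata, so `record` can usually be called straight after each request. Keeping input and output separate matters because output tokens are often priced higher than input tokens.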
Advancing Technical Skills
- Skill: Advanced RAG Optimisation & Hybrid Search
- Why: Basic RAG is becoming standard, but getting truly reliable, high-quality retrieval requires going beyond simple vector search. Hybrid approaches and more sophisticated re-ranking will be essential to tackle complex information retrieval challenges.
- Concepts:
- Concept: Hybrid search (vector + keyword)
- Desc: Combining semantic similarity with traditional keyword search for more robust retrieval.
- Concept: Re-ranking models
- Desc: Using smaller, specialised models to re-order retrieved documents for better relevance.
- Concept: Graph-based RAG
- Desc: Integrating knowledge graphs to enhance retrieval and reasoning capabilities.
- Concept: Self-correcting RAG
- Desc: Systems where the LLM can identify and correct errors in its own retrieval process.
- Prepare: This month: Implement a basic hybrid search (e.g., BM25 + vector search) in one of our existing RAG pipelines.
- Next quarter: Experiment with a re-ranking model to improve the quality of retrieved documents for a tricky use case.
- Month 3-6: Explore the concepts of knowledge graphs and how they could be integrated into our RAG architecture for more complex queries.
- Ongoing: Stay updated on the latest research in information retrieval and RAG architectures.
- QuickWin: Start by adding a simple keyword search component to an existing vector search, then measure if it improves retrieval quality.
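One way to try this is to merge the two result lists and check the effect with a simple metric. The sketch below uses reciprocal rank fusion (RRF), a common merging technique chosen here for illustration rather than anything prescribed by our pipelines, plus recall@k as the quality measure. The document IDs and the relevance labels are made up for the example, and `k=60` is a conventional default for the RRF constant, not a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) in every list it appears in, and
    the scores are summed. Ties are broken alphabetically by ID so the
    output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


if __name__ == "__main__":
    # Hypothetical results for one query from each retriever.
    keyword_hits = ["a", "b", "c"]  # e.g. from BM25
    vector_hits = ["c", "a", "d"]   # e.g. from embedding similarity
    relevant = {"a", "c"}           # hand-labelled ground truth (assumed)

    fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
    print("fused order:", fused)
    print("keyword recall@2:", recall_at_k(keyword_hits, relevant, 2))
    print("fused recall@2:", recall_at_k(fused, relevant, 2))
```

RRF only needs rank positions, so you do not have to normalise BM25 scores against cosine similarities before combining them. To judge whether hybrid search really helps, run the same comparison over a labelled set of real queries rather than a single example.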
- Skill: Multi-modal Generative AI
- Why: The world isn't just text. Models that can understand and generate across text, images, audio, and video are becoming increasingly powerful and will unlock entirely new product capabilities. You'll need to think beyond just words.
- Concepts:
- Concept: Vision-Language Models (VLMs)
- Desc: Models that can process and reason about both images and text.
- Concept: Image generation (Stable Diffusion, DALL-E)
- Desc: Understanding the principles and practical application of generating images from text prompts.
- Concept: Audio generation & speech-to-text
- Desc: Working with models that can transcribe speech or generate synthetic audio.
- Concept: Cross-modal embeddings
- Desc: Representing different data types in a shared vector space for unified reasoning.
- Prepare: This month: Experiment with an open-source VLM (e.g., LLaVA) or an image generation model (e.g., Stable Diffusion) in a personal project.
- Next quarter: Propose a small internal project where multi-modal AI could add value (e.g., generating image captions, summarising video content).
- Month 3-6: Dive into the architectures of multi-modal models, understanding how different modalities are integrated and processed.
- Ongoing: Follow research in multi-modal AI and explore relevant datasets.
- QuickWin: Try using an existing multi-modal API (like GPT-4V) to analyse images, or frames sampled from a video, and extract insights. It's a great way to see the potential.
Future Skills Closing Note
This isn't just about keeping up; it's about staying ahead. We'll support your learning with resources, time, and opportunities to apply these new skills. Your growth here is our growth.
Education Requirements
- Level: Minimum
- Req: A Bachelor's degree in Computer Science, Artificial Intelligence, Machine Learning, Data Science, or a closely related quantitative field.
- Alts: Significant demonstrable professional experience (4+ years) in machine learning engineering, or in software development with a strong AI focus, will be considered in place of a degree when coupled with relevant certifications or bootcamps.
- Level: Preferred
- Req: A Master's degree in Computer Science, Artificial Intelligence, or a related field, especially with a specialisation in natural language processing or deep learning.
- Alts: Not strictly required, but it shows a deeper theoretical grounding. Real-world project experience often trumps an advanced degree if it's hands-on and impactful.
Experience Requirements
You'll need roughly 2-5 years of hands-on experience in machine learning engineering or a related technical role. This should include practical experience building, training, and deploying machine learning models, ideally with at least 1-2 years specifically focused on generative AI, large language models, or natural language processing. We're looking for someone who has moved beyond just following tutorials and has actually shipped AI-powered features.
Preferred Certifications
- Cert: AWS Certified Machine Learning – Specialty
- Prod: Amazon Web Services (AWS)
- Usage: Demonstrates a solid understanding of building, training, and deploying ML models on the AWS platform, which is our primary cloud provider.
- Cert: Generative AI with Large Language Models (DeepLearning.AI course certificate)
- Prod: DeepLearning.AI and AWS (Coursera)
- Usage: Shows a dedicated effort to understand the core concepts and practical applications of generative AI and LLMs, covering many of the domain skills we look for.
- Cert: TensorFlow Developer Certificate
- Prod: Google
- Usage: Validates your ability to build and deploy ML solutions using TensorFlow, a key framework we use.
Recommended Activities
- Actively participating in online communities (e.g., Hugging Face forums, relevant subreddits, Discord servers) focused on generative AI.
- Contributing to open-source generative AI projects, even small bug fixes or documentation improvements.
- Attending industry conferences or local meetups focused on AI, machine learning, or natural language processing.
- Completing advanced online courses or specialisations in areas like MLOps, advanced deep learning architectures, or multi-modal AI.
- Maintaining a personal portfolio of generative AI projects on GitHub, showcasing your hands-on skills and curiosity.
Career Progression Pathways
Entry Paths to This Role
- Path: Associate Generative AI Engineer (L1)
- Time: 1-2 years
- Path: Machine Learning Engineer (from a different specialism)
- Time: Transition typically takes 6-12 months of focused learning
- Path: Software Engineer (with AI interest)
- Time: Transition typically takes 1-2 years of dedicated effort
Career Progression From This Role
- Pathway: Senior Generative AI Engineer (L3)
- Time: 2-3 years in the Generative AI Engineer role
Long Term Vision Potential Roles
- Title: Staff Generative AI Engineer (L4)
- Time: 5-8 years from current role
- Title: Principal Generative AI Engineer (L5)
- Time: 8-12 years from current role
- Title: Director of Generative AI (L6)
- Time: 10-15 years from current role
Sector Mobility
The skills you'll gain in this role are highly transferable across almost any industry. Generative AI is transforming everything from finance and healthcare to media and manufacturing. You'll be a sought-after expert in a rapidly expanding field, with opportunities in product companies, research labs, or even starting your own venture.
How Zavmo Delivers This Role's Development
DISCOVER Phase: Skills Gap Analysis
Zavmo maps your current competencies against all requirements in this job description through conversational assessment. We evaluate your foundation skills (communication, strategic thinking), functional skills (RAG optimisation, cost-aware model selection), and readiness for career progression.
Output: Personalised skills gap heat map showing strengths and priorities, estimated time to competency, neurodiversity accommodations.
DISCUSS Phase: Personalised Learning Pathway
Based on your DISCOVER results, Zavmo creates a personalised learning plan prioritised by impact: foundation skills first, then functional skills. We adapt to your learning style, pace, and neurodiversity needs (ADHD, dyslexia, autism).
Output: Week-by-week schedule, each module linked to specific job responsibilities, checkpoints and milestones.
DELIVER Phase: Conversational Learning
Learn through conversation, not boring modules. Zavmo uses 10 conversation types (Socratic dialogue, role-play, coaching, case studies) to build competence. Practise explaining a cost-versus-quality trade-off to a product manager, walking a software engineering team through an integration, and defending a model selection decision in a safe AI environment before doing it with real stakeholders.
Example: "For 'Stakeholder Mapping', Zavmo will guide you through analysing a complex enterprise account, identifying key decision-makers, and building an engagement strategy."
DEMONSTRATE Phase: Competency Assessment
Zavmo automatically builds your evidence portfolio as you learn. Every conversation, practice scenario, and application example is captured and mapped to National Occupational Standards (NOS) performance criteria. When ready, your portfolio supports Ofqual qualification claims and demonstrates competence to employers.
Output: Competency matrix, evidence portfolio (downloadable), qualification readiness, career progression score.