Mid-Level (2-5 years)

Cloud Support Analyst

As a Cloud Support Analyst, you're the person who steps in when something goes wrong in our AWS, Azure, or GCP environments. You'll be the first line of defence for our internal teams and sometimes external clients, making sure their cloud services are running smoothly. This isn't just about fixing things; it's about understanding why they broke and stopping it from happening again. You'll spend your days troubleshooting, diagnosing, and resolving issues, making sure our cloud infrastructure is reliable and available. It’s a busy role, but you’ll learn loads and really make a difference to how our systems perform.

Job ID: JD-TECH-TSCL-002
Department: Technical Roles
NOS Level: Level 5-6
OFQUAL Level: Level 5-6
Experience: Mid-Level (2-5 years)

Role Purpose & Context

Role Summary

The Cloud Support Analyst is responsible for keeping our cloud services (AWS, Azure, GCP) up and running, which directly impacts our customers' ability to use our products and our internal teams' productivity. You'll work at the intersection of our infrastructure and our users, translating technical problems into clear solutions that help everyone get back on track. When this role is done well, our systems are stable, and our users are happy because their issues are resolved quickly and thoroughly. When it's not, well, things break, people get frustrated, and we lose money and trust. The challenge is that cloud environments are complex and always changing, so you're constantly learning and adapting. The reward is the satisfaction of being the person who saves the day, often under pressure, and knowing you're building a more resilient system for the future.

Reporting Structure

Key Stakeholders

Internal:

External:

Organisational Impact

Scope: This role is absolutely critical for maintaining our operational uptime and service reliability. You're directly responsible for resolving incidents that could otherwise halt business operations or impact customer experience. Your quick thinking and problem-solving skills mean our developers can keep building, our sales team can keep selling, and our customers can keep using our products without interruption. Essentially, you help keep the lights on and the business moving forward, which is a pretty big deal.

Performance Metrics

Quantitative Metrics

  1. Metric: Time to First Response (TTFR)
     Desc: How quickly you acknowledge an incoming ticket.
     Target: < 30 minutes for P2 tickets, < 15 minutes for P1 tickets
     Freq: Daily, reviewed weekly
     Example: You pick up a P2 ticket about a slow application within 20 minutes of it being logged, letting the user know you're on it.
  2. Metric: Ticket Resolution within SLA
     Desc: Percentage of tickets you resolve before their service level agreement expires.
     Target: > 95% of tickets resolved within SLA
     Freq: Weekly, reviewed monthly
     Example: If a P3 ticket has a 24-hour SLA, you close it within that timeframe, not letting it breach.
  3. Metric: Customer Satisfaction (CSAT)
     Desc: How satisfied users are with your support, based on post-resolution surveys.
     Target: > 4.5/5 average score
     Freq: Per ticket, reviewed monthly
     Example: A user gives you a 5/5 rating because you not only fixed their issue but also explained what happened in a way they understood.
  4. Metric: Knowledge Base Contribution
     Desc: Number of new articles or significant updates you add to our internal knowledge base (Confluence).
     Target: > 2 new or updated runbooks per quarter
     Freq: Quarterly
     Example: You document a new troubleshooting process for a recurring database connection issue, saving future analysts time.
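
To make the TTFR and SLA targets above concrete, here is a minimal sketch of how they could be calculated from ticket timestamps. The `Ticket` record, its field names, and the 8-hour P2 resolution SLA are assumptions made for illustration; they are not our actual ticketing schema or SLA table. The TTFR thresholds and the 24-hour P3 SLA come from the metrics above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Ticket:
    # Hypothetical record; field names are illustrative only.
    priority: str              # "P1", "P2", "P3", ...
    opened: datetime           # when the ticket was logged
    first_response: datetime   # when an analyst acknowledged it
    resolved: datetime         # when it was closed
    sla: timedelta             # resolution SLA for this ticket


# TTFR targets from the metrics above: < 15 min for P1, < 30 min for P2.
TTFR_TARGETS = {"P1": timedelta(minutes=15), "P2": timedelta(minutes=30)}


def ttfr_met(ticket: Ticket) -> Optional[bool]:
    """Return True/False when the priority has a TTFR target, else None."""
    target = TTFR_TARGETS.get(ticket.priority)
    if target is None:
        return None
    return (ticket.first_response - ticket.opened) < target


def sla_compliance(tickets: List[Ticket]) -> float:
    """Percentage of tickets resolved within their SLA (0.0 for an empty list)."""
    if not tickets:
        return 0.0
    within = sum(1 for t in tickets if (t.resolved - t.opened) <= t.sla)
    return 100.0 * within / len(tickets)


# Sample data: a P2 acknowledged in 20 minutes, and a P3 that breaches its 24-hour SLA.
opened = datetime(2024, 1, 8, 9, 0)
tickets = [
    Ticket("P2", opened, opened + timedelta(minutes=20),
           opened + timedelta(hours=3), timedelta(hours=8)),   # 8h SLA is assumed
    Ticket("P3", opened, opened + timedelta(hours=1),
           opened + timedelta(hours=30), timedelta(hours=24)),
]

print(ttfr_met(tickets[0]))     # True: 20 minutes is under the 30-minute P2 target
print(sla_compliance(tickets))  # 50.0: one of the two tickets was resolved within SLA
```

With the > 95% target, a week like this sample (50%) would be flagged at the monthly review.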

Qualitative Metrics

  1. Metric: Proactive Problem Identification
     Desc: Your ability to spot potential issues before they become critical incidents.
     Evidence: You flag an unusual spike in error logs during a routine check, raising it to the team before users report an outage. You notice a 'flapping alert' and investigate the root cause, rather than just dismissing it.
  2. Metric: Effective Communication During Incidents
     Desc: How clearly and calmly you communicate during stressful situations, both technically and to non-technical users.
     Evidence: You provide concise, factual updates in the incident Slack channel. You can explain a complex network issue to a Product Manager without using jargon, helping them understand the impact.
  3. Metric: Collaboration and Mentorship
     Desc: How well you work with your team and help less experienced colleagues.
     Evidence: You actively participate in 'ticket swarms' to help resolve complex issues. You patiently walk a new joiner through a troubleshooting process, rather than just giving them the answer.
  4. Metric: Root Cause Analysis Quality
     Desc: The thoroughness and accuracy of your investigation into why an incident occurred.
     Evidence: Your RCA goes beyond the immediate fix, identifying the underlying system or process flaw. You suggest preventative measures that actually get implemented.

Primary Traits

Supporting Traits

Primary Motivators

  1. Motivator: Solving Puzzles
     Daily: You get a real buzz from taking a complex, ambiguous problem – like 'the app is slow' – and breaking it down, using logs and metrics, until you find the exact cause and fix it. It's like being a detective every day.
  2. Motivator: Being the Hero
     Daily: You enjoy the feeling of restoring service during an outage or unblocking a critical project for a frustrated colleague. You like being the person who can step in and make things right, often under pressure.
  3. Motivator: Continuous Learning
     Daily: The cloud landscape changes constantly, and you love that. You're always keen to pick up new services, learn new troubleshooting techniques, and deepen your understanding of how things work. Stagnation is your enemy.

Potential Demotivators

Honestly, this role isn't for everyone. You'll get vague tickets that make you want to pull your hair out, and you'll spend hours troubleshooting something only to find it was a simple typo. You'll sometimes feel like you're just putting out fires, rather than preventing them. Not every fix is glamorous, and some days are just about grinding through the queue.

Common Frustrations

  1. The Vague Ticket: Getting a P2 ticket that just says 'The website is broken' with no URL, no error message, no user ID, and no screenshot. It's like finding a needle in a haystack, blindfolded.
  2. Out-of-Date Runbooks: Wasting 45 minutes following a Confluence guide, only to discover the entire process was changed two months ago and never documented. It's infuriating.
  3. The 'Shadow IT' Problem: Spending hours troubleshooting a system only to find out the user deployed it themselves, it violates every company standard, and they have no idea how it's configured. It's a nightmare.
  4. Alert Fatigue: Being on-call and getting woken up at 3 AM by a flurry of non-actionable, low-priority alerts that drown out the one truly critical notification. It's exhausting.
  5. The 'It's Urgent' Fallacy: Stakeholders who mark every single request as 'Urgent,' devaluing the meaning of priority and making it impossible to focus on what's actually a P1. It's a constant battle.

What This Role Doesn't Offer

  1. A predictable, quiet day where nothing breaks. Truth is, something always breaks.
  2. The chance to build brand new systems from scratch every day. You're more about maintaining and fixing existing ones.
  3. An escape from documentation. Yes, it's part of the job.
  4. Complete autonomy to ignore processes. We have them for a reason, especially during incidents.

ADHD Positives

  1. The fast-paced, incident-driven nature of support can be really engaging, providing constant novelty and quick problem-solving opportunities that suit a high-energy, quick-thinking mind.
  2. The need to context-switch between different tickets and problems can be a strength, as you'll often be juggling multiple investigations simultaneously.
  3. Hyperfocus can be a huge asset when deep-diving into a complex log file or a tricky network issue, allowing you to block out distractions and find the solution.

ADHD Challenges and Accommodations

  1. Maintaining focus on repetitive documentation tasks or long-term problem management can be challenging; we can help by breaking these down into smaller, more engaging chunks or pairing you with a colleague.
  2. Organising and prioritising a constantly changing ticket queue can be overwhelming; we use structured ticketing systems and daily stand-ups to help manage this, and your manager will provide clear guidance on priorities.
  3. We can offer noise-cancelling headphones for focused work and encourage regular breaks to help manage energy levels.

Dyslexia Positives

  1. Strong spatial reasoning skills, often found in individuals with dyslexia, are excellent for visualising complex cloud architectures and network topologies.
  2. Great problem-solving abilities, especially for non-linear thinking, can help you find creative solutions to tricky technical issues that others might miss.
  3. Excellent verbal communication skills are often a strength, which is vital for explaining complex technical issues clearly to both technical and non-technical stakeholders during incidents.

Dyslexia Challenges and Accommodations

  1. Reading and writing extensive documentation, log files, or incident reports can be more demanding; we encourage the use of screen readers, dictation software, and tools like Grammarly, and we're happy to review your written work.
  2. Parsing dense error messages or configuration files might take a bit longer; we can provide tools that highlight syntax and offer clearer visual formatting, and we prioritise visual dashboards and alerts where possible.
  3. We support the use of assistive technologies and offer flexible approaches to documentation, such as verbal explanations or diagram-first approaches.

Autism Positives

  1. A strong attention to detail is invaluable for spotting subtle anomalies in logs, configurations, or monitoring dashboards that others might overlook, preventing major incidents.
  2. A methodical and logical approach to problem-solving, often a strength, is perfectly suited to systematically troubleshooting complex cloud issues and following runbooks precisely.
  3. A preference for clear, direct communication is highly valued, especially during incident response where ambiguity can cause delays and confusion.

Autism Challenges and Accommodations

  1. Navigating unexpected changes in priority or dealing with highly ambiguous, unstructured problems can be challenging; we provide clear prioritisation frameworks, structured incident response processes, and a supportive team to help deconstruct complex issues.
  2. Sensory overload from a busy incident 'swarm' or open-plan office can be an issue; we offer quiet zones, noise-cancelling headphones, and encourage asynchronous communication where appropriate to reduce constant interruptions.
  3. Social interactions, particularly with frustrated users, can be draining; we provide training on empathetic communication and clear guidelines for managing challenging conversations, and you'll have managers who can step in when needed.

Sensory Considerations

Our office environment is typically open-plan, which can sometimes be a bit noisy, especially during busy periods or incident calls. However, we do have dedicated quiet zones and meeting rooms you can use for focused work. Visually, our dashboards and tools are generally well-designed, but you'll be looking at a lot of text (logs, code) throughout the day. Socially, it's a collaborative team, with daily stand-ups and regular 'ticket swarms', so expect a fair amount of interaction, but we also respect focused individual work time.

Flexibility Notes

We believe in flexibility where it makes sense. We're open to discussing adjusted working hours, remote work options (though some on-call presence is required), and personalised workspace setups to ensure you're comfortable and productive. We're more interested in your ability to solve problems and contribute to the team than strict adherence to a traditional setup.

Key Responsibilities

Experience Levels Responsibilities

Level: Mid-Level Professional (2-5 years)

Responsibilities:

  1. Independently pick up and resolve L1 and L2 cloud support tickets from our queue, from initial diagnosis to final resolution. That means you'll own the problem end-to-end, usually without much hand-holding.
  2. Troubleshoot common issues across AWS, Azure, and GCP services, like 'why can't this EC2 instance connect to that database?' or 'why is this S3 bucket access denied?' You'll use monitoring tools and cloud consoles to dig into the problem.
  3. Contribute to our internal knowledge base (Confluence) by writing clear, step-by-step runbooks for recurring issues. If you fix it, document it, so the next person (or future you) doesn't have to start from scratch.
  4. Participate in incident response for P1 and P2 events, helping to diagnose the issue, communicate updates, and restore service. You'll be part of the 'ticket swarm' when things get really hairy.
  5. Identify patterns in recurring issues and propose solutions or preventative measures to your manager or the wider engineering team. Don't just fix it; think about how to stop it happening again.
  6. Provide informal guidance and support to newer team members or Associate Cloud Support Technicians. You'll be the person they come to when they're a bit stuck on a routine problem.
  7. Keep our monitoring dashboards (Grafana, Datadog) in check, making sure alerts are firing correctly and that you can quickly pull relevant metrics when troubleshooting. If an alert is 'flapping', you'll investigate why.

Supervision: You'll have weekly check-ins with your Cloud Support Manager to discuss your workload, any blockers, and your development. For routine tasks, you're expected to work independently, but for novel or complex issues, you'll consult with your manager or a Senior Analyst.

Decision: You can make routine technical decisions within established guidelines, like choosing the best troubleshooting steps or applying known fixes. Any changes that impact production systems, require significant resource allocation (e.g., spinning up new, expensive VMs), or deviate from standard operating procedures need manager approval. You'll escalate anything outside your comfort zone or defined scope.

Success: You're doing well if you consistently meet your TTFR and SLA targets, your CSAT scores are high, and you're actively contributing to our knowledge base. We'll also be looking at your ability to independently resolve issues and your proactive approach to identifying and suggesting improvements.
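
The 'flapping' alert mentioned in the responsibilities above is one that keeps switching between firing and resolved. As a rough sketch of one way to spot it, the function below counts state changes inside a time window. The function name, the event format, and the default thresholds (4 changes in 10 minutes) are assumptions for illustration, not how Grafana or Datadog define flapping.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def is_flapping(
    events: List[Tuple[datetime, str]],
    window: timedelta = timedelta(minutes=10),
    max_transitions: int = 4,
) -> bool:
    """events: time-ordered (timestamp, state) pairs, e.g. "firing" / "resolved".

    Returns True when at least `max_transitions` state changes fall
    inside a single `window`. Assumes max_transitions >= 1.
    """
    # Keep only the timestamps at which the state actually changed.
    changes = [
        ts
        for (ts, state), (_, prev_state) in zip(events[1:], events[:-1])
        if state != prev_state
    ]
    # Slide over the changes, checking each run of max_transitions of them.
    for i in range(len(changes) - max_transitions + 1):
        if changes[i + max_transitions - 1] - changes[i] <= window:
            return True
    return False


start = datetime(2024, 1, 8, 3, 0)

# Alert toggles every minute: five state changes within five minutes.
noisy = [
    (start + timedelta(minutes=i), "firing" if i % 2 == 0 else "resolved")
    for i in range(6)
]

# Alert fires once and resolves an hour later: a single state change.
steady = [(start, "firing"), (start + timedelta(hours=1), "resolved")]

print(is_flapping(noisy))   # True
print(is_flapping(steady))  # False
```

A check like this only tells you that an alert is flapping; working out why (a threshold set too close to normal load, for example) is still the investigation described above.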

Decision-Making Authority

Save 10-15 hours weekly with AI-powered Cloud Support tools

Imagine having a super-smart assistant that helps you cut through the noise, diagnose problems faster, and even draft your incident reports. That's what AI is bringing to cloud support, and we're embracing it. We're not talking about replacing you; we're talking about making your job less about the 'toil' and more about solving the really interesting, complex problems.

Tool: Automated Ticket Triage & Routing

Benefit: Imagine AI scanning incoming tickets, understanding 'EC2 instance down' or 'can't access S3 bucket', and then automatically setting the priority, categorising it (Compute, Storage, IAM), and assigning it to the right queue or even a specific person. This means less time wasted on manual sorting and more time actually fixing things.
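
To give a feel for the triage step described above, here is a deliberately simple keyword-rule stand-in. It is not a trained model and not the tool we use; the keyword lists, the `triage` function, and the default P3 priority are all assumptions made for illustration. Only the category names (Compute, Storage, IAM) and the two sample tickets come from the description above.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword lists; a real triage tool would learn these from past tickets.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "Compute": ["ec2", "instance", "vm"],
    "Storage": ["s3", "bucket", "blob"],
    "IAM": ["access denied", "permission", "role"],
}

PRIORITY_KEYWORDS: Dict[str, List[str]] = {
    "P1": ["outage", "down"],
    "P2": ["slow", "degraded"],
}


def first_match(text: str, rules: Dict[str, List[str]], default: str) -> str:
    """Return the first label whose keywords appear in text, else default."""
    for label, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return label
    return default


def triage(ticket_text: str) -> Tuple[str, str]:
    """Return (category, priority) for a ticket using naive substring matching."""
    lowered = ticket_text.lower()
    category = first_match(lowered, CATEGORY_KEYWORDS, "Unclassified")
    priority = first_match(lowered, PRIORITY_KEYWORDS, "P3")  # P3 default is assumed
    return category, priority


print(triage("EC2 instance down"))        # ('Compute', 'P1')
print(triage("can't access S3 bucket"))   # ('Storage', 'P3')
```

Substring matching is crude ('down' also matches 'download', for instance), which is exactly the sort of gap a language-aware model is meant to close.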

Tool: Log Anomaly Detection & Correlation

Benefit: Our monitoring tools, powered by AI, can now sift through millions of log lines and metrics in real-time. They'll spot unusual patterns that often happen just before a major incident, like a sudden spike in CPU correlated with a specific error message. This helps you find the 'signal in the noise' much faster, often preventing outages before they even fully manifest.
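
As a small illustration of the 'unusual pattern' idea above, the sketch below flags a spike in per-minute error counts by comparing each value with a trailing baseline. It covers spike detection only, not correlation with specific error messages, and it is a basic z-score check rather than what our monitoring tools actually run. The function name, the baseline length, the threshold, and the floor of 1.0 on the standard deviation are all assumptions.

```python
from statistics import mean, stdev
from typing import List


def spike_indices(
    counts: List[int], baseline: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indices where the count sits more than `threshold` standard
    deviations above the mean of the previous `baseline` values."""
    spikes = []
    for i in range(baseline, len(counts)):
        window = counts[i - baseline:i]
        mu = mean(window)
        # Floor the deviation at 1.0 (an assumed value) so a perfectly flat
        # baseline does not cause a division by zero.
        sigma = max(stdev(window), 1.0)
        if (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Errors per minute: quiet, then a sudden burst at index 5.
errors_per_minute = [2, 3, 2, 4, 3, 40, 3]

print(spike_indices(errors_per_minute))  # [5]
```

Because the baseline is trailing, the burst itself inflates the next window, so the value after the spike is not flagged.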

Tool: Intelligent Runbook & Knowledge Search

Benefit: Picture an internal AI agent that can search across all our Confluence documentation, every past Jira ticket, and even relevant Slack conversations. You type in an error message, and it instantly gives you the most relevant troubleshooting steps, even suggesting similar past incidents. No more endless searching for that one obscure article.

Tool: First-Draft RCA & Incident Summaries

Benefit: After a major incident, AI can take all the raw data – the incident timeline from Jira, alert data from Datadog, key decisions from Slack – and generate a structured first draft of a Root Cause Analysis (RCA) or a clear, stakeholder-facing summary. This frees you up from the tedious documentation, letting you focus on the preventative actions.

Weekly time savings potential: 10-15 hours per week
Typical tool investment: £20-50/month (per user, for advanced AI tools)

Competency Requirements

Foundation Skills (Transferable)

These are the bedrock skills that let you do the job well, no matter how technical it gets. They're about how you think, how you talk, and how you deal with the unexpected. You'll need to be sharp, clear, and able to roll with the punches.

Functional Skills (Role-Specific Technical)

These are the specific technical skills and knowledge you'll need to hit the ground running in our cloud environments. It's a mix of hands-on tool use, understanding cloud concepts, and knowing the frameworks that keep us organised.

Technical Competencies

Digital Tools

Industry Knowledge

Regulatory Compliance Regulations

Essential Prerequisites

Career Pathway Context

These prerequisites mean you're not starting from zero. You've got some miles on the clock and understand the basics of cloud support. This role builds on that foundation, pushing you to take more ownership and tackle more complex problems. If you're coming from a traditional IT support role, you'll need to demonstrate your cloud-specific experience. If you're fresh out of a bootcamp, you might find our Associate Cloud Support Technician role (L1) a better fit to build up this experience first.

Qualifications & Credentials

Emerging Foundation Skills

Advancing Technical Skills

Future Skills Closing Note

The goal here isn't to turn you into a full-blown Cloud Architect overnight. It's about equipping you with the skills to tackle more complex support challenges, contribute to preventative work, and understand the bigger picture of our cloud operations. These skills will open doors to more senior roles, whether you stay in support or move into engineering.

Education Requirements

Experience Requirements

You'll need roughly 2-5 years of hands-on experience in a dedicated cloud support, DevOps support, or technical operations role. This means you've spent time actively troubleshooting issues in AWS, Azure, or GCP environments, not just using them at a basic level. We're looking for someone who has moved beyond just following a script and can independently diagnose and resolve common cloud infrastructure problems. Experience participating in incident response for P1/P2 events is a big plus.

Preferred Certifications

Recommended Activities

Career Progression Pathways

Entry Paths to This Role

Career Progression From This Role

Long Term Vision Potential Roles

Sector Mobility

The skills you'll gain as a Cloud Support Analyst are highly transferable across almost any industry that uses cloud computing. Whether you want to stay in tech or move into finance, healthcare, or e-commerce, your expertise in AWS, Azure, or GCP will be in high demand. Cloud is everywhere, and your opportunities will be too.

How Zavmo Delivers This Role's Development

DISCOVER Phase: Skills Gap Analysis

Zavmo maps your current competencies against all requirements in this job description through conversational assessment. We evaluate your foundation skills (communication, strategic thinking), functional skills (cloud troubleshooting, incident response), and readiness for career progression.

Output: Personalised skills gap heat map showing strengths and priorities, estimated time to competency, neurodiversity accommodations.

DISCUSS Phase: Personalised Learning Pathway

Based on your DISCOVER results, Zavmo creates a personalised learning plan prioritised by impact: foundation skills first, then functional skills. We adapt to your learning style, pace, and neurodiversity needs (ADHD, dyslexia, autism).

Output: Week-by-week schedule, each module linked to specific job responsibilities, checkpoints and milestones.

DELIVER Phase: Conversational Learning

Learn through conversation, not boring modules. Zavmo uses 10 conversation types (Socratic dialogue, role-play, coaching, case studies) to build competence. Practise giving incident updates under pressure, explaining an outage to a non-technical stakeholder, and handling frustrated users in a safe AI environment before facing the real thing.

Example: "For 'Stakeholder Mapping', Zavmo will guide you through analysing a complex enterprise account, identifying key decision-makers, and building an engagement strategy."

DEMONSTRATE Phase: Competency Assessment

Zavmo automatically builds your evidence portfolio as you learn. Every conversation, practice scenario, and application example is captured and mapped to NOS performance criteria. When ready, your portfolio supports OFQUAL qualification claims and demonstrates competence to employers.

Output: Competency matrix, evidence portfolio (downloadable), qualification readiness, career progression score.
