Cloud Trace: ACE Exam Study Guide

Cloud Trace

1. Overview

Cloud Trace is a managed distributed tracing service that collects latency data from your applications and visualizes it in the Google Cloud Console.

Primary Purpose: Understand application performance and identify latency bottlenecks in microservices architectures.

How it Works: Tracks how a single request travels through various services (frontend, backend, database) and records the time taken at each step.

2. Key Concepts

Concept          Description
Trace            Complete end-to-end path of a single request through your system
Span             Single operation within a trace (e.g., RPC call, database query) with start/end timestamps
Root Span        First span in a trace, representing the initial request
Trace ID         Unique identifier propagated between services via HTTP headers
Latency Profile  Waterfall chart showing where time was spent
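
To make these terms concrete, here is a minimal sketch of a trace as a list of spans, with a text waterfall drawn relative to the root span. The `Span` class, its field names, and the timings are illustrative assumptions, not the Cloud Trace API schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One timed operation inside a trace (hypothetical model for study purposes)."""
    span_id: str
    name: str
    start_ms: int
    end_ms: int
    parent_id: Optional[str] = None  # None marks the root span

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def root_span(spans: List[Span]) -> Span:
    """Return the span with no parent, i.e. the initial request."""
    return next(s for s in spans if s.parent_id is None)


def waterfall(spans: List[Span], width: int = 40) -> List[str]:
    """Render one text row per span, offset and scaled against the root span."""
    root = root_span(spans)
    scale = width / root.duration_ms
    rows = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        offset = round((s.start_ms - root.start_ms) * scale)
        length = max(1, round(s.duration_ms * scale))
        rows.append(f"{s.name:<10}{' ' * offset}{'#' * length} {s.duration_ms} ms")
    return rows


# Made-up timings for a single request: one trace, three spans.
trace_spans = [
    Span("1", "frontend", 0, 200),
    Span("2", "backend", 20, 180, parent_id="1"),
    Span("3", "database", 60, 160, parent_id="2"),
]

for row in waterfall(trace_spans):
    print(row)
```

Each printed row is one span. The indentation shows when the span started relative to the root, and the bar length shows how long it ran.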

3. Service Integration

Auto-Instrumented (No Setup Required)

  • App Engine (Standard and Flexible)
  • Cloud Run (basic tracing enabled by default)
  • Cloud Functions (basic tracing enabled by default)

Manual Instrumentation Required

  • Compute Engine VMs
  • GKE clusters
  • Internal Load Balancers (configurable)

Recommended SDK: OpenTelemetry. It exports spans to the Cloud Trace API and is vendor-neutral, so the same instrumentation also works in multi-cloud setups (AWS, Azure).

4. Trace Context Propagation

When a request crosses service boundaries, the trace context must be propagated:

  • Header: X-Cloud-Trace-Context
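  • Format: TRACE_ID/SPAN_ID;o=TRACE_TRUE, where TRACE_ID is a 32-character hexadecimal value, SPAN_ID is the decimal ID of the current span, and TRACE_TRUE is 1 to trace the request or 0 to skip it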
  • The receiving service continues the trace instead of starting a new one
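
The propagation steps above can be sketched as a small parser and formatter for the header value. The helper names (`TraceContext`, `parse_trace_header`, `format_trace_header`) are hypothetical. The field shapes are assumptions: a 32-character hex trace ID, a decimal span ID, and an optional `o=` flag. In practice a tracing SDK such as OpenTelemetry normally handles this for you.

```python
import re
from typing import NamedTuple, Optional


class TraceContext(NamedTuple):
    trace_id: str   # 32 hex characters, shared by every span in the trace
    span_id: int    # decimal ID of the caller's span
    sampled: bool   # True when the o= flag is 1


# Assumed shape: TRACE_ID/SPAN_ID with an optional ;o=0 or ;o=1 suffix.
_HEADER_RE = re.compile(r"^([0-9a-fA-F]{32})/(\d+)(?:;o=([01]))?$")


def parse_trace_header(value: str) -> Optional[TraceContext]:
    """Parse an X-Cloud-Trace-Context value; return None if it is malformed."""
    match = _HEADER_RE.match(value.strip())
    if match is None:
        return None
    trace_id, span_id, flag = match.groups()
    return TraceContext(trace_id.lower(), int(span_id), flag == "1")


def format_trace_header(ctx: TraceContext, new_span_id: int) -> str:
    """Build the header for an outgoing call: same trace ID, new parent span ID."""
    return f"{ctx.trace_id}/{new_span_id};o={1 if ctx.sampled else 0}"


incoming = "105445aa7843bc8bf206b12000100000/1;o=1"
ctx = parse_trace_header(incoming)
if ctx is not None:
    # The downstream request continues the same trace under a new span.
    print(format_trace_header(ctx, new_span_id=42))
```

The key point for the exam is that the trace ID stays the same across every hop. Only the span ID changes as each service adds its own span.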

5. Features and Analysis

  • Trace Explorer: Search and visualize individual traces. Filter by URI, latency, or status code.
  • Analysis Reports: Periodic reports comparing performance across versions or time periods.
  • Bottleneck Detection: Identifies which operation causes the most delay.
  • Waterfall Charts: Displays sequence and duration of spans.
  • Screenshots: Capture trace views for documentation.
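
As a rough illustration of bottleneck detection, the sketch below computes each span's "self time" (its duration minus the time spent in its direct children) and reports the largest. This is a simplified heuristic written for this guide, not Cloud Trace's actual analysis. It assumes sibling spans do not overlap and that span names are unique. The span data is made up.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (span_id, parent_id, name, start_ms, end_ms); illustrative values only
SpanRow = Tuple[str, Optional[str], str, int, int]


def self_times(spans: List[SpanRow]) -> Dict[str, int]:
    """Map span name -> duration minus the total duration of its direct children."""
    child_total: Dict[str, int] = defaultdict(int)
    for _span_id, parent_id, _name, start, end in spans:
        if parent_id is not None:
            child_total[parent_id] += end - start
    return {
        name: (end - start) - child_total[span_id]
        for span_id, _parent_id, name, start, end in spans
    }


def bottleneck(spans: List[SpanRow]) -> str:
    """Return the name of the span with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


spans = [
    ("a", None, "checkout", 0, 500),
    ("b", "a", "auth", 10, 60),
    ("c", "a", "inventory", 70, 420),
    ("d", "c", "db-query", 100, 400),
]

print(self_times(spans))
print("bottleneck:", bottleneck(spans))
```

Looking at self time rather than total duration avoids blaming a parent span for time that was actually spent in one of its children.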

6. Retention and Limits

Setting         Value
Data retention  30 days (default) / 90 days (extended)
Free tier       10 traces/second
Sampling rate   Configurable to control costs
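
To show how a sampling rate limits trace volume, here is a minimal sketch of a deterministic ratio sampler that decides from the trace ID alone. The function `should_sample` is a hypothetical helper in the spirit of ratio-based samplers found in tracing SDKs. It is not how Cloud Trace itself samples.

```python
import random


def should_sample(trace_id: str, rate: float) -> bool:
    """Keep roughly `rate` of all traces, using only the hex trace ID."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Treat the low 64 bits of the trace ID as a uniformly distributed number.
    bucket = int(trace_id[-16:], 16)
    return bucket < rate * 2**64


# Generate made-up 128-bit trace IDs and count how many a 10% rate keeps.
rng = random.Random(0)
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, every service that sees the same ID reaches the same verdict, so a trace is either recorded in full or not at all.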

7. Cloud Trace vs Other Cloud Operations Tools

Service           Question Answered                Data Type
Cloud Logging     “What happened?”                 Text events, logs
Cloud Monitoring  “How is the system performing?”  Numerical metrics
Cloud Trace       “Where is the delay?”            Latency across services
Cloud Profiler    “Which code causes latency?”     CPU/memory within a service

Key Distinction:

  • Trace = Latency between services (request flow)
  • Profiler = Latency within a service (code-level)

8. When to Use Cloud Trace

Use Cloud Trace when:

  • Troubleshooting latency across microservices
  • Identifying which service in a chain is slowing down requests
  • Comparing performance between deployments
  • Monitoring request latency in production with distributed tracing

Do NOT use Cloud Trace when:

  • You need code-level analysis of a single monolithic application (use Cloud Profiler instead)
  • You need real-time alerting (use Cloud Monitoring)
  • You need log analysis (use Cloud Logging)

9. Security and IAM

Role                    Permission
roles/cloudtrace.admin  Full control over Cloud Trace resources
roles/cloudtrace.agent  Send trace data to the API (for application service accounts)
roles/cloudtrace.user   View trace data and reports in the console

10. Essential gcloud Commands

  • Check API Status

    gcloud services list --enabled | grep cloudtrace
    
  • List trace sinks (alpha)

    gcloud alpha trace sinks list --project=[PROJECT_ID]
    

11. Quick Reference Summary

Feature               Value
Trace                 Complete request path through services
Span                  Single operation with timestamps
Propagation header    X-Cloud-Trace-Context
Auto-instrumented     App Engine, Cloud Run, Cloud Functions
Manual setup needed   Compute Engine, GKE
Recommended SDK       OpenTelemetry
Data retention        30 days (default)
Answers the question  “Where is the delay?”

12. Comparison Diagram

Cloud Trace vs Cloud Logging vs Cloud Monitoring

                          ┌──────────────────────────────────┐
                          │        Cloud Operations Suite    │
                          │   (Observability Stack in GCP)   │
                          └──────────────────────────────────┘
                                           │
       ┌───────────────────────────────────┼───────────────────────────────────┐
       │                                   │                                   │
       ▼                                   ▼                                   ▼
┌──────────────────────────┐  ┌──────────────────────────┐  ┌──────────────────────────┐
│      Cloud Logging       │  │     Cloud Monitoring     │  │        Cloud Trace       │
└──────────────────────────┘  └──────────────────────────┘  └──────────────────────────┘
│ What it captures:        │  │ What it captures:        │  │ What it captures:        │
│ • Text logs              │  │ • Metrics (CPU, RAM,     │  │ • Latency of requests    │
│ • Structured logs        │  │   QPS, errors, custom)   │  │ • Request flow across    │
│ • Application events     │  │ • SLOs, SLIs, alerts     │  │   microservices          │
│ • Error messages         │  │ • Dashboards             │  │ • Spans & trace IDs      │
└──────────────────────────┘  └──────────────────────────┘  └──────────────────────────┘
│ Answers the question:    │  │ Answers the question:    │  │ Answers the question:    │
│ “What happened?”         │  │ “How is the system       │  │ “Where is the delay?”    │
│                          │  │ performing?”             │  │                          │
└──────────────────────────┘  └──────────────────────────┘  └──────────────────────────┘
│ Typical use cases:       │  │ Typical use cases:       │  │ Typical use cases:       │
│ • Debugging errors       │  │ • Alerting on high CPU   │  │ • Troubleshooting slow   │
│ • Viewing logs per       │  │ • Monitoring uptime      │  │   requests               │
│   service or request     │  │ • SLO compliance         │  │ • Identifying bottleneck │
│ • Log-based metrics      │  │ • Trend analysis         │  │   microservices          │
└──────────────────────────┘  └──────────────────────────┘  └──────────────────────────┘
                                             │
                         ┌───────────────────┼───────────────────┐
                         │                   │                   │
                         ▼                   ▼                   ▼
                   ┌──────────────────────────────────────────────────┐
                   │   Combined View: Observability Workflow in GCP   │
                   └──────────────────────────────────────────────────┘
                   │ Logs show **what happened**                      │
                   │ Metrics show **system health**                   │
                   │ Traces show **where latency occurs**             │
                   └──────────────────────────────────────────────────┘