Metric Collection Strategies¶

Overview¶

This document explains the strategies used to collect metrics efficiently while respecting API rate limits and data freshness requirements.

Update Tiers¶

FAST Tier (60 seconds)¶

Purpose: Real-time or near real-time metrics that change frequently.

Metrics:
- MT sensor readings (temperature, humidity, CO2, etc.)
- Environmental conditions that need quick alerting

Strategy:

@register_collector(UpdateTier.FAST)
class MTSensorCollector(MetricCollector):
    """Collects frequently-changing sensor data."""

Rationale:
- Sensor data can trigger alerts (e.g., temperature threshold)
- 60s aligns with typical monitoring dashboards
- API can handle this frequency for sensor endpoints

MEDIUM Tier (300 seconds / 5 minutes)¶

Purpose: Operational metrics aligned with Meraki's data aggregation.

Metrics:
- Device status and availability
- Client counts and usage
- Network health metrics
- Wireless performance data
- Alert states

Strategy:

@register_collector(UpdateTier.MEDIUM)
class DeviceCollector(MetricCollector):
    """Collects device operational metrics."""

Rationale:
- Meraki aggregates data in 5-minute windows
- Balances freshness with API efficiency
- Most operational decisions work with 5-minute granularity

SLOW Tier (900 seconds / 15 minutes)¶

Purpose: Configuration and slowly-changing administrative data.

Metrics:
- License information
- Configuration change tracking
- API usage statistics
- Organization settings

Strategy:

@register_collector(UpdateTier.SLOW)
class ConfigCollector(MetricCollector):
    """Collects configuration metrics."""

Rationale:
- Configuration rarely changes
- Reduces unnecessary API calls
- Still fresh enough for compliance monitoring
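
The tier decorators shown above can be pictured as a small registry keyed by collection interval. The following is a minimal sketch, not the project's actual implementation: the enum values mirror the intervals documented for each tier, while the `_REGISTRY` dictionary, the body of `register_collector`, and the `collectors_for` helper are assumptions made for illustration.

```python
from collections import defaultdict
from enum import Enum


class UpdateTier(Enum):
    """Update tiers; each value is the collection interval in seconds."""

    FAST = 60
    MEDIUM = 300
    SLOW = 900


# Hypothetical registry: tier -> collector classes registered for it.
_REGISTRY = defaultdict(list)


def register_collector(tier: UpdateTier):
    """Class decorator that files a collector under the given tier."""

    def decorator(cls: type) -> type:
        _REGISTRY[tier].append(cls)
        return cls

    return decorator


@register_collector(UpdateTier.FAST)
class ExampleSensorCollector:
    """Stand-in collector used only to show registration."""


def collectors_for(tier: UpdateTier) -> list:
    """Return the collector classes a scheduler would run for this tier."""
    return list(_REGISTRY[tier])
```

A scheduler built on this idea would loop over `collectors_for(tier)` once every `tier.value` seconds.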

Collection Patterns¶

1. Batch Collection Pattern¶

Used when collecting metrics for multiple items of the same type.

import asyncio

async def _collect_impl(self) -> None:
    """Batch collection example."""
    # Gather devices across all organizations
    organizations = await self._fetch_organizations()
    devices = []
    for org in organizations:
        devices.extend(await self._fetch_devices(org["id"]))

    batch_size = self.settings.api.batch_size

    # Process in batches to avoid overwhelming the API
    for i in range(0, len(devices), batch_size):
        batch = devices[i:i + batch_size]

        # Process batch concurrently
        async with ManagedTaskGroup("device_batch") as group:
            for device in batch:
                await group.create_task(
                    self._collect_device_metrics(device)
                )

        # Delay between batches (skipped after the last one)
        if i + batch_size < len(devices):
            await asyncio.sleep(self.settings.api.batch_delay)
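
The batching idea can be tried outside the exporter with plain `asyncio`. This sketch swaps the project's `ManagedTaskGroup` for `asyncio.gather`; the helper name `collect_in_batches` and its parameters are hypothetical and chosen only for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def collect_in_batches(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    batch_size: int = 10,
    batch_delay: float = 0.5,
) -> list:
    """Run worker over items, batch_size at a time, pausing between batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]

        # Items within one batch run concurrently; gather keeps input order.
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))

        # Pause only if another batch follows.
        if start + batch_size < len(items):
            await asyncio.sleep(batch_delay)
    return results
```

With 25 items and `batch_size=10`, this runs three batches and sleeps twice, so the delays add `2 * batch_delay` seconds to the collection time.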

2. Hierarchical Collection Pattern¶

Used when data has parent-child relationships.

async def _collect_impl(self) -> None:
    """Hierarchical collection example."""
    # Level 1: Organizations
    organizations = await self._fetch_organizations()
    for org in organizations:
        # Level 2: Networks
        networks = await self._fetch_networks(org["id"])

        for network in networks:
            # Level 3: Devices
            devices = await self._fetch_devices(network["id"])

            # Collect metrics at appropriate level
            self._set_network_metrics(network, len(devices))
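
To make the traversal concrete, here is a minimal sketch that walks the same three levels over in-memory data. The fetch functions, the sample identifiers, and the `device_counts` helper are hypothetical stand-ins for real API calls.

```python
import asyncio

# Hypothetical in-memory stand-ins for API responses.
NETWORKS = {"org1": [{"id": "N_1"}, {"id": "N_2"}]}
DEVICES = {"N_1": [{"serial": "A"}, {"serial": "B"}], "N_2": [{"serial": "C"}]}


async def fetch_networks(org_id: str) -> list:
    return NETWORKS.get(org_id, [])


async def fetch_devices(network_id: str) -> list:
    return DEVICES.get(network_id, [])


async def device_counts(org_ids: list) -> dict:
    """Walk org -> network -> device and count devices per network."""
    counts = {}
    for org_id in org_ids:
        per_network = {}
        for network in await fetch_networks(org_id):
            devices = await fetch_devices(network["id"])
            per_network[network["id"]] = len(devices)
        counts[org_id] = per_network
    return counts
```

Because the per-network counts are keyed by organization, an organization-level total is a sum over the inner dictionary, which is how metrics aggregate up the hierarchy.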

3. Aggregation Pattern¶

Used when the API provides pre-aggregated data.

async def collect_client_overview(self, org_id: str) -> None:
    """Use pre-aggregated data from API."""
    # API returns aggregated client counts
    overview = await self.api.organizations.getOrganizationClientsOverview(
        org_id,
        timespan=300  # Last 5 minutes
    )

    # Direct mapping to metrics
    self._clients_count.labels(
        org_id=org_id,
        client_type="wireless"
    ).set(overview["counts"]["wireless"])

4. Time-Series Collection Pattern¶

Used for historical data with time windows.

async def collect_usage_history(self, serial: str) -> None:
    """Collect time-series data."""
    # Get last 5 minutes of data
    usage = await self.api.devices.getDeviceUsageHistory(
        serial,
        timespan=300
    )

    # Process latest data point
    if usage:
        latest = usage[-1]  # Most recent, assuming ascending time order
        self._set_usage_metrics(serial, latest)
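
Taking the last element yields the most recent point only if the API returns points in ascending time order. When that ordering is not guaranteed, selecting by timestamp is safer. A minimal sketch, assuming each point carries an ISO 8601 timestamp under a key named `ts` (the key name and the helper names are hypothetical):

```python
from datetime import datetime
from typing import Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_point(points: list, ts_key: str = "ts") -> Optional[dict]:
    """Return the point with the newest timestamp, or None if there are none."""
    if not points:
        return None
    return max(points, key=lambda point: parse_ts(point[ts_key]))
```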

Optimization Strategies¶

1. Caching for Slowly-Changing Data¶

import time

class DeviceCollector:
    CACHE_TTL = 300  # Cache for 5 minutes

    def __init__(self) -> None:
        # Per-organization cache: org_id -> (fetch time, devices)
        self._device_cache: dict[str, tuple[float, list[Device]]] = {}

    async def _get_devices(self, org_id: str) -> list[Device]:
        cached = self._device_cache.get(org_id)
        if cached and time.monotonic() - cached[0] < self.CACHE_TTL:
            return cached[1]

        devices = await self._fetch_devices(org_id)
        self._device_cache[org_id] = (time.monotonic(), devices)
        return devices
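
The expiry logic is easier to test when the clock is injected instead of read directly. The following is a minimal sketch of a per-key cache with a time-to-live; `TTLCache` and its `clock` parameter are hypothetical and not part of the exporter.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny per-key cache whose entries expire after ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # drop the stale entry
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        """Store value under key, stamped with the current clock reading."""
        self._entries[key] = (self._clock(), value)
```

Keying entries by organization ID keeps one organization's device list from being served for another.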

2. Conditional Collection¶

Skip collection when the data is unlikely to have changed since the last check:

async def collect_licenses(self, org_id: str) -> None:
    """Only collect if sufficient time has passed."""
    last_check = self._last_license_check.get(org_id, 0)

    # Skip if checked recently (within 1 hour)
    if time.time() - last_check < 3600:
        logger.debug("Skipping license check", org_id=org_id)
        return

    # Proceed with collection
    licenses = await self._fetch_licenses(org_id)
    self._last_license_check[org_id] = time.time()

3. Partial Failure Handling¶

Continue collection even if some items fail:

async def collect_all_devices(self, devices: list[dict]) -> None:
    """Collect with partial failure tolerance."""
    success_count = 0
    error_count = 0

    for device in devices:
        try:
            await self._collect_device_metrics(device)
            success_count += 1
        except Exception as e:
            error_count += 1
            logger.warning(
                "Failed to collect device metrics",
                serial=device["serial"],
                error=str(e)
            )

    logger.info(
        "Device collection complete",
        success=success_count,
        errors=error_count
    )
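
The same tolerance can be kept when devices are collected concurrently. This sketch relies on `asyncio.gather(..., return_exceptions=True)`, which returns raised exceptions as values instead of propagating them; the `collect_tolerant` helper and its return shape are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def collect_tolerant(
    serials: list,
    worker: Callable[[str], Awaitable[None]],
) -> tuple:
    """Run worker for every serial; return (success count, errors by serial)."""
    outcomes = await asyncio.gather(
        *(worker(serial) for serial in serials),
        return_exceptions=True,  # failures come back as values, not raises
    )
    errors = {
        serial: str(outcome)
        for serial, outcome in zip(serials, outcomes)
        if isinstance(outcome, BaseException)
    }
    return len(serials) - len(errors), errors
```

The returned counts map directly onto the `success` and `errors` fields of a collection summary log line.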

Choosing the Right Strategy¶

Use FAST Tier When:¶

Data changes more often than every 5 minutes
Real-time alerting is needed
API endpoint supports high frequency

Use MEDIUM Tier When:¶

Data aligns with 5-minute aggregation
Operational monitoring use case
Balance between freshness and efficiency

Use SLOW Tier When:¶

Data rarely changes
Configuration or administrative data
API calls are expensive

Use Batch Collection When:¶

Many similar items to process
Independent operations
Need to manage API rate limits

Use Hierarchical Collection When:¶

Data has natural parent-child relationships
Need organizational context
Metrics aggregate up the hierarchy

Performance Considerations¶

API Rate Limits: Meraki allows 10 requests/second per org
Memory Usage: Large batches consume more memory
Timeout Risk: Long-running collections may time out
Error Propagation: Partial failures shouldn't stop all collection
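
One way to stay under a per-organization request limit is to hand out evenly spaced start slots. The following is a minimal sketch; `RateLimiter` is a hypothetical helper, not part of the Meraki SDK or the exporter, and it does not model any burst allowance the API may grant.

```python
import asyncio
import time
from typing import Awaitable, Callable


class RateLimiter:
    """Hand out start slots spaced 1/rate_per_sec seconds apart."""

    def __init__(
        self,
        rate_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        self._interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._next_slot = 0.0  # earliest time the next call may start

    async def acquire(self) -> None:
        """Reserve the next free slot and wait until it arrives."""
        now = self._clock()
        slot = max(now, self._next_slot)
        # Reserve before sleeping so concurrent callers get later slots.
        self._next_slot = slot + self._interval
        if slot > now:
            await self._sleep(slot - now)
```

A collector would create one limiter per organization, for example `RateLimiter(10)`, and call `await limiter.acquire()` before each API request.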

Best Practices¶

Always use error handling decorators
Log collection summaries at INFO level
Track API calls for rate limit monitoring
Validate API responses before processing
Use appropriate batch sizes (10-50 items)
Add delays between batches
Consider caching for expensive operations
Monitor collector performance metrics
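
As an illustration of the first practice, an error-handling decorator for async collector steps might look like the sketch below. This is not the project's actual decorator: the name `with_error_handling`, its parameters, and the sample function are assumptions made for illustration.

```python
import asyncio
import functools
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)


def with_error_handling(operation: str, default: Any = None):
    """Decorator: log failures of an async collector step and return default."""

    def decorator(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return await func(*args, **kwargs)
            except Exception:
                # Log with traceback, then continue with a safe default.
                logger.exception("%s failed", operation)
                return default

        return wrapper

    return decorator


@with_error_handling("fetch devices", default=[])
async def fetch_devices_or_empty(fail: bool) -> list:
    """Hypothetical step that either fails or returns one sample device."""
    if fail:
        raise RuntimeError("API error")
    return ["device-1"]
```

Returning a default instead of raising lets the rest of a collection cycle continue, which matches the partial failure guidance above.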