<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>danhenderson.cloud</title><description>Enterprise Architecture &amp; Technology Blog</description><link>https://danhenderson.cloud/</link><item><title>Natural Language Analytics on Google Cloud: BigQuery, Cloud Run, and Gemini Flash</title><link>https://danhenderson.cloud/blog/bigquery-cloud-run-gemini-natural-language-analytics/</link><guid isPermaLink="true">https://danhenderson.cloud/blog/bigquery-cloud-run-gemini-natural-language-analytics/</guid><description>A hands-on guide to building a production natural language analytics API on Google Cloud — BigQuery, Cloud Run, Terraform IaC, Workload Identity, and Gemini Flash for NL-to-SQL translation.</description><pubDate>Sat, 06 Jun 2026 23:00:00 GMT</pubDate><content:encoded>Most data teams spend more time writing SQL than analysing results. Large language models have made NL-to-SQL viable enough to build services around — the interesting problem is no longer whether it works, it is how to build it so you can operate, secure, and extend it.

This post walks through a natural language analytics API on Google Cloud: a FastAPI service on Cloud Run that accepts a plain-English question, translates it to BigQuery SQL using Gemini Flash, runs a dry-run cost gate before executing, and returns structured JSON. All infrastructure in Terraform. Identity through Workload Identity. Zero idle cost.

Four constraints shaped every decision: no plaintext secrets in code or containers, all infrastructure changes through Terraform, IAM bindings as narrow as possible, and zero cost when idle.

---

## The Analytics API

![End-to-end architecture — NL question enters Cloud Run, Gemini Flash translates it to SQL, BigQuery validates and executes it, structured JSON returns](./nl-analytics-architecture.svg)

The service exposes a single `/query` POST endpoint. A question arrives, Gemini Flash translates it to SQL using a system instruction containing the full table schema, BigQuery runs a dry-run to validate syntax and price the query before executing, and the result returns as JSON with the generated SQL included.

The target dataset is `bigquery-public-data.thelook_ecommerce` — a public Google-maintained e-commerce dataset with orders, products, users, and order items. Because it is public, no dataset-level IAM permissions are needed on the service account.

---

## Translating Questions to SQL

The quality of generated SQL depends on how precisely you describe the schema. Full table definitions live in the system instruction — the user message stays as the raw question:

```python
DATASET = &quot;bigquery-public-data.thelook_ecommerce&quot;

SYSTEM_INSTRUCTION = f&quot;&quot;&quot;You are a BigQuery SQL expert. Translate plain-English questions into valid BigQuery SQL queries.

Dataset: `{DATASET}`

Tables:
- `orders` — order_id INT, user_id INT, status STRING, gender STRING, created_at TIMESTAMP, returned_at TIMESTAMP, shipped_at TIMESTAMP, delivered_at TIMESTAMP, num_of_item INT
- `order_items` — id INT, order_id INT, user_id INT, product_id INT, status STRING, created_at TIMESTAMP, shipped_at TIMESTAMP, delivered_at TIMESTAMP, returned_at TIMESTAMP, sale_price FLOAT
- `products` — id INT, cost FLOAT, category STRING, name STRING, brand STRING, retail_price FLOAT, department STRING, sku STRING
- `users` — id INT, first_name STRING, last_name STRING, age INT, gender STRING, state STRING, city STRING, country STRING, created_at TIMESTAMP

Rules:
- Use fully qualified table names: `{DATASET}.table_name`
- Return only the SQL — no markdown, no explanation, no code fences
- Use LIMIT 100 for row-returning queries; omit it for aggregations
- Use standard SQL only (not legacy SQL)
&quot;&quot;&quot;
```

`translate_to_sql` calls `generate_content` and defensively strips any Markdown code fence the model wraps around the output, which happens even with an explicit instruction not to:

```python
def translate_to_sql(question: str) -&gt; str:
    response = gemini_model.generate_content(question)
    sql = response.text.strip()
    sql = re.sub(r&quot;^```(?:sql)?\n?&quot;, &quot;&quot;, sql, flags=re.IGNORECASE)
    sql = re.sub(r&quot;\n?```$&quot;, &quot;&quot;, sql)
    return sql.strip()
```
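
To see what the two substitutions do in isolation, here is a minimal standalone sketch of the same fence-stripping step. `strip_code_fence` and the sample strings are illustrative names rather than part of the service, and the fence marker is assembled from single backticks so the listing can sit inside a Markdown fence itself:

```python
import re

# Three backticks, built at runtime so this listing contains no literal fence.
FENCE = "`" * 3

def strip_code_fence(text: str) -> str:
    """Remove one leading and one trailing Markdown fence, if present."""
    sql = text.strip()
    # Leading fence, optionally tagged "sql" in any case, plus its newline.
    sql = re.sub(r"^" + FENCE + r"(?:sql)?\n?", "", sql, flags=re.IGNORECASE)
    # Trailing fence and the newline in front of it.
    sql = re.sub(r"\n?" + FENCE + r"$", "", sql)
    return sql.strip()

fenced = FENCE + "SQL\nSELECT 1\n" + FENCE
print(strip_code_fence(fenced))       # fenced reply is unwrapped
print(strip_code_fence("SELECT 1"))   # unfenced reply passes through unchanged
```

Both calls print `SELECT 1`. Text the model adds before or after the fence is not handled by this pattern, so a reply with a leading explanation would still reach BigQuery and fail the dry run.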

---

## Querying BigQuery Safely

Before every execution, the service runs a dry run. BigQuery validates syntax, resolves table references, and returns the bytes the query would scan — without executing it or incurring any cost:

```python
def run_query(sql: str) -&gt; tuple[list[dict], int]:
    dry_run_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    dry_job = bq_client.query(sql, job_config=dry_run_config)
    bytes_to_scan = dry_job.total_bytes_processed

    if bytes_to_scan &gt; 1_000_000_000:  # 1 GB ceiling
        raise ValueError(
            f&quot;Query would scan {bytes_to_scan / 1e9:.1f} GB — exceeds the 1 GB limit&quot;
        )

    job = bq_client.query(sql)
    rows = [dict(row) for row in job.result()]
    return rows, bytes_to_scan
```

The 1 GB ceiling catches cross-joins or missing date filters before cost is incurred, and acts as a sanity check on the translation. `dict(row)` is required — BigQuery `Row` objects are not directly JSON-serialisable.
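
The gate itself is plain arithmetic on the dry-run estimate, so it can be exercised without a BigQuery client. The sketch below is a hypothetical refactoring, not the service code: `enforce_scan_ceiling` and `SCAN_CEILING_BYTES` are names introduced here for illustration.

```python
SCAN_CEILING_BYTES = 1_000_000_000  # same 1 GB ceiling as run_query

def enforce_scan_ceiling(bytes_to_scan: int, ceiling: int = SCAN_CEILING_BYTES) -> int:
    """Raise ValueError when a dry-run estimate exceeds the ceiling."""
    if bytes_to_scan > ceiling:
        raise ValueError(
            f"Query would scan {bytes_to_scan / 1e9:.1f} GB, "
            f"which exceeds the {ceiling / 1e9:.0f} GB limit"
        )
    return bytes_to_scan

# A roughly 42 MB estimate passes through unchanged.
print(enforce_scan_ceiling(41_943_040))

# A 2.5 GB estimate is rejected before anything would be executed.
try:
    enforce_scan_ceiling(2_500_000_000)
except ValueError as exc:
    print(exc)
```

If a server-side backstop is wanted as well, `bigquery.QueryJobConfig` accepts a `maximum_bytes_billed` setting, which makes BigQuery fail the job instead of billing past that figure.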

---

## The FastAPI Service

Both clients initialise once at startup through the lifespan context manager:

```python
@asynccontextmanager
async def lifespan(app: FastAPI):
    global bq_client, gemini_model
    genai.configure(api_key=os.environ[&quot;GEMINI_API_KEY&quot;])
    bq_client = bigquery.Client()
    gemini_model = genai.GenerativeModel(
        model_name=&quot;gemini-1.5-flash&quot;,
        system_instruction=SYSTEM_INSTRUCTION,
    )
    yield

app = FastAPI(title=&quot;BigQuery Natural Language API&quot;, lifespan=lifespan)
```

`bigquery.Client()` uses Application Default Credentials — on Cloud Run it resolves to the Workload Identity service account; locally it uses `gcloud auth application-default login`. `genai.configure` reads `GEMINI_API_KEY`, which Cloud Run injects from Secret Manager at container startup.

The query endpoint wires the two functions together:

```python
class QueryRequest(BaseModel):
    question: str

class QueryResponse(BaseModel):
    question: str
    sql: str
    rows: list[dict]
    bytes_scanned: int
    latency_ms: int

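# Plain def: FastAPI runs sync handlers in a threadpool, so the blocking
# Gemini and BigQuery calls below do not stall the event loop.
@app.post(&quot;/query&quot;, response_model=QueryResponse)
def query(request: QueryRequest) -&gt; QueryResponse: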
    if len(request.question) &gt; 500:
        raise HTTPException(status_code=400, detail=&quot;Question exceeds 500 character limit&quot;)

    start = time.time()
    sql = translate_to_sql(request.question)

    try:
        rows, bytes_scanned = run_query(sql)
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc))
    except Exception:
        log_structured(&quot;ERROR&quot;, &quot;Query execution failed&quot;, sql=sql)
        raise HTTPException(status_code=500, detail=&quot;Query execution failed&quot;)

    latency_ms = round((time.time() - start) * 1000)
    log_structured(
        &quot;INFO&quot;, &quot;Query complete&quot;,
        question_length=len(request.question),
        bytes_scanned=bytes_scanned,
        row_count=len(rows),
        latency_ms=latency_ms,
    )
    return QueryResponse(
        question=request.question,
        sql=sql,
        rows=rows,
        bytes_scanned=bytes_scanned,
        latency_ms=latency_ms,
    )

def log_structured(severity: str, message: str, **kwargs) -&gt; None:
    print(json.dumps({&quot;severity&quot;: severity, &quot;message&quot;: message, **kwargs}))
```

The 500-character limit reduces prompt injection surface. `ValueError` from the dry-run gate surfaces as a 400 with a useful message; unexpected execution failures return 500 without internal detail. `log_structured` is one line — Cloud Run captures stdout and Cloud Logging parses every JSON field as structured, queryable data.
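
As a rough illustration of what ends up on stdout, the sketch below builds the same single-line JSON payload without printing it. `format_log` is a hypothetical helper, and both `default=str` and the `finished_at` field are additions made here so that non-JSON values such as datetimes do not raise inside `json.dumps`:

```python
import json
from datetime import datetime, timezone

def format_log(severity: str, message: str, **fields) -> str:
    """Build the one-line JSON payload that log_structured would print."""
    # default=str is an assumption of this sketch: it stringifies values
    # (datetimes, Decimals) that json.dumps cannot encode natively.
    return json.dumps({"severity": severity, "message": message, **fields}, default=str)

line = format_log(
    "INFO", "Query complete",
    bytes_scanned=41_943_040,
    row_count=5,
    finished_at=datetime(2026, 6, 6, tzinfo=timezone.utc),
)
print(line)
```

Because the payload is a single line with a top-level `severity` key, each request maps to one log entry whose remaining keys are filterable under `jsonPayload`.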

&lt;blockquote class=&quot;question-box&quot;&gt;
📸 Screenshot: the service running locally with &lt;code&gt;uvicorn main:app --reload&lt;/code&gt; — POST &lt;code&gt;{&quot;question&quot;: &quot;What are the top 5 product categories by total revenue?&quot;}&lt;/code&gt; to &lt;code&gt;/query&lt;/code&gt; and see the generated SQL and result rows in the JSON response.
&lt;/blockquote&gt;

---

## The GCP Infrastructure with Terraform

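Infrastructure comes before code: the Terraform below defines everything the service needs. One ordering caveat applies. Cloud Run refuses to create a service whose image it cannot pull, so the image named by `image_tag` has to exist in Artifact Registry, or the first apply has to point at a placeholder image, before the Cloud Run resource can be created.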

API enablement first:

```hcl
resource &quot;google_project_service&quot; &quot;run&quot;              { service = &quot;run.googleapis.com&quot;              }
resource &quot;google_project_service&quot; &quot;bigquery&quot;         { service = &quot;bigquery.googleapis.com&quot;         }
resource &quot;google_project_service&quot; &quot;secretmanager&quot;    { service = &quot;secretmanager.googleapis.com&quot;    }
resource &quot;google_project_service&quot; &quot;artifactregistry&quot; { service = &quot;artifactregistry.googleapis.com&quot; }
resource &quot;google_project_service&quot; &quot;cloudbuild&quot;       { service = &quot;cloudbuild.googleapis.com&quot;       }
```

The Gemini API key goes into Secret Manager before Cloud Run is deployed. The value is passed at apply time and never appears in the Terraform configuration as plaintext:

```hcl
resource &quot;google_secret_manager_secret&quot; &quot;gemini_key&quot; {
  secret_id = &quot;gemini-api-key&quot;
  replication { auto {} }
}

resource &quot;google_secret_manager_secret_version&quot; &quot;gemini_key&quot; {
  secret      = google_secret_manager_secret.gemini_key.id
  secret_data = var.gemini_api_key
}
```

The Cloud Run service wires identity, container, and secret injection together:

```hcl
resource &quot;google_cloud_run_v2_service&quot; &quot;analytics_api&quot; {
  name     = &quot;nl-analytics-api&quot;
  location = var.region

  template {
    service_account = google_service_account.analytics_api.email

    containers {
      image = &quot;${var.region}-docker.pkg.dev/${var.project_id}/nl-analytics/app:${var.image_tag}&quot;

      env {
        name = &quot;GEMINI_API_KEY&quot;
        value_source {
          secret_key_ref {
            secret  = google_secret_manager_secret.gemini_key.secret_id
            version = &quot;latest&quot;
          }
        }
      }
    }

    scaling {
      min_instance_count = 0
      max_instance_count = 10
    }
  }
}
```

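`min_instance_count = 0` means no idle cost. `secret_key_ref` resolves the key from Secret Manager at startup, so the Cloud Run resource only ever holds a reference: the key does not appear in the service definition, the Cloud Run console view, or deployment logs. The secret version is a different matter. Terraform records `secret_data` in `terraform.tfstate`, so the state bucket needs the same access restrictions as the secret itself.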

```bash
terraform init -backend-config=&quot;bucket=YOUR_PROJECT_ID-tfstate&quot;
terraform apply -var=&quot;project_id=YOUR_PROJECT_ID&quot; \
                -var=&quot;region=europe-west1&quot; \
                -var=&quot;image_tag=initial&quot; \
                -var=&quot;gemini_api_key=YOUR_KEY&quot;
```

&lt;blockquote class=&quot;question-box&quot;&gt;
📸 Screenshot: &lt;code&gt;terraform apply&lt;/code&gt; output showing the resources created — service account, IAM bindings, secret, Cloud Run service — then the Cloud Run service in the GCP console with the service account visible in the service configuration.
&lt;/blockquote&gt;

---

## IAM and Workload Identity

![IAM binding model — the service account has two bindings, one at project scope for BigQuery jobs and one at resource scope for the Secret Manager secret](./nl-analytics-iam.svg)

The service account has two IAM bindings:

```hcl
resource &quot;google_service_account&quot; &quot;analytics_api&quot; {
  account_id   = &quot;nl-analytics-sa&quot;
  display_name = &quot;NL Analytics API&quot;
}

# Submit and read BigQuery query jobs — must be at project level
resource &quot;google_project_iam_member&quot; &quot;bq_job_user&quot; {
  project = var.project_id
  role    = &quot;roles/bigquery.jobUser&quot;
  member  = &quot;serviceAccount:${google_service_account.analytics_api.email}&quot;
}

# Read one specific secret — scoped to the resource, not the project
resource &quot;google_secret_manager_secret_iam_member&quot; &quot;gemini_key_accessor&quot; {
  secret_id = google_secret_manager_secret.gemini_key.secret_id
  role      = &quot;roles/secretmanager.secretAccessor&quot;
  member    = &quot;serviceAccount:${google_service_account.analytics_api.email}&quot;
}
```

`roles/bigquery.jobUser` must be at project level — BigQuery jobs belong to the project, not to a dataset or table, so there is no narrower resource to bind it to. The role allows creating and reading your own query jobs, nothing else — no dataset access, no storage, no other GCP services. `roles/secretmanager.secretAccessor` is scoped to the specific secret resource.

Workload Identity means the Cloud Run service runs *as* `nl-analytics-sa` with no credential file. The GCP metadata server issues short-lived tokens automatically, refreshing every hour. `bigquery.Client()` reads them transparently — there is no JSON key file to provision, store, or risk committing to version control.

&lt;blockquote class=&quot;question-box&quot;&gt;
📸 Screenshot: the IAM page in Cloud Console showing &lt;code&gt;nl-analytics-sa&lt;/code&gt; with exactly two roles — BigQuery Job User (project scope) and Secret Manager Secret Accessor (scoped to the specific secret resource).
&lt;/blockquote&gt;

---

## Automating Deployment with Cloud Build

Four steps: test, build, push, deploy. Every push to main runs all four.

```yaml
steps:
  - name: python:3.12
    entrypoint: bash
    args:
      - &apos;-c&apos;
      - &apos;pip install -r requirements.txt -r requirements-dev.txt &amp;&amp; python -m pytest tests/ -v&apos;

  - name: gcr.io/cloud-builders/docker
    args: [&apos;build&apos;, &apos;-t&apos;, &apos;$_IMAGE&apos;, &apos;.&apos;]

  - name: gcr.io/cloud-builders/docker
    args: [&apos;push&apos;, &apos;$_IMAGE&apos;]

  - name: hashicorp/terraform:1.8
    entrypoint: sh
    args:
      - &apos;-c&apos;
      - |
        terraform -chdir=terraform init \
          -backend-config=&quot;bucket=${PROJECT_ID}-tfstate&quot; \
          -backend-config=&quot;prefix=nl-analytics&quot;
        terraform -chdir=terraform apply -auto-approve \
          -var=&quot;project_id=${PROJECT_ID}&quot; \
          -var=&quot;region=${_REGION}&quot; \
          -var=&quot;image_tag=${SHORT_SHA}&quot;

substitutions:
  _IMAGE: ${_REGION}-docker.pkg.dev/${PROJECT_ID}/nl-analytics/app:${SHORT_SHA}
  _REGION: europe-west1

options:
  logging: CLOUD_LOGGING_ONLY
```

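`SHORT_SHA` tags each image with the commit that produced it, making every Cloud Run revision traceable to the exact source. The Gemini API key is not passed to the pipeline: it was provisioned into Secret Manager by the initial `terraform apply`, and the deploy step is only meant to update the image tag on the existing service. As the configuration is shown here, `var.gemini_api_key` still feeds the secret version, so the pipeline run needs that variable to resolve without a value being supplied, for example through a default combined with `ignore_changes` on `secret_data`, or by creating the secret version outside this configuration.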

&lt;blockquote class=&quot;question-box&quot;&gt;
📸 Screenshot: Cloud Build history showing a successful pipeline run — four steps, all green, with the build duration and trigger commit visible.
&lt;/blockquote&gt;

---

## Running and Verifying

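Health check first. This assumes the service also defines a simple `/health` route, which the listing above does not show: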

```bash
curl https://YOUR_CLOUD_RUN_URL/health
# → {&quot;status&quot;: &quot;ok&quot;}
```

Then a natural language query:

```bash
curl -X POST https://YOUR_CLOUD_RUN_URL/query \
  -H &quot;Content-Type: application/json&quot; \
  -d &apos;{&quot;question&quot;: &quot;What are the top 5 product categories by total revenue?&quot;}&apos;
```

The response includes the generated SQL alongside the results:

```json
{
  &quot;question&quot;: &quot;What are the top 5 product categories by total revenue?&quot;,
  &quot;sql&quot;: &quot;SELECT p.category, SUM(oi.sale_price) AS total_revenue\nFROM `bigquery-public-data.thelook_ecommerce.order_items` oi\nJOIN `bigquery-public-data.thelook_ecommerce.products` p\nON oi.product_id = p.id\nWHERE oi.status NOT IN (&apos;Cancelled&apos;, &apos;Returned&apos;)\nGROUP BY p.category\nORDER BY total_revenue DESC\nLIMIT 5&quot;,
  &quot;rows&quot;: [
    { &quot;category&quot;: &quot;Outerwear &amp; Coats&quot;,             &quot;total_revenue&quot;: 2847391.23 },
    { &quot;category&quot;: &quot;Jeans&quot;,                         &quot;total_revenue&quot;: 2614820.57 },
    { &quot;category&quot;: &quot;Suits &amp; Sport Coats&quot;,           &quot;total_revenue&quot;: 2398104.91 },
    { &quot;category&quot;: &quot;Swim&quot;,                          &quot;total_revenue&quot;: 2144739.84 },
    { &quot;category&quot;: &quot;Fashion Hoodies &amp; Sweatshirts&quot;, &quot;total_revenue&quot;: 1987623.19 }
  ],
  &quot;bytes_scanned&quot;: 41943040,
  &quot;latency_ms&quot;: 1843
}
```

Pull structured logs to verify observability is working:

```bash
gcloud logging read \
  &apos;resource.type=&quot;cloud_run_revision&quot; AND jsonPayload.message=&quot;Query complete&quot;&apos; \
  --limit=5 --format=json
```

&lt;blockquote class=&quot;question-box&quot;&gt;
📸 Screenshot: the JSON response from Cloud Run showing the generated SQL and result rows, the BigQuery job history in Cloud Console with bytes billed visible, and the Cloud Logging entry with &lt;code&gt;bytes_scanned&lt;/code&gt; and &lt;code&gt;latency_ms&lt;/code&gt; as structured fields.
&lt;/blockquote&gt;

---

## Where to Take It Next

**Replace the API key with Vertex AI** — swap `google.generativeai` for `vertexai.generative_models.GenerativeModel`, initialised with `vertexai.init(project=os.environ[&quot;PROJECT_ID&quot;], location=&quot;us-central1&quot;)`. Authentication moves entirely to ADC: the same Workload Identity token that authenticates BigQuery calls also authenticates Vertex AI. Remove the Secret Manager secret and `GEMINI_API_KEY` from the Cloud Run definition, add `roles/aiplatform.user` to the service account. No API key to provision, rotate, or track.

**Runtime schema introspection** — query `INFORMATION_SCHEMA.COLUMNS` at service startup to build the schema context from the live dataset definition. The system instruction stays accurate as tables or columns change, without a code deployment to update the prompt.
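
A minimal sketch of the prompt-building half of that idea follows. It assumes the startup hook has already run `SCHEMA_SQL` and turned each result row into a dict; `build_schema_context` and `sample_rows` are hypothetical names, and the sample rows are hand-written stand-ins, not output from the live dataset.

```python
from collections import defaultdict

DATASET = "bigquery-public-data.thelook_ecommerce"

# Query a startup hook could run against the dataset-level INFORMATION_SCHEMA view.
SCHEMA_SQL = (
    "SELECT table_name, column_name, data_type "
    f"FROM `{DATASET}.INFORMATION_SCHEMA.COLUMNS` "
    "ORDER BY table_name, ordinal_position"
)

def build_schema_context(rows: list[dict]) -> str:
    """Group column rows by table and render one prompt line per table."""
    tables: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        tables[row["table_name"]].append(f"{row['column_name']} {row['data_type']}")
    return "\n".join(
        f"- `{table}`: {', '.join(columns)}" for table, columns in tables.items()
    )

# Hand-written rows in the shape the query returns (illustrative only).
sample_rows = [
    {"table_name": "orders", "column_name": "order_id", "data_type": "INT64"},
    {"table_name": "orders", "column_name": "status", "data_type": "STRING"},
    {"table_name": "users", "column_name": "id", "data_type": "INT64"},
]
print(build_schema_context(sample_rows))
```

The rendered lines can replace the hard-coded `Tables:` block in `SYSTEM_INSTRUCTION`. Running the query once at startup costs a single small metadata scan per cold start.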

---

## References

[^1]: Google Cloud. *BigQuery public datasets*. https://cloud.google.com/bigquery/public-data
[^2]: Google Cloud. *BigQuery dry run queries*. https://cloud.google.com/bigquery/docs/dry-run-queries
[^3]: Google Cloud. *BigQuery pricing and the free tier*. https://cloud.google.com/bigquery/pricing
[^4]: Google. *Gemini 1.5 Flash model documentation*. https://ai.google.dev/gemini-api/docs/models/gemini
[^5]: HashiCorp. *Terraform Google Cloud Provider — google_cloud_run_v2_service*. https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/cloud_run_v2_service
[^6]: Google Cloud. *Workload Identity for Cloud Run*. https://cloud.google.com/run/docs/securing/service-identity
[^7]: Google Cloud. *Understanding BigQuery roles*. https://cloud.google.com/bigquery/docs/access-control
[^8]: Google AI Studio. *Get a Gemini API key*. https://aistudio.google.com</content:encoded></item><item><title>From Assistant to Agent: What Antigravity Means for Enterprise Development Governance</title><link>https://danhenderson.cloud/blog/antigravity-enterprise-agentic-development/</link><guid isPermaLink="true">https://danhenderson.cloud/blog/antigravity-enterprise-agentic-development/</guid><description>Enterprise coding governance was built for humans. Antigravity lets an AI agent plan, execute, and commit code autonomously. This pivotal shift necessitates a new architectural blueprint.</description><pubDate>Sat, 23 May 2026 23:00:00 GMT</pubDate><content:encoded>Google unveiled further Antigravity updates at I/O this week. From what I&apos;ve seen, most coverage frames Antigravity as a faster, smarter coding assistant, a natural successor to Gemini CLI (more on that shortly). While that&apos;s certainly true, I feel it misses the more compelling story.

We&apos;re now moving from AI-assisted development, where a human still authors every line, to agentic development, where an agent plans the work, writes the code, calls tools, and commits. The human in the loop still reviews, but no longer writes the code.

This distinction is crucial for enterprise architects because every governance model for software development (code review, IP ownership, secrets management, audit trails, compliance) assumes human authorship. None of those frameworks was designed with an autonomous agent in mind.

&lt;blockquote class=&quot;question-box&quot;&gt;
If an AI agent can now plan, write, and commit code autonomously within your enterprise environment, how do we redesign our development governance to treat the agent as a full participant in the SDLC, rather than just another tool?
&lt;/blockquote&gt;

## What Antigravity Actually Is

Antigravity is not simply a better coding assistant. It is an agent-first development platform with three key components: a desktop application (now 2.0, as announced at I/O) with a multi-agent orchestration UI, an Antigravity CLI rebuilt in Go that will replace Gemini CLI, and an SDK for building custom agents.[^1] All three share the same execution harness, so improvements apply across all of them. An agent that can plan multi-step tasks, invoke tools, spawn subagents, and execute across a full workspace is a meaningfully different proposition from a mere code suggestion engine. For engineering teams who get the governance right, this represents a significant unlock.

Google has set a firm deadline of June 18, 2026 for moving from Gemini CLI to Antigravity CLI, covering free tier, AI Pro, and Ultra subscription users.[^2] For enterprises still running Gemini CLI, that deadline appears to be on hold for now: Google has committed to keeping Gemini CLI accessible via paid Gemini and Gemini Enterprise Agent Platform API keys for the foreseeable future.

Nonetheless, engineering teams are already beginning this transition, which means the governance conversation is already lagging behind the adoption curve.

The desktop application offers some distinct advantages. Unlike the CLI, it provides a visual agent manager and background task scheduling, capabilities that significantly lower the barrier to starting an agentic session. It also means agentic execution can reach a much broader set of users, including those who might never have engaged with the CLI.

The Antigravity SDK is what makes this architecturally significant. It allows organisations to build agents on the same execution harness, integrated with Google Cloud projects.[^3] This is the surface where production workloads will eventually run, and it is the one that demands the most governance clarity before it is enabled. Crucially, it is also the one with the most potential once that clarity exists.

Google has built platform-level security into Antigravity: terminal sandboxing, credential masking, hardened Git policies, and a permission system that controls what the agent can do. However, none of that is a substitute for comprehensive enterprise governance. Sandboxing limits accidental leakage within a session. It does not define who owns agent-generated code, where inference runs for regulated workloads, or how agent actions are audited across the organisation.


## The Enterprise Governance Problem

Enterprise software governance was designed for humans. Every control assumes that a person made a decision: a developer wrote the code, a reviewer approved it, a team owns the change. When an agent is doing the writing, many of those traditional controls simply do not apply; they were built for a world where every decision had a named human behind it. Engineering teams are beginning to migrate to Antigravity regardless of whether the governance conversation has happened.

![The governance gap: an ungoverned agent reads the full codebase and routes reasoning externally with no controls](./antigravity-governance-gap.png)
*Figure 1: Antigravity ships three surfaces. Each surface has a different governance gap enterprises need to close.*

The most immediate concern is IP and secrets. An agent reads the full repository context to do its job, and enterprise codebases contain far more than just application code: credentials, internal API contracts, architecture decisions, and proprietary business logic. Feeding such sensitive data into an external inference endpoint requires a robust data handling policy that most organisations simply do not yet possess. This is not a gap that sits quietly; it is the question a procurement team or a legal review will ask on day one, and it is considerably more comfortable to answer before the agents are running than after.

Directly connected is the authorship problem. An agent that generates code which passes review and ships to production creates a chain of accountability that nobody has yet worked out how to close. Compliance teams will not ignore this; the audit will certainly find it.

Less visible but equally consequential is data residency. When an agent reasons over enterprise code, that reasoning happens somewhere. Antigravity routes inference through Google&apos;s infrastructure.[^4] For regulated industries, where that processing occurs is a governance requirement, not a preference, and it needs to be defined before agents are enabled. In practice, most organisations discover this requirement during the first audit rather than before it.

## The Agentic SDLC Framework

The governance framework exists to enable Antigravity adoption safely, not to slow it down. Teams that define it before rollout are in a position to expand their use of the platform quickly and with confidence. In my view, this framework has four key components:

![Agentic SDLC governance framework — four layers: scope, inference boundary, review gate, and observability](./antigravity-sdlc-governance.png)
*Figure 2: The agentic SDLC governance framework: scope, inference boundary, review gate, and observability.*

The first decision is **scope**: what the agent can access. This covers repository permissions, credential access, and cloud resource scope. The principle of least privilege applies to agents just as it does to service accounts; an agent working on one service has no business reading the full codebase. 

Directly connected to scope is the **inference boundary**, which determines where the reasoning actually happens. Which workloads are cleared to route inference through external infrastructure? What is the policy for repositories with regulated data? These answers need to exist before any agent session starts. They are incredibly difficult to define retrospectively once a pipeline is running.

The **review gate** determines how agent-generated code enters the SDLC. Crucially, the gate should not change simply because the author is not human. Agent-generated code should go through the exact same rigorous review and testing pipeline as human-authored code. Its job is to verify that the standard is met, full stop; it is not to flag the authorship.

Underpinning all of this is **observability**: a log of every agent action with enough fidelity to reconstruct what the agent did, why it did it, and what it touched. This is not optional in compliance environments. It is also the most useful signal for understanding where agent autonomy is working effectively and where it is not, and that signal is only available if the logging is in place from the very start. Most teams discover this when they need it for an audit and realise they haven&apos;t got it.

![Agent governance controls mapped to the Antigravity development lifecycle](./antigravity-agent-sdlc-controls.svg)
*Figure 3: Where each governance control sits in an Antigravity agent session, from workspace access through to human review.*

Scope and the inference boundary are pre-session decisions: by the time an agent starts, both must already be in effect. The review gate and observability need to be wired into the pipeline before agents reach production. Antigravity&apos;s built-in sandboxing and credential masking operate at the session level and do not substitute for these four controls.
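
To make the scope and observability layers a little more concrete, here is a minimal sketch of how a scope policy and a per-action audit line could be expressed. Everything in it is hypothetical: `AgentScope`, `path_in_scope`, and `audit_record` are names invented for this illustration and are not part of Antigravity or its SDK.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath

@dataclass(frozen=True)
class AgentScope:
    agent_id: str
    allowed_paths: tuple[str, ...]     # repository prefixes the agent may touch
    external_inference_allowed: bool   # inference boundary decision for this workload

def path_in_scope(scope: AgentScope, path: str) -> bool:
    """Least privilege: allow only paths under an explicitly granted prefix."""
    target = PurePosixPath(path)
    if ".." in target.parts:           # refuse traversal instead of resolving it
        return False
    return any(target.is_relative_to(PurePosixPath(p)) for p in scope.allowed_paths)

def audit_record(scope: AgentScope, action: str, path: str, reason: str) -> str:
    """One JSON line per agent action: what it did, why, and what it touched."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": scope.agent_id,
        "action": action,
        "path": path,
        "reason": reason,
        "allowed": path_in_scope(scope, path),
        "external_inference_allowed": scope.external_inference_allowed,
    })

billing_agent = AgentScope("billing-agent", ("services/billing",), external_inference_allowed=False)
print(audit_record(billing_agent, "read", "services/billing/api.py", "fix rounding bug"))
print(audit_record(billing_agent, "read", "services/payments/keys.py", "look up usage"))
```

The first record comes back with `allowed` set to true and the second with false. The denied attempt is still logged, which is the property an audit trail needs.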

## What Architects Should Do Now

*   **Decouple the migration from agent enablement.** The June 18, 2026 deadline moves free tier, Google AI Pro, and Ultra users from Gemini CLI to Antigravity CLI; enterprises have a little longer to migrate. It does not, however, mandate enabling autonomous agent execution. Use that window to define scope per surface, establish inference boundaries per workload, and wire in the audit trail *before* the capability is widely enabled.

*   **Pilot in a greenfield service, not the core platform.** The CLI, desktop app, and SDK each have different access patterns and risk profiles. A scope policy designed for CLI use will not map cleanly to a custom SDK agent running in production, and the desktop app may well reach engineering workstations before IT has any policy for it. The only way to uncover governance gaps before they incur significant cost is to run real engineering teams in a strictly constrained environment.

*   **Define the governance framework before the platform matures.** Authorship policy, data residency, scope constraints, and observability all need to be in place before agents reach production. Policy written after agents are running is a retrofit, not thoughtful architecture.

The migration is already underway. Those who define the framework first will not move more slowly; they will move more safely, and ultimately, considerably faster.

## References

[^1]: Google Developers Blog. *Build with Google Antigravity — Our New Agentic Development Platform*. https://developers.googleblog.com/build-with-google-antigravity-our-new-agentic-development-platform/
[^2]: Google Developers Blog. *Transitioning Gemini CLI to Antigravity CLI*. https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/
[^3]: Google Cloud Blog. *Choosing Antigravity or Gemini CLI*. https://cloud.google.com/blog/topics/developers-practitioners/choosing-antigravity-or-gemini-cli
[^4]: Google. *I/O 2026 Developer Highlights: Antigravity, Gemini API, AI Studio*. https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/</content:encoded></item><item><title>The Agentic Mesh: Google Cloud&apos;s Control Plane for Enterprise AI</title><link>https://danhenderson.cloud/blog/agentic-mesh-google-cloud-enterprise-ai/</link><guid isPermaLink="true">https://danhenderson.cloud/blog/agentic-mesh-google-cloud-enterprise-ai/</guid><description>A2A, the Agentic Mesh, and the governance layer inside the Gemini Enterprise Agent Platform: an enterprise architect&apos;s perspective from Google Cloud Next &apos;26.</description><pubDate>Mon, 11 May 2026 23:00:00 GMT</pubDate><content:encoded>Enterprise architecture has spent two decades solving the same underlying problem: systems become difficult to scale, govern, and evolve when too much responsibility is concentrated in a single service boundary. That is what drove the shift from monoliths to SOA, to microservices, and to platform engineering. Enterprise AI is now following the same trajectory, and faster than most organisations have been prepared for.

Early generative AI deployments treated the LLM as a monolithic endpoint: a single model responsible for reasoning across multiple domains simultaneously. That works adequately for productivity tooling and conversational assistants. It breaks down quickly once AI systems are operating inside security boundaries, regulated data, and autonomous task execution.

The architectural response is disaggregation: responsibilities distributed across specialised agents, each focused on a narrower enterprise domain:

![Specialised enterprise agent domains](./specialised-agent-domains.svg)
*Figure 1: The disaggregation of enterprise AI into specialised domain agents.*

Once systems become distributed, coordination becomes one of the hardest problems to solve well.

&lt;blockquote class=&quot;question-box&quot;&gt;
How do organisations securely orchestrate specialised AI agents at scale without creating tightly coupled, operationally fragile systems?
&lt;/blockquote&gt;

For me, the architectural significance here is less about any individual protocol and more about what this shift represents: **Enterprise AI is starting to look less like an application feature and more like a platform engineering problem.**

---

## What Problem A2A Solves

Traditional integration contracts (REST, gRPC, OpenAPI) assume a predictable interaction model in which schemas are predefined, responses are structured, and service behaviour is bounded.

Autonomous agents introduce a different set of requirements. An orchestrating agent may need to pass intent, context, confidence levels, and delegation constraints, not just exchange structured data.

**This is where A2A becomes architecturally significant rather than just another integration standard.**

Originally announced at Google Cloud Next in April 2025 [^1], the Agent2Agent Protocol (A2A) has moved well beyond an emerging specification. As of this year, organisations including Microsoft, AWS, Salesforce, SAP, and ServiceNow have it in production. A2A functions as an interoperability contract for autonomous systems: it defines how agents delegate work, share context, and coordinate without tightly coupling their implementations.

### A2A and MCP: Complementary Layers

A2A often gets discussed alongside Anthropic&apos;s Model Context Protocol (MCP) [^2], so it is worth being clear on where each one fits. The distinction is straightforward: MCP handles how an individual agent connects to tools and data sources. A2A handles how agents communicate with *each other* across organisational and platform boundaries.

**They are complementary layers, not competing standards.** Most serious enterprise multi-agent deployments will eventually involve both.

![A2A and MCP Protocol Layers](./a2a_mcp_complementary_layers.svg)
*Figure 2: A2A and MCP operating as complementary protocol layers.*

### Delegation and State

In practice, A2A moves agent interaction beyond simple function calling and towards distributed workflow coordination. A central orchestration layer can delegate a forecasting task to an analytics agent while passing operational constraints, contextual memory references, and governance policies. Without a consistent delegation model, agent integrations quickly become brittle point-to-point workflows.
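
To make the delegation model concrete, here is a minimal sketch of the kind of envelope an orchestrator might hand to an analytics agent. It is not the A2A wire format: the `DelegationRequest` class, its field names, and the `ctx://` reference scheme are all hypothetical, chosen only to mirror the elements described above.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass(frozen=True)
class DelegationRequest:
    """Hypothetical envelope an orchestrator hands to a specialised agent."""
    task: str                                  # the intent, in plain language
    target_agent: str                          # logical name of the receiver
    context_refs: tuple[str, ...] = ()         # pointers into shared memory
    constraints: dict[str, str] = field(default_factory=dict)  # operational limits
    policies: tuple[str, ...] = ()             # governance policies to apply
    confidence: float = 1.0                    # confidence in this routing choice

    def to_json(self) -> str:
        """Serialise the envelope so it can cross a process boundary."""
        return json.dumps(asdict(self), sort_keys=True)


request = DelegationRequest(
    task="Forecast Q3 demand for the EMEA region",
    target_agent="analytics",
    context_refs=("ctx://sales/2026-q2",),
    constraints={"max_cost_usd": "5", "deadline_seconds": "30"},
    policies=("pii-masking",),
    confidence=0.8,
)
print(request.to_json())
```

The point of the sketch is that intent, constraints, and policy travel together as one contract, so the receiving agent does not have to infer them from free text.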

State management is the immediate second challenge. Passing full conversational history between agents adds latency and token overhead on every hop. Architects should solve this with a shared memory layer, so that agents reference context indirectly rather than retransmitting large payloads.
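
The indirection can be sketched in a few lines. This is an in-memory stand-in, assuming a content-addressed store. The `SharedContextStore` class and the `ctx://` scheme are hypothetical, and a real deployment would back this with a managed memory service rather than a dictionary.

```python
import hashlib


class SharedContextStore:
    """In-memory stand-in for a shared memory layer (hypothetical)."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put(self, content: str) -> str:
        """Store content once and return a short, stable reference to it."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
        ref = f"ctx://{digest}"
        self._items[ref] = content
        return ref

    def get(self, ref: str) -> str:
        """Resolve a reference; raises KeyError if the reference is unknown."""
        return self._items[ref]


store = SharedContextStore()
history = "user: forecast next quarter\n" * 500  # stand-in for a long conversation
ref = store.put(history)

# The delegating agent sends only the reference; the receiver resolves it on demand.
message = {"task": "forecast", "context_ref": ref}
print(len(history), "characters of history,", len(ref), "characters on the wire")
```

Because the reference is derived from the content, storing the same history twice yields the same reference, which keeps repeated delegations cheap.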

---

## The Agentic Mesh Pattern

As agent ecosystems scale, enterprises need a dedicated control layer for orchestration, identity propagation, policy enforcement, and distributed state management. This is where the Agentic Mesh pattern becomes useful.

Conceptually it resembles a service mesh, adapted for non-deterministic workloads. In a traditional service mesh, routing decisions are based on service discovery and traffic policy. In an Agentic Mesh, routing becomes semantic, based on domain expertise, contextual relevance, operational policy, and trust boundaries. The mesh abstracts that complexity away from individual agents, which is critical: agents should not each need to independently solve discovery, identity, policy enforcement, and observability.
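
A toy version of that routing decision is sketched below. Keyword overlap stands in for real semantic matching, which would more plausibly use embeddings. The `AgentCard` fields, the registry contents, and the trust zone names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentCard:
    name: str
    domains: frozenset[str]   # topics the agent claims expertise in
    trust_zone: str           # boundary the agent runs inside


REGISTRY = [
    AgentCard("finance-agent", frozenset({"invoice", "budget", "forecast"}), "regulated"),
    AgentCard("hr-agent", frozenset({"leave", "payroll", "hiring"}), "regulated"),
    AgentCard("docs-agent", frozenset({"policy", "handbook", "wiki"}), "general"),
]


def route(request: str, allowed_zones: set[str]) -> Optional[str]:
    """Pick the permitted agent whose domains best overlap the request."""
    words = set(request.lower().split())
    best_name, best_score = None, 0
    for card in REGISTRY:
        if card.trust_zone not in allowed_zones:
            continue  # the policy filter runs before any relevance scoring
        score = len(words.intersection(card.domains))
        if score > best_score:
            best_name, best_score = card.name, score
    return best_name


print(route("update the budget forecast", {"regulated"}))
print(route("update the budget forecast", {"general"}))
```

The second call returns `None` because no agent in the permitted zone matches the request. Trust boundaries constrain the candidate set first, and relevance only ranks what policy has already allowed.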

![Agentic Mesh Architecture](./agentic-mesh-architecture.svg)
*Figure 3: Logical routing of user intent through an Agentic Mesh control plane.*

What Google announced at Next &apos;26 is the clearest signal yet of where this pattern is solidifying at the platform level.

---

## Google&apos;s Control Plane: The Gemini Enterprise Agent Platform

At Google Cloud Next &apos;26, Google announced the Gemini Enterprise Agent Platform [^3], framed as the evolution of Vertex AI and the foundation layer for the agentic enterprise. What struck me as architecturally significant is not the product itself, but what Google chose to make first-class platform capabilities rather than afterthoughts.

The platform ships with dedicated components for Agent Identity, Agent Gateway, Agent Registry, and Agent Observability. These are the exact concerns the Agentic Mesh pattern addresses. Governance, identity, and observability are not features to bolt on after deployment. **They are the control plane.** That is a meaningful architectural position, and one that separates this from previous generations of AI tooling that left governance as a future consideration.

![Gemini Enterprise Agent Platform components](./gemini-agent-platform-components.svg)
*Figure 4: Core components of the Gemini Enterprise Agent Platform.*

Two details from the technical sessions stood out for me. First, the Agent Development Kit (ADK) now supports a graph-based framework for organising agents into networks of sub-agents, giving developers explicit, auditable control over how agents collaborate. Second, Agent Gateway natively understands both A2A and MCP as first-class protocols, applying policy enforcement and identity verification consistently across both. That unification at the gateway level is, in my view, one of the more quietly significant architectural decisions Google have made.

---

## The Operational Reality: Identity, Observability, and Policy

Understanding the platform is one thing. Getting it into production is another. Three things trip up almost every team at this stage.

### Identity and the Blast Radius

Static service accounts were not designed for dynamic delegation. When a primary orchestrator hands off to a Finance, HR, or Security sub-agent, it needs to pass a constrained identity context that limits what that agent can access and execute.

Agent Identity handles this via SPIFFE IDs: cryptographic identity that travels with the delegation chain. Agent Gateway is the enforcement point; it verifies that identity before any cross-agent call is permitted. If a sub-agent behaves unexpectedly, its blast radius is bounded by the scope of the identity it carries.
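
The bounding property is easier to see in code. The sketch below is not the Agent Identity API. It only illustrates scope attenuation, where each hop can narrow access but never widen it. The `DelegatedIdentity` class, the scope strings, and the `example.org` trust domain are hypothetical, and real SPIFFE identities are backed by signed credentials rather than plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegatedIdentity:
    spiffe_id: str                # e.g. spiffe://example.org/agent/orchestrator
    scopes: frozenset[str]        # actions this identity may perform
    chain: tuple[str, ...] = ()   # identities that delegated to this one

    def delegate(self, child_id: str, requested: set[str]) -> "DelegatedIdentity":
        """Issue a child identity whose scopes are a subset of the parent scopes."""
        granted = self.scopes.intersection(requested)  # never widens access
        return DelegatedIdentity(child_id, granted, self.chain + (self.spiffe_id,))


def authorise(identity: DelegatedIdentity, action: str) -> bool:
    """Gateway-style check: allow only actions inside the carried scope."""
    return action in identity.scopes


orchestrator = DelegatedIdentity(
    "spiffe://example.org/agent/orchestrator",
    frozenset({"ledger:read", "ledger:write", "hr:read"}),
)
finance = orchestrator.delegate(
    "spiffe://example.org/agent/finance",
    {"ledger:read", "payments:send"},  # payments:send was never held by the parent
)

print(authorise(finance, "ledger:read"))
print(authorise(finance, "payments:send"))
```

The finance sub-agent asked for `payments:send`, but the orchestrator never held that scope, so the request is silently dropped at delegation time and the gateway check refuses the action.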

**Least-privilege identity propagation is not optional. It is the first thing to get right.**

### Traditional Monitoring Won&apos;t Cut It

**Monitoring latency is straightforward. Monitoring reasoning is not.**

When a multi-agent workflow does fail, engineering teams need to trace prompt variations, intermediate reasoning steps, and delegation confidence scores, not just HTTP 500s. That means AI-native observability needs to be designed into the architecture from the start. Conventional APM tooling was built for deterministic systems. It does not know what to do with reasoning steps, token usage, or confidence drift.

### Governance Can&apos;t Be Hardcoded

Enterprise governance requirements such as DLP, PII masking, and regulatory controls cannot be hardcoded into individual agents. The mesh must inject Policy Enforcement Points into the network path, evaluating sensitive context before it crosses trust boundaries. Agent Gateway is positioned to serve this role in Google&apos;s architecture, enforcing governance policy consistently across both A2A and MCP traffic while working alongside Model Armor to inspect prompt content, block injection attempts, and mask PII before a prompt reaches the model.
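
As a rough illustration of a Policy Enforcement Point sitting in the path, the sketch below masks two kinds of PII when a payload is about to cross a trust boundary. It does not represent how Agent Gateway or Model Armor work internally. The `enforce` function and the two regular expressions are assumptions, and production DLP rules are far broader.

```python
import re

# Illustrative patterns only; real DLP rule sets are far broader than this.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "UK_NINO": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
}


def enforce(payload: str, crossing_boundary: bool) -> tuple[str, list[str]]:
    """Mask PII in a payload that is about to leave its trust boundary."""
    if not crossing_boundary:
        return payload, []
    findings = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(payload):
            findings.append(label)
            payload = pattern.sub(f"[{label}]", payload)
    return payload, findings


masked, findings = enforce("Contact jane.doe@example.com about AB123456C", True)
print(masked)
print(findings)
```

Keeping this logic in the path rather than in each agent means a rule change is deployed once, and the list of findings gives the observability layer something concrete to record.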

![Identity Propagation and Trust Boundaries](./identity-propagation-agentic-meshv2.svg)
*Figure 5: Dynamic identity propagation and trust boundaries across an Agentic Mesh.*

Getting these three foundations right is what determines whether a multi-agent system is genuinely production-ready or simply impressive in a controlled environment.

---

## Getting the Foundations Right

**The transition to multi-agent systems is structural, not cyclical.** What Google demonstrated at Next &apos;26 is that identity, observability, and policy are now first-class platform capabilities rather than engineering exercises left to the implementer.

For architects deciding where to start: most workflows still only require a single, well-optimised agent. The Agentic Mesh pattern earns its complexity when use cases genuinely need specialised domain reasoning, parallel execution, or strict security isolation. Get the identity and observability foundations in place first. Validate in lower-risk workflows before extending into regulated or high-stakes processes. The architectural patterns are clear and the platform is ready to build on. Organisations that get the foundations right now will not be retrofitting governance into production systems six months from now.

---

## References

[^1]: Google Developers Blog. *Announcing the Agent2Agent Protocol (A2A)*. April 9, 2025. https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
[^2]: Anthropic. *Model Context Protocol (MCP)*. 2024. https://modelcontextprotocol.io
[^3]: Google Cloud. *Introducing Gemini Enterprise Agent Platform*. April 22, 2026. https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform</content:encoded></item></channel></rss>