Natural Language Analytics on Google Cloud: BigQuery, Cloud Run, and Gemini Flash
← Back to Articles

Natural Language Analytics on Google Cloud: BigQuery, Cloud Run, and Gemini Flash


Most data teams spend more time writing SQL than analysing results. Large language models have made NL-to-SQL viable enough to build services around — the interesting problem is no longer whether it works, it is how to build it so you can operate, secure, and extend it.

This post walks through a natural language analytics API on Google Cloud: a FastAPI service on Cloud Run that accepts a plain-English question, translates it to BigQuery SQL using Gemini Flash, runs a dry-run cost gate before executing, and returns structured JSON. All infrastructure in Terraform. Identity through Workload Identity. Zero idle cost.

Four constraints shaped every decision: no plaintext secrets in code or containers, all infrastructure changes through Terraform, IAM bindings as narrow as possible, and zero cost when idle.


The Analytics API

End-to-end architecture — NL question enters Cloud Run, Gemini Flash translates it to SQL, BigQuery validates and executes it, structured JSON returns

The service exposes a single /query POST endpoint. A question arrives, Gemini Flash translates it to SQL using a system instruction containing the full table schema, BigQuery runs a dry-run to validate syntax and price the query before executing, and the result returns as JSON with the generated SQL included.

The target dataset is bigquery-public-data.thelook_ecommerce — a public Google-maintained e-commerce dataset with orders, products, users, and order items. Because it is public, no dataset-level IAM permissions are needed on the service account.


Translating Questions to SQL

The quality of generated SQL depends on how precisely you describe the schema. Full table definitions live in the system instruction — the user message stays as the raw question:

DATASET = "bigquery-public-data.thelook_ecommerce"

SYSTEM_INSTRUCTION = f"""You are a BigQuery SQL expert. Translate plain-English questions into valid BigQuery SQL queries.

Dataset: `{DATASET}`

Tables:
- `orders` — order_id INT, user_id INT, status STRING, gender STRING, created_at TIMESTAMP, returned_at TIMESTAMP, shipped_at TIMESTAMP, delivered_at TIMESTAMP, num_of_item INT
- `order_items` — id INT, order_id INT, user_id INT, product_id INT, status STRING, created_at TIMESTAMP, shipped_at TIMESTAMP, delivered_at TIMESTAMP, returned_at TIMESTAMP, sale_price FLOAT
- `products` — id INT, cost FLOAT, category STRING, name STRING, brand STRING, retail_price FLOAT, department STRING, sku STRING
- `users` — id INT, first_name STRING, last_name STRING, age INT, gender STRING, state STRING, city STRING, country STRING, created_at TIMESTAMP

Rules:
- Use fully qualified table names: `{DATASET}.table_name`
- Return only the SQL — no markdown, no explanation, no code fences
- Use LIMIT 100 for row-returning queries; omit it for aggregations
- Use standard SQL only (not legacy SQL)
"""

translate_to_sql calls generate_content and defensively strips any Markdown code fence the model wraps around the output, which happens even with an explicit instruction not to:

def translate_to_sql(question: str) -> str:
    response = gemini_model.generate_content(question)
    sql = response.text.strip()
    sql = re.sub(r"^```(?:sql)?\n?", "", sql, flags=re.IGNORECASE)
    sql = re.sub(r"\n?```$", "", sql)
    return sql.strip()

Querying BigQuery Safely

Before every execution, the service runs a dry run. BigQuery validates syntax, resolves table references, and returns the bytes the query would scan — without executing it or incurring any cost:

def run_query(sql: str) -> tuple[list[dict], int]:
    dry_run_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    dry_job = bq_client.query(sql, job_config=dry_run_config)
    bytes_to_scan = dry_job.total_bytes_processed

    if bytes_to_scan > 1_000_000_000:  # 1 GB ceiling
        raise ValueError(
            f"Query would scan {bytes_to_scan / 1e9:.1f} GB — exceeds the 1 GB limit"
        )

    job = bq_client.query(sql)
    rows = [dict(row) for row in job.result()]
    return rows, bytes_to_scan

The 1 GB ceiling catches cross-joins or missing date filters before cost is incurred, and acts as a sanity check on the translation. dict(row) is required — BigQuery Row objects are not directly JSON-serialisable.


The FastAPI Service

Both clients initialise once at startup through the lifespan context manager:

@asynccontextmanager
async def lifespan(app: FastAPI):
    global bq_client, gemini_model
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    bq_client = bigquery.Client()
    gemini_model = genai.GenerativeModel(
        model_name="gemini-1.5-flash",
        system_instruction=SYSTEM_INSTRUCTION,
    )
    yield

app = FastAPI(title="BigQuery Natural Language API", lifespan=lifespan)

bigquery.Client() uses Application Default Credentials — on Cloud Run it resolves to the Workload Identity service account; locally it uses gcloud auth application-default login. genai.configure reads GEMINI_API_KEY, which Cloud Run injects from Secret Manager at container startup.

The query endpoint wires the two functions together:

class QueryRequest(BaseModel):
    question: str

class QueryResponse(BaseModel):
    question: str
    sql: str
    rows: list[dict]
    bytes_scanned: int
    latency_ms: int

@app.post("/query", response_model=QueryResponse)
async def query(request: QueryRequest) -> QueryResponse:
    if len(request.question) > 500:
        raise HTTPException(status_code=400, detail="Question exceeds 500 character limit")

    start = time.time()
    sql = translate_to_sql(request.question)

    try:
        rows, bytes_scanned = run_query(sql)
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc))
    except Exception:
        log_structured("ERROR", "Query execution failed", sql=sql)
        raise HTTPException(status_code=500, detail="Query execution failed")

    latency_ms = round((time.time() - start) * 1000)
    log_structured(
        "INFO", "Query complete",
        question_length=len(request.question),
        bytes_scanned=bytes_scanned,
        row_count=len(rows),
        latency_ms=latency_ms,
    )
    return QueryResponse(
        question=request.question,
        sql=sql,
        rows=rows,
        bytes_scanned=bytes_scanned,
        latency_ms=latency_ms,
    )

def log_structured(severity: str, message: str, **kwargs) -> None:
    print(json.dumps({"severity": severity, "message": message, **kwargs}))

The 500-character limit reduces prompt injection surface. ValueError from the dry-run gate surfaces as a 400 with a useful message; unexpected execution failures return 500 without internal detail. log_structured is one line — Cloud Run captures stdout and Cloud Logging parses every JSON field as structured, queryable data.

📸 Screenshot: the service running locally with uvicorn main:app --reload — POST {"question": "What are the top 5 product categories by total revenue?"} to /query and see the generated SQL and result rows in the JSON response.

The GCP Infrastructure with Terraform

Infrastructure before code — run terraform apply before the first container image exists.

API enablement first:

resource "google_project_service" "run"              { service = "run.googleapis.com"              }
resource "google_project_service" "bigquery"         { service = "bigquery.googleapis.com"         }
resource "google_project_service" "secretmanager"    { service = "secretmanager.googleapis.com"    }
resource "google_project_service" "artifactregistry" { service = "artifactregistry.googleapis.com" }
resource "google_project_service" "cloudbuild"       { service = "cloudbuild.googleapis.com"       }

The Gemini API key goes into Secret Manager before Cloud Run is deployed. The value is passed at apply time and never appears in the Terraform configuration as plaintext:

resource "google_secret_manager_secret" "gemini_key" {
  secret_id = "gemini-api-key"
  replication { auto {} }
}

resource "google_secret_manager_secret_version" "gemini_key" {
  secret      = google_secret_manager_secret.gemini_key.id
  secret_data = var.gemini_api_key
}

The Cloud Run service wires identity, container, and secret injection together:

resource "google_cloud_run_v2_service" "analytics_api" {
  name     = "nl-analytics-api"
  location = var.region

  template {
    service_account = google_service_account.analytics_api.email

    containers {
      image = "${var.region}-docker.pkg.dev/${var.project_id}/nl-analytics/app:${var.image_tag}"

      env {
        name = "GEMINI_API_KEY"
        value_source {
          secret_key_ref {
            secret  = google_secret_manager_secret.gemini_key.secret_id
            version = "latest"
          }
        }
      }
    }

    scaling {
      min_instance_count = 0
      max_instance_count = 10
    }
  }
}

min_instance_count = 0 means no idle cost. secret_key_ref resolves the key from Secret Manager at startup — it never appears in terraform.tfstate, the Cloud Run console view, or deployment logs.

terraform init -backend-config="bucket=YOUR_PROJECT_ID-tfstate"
terraform apply -var="project_id=YOUR_PROJECT_ID" \
                -var="region=us-central1" \
                -var="image_tag=initial" \
                -var="gemini_api_key=YOUR_KEY"
📸 Screenshot: terraform apply output showing the resources created — service account, IAM bindings, secret, Cloud Run service — then the Cloud Run service in the GCP console with the service account visible in the service configuration.

IAM and Workload Identity

IAM binding model — the service account has two bindings, one at project scope for BigQuery jobs and one at resource scope for the Secret Manager secret

The service account has two IAM bindings:

resource "google_service_account" "analytics_api" {
  account_id   = "nl-analytics-sa"
  display_name = "NL Analytics API"
}

# Submit and read BigQuery query jobs — must be at project level
resource "google_project_iam_member" "bq_job_user" {
  project = var.project_id
  role    = "roles/bigquery.jobUser"
  member  = "serviceAccount:${google_service_account.analytics_api.email}"
}

# Read one specific secret — scoped to the resource, not the project
resource "google_secret_manager_secret_iam_member" "gemini_key_accessor" {
  secret_id = google_secret_manager_secret.gemini_key.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.analytics_api.email}"
}

roles/bigquery.jobUser must be at project level — BigQuery jobs belong to the project, not to a dataset or table, so there is no narrower resource to bind it to. The role allows creating and reading your own query jobs, nothing else — no dataset access, no storage, no other GCP services. roles/secretmanager.secretAccessor is scoped to the specific secret resource.

Workload Identity means the Cloud Run service runs as nl-analytics-sa with no credential file. The GCP metadata server issues short-lived tokens automatically, refreshing every hour. bigquery.Client() reads them transparently — there is no JSON key file to provision, store, or risk committing to version control.

📸 Screenshot: the IAM page in Cloud Console showing nl-analytics-sa with exactly two roles — BigQuery Job User (project scope) and Secret Manager Secret Accessor (scoped to the specific secret resource).

Automating Deployment with Cloud Build

Four steps: test, build, push, deploy. Every push to main runs all four.

steps:
  - name: python:3.12
    entrypoint: bash
    args:
      - '-c'
      - 'pip install -r requirements.txt -r requirements-dev.txt && python -m pytest tests/ -v'

  - name: gcr.io/cloud-builders/docker
    args: ['build', '-t', '$_IMAGE', '.']

  - name: gcr.io/cloud-builders/docker
    args: ['push', '$_IMAGE']

  - name: hashicorp/terraform:1.8
    entrypoint: sh
    args:
      - '-c'
      - |
        terraform -chdir=terraform init \
          -backend-config="bucket=${PROJECT_ID}-tfstate" \
          -backend-config="prefix=nl-analytics"
        terraform -chdir=terraform apply -auto-approve \
          -var="project_id=${PROJECT_ID}" \
          -var="region=${_REGION}" \
          -var="image_tag=${SHORT_SHA}"

substitutions:
  _IMAGE: ${_REGION}-docker.pkg.dev/${PROJECT_ID}/nl-analytics/app:${SHORT_SHA}
  _REGION: europe-west1

options:
  logging: CLOUD_LOGGING_ONLY

SHORT_SHA tags each image with the commit that produced it, making every Cloud Run revision traceable to the exact source. The Gemini API key is not passed to the pipeline — it was provisioned into Secret Manager by the initial terraform apply. The deploy step only updates the image tag on the existing service.

📸 Screenshot: Cloud Build history showing a successful pipeline run — four steps, all green, with the build duration and trigger commit visible.

Running and Verifying

Health check first:

curl https://YOUR_CLOUD_RUN_URL/health
# → {"status": "ok"}

Then a natural language query:

curl -X POST https://YOUR_CLOUD_RUN_URL/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the top 5 product categories by total revenue?"}'

The response includes the generated SQL alongside the results:

{
  "question": "What are the top 5 product categories by total revenue?",
  "sql": "SELECT p.category, SUM(oi.sale_price) AS total_revenue\nFROM `bigquery-public-data.thelook_ecommerce.order_items` oi\nJOIN `bigquery-public-data.thelook_ecommerce.products` p\nON oi.product_id = p.id\nWHERE oi.status NOT IN ('Cancelled', 'Returned')\nGROUP BY p.category\nORDER BY total_revenue DESC\nLIMIT 5",
  "rows": [
    { "category": "Outerwear & Coats",             "total_revenue": 2847391.23 },
    { "category": "Jeans",                         "total_revenue": 2614820.57 },
    { "category": "Suits & Sport Coats",           "total_revenue": 2398104.91 },
    { "category": "Swim",                          "total_revenue": 2144739.84 },
    { "category": "Fashion Hoodies & Sweatshirts", "total_revenue": 1987623.19 }
  ],
  "bytes_scanned": 41943040,
  "latency_ms": 1843
}

Pull structured logs to verify observability is working:

gcloud logging read \
  'resource.type="cloud_run_revision" AND jsonPayload.message="Query complete"' \
  --limit=5 --format=json
📸 Screenshot: the JSON response from Cloud Run showing the generated SQL and result rows, the BigQuery job history in Cloud Console with bytes billed visible, and the Cloud Logging entry with bytes_scanned and latency_ms as structured fields.

Where to Take It Next

Replace the API key with Vertex AI — swap google.generativeai for vertexai.generative_models.GenerativeModel, initialised with vertexai.init(project=os.environ["PROJECT_ID"], location="us-central1"). Authentication moves entirely to ADC: the same Workload Identity token that authenticates BigQuery calls also authenticates Vertex AI. Remove the Secret Manager secret and GEMINI_API_KEY from the Cloud Run definition, add roles/aiplatform.user to the service account. No API key to provision, rotate, or track.

Runtime schema introspection — query INFORMATION_SCHEMA.COLUMNS at service startup to build the schema context from the live dataset definition. The system instruction stays accurate as tables or columns change, without a code deployment to update the prompt.


References