Databricks Integration Guide
Connect Terapage to Databricks so researchers can securely import eligible participants from approved catalogs, schemas, tables, or views, map Databricks columns to Terapage participant fields, and keep research samples synced automatically.
What it does
The Databricks integration lets a Terapage workspace connect to a client-owned Databricks SQL Warehouse, discover permitted data sources, map participant columns, preview records, and sync eligible contacts into studies or tasks.
Who uses it
Research operations teams, data teams, CX teams, enterprise insight teams, and agencies whose participant, panel, customer, or product-user data already lives in Databricks.
Best for
Importing large B2B or customer panels, syncing opted-in contacts from a governed data lakehouse, and turning warehouse data into recruitable Terapage research participants.
Core idea: the client connects their own Databricks workspace. Terapage discovers accessible catalogs, schemas, tables, views, and columns, then syncs permitted participant records into Terapage. Terapage does not need to own the client’s Databricks data or pay for the client’s warehouse compute.
Before you start
- You must be a Terapage workspace admin or have permission to manage participant imports and integrations.
- Your organisation should provide a Databricks workspace URL and a SQL Warehouse that Terapage can query.
- Authenticate with a restricted, read-only identity (for example a service principal using OAuth, or a personal access token) according to your organisation’s policy.
- Only grant Terapage access to the approved catalog, schema, table, or view that contains eligible research participants.
- For best practice, expose a dedicated view such as main.research.eligible_research_participants_view instead of giving broad access to raw CRM tables.
Start from a Terapage study workspace and create a research activity before importing or syncing external participants.
1. Choose the right Terapage workflow
Open the relevant Terapage project or study workspace.
Go to Participants, Import / Sync Participants, or create a research activity if the import is tied to a specific activity.
Select Databricks as the participant source. Terapage will guide the researcher through connection, source discovery, mapping, preview, and activation.
The same import pattern can support file uploads, video meeting imports, and structured data integrations such as Databricks.
2. Connect Databricks
The researcher or workspace admin enters the Databricks workspace host, the SQL Warehouse connection details (such as its HTTP path), and a permitted authentication method. Terapage should test the connection before enabling any source discovery or sync.
A clear connection screen helps admins confirm the warehouse, identity, access scope, and sync mode before data is read.
Recommended security posture: ask the client to create a read-only identity with access only to the approved participant view. Terapage should store connection details encrypted and should never store tokens in plain text.
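A minimal sketch of the connection test, assuming the admin supplies a server hostname, a SQL Warehouse HTTP path, and an access token. `DatabricksConnectionConfig` and `check_connection` are hypothetical names. `connect` is injected so the sketch runs without a warehouse; in practice it could be `databricks.sql.connect` from the Databricks SQL Connector for Python. Encryption at rest is not shown. The token is only excluded from `repr()` so it cannot leak into logs by accident.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class DatabricksConnectionConfig:
    """Hypothetical shape of the connection details a workspace admin enters."""
    server_hostname: str  # bare host, e.g. "adb-1234567890123456.7.azuredatabricks.net"
    http_path: str        # SQL Warehouse HTTP path, e.g. "/sql/1.0/warehouses/abc123"
    access_token: str = field(repr=False)  # kept out of repr() so it never lands in logs

    def validate(self) -> None:
        if "://" in self.server_hostname or "/" in self.server_hostname:
            raise ValueError("server_hostname must be a bare host name, without scheme or path")
        if not self.http_path.startswith("/sql/1.0/warehouses/"):
            raise ValueError("http_path does not look like a SQL Warehouse path")
        if not self.access_token:
            raise ValueError("access_token is required")


def check_connection(config: DatabricksConnectionConfig, connect: Callable[..., Any]) -> bool:
    """Run a trivial read-only query; return True only if it succeeds."""
    config.validate()  # malformed input raises ValueError so the admin can fix the form
    try:
        with connect(
            server_hostname=config.server_hostname,
            http_path=config.http_path,
            access_token=config.access_token,
        ) as connection:
            with connection.cursor() as cursor:
                cursor.execute("SELECT 1")
                return cursor.fetchone()[0] == 1
    except Exception:
        # Network, authentication, or permission failures keep discovery and sync disabled.
        return False
```

Separating the two failure modes lets the UI show a form error for bad input and a plain "connection failed" state for everything else.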
3. Discover catalogs, schemas, tables, views, and columns
After the connection test passes, Terapage should discover only the objects that the connected identity is permitted to access. In Unity Catalog environments, discovery can be driven by Databricks metadata such as the information schema views (for example, the tables and columns views).
| Discovery step | What Terapage shows | Why it matters |
|---|---|---|
| Catalog and schema discovery | Available catalogs and schemas, such as main.research. | Prevents the researcher from manually typing technical source names. |
| Table and view discovery | Approved sources such as participants, panel_members, customers, or eligible_research_participants_view. | Lets the client expose a safe, curated participant dataset. |
| Column discovery | Columns such as participant_id, email, first_name, phone, segment, consent_status, and last_updated_at. | Enables a dynamic mapping UI without hardcoding the client’s schema. |
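One way to drive discovery, sketched under assumptions: the query reads `system.information_schema.columns`, which in Unity Catalog is expected to list only columns the connected identity can see, and `build_source_tree` (a hypothetical helper) groups the result rows into the catalog, schema, table, and column hierarchy the picker needs. Check the information schema details against your Databricks version.

```python
from collections import defaultdict

# Assumed discovery query; verify the view and column names for your workspace.
DISCOVERY_SQL = """
SELECT table_catalog, table_schema, table_name, column_name, data_type
FROM system.information_schema.columns
WHERE table_catalog <> 'system'
  AND table_schema <> 'information_schema'
ORDER BY table_catalog, table_schema, table_name, ordinal_position
"""


def build_source_tree(rows):
    """Group (catalog, schema, table, column, data_type) rows into
    catalog -> schema -> table -> [(column, data_type), ...] for the picker UI."""
    tree = defaultdict(lambda: defaultdict(dict))
    for catalog, schema, table, column, data_type in rows:
        tree[catalog][schema].setdefault(table, []).append((column, data_type))
    # Convert to plain dicts so the result serialises cleanly to JSON.
    return {c: {s: dict(t) for s, t in schemas.items()} for c, schemas in tree.items()}
```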
4. Map Databricks columns to Terapage participant fields
Terapage should provide a reusable mapping engine so researchers can connect standard participant fields and optional custom attributes without needing engineering support.
Mapping should support required participant fields, custom attributes, consent filters, source IDs, and last-updated columns.
Core participant fields
First name, last name, email, phone, company, country, city, job title, segment, consent status, external source ID, and last updated timestamp.
Custom attributes
Researchers can map columns such as customer tier, product used, persona, region, account type, subscription plan, or panel group into Terapage attributes.
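The mapping engine can be sketched as a dictionary from Terapage field to Databricks column, checked against the discovered columns before it is saved. The field identifiers, the `custom.` prefix for custom attributes, and the set of required fields are assumptions for illustration.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical identifiers for the core participant fields listed above.
CORE_FIELDS = {
    "first_name", "last_name", "email", "phone", "company", "country", "city",
    "job_title", "segment", "consent_status", "external_source_id", "last_updated_at",
}
# Assumption: a sync cannot run unless these fields are mapped.
REQUIRED_FIELDS = {"external_source_id", "email", "consent_status", "last_updated_at"}


def validate_mapping(mapping: Mapping[str, str], discovered_columns: Iterable[str]) -> None:
    """mapping is {terapage_field: databricks_column}; custom attributes use a 'custom.' prefix."""
    columns = set(discovered_columns)
    missing = REQUIRED_FIELDS - set(mapping)
    if missing:
        raise ValueError(f"unmapped required fields: {sorted(missing)}")
    for field_name, column in mapping.items():
        if field_name not in CORE_FIELDS and not field_name.startswith("custom."):
            raise ValueError(f"unknown Terapage field: {field_name}")
        if column not in columns:
            raise ValueError(f"column {column!r} was not discovered in the source")


def apply_mapping(row: Mapping[str, object], mapping: Mapping[str, str]) -> Dict[str, object]:
    """Translate one Databricks row (column -> value) into Terapage fields."""
    return {field_name: row.get(column) for field_name, column in mapping.items()}
```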
5. Build a safe query without exposing raw SQL
For the MVP, Terapage should not expose a raw SQL editor to researchers. Instead, the interface should let them choose the source, columns, filters, consent field, external ID, last updated field, and destination study or task. Terapage then generates a safe read-only query.
```sql
-- ? is bound to the last_updated_at watermark from the previous successful sync.
SELECT participant_id, first_name, last_name, email, phone, company,
       country, city, segment, consent_status, last_updated_at
FROM main.research.eligible_research_participants_view
WHERE email IS NOT NULL
  AND consent_status = 'Opted In'
  AND last_updated_at > ?
ORDER BY last_updated_at ASC
LIMIT 10000;
```
- Allow SELECT queries only.
- Use only approved catalogs, schemas, tables, views, and discovered columns.
- Block UPDATE, DELETE, INSERT, MERGE, DROP, ALTER, and COPY.
- Apply row limits, pagination, logging, and retry controls.
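A sketch of a query builder that follows these rules. Identifiers come only from the approved source list and the discovered columns and are backtick-quoted; values (the consent value and the last-sync watermark) are bound as positional `?` parameters to match the query above, which you may need to adapt to your Databricks client's parameter style. `build_sync_query` and `assert_read_only` are hypothetical names, and the keyword check is defence in depth rather than the main safeguard.

```python
import re
from typing import List, Sequence, Set, Tuple

# Statement keywords the builder must never emit.
BLOCKED = re.compile(
    r"\b(UPDATE|DELETE|INSERT|MERGE|DROP|ALTER|COPY|CREATE|GRANT|TRUNCATE)\b",
    re.IGNORECASE,
)


def quote_ident(name: str) -> str:
    """Quote one identifier part with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def assert_read_only(sql: str) -> None:
    """Reject anything that is not a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith("SELECT") or ";" in stripped:
        raise ValueError("only single SELECT statements are allowed")
    if BLOCKED.search(stripped):
        raise ValueError("statement contains a blocked keyword")


def build_sync_query(
    source: str,
    columns: Sequence[str],
    consent_column: str,
    updated_column: str,
    since: object,
    approved_sources: Set[str],
    discovered_columns: Set[str],
    not_null: Sequence[str] = (),
    consent_value: str = "Opted In",
    max_rows: int = 10_000,
) -> Tuple[str, List[object]]:
    """Return (sql, params) for one incremental, read-only sync page."""
    if source not in approved_sources:
        raise ValueError(f"source {source!r} is not approved")
    unknown = (set(columns) | set(not_null) | {consent_column, updated_column}) - discovered_columns
    if unknown:
        raise ValueError(f"columns not discovered: {sorted(unknown)}")
    if not isinstance(max_rows, int) or not 0 < max_rows <= 10_000:
        raise ValueError("max_rows must be an integer between 1 and 10000")

    table = ".".join(quote_ident(part) for part in source.split("."))
    conditions = [f"{quote_ident(c)} IS NOT NULL" for c in not_null]
    conditions += [f"{quote_ident(consent_column)} = ?", f"{quote_ident(updated_column)} > ?"]
    sql = (
        f"SELECT {', '.join(quote_ident(c) for c in columns)} FROM {table} "
        f"WHERE {' AND '.join(conditions)} "
        f"ORDER BY {quote_ident(updated_column)} ASC LIMIT {max_rows}"
    )
    assert_read_only(sql)
    # Values travel as bound parameters, never as SQL text.
    return sql, [consent_value, since]
```

Because rows that share the watermark timestamp can be split across a `LIMIT` boundary, a production job might page with `>=` and rely on deduplication; this sketch keeps the `>` from the query above.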
6. Preview records, choose sync behaviour, and activate
Before activation, Terapage should show a preview of matched rows and explain exactly what will happen when new eligible participants are found.
- Import only — add records to Terapage without inviting anyone.
- Import into participant pool — add eligible participants to the workspace pool.
- Import and invite to selected study — add participants and send the study invitation.
- Import and invite to selected task — add participants directly to a research activity or task.
- Manual, hourly, or daily sync — choose how often Terapage checks Databricks for new or updated rows.
Terapage can follow the same simple import pattern: source, options, confirmation, and processing status.
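The sync options could be modelled as two enums plus a small settings object, sketched here in Python. The names are hypothetical. The rule that invite actions need a destination and an explicit confirmation reflects the requirement that auto-invite is confirmed before it is enabled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class SyncAction(Enum):
    IMPORT_ONLY = "import_only"
    IMPORT_TO_POOL = "import_to_pool"
    IMPORT_AND_INVITE_STUDY = "import_and_invite_study"
    IMPORT_AND_INVITE_TASK = "import_and_invite_task"


class SyncFrequency(Enum):
    MANUAL = None                # runs only when a researcher triggers it
    HOURLY = timedelta(hours=1)
    DAILY = timedelta(days=1)


@dataclass
class SyncSettings:
    action: SyncAction
    frequency: SyncFrequency
    destination_id: Optional[str] = None  # hypothetical study or task identifier
    invite_confirmed: bool = False         # admin confirmed participants may be contacted

    def validate(self) -> None:
        inviting = self.action in (SyncAction.IMPORT_AND_INVITE_STUDY, SyncAction.IMPORT_AND_INVITE_TASK)
        if inviting and not self.destination_id:
            raise ValueError("invite actions need a destination study or task")
        if inviting and not self.invite_confirmed:
            raise ValueError("auto-invite requires an explicit contact-permission confirmation")

    def is_due(self, last_run: Optional[datetime], now: datetime) -> bool:
        """Manual syncs never run on a schedule; others run once their interval has elapsed."""
        interval = self.frequency.value
        if interval is None:
            return False
        return last_run is None or now - last_run >= interval
```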
7. Hourly sync and deduplication
Once activated, Terapage should run a scheduled sync job that:
- Connects to the client’s SQL Warehouse.
- Loads the saved mapping.
- Queries changed records.
- Validates required fields and applies consent rules.
- Deduplicates, then imports or updates participants.
- Optionally sends invites.
- Writes a sync log.
Every sync should leave an audit trail so admins can see imports, updates, skipped records, consent blocks, errors, and invitation activity.
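A minimal sketch of one sync batch, assuming rows have already been mapped to Terapage field names and that an in-memory dictionary stands in for Terapage storage. It validates required fields, applies the consent rule, upserts by external ID only, counts outcomes for the sync log, and returns a new watermark for the next run's `last_updated_at > ?` filter. Invitations and the email and phone fallbacks are left out, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Mapping, Optional, Tuple


@dataclass
class SyncLog:
    """Counts an admin would see in the sync audit trail."""
    imported: int = 0
    updated: int = 0
    skipped_invalid: int = 0
    blocked_consent: int = 0


def run_sync_batch(
    rows: List[Mapping[str, object]],
    participants: Dict[str, dict],
    watermark: Optional[datetime],
    opted_in_value: str = "Opted In",
) -> Tuple[SyncLog, Optional[datetime]]:
    """Apply one batch of mapped rows to a hypothetical in-memory participant store
    keyed by external_source_id; email and phone fallbacks are omitted for brevity."""
    log = SyncLog()
    new_watermark = watermark
    for row in rows:
        updated_at = row.get("last_updated_at")
        # Advance the watermark for every row read, so skipped rows are not re-read each run.
        if isinstance(updated_at, datetime) and (new_watermark is None or updated_at > new_watermark):
            new_watermark = updated_at
        ext_id, email = row.get("external_source_id"), row.get("email")
        if not ext_id or not email or updated_at is None:
            log.skipped_invalid += 1
            continue
        if row.get("consent_status") != opted_in_value:
            # Assumption: rows without opt-in are neither imported nor used to update records.
            log.blocked_consent += 1
            continue
        if ext_id in participants:
            participants[ext_id].update(row)
            log.updated += 1
        else:
            participants[ext_id] = dict(row)
            log.imported += 1
    return log, new_watermark
```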
| Priority | Match on |
|---|---|
| 1 | Databricks connection ID + source table or view + external ID value. |
| 2 | Email address. |
| 3 | Phone number. |
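The deduplication order can be sketched as a lookup that tries each key in priority order and stops at the first hit. `ParticipantIndex` is a hypothetical in-memory stand-in, and the normalisation rules (trimmed, lower-cased email; digits-only phone) are assumptions. A production system would likely use a dedicated phone-number library.

```python
import re
from typing import Dict, Optional, Tuple


def normalise_email(email: Optional[str]) -> Optional[str]:
    """Trim and lower-case so 'Ann@Example.com ' and 'ann@example.com' match."""
    return email.strip().lower() if email else None


def normalise_phone(phone: Optional[str]) -> Optional[str]:
    """Keep digits only; a simplification that ignores national vs international formats."""
    digits = re.sub(r"\D", "", phone) if phone else ""
    return digits or None


class ParticipantIndex:
    """In-memory lookup applying the priority: source ID, then email, then phone."""

    def __init__(self) -> None:
        self.by_source_id: Dict[Tuple[str, str, str], str] = {}
        self.by_email: Dict[str, str] = {}
        self.by_phone: Dict[str, str] = {}

    def find(self, connection_id: str, source: str, external_id: Optional[str],
             email: Optional[str], phone: Optional[str]) -> Optional[str]:
        """Return the first matching Terapage participant ID, or None for a new participant."""
        if external_id:
            match = self.by_source_id.get((connection_id, source, external_id))
            if match:
                return match
        key = normalise_email(email)
        if key and key in self.by_email:
            return self.by_email[key]
        key = normalise_phone(phone)
        if key and key in self.by_phone:
            return self.by_phone[key]
        return None

    def add(self, participant_id: str, connection_id: str, source: str,
            external_id: Optional[str], email: Optional[str], phone: Optional[str]) -> None:
        if external_id:
            self.by_source_id[(connection_id, source, external_id)] = participant_id
        key = normalise_email(email)
        if key:
            self.by_email.setdefault(key, participant_id)  # first owner of an email keeps it
        key = normalise_phone(phone)
        if key:
            self.by_phone.setdefault(key, participant_id)
```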
8. Best-practice client setup
For the cleanest and safest implementation, ask clients to expose a dedicated eligible participant view in Databricks.
```sql
-- Expose only opted-in participants with an email address to Terapage.
CREATE VIEW main.research.eligible_research_participants_view AS
SELECT
  customer_id AS participant_id,
  first_name,
  last_name,
  email,
  phone,
  company,
  country,
  city,
  segment,
  consent_status,
  last_updated_at
FROM main.crm.customers
WHERE consent_status = 'Opted In'
  AND email IS NOT NULL;
```
Why this works well: the client controls exactly what Terapage can see, the researcher gets a cleaner import experience, and Terapage can safely automate participant updates without browsing sensitive raw tables.
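To keep the least-privilege setup from Before you start, the client's admin could grant the Terapage identity access to this view and nothing else. The Python helper below only builds Unity Catalog GRANT statements for an admin to review and run; `minimal_grant_statements` and the principal name are hypothetical. Permission to use the SQL Warehouse itself is assigned separately in workspace permissions (Can use), not by these statements.

```python
from typing import List


def quote(part: str) -> str:
    """Backtick-quote a Databricks identifier or principal name."""
    return "`" + part.replace("`", "``") + "`"


def minimal_grant_statements(view: str, principal: str) -> List[str]:
    """Grant `principal` just enough to read one view: USE CATALOG, USE SCHEMA, SELECT."""
    catalog, schema, name = view.split(".")  # expects a three-part name
    who = quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {quote(catalog)} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {quote(catalog)}.{quote(schema)} TO {who}",
        # Assumption: ON TABLE is accepted for views in Unity Catalog; check your GRANT syntax.
        f"GRANT SELECT ON TABLE {quote(catalog)}.{quote(schema)}.{quote(name)} TO {who}",
    ]
```

No statement touches main.crm.customers, so the Terapage identity never receives a grant on the raw CRM table.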
What Terapage can support
| Capability | Description | Status |
|---|---|---|
| Databricks SQL Warehouse connection | Connect using a client-provided workspace host, warehouse details, and permitted credentials. | Recommended MVP |
| Catalog, schema, table, and view discovery | Show only sources accessible to the connected read-only identity. | Recommended MVP |
| Dynamic field mapping | Map Databricks columns to standard and custom Terapage participant attributes. | Recommended MVP |
| Safe SELECT query builder | Generate controlled queries from UI selections rather than exposing raw SQL. | Recommended MVP |
| Hourly sync | Import new and updated eligible participants automatically. | Recommended MVP |
| Auto-invite | Send study or task invitations after consent confirmation and eligibility checks. | Recommended MVP |
| OAuth / service principal support | Support more enterprise-grade authentication options. | Advanced |
Important: the client remains responsible for ensuring that imported participants can be contacted for the relevant research purpose. Terapage should require a confirmation before auto-invite is enabled.