SEO Services
Financial Services Industry / Technical Guide
Automate KYC/AML Customer Onboarding for Financial Services with n8n + Claude Step-by-Step Guide
A complete walkthrough for building an n8n + Claude pipeline that captures onboarding submissions, verifies identity documents with Claude vision, runs sanctions and PEP screening through ComplyAdvantage, scores customer risk, surfaces SAR triggers for the BSA officer, and writes everything to a tamper-evident audit log — FINRA Rule 3310 aware, end to end in under 4 hours.
Onboarding form + doc upload
ID extraction (Claude vision)
Sanctions / PEP screening
Risk scoring (Claude rubric)
SAR trigger detection
Human review queue
Salesforce + Plaid sync
Tamper-evident audit log
2. System Architecture
3. Intake & Doc Upload
4. ID Extraction (Vision)
5. Sanctions / PEP Screen
6. Risk Scoring Rubric
7. SAR Trigger Detection
8. Human Review Queue
9. Salesforce + Plaid Sync
10. Audit Log
11. Common Failures
12. FINRA / BSA / GLBA
13. Measured Results
14. Timeline & Cost
15. FAQ
1. The Problem — Why KYC Onboarding Eats Compliance Teams Alive
Every regulated financial institution hits the same operational ceiling once new-account volume crosses ~400 per month: the BSA officer’s queue stops moving. ID images sit in a shared drive waiting for a human to read them. ComplyAdvantage hits get re-screened manually because nobody trusts the false-positive rate. The 30-day Customer Identification Program window starts ticking the moment the customer first interacts, and most teams burn through 7 of those 30 days on data entry alone. Meanwhile FINRA examiners ask for the audit trail and the team can’t reconstruct who approved what, when, or why.
Real numbers from a mid-size broker-dealer (350 new accounts/month)
| New account submissions (per month) | ~350 |
| Median onboarding turnaround | 7.2 days |
| Sanctions/PEP false positives requiring analyst review | 38% |
| Onboarding abandonments at the document step | 22% |
| Time-to-reconstruct an audit trail (per request) | 3.5 hrs |
The brutal asymmetry: about 4% of submissions actually need human compliance attention (real PEP matches, blurry IDs, structured-deposit patterns). The other 96% are clean and could clear in minutes. The system needs to widen the gap between those two queues, fast on the clean ones, deeply attentive on the rest, with a defensible record of who decided what and what evidence drove the decision.
What “automation” means here (and doesn’t)
Automating KYC/AML is not letting an LLM make compliance decisions. It is letting an LLM do the high-volume, low-judgement work — OCR on a passport, fuzzy-name normalization, rubric-based risk tier assignment, evidence packaging — so the BSA officer reviews only the cases where judgement is actually required:
- Auto-clear (low risk): domestic individual, clean ID, no sanctions/PEP hits, expected activity within bands, no adverse media. Account opens, audit log written, customer notified.
- Enhanced due diligence queue (medium): high-risk geography, complex entity structure, occupation flags, partial document mismatch. Goes to compliance analyst with Claude’s evidence pack pre-attached.
- Senior review / SAR trigger (high): sanctions confirmed match, PEP confirmed, structuring pattern, identity-doc fraud signals. Routes to BSA officer, SAR drafting workflow opens, file is locked from auto-progression.
2. System Architecture
Eight components, each replaceable. The orchestration layer is self-hosted n8n inside the institution’s VPC so the security team can audit every external call. Postgres holds the case file, the evidence packs, and the WORM-style audit log partitioned by month. Encrypted S3 holds the raw documents. Nothing about a customer leaves the perimeter except hashed identifiers to ComplyAdvantage and the redacted prompt body to Claude (with zero-data-retention enabled).
The stack
Cost estimate (1,200 onboardings/month)
| Claude Sonnet vision (1.2k ID extractions, ~3,200 tok in / 600 tok out) | ~$190 |
| Claude Sonnet risk reasoning (1.2k scoring + 4.8k re-screens) | ~$280 |
| ComplyAdvantage (sanctions + PEP + adverse media, 1.2k searches) | ~$1,440 |
| Persona ID capture (1.2k verifications with liveness) | ~$1,800 |
| Plaid (funding + transaction history) | ~$420 |
| VM + Postgres + S3 (single-tenant, in-VPC) | ~$165 |
| Total / month | ~$4,295 |
Compared to two full-time KYC analysts (~$140k/yr loaded each), the stack costs roughly 18% of the labor it displaces, and it works every weekend, every holiday, at peak quarter-end without overtime. The same orchestration approach plugs into our broader AI automation services.
Intake Form & Document Upload
Onboarding starts with a structured intake — name, DOB, SSN/ITIN (or non-US equivalent), address, occupation, source of funds, expected activity, and entity type. The customer also uploads a primary government ID and a proof-of-address document. Persona handles the front-end with liveness detection so a stolen ID image plus a static selfie cannot pass; on a successful capture Persona webhooks an encrypted bundle to n8n.
Intake fields the pipeline expects
- CIP fields: legal name, DOB, residential address, SSN/ITIN/passport. These satisfy 31 CFR 1023.220.
- Risk-relevant context: occupation + employer, source-of-funds narrative, expected monthly activity in dollars, expected counterparties (countries, instrument types).
- Entity flags: is this a beneficial-ownership-rule entity (FinCEN BO Rule)? List the 25%+ owners and the control prong.
- PEP self-declaration: the customer is asked, but the screening provider also runs an external check; mismatches are an EDD signal in their own right.
n8n webhook node — Persona payload
A single n8n Webhook node accepts the Persona payload. HMAC verification using the shared signing secret runs before the workflow does anything else; an invalid signature short-circuits to a 401 and writes a security-event row. The body contains the case ID, the structured fields, and pre-signed S3 URLs for each uploaded artifact.
ID Extraction with Claude Vision
Persona returns a liveness score and a verification verdict, but the institution still has to extract and normalize the data on the document for matching against the intake form and against the screening providers. Claude Sonnet’s vision capability reads the ID, returns structured fields, and flags suspicious artifacts (mismatched fonts, missing security features, edited regions). The extraction runs in parallel with the address-doc OCR.
What the extractor returns
passport / drivers_license / state_id / national_id with issuing country.
Full name, DOB, document number, expiration, issue date, MRZ if present.
Match between MRZ and visible field zones; flag mismatches.
Font kerning anomalies, ghosted edges, inconsistent lighting, missing security holograms.
Optional secondary check; primary match comes from Persona’s biometric layer.
Vision system prompt
Tested on a labeled sample of 4,200 historical IDs (clean, blurry, expired, edited) — pulled with privacy-team approval and re-redacted before any prompt iteration. The prompt explicitly forbids inventing fields not visible on the document.
n8n HTTP request — vision call
Sanctions, PEP & Adverse-Media Screening
Once the identity is normalized, n8n posts the canonical name (plus DOB + country to disambiguate) to ComplyAdvantage or Refinitiv World-Check. The provider screens against OFAC SDN, the EU consolidated list, UK HMT, UN, and dozens of national lists, plus PEP databases and adverse-media indexing. The raw response is verbose and noisy by design — fuzzy matching pulls in many false positives. Claude reads each hit and explains in plain English why it is or isn’t a real match.
The screening pipeline
- Name normalization — strip honorifics, transliterate non-Latin scripts, generate aliases for diminutives (“Bill” / “William”) with a deterministic library, not the LLM.
- Primary screen — ComplyAdvantage with fuzziness 0.8, returns up to 200 candidate matches.
- Disambiguation pass — Claude scores each candidate against DOB, nationality, occupation, and address. Returns confirmed_match / unlikely_match / requires_human.
- Adverse media filter — keyword + named-entity filter on the article snippets. Material categories: financial crime, terrorism, sanctions evasion, fraud convictions. Tabloid noise is dropped.
- Tier assignment — confirmed sanction or PEP routes to senior review; multiple adverse-media hits with material categories goes to EDD; no hits or only-noise hits clear.
Disambiguation prompt
Risk Scoring Rubric
Risk tier comes from a written rubric, not a black-box scorer. The institution’s MLRO defines the inputs, the weights, and the thresholds in plain English; Claude applies them. Examiners can read the same prompt the model reads. When the rubric changes (a new high-risk geography, a tightened source-of-funds policy), one file changes and a CI deploy ships it — no model re-training, no vendor escalation.
The 6 risk dimensions
FATF grey/black-list countries, sanctioned jurisdictions, high-corruption indices.
Retail individual, complex entity, trust, NGO, money-services business, cash-intensive industry.
Brokerage, options, margin, international wires, crypto on/off-ramp.
Expected activity vs declared income, structured-deposit signals, rapid pass-through indicators.
Outcome of step 3: confirmed PEP, adverse media, or clean.
Tampering signals from step 2, missing supporting docs, expired ID.
Risk-tier prompt
The reasoning string lands in the case file as the model-generated portion of the risk assessment narrative. The reviewing analyst can adopt, edit, or reject it; the audit log captures both versions. Examiners get a clean side-by-side view of what the model said and what the human decided.
SAR Trigger Detection
A SAR is filed by the BSA officer, never by an automated system. What the pipeline does is surface the patterns and the evidence in the form FinCEN expects, so the officer’s review time goes to judgement, not data assembly. The SAR-trigger detector runs against the case at onboarding and again on every monitored transaction batch.
SAR-relevant patterns at onboarding
- Identity inconsistency: name on ID does not match the intake form, multiple identities tied to one phone or device fingerprint, addresses linked to known mail-drop services.
- Source-of-funds implausibility: declared income inconsistent with funding amount, vague narrative (“savings”), unwillingness to clarify.
- Structuring indicators: initial funding pattern crafted to stay under 10k reporting thresholds, deposits arrayed across multiple accounts within 24h.
- Beneficial-owner opacity: declared owners refuse to provide ID, ownership chain runs through nominee services or shell-like entities.
- Geographic risk concentration: all funding from a high-risk jurisdiction with weak counterparty disclosure.
Trigger detector — JSON output
Human Review Queue
Three queues, three SLAs. Auto-clear cases never touch a queue at all — they get a confirmation email, a Salesforce record write, and an audit-log entry. Medium-tier (EDD) routes to compliance analysts with the model’s evidence pack pre-attached. High-tier and SAR-trigger cases route to the BSA officer with the file locked from any further automated action until they release it.
Routing matrix
| Tier | Reviewer | SLA | Customer state |
|---|---|---|---|
| Low (auto-clear) | None — system approves | Under 4 hours | Account live, monitoring active |
| Medium (EDD) | Compliance analyst (round-robin) | 2 business days | Account pending, customer notified |
| High | Senior compliance analyst | 1 business day | Account pending, no funding allowed |
| Escalate (SAR trigger) | BSA officer | Same day | Account locked, no customer-facing notice |
Slack alert payload (high tier)
Salesforce + Plaid Sync
Salesforce Financial Services Cloud is the source of truth for the customer relationship, the household, the rep, and the compliance object that holds risk tier and review history. The n8n workflow upserts the customer, attaches the related contacts (beneficial owners, authorized signatories), updates risk tier on the compliance object, and writes a relationship to the case file. Plaid is called in parallel for funding-source verification and for a 24-month transaction-history pull that feeds the expected-activity baseline.
Salesforce upsert (Composite API)
Plaid funding verification
Plaid returns the verified bank account ownership (name on the account must match the customer), institution metadata, and a 24-month transaction history. The history is summarized into expected-activity bands and written back to the compliance object so that day-one transaction monitoring has a baseline. Customers without a Plaid-supported bank fall back to micro-deposit verification with a 1-3 business day delay; the case stays in pending state until the verification clears.
Tamper-Evident Audit Log
The BSA requires records of customer identification and verification be retained for 5 years after the account closes; SAR records have their own retention regime. More importantly for examiner readiness, every event in the case lifecycle needs to be reconstructible — what the model said, what the analyst overrode, who clicked approve, what evidence was attached. A standard append-only Postgres table with hash-chained rows gives a tamper-evident log without the cost of a full WORM appliance.
Audit log schema
What gets logged
- Case opened: intake fields hash, customer-facing IP, Persona inquiry id.
- ID extracted: model version, prompt hash, structured output, confidence, quality flags.
- Screening run: provider request id, candidate-hit count, disambiguator output for each.
- Risk tier assigned: rubric prompt hash, dimension scores, final tier, EDD flag.
- SAR trigger evaluated: categories detected, severity, FinCEN red-flag references cited.
- Human review: reviewer id, decision, override-of-model flag, justification text.
- Customer state changed: any transition between pending / approved / locked / closed.
Common Failures & Fixes
Four failure modes show up in nearly every deployment. Plan for them on day one — they are cheap to design around and ruinous to retrofit.
Failure 1: PEP false positives flooding the analyst queue
Symptom: The PEP list is broad — local council members, senior university officials, retired diplomats. A common name pulls 40 candidate hits and the analyst spends an hour clearing them manually.
Fix: Run the disambiguator (step 3) before the human ever sees the queue. Auto-clear hits with DOB mismatch greater than 5 years and country mismatch on the same record. Surface to the analyst only the candidates Claude could not rule out, with the discriminating fields highlighted. Track precision monthly to make sure the auto-clear isn’t drifting.
Failure 2: Vision extraction hallucinates a clean field on a tampered ID
Symptom: A subtly altered date of birth on a passport extracts cleanly, the screening clears, the customer onboards, the discrepancy is caught months later in a manual file review.
Fix: Always cross-check the MRZ band against the visible field zone; mismatches are an automatic EDD trigger. Run the extraction twice with different prompt phrasings and require agreement on the high-stakes fields (name, DOB, document number). Keep a sample of cleared documents in a monthly QA review where a human re-extracts a 3% sample and compares to the model output.
Failure 3: Re-screening drift on existing customers
Symptom: A customer was clean at onboarding. Two years later they’re elected to a foreign legislature, ComplyAdvantage updates the PEP record, but nothing in the institution’s stack reacts because the original screening was a one-shot call.
Fix: Subscribe to ComplyAdvantage’s monitored-list webhooks for every onboarded customer. The provider notifies on any list change touching a name in your portfolio; n8n re-runs the disambiguator and re-tier. Material changes route to the BSA officer on the same day. The audit log captures the delta so the next examiner sees a continuous record, not a point-in-time snapshot.
Failure 4: Treating model-vs-analyst disagreement as a defect
Symptom: The team starts adjusting the prompt every time an analyst overrides Claude’s tier, chasing 100% agreement. The prompt becomes a Frankenstein of edge-case overrides and stops generalizing.
Fix: Disagreement is the feature, not the bug. Track agreement rate as a metric, but do not target 100%. Use overrides to find systematic gaps (a high-risk geography missing from the rubric, a customer-type weight that’s off) and patch the rubric in the prompt; do not patch the prompt to match a single analyst’s instinct. Review the override log monthly with the MLRO present.
Compliance: FINRA Rule 3310, BSA, FinCEN & GLBA
A KYC pipeline lives inside a dense regulatory perimeter. FINRA Rule 3310 sets the AML program standard for broker-dealers; the BSA and its FinCEN regulations define recordkeeping, CTR, and SAR obligations; the GLBA (with the FTC Safeguards Rule and the SEC’s Regulation S-P) governs how non-public personal information is stored and shared. State-level regimes (NYDFS Part 500 in particular) layer additional cybersecurity and incident-reporting requirements. Treat compliance as architecture, not a checklist.
What Claude sees (and doesn’t)
- Sees: name as printed on the ID, DOB, document number, address, declared occupation, declared activity bands, screening hits, document image (only on the vision call, not on the risk-scoring call).
- Doesn’t see: SSN/ITIN in full (ever — only the last-4 mask), bank account numbers, transaction details beyond aggregated bands, internal Salesforce IDs, employee names beyond the reviewing analyst.
FINRA Rule 3310 specifics
Rule 3310 requires a written AML program, a designated AML compliance officer, ongoing training, and an annual independent test. The pipeline supports each: the prompts and the rubric live in version control as the written program, the audit log evidences ongoing monitoring, the override-tracking log evidences that the AMLCO is engaged, and the independent test is run by extracting a stratified sample from the audit log and re-performing the controls.
BSA and FinCEN obligations
- CIP (31 CFR 1023.220): identity collection and verification within a reasonable time after first interaction. The pipeline targets sub-4-hour completion.
- CDD beneficial ownership rule: 25%+ owners and one control-prong individual on entity accounts. The intake form enforces these fields; the screening runs on each.
- SAR filing: within 30 days of detection, 60 if a subject is unknown. Trigger detection, evidence assembly, and BSA-officer routing happen on day one of detection.
- Recordkeeping: 5 years post-relationship for CIP, 5 years for SAR-related records, indefinite for OFAC matches under enforcement. Audit log partitioned monthly with retention policy enforced at the partition level.
GLBA + Reg S-P safeguards
- Encryption at rest: Postgres TDE, S3 with KMS keys held inside the institution’s account; vendor never holds raw documents.
- Encryption in transit: TLS 1.3 only, mutual TLS for vendor APIs where supported.
- Access controls: RBAC on n8n credentials, short-lived role assumption for the AI service, MFA on every human access path, just-in-time elevation for SAR-tier records.
- Incident response: 30-day GLBA breach window per FTC Safeguards (and the 72-hour NYDFS clock for covered entities) is a hard deadline; the audit log feeds the discovery query.
Measured Results — 90 Days In
Numbers from a real implementation at a U.S. mid-size broker-dealer (350 new accounts/month, a 4-person compliance team, one BSA officer) after the first full quarter on the new pipeline. No change in onboarding volume or risk appetite during the test — the lift comes from prioritization, evidence pre-assembly, and the audit-log discipline.
The headline metric inside the compliance team is auditor-readiness. Pulling a complete file for an examiner used to take three to four hours per case; with the audit log it’s sub-30-second SQL. The next FINRA cycle examination concluded with no findings on the AML program, the first year that’s happened in this firm’s history.
Implementation Timeline & Cost
- n8n VPC deploy + queue mode + KMS wiring: 12–18 hrs
- Persona webhook + HMAC + S3 ingestion: 8–12 hrs
- Claude vision prompt + double-extraction QA: 16–24 hrs
- ComplyAdvantage integration + disambiguator: 18–24 hrs
- Risk rubric prompt + MLRO sign-off cycle: 14–20 hrs
- SAR trigger detector + FinCEN red-flag library: 12–18 hrs
- Salesforce FSC compliance object + Plaid sync: 14–20 hrs
- Hash-chained audit log + examiner-read role: 10–14 hrs
- WSP + analyst training + tabletop exercise: 16–30 hrs
- Week 1-2: Discovery + MLRO rubric workshop + WSP draft
- Week 3: VPC infra + Persona/ComplyAdvantage integrations
- Week 4: Vision + risk + SAR-trigger prompts, backtest on 6mo of cases
- Week 5: Salesforce FSC + Plaid + hash-chained audit log
- Week 6: Pilot with 50 live cases, BSA-officer review loop
- Week 7: Full cutover, parallel manual review for 30 days
- Week 8: Independent control test + examiner-walkthrough rehearsal
- Includes: written program updates, override-rate dashboarding, monthly precision report
FAQ
Want this built for your KYC operation?
SEOKRU deploys this exact system in 8 weeks. We start with an MLRO rubric workshop, backtest the prompts against 6 months of your historical cases, wire your screening provider and Salesforce FSC, build the hash-chained audit log inside your VPC, and run a 30-day parallel review before cutover. You keep ownership of every component — workflows, prompts, Postgres, KMS keys, the lot.
Talk to a financial services automation engineer