Data migrations have a way of looking fine right up until they are not. The new system accepts the import without errors, the row count matches, and a spot-check of ten records looks clean. Then three weeks after go-live someone discovers that all phone numbers were stripped of their leading plus sign, or that a free-text field was silently truncated at 255 characters, or that 400 records tied to a deleted foreign key were quietly dropped rather than migrated. A CSV diff check catches these issues before they become production incidents.
Why Migrations Go Wrong
- –Character encoding mismatches — UTF-8 source data imported into a Latin-1 table mangles accented characters and emoji.
- –Field length truncation — a migration script that maps a VARCHAR(500) source to a VARCHAR(255) destination will silently cut off long values.
- –Type coercion bugs — dates stored as strings may be reformatted or zeroed out when cast to a DATE column.
- –Missing related records — records that depend on a parent row that failed to migrate are often silently skipped rather than raising an error.
- –Default value substitution — nullable fields mapped to NOT NULL destinations may be filled with 0 or empty string instead of being flagged.
- –Soft-delete confusion — records with a deleted_at timestamp are sometimes excluded from the migration when they should be preserved.
The Export-Diff-Verify Workflow
The core idea is straightforward: export a snapshot from the source system before the migration runs, run the migration, export the same dataset from the destination system, then diff the two exports. Any discrepancy between source and destination is a potential bug. This sounds simple, but the execution details matter a lot.
Step 1 — Export From the Source
Export all columns you care about from the source system. Include the primary key explicitly — you will need it as a match key. Use UTF-8 encoding if possible, and note the encoding if not. Avoid transforming data at export time; you want raw values. Save the file immediately before the migration begins so the snapshot is accurate.
Step 2 — Run the Migration
Run your migration script or ETL process as normal. Do not modify the source export after this point — it is your ground truth.
Step 3 — Export From the Destination
Export the same table from the destination system using the same column list. Use the same encoding. If the destination system renames columns (e.g., firstName becomes first_name), either rename them back in the export or note the mapping — you will need column names to match for an automated diff, or you can alias them in your SELECT.
Step 4 — Diff the Two Files
Upload both CSVs to csvdiff.app, select your primary key column, and run the diff. A clean migration will show zero changes. In practice, you will almost always see something — and that is exactly the point.
What to Look For in the Diff Output
- –Unexpected removed rows — records present in the source but absent in the destination. These were dropped and need to be investigated immediately.
- –Unexpected added rows — records in the destination that were not in the source. Could indicate a merge with existing data or a script error that inserted duplicates.
- –Modified rows with truncated values — look for text fields that end abruptly at a round character count like 255 or 500. Classic sign of a field length mismatch.
- –Modified rows with changed special characters — é becoming è, or characters replaced with question marks, signals an encoding problem.
- –Modified rows with date reformatting — 2024-01-15 becoming 15/01/2024 or 1705276800 means the destination is storing dates differently than expected.
- –Numeric fields with added decimal places or rounding — currency values are especially prone to floating-point drift during type casting.
| Status | id | note | created |
|---|---|---|---|
| Modified | 4112 | Customer requested expedited shipping due to upcoming product launch on May 12→Customer requested expedited shipping due to upcoming… | 2024-01-15 |
| Modified | 4118 | Café owner — prefers email contact→Caf? owner ? prefers email contact | 2024-01-16 |
| Modified | 4131 | Renewal pending review | 2024-01-15→15/01/2024 |
| Removed | 4147 | Soft-deleted account, 2-year retention | 2023-11-02 |
Handling Large Tables
For very large tables, diffing the entire dataset may be impractical. A pragmatic approach is to sample strategically: diff the 1,000 most recently modified records (likely to surface recent activity patterns), the 1,000 oldest records (likely to surface legacy edge cases), and a random 1,000 from the middle. If all three samples are clean, confidence is reasonably high. If any sample shows issues, you have found the pattern to investigate.
Row count matching is necessary but not sufficient. A migration that drops 50 rows and creates 50 new ones will show a matching count but a disastrous diff. Always diff, never just count.
Using csvdiff.app for Migration Validation
csvdiff.app processes both files entirely in the browser — your pre-migration and post-migration exports never leave your machine. This matters for sensitive data: customer PII, financial records, health information. You get a detailed row-level diff with old and new values side by side, filterable by status, and exportable as a CSV change log that you can attach to your migration sign-off documentation.
Making It Repeatable
For migrations that happen in stages or that need to be rolled back and re-run, save your export scripts and diff workflow as documented steps. A two-minute diff check before each stage sign-off turns migration validation from a stressful manual review into a predictable, evidence-based gate — and gives you an audit trail if questions arise later.