All posts
Use case6 min read

Validating a Data Migration with a CSV Diff Check

Before you go live, export a before/after snapshot and diff them. Here's the exact workflow that catches migration bugs in minutes.

Data migrations have a way of looking fine right up until they are not. The new system accepts the import without errors, the row count matches, and a spot-check of ten records looks clean. Then three weeks after go-live someone discovers that all phone numbers were stripped of their leading plus sign, or that a free-text field was silently truncated at 255 characters, or that 400 records tied to a deleted foreign key were quietly dropped rather than migrated. A CSV diff check catches these issues before they become production incidents.

Why Migrations Go Wrong

  • Character encoding mismatches — UTF-8 source data imported into a Latin-1 table mangles accented characters and emoji.
  • Field length truncation — a migration script that maps a VARCHAR(500) source to a VARCHAR(255) destination will silently cut off long values.
  • Type coercion bugs — dates stored as strings may be reformatted or zeroed out when cast to a DATE column.
  • Missing related records — records that depend on a parent row that failed to migrate are often silently skipped rather than raising an error.
  • Default value substitution — nullable fields mapped to NOT NULL destinations may be filled with 0 or empty string instead of being flagged.
  • Soft-delete confusion — records with a deleted_at timestamp are sometimes excluded from the migration when they should be preserved.

The Export-Diff-Verify Workflow

The core idea is straightforward: export a snapshot from the source system before the migration runs, run the migration, export the same dataset from the destination system, then diff the two exports. Any discrepancy between source and destination is a potential bug. This sounds simple, but the execution details matter a lot.

Step 1 — Export From the Source

Export all columns you care about from the source system. Include the primary key explicitly — you will need it as a match key. Use UTF-8 encoding if possible, and note the encoding if not. Avoid transforming data at export time; you want raw values. Save the file immediately before the migration begins so the snapshot is accurate.

Step 2 — Run the Migration

Run your migration script or ETL process as normal. Do not modify the source export after this point — it is your ground truth.

Step 3 — Export From the Destination

Export the same table from the destination system using the same column list. Use the same encoding. If the destination system renames columns (e.g., firstName becomes first_name), either rename them back in the export or note the mapping — you will need column names to match for an automated diff, or you can alias them in your SELECT.

Step 4 — Diff the Two Files

Upload both CSVs to csvdiff.app, select your primary key column, and run the diff. A clean migration will show zero changes. In practice, you will almost always see something — and that is exactly the point.

What to Look For in the Diff Output

  • Unexpected removed rows — records present in the source but absent in the destination. These were dropped and need to be investigated immediately.
  • Unexpected added rows — records in the destination that were not in the source. Could indicate a merge with existing data or a script error that inserted duplicates.
  • Modified rows with truncated values — look for text fields that end abruptly at a round character count like 255 or 500. Classic sign of a field length mismatch.
  • Modified rows with changed special characters — é becoming è, or characters replaced with question marks, signals an encoding problem.
  • Modified rows with date reformatting — 2024-01-15 becoming 15/01/2024 or 1705276800 means the destination is storing dates differently than expected.
  • Numeric fields with added decimal places or rounding — currency values are especially prone to floating-point drift during type casting.
Statusidnotecreated
Modified4112Customer requested expedited shipping due to upcoming product launch on May 12Customer requested expedited shipping due to upcoming…2024-01-15
Modified4118Café owner — prefers email contactCaf? owner ? prefers email contact2024-01-16
Modified4131Renewal pending review2024-01-1515/01/2024
Removed4147Soft-deleted account, 2-year retention2023-11-02
A real migration diff exposing three classic bugs: truncation, encoding, and date reformatting.

Handling Large Tables

For very large tables, diffing the entire dataset may be impractical. A pragmatic approach is to sample strategically: diff the 1,000 most recently modified records (likely to surface recent activity patterns), the 1,000 oldest records (likely to surface legacy edge cases), and a random 1,000 from the middle. If all three samples are clean, confidence is reasonably high. If any sample shows issues, you have found the pattern to investigate.

Row count matching is necessary but not sufficient. A migration that drops 50 rows and creates 50 new ones will show a matching count but a disastrous diff. Always diff, never just count.

Using csvdiff.app for Migration Validation

csvdiff.app processes both files entirely in the browser — your pre-migration and post-migration exports never leave your machine. This matters for sensitive data: customer PII, financial records, health information. You get a detailed row-level diff with old and new values side by side, filterable by status, and exportable as a CSV change log that you can attach to your migration sign-off documentation.

Making It Repeatable

For migrations that happen in stages or that need to be rolled back and re-run, save your export scripts and diff workflow as documented steps. A two-minute diff check before each stage sign-off turns migration validation from a stressful manual review into a predictable, evidence-based gate — and gives you an audit trail if questions arise later.

Try it yourself

Ready to diff your files?

Upload two CSV files and see the differences in seconds. 100% client-side — your data never leaves the browser.

Start comparing →

Ready to diff?

Drop your files, see the deltas, export the merge. Takes 30 seconds.