
How SSA Baby Name Data Works

Where the data comes from, what it captures, and what it doesn't.

Key Takeaway

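SSA name data comes from Social Security card applications and covers roughly 98% of recent US births, with records reaching back to 1880 at lower coverage. For privacy, a name is published only if at least 5 babies received it in a given year: nationwide for the national file, and within a single state for the state files. The data tracks first names only, and counts are grouped by birth year rather than by the year the application was filed.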

The Data Source

Every year, the Social Security Administration publishes a dataset of baby name frequencies derived from SSN applications. When parents apply for their child's Social Security number — which most do shortly after birth — the child's first name is recorded. The SSA aggregates millions of these records into public datasets showing how many babies received each name, broken down by year, state, and gender.

This makes SSA data the most comprehensive source of baby name information in the United States. NameAlmanac uses this data to show trending names, historical popularity charts, and state-by-state name patterns.

What the Data Shows

For each name, NameAlmanac provides:

  • Annual frequency: How many babies received the name each year, going back to 1880.
  • National rank: Where the name stands relative to all other names in a given year.
  • State popularity: Which states use the name most frequently, revealing regional naming preferences.
  • Trend direction: Whether the name is rising, falling, or stable in popularity.
  • Gender distribution: Some names are used for both genders — the data shows the split.
  • Decade patterns: How naming preferences shift across generations.

Browse all names or check specific years to explore the data.
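
As a rough illustration of how annual frequency and national rank can be derived, the sketch below parses rows in the `Name,Sex,Count` layout used by the SSA's national yearly files and ranks names within one sex. The sample counts are invented, the function names `parse_rows` and `rank_names` are hypothetical, and the tie handling (equal counts share a rank) is an assumption rather than a documented SSA or NameAlmanac rule.

```python
import csv
import io

# Invented sample in the national-file layout: Name,Sex,Count.
# In the real data, each birth year has its own file of such rows.
SAMPLE = """Olivia,F,150
Emma,F,140
Ava,F,140
Liam,M,200
Noah,M,180
"""


def parse_rows(text):
    """Return a list of (name, sex, count) tuples from CSV text."""
    return [(name, sex, int(count)) for name, sex, count in csv.reader(io.StringIO(text))]


def rank_names(rows, sex):
    """Rank names of one sex by count, highest first.

    Assumption: names with equal counts share the same rank.
    """
    subset = sorted((r for r in rows if r[1] == sex), key=lambda r: (-r[2], r[0]))
    ranks = {}
    prev_count, prev_rank = None, 0
    for position, (name, _, count) in enumerate(subset, start=1):
        rank = prev_rank if count == prev_count else position
        ranks[name] = rank
        prev_count, prev_rank = count, rank
    return ranks


rows = parse_rows(SAMPLE)
print(rank_names(rows, "F"))  # {'Olivia': 1, 'Ava': 2, 'Emma': 2}
```

Ranking within each sex separately mirrors the way the dataset splits counts by gender; a combined ranking would need the counts for a name summed across both sexes first.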

Understanding the Limitations

SSA data is excellent but has limitations worth understanding:

  • Privacy threshold: Names given to fewer than 5 babies nationwide in a year are excluded from the national file, and the state files apply the same floor to each state-year combination. Very rare names therefore do not appear in the data at all.
  • First names only: Middle names, hyphenated names, and suffixes are not tracked in the public dataset.
  • Spelling matters: "Catherine," "Katherine," and "Kathryn" are counted as three separate names. Combined, they might rank much higher than any individual spelling.
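  • Historical coverage: Data for births before 1937 is less complete because Social Security numbers were not issued until late 1936. Those records come from people who applied for SSNs later in life.
  • Registration lag: A baby's SSN application may be filed weeks or months after birth. Counts are grouped by birth year, not application year, so a late filing still lands in the correct year. Applications filed after a release's cutoff date only show up in later releases, which is why recent-year counts can shift slightly.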

How NameAlmanac Uses This Data

NameAlmanac processes the raw SSA datasets into searchable, visual formats. Every name page shows the complete historical arc — from first appearance to current rank. State pages reveal regional preferences: names popular in the South may be uncommon in the Northeast. Decade pages capture generational shifts — the names that defined the 1980s are very different from those dominating the 2020s.
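
To make the idea of a "historical arc" concrete, here is a minimal sketch that summarizes one name's yearly counts and labels its recent trend. It is not NameAlmanac's actual method: the function names, the three-year window, and the 5% tolerance are all assumptions chosen for illustration, and the counts are invented.

```python
# Hypothetical yearly counts for one name: {birth_year: count}. Invented numbers.
# Years in which the name fell below the privacy floor are simply absent.
counts = {1990: 120, 1991: 150, 1992: 310, 1993: 280, 1994: 260}


def summarize_arc(counts):
    """Return the first recorded year, the peak year, and the latest count."""
    years = sorted(counts)
    peak_year = max(years, key=lambda y: counts[y])
    return {
        "first_year": years[0],
        "peak_year": peak_year,
        "latest_count": counts[years[-1]],
    }


def trend_direction(counts, window=3, tolerance=0.05):
    """Label the trend by comparing the first and last of the most recent
    `window` recorded years. Changes within +/- `tolerance` count as stable."""
    years = sorted(counts)[-window:]
    start, end = counts[years[0]], counts[years[-1]]
    change = (end - start) / start  # start >= 5 in published data, never zero
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "stable"


print(summarize_arc(counts))
print(trend_direction(counts))
```

Because suppressed years are missing rather than zero, a gap in the arc means "fewer than 5 recorded", not "nobody received the name".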

Worked example: how the state files differ from the national file

File                   Years covered   Rows in 2023 release   Suppression floor
National               1880–2024       ~2.05M                 5/year
State (combined)       1910–2024       ~6.20M                 5/state-year
Territory (PR, etc.)   2000–2024       excluded               n/a
"A name does not 'exist' in SSA data unless five babies received it in the same year. Below the floor, the dataset is structurally blind."
— Kiznis Studio editorial, drawing on SSA Baby Names data
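
The effect of the two suppression floors can be sketched with a toy example. The raw counts below are invented (the real unsuppressed counts are not public), and `state_file` and `national_file` are hypothetical helpers that apply a floor of 5 per state-year and per year respectively.

```python
from collections import defaultdict

FLOOR = 5  # minimum count for a name to be published

# Hypothetical raw counts for one birth year: (state, name) -> babies.
raw = {
    ("VT", "Zephyrine"): 3,
    ("WY", "Zephyrine"): 4,
    ("CA", "Olivia"): 2000,
    ("VT", "Olivia"): 40,
}


def state_file(raw):
    """Keep only state-level rows that meet the floor within that state."""
    return {key: n for key, n in raw.items() if n >= FLOOR}


def national_file(raw):
    """Sum each name across states, then apply the floor to the national total."""
    totals = defaultdict(int)
    for (_, name), n in raw.items():
        totals[name] += n
    return {name: n for name, n in totals.items() if n >= FLOOR}


print(state_file(raw))     # Zephyrine is missing from every state
print(national_file(raw))  # Zephyrine appears nationally with 7
```

In this sketch the name reaches 7 babies nationally but never 5 in any one state, so it appears in the national file and in no state file. For the same reason, summing a name's state rows can undercount its national total.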

Spelling vs. phonetic identity

SSA records exact spelling. Sophia and Sofia are tracked as separate names with separate ranks, even though they sound identical. Aiden, Aydan, Aydyn, and Ayden each have their own rank curve. To compare "phonetic popularity" you must aggregate spelling families yourself — the dataset itself does not.
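
One way to aggregate spelling families is a hand-maintained mapping from each spelling to a chosen canonical form, sketched below. The mapping, the counts, and the helper name `aggregate_families` are all illustrative assumptions; which spellings belong together is an editorial decision, not something the dataset defines.

```python
from collections import Counter

# Hand-picked spelling families (an editorial choice, not part of the SSA data).
FAMILIES = {
    "Sophia": "Sophia", "Sofia": "Sophia",
    "Aiden": "Aiden", "Ayden": "Aiden", "Aydan": "Aiden", "Aydyn": "Aiden",
}

# Invented counts for a single year and sex.
counts = {"Sophia": 900, "Sofia": 600, "Aiden": 500, "Ayden": 200,
          "Aydan": 20, "Aydyn": 5, "Emma": 1000}


def aggregate_families(counts, families):
    """Sum counts per spelling family; unmapped names stay as they are."""
    totals = Counter()
    for name, n in counts.items():
        totals[families.get(name, name)] += n
    return totals


totals = aggregate_families(counts, FAMILIES)
print(totals.most_common())  # [('Sophia', 1500), ('Emma', 1000), ('Aiden', 725)]
```

With these invented numbers, no single spelling outranks Emma, but the combined Sophia family does. Spellings that fall below the privacy floor contribute nothing, so a family total is still a lower bound.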

Why pre-1937 data is partial

SSA records are based on Social Security Number applications, not birth certificates. The Social Security Act was signed in 1935 and the first numbers were issued in late 1936; names for births from 1880 through 1936 entered the data only when those Americans applied for cards later in life. This means pre-1937 counts under-represent people who died young or never registered for Social Security. The 1880s curve is a meaningful trend line, not a complete birth census.

Frequently Asked Questions

Where does SSA baby name data come from?

The data comes from Social Security card applications. When parents apply for a Social Security number for their newborn, the application includes the child's name. The SSA aggregates these applications into annual name frequency counts by state and gender, published as public datasets.

Why does the SSA only include names with 5+ occurrences?

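To protect privacy. The national file omits names given to fewer than 5 babies nationwide in a given year, and the state files omit names given to fewer than 5 babies in a given state and year. As a result, very rare names and unique spellings may not appear, and a name can be missing from a small-population state's file while still appearing nationally. The threshold applies independently to each year and, in the state files, to each state.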

Does SSA data include all babies born in the US?

Nearly all, but not quite 100%. SSA data covers babies who received Social Security numbers, which is roughly 98% or more of all births. A small number of families (for religious or personal reasons) do not apply for SSNs immediately after birth. The data is comprehensive enough for trend analysis but is not an exact birth count.

How far back does SSA name data go?

The national dataset goes back to 1880. State-level data starts in 1910. NameAlmanac includes the complete historical dataset, allowing you to trace name popularity over more than 140 years. Early years have lower coverage because Social Security numbers were not issued until late 1936, so data for births before 1937 comes from people who applied for SSNs later in life.

Does the SSA track middle names?

No. The public dataset includes only first names. Middle names are collected on the SSN application but are not included in the published name frequency data. This means a child named "Mary Elizabeth" would only appear as "Mary" in the dataset.

Why do name counts seem to decrease in recent years?

The main driver is that names are becoming more diverse, not simply that fewer babies are being born. In the 1950s, the top 10 names accounted for 30%+ of all births. Today, the top 10 account for less than 10%. Parents are choosing from a wider variety of names, so each individual name has a lower count even when total births hold roughly steady.
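
The concentration measure behind this answer is simple arithmetic: the combined count of the top N names divided by the total count. The sketch below uses invented counts and a hypothetical helper, `top_n_share`, to show two years with the same total but different concentration.

```python
def top_n_share(counts, n=10):
    """Fraction of recorded babies covered by the n most common names."""
    ordered = sorted(counts.values(), reverse=True)
    total = sum(ordered)
    return sum(ordered[:n]) / total if total else 0.0


# Invented counts: both "years" have 100 recorded babies in total.
concentrated = {"A": 50, "B": 30, "C": 20}
diverse = {"A": 20, "B": 20, "C": 20, "D": 20, "E": 20}

print(top_n_share(concentrated, n=2))  # 0.8
print(top_n_share(diverse, n=2))       # 0.4
```

One caveat when applying this to the published files: the denominator only includes names at or above the privacy floor, so the result is a share of recorded names, which slightly overstates the share of all births.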

Sources

  • Social Security Administration — Baby Names Dataset (ssa.gov/oact/babynames/)
  • SSA — Beyond the Top 1000 Names

Understanding the Data

The information presented throughout this guide is drawn from public records published by federal and state government agencies. Our database aggregates and standardizes these records to make them more accessible and easier to interpret for general audiences. When we reference specific statistics or trends, they come directly from these sources unless explicitly noted otherwise.

It is important to understand the limitations of any large-scale dataset. Records may contain errors from the original data collection process, some fields may be incomplete for older entries, and classification systems may have changed over time. Our analysis accounts for these factors by clearly labeling data vintage, flagging records with missing critical fields, and noting when temporal comparisons span methodology changes in the source data.

For readers who want to conduct their own research, we recommend going directly to the source whenever possible. Federal and state government agencies provide detailed documentation on collection methodology, sampling frames, and known data quality issues. Our goal is not to replace primary sources but to make them more approachable and to highlight patterns that may not be immediately obvious when browsing raw records.

How We Analyze Data Records

Our analytical approach involves several steps designed to surface meaningful insights from large datasets. First, we clean and standardize the raw data, handling variations in naming conventions, date formats, and categorical labels. Then we compute summary statistics, distributions, and comparative benchmarks across relevant dimensions such as geography, time period, and category type.

Key metrics we examine include statistical records, geographic distributions, and temporal trends. These indicators provide a multi-dimensional view of each entity in our database, allowing users to understand not just individual records but how they compare to peers, regional averages, and national benchmarks. We believe this contextual approach is far more valuable than presenting raw numbers in isolation.