How SSA Baby Name Data Works: Data Source Guide

The Data Source

Every year, the Social Security Administration publishes a dataset of baby name frequencies derived from SSN applications. When parents apply for their child's Social Security number, which most do shortly after birth, the child's first name is recorded. The SSA aggregates millions of these records into public datasets showing how many babies received each name, broken down by year, state, and gender.

This makes SSA data the most comprehensive source of baby name information in the United States. NameAlmanac uses this data to show trending names, historical popularity charts, and state-by-state name patterns.

What the Data Shows

For each name, NameAlmanac provides:

Annual frequency: How many babies received the name each year, going back to 1880.
National rank: Where the name stands relative to all other names in a given year.
State popularity: Which states use the name most frequently, revealing regional naming preferences.
Trend direction: Whether the name is rising, falling, or stable in popularity.
Gender distribution: Some names are used for both genders; the data shows the split.
Decade patterns: How naming preferences shift across generations.
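Rank and gender split can be derived from one year's counts. The sketch below assumes those counts are already loaded into a dictionary keyed by (name, sex). The counts are made up and the helper names are hypothetical. The tie rule, where tied names share a rank, is an assumption and not SSA's documented method.

```python
from typing import Dict, Tuple

# Made-up counts for a single year, keyed by (name, sex); not real SSA figures.
Counts = Dict[Tuple[str, str], int]

counts_example: Counts = {
    ("Jordan", "M"): 12000,
    ("Jordan", "F"): 3000,
    ("Taylor", "F"): 15000,
    ("Taylor", "M"): 2500,
    ("Emma", "F"): 15000,
}


def rank_within_sex(counts: Counts, sex: str) -> Dict[str, int]:
    """Rank one sex's names by descending count; tied names share a rank (assumed rule)."""
    rows = sorted(
        ((name, n) for (name, s), n in counts.items() if s == sex),
        key=lambda row: (-row[1], row[0]),  # highest count first, then alphabetical
    )
    ranks: Dict[str, int] = {}
    prev_count, prev_rank = None, 0
    for position, (name, n) in enumerate(rows, start=1):
        rank = prev_rank if n == prev_count else position
        ranks[name] = rank
        prev_count, prev_rank = n, rank
    return ranks


def gender_split(counts: Counts, name: str) -> Dict[str, float]:
    """Fraction of a name's recorded babies under each sex code."""
    by_sex = {s: n for (nm, s), n in counts.items() if nm == name}
    total = sum(by_sex.values())
    return {s: n / total for s, n in by_sex.items()} if total else {}


print(rank_within_sex(counts_example, "F"))    # {'Emma': 1, 'Taylor': 1, 'Jordan': 3}
print(gender_split(counts_example, "Jordan"))  # {'M': 0.8, 'F': 0.2}
```

Names below the privacy floor never appear in the input, so they cannot be ranked this way.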

Browse all names or check specific years to explore the data.

Understanding the Limitations

SSA data is excellent but has limitations worth understanding:

Privacy threshold: Names given to fewer than 5 babies of a given sex in a year (or, in the state files, in a state-year) are excluded. This means very rare names simply don't appear in the data.
First names only: Middle names, hyphenated names, and suffixes are not tracked in the public dataset.
Spelling matters: "Catherine," "Katherine," and "Kathryn" are counted as three separate names. Combined, they might rank much higher than any individual spelling.
Historical coverage: Data for births before 1937 is less complete because Social Security numbers were not issued until late 1936. Those records come from people who applied for SSNs retroactively, often as adults.
Registration lag: A baby's SSN application may be filed weeks or months after birth. The year recorded is the birth year, not the application year, but some edge cases exist.

How NameAlmanac Uses This Data

NameAlmanac processes the raw SSA datasets into searchable, visual formats. Every name page shows the complete historical arc, from first appearance to current rank. State pages reveal regional preferences: names popular in the South may be uncommon in the Northeast. Decade pages capture generational shifts: the names that defined the 1980s are very different from those dominating the 2020s.
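Decade views of this kind can be built by summing annual counts into decades. This is a minimal sketch. It assumes a decade means the years ending in 0 through 9, and that a name's annual counts for one sex are available as a year-to-count mapping. The function name and figures are hypothetical.

```python
from typing import Dict


def decade_totals(annual: Dict[int, int]) -> Dict[int, int]:
    """Sum a name's yearly counts into decades (1980 covers 1980-1989)."""
    totals: Dict[int, int] = {}
    for year, n in annual.items():
        decade = year - year % 10
        totals[decade] = totals.get(decade, 0) + n
    return dict(sorted(totals.items()))


# Made-up yearly counts for one name and sex.
example = {1981: 10, 1989: 5, 1990: 7, 2004: 3}
print(decade_totals(example))  # {1980: 15, 1990: 7, 2000: 3}
```

The published files omit any year in which a name fell below the 5-baby floor. For rare names, a decade total built this way is therefore a lower bound.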

Worked example: how state files differ from the national file

File	Years covered	Rows in 2023 release	Suppression floor
National	1880–2025	~2.05M	5/year
State (combined)	1910–2025	~6.20M	5/state-year
Territory (PR, etc.)	2000–2024	excluded	n/a

"A name does not "exist" in SSA data unless five babies received it in the same year. Below the floor, the dataset is structurally blind."

- NameAlmanac editorial, drawing on SSA Baby Names data
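The national and state files compared in the table use different row layouts. Each national file covers one birth year and has rows of name, sex and count. Each state file has rows of state code, sex, year, name and count.

Below is a minimal Python sketch for reading both layouts into one record type. The record and function names are hypothetical, and the sample rows are made up.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class NameRecord(NamedTuple):
    state: Optional[str]  # None for rows from the national file
    sex: str
    year: int
    name: str
    count: int


def parse_national(text: str, year: int) -> List[NameRecord]:
    """National rows are 'Name,Sex,Count'; the year is supplied by the caller."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, sex, count = row
        records.append(NameRecord(None, sex, year, name, int(count)))
    return records


def parse_state(text: str) -> List[NameRecord]:
    """State rows are 'State,Sex,Year,Name,Count'."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        state, sex, year, name, count = row
        records.append(NameRecord(state, sex, int(year), name, int(count)))
    return records


national = parse_national("Mary,F,100\nJohn,M,90\n", year=1910)
state = parse_state("AK,F,1910,Mary,20\nAK,M,1910,John,15\n")
print(national[0])  # NameRecord(state=None, sex='F', year=1910, name='Mary', count=100)
print(state[1].name, state[1].count)  # John 15
```

National rows do not carry their year. In practice the caller would take it from the file being read.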

Spelling vs. phonetic identity

SSA records exact spelling. Sophia and Sofia are tracked as separate names with separate ranks, even though they sound identical. Aiden, Aydan, Aydyn, and Ayden each have their own rank curve. To compare "phonetic popularity" you must aggregate spelling families yourself; the dataset does not do this for you.
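Here is a minimal sketch of that aggregation. It assumes you maintain your own mapping from spellings to families. The mapping, function name and counts are hypothetical, and SSA publishes no such grouping. The example shows how a family can outrank a name that beats each of its spellings individually.

```python
from typing import Dict

# Hand-built spelling families; an assumption, not part of the SSA data.
SPELLING_FAMILIES: Dict[str, str] = {
    "Sophia": "Sophia/Sofia",
    "Sofia": "Sophia/Sofia",
    "Catherine": "Catherine/Katherine/Kathryn",
    "Katherine": "Catherine/Katherine/Kathryn",
    "Kathryn": "Catherine/Katherine/Kathryn",
}


def aggregate_families(counts: Dict[str, int], families: Dict[str, str]) -> Dict[str, int]:
    """Sum counts per family; spellings without a family keep their own key."""
    totals: Dict[str, int] = {}
    for name, n in counts.items():
        key = families.get(name, name)
        totals[key] = totals.get(key, 0) + n
    return totals


# Made-up counts for one sex and year.
girls = {"Olivia": 150, "Sophia": 100, "Sofia": 80, "Katherine": 60, "Catherine": 40, "Kathryn": 30}
families = aggregate_families(girls, SPELLING_FAMILIES)
print(max(girls, key=girls.get))        # Olivia
print(max(families, key=families.get))  # Sophia/Sofia
```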

Why pre-1937 data is partial

SSA records are based on Social Security Number applications, not birth certificates. The Social Security Act became law in 1935, and the first numbers were issued in late 1936. Names for births from 1880–1936 were added retroactively, when those people applied for cards as adults. This means pre-1937 counts under-represent people who died young or never registered for Social Security. The 1880s curve is a meaningful trend line, not a complete birth census.


Frequently Asked Questions

Where does SSA baby name data come from?

The data comes from Social Security card applications. When parents apply for a Social Security number for their newborn, the application includes the child's name. The SSA aggregates these applications into annual name frequency counts by state and gender, published as public datasets.

Why does the SSA only include names with 5+ occurrences?

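To protect privacy. Names given to fewer than 5 babies of a given sex in a year are excluded from the national file, and the same floor applies to each state-year in the state files. This means very rare names, unique spellings, and names from small-population states may not appear. The threshold applies independently to each year, each sex, and (in the state files) each state. As a result, a name can appear in the national file while being absent from every state file.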
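The effect of those independent thresholds can be modeled with a short sketch. It assumes the national total is computed from all applications before the 5-baby floor is applied, and that each state's count is filtered on its own. It models only the floor, not any other processing SSA performs. The function name and counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

FLOOR = 5  # minimum count for a row to be published


def publish(raw: Dict[Tuple[str, str], int], floor: int = FLOOR):
    """raw maps (state, name) -> true count for one sex and year.

    Returns (national_file, state_file), each filtered independently.
    """
    national: Counter = Counter()
    for (_state, name), n in raw.items():
        national[name] += n  # national totals use every record, suppressed or not
    national_file = {name: n for name, n in national.items() if n >= floor}
    state_file = {key: n for key, n in raw.items() if n >= floor}
    return national_file, state_file


# Made-up counts: Zephyr is rare in every state but clears the floor nationally.
raw_counts = {("VT", "Zephyr"): 3, ("NH", "Zephyr"): 2, ("ME", "Zephyr"): 4, ("ME", "Mary"): 10}
national_file, state_file = publish(raw_counts)
print(national_file)  # {'Zephyr': 9, 'Mary': 10}
print(state_file)     # {('ME', 'Mary'): 10}
```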

Does SSA data include all babies born in the US?

Nearly all, but not quite 100%. SSA data covers babies who received Social Security numbers, an estimated 98% or more of all births. A small number of families (for religious or personal reasons) do not apply for SSNs immediately after birth. The data is comprehensive enough for trend analysis but is not an exact birth count.

How far back does SSA name data go?

The national dataset goes back to 1880. State-level data starts in 1910. NameAlmanac includes the complete historical dataset, allowing you to trace name popularity over more than 140 years. Early years have lower coverage: Social Security numbers were not issued until late 1936, so data for births before 1937 comes from retroactive SSN applications.

Does the SSA track middle names?

No. The public dataset includes only first names. Middle names are collected on the SSN application but are not included in the published name frequency data. This means a child named "Mary Elizabeth" would only appear as "Mary" in the dataset.

Why do name counts seem to decrease in recent years?

Mostly because names are becoming more diverse. Births have declined somewhat since the late 2000s, but not nearly enough to explain the drop in counts for the most common names. In the 1950s, the top 10 names accounted for 30%+ of all births. Today, the top 10 account for less than 10%. Parents are choosing from a wider variety of names, so each individual name has a lower count even when the total number of births changes only modestly.
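The concentration figures above are a top-N share: the summed counts of the N most common names divided by the total count. The sketch below uses made-up counts and a hypothetical function name. A total taken from the published files excludes names below the 5-baby floor, so a share computed this way runs slightly higher than a share of all births.

```python
from typing import Dict


def top_n_share(counts: Dict[str, int], n: int = 10) -> float:
    """Fraction of all counted babies who received one of the n most common names."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sorted(counts.values(), reverse=True)[:n]
    return sum(top) / total


# Made-up counts for one sex and year.
example = {"A": 50, "B": 30, "C": 15, "D": 5}
print(top_n_share(example, n=2))  # 0.8
```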

Sources

Social Security Administration, Baby Names Dataset (ssa.gov/oact/babynames/)
Social Security Administration, Beyond the Top 1000 Names

As of June 2026, this guide is grounded in the Social Security Administration's public-use baby-name files, which, according to the agency, document 105,966 distinct names spanning 1880–2025.