Long-term sociolinguistic trends and phonological patterns of American names

This paper identified macro trends and phonological patterns in the names of 348 million American babies over 137 years, from 1880 to 2017. The analysis showed that sociolinguistic trends have significantly influenced naming over time, as seen in the rise of individualism and unisex names, the impact of public figures and pop culture, and the substantially higher count of unique female names compared to male names. In addition, phonological analysis showed significant differences between male and female names in the number, type, and location of vowels, as well as in the number of syllables. On average, female names had more vowels, fewer consonants, and more syllables than male names. Names ending in certain vowels were also found to be mostly female, and names ending in certain consonants mostly male. These findings demonstrated an inherent correlation between phonology and the perceived gender of names.

transcribing the dataset into the International Phonetic Alphabet (IPA). While such a connection had previously been shown on a smaller scale (Barry & Harper, 1995), this research demonstrated an inherent correlation between the phonology of names and their perceived gender on a macro level.
2. Data preparation. The raw data came from the United States Social Security Administration (SSA, 2018). The dataset contained the first names of 348 million American babies born over 137 years, from 1880 to 2017, and was organized as one file per year.
Python programs were written to compile the individual files into a master file and to conduct the analyses. Table 1 shows the totals compiled in the master file for 1880 to 2017, broken down by gender: the number of babies born (~348 million), the number of data records (~1.9 million), and the number of unique first names (~97 thousand). To study phonological patterns, all names were then transcribed into the International Phonetic Alphabet (IPA) using three sources in sequence: the Amerlex dictionary, the Carnegie Mellon University (CMU) dictionary, and the LOGIOS Artificial Intelligence tool.
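The compilation step can be sketched as follows. This is a minimal illustration, not the study's actual program: it assumes the SSA national-data layout of one file per year named `yobYYYY.txt`, with each line formatted as `Name,Sex,Count`, and the function names `parse_year_file`, `compile_master`, and `summarize` are hypothetical. `summarize` produces per-gender totals of the kind reported in Table 1.

```python
import csv
import re
from pathlib import Path

# Assumption: one SSA file per year named yobYYYY.txt, lines like "Mary,F,7065".
YEAR_FILE = re.compile(r"yob(\d{4})\.txt$")


def parse_year_file(lines, year):
    """Turn the lines of one yearly file into master-file records."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        name, sex, count = row
        records.append({"name": name, "gender": sex, "count": int(count), "year": year})
    return records


def compile_master(input_dir, output_path):
    """Merge every yobYYYY.txt file in input_dir into one CSV master file."""
    records = []
    for path in sorted(Path(input_dir).glob("yob*.txt")):
        match = YEAR_FILE.search(path.name)
        if match is None:
            continue  # ignore files that do not follow the naming pattern
        with path.open(newline="", encoding="utf-8") as f:
            records.extend(parse_year_file(f, int(match.group(1))))
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["name", "gender", "count", "year"])
        writer.writeheader()
        writer.writerows(records)
    return records


def summarize(records):
    """Per-gender totals: babies born, data records, and unique names."""
    totals = {}
    for r in records:
        t = totals.setdefault(r["gender"], {"babies": 0, "records": 0, "names": set()})
        t["babies"] += r["count"]
        t["records"] += 1
        t["names"].add(r["name"])
    return {
        g: {"babies": t["babies"], "records": t["records"], "unique_names": len(t["names"])}
        for g, t in totals.items()
    }
```

Each record already carries the name, gender, count, and year, so the IPA transcription can later be added as one more field.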
A Python program automated the transcription process and added the IPA transcriptions to the master file. Names were first looked up in the Amerlex dictionary. Names not found there were then looked up in the CMU dictionary. The remaining names were processed by the LOGIOS tool, which transcribed them using artificial intelligence. Names transcribed by LOGIOS were checked by hand to ensure that their transcriptions were consistent with those from the two dictionaries.
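The sequential lookup can be illustrated with a small cascade. This is a sketch under stated assumptions: both dictionaries are modeled as plain `name -> IPA` mappings that are already loaded (the CMU dictionary natively uses ARPAbet, so a conversion to IPA is assumed to have happened beforehand), and `fallback` is a hypothetical callable standing in for LOGIOS, whose real interface is not shown here. Names that reach the fallback are collected for the manual check.

```python
from typing import Callable, Dict, List, Tuple


def transcribe(name: str,
               amerlex: Dict[str, str],
               cmu: Dict[str, str],
               fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (ipa, source), trying each source in order."""
    key = name.lower()
    if key in amerlex:
        return amerlex[key], "amerlex"
    if key in cmu:
        return cmu[key], "cmu"
    # Hypothetical stand-in for the LOGIOS step.
    return fallback(name), "logios"


def transcribe_all(names, amerlex, cmu, fallback) -> Tuple[Dict[str, str], List[str]]:
    """Transcribe every name and list those that need a hand check."""
    ipa: Dict[str, str] = {}
    needs_review: List[str] = []
    for name in names:
        transcription, source = transcribe(name, amerlex, cmu, fallback)
        ipa[name] = transcription
        if source == "logios":
            needs_review.append(name)  # machine-generated, check by hand
    return ipa, needs_review
```

Keeping the source label with each transcription makes it simple to review only the fallback output rather than the whole file.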
Each record in the final master file contains the name, gender, popularity (number of babies born with that name), year of birth, and IPA transcription.
3. Analysis findings. All measurements were normalized by population to remove the effect of population growth. After normalization, the analysis shows that the number of unique names per thousand people has grown significantly over time (Figure 1). Although the chart appears to show the trend first decreasing and then increasing, the early decrease can likely be attributed to limited data before 1935 (when birth name collection became mandatory for Social Security reasons). Figure 1 shows that more and more unique names were introduced into the US population, especially from the births of the Baby Boomer generation (1946-1964) onward, eventually increasing the number of unique names per thousand people from about 2 to 8 for males (a 4x increase) and from about 4 to 11 for females (a 2.75x increase). This suggests that novel, unconventional names have become more popular over time, a change likely driven by the rise of individualism. Figure 1 also shows that, except in the pre-1935 years when data were limited, there are significantly more unique names for females than for males. For example, there are currently about 11 unique names per thousand females, which is 37% more than the 8 unique names per thousand males. This suggests that Americans are more willing to give girls novel, unconventional names and more inclined to give boys established, conventional names.
The data also show that many names are used for both males and females; these are known as unisex names. Figure 2 shows that, when normalized by population, the number of unisex names has grown significantly, especially since around the 1950s. This demonstrates a growing trend of naming babies without gender association, which can also be interpreted as a form of individualism.
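The population normalization behind Figures 1 and 2 can be sketched as below. The paper does not spell out its exact definitions, so two assumptions are made here: unique names are counted per 1,000 babies of the same gender and year, and a name counts as unisex in a year if it has both an `F` and an `M` record that year. The input is a list of records with `name`, `gender`, `count`, and `year` fields; the function names are hypothetical.

```python
from collections import defaultdict


def unique_names_per_thousand(records):
    """Map (year, gender) -> distinct names per 1,000 babies of that gender."""
    births = defaultdict(int)
    names = defaultdict(set)
    for r in records:
        key = (r["year"], r["gender"])
        births[key] += r["count"]
        names[key].add(r["name"])
    return {key: 1000 * len(names[key]) / births[key] for key in births}


def unisex_names_per_thousand(records):
    """Map year -> names given to both genders that year, per 1,000 babies."""
    births = defaultdict(int)
    genders = defaultdict(set)  # (year, name) -> genders seen that year
    for r in records:
        births[r["year"]] += r["count"]
        genders[(r["year"], r["name"])].add(r["gender"])
    unisex = defaultdict(int)
    for (year, _name), seen in genders.items():
        if {"F", "M"} <= seen:  # assumed definition of a unisex name
            unisex[year] += 1
    return {year: 1000 * unisex[year] / births[year] for year in births}
```

The ratios quoted in the text follow directly from these rates: 8 / 2 = 4 for males, 11 / 4 = 2.75 for females, and 11 / 8 ≈ 1.37, i.e. about 37% more unique female names.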
Public figures and pop culture are expected to influence the social perception of particular names, and thus naming trends, through the mere-exposure effect. To examine this, the study reviewed two specific samples, each tied to a clearly dated event that allows a before-and-after comparison.
The first sample was the names of US presidents. The analysis compared the count of each president's first name among babies in the 4 years before and the 4 years after the president's first election. The results (Figure 3) show that first names belonging to popular presidents and/or that are unusual (e.g., Woodrow for President Woodrow Wilson and Barack for President Barack Obama) saw a clear jump (40x and 25x, respectively) in the number of babies given the name after the president's first election.
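The before-and-after ratio can be expressed as a short function. This is a minimal sketch that assumes each window is summed over its years and that the election year itself is excluded from both windows; the paper does not state how it handled that year, and `before_after_ratio` is a hypothetical name.

```python
def before_after_ratio(counts_by_year, event_year, window=4):
    """Babies given a name in the `window` years after an event, divided by
    those given it in the `window` years before (event year excluded)."""
    before = sum(counts_by_year.get(y, 0)
                 for y in range(event_year - window, event_year))
    after = sum(counts_by_year.get(y, 0)
                for y in range(event_year + 1, event_year + window + 1))
    if before == 0:
        # A name with no earlier uses has no finite ratio.
        return float("inf") if after else float("nan")
    return after / before
```

For example, a name given to 10 babies a year before an election and 250 a year after it yields a ratio of 1000 / 40 = 25.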
The second sample was the names of Disney movie protagonists. The analysis compared the counts of each protagonist's name in the 5 years before and after the movie's premiere (Figure 4), with all counts normalized to 100% at 5 years before the premiere. As in the president study above, popular and/or unusual protagonists' names increased significantly among newborns after the movie's release. For example, the name "Belle", from Beauty and the Beast (1991), saw an increase of almost 900% in name count over the 5 years after the movie's premiere.
Phonologically, the analysis shows significant differences between male and female names. Figure 5 shows that female names on average have 25% more vowels than male names, while male names have 8% more consonants.
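Counting vowels and consonants requires splitting each IPA string into segments. The sketch below is one plausible approach and may differ from the study's method. It assumes a hypothetical American English inventory in which the diphthongs /aɪ aʊ ɔɪ eɪ oʊ/ and the affricates /tʃ dʒ/ each count as one segment, while stress, length, and syllable-boundary marks are skipped. The averages are weighted by the number of babies.

```python
# Hypothetical segment inventory for an American English IPA transcription.
DIPHTHONGS = ("aɪ", "aʊ", "ɔɪ", "eɪ", "oʊ")
AFFRICATES = ("tʃ", "dʒ")
VOWELS = set("aeiouæɑɒɔəɚɛɝɪʊʌ") | set(DIPHTHONGS)
MARKS = "ˈˌː."  # stress, length and syllable-boundary marks


def segments(ipa):
    """Split an IPA string into segments, keeping diphthongs and affricates whole."""
    out, i = [], 0
    while i < len(ipa):
        if ipa[i] in MARKS:
            i += 1  # marks are not segments
            continue
        pair = ipa[i:i + 2]
        if pair in DIPHTHONGS or pair in AFFRICATES:
            out.append(pair)
            i += 2
        else:
            out.append(ipa[i])
            i += 1
    return out


def vowel_consonant_counts(ipa):
    """Return (vowels, consonants) for one transcription."""
    segs = segments(ipa)
    vowels = sum(1 for s in segs if s in VOWELS)
    return vowels, len(segs) - vowels


def average_counts(records):
    """Birth-weighted mean (vowels, consonants) per gender."""
    totals = {}
    for r in records:
        v, c = vowel_consonant_counts(r["ipa"])
        t = totals.setdefault(r["gender"], [0, 0, 0])
        t[0] += v * r["count"]
        t[1] += c * r["count"]
        t[2] += r["count"]
    return {g: (t[0] / t[2], t[1] / t[2]) for g, t in totals.items()}
```

Treating a diphthong as a single vowel is a design choice. Counting its two symbols separately would inflate vowel counts for names such as "David" (ˈdeɪvɪd).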
In addition, 37% of all female names in the dataset end in vowels, compared with only 8% of male names. A similar difference appears in names beginning with vowels, though the percentages are much smaller: only 8% of female names and 3% of male names start with a vowel (Figure 6). Notably, names ending in certain vowels are mostly female. Table 3 shows that the dataset contains 17 million names ending in [ʌ], of which 97% are female, and 49 million names ending in [ə], of which 94% are female.
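Shares such as those in Table 3 can be computed by grouping babies on the final segment of their name's transcription. This is a sketch under assumptions: records carry `ipa`, `gender`, and `count` fields, the vowel inventory is the same hypothetical one as above (redefined here so the block stands alone), and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical inventory; two-symbol diphthongs/affricates are checked first.
DIPHTHONGS = ("aɪ", "aʊ", "ɔɪ", "eɪ", "oʊ")
AFFRICATES = ("tʃ", "dʒ")
VOWELS = set("aeiouæɑɒɔəɚɛɝɪʊʌ")
MARKS = "ˈˌː."


def final_segment(ipa):
    """Last segment of a transcription, ignoring trailing marks."""
    text = ipa.rstrip(MARKS)
    last_two = text[-2:]
    if last_two in DIPHTHONGS or last_two in AFFRICATES:
        return last_two
    return text[-1:]


def is_vowel(segment):
    return segment in DIPHTHONGS or segment in VOWELS


def ending_stats(records):
    """Map final segment -> (babies with that ending, share who are female)."""
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for r in records:
        totals[final_segment(r["ipa"])][r["gender"]] += r["count"]
    return {seg: (t["F"] + t["M"], t["F"] / (t["F"] + t["M"]))
            for seg, t in totals.items()}


def vowel_ending_share(records, gender):
    """Fraction of babies of one gender whose name ends in a vowel."""
    total = ending = 0
    for r in records:
        if r["gender"] != gender:
            continue
        total += r["count"]
        if is_vowel(final_segment(r["ipa"])):
            ending += r["count"]
    return ending / total
```

Weighting by `count` matches how the text reports its figures, as numbers of people rather than numbers of distinct names.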
Conversely, names ending in certain consonants are mostly male. Table 3 shows that the dataset contains 3 million names ending in [f], of which 99% are male; 9 million ending in [k], of which 95% are male; and 18 million ending in [d], of which 94% are male.
Table 3. Mostly-male ending consonants and mostly-female ending vowels, 1880-2017
Male and female names also differ in syllable count. As shown in Figure 7, 29% of all female names from 1880 to 2017 have three syllables, compared with 11% of male names. Conversely, 21% of male names have only one syllable, compared with 6% of female names.
4. Discussion. Some limitations of this study come from the dataset itself. Because the US Social Security Act was passed and the Social Security Administration established in 1935, data on all newborns in the US began to be collected only after that year. As a result, the data before 1935 are incomplete relative to the newborn population of that period. The data also contain no information on race or immigration history, so the impact of culture and immigration on American names could not be analyzed.
This study suggests several areas for further research. First, because this study focused on American names, it would be interesting to apply the same methodology to names from other cultures and see whether similar phonological differences exist between male and female names in those cultures. Second, beyond personal names, this research could be extended to brand and product names. The social perception of a brand name can be critical to the success of the underlying business, and understanding how names are likely to be perceived would help in creating new brand and product names tailored to generate a better perception among the target audience.

Conclusion.
By analyzing 348 million American names over 137 years, this study identified naming trends shaped by sociolinguistic factors: the rise of individualism, the growth of unisex names, the higher number of unique female names, and the impact of public figures and pop culture. By transcribing the names into IPA and analyzing them phonologically, the study showed significant contrasts between male and female names, which suggested an inherent psychological correlation between the phonology of names and the perception of gender. On average, female names were found to have 25% more vowels than male names, and 37% of all female names in the dataset ended in vowels, compared with only 8% of male names. In addition, certain word-final vowels were linked mostly to female names, while certain word-final consonants were linked mostly to male names. Male and female names also differ in syllable count: proportionally more female names have three syllables, and proportionally more male names have one syllable.
While this study focused on American names, the same approach and methodology can be applied to names from other cultures, which are expected to have different patterns that drive their own social perceptions and naming trends.