John Speaks

Scrutinizing Bias Towards Groups Protected in U.S. Employment Within the GPT-4o-mini Model

December 2024 Machine Learning Project and Paper Champaign, IL

I measured biases in the GPT-4o-mini model towards groups protected under U.S. employment law, evaluating the model with both story-generation and labeling prompts. I found that the model exhibits biases towards these groups, which could have significant implications for the use of such models in contexts like hiring and mental health.

Evaluating GPT-4 Modeled Arabic-English Code Switching With Python Word2Vec

November 2024 Deep Learning Project and Paper Champaign, IL

I evaluated GPT-4's ability to emulate Arabic-English code-switching by comparing synthetic examples with natural data through Word2Vec models. My analysis highlights a gap in GPT-4's capacity to replicate nuanced sociolinguistic phenomena like code-switching. The project reinforces the need for localized datasets to enhance the accuracy of language models in all contexts.

Exploring Variation Across Four Languages Using Reality TV Captions and Legal Documents

October 2024 Deep Learning Project and Paper Champaign, IL

I analyzed register variation in web data for English, Finnish, Greek, and Portuguese by comparing it to legal documents and reality TV subtitles. Using Jaccard and cosine similarity, I explored how web documents align with these anchors. Python tools like Scikit-learn and Pandas were used for data cleaning, sampling, and similarity analysis.
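The two similarity measures named above can be illustrated with a minimal sketch. It assumes whitespace-tokenized documents and raw term counts; the toy sentences and the function names are hypothetical and are not the project's data or code, which used Scikit-learn and Pandas.

```python
import math
from collections import Counter


def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard similarity between the sets of distinct tokens in two documents."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the raw term-count vectors of two documents."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents standing in for a web page and the two anchors.
web_doc = "the court shall decide the case".split()
legal_anchor = "the court shall hear the appeal".split()
tv_anchor = "oh my god she did not say that".split()

print(jaccard_similarity(web_doc, legal_anchor))
print(cosine_similarity(web_doc, legal_anchor))
print(jaccard_similarity(web_doc, tv_anchor))
```

Jaccard only looks at which words appear, while cosine similarity also weighs how often they appear, so the two measures can rank the same pair of documents differently.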

Using Deep Learning and Embedding Spaces to Explore Gender Bias Online

May 2024 Deep Learning Project and Paper Champaign, IL

Using large linguistic corpora from X (formerly Twitter), Reddit, and Wikipedia, I trained three separate 100-dimensional embedding spaces to capture the semantic meaning of words. I then used these spaces to explore gender bias with custom-designed metrics that measure the distance between gendered words and sentiment-related words on each platform.
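One way a distance-based metric of this kind could look is sketched below. This is an assumption for illustration, not the custom metrics from the project: it scores a word by its mean cosine similarity to positive sentiment words minus its mean cosine similarity to negative ones. The 2-dimensional vectors, the word lists, and the function names are all invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word_vec, attribute_vecs):
    """Average cosine similarity between one word and a list of attribute words."""
    return sum(cosine(word_vec, a) for a in attribute_vecs) / len(attribute_vecs)


def sentiment_gap(word_vec, positive_vecs, negative_vecs):
    """Above zero: the word sits closer to the positive words than the negative ones."""
    return mean_similarity(word_vec, positive_vecs) - mean_similarity(word_vec, negative_vecs)


# Invented 2-D vectors; a real embedding space here would have 100 dimensions.
embeddings = {
    "word_a": [0.6, 0.8],
    "word_b": [0.8, 0.6],
    "good": [1.0, 0.0],
    "bad": [0.0, 1.0],
}

positive = [embeddings["good"]]
negative = [embeddings["bad"]]

for word in ("word_a", "word_b"):
    print(word, round(sentiment_gap(embeddings[word], positive, negative), 3))
```

Computing the same score for each gendered word in each platform's embedding space would give per-platform numbers that can be compared directly.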

Comparing Pop and Rap Music Lyrics With Embedding Spaces

April 2024 Deep Learning Project and Paper Champaign, IL

I trained two 100-dimensional embedding spaces using Genius lyrics data for pop and rap music. I then used the Jaccard similarity of a variety of common words in each corpus to explore how words relate within semantic clusters across the two genres.
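A sketch of one plausible reading of this comparison follows. It assumes the Jaccard similarity is taken over a word's k nearest neighbors in each genre's embedding space, which is an interpretation rather than something the project description confirms. The vectors, vocabulary, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, k):
    """Return the set of the k words most similar to `word` in one embedding space."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    ranked = sorted(others, key=lambda w: cosine(target, embeddings[w]), reverse=True)
    return set(ranked[:k])


def neighbor_jaccard(word, space_a, space_b, k):
    """Jaccard similarity of a word's k-nearest-neighbor sets in two spaces."""
    neighbors_a = nearest_neighbors(word, space_a, k)
    neighbors_b = nearest_neighbors(word, space_b, k)
    return len(neighbors_a & neighbors_b) / len(neighbors_a | neighbors_b)


# Invented 2-D toy spaces; the real spaces would be 100-dimensional.
pop_space = {
    "love": [1.0, 0.0],
    "heart": [0.9, 0.1],
    "baby": [0.8, 0.2],
    "money": [0.0, 1.0],
}
rap_space = {
    "love": [1.0, 0.0],
    "heart": [0.9, 0.1],
    "money": [0.8, 0.2],
    "baby": [0.0, 1.0],
}

print(neighbor_jaccard("love", pop_space, rap_space, k=2))
```

A score near 1 would mean the word keeps the same semantic neighborhood in both genres, and a score near 0 would mean its neighborhood differs almost entirely.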

Using Machine Learning KMeans Clustering Techniques To Categorize Wikipedia Articles

March 2024 Machine Learning Project and Paper Champaign, IL

Using a large corpus of Wikipedia articles, I trained a KMeans clustering model to group the articles into several categories. I then explored these categories and compared them to the categories that can be generated using the tag data in each Wikipedia article's metadata. The comparison outlined several differences and similarities between how humans and machines categorize information.

Developing A Classifier For English Dialects

February 2024 Machine Learning Project and Paper Champaign, IL

I trained a classifier on data from X (formerly Twitter) to classify English tweets by country of origin. The resulting tool can be applied to other corpora that lack country labels to estimate the country of origin of a text.

Crackdown - A Productivity App Startup

June 2023 - May 2024 Full-Stack Software Project Champaign, IL

I am one of two full-stack developers creating Crackdown: A Productivity App from the ground up. As part of this project, I am developing for Android and iOS simultaneously in the Dart programming language using the Flutter framework. In addition, I am managing user creation, authentication, and databases using Firebase and Cloud Firestore.