Today I fine-tuned an image recognition deep learning model, created a web interface with Gradio, and deployed it on Hugging Face Spaces. This project was the subject of lesson 2 of fast.ai’s Deep Learning for Coders course.
1. Fine-tuning a Model in Kaggle
Deep learning models need to train on a GPU (Graphics Processing Unit) because CPUs are too slow. I followed fast.
I created a chatbot in Shiny by coding along with James Wade’s three-part YouTube tutorial series (#1, #2, and #3), and added ideas I learned from Alejandro AO’s embedding tutorial in Python.
You can try it out on shinyapps.io. Unfortunately, the OpenAI API service is not free. I pay a fraction of a penny for every call to it, but the pennies add up. I made the API key a setting the user must enter.
William Poundstone’s Rock Breaks Scissors (Amazon) shows how people are often predictable in their efforts to be random. For instance, as the title suggests, a good strategy for playing Rock, Paper, Scissors is to start with paper because your opponent is likely to lead with the aggressive rock. The insights are neat throughout, but I’m especially delighted with the simple applications of probability. Let’s play with a few from the chapter on Chapanis’s random number experiment.
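The paper-first advice can be read as an expected-value comparison. The sketch below is only an illustration: the opponent's opening probabilities are hypothetical numbers chosen to make rock the most likely lead, not figures from the book, and the helper name `expected_payoff` is my own.

```python
# What each move beats in Rock, Paper, Scissors.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

# ASSUMPTION: hypothetical opening probabilities for the opponent,
# with rock slightly favored. These are not taken from the book.
opponent = {"rock": 0.4, "paper": 0.3, "scissors": 0.3}

def expected_payoff(my_move, opponent_probs):
    """Expected score of my_move: +1 for a win, -1 for a loss, 0 for a tie."""
    total = 0.0
    for their_move, prob in opponent_probs.items():
        if BEATS[my_move] == their_move:
            total += prob   # I win
        elif BEATS[their_move] == my_move:
            total -= prob   # I lose
    return total

payoffs = {move: expected_payoff(move, opponent) for move in BEATS}
best_move = max(payoffs, key=payoffs.get)
print(payoffs, best_move)
```

Under these assumed probabilities, paper has the highest expected payoff. Any distribution in which rock is more likely than scissors gives paper a positive expected payoff.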
Retrosheet is one of several sources of detailed baseball statistics. It is the only one (that I know of) that curates play-by-play event files of games. Its event files reach back to 1914. Unlike other sources that summarize teams or games, these files summarize individual plays all the way down to pitch sequences. For example, I used my database to find out whether changing pitchers mid-inning is related to game duration.
Career advancement is one way to measure inclusion in an organization. In a five-year study of its employees, KPMG found that socioeconomic background was the strongest determinant of progression, stronger than gender, ethnicity, disability, and sexual orientation. The PDF report is available here. KPMG’s largest socioeconomic gap was in the average time to progress from Manager to Senior Manager. Employees raised in lower socioeconomic conditions took 19% longer to advance.
Participant dropout is a potential source of bias in longitudinal studies. However, Bell (2013) showed differential dropout doesn’t always bias results, and sometimes non-differential dropout does bias results. The solution in all cases is a mixed model. Bell demonstrated this with a simulation using a fictional data set. Bell created 10,000 data sets with perturbations meant to simulate between-person and within-person effects. I created 100 data sets to explore the paper for this post.
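To make the setup concrete, here is a minimal sketch of how 100 longitudinal data sets with a between-person effect, a within-person effect, and dropout could be generated. It is not Bell's simulation design: the sample size, number of waves, effect sizes, dropout probability, and the function name `simulate_dataset` are all assumptions for illustration. The dropout here is non-differential (the same probability for everyone), and no mixed model is fitted.

```python
import random

def simulate_dataset(rng, n_people=50, n_waves=5, dropout_prob=0.15):
    """Return rows (person, wave, y) for one simulated longitudinal data set.

    ASSUMPTION: every parameter value here is hypothetical.
    """
    rows = []
    for person in range(n_people):
        # Between-person effect: a stable offset for this participant.
        person_effect = rng.gauss(0.0, 1.0)
        for wave in range(n_waves):
            # Dropout is permanent; the baseline wave is always observed.
            if wave > 0 and rng.random() < dropout_prob:
                break
            # Within-person effect: wave-to-wave noise around the trend.
            noise = rng.gauss(0.0, 0.5)
            y = 10.0 + 0.5 * wave + person_effect + noise
            rows.append((person, wave, y))
    return rows

rng = random.Random(2013)  # fixed seed so the run is reproducible
datasets = [simulate_dataset(rng) for _ in range(100)]
print(len(datasets), len(datasets[0]))
```

Each data set could then be analyzed twice, once with complete cases only and once with a mixed model, to compare how the estimates behave under dropout.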
There are three techniques usually used to deal with missing values:
Ignore them and just use complete cases. This is acceptable if <5% of cases are incomplete and missingness is random (Azur, 2004).
Impute values with their mean, median, or mode. A mean imputation leaves the overall mean unchanged, but artificially reduces the variable’s variance (Alice, 2018).
Impute values with multivariate imputation by chained equations (MICE). This method creates multiple predictions of each missing value, allowing the researcher to account for uncertainty in the imputations.
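The claim about mean imputation is easy to check by hand. The sketch below uses a small made-up sample (the numbers and the helper name `mean_impute` are assumptions for illustration) and compares the mean and sample variance before and after filling the missing values with the observed mean.

```python
import statistics

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

# ASSUMPTION: toy data, with None marking a missing value.
data = [4.0, None, 7.0, 5.0, None, 8.0, 6.0]

observed = [v for v in data if v is not None]
imputed = mean_impute(data)

mean_before = statistics.mean(observed)
mean_after = statistics.mean(imputed)
var_before = statistics.variance(observed)  # sample variance, complete cases
var_after = statistics.variance(imputed)    # sample variance after imputation

print(mean_before, mean_after)
print(var_before, var_after)
```

The mean is identical before and after, while the variance shrinks. The imputed points add nothing to the sum of squared deviations but still increase the count in the denominator.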
For my new survival analysis project on baseball career longevity, I am using the TR Plaza font adopted by the Cleveland Guardians for the 2022 season. Style choices shouldn’t be gratuitous, but I feel like using the Guardians font for my graphics connects statistical analysis to the underlying topic of baseball. Anyway, it’s cool, so I’m doing it.
I ran into difficulties adding the font, so I’m capturing the process here to smooth the way for you and for future me.
Organizations pursuing gender equity in leadership roles are often challenged by a lack of candidates. A 2019 study by McKinsey and LeanIn.org hypothesized the “broken rung” effect: “The biggest obstacle women face on the path to senior leadership is at the first step up to manager.” While companies will surely benefit by striving to develop and promote leaders equally, there may be social influences that create a headwind. One such influence is the tendency for women to exit the workforce in order to focus on family.
The Pew Research Center recently published the results of its 2020 survey of 3,535 adults, exploring the importance of cultural origins among various racial/ethnic groups. The data is available (with a free account) from the Pew site. Pew’s main finding was that Black and Hispanic adults are more likely than White adults to feel connected to their roots and regard them as central to their identity. I wonder whether this finding holds generally throughout the Black and Hispanic communities, or whether the importance of roots is also a function of other factors, such as political ideology, level of education, and age.