Biologist turned data scientist
View My LinkedIn Profile
2021 WIDS Texas Datathon project
The 2021 WIDS Texas Datathon asked participants to predict the hourly electricity usage for the 8 ERCOT regions of the Texas power grid for one week in June 2021. I used the competition to develop a machine learning project while working under a deadline. My final predictions placed in the top 25% of finishers, and I was awarded an “Open Source Excellence Award” for my submitted Jupyter notebook, which contained my Python code and project explanation. I presented my project, along with the other winners, during the awards presentation webinar. To view my presentation, advance to timestamp 49:40 in this YouTube video of the webinar: https://www.youtube.com/watch?v=4JTcAUgo5S4
Analysis of r/progresspics post titles
r/progresspics is a subreddit focused on weight change where the post titles are customarily formatted to contain demographic information about the post authors. I developed two projects using a year’s worth of post titles from this subreddit.
Analysis of r/progresspics post titles - Part I
This Python project is an exploratory data analysis using features extracted from the post titles: sex, age, height, starting weight, ending weight, and duration of weight change. I used a linear regression analysis to test whether variability in the amount of weight loss could be explained by any of the extracted features. A second linear regression analysis tested whether variability in post popularity could be explained by the extracted features or by additional features describing the posts themselves.
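As an illustration of the feature extraction step, here is a minimal sketch of pulling fields out of a title with a regular expression. The title layout, the pattern, and the function name `parse_title` are assumptions made for this example, not the code used in the project, and real titles vary far more than this pattern allows.

```python
import re

# Assumed title layout (a made-up example, not taken from the project's data):
#   M/27/5'10" [250lbs > 180lbs = 70lbs] (12 months) Finally hit my goal!
TITLE_PATTERN = re.compile(
    r"""(?P<sex>[MF])/(?P<age>\d{1,2})/(?P<feet>\d)'(?P<inches>\d{1,2})"?\s*
        \[(?P<start>\d+(?:\.\d+)?)\s*(?:lbs?)?\s*>\s*(?P<end>\d+(?:\.\d+)?)\s*(?:lbs?)?
        [^\]]*\]""",
    re.VERBOSE | re.IGNORECASE,
)


def parse_title(title):
    """Return a dict of features, or None if the title does not match."""
    match = TITLE_PATTERN.search(title)
    if match is None:
        return None
    start = float(match.group("start"))
    end = float(match.group("end"))
    return {
        "sex": match.group("sex").upper(),
        "age": int(match.group("age")),
        # Convert feet and inches to total inches.
        "height_in": int(match.group("feet")) * 12 + int(match.group("inches")),
        "start_weight": start,
        "end_weight": end,
        "weight_lost": start - end,
    }


example = parse_title("M/27/5'10\" [250lbs > 180lbs = 70lbs] (12 months) Finally!")
```

Titles that do not fit the assumed layout return `None`, so they can be counted or inspected separately before any regression is run.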
Analysis of r/progresspics post titles - Part II
This complete machine learning project is written in Python and uses the scikit-learn library to develop a model to predict the final weight of Redditors posting to r/progresspics.
BiologyFinder is a tool for new biology graduate students or postdoctoral researchers who have just joined a lab and want to understand the lab's research. It is written in Python and relies on the Python library Biopython to interact with the API of NCBI’s PubMed, a biology literature database. It uses a shared citation history to identify biologists doing work similar to a user-named biologist. It then generates a recommended reading list containing the papers most frequently cited by the identified group of biologists in the subfield.
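The shared-citation idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the citation data is a hypothetical in-memory dictionary rather than results fetched from PubMed, similarity is taken to be the raw count of shared cited papers, and the function names `similar_authors` and `reading_list` are invented for this example. BiologyFinder itself may measure similarity differently.

```python
from collections import Counter


def similar_authors(citations, target, top_n=2):
    """Rank other authors by how many cited papers they share with target."""
    target_refs = citations[target]
    overlap = {
        author: len(refs & target_refs)
        for author, refs in citations.items()
        if author != target
    }
    # Sort by descending overlap, breaking ties alphabetically.
    ranked = sorted(overlap, key=lambda a: (-overlap[a], a))
    return [a for a in ranked[:top_n] if overlap[a] > 0]


def reading_list(citations, target, top_n=2, list_size=3):
    """Return the papers cited most often by the target and similar authors."""
    group = [target] + similar_authors(citations, target, top_n)
    counts = Counter(paper for author in group for paper in citations[author])
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [paper for paper, _ in ranked[:list_size]]


# Hypothetical data: author -> set of cited paper IDs.
citations = {
    "A": {"p1", "p2", "p3"},
    "B": {"p1", "p2", "p4"},
    "C": {"p2", "p5"},
    "D": {"p9"},
}
group = similar_authors(citations, "A")
papers = reading_list(citations, "A")
```

Author "D" shares no citations with "A" and is left out of the group, so "p9" never reaches the reading list.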
I worked on a team that competed in the Machine Learning for Social Good hackathon. My partner and I developed a dataset with 123 features that characterize Texas counties using publicly available information about population, income, home values, health, infrastructure, voting records, and land use. We used this dataset to model COVID outcomes.
Calculating and visualizing the 14-day COVID case rate for counties in Texas
I wrote a Python program that calculates the 14-day COVID case rate for each county in Texas using data on cumulative COVID cases downloaded from the Texas Department of State Health Services website. The results are output as a .csv file, which I used to create a Tableau visualization that shows how the case rate changes over time. Users can select Texas counties of interest, as well as the state as a whole, to visualize.
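A rolling case rate can be derived from cumulative counts by differencing values that are 14 days apart. The sketch below assumes the rate is expressed as new cases per 100,000 residents, which is a common convention but not something stated above. The function name `case_rate_14_day` and the sample numbers are hypothetical.

```python
def case_rate_14_day(cumulative, population, window=14, per=100_000):
    """Return new cases over the trailing window, scaled per `per` residents.

    cumulative: daily cumulative case counts for one county, oldest first.
    The first `window` days have no complete window and are skipped.
    """
    rates = []
    for day in range(window, len(cumulative)):
        # Cases reported during the trailing 14-day window.
        new_cases = cumulative[day] - cumulative[day - window]
        rates.append(new_cases * per / population)
    return rates


# Hypothetical county: 10 new cases every day, population 50,000.
cumulative_cases = list(range(0, 160, 10))  # 16 daily values
rates = case_rate_14_day(cumulative_cases, population=50_000)
```

Each county's list of rates can then be written out as rows of a .csv file for plotting.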
Python implementation of the card game Crazy Eights
Return to your childhood with a round of Crazy Eights. It is you versus the computer as opposed to you versus grandma!
Page template forked from evanca