Perrin’s Data Science Final Project
Idea 1: Gender gaps across school subjects at different levels of education over time.
Questions:
- Does the level of education affect the rate at which gender gaps close?
- Does the level of education affect when gender gaps start to close? For example, it is plausible that a gender gap in computer science could start to close at the college level a few years after it starts to close at the high school level, as the original cohort of high schoolers enters college.
- How do gender gaps across different school subjects vary by region? Does this change over time?
Potential Challenges:
- Gender is a spectrum, but I will be shocked if I can find data about gender gaps that acknowledges this. Older data will probably record biological sex, and newer data will probably record gender as man, woman, and other. Depending on who collected the data, the “other” category might contain only nonbinary and genderfluid people, or it might also include trans people. Should I analyze this extremely broad and vague “other” category as its own gender?
- Many subjects branch into sub-subjects as education level progresses. Math in middle school can become math, computer science, and physics in high school. Computer science in high school can become computer science, computer engineering, data science, cybersecurity, etc. in college. How can I account for this in my analysis? How can I even track which subjects branch in what ways? Should I focus mostly on named college majors that match named high school classes?
Idea 2: (Probably) unconscious patterns in fiction writing
Questions:
- How consistent are authors about the order in which they list their characters’ names?
- For the authors who are consistent, is there any correlation between this order and the importance, gender, race, or age of characters?
- How do the demographics of authors correlate with the demographics of major characters who have significant amounts of dialogue?
Potential Challenges:
- This project may involve a lot of manual data collection from text files. I might be able to scrape some things myself, like the order of name lists and the amount of dialogue attributed to each character. However, figuring out the demographics and importance of a character would be hard. I might be able to use an AI model for this, but I would probably have to pay for API access. Unless I can find a really good dataset, I expect I would struggle to build one large enough on my own.
Idea 3: According to Our World in Data, the price of lighting in the UK has fallen drastically since the 1300s. I want to explore possible causes and effects of this.
Potential Questions:
- What correlations exist between the cost of light and levels of education?
- What correlations exist between the cost of light and innovation?
- What correlations exist between the cost of light and GDP?
Week 10 Update:
This will be a solo project on gender gaps in classes and majors at Whitman and how those gaps relate to race. If the College Board responds to my data request, I also hope to discuss how gender gaps at Whitman relate to gender gaps in high school AP classes.
I hope to use the following data sources:
Data Source 1: Whitman’s Institutional Research
I have contacted Neal Christopherson to ask for data, but I have not heard back yet. As a result, my Pros/Cons list is speculative.
Pros