Loss of surnames in small communities: a model
We recently visited my wife’s grandmother’s village of origin, on the eastern edge of the Massif Central in France. It is tiny and relatively remote, in good but very hilly farmland. The village now has a population of under 500, and had perhaps twice that during her grandmother’s childhood. Visiting the graveyard, we were struck by how certain surnames came up again and again (Débatisse, Barge, Goutorbe, Pras, Copperé, Fragne). Naturally, in small communities (particularly before easy transport) people tend to marry locally, but to contemporary eyes it seems almost incestuous: everyone seems related to everyone!…
GPS on two phones: measurement error
I got a new phone a few days ago, a Fairphone 5, after a long time with a Samsung S8, and decided to compare the two phones’ GPS performance. For years I’ve been using a homebrew mess of code (principally Python and Stata) to extract GPS data, as I’m kind of allergic to Strava-type apps. For each trip I store the timestamp, latitude, longitude and elevation. The example I will use to compare the devices is a long bike ride I took recently in the French countryside.…
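One crude way to compare two devices on the same ride is to summarise each track the same way. A minimal sketch, assuming each trip is a time-ordered list of (timestamp, latitude, longitude, elevation) tuples as described above: sum the great-circle (haversine) distances between successive fixes, and sum the positive elevation changes. The spherical Earth with a mean radius of 6371 km and the function names `haversine_km` and `track_summary` are my assumptions for illustration, not taken from the homebrew code.

```python
import math
from typing import List, Tuple

# One GPS fix: (timestamp in s, latitude in degrees, longitude in degrees, elevation in m)
Fix = Tuple[float, float, float, float]

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; a spherical approximation, not WGS84


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_summary(track: List[Fix]) -> Tuple[float, float]:
    """Return (total distance in km, total ascent in m) for a time-ordered track."""
    distance = 0.0
    ascent = 0.0
    # Walk over consecutive pairs of fixes
    for (_, lat1, lon1, ele1), (_, lat2, lon2, ele2) in zip(track, track[1:]):
        distance += haversine_km(lat1, lon1, lat2, lon2)
        ascent += max(0.0, ele2 - ele1)  # count climbing only
    return distance, ascent


if __name__ == "__main__":
    # Made-up three-fix track, purely to show the call
    demo = [(0, 45.0, 4.0, 500.0), (60, 45.0, 4.01, 510.0), (120, 45.01, 4.01, 505.0)]
    km, climb = track_summary(demo)
    print(f"{km:.3f} km, {climb:.0f} m ascent")
```

Running `track_summary` on each phone’s log of the same ride gives comparable totals. Bear in mind that jitter between fixes tends to inflate both summed distance and summed ascent, so part of any difference reflects measurement noise rather than route.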
Julia instead of NetLogo
Simple agent-based models: NetLogo or Julia
I’m idly thinking of putting on a new computational social science module, which led me to look at Paul Smaldino’s book, Modeling Social Behavior. This looks like a very accessible text on its subject. Smaldino uses NetLogo throughout, however, which I’d prefer not to do: my potential students will have been exposed to Julia, so it would be desirable to continue with that. To get an impression of the pedagogical costs of using a general-purpose rather than a domain-specific language, I decided to implement Smaldino’s first example, “particles”, from Chapter 2.…
Advent of Code 2024
Here are my solutions to the 2024 Advent of Code, using Julia.
Day 1

```julia
using CSV, DataFrames

# Read the two space-delimited columns of location IDs
df = CSV.read("advent1.csv", DataFrame, header=["L1", "L2"], delim=" ")

# Part 1: total distance between the two lists, paired after sorting
println("Day1, Answer 1: ", sum(abs.(sort(df.L1) - sort(df.L2))))

# Part 2: similarity score, each left value times its count in the right list
println("Day1, Answer 2: ", sum([i*count(x->(x == i), df.L2) for i in df.L1]))
```

Day 2

Part 1

Read the data

```julia
records = [[parse(Int32, x) for x in field] for field in [split(r, " ") for r in readlines("advent2.…
```
No, your variable doesn't need to be normally distributed
“I need to test my variables for normality”
I keep encountering students (mostly with prior exposure to business or psych stats) who insist on testing their variables for normality as a sine qua non before conducting any analysis. While it’s clearly a good idea to have a sense of how your variables are distributed, normality is (a) not a general requirement and (b) rarely the case (if your tests tell you the variable is normal, it’s most likely because your N is too small for the test to detect the departure).…
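Point (b) can be illustrated with a small simulation. This minimal sketch uses the Jarque-Bera statistic, chosen only because its large-sample p-value (chi-squared with 2 degrees of freedom) is simply exp(-JB/2), so the standard library suffices; students more often meet Shapiro-Wilk. The population, a gamma distribution with shape 20, is an arbitrary assumption: mildly right-skewed, so the same departure from normality that a small sample will typically miss is flagged decisively in a large one.

```python
import math
import random
from typing import Sequence, Tuple


def jarque_bera(xs: Sequence[float]) -> Tuple[float, float]:
    """Return (JB statistic, asymptotic p-value) for a sample.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S and K the moment estimators of
    skewness and kurtosis. Under normality JB is asymptotically chi-squared
    with 2 df, whose upper-tail probability is exp(-JB / 2).
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is reproducible
    # Same mildly skewed population (gamma, shape 20), increasing sample sizes
    for n in (20, 200, 5000):
        sample = [rng.gammavariate(20, 1) for _ in range(n)]
        stat, p = jarque_bera(sample)
        print(f"n={n:5d}  JB={stat:8.2f}  p={p:.4f}")
```

The population never changes, only N does, so a “normal” verdict at small N says more about the test’s power than about the variable. Note also that the asymptotic p-value is itself unreliable at small N.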
Taming BrightSpace, for Linux users
Making the LMS work for you, not you for the LMS
BrightSpace has been our Learning Management System here in UL since last semester. I’m trying to use it as a tool, rather than experience it as a constraint. The first thing I wanted to do was to manage content with as little interaction as possible with the BS user interface. The interface is reasonably well designed, but like most LMSes it depends a lot on doing things one at a time, with lots of clicking and opening menus and SAVEs and CONFIRMs.…
Simulating Single Transferable Votes in the UK
FPTP vs STV
The UK uses a first-past-the-post (FPTP) voting system that frequently leads to parties supported by significantly less than 50% of voters forming majority governments. Many other countries use one of a variety of methods described as proportional representation (PR), which are intended to bring about electoral outcomes (parties’ shares of seats) much closer to the parties’ shares of support among voters. Ireland and a number of other countries use the Single Transferable Vote (STV) in multi-seat constituencies, which works well and generally leads to more proportional outcomes.…
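For readers unfamiliar with how an STV count proceeds, here is a minimal sketch; it is not the simulation code from the post. Assumptions: the Droop quota; surpluses transferred by scaling down the weight of every ballot currently held by the elected candidate (a fractional, Gregory-style transfer); and elimination of the lowest candidate when nobody reaches the quota. Real-world count rules differ in details such as which ballots are transferred and how ties are broken.

```python
from typing import Dict, List, Optional, Sequence


def stv(ballots: Sequence[Sequence[str]], seats: int) -> List[str]:
    """Elect `seats` candidates from ranked ballots by a simplified STV count."""
    hopeful = {c for b in ballots for c in b}
    weights = [1.0] * len(ballots)  # each ballot starts worth one vote
    quota = len(ballots) // (seats + 1) + 1  # Droop quota
    elected: List[str] = []

    def current_choice(ballot: Sequence[str]) -> Optional[str]:
        # Highest-ranked candidate on this ballot who is still in the running
        for c in ballot:
            if c in hopeful:
                return c
        return None  # exhausted ballot

    while len(elected) < seats:
        if len(hopeful) <= seats - len(elected):
            # Remaining candidates fill the remaining seats
            elected.extend(sorted(hopeful))
            break
        tallies: Dict[str, float] = {c: 0.0 for c in hopeful}
        for i, b in enumerate(ballots):
            c = current_choice(b)
            if c is not None:
                tallies[c] += weights[i]
        # Ties broken by candidate name: an arbitrary but deterministic rule
        top = max(tallies, key=lambda c: (tallies[c], c))
        if tallies[top] >= quota:
            elected.append(top)
            # Pass on only the surplus: scale every ballot held by `top`
            ratio = (tallies[top] - quota) / tallies[top]
            for i, b in enumerate(ballots):
                if current_choice(b) == top:
                    weights[i] *= ratio
            hopeful.remove(top)
        else:
            # Nobody reached the quota: eliminate the weakest candidate
            bottom = min(tallies, key=lambda c: (tallies[c], c))
            hopeful.remove(bottom)
    return elected
```

Each ballot is a list of candidate names in order of preference. With seven ballots for A then B, two for B, four for C and two for D then C, and two seats, A is elected on the first count, A’s small surplus goes to B, D is eliminated, and D’s ballots carry C over the quota.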
Parameterising TWED
Time Warp Edit Distance (TWED) is a measure for comparing categorical time series, such as life-course sequences, designed to recognise similarity that may be displaced in time. It is similar to Optimal Matching (OM) distance in implementation, but can be thought of as locally compressing and stretching the time dimension, whereas OM deletes and inserts elements. TWED, OM and a range of other sequence comparison tools are implemented for Stata in my SADI package (see Appendix: code).…
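To show where the parameters enter, here is a minimal sketch of the TWED recursion following Marteau’s formulation, adapted to categorical states through a substitution-cost function, with positions 1..n used as time stamps. These adaptations are assumptions for illustration; this is not SADI’s implementation. `nu` (the stiffness) penalises matching elements that sit at different positions, and `lam` is the fixed penalty for each delete operation.

```python
import math
from typing import Callable, Sequence

SubCost = Callable[[str, str], float]  # substitution cost, 0 when states are equal


def twed(a: Sequence[str], b: Sequence[str], sub: SubCost, nu: float, lam: float) -> float:
    """Time Warp Edit Distance between two categorical sequences.

    Element i (1-based) has time stamp i. Returns inf if exactly one
    sequence is empty, following the usual boundary conditions.
    """
    n, m = len(a), len(b)
    # D[i][j]: distance between the first i elements of a and the first j of b
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Delete in a: cost of the step a[i-1] -> a[i], time gap 1, plus lam
            del_a = D[i - 1][j] + (sub(a[i - 1], a[i - 2]) if i > 1 else 0.0) + nu + lam
            # Delete in b: the same, on the other sequence
            del_b = D[i][j - 1] + (sub(b[j - 1], b[j - 2]) if j > 1 else 0.0) + nu + lam
            # Match: current and previous elements, plus nu * (|i-j| + |(i-1)-(j-1)|)
            match = (D[i - 1][j - 1] + sub(a[i - 1], b[j - 1])
                     + (sub(a[i - 2], b[j - 2]) if i > 1 and j > 1 else 0.0)
                     + 2 * nu * abs(i - j))
            D[i][j] = min(del_a, del_b, match)
    return D[n][m]
```

For example, with a hypothetical unit cost `lambda x, y: 0.0 if x == y else 1.0`, raising `nu` makes time-displaced matches dearer, while raising `lam` makes deletions dearer, which is the trade-off any parameterisation has to settle.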
The role of indelcost: OM, LCS and Hamming
How do you parameterise Optimal Matching (OM) analysis of life-course sequence data? A lot of ink has been spilt on how to build the substitution matrix, which defines the differences between the states through which sequences move (the cost of substituting elements), but what about the role of indelcost, the cost of inserting or deleting elements? Here is a quick demonstration (using SADI, in Stata) showing that below a certain threshold OM becomes LCS, the longest-common-subsequence distance measure; that above another, less well-defined threshold it reverts to Hamming distance; and that in between there is a minimum meaningful indelcost.…
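A minimal sketch of the two limiting cases, in Python rather than Stata and not SADI’s code, assuming for simplicity a constant substitution cost of 2. When `indel` is at most half the substitution cost, a substitution is never cheaper than a deletion plus an insertion, so the OM distance equals `indel` times the LCS distance, len(a) + len(b) - 2 × (LCS length). When `indel` is large enough that any pair of indels costs more than all the mismatches, equal-length sequences are compared position by position, as with Hamming distance.

```python
from typing import Sequence


def om_distance(a: Sequence[str], b: Sequence[str], indel: float, sub: float = 2.0) -> float:
    """Optimal matching distance with a constant substitution cost `sub`."""
    n, m = len(a), len(b)
    # D[i][j]: cheapest way to turn a[:i] into b[:j]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel  # delete everything
    for j in range(1, m + 1):
        D[0][j] = j * indel  # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            D[i][j] = min(D[i - 1][j] + indel,     # delete a[i-1]
                          D[i][j - 1] + indel,     # insert b[j-1]
                          D[i - 1][j - 1] + cost)  # match or substitute
    return D[n][m]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


def hamming(a: Sequence[str], b: Sequence[str], sub: float = 2.0) -> float:
    """Position-by-position mismatch cost; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(sub for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    a, b = "AABBC", "ABBCC"
    lcs_dist = len(a) + len(b) - 2 * lcs_length(a, b)
    # Columns: indel, OM distance, indel * LCS distance, Hamming distance
    for indel in (0.5, 1.0, 1.5, 3.0, 100.0):
        print(indel, om_distance(a, b, indel), indel * lcs_dist, hamming(a, b))
```

Scanning the printed columns across a range of `indel` values shows where OM tracks the LCS-based column and where it settles on the Hamming column; the upper threshold depends on the sequences themselves, which is one reason it is less well-defined.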