Posts
No, your variable doesn't need to be normally distributed
“I need to test my variables for normality”
I keep encountering students (mostly with prior exposure to business or psych stats) who insist on testing their variables for normality as a sine qua non before conducting any analysis. While it's clearly a good idea to have a sense of how your variables are distributed, normality is (a) not a general requirement and (b) rarely the case (if a normality test fails to reject normality, that is most likely because your N is too small to give the test much power).
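To see the small-N point in action, here is a minimal simulation sketch in Python (the posts here work in Stata; this is not the post's own code). It assumes a mildly skewed lognormal variable and uses the Jarque-Bera test with its asymptotic 5% critical value, a test chosen purely for illustration, then counts how often normality is rejected at different sample sizes.

```python
import random
import statistics

CHI2_2DF_5PCT = 5.991  # 5% critical value of the chi-square distribution with 2 df


def jarque_bera(xs):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4), using moment estimates."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


def rejection_rate(n, reps=200, seed=1):
    """Share of samples of size n, drawn from a non-normal variable, where normality is rejected at 5%."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Assumption: a lognormal variable with sigma = 0.25 (skewness about 0.8).
        xs = [rng.lognormvariate(0.0, 0.25) for _ in range(n)]
        if jarque_bera(xs) > CHI2_2DF_5PCT:
            rejections += 1
    return rejections / reps


for n in (20, 200, 2000):
    print(f"N={n}: normality rejected in {rejection_rate(n):.0%} of samples")
```

The variable is never normal, yet at N = 20 the test usually fails to reject, while at N = 2000 it rejects almost every time. Jarque-Bera is undersized in small samples, which exaggerates the contrast somewhat, but the underlying point, that a test's power grows with N, applies to any normality test.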
Taming BrightSpace, for Linux users
Making the LMS work for you, not you for the LMS
BrightSpace has been our Learning Management System here in UL since last semester. I'm trying to use it as a tool, rather than experience it as a constraint. The first thing I wanted to do was to manage content with as little interaction as possible with the BS user interface. The interface is reasonably well designed, but like most LMSes it depends heavily on doing things one at a time, with lots of clicking, opening menus, and SAVEs and CONFIRMs.
Simulating Single Transferable Votes in the UK
FPTP vs STV
The UK uses a first-past-the-post (FPTP) voting system that frequently leads to parties supported by significantly less than 50% of voters forming majority governments. Many other countries use a variety of methods described as proportional representation (PR), which are intended to produce electoral outcomes (parties' share of seats) much closer to the parties' share of support among voters. Ireland and a number of other countries use the Single Transferable Vote in multi-seat constituencies, which works well and generally leads to more proportional outcomes.
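As a rough illustration of how an STV count works, here is a minimal Python sketch; it is not the code used for the simulation in the post. It assumes the Droop quota and transfers surpluses fractionally, along the lines of the weighted inclusive Gregory method. Irish elections use a different surplus rule that transfers whole ballot papers, and the alphabetical tie-breaking and example ballots here are made up for illustration.

```python
from fractions import Fraction


def droop_quota(valid_votes, seats):
    """Droop quota: floor(votes / (seats + 1)) + 1."""
    return valid_votes // (seats + 1) + 1


def top_choice(prefs, continuing):
    """Highest-ranked candidate on a ballot who is still in the count, or None if exhausted."""
    return next((c for c in prefs if c in continuing), None)


def stv_count(ballots, candidates, seats):
    """ballots: tuples of candidate names in preference order. Returns the elected candidates."""
    quota = droop_quota(len(ballots), seats)
    # Each paper carries a weight that shrinks when it is part of a transferred surplus.
    papers = [(tuple(b), Fraction(1)) for b in ballots]
    continuing = set(candidates)
    elected = []
    while len(elected) < seats:
        if len(continuing) <= seats - len(elected):
            # Remaining candidates fill the remaining seats without reaching the quota.
            elected.extend(sorted(continuing))
            break
        totals = {c: Fraction(0) for c in continuing}
        for prefs, weight in papers:
            top = top_choice(prefs, continuing)
            if top is not None:
                totals[top] += weight
        winners = [c for c in continuing if totals[c] >= quota]
        if winners:
            c = max(winners, key=lambda w: (totals[w], w))
            elected.append(c)
            # Scale every paper currently held by c so that only the surplus moves on.
            factor = (totals[c] - quota) / totals[c]
            papers = [
                (prefs, weight * factor) if top_choice(prefs, continuing) == c else (prefs, weight)
                for prefs, weight in papers
            ]
            continuing.remove(c)
        else:
            # Nobody reached the quota: eliminate the lowest candidate (ties broken alphabetically).
            loser = min(continuing, key=lambda x: (totals[x], x))
            continuing.remove(loser)
    return elected


ballots = [("A", "B")] * 6 + [("B",)] * 2 + [("C", "D")] * 3 + [("D", "C")] * 2
print(stv_count(ballots, ["A", "B", "C", "D"], seats=2))  # ['A', 'C']
```

With two seats and a quota of 5, A is elected on first preferences, A's surplus passes to B at a reduced weight, D is eliminated and D's transfers elect C, so the A/B and C/D blocs win one seat each.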
Parameterising TWED
Time Warp Edit Distance (TWED) is a measure for comparing categorical time series, such as life-course sequences, designed to recognise similarity that may be displaced in time. It is similar to Optimal Matching (OM) distance in implementation, but can be thought of as locally compressing and stretching the time dimension, whereas OM deletes and inserts elements. TWED, OM and a range of other sequence comparison tools are implemented for Stata in my SADI package (see Appendix: code).
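For readers who want to see the mechanics, here is a minimal Python sketch of the TWED recurrence as formulated by Marteau, treating sequence positions as time stamps; it is not SADI's implementation. The parameter names `nu` (the stiffness penalty on time displacement) and `lam` (the constant penalty per delete operation), the hypothetical 0/1 substitution cost, and the convention of a cost-free dummy state at time 0 are all assumptions for illustration.

```python
def twed(a, b, sub, nu=0.5, lam=1.0):
    """Time Warp Edit Distance between two categorical sequences (sketch).

    Positions 1..n are used as time stamps. `sub(x, y)` is the substitution
    cost between two states; `nu` penalises time displacement ("stiffness")
    and `lam` is a constant penalty for each delete operation.
    """
    # Assumption: pad both sequences with a dummy state (None) at time 0
    # that costs nothing to compare with anything.
    def d(x, y):
        return 0.0 if x is None or y is None else sub(x, y)

    a = [None] + list(a)
    b = [None] + list(b)
    n, m = len(a) - 1, len(b) - 1
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Delete a_i: cost is its difference from a_{i-1}, plus one time step and lam.
            del_a = D[i - 1][j] + d(a[i - 1], a[i]) + nu + lam
            # Delete b_j: the symmetric operation on the second sequence.
            del_b = D[i][j - 1] + d(b[j - 1], b[j]) + nu + lam
            # Match a_i with b_j (and a_{i-1} with b_{j-1}), penalising the time gap.
            match = (D[i - 1][j - 1] + d(a[i], b[j]) + d(a[i - 1], b[j - 1])
                     + nu * 2 * abs(i - j))
            D[i][j] = min(del_a, del_b, match)
    return D[n][m]


def cost(x, y):
    """Hypothetical 0/1 substitution cost: 0 for the same state, 1 otherwise."""
    return 0.0 if x == y else 1.0


a, b = "AAABBBCC", "AABBBBCC"
for nu in (0.1, 0.5, 2.0):
    print(f"nu={nu}: TWED={twed(a, b, cost, nu=nu)}")
```

Raising `nu` makes both off-diagonal matches and deletions more expensive, pushing the measure toward a position-by-position comparison; lowering it allows more local warping of time.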
The role of indelcost: OM, LCS and Hamming
How do you parameterise Optimal Matching (OM) analysis of life-course sequence data? A lot of ink has been spilt on how to build the substitution matrix, which defines the differences between the states through which sequences move (the cost of substituting elements), but what about the role of indelcost, the cost of inserting or deleting elements? Here is a quick demonstration (using SADI, in Stata) showing that below a certain threshold OM becomes LCS, the longest-common-subsequence distance; that above another, less well-defined threshold it reverts to Hamming distance; and that in between there is a minimum meaningful indelcost.
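The same thresholds can be sketched in a few lines of Python (this is not the SADI/Stata demonstration), assuming a single constant substitution cost `sub` rather than a full substitution matrix. With indel ≤ sub/2 a substitution is never cheaper than a deletion plus an insertion, so OM equals indel × the LCS distance. For equal-length sequences, once a pair of indels costs at least sub × the sequence length, no indel pair can pay for itself, so OM equals sub × the Hamming distance; that bound is sufficient rather than tight, in keeping with the upper threshold being less well-defined.

```python
def om_distance(a, b, indel, sub):
    """Optimal Matching: minimum total cost of insertions/deletions and substitutions."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel
    for j in range(1, m + 1):
        D[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 0.0 if a[i - 1] == b[j - 1] else sub
            D[i][j] = min(D[i - 1][j] + indel,   # delete a[i-1]
                          D[i][j - 1] + indel,   # insert b[j-1]
                          D[i - 1][j - 1] + s)   # match or substitute
    return D[n][m]


def lcs_length(a, b):
    """Length of the longest common subsequence."""
    L = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[len(a)][len(b)]


def lcs_distance(a, b):
    """Number of elements not in the longest common subsequence."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def hamming(a, b):
    """Position-by-position mismatches (equal-length sequences only)."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))


a, b = "AAABBBCC", "AABBBCCC"
sub = 2.0
for indel in (0.5, 1.0, 1.5, 2.0, 3.0, 8.0):
    print(f"indel={indel}: OM={om_distance(a, b, indel, sub)}, "
          f"indel*LCS={indel * lcs_distance(a, b)}, sub*Hamming={sub * hamming(a, b)}")
```

For this pair, OM rises with indel, tracking indel × LCS distance, until it reaches sub × Hamming distance (at indel = 2.0) and stays there.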