Parameterising TWED


Time Warp Edit Distance (TWED) is a measure for comparing categorical time series, such as life-course sequences, designed to recognise similarity that may be displaced in time. Its implementation is similar to that of Optimal Matching (OM) distance, but it can be thought of as locally compressing and stretching the time dimension, whereas OM deletes and inserts elements. TWED, OM and a range of other sequence comparison tools are implemented for Stata in my SADI package (see Appendix: code).…

The role of indelcost: OM, LCS and Hamming


How do you parameterise Optimal Matching (OM) analysis of life-course sequence data? A lot of ink has been spilt on how to build the substitution matrix, which defines the differences between the states through which sequences move (the cost of substituting elements). But what about the role of indelcost, the cost of inserting or deleting elements? Here is a quick demonstration (using SADI, in Stata) showing that below a certain threshold OM becomes LCS, the longest-common-subsequence distance measure; that above another, less well-defined, threshold it reverts to Hamming distance; and that in between there is a minimum meaningful indelcost.…
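The two limiting cases can be illustrated with a minimal sketch in plain Python rather than Stata. It does not use SADI, and the function names (`om_distance`, `lcs_length`, `hamming_distance`, `flat_sub`) and the toy sequences are hypothetical, chosen only for illustration. It assumes a constant substitution cost of 2 between any two different states. Under that assumption, an indel cost of at most half the substitution cost means a substitution is never cheaper than a deletion plus an insertion, so OM reduces to an LCS-based distance. For equal-length sequences, once two indels cost more than the whole Hamming distance, OM reduces to Hamming. These bounds are assumptions of the toy example, not necessarily the thresholds derived in the post.

```python
def om_distance(a, b, sub, indel):
    """Optimal Matching distance by dynamic programming.

    sub(x, y) gives the substitution cost between two states;
    indel is the cost of one insertion or one deletion.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel          # delete the first i elements of a
    for j in range(1, m + 1):
        d[0][j] = j * indel          # insert the first j elements of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                          # deletion
                d[i][j - 1] + indel,                          # insertion
                d[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),    # match or substitution
            )
    return d[n][m]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t[n][m]


def hamming_distance(a, b, sub):
    """Position-by-position substitution cost; requires equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(sub(x, y) for x, y in zip(a, b))


def flat_sub(x, y):
    """Assumed toy substitution cost: 0 for identical states, 2 otherwise."""
    return 0 if x == y else 2


seq_a = "AABBCC"   # hypothetical six-period sequences over states A, B, C
seq_b = "ABBCCC"

low_indel = 1      # half the substitution cost: the LCS regime
high_indel = 10    # two indels cost far more than the Hamming distance

lcs_based = low_indel * (len(seq_a) + len(seq_b) - 2 * lcs_length(seq_a, seq_b))

print("OM, low indel :", om_distance(seq_a, seq_b, flat_sub, low_indel))
print("LCS-based     :", lcs_based)
print("OM, high indel:", om_distance(seq_a, seq_b, flat_sub, high_indel))
print("Hamming       :", hamming_distance(seq_a, seq_b, flat_sub))
```

With these toy inputs the low-indel OM distance matches the LCS-based distance and the high-indel OM distance matches the Hamming distance. An indel cost between the two regimes (for example 1.5) gives a value that differs from both.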