Now for some actual windsurfing data...
Regarding testing, I tend to do 3 types of testing where possible - static testing (leave devices in an open area for several hours), driving (non-challenging dynamic testing) and real-world testing (windsurf / windfoil / wing). Real-world testing uses the most ideal (but still practical) method of mounting / wearing the device. Mounting the Motion securely on the bicep produces great results, but the wrist is obviously the most practical place for a watch. I have no doubt that two watches would produce more consistent results if they were worn on the bicep, but that isn't very practical.
I always wear two Motion Minis as my reference devices. This makes it easy to spot any outliers and allows me to evaluate whether the Motion results can be trusted as the baseline. Visual inspection of the data usually shows bigger differences between my watches than between the Motions. A visual inspection of the entire track is par for the course, along with a comparison of the top 5 results for all significant categories. Visual inspection will usually pick up any anomalies (and occasional spikes) which don't appear in the top 5 results.

My previous reference to errors includes the standard GPS / GNSS error budgets and things like an upside down receiver. A receiver may be capable of measuring the true speed at any specific moment (including split-second jerky movements), but in an ideal world you really need multiple measurements per second if outputting PVT solutions at 1 Hz. Recall the previous post about what happens internally in the SiRF chipsets. That is very likely to be the reason we don't see aliasing effects on 1 Hz devices from Locosys.
Peter's driving tests show little difference between the ESP and Garmin devices that were tested, although there was a slight bias in the watches (not centered around zero). This is consistent with my own testing but sadly it's not representative of the real-world performance we see on the water. It does provide useful insight into the innate precision of the watches though, so I appreciate seeing Peter's data and think it will interest anyone curious about the nuances of GNSS performance.
I've taken my last windsurfing session (pre-Christmas) and created similar plots to my earlier winging example. This was a typical speed session in Portland Harbour. The charts aren't measuring the innate precision, but they show how the live speeds (smoothed using a 2s rolling average) differ between 2 equally good watches worn on the wrist, and 2 Motions worn on the bicep.

The percentile figures say that during this session, whilst sailing > 25 kts:
- watches within 0.13 kts of each other for 50% of the time, 0.44 kts for 95% of the time and 0.81 kts for 99.7% of the time
- motions within 0.03 kts of each other for 50% of the time, 0.09 kts for 95% of the time and 0.18 kts for 99.7% of the time
Roughly speaking, the differences between the Motions are 4 to 5 times smaller than those between the watches across a range of speeds. These figures are not meant to represent "errors" or give an absolute measure of precision. They simply show how much two watches / Motions vary throughout a typical windsurfing session when worn in the recommended manner.
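For anyone wanting to produce similar figures from their own tracks, here is a rough sketch of the calculation in Python. It is not the exact script behind my charts. It assumes two speed series in knots that have already been time-aligned sample for sample, and the function names, the nearest-rank percentile method and the `window` value (number of samples covering 2 seconds, so 2 for 1 Hz data) are all illustrative choices.

```python
import math

def rolling_mean(values, window):
    """Trailing rolling mean; the first window-1 samples are dropped."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the sample leaving the window
        if i >= window - 1:
            out.append(total / window)
    return out

def percentile(sorted_vals, pct):
    """Nearest-rank percentile of an ascending list (one of several conventions)."""
    if not sorted_vals:
        raise ValueError("no samples above the speed threshold")
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_vals)))
    return sorted_vals[rank - 1]

def difference_percentiles(speeds_a, speeds_b, window=2, min_speed=25.0,
                           pcts=(50, 95, 99.7)):
    """Percentiles of |a - b| for smoothed speeds where both devices exceed min_speed."""
    a = rolling_mean(speeds_a, window)
    b = rolling_mean(speeds_b, window)
    diffs = sorted(abs(x - y) for x, y in zip(a, b) if min(x, y) > min_speed)
    return {p: percentile(diffs, p) for p in pcts}

# Made-up 1 Hz samples in knots, purely to show the call (not real session data)
device_a = [26.0, 26.0, 27.0, 27.0, 10.0, 10.0]
device_b = [26.2, 26.2, 27.4, 27.4, 10.0, 10.0]
result = difference_percentiles(device_a, device_b)
```

Requiring both devices to be above the threshold is just one way of defining "sailing > 25 kts". Using the average of the two, or a single reference device, would be equally defensible and may shift the numbers slightly.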
These stats are consistent with what I observe when comparing top 5 results using Speedreader, or comparing 2s results as I finish a run, which I have done many hundreds of times. A lot of the differences are likely due to how the watches are being worn on the wrists, but that is unavoidable. Most runs show results within 0.1 or 0.2 kts of each other on the watches, but 0.5 to 0.7 kts is not a rare occurrence.
Accuracy and precision are different things from a scientific perspective, and this post does not even touch on the difference or attempt to quantify them for these devices. The ISO definition of accuracy, which is a combination of trueness and precision, works quite nicely for GNSS. One aspect of Peter's earlier post related to trueness, when discussing errors being high or low for sustained periods.
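Purely to illustrate the terminology (this is not applied to any of the session data here), trueness can be thought of as the average error against a trusted reference and precision as the scatter of those errors. The sketch below assumes a reference speed is available for every sample, which is exactly what we don't have on the water. The function name and sample values are made up.

```python
import statistics

def trueness_and_precision(measured, reference):
    """Return (bias, spread): mean error and sample standard deviation of the errors."""
    errors = [m - r for m, r in zip(measured, reference)]
    bias = statistics.fmean(errors)    # trueness: how far off on average
    spread = statistics.stdev(errors)  # precision: how much the errors scatter
    return bias, spread

# Hypothetical device reading about 0.3 kts high with little scatter
measured = [30.3, 30.4, 30.2]
reference = [30.0, 30.0, 30.0]
bias, spread = trueness_and_precision(measured, reference)
```

A device like this would be precise but not true, which is the sort of thing a sustained high or low error looks like.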
I guess the main goal of this specific post is to show what can be achieved with modern Garmin watches in real-world conditions when wearing the devices in the optimal (but practical) way. It isn't giving a measure of accuracy in the ISO sense (combination of trueness + precision) but observing the consistency of data from two devices provides some useful insights imho.