OK, I will try to critique Experiment 1.
There are potentially numerous methodological issues in the description of the experiment on pages 7-8, but they are specific to psychology and I am not qualified to discuss them. Notably, the required "settle-down" period between erotic images could introduce some kind of confounding (carryover effects from residual arousal, say), although again, I can't say.
As for the statistics described on page 9:
Across all 100 sessions, participants correctly identified the future position of the erotic pictures significantly more frequently than the 50% hit rate expected by chance: 53.1%, t(99) = 2.51, p = .01, d = 0.25.[3] In contrast, their hit rate on the nonerotic pictures did not differ significantly from chance: 49.8%, t(99) = -0.15, p = .56. This was true across all types of nonerotic pictures: neutral pictures, 49.6%; negative pictures, 51.3%; positive pictures, 49.4%; and romantic but nonerotic pictures, 50.2%. (All t values < 1.) The difference between erotic and nonerotic trials was itself significant, t_diff(99) = 1.85, p = .031, d = 0.19. Because erotic and nonerotic trials were randomly interspersed in the trial sequence, this significant difference also serves to rule out the possibility that the significant hit rate on erotic pictures was an artifact of inadequate randomization of their left/right positions.
[3] Unless otherwise indicated, all significance levels reported in this article are based on one-tailed tests and d is used as the index of effect size.
It appears from this description that the unit of observation is a "session", with n = 100 independent observations. Given the 99-degree-of-freedom test statistics, I gather that the quantitative outcome being assessed is the number of correct guesses (out of 36 trials; see page 6), treated as an approximately normal continuous variable. The Central Limit Theorem (CLT) can justify treating a binomial variable with a large number of trials as approximately continuous, but I worry that 36 trials (and as few as 18 when aggregating by picture type) are too few for the CLT to apply.

I would probably have used a generalized linear mixed model (GLMM) with a logistic link, to address the heterogeneity concern that Dr. Volin raised. There are issues with this approach as well, given that there is (possibly) between-subject heterogeneity as well as (possibly) temporal autocorrelation, the latter introducing a fair bit of computational complexity. In any case, it would have been a good sensitivity check to at least try a random-intercept GLMM, and/or generalized estimating equations (GEE) with an autoregressive working correlation structure; note that the random intercept belongs to the GLMM, since a GEE models the within-session correlation marginally through its working correlation matrix rather than through random effects. I sketch one such check below.

There is also a multiple testing problem here (three tests), which requires multiplying each p-value by 3 (a crude Bonferroni correction), or else something more sophisticated that could improve power. Even then, I am not impressed by p = 0.03, which is the best p-value after Bonferroni adjustment. I am also concerned that the authors used one-tailed tests: converting to two-tailed tests with the Bonferroni correction, the best p-value becomes 0.06, i.e. nonsignificant at the conventional alpha = 0.05 (the arithmetic is mechanized below). In any case, the effect sizes are very small.
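For concreteness, here is a minimal sketch (in Python with statsmodels) of the GEE sensitivity check suggested above. The trial-level layout and the column names (session, erotic, hit) are my own placeholders, not the paper's, and since the raw data are not available the hits are simulated at chance just so the sketch runs end to end.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical trial-level layout: 100 sessions x 36 trials, 18 erotic and
# 18 nonerotic per session (the paper's per-type counts actually vary by
# participant cohort; these are placeholders). Hits simulated at chance.
rng = np.random.default_rng(0)
n_sessions, n_trials = 100, 36
df = pd.DataFrame({
    "session": np.repeat(np.arange(n_sessions), n_trials),
    "erotic": np.tile(np.r_[np.ones(18), np.zeros(18)], n_sessions).astype(int),
})
df["hit"] = rng.integers(0, 2, size=len(df))

# Logistic GEE: trials clustered within sessions. The exchangeable working
# correlation is a crude stand-in for between-session heterogeneity;
# sm.cov_struct.Autoregressive (with a time argument) would probe temporal
# autocorrelation instead. The coefficient on `erotic` tests whether the
# log-odds of a hit differ between erotic and nonerotic trials.
model = smf.gee(
    "hit ~ erotic",
    groups="session",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
print(model.fit().summary())
```

A random-intercept logistic GLMM would be the other natural check; in Python that means something like statsmodels' BinomialBayesMixedGLM, or lme4::glmer in R.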
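And the multiplicity/two-tailedness arithmetic from the previous paragraph, mechanized. The three one-tailed p-values are the ones reported on page 9; the one- to two-tailed conversion assumes a symmetric test statistic.

```python
# Reported one-tailed p-values: erotic vs. chance, nonerotic vs. chance,
# and the erotic-nonerotic difference.
one_tailed = {"erotic": 0.01, "nonerotic": 0.56, "difference": 0.031}

for name, p in one_tailed.items():
    two_tailed = 2 * min(p, 1 - p)       # symmetric-statistic conversion
    adjusted = min(3 * two_tailed, 1.0)  # crude Bonferroni over 3 tests
    print(f"{name}: {adjusted:.3f}")
# erotic: 0.060, nonerotic: 1.000, difference: 0.186
# Nothing survives the conventional alpha = 0.05.
```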
In other words, this is not particularly impressive.
And this is a critique issued by a statistician/epidemiologist who believes in woo.