What’s Wrong with the World

The men signed of the cross of Christ go gaily in the dark.


What’s Wrong with the World is dedicated to the defense of what remains of Christendom, the civilization made by the men of the Cross of Christ. Athwart two hostile Powers we stand: the Jihad and Liberalism.

How to lie with statistics, example #4,563,699

Here is a very interesting letter by a researcher named Jokin de Irala (and his co-writers) that is relevant to the "they're going to do it anyway" meme. De Irala et al. compared published statistics for three countries on the mean and median ages of "first sexual intercourse" with their own findings as to the percentage of teens in those countries who are actually sexually active at those ages. And what strange things we do find. For example, we have the following odd situation in Spain: Mean age of first sexual encounter--16.3 years. Median--16 years. Yet Irala's research found that only 21.7% of 16-year-olds were sexually active at all and only 34.8% of 17-year-olds. The other countries Irala checked turned up similar results.

How can these things be?

Irala suggests the following solution to the puzzle:

The mean age of first sexual intercourse was obviously estimated solely using subjects who have already had sexual intercourse whereas the proportion of youth that were sexually initiated use all youth in each age group as the denominator.

Ohhhh. I get it. So when we hear that the "mean age of first sexual encounter" is getting lower and lower, that just means among the young people who aren't remaining chaste during their teen years. That's a very different matter from "they're going to do it anyway." Let me repeat that another way: The data on mean and median age of first sexual encounter apparently do not include chaste teenagers! That's almost incredible.
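To see how the two numbers can coexist, here is a minimal sketch using a made-up cohort (the ten values below are invented for illustration and are not Irala's data). The mean is computed only over those who report having had sex, while the proportion uses everyone in the age group as the denominator.

```python
from statistics import mean, median

# Hypothetical cohort of ten 17-year-olds (invented numbers, NOT Irala's data).
# Each entry is the reported age at first intercourse, or None if the
# respondent has not had sex.
cohort = [14, 15, 16, None, None, None, None, None, None, None]

# The "mean age of first intercourse" is computed over initiated teens only.
initiated = [age for age in cohort if age is not None]
conditional_mean = mean(initiated)
conditional_median = median(initiated)

# The "proportion sexually initiated" uses the whole cohort as the denominator.
share_initiated = len(initiated) / len(cohort)

print(f"mean age at first intercourse (initiated only): {conditional_mean}")
print(f"median age at first intercourse (initiated only): {conditional_median}")
print(f"share of the cohort that is initiated: {share_initiated:.0%}")
```

In this toy cohort the "mean age of first intercourse" is 15 even though 70% of the group has never had sex, which is the same shape of result as the Spanish figures.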

As Irala calmly notes,

Sentences derived from average data such as this one—“compared with previous generations, young people (16–20 year-olds) were having intercourse for the first time at an earlier age, on average at 16.5 years of age” (Avery & Lazdane, 2008)—leave the facts as to how many from these age groups have, in fact, had sex unspecified. These confusing interpretations of epidemiological data create definite impressions that can be misleading and thus may hinder public health and educational interventions that are trying to delay sexual initiation in youth[.]

No kidding.

I don't know for certain whether researchers who report these mean and median ages were deliberately trying to mislead, but it will be interesting to see whether anything changes in the reportage now that Irala et al. have issued this cautionary note. My bet: Probably not.

Comments (14)

Surveys are hardly a good method to assess this. Suffice it to say there are many confounding variables. Why, the mean age for first sexual encounter was probably 14.0 years old - in 1400. Longevity, health, etc. are all confounding variables.

As for surveys, they depend on people telling the truth and this is a subject, unfortunately, probably under-reported. I would guess the percent is higher than their statistics indicate.

The more telling statistic, which few have asked, is the percent of virgins at the time of marriage. Now, that would be tending towards single digits, I suspect, sadly.

The Chicken

But presumably the statistics were all gathered by means of some form of subject reporting, so that doesn't cast any more doubt upon Irala's statistics than upon the statistics of the people who gathered data showing the mean and median age.

The point is that it appears that the mean/median age statistics _must_ be gathered in some way other than by taking into account all subjects, including those who have had no sexual experience at all. Irala's conjecture seems exceedingly plausible--the median and mean are being reported from *sexually active* teens. In fact, Irala doesn't seem to expect that conjecture even to be challenged. But all by itself that is a very, very serious issue as far as the impact of these kinds of statistics on public policy, where this type of thing is used to support the idea of inevitable sexual activity among teens.

To argue that it is misleading, one is at least making the implicit argument that it is probable that more people are maintaining virginity and those who are not are losing it earlier. That the researcher had the data available to support that claim but chose not to make it would indicate to me that the data don't support it.

"Aw, people can come up with statistics to prove anything...
Forty percent of all people know that."

I'm not a fan, alas, anymore. But Homer Simpson got something right, now and then.

Like "Facts are meaningless - you could use facts to prove anything that's even remotely true!"

Mackie wrote an entire book on ethics when he heard that...


The most important quote to remember when dealing with statistics:

The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases.

The second most important:

There's lies, and there's damn lies, and then there's statistics.

Some portion of my work day is spent working with statistics, and whenever anyone quotes statistics to me my first response is to ask: where did those come from, and how were they first collected? After that, I can ask how they have been organized. Finally, I can check on whether the "conclusions" are actually supported by the statistics that purport to support them. More often than not, there are problems in AT LEAST one step, often in more than one. Often, the method of collecting some data cannot be justified as being sound in any formal way, and indeed nobody can imagine finding a rigorous method for collecting some data - such as "age at first sexual encounter". Actually, the case is even worse: often the matter at issue cannot even be defined rigorously, such as "annual household income below poverty level". What is "annual household income" to a young woman who serially lives with 3 men during a year, one of whom makes (unreported) money selling drugs, and another of whom makes money doing roofing, but only during the season and only when he can get hired on (and half of THAT is unreported, too)?

There are important places where statistics are valuable and lead to worthwhile advances in understanding our world. Politicians find it difficult to use such. Most politicians know very little mathematics, even less of the science of statistics, and wouldn't know a bad statistic if it bit them on the...or a good one either, for that matter (should one accidentally manage to pop up).

Badger, if we assume that the "mean age getting younger" statistic is true *in any sense*, it looks like the data do support the idea that those who are sexually active are becoming so earlier while the majority remain sexually chaste. I'm not sure why you think they don't. Look at the chart. Check out Peru, for example. Other researchers reported a mean age of first sexual encounter as 14.3. Irala et al.'s research supported the conclusion that only 9.6% of 14-year-olds and only 17% of 15-year-olds were sexually active at all. You do the math. This is *not possible* on its face if the mean is taken over all teens. The only explanation Irala could come up with was the one he put forward--namely, that the mean was being reported only for sexually active teens. If you have a better explanation, you're welcome to bring it forward. Now, *if* the mean in Peru (the "mean," that is, of whatever population they're actually talking about) is getting younger while at the same time only 9.6% of teens at that mean age are sexually active at all, then, yes, what this apparently means is precisely that the majority of the teens are not sexually active but that those who are, are becoming so earlier.

Oh, bother. Mean and median.

Let's suppose that the survey question is "At what age did you first have sex", and the answers allowed only as whole numbers: either 13, or 14, or 15, or 16, or 17... Now, let's suppose that in Year X the following data are returned: 8 people say "16", and 4 people say "17". The mean age is announced as 16.3. The median number is reported as 16.0.

This is bad analysis and reporting, but it happens all the time. The number "16.0" uses tenths for the simple and idiotic reason that the mean number did. But it is false: NOT ONE of the values entered in the survey used a decimal number; they all used a whole number. The real median is "16", with no trailing ".0". You cannot tell whether it means age 16 and 0 months, or age 16 and 5 months, or something else. All you know is that they have not yet reached their 17th birthday.

Now, another survey of the very same 12 people returned the following data: 8 people said 16 years, 11 months. 4 people said 17 years, 3 months. Now the mean age is announced as "age 17.0". The median is announced as 16.9 years. This result is based on the exact same facts as the first study.

The researchers do a third study on the same population, but now they have teens who are damned tired of being asked private questions. They get the following results: 5 people say age 11. 2 people say age 29 (the highest age they can actually envisage). The rest say nothing. The report shows that the mean is now 16, and the median is 11.
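For anyone who wants to check the arithmetic in the three made-up surveys above, here is a rough sketch. The helper name `summarize` and the choice to round to one decimal place are my own assumptions for illustration; the answer lists simply transcribe the hypothetical responses described in this comment.

```python
from statistics import mean, median

def summarize(ages):
    """Return (mean, median) of the reported ages, rounded to one decimal."""
    return round(mean(ages), 1), round(median(ages), 1)

# Survey 1: whole-number answers only.
survey1 = [16] * 8 + [17] * 4

# Survey 2: the same 12 people, answering in years and months.
survey2 = [16 + 11 / 12] * 8 + [17 + 3 / 12] * 4

# Survey 3: seven fed-up respondents; the other five say nothing at all.
survey3 = [11] * 5 + [29] * 2

for name, ages in [("survey 1", survey1), ("survey 2", survey2), ("survey 3", survey3)]:
    m, med = summarize(ages)
    print(f"{name}: n={len(ages)}, mean={m}, median={med}")
```

The first two surveys describe the same twelve people yet report different means and medians, and the third gets a mean near 16 from answers that are almost certainly junk.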

Lydia, I think you're missing a distinction drawn by the authors when you conflate mean and median. The (unconditional) median age of first intercourse (including virgins) can in fact be reasonably estimated if more than half of the sample has had sex - Spain, according to these charts - if one defines the age of first sex for those who die virgins as infinity.

But the authors note that, obviously, the unconditional mean age of first sexual intercourse cannot be estimated from these samples. (Unless, I'd add, you make some modeling assumptions about when the virgins will have intercourse, but there's no reason to suggest that anyone did that, nor should they have.) You can only estimate the conditional mean, the mean age of first sex given that one has had sex. That's the essential point about median and mean here.
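A small sketch of that distinction, with invented samples (none of these numbers come from the paper). Coding "has not had sex" as infinity is the convention described above; the variable names are just for illustration.

```python
import math
from statistics import median

NEVER = math.inf  # convention: respondents who have not had sex are coded as infinity

# Invented sample where MORE than half have had sex: 6 of 9.
majority_initiated = [15, 16, 16, 17, 17, 18, NEVER, NEVER, NEVER]

# Invented sample where FEWER than half have had sex: 2 of 5.
minority_initiated = [15, 16, NEVER, NEVER, NEVER]

# The unconditional median is finite only when the middle respondent has had sex.
median_majority = median(majority_initiated)
median_minority = median(minority_initiated)

# The unconditional mean is infinite as soon as a single NEVER is present.
mean_majority = sum(majority_initiated) / len(majority_initiated)

# The conditional mean (given that one has had sex) is always computable.
had_sex = [a for a in majority_initiated if a != NEVER]
conditional_mean = sum(had_sex) / len(had_sex)

print(median_majority, median_minority, mean_majority, conditional_mean)
```

With the middle respondent initiated, the unconditional median comes out as an ordinary age. Otherwise it is infinite, which is another way of saying the sample cannot tell you what it is yet. The unconditional mean is infinite in either case.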

I agree that citing the mean age statistics is really misleading. My a priori guess is that the original researchers were clear about this, but that the findings were reported in a misleading way. Cases where statistics are misinterpreted by journalists can generally be explained by ignorance rather than by malice or even by unconscious bias. Why do you think those people majored in journalism in the first place? Because they want to "make a difference," yeah, but also because they hate statistics. Never, ever underestimate the incompetence of journalists.

Whether the original researchers were clear would require looking up their papers. I think it significant that Irala asked "Do they know what we mean?" and seemed to be implying that _some_ responsibility rests with researchers to be _more_ clear in issuing these reports. After all, scientific research results usually don't get into the news all by themselves or just by means of some news hound sniffing them out and choosing all on his lonesome to report them. I certainly don't underestimate the incompetence of journalists. I also don't underestimate the agenda of sex researchers, which has, shall we say, been around for a while.

Aaron, I don't see how the population Irala sampled (shown in the first column) can possibly yield an unconditional median, either, of 16 for Spain. It seems obvious, especially since the studies are by different researchers, that both the mean and the median (in the second and later columns) were conditional on the teens' being sexually active.

Lydia, I agree with you, the medians given in the table cannot possibly be unconditional if the "sexually active" numbers are correct, including for Spain. What I meant was that it's possible to get a good estimate of the unconditional median, whereas the unconditional mean isn't even defined, and if one did define it by ignoring those few people who die without ever having sex (a reasonable approximation), it would still be impossible to estimate that unconditional mean from a sample of teenagers. This is just unpacking the authors' word "obviously."

By the way, I glanced at one of the references (Ma et al., 2009), and the authors of that study did make it clear that they were talking about sexually active people, e.g., "The mean ages at first sexual activity for [before high school, high school, and university] initiators were...". So I'm still guessing that the primary sources are clear, the secondary scholarly sources are sometimes muddled, and by the time it gets to the newspapers all bets are off.

What would you call Avery and Lazdane? Secondary scholarly source? It may be that that is where the most culpable unclarity is entering the process.

Why don't they just plot the raw data in histogram form over time, taking a new survey each year? This would give a time-dependent, "waterfall" plot so that one could see the variation from year to year.

Lies, damned lies and statistics aside, I'm just wunnerin' what Candidate Perry has to say about all of this . . .
