Revisiting Hynek's Strangeness-Probability Curve: Analysis of 341 Case Reports

Finding signal in the noise of UFO case reports using a modern computational approach to an old idea: Hynek's strangeness-probability curve.

Ufology, if there is such a thing, suffers from a characteristically paradoxical problem: there is an overwhelming amount of data and virtually no substantive information.

Of course, most of the data are really just witness reports. Those who study these reports readily admit that they mostly describe misidentified objects. We also know for a fact that some reports are hoaxes; there is reasonable disagreement about how common they are. The most hotly debated question is what proportion of case reports contain something truly unusual.

"Unusual" means different things to different people. It seems likely that the "unusual" category has often included experimental aircraft and space technology. This has created a policy dilemma: how attentively should the government study something known to consist largely of commonly misidentified objects, very occasionally of things it would prefer to keep discreet if not secret, and an unknown proportion of some "other" category?

The answer of course depends on one's vantage. From a scientific perspective, if any strange report describes something real and truly unknown, it is of significant interest and should be pursued. The calculus is more complex from a defense perspective. Unknown intentions and capabilities are traditionally the sort of thing that worries defense planners. However, the need to maintain secrets might overwhelm the desire to fully qualify an unknown that seems benign.

Certainly, if there is high confidence that there is no true phenomenon of interest, it makes sense to minimize reporting entirely to avoid accidentally disclosing secrets. A performative effort would help assure the public and perhaps add a measure of deniability for intelligence purposes. The end result would be a scientific Potemkin village – something that gives the surface appearance of concern.

Maddeningly, a similar result could just as easily stem from bureaucratic malaise and reflexive avoidance of uncertainty. Combined with a culture of secrecy, such malaise could engender a highly muddled response. Government is never monolithic; it is a relatively safe assumption that decision-makers have held a wide variety of views among themselves. Policy is the product of inter- and intra-organizational negotiation. There is little reason to think the UFO issue is any different.

Aside from a priori views on the matter, there is also practicality to consider. Assuming there was an appetite and a need to investigate this, how expensive would it be to chase down every report? How many investigators and analysts would that require? What kind of analytical methods should be employed? What is worthy of consideration and what is not? What do you do if you accidentally touch the obvious issues described above in the process – that plane or drone that doesn't formally exist?

This is a scientific and intelligence problem of nightmarish proportion. Whatever the constraints and motivations of government, the end product for the public is the same. We have an amorphous mass of mostly narrative reports. An unknown but large quantity are likely outright junk; a small portion might be "real" but describe secret experimental technologies. Another smaller portion could be something scientifically new and interesting.

In the absence of a credible and professional (read: expensive and difficult) effort, a small handful of investigators have labored to independently collect data. They've done so alongside a growing community of amateurs, amid a growing din of others attracted to the topic with all its attendant baggage. Meanwhile, the topic has waxed and waned in public interest and attention.

The late 1970s was arguably a peak in that attention. Steven Spielberg's landmark film Close Encounters of the Third Kind came on the heels of a wave of sightings, and a growing distrust in institutions following the Nixon resignation and the Church committee's investigation into misconduct by the intelligence community.

In response to public interest in the matter, the Carter administration requested a report from the Congressional Research Service (CRS) on the issue. The CRS is often described as "Congress's think tank" and is responsible for informing members of Congress on policy issues in a rigorous, non-partisan manner. An updated version of the report was published again during the Reagan administration, in 1983. Much of the report is concerned with issues of data, and of making sense of complex witness testimony.

A full copy can be found below:

Full Document Here

The report is fascinating in many respects. It serves as one of the few governmental but non-military and non-intelligence analyses of the issue in United States history.

One of the more fascinating passages directly addresses the problem described above: how to sort through a mass of cases and identify the ones most likely to be "interesting"? Former Blue Book astronomer Dr. J. Allen Hynek had grappled with this very question for years. His answer: the "strangeness-probability curve." Here is the relevant portion of the text:

Hynek's innovation was born of necessity. The idea is simple. We should filter cases by focusing on just those that are both credible (by virtue of the number and quality of witnesses, or the availability of more forensic data, etc) and the ones that are highly unusual and not easily explained by commonly misidentified phenomena. Cases meeting both the "strange" and "credible" criteria are more likely to represent something scientifically interesting – a good place to invest meager resources.

As he has so often done, Jacques Vallée offered a useful addendum to the concept: the probability of an event being reported actually declines past a certain threshold of strangeness, as witnesses seek to avoid ridicule. Here is his explanation:

The "Hillside curve" was first explained in print in Invisible College. It simply illustrated my observation that the probability for a witness to report an unusual occurrence rises rapidly as "strangeness" increases, but only to a certain point. When events become too bizarre for comfort the probability of reporting decreases again to a vanishing point where the conscious mind of the percipient may not even be aware of the occurrence. --Jacques Vallée, Forbidden Science Volume II

In other words, moderate "strangeness" might induce a witness to report something. Once it becomes overwhelmingly odd however, there is greater social risk in reporting it. Controversially, Vallée further asserts that events can become sufficiently bizarre that the witness does not consciously perceive them at all.

Dr. J. Allen Hynek

In fact, there is some evidence to support the notion that, as much as we have an overabundance of reports from an economic or scientific perspective, the total number of events is underreported – either because they are deemed insignificant, or for fear of ridicule. Even more fundamentally, the passive nature of report collection will always produce fewer reports than actively asking people about their experiences.

An important example comes from data published by Ron Westrum in an article titled "UFO Sightings Among Scientists and Engineers" in Zetetic Scholar #8. In an attempt to survey 4,000 scientists and engineering professionals, Westrum inadvertently sent follow-up questionnaires to a small number of respondents who claimed to have never observed a UFO. Several of these wrote back positively, leading him to question what would have happened if all respondents had been asked for follow-up. Whatever the ultimate reason (faulty memory, initial embarrassment, intervening experience, etc.), asking for experiences yielded yet more reports.

In short, inundated as we are with reports from people who proactively file them, they are likely an underestimate of the total number of incidents. Few social scientists or statisticians would be surprised by this result; it is extremely common for there to be different responses depending on the nature of the inquiry. Answering a poll is quite different from answering a questionnaire, and yet different from filing a report with the police or with the Air Force.

With those caveats in mind, we can view Hynek's "strangeness-probability" curve as both an investigative and economic heuristic tool. Within the mass of reports, we should focus first on the ones that are well documented or observed, and are strange enough to contribute new information. We should also recognize the potential that an individual case report might really be one among many – a topic for another time.

The question now, as it has been for decades, is how exactly to disentangle a mass of data in order to extract some potentially interesting information.

Looking for Signal in Noise: The Hatch Database

Unfortunately, many efforts at UFO case collection have neglected to provide much structure. One notable exception is the U Database, designed and curated by programmer Larry Hatch (d: 2018). I recently wrote about my effort to learn more about the database, and to aid others in preserving its underlying data. Researcher David Marler wrote a moving and informative tribute to Larry Hatch at OpenMinds, available here:

Larry Hatch, UFO Database Creator, Remembered -
Larry created from the ground level one of the most robust UFO databases up until that time – the *U* UFO Database. This was a database that the public could use on their home PCs. The *U* Database itself could do queries based on word search, geography, or UFO characteristics.

Hatch's database, though the effort of one person, wisely incorporated several helpful features for conducting a modern strangeness-probability curve analysis. First, Hatch took the important step of assigning each case simple credibility and strangeness scores on a 16-point scale (incidentally, he selected this range for the programming convenience of representing each score as a single hexadecimal digit). Though his scoring approach was subjective, it provides a starting quantitative basis to identify "credible" and "strange" cases in the first place.

As the product of an individual, there are many important caveats about it as a data source. The database necessarily relies on secondary reports from the UFO literature which is, to put it very charitably, known to be highly uneven in quality. The database also inevitably reflects idiosyncrasies in Hatch's personal interests and theories.

These idiosyncrasies are palpable within the data, both in terms of the construction of categories and the descriptions of cases. Hatch's commitment to making his application and database easy to distribute also imposed technical limitations on the data, likely leading him to abridge key details. Some cases that have been the subject of multiple books are described in a fragmentary, highly shorthanded sentence. Still, despite these issues, it remains the only fully public resource I'm aware of that is readily amenable to the Hynek "strangeness-probability" heuristic.

What follows below is an attempt to apply that heuristic to try to recover some faint signal from an abundance of noise. It may be tempting for some to describe the analysis below as "scientific." It is emphatically not. It is an empirical analysis of one person's collection of stories about UFOs. I used quantitative methods to try to identify "interesting" cases that may provide pointers for subsequent research. However, the degree to which further research can be carried out is extremely limited – a decades-old issue of Flying Saucer Review or APRO Bulletin only has so much information, and is in turn limited by its underlying sources. This work is better characterized as computational history, or perhaps as a form of digital humanities.

As such, this analysis provides suggestions, but few firm conclusions. As a preview:

  • The best cases in each category of evidence tracked by Hatch are outside of the United States
  • The most active "hotspots" in the database do not match with popular public perception. For example, the greatest case density in the data is near a heavily forested portion of the United Kingdom. Coincidentally or not, that area is also home to a facility that manufactures nuclear submarine reactor cores.
  • The number of "interesting" cases by the Hynek method of using both strangeness and credibility is decreasing significantly over time, at least through the end of the 1990s
  • The distribution of shapes of objects ("morphology") is dynamic over time. "Saucer" and "cigar" shapes have tended to vary together, and are declining through the end of the 1990s.
  • Brazilian cases are a significant cluster, and are strikingly different from others. They involve far more instances of human harm, and also appear to have attracted greater regional press attention.
  • "Consciousness" effects are a component of the Hatch data, but they are relatively minor and occur less frequently among "interesting" cases compared to baseline. The more "credible" cases among them tend to involve mass religious events, particularly Marian apparitions. As a caveat, Hatch tended to score events "credible" when there was strong evidence that something occurred, not necessarily that the event itself was truly unexplained.
  • A brief methodological section that demonstrates how the "credible/strange" concept can be applied to a graph theoretic approach to case analysis. Applications include identifying "hinge" cases within waves, as well as analyzing geographic hotspots at multiple scales of geographic resolution.
  • A comparison with spatial models derived from French CNES/GEIPAN data show similarities in the observation of a decline of cases, and an apparent relationship with nuclear facilities.

Overall, this piece is intended as a survey and methodological experiment for other researchers. The piece is admittedly dense, but hopefully provides useful – and specific – resources.

The Data

View the interactive version here:

A few preliminaries. The underlying data from the complete Hatch database is available here in JSON format:

Download the Full Dataset

Other formats can be found in my previous piece. Following Hynek's "strangeness-probability curve" described above, I partitioned the data using Hatch's credibility and strangeness metrics. Across all cases, the mean "credibility" score was 7.54/15 with 90% of cases scored at or below 10. Likewise, the mean "strangeness" score was 6.60/15 with 90% of cases scored at or below 8. Note that the hatch scale starts at 0. Using these scores, I retrieved a set of 341 cases that are in the top 10% of both strangeness and credibility. These are intended to serve as a proxy for Hynek's heuristic described above.

The smaller data set of "credible and strange" cases is available here:

Download the Credible-Strange Subset

Additionally, I constructed a spatial database to allow for flexible querying and analysis of the data. An interactive version of the map displayed above can be viewed here:

Click to View Interactive Version of the Case Map

Some early users have mentioned the map does not display well on phones, so you may wish to use a desktop or laptop to explore the tool.

Within the smaller set, the credibility metric has a maximum score of 15. The minimum was the threshold of 10. The mean credibility was 10.54, with a median of 10. In the strangeness metric, the maximum score was 15, and the minimum was the threshold of inclusion at 8. The mean strangeness score was 8.56, with a median of 8.

The temporal distribution of the cases is as follows:

Pre-1940s 1940s 1950s 1960s 1970s 1980s 1990s Total
Credible/Strange 3 17 75 73 95 41 37 341
Baseline 435 1018 4776 3082 4272 1572 2562 17717

The data shows a relative peak of credible/strange reports in the 1970s, followed by a sharp decline. Reports since 1940 are not evenly distributed over time, χ²(5, N = 341) = 75.93, p < .01.
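The chi-square statistic above is reproducible as a goodness-of-fit test of the decade counts against a uniform expectation. A minimal sketch using scipy, with the counts taken from the table above:

```python
from scipy.stats import chisquare

# Credible/strange case counts per decade, 1940s through 1990s (table above)
decade_counts = [17, 75, 73, 95, 41, 37]

# Goodness-of-fit against uniform expected frequencies across six decades
# (scipy's default when no expected frequencies are supplied); df = 5
stat, p = chisquare(decade_counts)  # stat comes out to about 75.93
```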

The following visualizes a correlation matrix between characteristics in the credible and strange sets:

The numeric values are provided below:

relativeAltitude elevation credibility duration strangeness year
relativeAltitude 1.00000000 -0.10738664 0.17077406 -0.02922637 -0.12104228 -0.26653857
elevation -0.10738664 1.00000000 0.10413248 -0.07109671 0.10574473 0.01160489
credibility 0.17077406 0.10413248 1.00000000 0.06557058 -0.00895009 0.02927648
duration -0.02922637 -0.07109671 0.06557058 1.00000000 0.05473428 0.06204757
strangeness -0.12104228 0.10574473 -0.00895009 0.05473428 1.00000000 0.18326318
year -0.26653857 0.01160489 0.02927648 0.06204757 0.18326318 1.00000000

Overall, there is a small positive correlation of credibility with elevation (.10) and duration (.07). The strangeness metric is also correlated with elevation (.11) and with the year (.18).

Taken together, we see a decline in the total number of cases, as well as a decline in "credible/strange" cases. However, the "strangeness" of "credible/strange" cases has slowly increased over time.
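For readers who wish to recompute the matrix above from the subset JSON, a minimal pandas sketch follows. The column names match the matrix as printed, but whether they exactly match the export schema is an assumption.

```python
import pandas as pd

# Numeric fields to correlate; names assume the exported JSON schema
FIELDS = ["relativeAltitude", "elevation", "credibility",
          "duration", "strangeness", "year"]

def case_correlations(cases, fields=FIELDS):
    """Pearson correlation matrix over the chosen numeric case fields."""
    return pd.DataFrame(cases)[list(fields)].corr()
```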

In terms of national geography, countries with more than 10 cases include the following:

Country Count
USA 145
Brazil 22
France 19
UK + Ireland 13
Russia 11
Argentina 11
China 10

Analysis of Case Categories

The Hatch database makes use of a large number of binary "flags" indicating categorical properties of the cases. This may have been a technical choice by Hatch to reduce the overall size of the data by simplifying the case descriptions. Further information about the "case flags" can be found in my previous piece about the software.

An interesting question arises: are there differences in the proportion of some case flags in the "credible-strange" set versus baseline cases? We should expect so. For example, the credibility metric alone should show a higher frequency of flags related to evidence, such as "High Quality Observers" or "Radar." Indeed, these simple differences are apparent between the two sets in the following table:

Flag Credible/Strange Proportion (n=341) Baseline Proportion (n=17782) Z-score P-value
High Quality Observers 94.43 45.42 17.98 <.01 **
Coverup 9.97 1.34 12.99 <.01 **
Military 34.60 11.98 12.56 <.01 **
Wave 32.26 10.98 12.25 <.01 **
Technical 44.87 18.77 12.10 <.01 **
Radar 13.20 3.23 10.03 <.01 **
Airborne 18.77 7.72 7.48 <.01 **
Government Security Agency Involvement 6.45 1.45 7.44 <.01 **
Other Government Agencies 13.49 4.78 7.35 <.01 **
Radiation 3.23 0.43 7.35 <.01 **
Electro_Magnetic Effect 18.48 7.73 7.28 <.01 **
Ray 19.94 9.36 6.58 <.01 **
Traces 14.08 5.96 6.20 <.01 **
Submersible 5.28 1.49 5.59 <.01 **
Military Investigation 23.17 13.48 5.16 <.01 **
News 37.24 25.06 5.12 <.01 **
Apparent Landing 26.69 16.30 5.12 <.01 **
Animals Affected 8.50 3.39 5.10 <.01 **
Scientist 7.62 2.91 5.05 <.01 **
Oddity 13.78 6.96 4.87 <.01 **
Photos 9.97 4.51 4.76 <.01 **
Human Affected 14.37 7.56 4.68 <.01 **
Saucer 79.77 67.92 4.65 <.01 **
Observation 38.12 27.35 4.41 <.01 **
Building Or Any Manmade Structure 13.49 7.71 3.94 <.01 **
Nuclear 4.69 1.79 3.94 <.01 **
Humanoid 8.50 4.18 3.92 <.01 **
Cigar 21.99 14.64 3.79 <.01 **
Abduction 5.28 2.24 3.71 <.01 **
Dirt 9.38 5.09 3.55 <.01 **
Historical 1.76 0.45 3.49 <.01 **
Injuries 5.28 2.38 3.43 <.01 **
Operations 2.35 0.73 3.42 <.01 **
Plants Affected Or Sampled 7.92 4.16 3.42 <.01 **
Figure 5.87 2.78 3.39 <.01 **
Sampling 4.40 1.88 3.34 <.01 **
Sea 4.99 2.35 3.15 <.01 **
Signal 4.99 2.55 2.81 <.01 **
Vehicle Affected 18.18 13.99 2.20 <.05 *
Giant 1.17 0.42 2.09 <.05 *
Delta 13.49 10.26 1.94 0.052
Man_In_Black 0.59 0.19 1.67 0.095
Pseudo_Human 3.81 2.50 1.53 0.127
Robot 0.88 0.51 0.96 0.339
Missing Time 2.93 2.26 0.83 0.405
Conversation 2.05 1.53 0.78 0.437
Odors 0.59 0.40 0.52 0.602
Monster 0.88 0.69 0.41 0.679
Telepathy 1.47 1.36 0.18 0.861
Coast 20.53 20.27 0.12 0.908
Ground 93.26 93.36 -0.08 0.940
Blue Book 9.68 9.80 -0.08 0.939
Probe 12.90 13.28 -0.20 0.840
Contactee 0.29 0.39 -0.28 0.780
Camouflage 8.50 9.22 -0.45 0.650
Map 99.41 99.58 -0.46 0.643
Hoax 0.88 1.24 -0.59 0.553
Sound 5.28 6.43 -0.86 0.388
Fireball 6.45 7.75 -0.89 0.374
No Ufo 1.17 1.99 -1.07 0.282
Nightlights 14.37 17.17 -1.36 0.174
Civilian 90.03 92.31 -1.56 0.118
Misidentification 1.76 5.51 -3.03 <.01 **
No Occupant 83.28 90.45 -4.43 <.01 **

Reviewing the table also provides some occasion to absorb the idiosyncratic nature of the data, with categories such as "Pseudo_Human" and "Man_In_Black." As expected, evidence flags have a much higher proportion in the credible/strange set. Conversely, flags like "hoax" and "misidentification" have a far lower proportion in the credible/strange set than in the baseline.
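The z-scores in the table are consistent with a standard pooled two-proportion z-test. A self-contained sketch of that test (the example counts are back-calculated from the percentages above, so they are approximate):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value).

    x1/n1: flagged count and total in the credible/strange set;
    x2/n2: the same for the baseline set.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                    # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))       # two-tailed p-value
    return z, p_two

# Approximate check against the "Radar" row:
# 13.20% of 341 -> ~45 cases; 3.23% of 17782 -> ~574 cases; z near 10
z, p = two_proportion_z(45, 341, 574, 17782)
```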

The following several sections explore differences in groups of case flags. Examining individual categories provides useful insight; for example, one can ask "what is the most credible case involving photographic evidence?" – a question we'll answer promptly below.

Types of Evidence

Flag Credible/Strange Proportion Baseline Proportion Z-score P-value
High Quality Observers 94.43 45.42 17.98 <.01 **
Radar 13.20 3.23 10.03 <.01 **
Traces 14.08 5.96 6.20 <.01 **
News 37.24 25.06 5.12 <.01 **
Photos 9.97 4.51 4.76 <.01 **

The strongest difference involves high quality observers – an overwhelming 94% of the "credible/strange" set has this flag, versus less than half in the baseline. The lowest proportion involved photographic evidence, which was approximately double the baseline.

Interrogating individual case exemplars can help give some preliminary flavor of the data. To that end, the top cases in each evidence type are given below:

ID Strangeness Credibility Description Date Location Flag
8916 15 15 SCR LITES AREA/lo alt 3 OIDS INSIDE MAN ZAPPED+DIES! 1968-08-09 JABOTICATUBAS,BRZL High quality observer(s)
15669 8 13 300+GOOD OBS/LAST YEAR /WALL St.JRNL 1990-01-01 WAVRE+SE BELGIUM Radar
4312 8 12 2 WHT DOTS MNVR/2km alt >>S ABS.SLNT big news 1954-10-30 ROMA,ITL Traces
12551 10 14 60%/POPULATION SEES CHASE+ZAP ETC /FSRv39#3+/r63p158 1977-03-01 PINHEIRO,BRZ News
16862 9 13 6 25M SCRS SHOW INSTEAD/900M alt 1994-10-01 BATURITE,CEARA,BRZ Photos

Notably, none of the top cases by evidence flag took place in the United States – despite the United States having the largest number of cases. Intriguingly, Brazilian cases represent 60% of the most "credible" top five. Details on some of these cases are difficult to obtain.

Among the cases, the case from the "Belgian Wave" of 1990 is probably the most well known. Major General Wilfried De Brouwer gives some comment about the case here:

The "best" photographic evidence case reflects an interesting quirk of the Hatch database. Hatch's "credibility" metric is often highest for mass events – even if the interpretation of the events is somewhat ambiguous. This appears true for a Brazilian case that involved a large gathering of people. A news report below shows the crowd and interviews Brazilian ufologists who analyze what appear to be underwhelming pictures:

If any Brazilian readers have further context on this incident, I invite them to reach out to me. You can find my contact information here.

One of the men pictured in the news report is Reginaldo Ataide (d: 2016), a Brazilian ufologist who in later years would grant access to his case files to Bigelow Aerospace Advanced Space Studies (BAASS), the eventual contractor for the United States government's UFO programs. Researcher Marc Cecotti detailed a 2009 trip in which BAASS team members made records of Ataide's and other ufologists' research materials. A participant in the meetings described it this way:

“[T]hey arrived with camcorders and cameras and spent several hours photographing and filming our files and case reports.” He also insisted: “Here in Brazil they didn't do any research. They took and photographed what the ufologists saw and documented, and asked each ufologist permission to make public what we had documented.”

Ataide is pictured below, greeting three members of the Bigelow team:

Many questions remain about BAASS's and successor efforts' interest in Brazilian cases – a matter for subsequent articles.

One final evidence case to review is the "great cross" event over Rome in November 1954. Hatch coded this as a case involving trace evidence. An account of the event in Flying Saucer Review references significant contemporaneous coverage of a UFO flap. This was followed by a strange "filamentous" material described as "angel hair." The material was described as evaporating over the course of several hours. Importantly, the matter of "angel hair" does not appear to have been the subject of wide reporting – just the initial sightings. The topic of "angel hair" goes well beyond our scope here, but references are available below:

Government Response

Flag Credible/Strange Proportion Baseline Proportion Z-score P-value
Coverup 9.97 1.34 12.99 <.01 **
Military 34.60 11.98 12.56 <.01 **
Government Security Agency Involvement 6.45 1.45 7.44 <.01 **
Other Government Agencies 13.49 4.78 7.35 <.01 **
Military Investigation 23.17 13.48 5.16 <.01 **
Blue Book 9.68 9.80 -0.08 0.939

The Hatch database has a notable number of "credible and strange" case reports involving government responses. Cases with the "Coverup" flag occur with nearly 7.5 times the frequency of the baseline. Military cases are nearly 3 times the frequency, and constitute over a third of the Hatch cases.

Credit: Farmington Daily Times

The top "Coverup" case cited by Hatch is the Farmington, New Mexico mass sighting event of 1950. This event has been well-researched by UFO historian and archivist David Marler. Marler's excellent piece addresses the Air Force's handling of the matter:

It may come as no surprise to those who have followed the history of the United States Air Force’s investigation into UFOs, that nothing significant was concluded regarding the Farmington incident. Certainly, the “explanations” they concocted for many classic flying saucer episodes defied logic and the facts in many cases. What is a surprise is that, upon a cursory review of Project Grudge files, there is no reference to Farmington, NM in March of 1950 whatsoever! When one examines the tally sheet of sightings that month, there is no indication of anything occurring over Farmington, NM in the way of UFO activity. Strange considering the inordinate volume of objects observed by numerous witnesses!

Despite this apparent oversight, we know thanks to Marler's research that an official account of the incident was picked up in a "spot intelligence report":

The full details are beyond the scope of this piece – and would provide only a summary of Marler's excellent findings. In short, the case was largely dismissed by authorities with explanations that many found unconvincing, if not impossible. Further, rumors circulated about photographs being confiscated by officials after the fact. However, Marler spoke directly to witnesses on the matter of intimidation:

First, as previously mentioned, no one indicated any level of intimidation. In fact, in speaking with me directly, eyewitness Marlo Webb scoffed at the notion that military / government representatives threatened anyone to keep silent on the matter. Further, he humorously remarked to me the “agents” that spoke with him asked some questions, but nothing to the extent of the questions that I posed to him decades later. He suggested they seemed interested but not overly interested in his testimony. There were also no references to said threats whatsoever in Dr. McDonald’s interview notes or audio recordings of interviews with witnesses. There is simply no evidence of this.

In summary, it is anything but clear that the Farmington case involved a conspiratorial "coverup" so much as a muted official response. As Marler notes, such a response was typical of the Air Force in that era.

Object Morphology

Flag Credible/Strange Proportion Baseline Proportion Z-score P-value
Saucer 79.77 67.92 4.65 <.01 **
Cigar 21.99 14.64 3.79 <.01 **
Delta 13.49 10.26 1.94 0.052
Fireball 6.45 7.75 -0.89 0.374

The "saucer" and "cigar" shape are both more prevalent in the "credible/strange" data than the baseline. The overall frequency of structured craft (saucer, cigar and delta) over time is visualized below:

Blue: saucer, Red: cigar, Yellow: delta

Further, "saucer" and "cigar" cases appear to co-occur within the same year (r=.52). There is a much smaller correlation between "saucer" and "delta" cases (r=.09). Over the last several decades, there have been notable peaks of "saucer" cases, with a relatively low level of "delta" cases that has slightly increased in recent years.

Psychological and Physiological Effects

Flag Credible/Strange Proportion Baseline Proportion Z-score P-value
Human Affected 14.37 7.56 4.68 <.01 **
Injuries 5.28 2.38 3.43 <.01 **
Missing Time 2.93 2.26 0.83 0.405
Conversation 2.05 1.53 0.78 0.437
Odors 0.59 0.40 0.52 0.602
Telepathy 1.47 1.36 0.18 0.861
Sound 5.28 6.43 -0.86 0.388

The highly general flags of "human affected" and "injuries" are significantly more prevalent in the "credible and strange" set than the baseline. However, specific categories of strange phenomena like "missing time" or "telepathy" are not significantly different from the baseline, and make up a low overall proportion of cases.

Examination of the "injuries" cases shows that they are not limited to human injury; many pertain to various cases of cattle mutilation. The vast majority of human injury cases appear in Brazil, and are extremely rare elsewhere. Further details are pending my ability to obtain references and underlying material, some of which is unfortunately quite obscure.

One of the more notable abduction cases in the data set is the Allagash, Maine incident. It is likely coded as highly credible because the event involved four witnesses who allegedly came into close contact with a UFO while hiking in a remote area of Maine. Two of the witnesses spoke about their recollection of the event in recent years here:

The narrative about the event was complicated in 2016 when one of the participants admitted that the group had embellished details of their story "to make money":

To add an additional measure of complexity, the witness still maintains that they did see a UFO and experienced some emotional disturbance – just not the elaborate details of the abduction added to the case later.

Methodology: Interrogating Wave Structures with Graphs

The concept of UFO "waves" or "flaps" refers to clusters of cases within the same period. A basic outline of UFO "waves" can be seen above in the plot of object morphology. Hatch further coded some cases as belonging to a "Wave" category using his flag system. Approximately 32% of cases in the "credible and strange" data were assigned this flag. However, "waves" do not appear to be strictly defined in the database.

To further describe the relationship of cases with each other with respect to time, I conducted an analysis of case co-occurrence. 310 of the 341 (91%) Hatch credible and strange cases had at least one other case within one week before or after the reported date. The median case had 21 other cases reported within one week. The maximum was a July 1947 case, which had a striking 436 reports within one week.

Exploring the relationship of cases with each other introduces a useful analytic construct: networks or graphs. Graph analysis can be a helpful tool for finding relationships between cases along a variety of dimensions. While simple timelines can be very helpful in understanding a sequence, graphs can be useful in identifying which cases were particularly central to a particular "wave" event.

Visualizing Hatch Cases as a Temporal Graph

For example, we can envision cases as vertices in a large graph. We draw an undirected edge between two cases if they occurred within one week of each other. We can then calculate the degree distribution, or the number of edges each case has. A simple histogram of the degree distribution is shown below:
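As a sketch of this construction, assuming a small hypothetical table of cases (the IDs and dates below are illustrative stand-ins, not the full Hatch data), the one-week adjacency graph and its degrees can be built with pandas and networkx:

```python
import networkx as nx
import pandas as pd

# Hypothetical input: one row per case, with an identifier and report date.
cases = pd.DataFrame({
    "case_id": [783, 788, 806, 855, 1085],
    "date": pd.to_datetime(
        ["1947-07-03", "1947-07-03", "1947-07-04", "1947-07-05", "1947-07-09"]
    ),
})

G = nx.Graph()
G.add_nodes_from(cases["case_id"])

# Draw an undirected edge between any two cases reported within 7 days of each other.
for i, a in cases.iterrows():
    for j, b in cases.iterrows():
        if i < j and abs((a["date"] - b["date"]).days) <= 7:
            G.add_edge(a["case_id"], b["case_id"])

# Degree = number of temporally adjacent cases for each case.
degrees = dict(G.degree())
print(degrees)  # every pair here falls within a week, so all degrees are 4
```

The O(n²) pairwise loop is fine at this scale; for the full database, sorting cases by date and sweeping a seven-day window avoids the quadratic comparison.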

Informally, the distribution can be described as roughly log-normal:

To give a simple application, take this cluster of cases from July of 1947:

Deconstructing the 1947 Wave Via Betweenness Centrality

Here, the larger dots represent "credible-strange" cases, whereas the smaller dots represent other cases in the larger data set that may have a lower credibility or strangeness score.

The individual case dates and locations are given below:

CaseID Date/Time Location
783 1947/07/03 18:30 St. Maries, Idaho
788 1947/07/03 22:00 Springfield, Illinois
806 1947/07/04 13:10 Portland, Oregon
810 1947/07/04 14:50 Twin Falls, Idaho
814 1947/07/04 17:00 Colorado Springs, Colorado
855 1947/07/05 14:30 Auburn, California
1002 1947/07/07 17:30 Gettysburg, Pennsylvania
1085 1947/07/09 12:20 Boise, Idaho

These eight cases all took place within a period of about six days. Cumulatively, however, they span over four hundred related cases. Based on a graph analysis, the "hinge" event was case #855 in Auburn, California. By refining or filtering edge relationships, it is possible to study "cliques," or smaller groupings of cases that occurred within narrower time periods. That analysis is beyond our scope here, but is offered as potential inspiration for others interested in examining this data.
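Betweenness centrality, the measure used to identify such hinge events, can be computed directly with networkx. The toy graph below is loosely modeled on the cluster above, with hypothetical edges rather than the actual one-week adjacency data:

```python
import networkx as nx

# Toy temporal graph loosely modeled on the July 1947 cluster: an early
# sub-cluster, a later sub-cluster, and a "hinge" case (855) bridging them.
G = nx.Graph()
G.add_edges_from([
    (783, 788), (783, 806), (788, 806),   # early sub-cluster
    (806, 855), (810, 855), (814, 855),   # connections through the hinge
    (855, 1002), (1002, 1085),            # later sub-cluster
])

# Betweenness centrality: the fraction of shortest paths that pass
# through each node. Bridging nodes score highest.
bc = nx.betweenness_centrality(G)
hinge = max(bc, key=bc.get)
print(hinge)  # → 855, the node bridging the two sub-clusters
```

The same computation scales to the full adjacency list; filtering edges to narrower time windows before computing centrality is one way to isolate the "cliques" mentioned above.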

Download the Case Temporal Adjacency List Data Here

Methodology: Decomposing Geographic Clusters

Similar methods can be applied to analyzing the geographic distribution of cases. In the graph theoretic framework described above, cases can again be considered as vertices, with edges drawn based on their geographic distance from each other.

There are a large number of ways this can be done – for instance, edges can be weighted based on geographic distance, or some combined function of distance and the underlying case properties, such as the "credibility" metric. Indeed, graph representations can also be developed that combine both temporal and geographic relationships simultaneously.

In this initial piece, I present a simple, undirected graph representation based on thresholds. For the sake of simplicity, I used an arbitrary distance of 100 kilometers between case events. In the "credible and strange" data set, the highest number of nearby cases was 357 near the city of Derby in the United Kingdom (more on this later). Network analysis can be used to identify case clusters, though spatial visualization tends to be more practical as a first-order analytical tool. The cases with the largest number of nearby cases are given in the table below, along with the year of the report. Several overlap and are part of the same case cluster – for example Bakewell and Matlock in the United Kingdom:

CaseId Location Year # Nearby Cases
16518 BAKEWELL DERBY 1993 357
16521 MATLOCK DERBYs 1993 347
14295 GLENDORA CA 1982 337
11872 TUJUNGA CA 1975 333
16192 M56 at STRETTON CHESHIRE 1992 321
6507 LONG BEACH CA 1961 313
5302 LONG BEACH CA 1957 309
4661 PALMDALE CA 1955 307
14433 MAHOPAC NY 1983 302
6692 ORADELL NJ 1962 299
7626 DORCHESTER MA 1966 298
3355 RESEDA CA 1953 294
14566 YORKTOWN NY 1984 294
15669 WAVRE+SE BELGIUM 1990 287
4726 nr RIVERSIDE CA 1955 286
7420 SALEM MASS 1965 283
7210 US SAT.BASE/BEDFORDs 1965 282
7634 DANVERS MASS 1966 281
7544 EXETER NH 1966 262
15534 LOUVAIN AREA BELG 1990 255
7246 D8 SW/VALENSOLE FR 1965 223
14585 DUTCHESS CTY NY 1984 214

The median case had 28 neighbors, with an overall average of 58. A simple histogram of the degree distribution, both untransformed and log, is given below:

Though networks can be a very powerful tool for this analysis, it is often easier to understand the results using maps. As such, we'll review a few maps that visualize the geographic degree distribution (more simply, cases that have a large number of other cases nearby).

An interactive version of the maps below can be found online here if you'd like to explore for yourself.

To begin, here is a map of cases in the continental United States. Again, the size of the dot corresponds to the overall number of nearby cases:

Here is a closer view of western cases, with a distinct cluster in the Los Angeles area and a smaller smattering in the less densely occupied southwest:

Here is a closer view of East coast cases. Note that many of the New England cases come from the 1965 "wave":

A view of the Western European cases shows a concentration in France and a very dense case cluster in England:

Here is a closer view of the United Kingdom. The cluster near Manchester and Sheffield is the densest in the credible/strange data set:

Fortunately, geospatial visualization allows us to dig more deeply by expanding a cluster into its individual cases. Here is a plot of the location of all cases near the index case in the Derby region. The size and color of case markers is determined by the "credibility" index. Cases with a higher score are both larger and redder:

These raw case points can also be spatially "binned" or combined. Sometimes this can be helpful for identifying larger scale patterns. For example, here is a plot of 10km hexagons in the region. The hexagons are coded by the mean credibility score of cases:

Here we see a potential "corridor" in the region just west of Derby and Nottingham, along with several other "hotspots." Recalling the Hynek concept, maps like these can help investigators to make sense of the spatial structure of a group of sightings and to correlate with potential environmental variables of interest.
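A hexagonal binning like the one described above can be sketched with matplotlib's hexbin, which aggregates a per-point value (here a credibility score) by the mean within each hexagon. The coordinates and scores below are synthetic; a real analysis would first project case lat/lon into a planar coordinate system:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic cases: planar easting/northing in km, plus a 0-15 credibility score.
x = rng.uniform(0, 50, 200)
y = rng.uniform(0, 50, 200)
credibility = rng.uniform(0, 15, 200)

fig, ax = plt.subplots()
# gridsize=5 over a 50 km extent gives hexagons roughly 10 km across;
# each occupied hexagon is colored by the mean credibility of its cases.
hb = ax.hexbin(x, y, C=credibility, reduce_C_function=np.mean, gridsize=5)
fig.colorbar(hb, ax=ax, label="mean credibility")
mean_scores = hb.get_array()  # one mean value per occupied hexagon
print(len(mean_scores))
```

Swapping `np.mean` for `np.max` or `len` would instead surface the most credible single case, or the raw case count, per hexagon.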

Beginning with the index case itself, we find an initial media report of 22 sightings of triangular objects over the town of Matlock:

Here we see references to multiple preceding "waves," and a potential coincidence with the "Belgian wave" discussed above. Temporal and geographic tools can be used to identify these case relationships and sort them by credibility. In examining this individual case, however, we see an apparent cluster further south near Derby and Nottingham.

A cursory search shows that local industry includes a major manufacturing facility that produces nuclear reactor cores for submarines. Evidently, it underwent some reconstruction in recent years:

A further report shows a recent safety violation involving an overabundance of fissile material in one place:

Of course, this may be completely coincidental. However, it is helpful to know that the region is home to significant defense manufacturing. Researchers like Robert Hastings and former government officials like Luis Elizondo have long argued that there is a pattern connecting UAP sightings and nuclear facilities.

This brief case study is meant to demonstrate how geographic analysis can create a global scale picture of "hotspots." Using spatial or network analysis, these hotspots can be decomposed into their underlying cases, and then analyzed at a more granular scale. This serves both as a way of finding potential investigative patterns and as a tool for prioritizing which cases to examine or follow up on.

Download the Geographic Case Adjacency List Data Here

Comparative Study: "Spatial Point Pattern Analysis of the Unidentified Aerial Phenomena in France"

More sophisticated forms of spatial modeling are possible, provided appropriately structured data. For example, in "Spatial Point Pattern Analysis of the Unidentified Aerial Phenomena in France," Laurent et al. examined French government data related to unidentified aerial phenomena (UAP). I have included a brief review of the paper both as an example of more advanced spatial analysis methods, and as an example of how government transparency can empower fruitful scientific investigation of this difficult subject.

The authors explain the procedure for reporting and analyzing UAP events in France:  

In France, once a person has been a witness to a UAP, she or he has the possibility to report at the Gendarmerie. The witness is asked to fill in a detailed survey to provide information such as date, time, place, duration, orientation, shape, size, trajectory, witness’ distance to the phenomenon, etc. The investigation is then handed over to the GEIPAN, a unit of the French Space Agency CNES, whose main mission is to validate the information provided by the witness and to determine the nature of the UAP. In addition, the GEIPAN classifies each UAP into 4 categories A/B/C/D, forming a kind of scale, which goes from perfectly known and determined (A) to unknown and undetermined (D), after investigation.
It should be noted that even today, 19.5% of UAPs remain undetermined after investigation which is frustrating of course for both the witness and the scientist.

Laurent et al. focus on Class D sightings – those that remain unexplained after examination. As an additional note, this report was written prior to a major review of cases in which a large number of Class D sightings were retroactively resolved, leaving the remaining proportion closer to 5%.

Mirroring the Hatch database, Laurent et al. found that Class D sightings have been decreasing over time:

Their modeling approach focused on examining the potential relationship between environmental and anthropogenic factors and UAP-D sightings. Their initial spatial model is described below:

Their final model coefficients are described below:

Proximity to nuclear facilities is strikingly significant in their model, with a very large coefficient size. Laurent et al. stress this finding, along with the relationship with contaminated land, in their conclusion:

In sum, they find:

  1. The number of Class D sightings is decreasing over time
  2. UAP-D sightings have a non-random geospatial distribution
  3. There are statistical relationships with human activity, notably sites related to nuclear energy and industrial contamination

Intriguingly, Findings 1 and 2 are also found within the Hatch database. It is not yet possible to rigorously claim a nuclear relationship within the Hatch data, but individual examples like the Derby cluster and the work of Hastings suggest it is an important avenue of research. Former US officials have expressed concern about this relationship in several venues.

Replicating the methods above would be a worthwhile objective. However, the analysis of Laurent et al. is only possible due to the data collection and transparency of CNES/GEIPAN. The case analyses provided by government investigation are an invaluable resource that has no direct parallel in the American context. The Hatch database provides an initial foothold, but is not rigorous enough to substitute for a professional investigative effort.

Towards Better UAP Case Data

As stated at the outset, data has long been the core problem in the study of UAP, both in the public and private sphere. The issue has never lacked for quantity of cases; what has been persistently missing is useful structure that allows for effective filtering of those cases.

Designing scientific databases is rarely a simple enterprise, even in relatively "conventional" fields. That said, here are a number of suggestions that could potentially benefit future data collection efforts:

  1. Record variables that address the credibility and nature of the witnesses. It is imperative to be able to filter reports based on high quality observations. The majority of the analysis above is only possible due to the "credibility" and "strangeness" metrics. A mass of simple case narrative reports is not helpful without contextualizing information. This is invariably a sensitive matter, but also a crucial one.
  2. When using rating scales, use standard methods like the Likert-type scale. These scales help prevent raters from simply picking the "center" score and can make the data more informative.
  3. If using subjective scales, try to use multiple evaluators. For example, subjective scores from just 3 randomly assigned but qualified assessors can be analyzed in terms of inter-rater reliability statistics and the like.
  4. Consider hierarchical and multilabel taxonomies. Cases typically have a number of salient factors. A thoughtful and flexible taxonomy can be a highly useful tool for categorizing and analyzing cases.
  5. Be very clear about references. Always prefer primary sources to secondary sources.
  6. Take pains to accurately geocode case locations. Modern tools also make it easier to describe complex geometries; consider using data beyond simple points if appropriate.
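As a sketch of suggestion 3, agreement among a small panel of assessors can be quantified with Fleiss' kappa. The ratings below are hypothetical: five cases, each scored by three raters on a three-category scale:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an (items x categories) matrix of rating counts.

    counts[i, j] = number of raters who assigned item i to category j;
    every row must sum to the same number of raters.
    """
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    # Per-item agreement: proportion of rater pairs that agree on the item.
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the marginal category proportions.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.sum(p_j ** 2)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical panel: 5 cases, 3 raters each, a 3-point strangeness scale.
ratings = np.array([
    [3, 0, 0],   # all three raters chose category 1
    [0, 3, 0],
    [0, 0, 3],
    [2, 1, 0],   # partial disagreement
    [0, 2, 1],
])
print(round(fleiss_kappa(ratings), 3))  # → 0.595
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, a useful sanity check before trusting subjective credibility scores.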