UFO Sightings Data Exploration

T Carrico, Ailantic, LLC

2021-12-27

Introduction

The National UFO Reporting Center compiled a data set of unidentified flying object (UFO) sightings dating from 1561 to 2019, which is publicly available online at Kaggle. In these notes, we explore these data for insight into the times, places and things people saw in the sky over the years. We are interested in how these sightings have changed over time and in the descriptions of what was seen.

There are three goals of this analysis:

  1. Summarize the UFO data in words and graphics for people interested in what is in these data files;
  2. Explore correlations, statistical inferences and possible causation to help put sightings into context;
  3. Apply Natural Language Processing (NLP) to the free-form text data in the reports for more insight.

This paper draws no conclusions about extraterrestrial visits. It presents a data science approach to analyzing these data and offers some exposure to techniques used in other data analyses to gain understanding.

UFO Sighting Data

In 1561 the people of Nuremberg witnessed a terrifying scene in the sky over the city. This is the first report in the data set. The account is worth reading and includes an excellent woodcut engraving that depicts the event and was made shortly afterward.

Report of the celestial phenomenon over Nuremberg in 1561.

After 1561, the data is very sparse until the mid-20th century. As we review it, we keep in mind that data collection techniques bias the information. Factors like world population, reporting mechanisms and even social and cultural pressures on reporting (e.g., accusations of sacrilege, heresy and witchcraft) greatly influence data collection. The gap between what was seen and when, and what was actually recorded, is a factor throughout this exploration.

The data set contains over 88,000 reports. In the table below we show some of the columns and examples of reports. We can see some rich data in the narrative summary along with some categorical, positional and time information.

A subset of sighting data.

summary | city | state | date_time | shape | duration | city_latitude | city_longitude
My wife was driving southeast on a fairly populated main side road, it was dark out side at about 6:43pm, And my wife exclaimed” fallin | Chester | VA | 2019-12-12 18:43:00 | light | 5 seconds | 37.34315 | -77.40858
I think that I may caught a UFO on the NBC Nightly News that aired March 21st or 22nd. | Rocky Hill | CT | 2019-03-22 18:30:00 | circle | 3-5 seconds | 41.66480 | -72.63930
I woke up late in the afternoon 3:30-4pm. I went to have a bath, while shaving my legs i noticed indentations around my left ankel. I t | NA | NA | NA | NA | NA | NA | NA

In the figure below, we see all of the reported sightings. Three years are marked with red lines to indicate where yearly counts generally moved from single digits into the tens (1947), the hundreds (1966) and the thousands (2006). These steps show significant increases from the end of the 20th century into the 21st century. For context, these years roughly coincide with World War II (1940s), the space race (1960s) and the internet (2000). Another factor to consider is that the National UFO Reporting Center was founded in 1974. These events brought more aircraft, more spacecraft, more human connections and greater ease of reporting. This kind of contextual information helps us formulate hypotheses that we can test for statistical significance.

Shapes

The data contains information on the shapes reported in the sightings. The bar chart on the right displays the total for each shape. The most common shape is light, followed by circle and triangle. We know a good deal about what gives off light in the sky and when those sources first appeared in history. A light alone could be many things, so we also want to investigate movement and other details that may have led the reporting individual to classify the object as a UFO rather than an aircraft or passing satellite. These details may reside in the narrative section of the report, which suggests a need for natural language processing.

The shape ‘light’ is most numerous in the sightings.

Locations

The data reports sightings globally. The vast majority of sightings are located in the United States (68,052 sightings). For this exploration we will filter the data to just those sightings in the continental United States.

Most sightings in the data are located in the United States.

Below we examine these sightings over time by plotting the sightings in the US since 1942. In this graph we can see significant jumps in sightings starting in the 1960s and again in the early 2000s. Although the graphs appear to show sightings accumulating over time, each graph covers only a single year. The yearly counts climb so rapidly that new dots seem to be placed next to existing dots. In reality (as recorded), UFO sightings are simply becoming more numerous just about everywhere in the country.

The map below shows all the sightings since 1947. We see general clusters of sightings. At this point, let’s start to consider some possible correlations. Some clusters align with airports and routes that carry heavy commercial air traffic. These commercial hubs, and the routes to and from them, may offer some explanation for higher concentrations of sightings.

24 hours of flights over the United States in 2013 (t.carrico)

Locations of all sightings and airports in CONUS.

Plotting these as density plots, we can see that UFO sightings concentrate in locations similar to those of the airports.

Locations of all sightings and airports in CONUS.

To examine similarity in these locations, we can plot density graphs of the latitude and longitude. The positions will not be identical, since a sighting and an airport are unlikely to be in exactly the same place; that is, people at the airports would not be the ones reporting UFO sightings. The latitude distributions show differences between North and South, and the longitude distributions show differences between East and West. These comparisons capture sightings near the airports and show that the positions of airports and UFO sightings are similarly distributed.

Distributions of position are similar for UFO sightings and airport locations.

This evidence suggests a relationship between airport locations and UFO sightings. To formally quantify this, we state a null hypothesis:

There is no relationship between the UFO sightings and airport locations.

A statistical test will either reject or fail to reject this hypothesis; it cannot prove it. To frame the question so it can be tested mathematically, we need to compare UFO sightings to sightings of aircraft around the airports. Since we don’t have non-UFO reports of aircraft near airports, we will generate random points within visual range of airports and compare them to the locations of sightings. We then apply the Chi-square test to determine whether there is a statistically significant difference between the UFO sightings and locations near airports.
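One way the comparison could be set up is sketched below. This is a minimal illustration under our own assumptions, not the analysis code behind these notes: the function names are hypothetical, the random points use a flat-earth approximation, and both point sets are assumed to be binned (for example by latitude) into a 2 x k table of counts before the Chi-square statistic is computed.

```python
import math
import random


def random_point_near(lat, lon, radius_km, rng):
    """Draw a point uniformly from a disc of radius_km around (lat, lon).

    Assumption: a flat-earth approximation (about 111.32 km per degree of
    latitude), which is reasonable only for short distances.
    """
    r = radius_km * math.sqrt(rng.random())  # sqrt gives uniform area density
    theta = rng.uniform(0.0, 2.0 * math.pi)
    dlat = r * math.cos(theta) / 111.32
    dlon = r * math.sin(theta) / (111.32 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


def bin_counts(values, edges):
    """Count the values falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def chi_square_homogeneity(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    total_a, total_b = sum(row_a), sum(row_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each row needs at least one observation")
    grand = total_a + total_b
    stat, used_bins = 0.0, 0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # bins empty in both rows carry no information
        used_bins += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat, used_bins - 1
```

A large statistic relative to the Chi-square distribution with the returned degrees of freedom would lead us to reject the hypothesis that the two sets of points are distributed alike. If SciPy is available, `scipy.stats.chi2.sf(stat, dof)` converts the statistic into a p-value.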

Busy Airports

The busiest airport in the US is Hartsfield–Jackson Atlanta International Airport in Georgia (ICAO: KATL), with over 110 million passengers a year (as of 2019). Below we zoom in on the area surrounding this airport to compare UFO sightings in the area. The plot below shows the UFO sightings and the airports around KATL. Large metropolitan areas tend to have many airports nearby. This area also has military, delivery and regional maintenance airports.

In the US, aircraft making their approach to an airport line up and turn on their landing lights at 10,000 ft. These lights can be seen from the ground at distances of up to 110 km.
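To check whether a sighting falls inside that viewing distance, we can compute the great-circle distance between the sighting and the airport. The sketch below uses the haversine formula. The function names are hypothetical, the KATL coordinates are approximate values we supply for illustration, and the 110 km threshold is the figure quoted in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius
VISUAL_RANGE_KM = 110.0           # viewing distance quoted in the text
KATL_APPROX = (33.64, -84.43)     # approximate airport position (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_visual_range(sighting, airport, range_km=VISUAL_RANGE_KM):
    """True if the sighting (lat, lon) lies within range_km of the airport."""
    return haversine_km(sighting[0], sighting[1],
                        airport[0], airport[1]) <= range_km
```

For example, the Chester, VA report in the sample table (37.34315, -77.40858) is several hundred kilometres from KATL, so `within_visual_range` returns `False` for it.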

The airport data includes the following types: heliport, small_airport, closed, seaplane_base, balloonport, large_airport and medium_airport.

KATL regional airports and UFO sightings.

The runways at KATL run East and West, and most of the time the prevailing winds are from the West. Because aircraft land into the wind, arriving aircraft (landing lights on) would usually approach from the East.

KATL airport chart showing runways running East and West.

In the density plots of the UFO sightings’ latitude and longitude below, we can see that most sightings are within the blue shaded area. This area represents the viewing distance from the ground to aircraft with landing lights turned on at 10,000 ft. The black dotted line indicates the location of KATL. The small tick marks are known as a rug plot and show the actual data: the black ticks on top are the locations of the airports, and the red ticks on the bottom axis are the locations of the sightings. We note that the highest numbers of sightings are North and East of the airport, within the viewing range.

Most sightings around KATL are within visual range of the airport

We can view these distributions in 2D on the map. Below we see that most of the UFO sightings are near KATL. The other large airports in the area also have clusters of UFO sightings. Interestingly, we can even see some line patterns resembling airport approaches. Eglin Air Force Base (ICAO: KVPS), Florida, is located in the lower left of this graph, and its runway 1/19 is aligned with a line of UFO sightings. We also note that some airports, like KVPS, do not have Standard Terminal Arrival Routes (STAR) or Standard Instrument Departures (SID), which would account for more straight-line approaches. By contrast, KATL has heavy traffic that requires arrival and departure routing distributed all around the airport.

Most sightings around KATL are within visual range of the airport

KATL airport chart showing runways running East and West.

We next look at shapes reported through the years. Since the data is so skewed with the vast majority of sightings in recent years, we will do a log transformation on the count. We do this since we are interested in relative reporting instead of focusing on the increase in sightings which would fully bury these patterns. In this way, we can see some bumps where a particular shape was more common than others. Two graphs are shown below to demonstrate how the log transformation reveals patterns.
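A minimal sketch of such a transformation is shown below, assuming the counts are held in a dictionary keyed by shape or year. The function name is hypothetical, and the `+ 1` offset is our assumption so that a count of zero maps to zero instead of being undefined; these notes do not state which variant was used.

```python
import math


def log_transform_counts(counts):
    """Return log10(count + 1) for every key in a {key: count} mapping."""
    return {key: math.log10(value + 1) for key, value in counts.items()}
```

On this scale a count of 9 becomes 1 and a count of 999 becomes 3, so a hundredfold difference in raw counts takes up only two units on the axis. That compression is what lets the smaller bumps remain visible next to the recent surge.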

Log transformation of shape counts reveals patterns.

A few of the shapes had bumps over the years. In particular, there is a bump in some shapes from 1960 to 1990. We can see that the shapes cigar, circle and disk all follow a similar pattern in those years, centered around 1975.

Next we can look at the time of day. Most of the sightings occur around 8 pm in the locally reported time zone. A histogram shown in the right margin confirms that most of the sightings occur at night.
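The hourly counts behind such a histogram can be tallied directly from the `date_time` column. The sketch below assumes the `YYYY-MM-DD HH:MM:SS` format shown in the sample table and skips missing values recorded as `NA`; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def sightings_by_hour(date_times):
    """Count reports per hour of day (0-23), skipping missing values."""
    hours = Counter()
    for raw in date_times:
        if raw is None or raw == "NA":
            continue  # missing timestamps cannot be placed in an hour
        hours[datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").hour] += 1
    return hours
```

Applied to the three sample rows in the table above, this puts two reports in hour 18 and ignores the row with no timestamp.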

Most sightings occur at night.

Data Bias

How data is collected impacts analysis. The reporting mechanisms since 1561 have changed dramatically. Johannes Gutenberg’s (c. 1398-1468 CE) development of the printing press revolutionized how people capture and share information in printed form.

In the UFO data we see dramatic changes in report volume, with most of these reports coming from the United States, where communication is easy and there is very little restriction or consequence for sharing thoughts, which can be a good thing.

The adoption of the telephone, and then the internet, has made reporting UFOs vastly easier. We can imagine the effort and potential consequences of reporting a UFO in the 1600s in places like Salem, Massachusetts. Data collection bias is at play in all data analyses, and we can expect it to be present in the UFO data.

We can also look at the adoption of cell phones and the internet and compare it to the number of reports.

The ability to access a telephone for reporting

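The correlation test between the adoption of the cell phone and the number of UFO reports gives a value of 9.84782718337954e-15. The corresponding value for the internet and UFO reports is 4.70051224702121e-15. Values this close to zero appear to be p-values rather than correlation coefficients, which range from -1 to 1. Read as p-values, they indicate that both correlations are highly statistically significant.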
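For reference, a correlation coefficient measures the strength of a linear relationship on a scale from -1 to 1, and it is reported separately from the p-value of the accompanying significance test. The sketch below computes Pearson's r for two equal-length yearly series. It is an illustration with a hypothetical function name, not the code used to produce the figures above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

Two series that rise together year after year, as adoption rates and report counts do here, give a coefficient near 1. Shared upward trends over time tend to produce high coefficients whether or not one series drives the other.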

Cell phone adoption is highly correlated with the increase in reported UFO sightings

Summary

This article explored UFO report data. We are always cautioned about correlations not being equivalent to causation. But correlations are always a good place to start in an effort to explain unknown phenomena. The sightings are highly correlated with the increases in access and ease of reporting. We also noted strong correlations of sightings around airports.

We don’t have reports of known flying objects to compare to the unknowns, but estimates of how many aircraft (for example) could be seen can be compared to unknown sightings for further statistical analysis and the creation of data models. With these approaches, we should be able to calculate the probability of seeing a known object as compared to a UFO.

What is unknown to someone isn’t necessarily unknown to others.
