Disclaimer: This article is intended to provide an objective analysis of climate data. It neither disputes nor debates the existence of climate change but rather demonstrates how data can be utilized to explore trends and answer important questions.
Let’s talk Climate Change!
Global warming and climate change have been discussed for a long time, and the subject has been a heated one for many years. Over the last two decades, we have experienced hotter summers, colder winters, harsh acts of God, and wonky temperatures. All of those are routinely blamed on climate change.
Undeniably, this is a hot subject, as many people still believe that climate change is a hoax. This article is not intended to defend or deny a belief. Instead, it is an exercise to demonstrate how data can be utilized to explore answers and trends.
Without further ado, let’s start our Python instance to answer the questions.
Acquiring Data for Analysis
Acquiring the data was a little tricky at first. I chose to use NOAA, and it took me a few tries of fiddling with the controls to get the data I wanted. You can find most of the climate data here: https://www.ncei.noaa.gov/cdo-web/datasets.
For this project, I selected the daily summaries in Atlanta. I will walk through step by step how to acquire the same data set I have.
On the next page, you can select the dates and search by station. That is the easiest way to gather data: searching by state or anything larger makes it very difficult to extract information, as NOAA restricts how many stations you can pull information from.
Here, I selected daily summaries with dates ranging from 1930 to 2025-01-13. For the search type, I found it easier to go by stations because of that station limit. For the search term, I entered the city of interest, in this case Atlanta.
To cover that many years, you may need to pick a station (or several) whose records span the dates you are searching. I found that the airport is usually a long-standing station with a wide range of dates. For Atlanta, the Hartsfield-Jackson Atlanta Airport had the full range of data.

On the next screen, select Custom GHCN-Daily CSV and make sure that your dates align with what you are looking for and hit continue.

Finally, select air temperature, which will provide all the information we need for this analysis. On the next screen, enter your email address and wait for the email letting you know the data is ready.
Analyzing the Data – Atlanta
The first step is to import all of the packages you will need.
import pandas as pd
import requests
from io import StringIO
import matplotlib.pyplot as plt
Most of the time, I would download the data and import the CSV into a data frame. But in this case, I felt it would be better to pull the data directly from the link without downloading it into my folders first. The link to the dataset is provided in the email that you receive from NOAA. It is important to save the data, as the link will expire in a few days.
# this link was in the email received when data was ready
url = 'https://www.ncei.noaa.gov/orders/cdo/3906220.csv'
response = requests.get(url)
csv_data = StringIO(response.text)
atlanta = pd.read_csv(csv_data)
#saving the data to harddrive
atlanta.to_csv("atlanta30-25.csv")
atlanta.head()
| | STATION | NAME | DATE | PSUN | TAVG | TMAX | TMIN | TSUN |
---|---|---|---|---|---|---|---|---|
0 | USW00013874 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1930-01-01 | NaN | NaN | 64 | 45 | NaN |
1 | USW00013874 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1930-01-02 | NaN | NaN | 67 | 49 | NaN |
2 | USW00013874 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1930-01-03 | NaN | NaN | 54 | 31 | NaN |
3 | USW00013874 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1930-01-04 | NaN | NaN | 49 | 27 | NaN |
4 | USW00013874 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1930-01-05 | NaN | NaN | 49 | 33 | NaN |
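Because the NOAA order link expires, one option is to wrap the download in a small helper that prefers a local copy when one exists. The sketch below is only an illustration under stated assumptions: `load_noaa_csv` and its parameters are hypothetical names, not part of the original workflow, and it writes the cache without the index column, which differs slightly from the `to_csv` call above.

```python
from io import StringIO
from pathlib import Path

import pandas as pd
import requests


def load_noaa_csv(url, cache_path, timeout=30):
    """Return the order as a DataFrame, preferring a local copy if one exists.

    Hypothetical helper: the name and parameters are illustrative assumptions.
    """
    cache = Path(cache_path)
    if cache.exists():
        # The order link may have expired, so read the saved copy instead
        return pd.read_csv(cache)

    response = requests.get(url, timeout=timeout)
    # Fail loudly on an expired or broken link instead of parsing an error page
    response.raise_for_status()

    df = pd.read_csv(StringIO(response.text))
    # index=False keeps the saved file free of an extra unnamed index column
    df.to_csv(cache, index=False)
    return df
```

With the names used in this article, the call would look like `atlanta = load_noaa_csv(url, "atlanta30-25.csv")`.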
Now that we have the data in a data frame, we will run a light exploratory data analysis (EDA) to determine what we are working with.
atlanta.shape
(34710, 8)
atlanta.isna().sum()
STATION        0
NAME           0
DATE           0
PSUN       27783
TAVG       27730
TMAX           0
TMIN           0
TSUN       21120
dtype: int64
For this project, we are only interested in TMAX, TMIN, and TAVG. We can see that TAVG is null in 27,730 of the 34,710 rows, which will need to be addressed.
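Raw null counts are easier to judge as a share of all rows. The following is a minimal sketch of that idea; `missing_share` is a hypothetical helper name, and the toy frame is made up purely for illustration and is not the NOAA data.

```python
import pandas as pd


def missing_share(df):
    """Return the percentage of missing values per column, largest first."""
    return (df.isna().mean() * 100).round(1).sort_values(ascending=False)


# Toy frame for illustration only (not the NOAA data)
toy = pd.DataFrame({
    "TMAX": [64, 67, 54, 49],
    "TAVG": [float("nan"), float("nan"), float("nan"), 40.0],
})
print(missing_share(toy))
```

On the real data, `missing_share(atlanta)` would put the TAVG gap in context next to the other columns.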
Wrangling the Data
# creating a cleaner dataset (.copy() avoids a SettingWithCopyWarning when adding columns later)
atlanta = atlanta[['NAME','DATE', 'TMAX', 'TMIN','TAVG']].copy()
# checking dtypes
atlanta.dtypes
NAME      object
DATE      object
TMAX       int64
TMIN       int64
TAVG     float64
dtype: object
We want to compute a yearly average. For that, we will ensure the dates are in datetime format. Then, we will create a column with the year.
# converting DATE to datetime
atlanta['DATE'] = pd.to_datetime(atlanta['DATE'])
# creating a year column
atlanta['Year'] = atlanta['DATE'].dt.year
To address the missing values in TAVG, we will replace them with the midpoint of the day's maximum and minimum temperatures. In the process, I noticed that some of the average values were 0. Those did not show up in the EDA, but they came to my attention when analyzing the data, and they raised suspicions about the quality of the recorded averages.
Several methods can be used to address this problem. The most direct one is to remove the TAVG variable completely and insert a new one by summing TMAX and TMIN and dividing by 2. I chose a more complicated way.
First, I will fill the NaN values. Then, a lambda function will check each row: if TAVG already equals (TMAX + TMIN) / 2, it is left as is; otherwise, it is replaced with that value.
# filling the averages that are blank
atlanta['TAVG'] = atlanta['TAVG'].fillna((atlanta['TMAX'] + atlanta['TMIN'])/ 2)
# replacing zeros and checking if average is correct
atlanta['TAVG'] = atlanta.apply(lambda row: (row['TMAX'] + row['TMIN']) / 2
if row['TAVG'] != (row['TMAX'] + row['TMIN']) / 2 else row['TAVG'],
axis=1)
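For comparison, the direct approach mentioned earlier (recomputing TAVG as the midpoint for every row) can be written as a single vectorized expression. The sketch below is not the method used in this article; it only illustrates, on a made-up toy frame, that both routes end up with the same values. `midpoint_tavg` is a hypothetical helper name.

```python
import pandas as pd


def midpoint_tavg(df):
    """Return TAVG recomputed as the midpoint of TMAX and TMIN for every row."""
    return (df["TMAX"] + df["TMIN"]) / 2


# Toy rows for illustration only: one missing, one zero, one already-correct average
toy = pd.DataFrame({
    "TMAX": [64, 67, 54],
    "TMIN": [45, 49, 31],
    "TAVG": [float("nan"), 0.0, 42.5],
})

# The two-step approach from the article, applied to the toy rows
filled = toy.assign(TAVG=toy["TAVG"].fillna((toy["TMAX"] + toy["TMIN"]) / 2))
two_step = filled.apply(
    lambda row: (row["TMAX"] + row["TMIN"]) / 2
    if row["TAVG"] != (row["TMAX"] + row["TMIN"]) / 2 else row["TAVG"],
    axis=1,
)

# Both routes give the same list of averages for these rows
print(midpoint_tavg(toy).tolist() == two_step.tolist())
```

The vectorized version also avoids a row-by-row `apply`, which tends to be slow on tens of thousands of rows.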
The code above ensures that all the averages are correct. Now we will sort the data frame in ascending order of TAVG, allowing us to see the lowest temperatures first.
atl_sorted = atlanta.sort_values(by='TAVG', ascending=True)
atl_sorted.head(20)
| | NAME | DATE | TMAX | TMIN | TAVG | Year |
---|---|---|---|---|---|---|
20108 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1985-01-21 | 18 | -8 | 5.0 | 1985 |
13177 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1966-01-30 | 15 | -3 | 6.0 | 1966 |
19715 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1983-12-25 | 17 | 0 | 8.5 | 1983 |
19002 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1982-01-11 | 23 | -5 | 9.0 | 1982 |
12075 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1963-01-24 | 23 | -3 | 10.0 | 1963 |
12032 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1962-12-12 | 15 | 5 | 10.0 | 1962 |
7632 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1950-11-25 | 17 | 3 | 10.0 | 1950 |
3677 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1940-01-26 | 18 | 4 | 11.0 | 1940 |
19001 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1982-01-10 | 24 | -2 | 11.0 | 1982 |
24139 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1996-02-04 | 18 | 7 | 12.5 | 1996 |
17182 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1977-01-17 | 25 | 1 | 13.0 | 1977 |
10273 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1958-02-17 | 22 | 5 | 13.5 | 1958 |
2221 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1936-01-31 | 23 | 5 | 14.0 | 1936 |
17184 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1977-01-19 | 27 | 1 | 14.0 | 1977 |
3678 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1940-01-27 | 23 | 5 | 14.0 | 1940 |
14616 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1970-01-08 | 23 | 6 | 14.5 | 1970 |
21905 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1989-12-23 | 22 | 8 | 15.0 | 1989 |
12033 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1962-12-13 | 29 | 1 | 15.0 | 1962 |
14617 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1970-01-09 | 28 | 2 | 15.0 | 1970 |
3676 | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO… | 1940-01-25 | 21 | 9 | 15.0 | 1940 |
As 2025 has just started, we will remove it. Next, we will get the mean temperature of each year.
# removing 2025
atl = atlanta[atlanta['Year'] < 2025]
# grouping average by year
yearly_avg_temp = atl.groupby('Year')['TAVG'].mean().reset_index()
yearly_avg_temp.head()
| | Year | TAVG |
---|---|---|
0 | 1930 | 61.534247 |
1 | 1931 | 63.038356 |
2 | 1932 | 62.286885 |
3 | 1933 | 63.305479 |
4 | 1934 | 61.361644 |
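A partial year such as 2025 skews a yearly mean because it only covers some months. One way to spot such years, rather than relying on knowing them in advance, is to count the daily records per year. This is a minimal sketch; `days_per_year` and `incomplete_years` are hypothetical helper names, the 365-day threshold is an assumption, and the toy dates are made up for illustration.

```python
import pandas as pd


def days_per_year(df, date_col="DATE"):
    """Count how many daily records each calendar year contains."""
    dates = pd.to_datetime(df[date_col])
    return dates.dt.year.value_counts().sort_index()


def incomplete_years(df, date_col="DATE", min_days=365):
    """Return only the years with fewer than min_days records."""
    counts = days_per_year(df, date_col)
    return counts[counts < min_days]


# Toy data for illustration only: all of 2023 plus the first ten days of 2024
toy = pd.DataFrame({"DATE": pd.date_range("2023-01-01", "2024-01-10")})
print(incomplete_years(toy))
```

Running `incomplete_years(atlanta)` on the real data would list any year that needs to be dropped or treated with care.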
Plotting the Data
Now, let’s plot the yearly average temperature.
# Plotting the yearly average temperature trend
plt.figure(figsize=(12,6))
plt.plot(yearly_avg_temp['Year'], yearly_avg_temp['TAVG'], marker='o', linestyle='-', color='orange')
plt.title('Yearly Average Temperature in Atlanta (1930-2024)')
plt.xlabel('Year')
plt.ylabel('Average Temperature (°F)')
plt.grid(True)
plt.tight_layout()
plt.show()
The plot shows temperatures trending upward over the years, with a steady increase since the 1980s. This piqued my curiosity, and I decided to look at another location. I chose Minneapolis, which is significantly colder than Atlanta.
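Reading a trend off a chart is subjective, so it can help to put a number on it. One simple option, which is an addition here and not part of the original analysis, is a least-squares line fitted with `numpy.polyfit`. `trend_per_decade` is a hypothetical helper name, and the toy values are invented for illustration.

```python
import numpy as np
import pandas as pd


def trend_per_decade(yearly, year_col="Year", value_col="TAVG"):
    """Least-squares slope of value_col against year_col, in units per decade."""
    slope, _intercept = np.polyfit(yearly[year_col], yearly[value_col], deg=1)
    return slope * 10


# Toy values for illustration only: a steady rise of 0.5 degrees per year
toy = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003],
    "TAVG": [60.0, 60.5, 61.0, 61.5],
})
print(round(trend_per_decade(toy), 2))
```

Calling `trend_per_decade(yearly_avg_temp)` would give the fitted change in °F per decade for Atlanta. A straight line is a rough summary only and says nothing about why the values change.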
Analyzing the Data – Minneapolis
The process will be similar, except that the data available for Minneapolis runs from 1939 to today. I selected the dates from January 1, 1939 to December 31, 2024. In this process, we will ensure that all average data is correct and that the formats are what they are supposed to be. Once that is complete, we will compute the yearly averages and plot them.
# this link was in the email received when data was ready
url = 'https://www.ncei.noaa.gov/orders/cdo/3906231.csv'
response = requests.get(url)
csv_data = StringIO(response.text)
minneapolis = pd.read_csv(csv_data)
#saving the data to harddrive
minneapolis.to_csv("minneapolis39-25.csv")
minneapolis.head()
| | STATION | NAME | DATE | TAVG | TMAX | TMIN |
---|---|---|---|---|---|---|
0 | USW00014922 | MINNEAPOLIS ST. PAUL INTERNATIONAL AIRPORT, MN US | 1938-04-09 | NaN | 51 | 29.0 |
1 | USW00014922 | MINNEAPOLIS ST. PAUL INTERNATIONAL AIRPORT, MN US | 1938-04-10 | NaN | 66 | 34.0 |
2 | USW00014922 | MINNEAPOLIS ST. PAUL INTERNATIONAL AIRPORT, MN US | 1938-04-11 | NaN | 63 | 43.0 |
3 | USW00014922 | MINNEAPOLIS ST. PAUL INTERNATIONAL AIRPORT, MN US | 1938-04-12 | NaN | 71 | 37.0 |
4 | USW00014922 | MINNEAPOLIS ST. PAUL INTERNATIONAL AIRPORT, MN US | 1938-04-13 | NaN | 78 | 46.0 |
Though I requested data from 1939-01-01 through 2024-12-31, I checked and found that the latest date is 2025-01-12 and the earliest is 1938-04-09. Thus, the command to remove 2025 will remain, along with an added condition to remove anything before 1939.
# converting DATE to datetime (done first so max() and min() return timestamps)
minneapolis['DATE'] = pd.to_datetime(minneapolis['DATE'])
# creating a year column
minneapolis['Year'] = minneapolis['DATE'].dt.year
minneapolis['DATE'].max()
Timestamp('2025-01-12 00:00:00')
minneapolis['DATE'].min()
Timestamp('1938-04-09 00:00:00')
minneapolis.dtypes
STATION            object
NAME               object
DATE       datetime64[ns]
TAVG              float64
TMAX                int64
TMIN              float64
Year                int32
dtype: object
minneapolis.isna().sum()
STATION        0
NAME           0
DATE           0
TAVG       24714
TMAX           0
TMIN           1
Year           0
dtype: int64
# filling the averages that are blank
minneapolis['TAVG'] = minneapolis['TAVG'].fillna((minneapolis['TMAX'] + minneapolis['TMIN'])/ 2)
# replacing zeros and checking if average is correct
minneapolis['TAVG'] = minneapolis.apply(lambda row: (row['TMAX'] + row['TMIN']) / 2
if row['TAVG'] != (row['TMAX'] + row['TMIN']) / 2 else row['TAVG'],
axis=1)
# removing 2025 and the partial year 1938
minn = minneapolis[(minneapolis['Year'] < 2025) & (minneapolis['Year'] > 1938)]
# grouping average by year
minn_yearly_avg_temp = minn.groupby('Year')['TAVG'].mean().reset_index()
minn_yearly_avg_temp.head()
| | Year | TAVG |
---|---|---|
0 | 1939 | 46.489011 |
1 | 1940 | 44.329235 |
2 | 1941 | 47.638356 |
3 | 1942 | 46.019178 |
4 | 1943 | 43.995890 |
minn_yearly_avg_temp['Year'].max()
2024
# Plotting the yearly average temperature trend
plt.figure(figsize=(12,6))
plt.plot(minn_yearly_avg_temp['Year'], minn_yearly_avg_temp['TAVG'], marker='o', linestyle='-', color='blue')
plt.title('Yearly Average Temperature in Minneapolis (1939-2024)')
plt.xlabel('Year')
plt.ylabel('Average Temperature (°F)')
plt.grid(True)
plt.tight_layout()
plt.show()
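Since the Atlanta and Minneapolis steps are nearly identical, they could be folded into one reusable function. The sketch below is a simplification and not the code used above: it recomputes TAVG directly as the midpoint of TMAX and TMIN instead of the fill-then-check approach, and `yearly_average` with its parameters is a hypothetical name. The toy rows are made up for illustration.

```python
import pandas as pd


def yearly_average(df, first_year, last_year):
    """Return one row per year with the mean daily midpoint temperature.

    Assumes df has DATE, TMAX and TMIN columns, as in the NOAA daily summaries.
    """
    out = df[["DATE", "TMAX", "TMIN"]].copy()
    out["DATE"] = pd.to_datetime(out["DATE"])
    out["Year"] = out["DATE"].dt.year
    # Simplification: always use the midpoint instead of the recorded TAVG
    out["TAVG"] = (out["TMAX"] + out["TMIN"]) / 2
    # Keep only the requested years (both ends inclusive)
    out = out[out["Year"].between(first_year, last_year)]
    return out.groupby("Year")["TAVG"].mean().reset_index()


# Toy rows for illustration only
toy = pd.DataFrame({
    "DATE": ["2023-12-31", "2024-01-01", "2024-01-02", "2025-01-01"],
    "TMAX": [50, 60, 70, 10],
    "TMIN": [30, 40, 50, 0],
})
print(yearly_average(toy, 2024, 2024))
```

With this helper, the Minneapolis series would come from `yearly_average(minneapolis, 1939, 2024)` and the Atlanta one from `yearly_average(atlanta, 1930, 2024)`.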
Combining Both Datasets in a Graph
plt.figure(figsize=(12, 6))
plt.plot(yearly_avg_temp['Year'], yearly_avg_temp['TAVG'], label='Atlanta', marker='o', linestyle='-', color='orange')
plt.plot(minn_yearly_avg_temp['Year'], minn_yearly_avg_temp['TAVG'], label='Minneapolis', marker='o', linestyle='-', color='blue')
plt.title('Yearly Average Temperature Atlanta vs Minneapolis')
plt.xlabel('Year')
plt.ylabel('Average Temperature (°F)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
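Year-to-year swings can hide the longer pattern in a chart like this. One common option, offered here as an optional extra and not something the analysis above does, is a centered rolling mean. `add_rolling_mean` is a hypothetical helper name, the 10-year default window is an arbitrary assumption, and the toy values are invented for illustration.

```python
import pandas as pd


def add_rolling_mean(yearly, window=10, value_col="TAVG"):
    """Return a copy of yearly with a centered rolling mean of value_col added."""
    out = yearly.copy()
    out[f"{value_col}_rolling"] = (
        out[value_col].rolling(window, center=True, min_periods=window).mean()
    )
    return out


# Toy values for illustration only, smoothed with a 3-year window
toy = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003, 2004],
    "TAVG": [1.0, 2.0, 3.0, 4.0, 5.0],
})
print(add_rolling_mean(toy, window=3))
```

The new `TAVG_rolling` column can be passed to `plt.plot` alongside the yearly values. The first and last few years stay empty because a full window is not available there.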
Conclusion
Data visualization is a powerful tool for making sense of climate change and sharing its impact with the world. Python’s rich ecosystem of libraries—like Pandas, Matplotlib, Seaborn, and Plotly—makes it easier to turn complex climate data into engaging and informative visuals.
There is so much that can be done with this powerful tool.