Christmas Data Special 2024
It’s that time of year again, when we try to scratch up some open data on the Canadian Christmas experience! Some good ones this year, along with a special guest from Niagara College.
Christmas Ice
By: Doug Sartori
The holiday season in Canada is a time for family, feasting and fun. Many Canadian children wait expectantly for Santa Claus to leave his North Pole workshop on December 24th and criss-cross the world delivering toys to all of the good boys and girls. Despite NORAD propaganda, all good Canadians know that sleighs are exclusively surface vehicles, even the reindeer-propelled kind.
With this in mind, I considered Santa’s likely overland route to Canada. A glance at an Arctic map from the CIA World Factbook shows that the obvious starting point for Santa’s Canadian travels is the settlement of Alert on Ellesmere Island, the world’s northernmost permanently inhabited place.
To understand the risks to Santa’s overland route, I looked at data on sea ice in Canada’s Eastern Arctic (which includes the stretch north of Ellesmere that Santa certainly must cross). The Canadian Ice Service produces regional shapefiles covering spatial trends in ice extent. They also publish these charts in GIF form, which look like this:
These shapefiles encode data in a format called SIGRID-3. SIGRID-3 is a vector format designed by the Ice Charting Working Group, a coalition of the world’s ice research centers. The original SIGRID format was developed in 1981 and formalized in 1989; SIGRID-3 is its third revision. The shapefile’s data table has 17 mandatory fields along with 38 optional fields. These fields describe the nature and extent of the various ice types in the geography indicated. For our purposes, we only need the field “CT”, which indicates the total concentration of ice. To calculate area, I used the shapefile’s geometry, reprojected to the Canadian Lambert conformal conic map projection for accuracy and consistency.
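For readers who want to try the area step, here is a minimal sketch with geopandas. It is not the code from the repository. The two polygons are made up, the `ct_fraction` column is a hypothetical stand-in for a concentration already decoded from the SIGRID-3 “CT” codes, and the projection parameters are an assumption about what a Canadian Lambert conformal conic definition looks like. A conformal projection does not preserve area exactly, so the figures should be read as approximate.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up polygons in longitude/latitude standing in for ice-chart polygons.
# "ct_fraction" is a hypothetical column: ice concentration as a 0-1 fraction,
# assumed to have been decoded from the SIGRID-3 "CT" field beforehand.
polygons = [
    Polygon([(-70, 82), (-60, 82), (-60, 84), (-70, 84)]),
    Polygon([(-80, 75), (-75, 75), (-75, 76), (-80, 76)]),
]
ice = gpd.GeoDataFrame(
    {"ct_fraction": [1.0, 0.5]}, geometry=polygons, crs="EPSG:4326"
)

# Assumed Lambert conformal conic parameters for Canada (units: metres).
CANADA_LCC = (
    "+proj=lcc +lat_1=50 +lat_2=70 +lat_0=40 +lon_0=-96 "
    "+x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"
)

# Reproject so that geometry.area is in square metres, then convert to km^2.
projected = ice.to_crs(CANADA_LCC)
projected["area_km2"] = projected.geometry.area / 1e6

# Weight each polygon's area by its ice concentration.
projected["ice_km2"] = projected["area_km2"] * projected["ct_fraction"]
total_ice_km2 = projected["ice_km2"].sum()

print(projected[["ct_fraction", "area_km2", "ice_km2"]])
print(f"Total ice-covered area: {total_ice_km2:,.0f} km^2")
```

With a real chart, the GeoDataFrame would come from `gpd.read_file(...)` on the downloaded shapefile instead of the hand-built polygons.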
There were some anomalous ice years in the early part of this century, but for the most part the extent of ice in December in the Eastern Arctic continues to be maximal. Looks like Santa can do a 72-month payment plan on that new electric sleigh without worrying that he’ll end up going for a swim on his way to Canada.
That’s the Canadian avenue of approach taken care of, but what about the rest of the Arctic Circle? For that, I accessed data from the National Snow and Ice Data Center (NSIDC) FTP archive, which provides monthly records going back to 1979. This data is a simple CSV and doesn’t require much fancy work to visualize, so I added a linear regression for a simple straight-line projection, as well as a polynomial regression for a more nuanced projection, out to 2050. This analysis provides mathematical approximations of Arctic ice loss using regression models but does not account for physical climate processes like ocean currents, feedback loops, or emissions scenarios, which are included in more sophisticated climate models. There is a data anomaly in the late ’80s which is indicated on the chart. This chart is a little less rosy, and perhaps Santa will need to consider other transportation options over the next 25 years or so.
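The two projections can be sketched in a few lines. This is an illustration rather than the analysis itself: it uses `numpy.polyfit` in place of scikit-learn to stay short, and the extent series is synthetic (a made-up trend plus noise), not the NSIDC record.

```python
import numpy as np

# Synthetic stand-in for December Arctic sea ice extent (million km^2).
# These values are generated for illustration and are NOT NSIDC data.
years = np.arange(1979, 2025)
rng = np.random.default_rng(0)
extent = 13.5 - 0.04 * (years - 1979) + rng.normal(0, 0.15, years.size)

# Centre the years so the polynomial fit stays numerically well behaved.
centre = years.mean()
x = years - centre

# Degree 1 gives the straight-line fit; degree 2 allows the trend to curve.
linear = np.poly1d(np.polyfit(x, extent, deg=1))
quadratic = np.poly1d(np.polyfit(x, extent, deg=2))

# Extend both fitted curves out to 2050.
future_years = np.arange(2025, 2051)
linear_projection = linear(future_years - centre)
quadratic_projection = quadratic(future_years - centre)

print(f"2050, linear fit:    {linear_projection[-1]:.2f} million km^2")
print(f"2050, quadratic fit: {quadratic_projection[-1]:.2f} million km^2")
```

Both curves are extrapolations of past data, so the further out they go, the less weight they deserve.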
Python is great for this sort of work. Tools used in the analysis included:
- pandas for data cleaning and processing.
- geopandas for spatial data extraction and area calculations.
- matplotlib for visualization of both time series and spatial data trends.
- scikit-learn for linear and polynomial regression modeling.
You can find the source code and instructions to generate these charts on GitHub.
Student Mean Median Moose Segment
By: Lee Doucet
I’m a professor at Niagara College in their Business Analytics program, focusing my teaching mostly on Data Visualization and Communication, having designed and created the course for the college. As a practitioner of open data, I embed open data in my course to move students beyond textbook data and let them see the real challenges of working with open data. For instance, last semester, Niagara College partnered with Living Lakes Canada and learnt firsthand how important adequate funding is for keeping data collection consistent year over year, which accurate reporting depends on. This is something you can’t see with data that is already cleaned and transformed.
This semester I wanted to focus on housing, as the cost of living is very topical. I want students to be able to investigate complex topics and produce data-driven insights. When I saw Mean, Median, and Moose’s methodology, which emphasizes reproducibility when breaking down complex topics to present them in an accessible and approachable way, a light bulb went off. This is a highly valuable analytical skill: I want the students to be able to demonstrate their knowledge while communicating it to someone on their team, such as a colleague or a supervisor, who may not have the same technical abilities to work with data.
Often, I worry that society puts too much emphasis on the results of people’s work and not the process. I see this in busy people flipping straight to the policy recommendations, skimming the report, or reading only the executive summary. To me, the process itself is an opportunity to engage people with how you did the work and to let them judge whether you made the correct decisions and pivots based on what information was available to you. Not only that, once students correctly understand how workflows operate, they can begin to transfer those skills to their teams in the workplace.
All of this led to my interest in working with Doug from Mean, Median, and Moose to give the students an opportunity to try something different. Housing is not an easy topic by any means, and over the course of several weeks, students with busy lives and jammed semesters got together in 8 teams and produced some great reports with a supplemental dashboard on housing. They gathered data from a multitude of sources, including Statistics Canada and the Canada Mortgage and Housing Corporation, all of which are available to anyone who wants to explore Canadian housing data. I learnt new things while reading the reports, which to me is a hallmark of success. I also saw things I expected, namely the struggles of affordability. In reference to that, the winning group called their project “Housing, Hurdles, and Hope”, which makes you think: how many Canadians unable to afford the hurdles of acquiring a house in this market are relying on hope as their strategy?
Here’s a link to the report.
NFB Christmas short film color profiles
By: John Haldeman
The National Film Board of Canada is a federal government organization that funds, produces and distributes Canadian films. They produce mostly short films and documentaries related to Canada or by Canadian filmmakers. Almost all of their films are free to view on their website. Keeping with my recent theme of attempting to explore ways to visualize unstructured data, I wanted to see if I could use the films themselves as a source of data. What I ended up doing is creating visualizations representing a sort of color profile for each film taking inspiration from a project called “A viz of Ice and Fire” which was a master’s degree project in Georgia Tech’s CS 7450 course. Those students did much more than just extract the predominant colors in frames captured from Game of Thrones, but that is the part I copied for this analysis.
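The colour-extraction step can be approximated quite simply. The sketch below is one possible approach, not necessarily the method used for these charts or by the Georgia Tech project: it snaps each pixel to a coarse colour bin and keeps the most common bins. The function name `dominant_colors`, the bin width, and the tiny hand-made “frame” are all assumptions for illustration. In practice the pixels would come from frames sampled out of the video file.

```python
from collections import Counter


def dominant_colors(pixels, n=3, step=32):
    """Return the n most common colours after coarse quantization.

    pixels: iterable of (r, g, b) tuples with channel values 0-255.
    step:   width of each colour bin; larger values merge more shades.
    """
    def bucket(channel):
        # Snap a channel value to the centre of its bin.
        return min(255, (channel // step) * step + step // 2)

    counts = Counter((bucket(r), bucket(g), bucket(b)) for r, g, b in pixels)
    return [color for color, _ in counts.most_common(n)]


# A made-up "frame": mostly snowy white, some red, a little dark green.
frame = [(250, 250, 255)] * 60 + [(200, 10, 20)] * 30 + [(10, 90, 30)] * 10

profile = dominant_colors(frame)
print(profile)
```

Repeating this for frames sampled at regular intervals and laying the results side by side gives a strip of colours that follows the film from start to finish.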
Here’s the result for twelve selected Christmas or winter themed NFB short films:
Many of these films are thought-provoking slices of Canadiana. “The Nativity Cycle” is a film of a play put on by elementary school children in the 1950s exploring dimensions of the story of Christ’s birth with surprising complexity and high production values. “Teach Me to Dance” is a Christmas story involving Albertan Ukrainians in 1919. “Christmas at Moose Factory” is a film entirely made up of Cree children’s drawings from a residential school in Moose Factory on the Hudson Bay. It explores life in Ontario’s North through the eyes of children in 1971 during Christmas. It’s quite an array from what is now a collection of thousands of films.
Just for fun, here are my favourite NFB short films and their color profiles:
The most striking is “The Cat Came Back” which careens through different landscapes while the protagonist desperately attempts to flee his feline pursuer.
The home of Reindeer in Canada
By: Andy Dyck
Rudolph the Red-Nosed Reindeer became a part of Christmas holiday lore with an appearance in a booklet written by Robert L. May, to be distributed to customers of the Montgomery Ward department store in Chicago. Now, what I don’t know about reindeer could fill volumes, including the fact that reindeer and caribou are indeed the same species. That said, I’m certain that with Rudolph hitting 85 years since his introduction, Santa might need to start looking for a more youthful lead for his team. In order to help out our friend Kris Kringle, I embarked on a journey to find the habitat in Canada where he’d be most likely to find a replacement.
I started my search for caribou in the most logical place: the Government of Canada Open Data Portal. I found 63 provincial records and 25 federal records to choose from. In order to get a national view, I narrowed my search to only federal records and further filtered the list down to records offered in formats that can carry spatial data, including GDB, CSV, SHP, and XML. I found this dataset concerning the range of species at risk in Canada that includes the habitat range of Reindeer (Caribou) in Canada.
The dataset is in GDB format, and I used the {sf} package in R to quickly read it before doing some further analysis. I honestly can’t say enough good things about the {sf} package for R. Like geopandas for Python, this package not only handles the complexity of reading and writing geographic datasets, but it also makes geographic transformations and calculations so easy that one can really focus on the analysis without needing to get too deep into the details or complexity. The dataset was fairly clean, and the only thing it needed to make a nice-looking map was a base map of Canada’s boundaries in the background. I grabbed that quickly using the {rnaturalearth} package for R, and this is the result.
I calculated the total area that reindeer (caribou) range covered and compared that to the total area of Canada in order to make a catchy title for the map that highlights that over 80% of Canada’s landmass is considered territory for Santa’s sleigh team. This is great news for Santa as he shouldn’t have too much trouble finding more reindeer in Canada.
Out of curiosity, I wanted to know which province in Canada would have the largest share of its area covered by reindeer habitat. Nova Scotia, New Brunswick, and Prince Edward Island look to be out of the running from the start, but for the other provinces, I’d need to calculate the intersection of each province and the reindeer ranges first and then calculate the percent of each province covered by reindeer habitat. This analysis produced the following bar chart showing us that at 56.7%, Manitoba has the largest share of its territory covered by reindeer habitat.
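The intersect-then-divide calculation looks roughly like this in Python with shapely. This is a toy sketch, not a port of the R code in the repository: the “provinces” and “ranges” are made-up rectangles in an assumed projected coordinate system with kilometre units, chosen so the arithmetic is easy to check by hand.

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Hypothetical provinces: two 10 km x 10 km squares side by side.
provinces = {
    "Province A": box(0, 0, 10, 10),
    "Province B": box(10, 0, 20, 10),
}

# Hypothetical, overlapping range polygons. Merging them first means
# the overlapping strip is only counted once.
ranges = [box(5, 2, 12, 8), box(11, 2, 18, 8)]
habitat = unary_union(ranges)

# Share of each province covered by habitat, as a percentage.
share = {}
for name, geom in provinces.items():
    overlap = geom.intersection(habitat)
    share[name] = 100 * overlap.area / geom.area

largest = max(share, key=share.get)
for name, pct in share.items():
    print(f"{name}: {pct:.1f}% covered")
print(f"Largest share: {largest}")
```

With real data the same steps apply after reprojecting both layers to a common projected coordinate system, so that areas are measured in consistent units.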
Bottom line:
- I continue to find the combination of the {sf} and {ggplot2} packages in R to be great for quick spatial analysis and plotting. I’d love to try doing this analysis using geopandas at some point.
- The range of Reindeer (Caribou) in Canada is absolutely huge. I’d love to learn and understand more about the species and populations in Canada and how this range does or does not overlap with Canada’s human population.
- Check out and repeat this analysis by following along with code in the GitHub repository.