TABLEAU | Hiking Trails analysis
About the project
As a hiker, I enjoy exploring hiking trails near my city and in national parks. From unfamiliar plants and bugs to new views at the top, I learn something new every time I hike. I decided to combine my passion for data and my love for hiking in this project by analyzing one of the most famous national parks, which spans Washington and Oregon: Columbia River Gorge National Park.
In this project, I mainly focus on data cleaning using SQL to prepare for my analysis. To view my detailed analysis using TABLEAU, click here.
Data Quality Check
Column names are not MySQL-friendly.
Null values in family-friendly and backpackable columns
Inconsistent data types in the trail type, difficulty_level, distance, high point, and elevation gain columns
Data Cleaning and Manipulation Process
Because the original column names were not MySQL-friendly, I first renamed them to a SQL-friendly format.
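A renaming step like this could look as follows in MySQL 8.0 or later. The table name `trails_demo` and the raw column names are assumptions for illustration, not the project's actual schema.

```sql
-- Hypothetical raw table as imported, with column names containing spaces.
DROP TABLE IF EXISTS trails_demo;
CREATE TABLE trails_demo (
    `Trail Name`     VARCHAR(255),
    `Elevation Gain` VARCHAR(50),
    `High Point`     VARCHAR(50)
);

-- Backticks are needed to reference names that contain spaces.
-- RENAME COLUMN requires MySQL 8.0 or later.
ALTER TABLE trails_demo
    RENAME COLUMN `Trail Name`     TO trail_name,
    RENAME COLUMN `Elevation Gain` TO elevation_gain,
    RENAME COLUMN `High Point`     TO high_point;
```

Lowercase names with underscores can then be used in later queries without quoting.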
Then, I used CASE WHEN to categorize the data into 7 types of hiking trails. I used similar syntax to categorize difficulty levels into easy, moderate, and difficult.
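A minimal sketch of this kind of CASE WHEN categorization is shown here. The table `trails_demo`, the raw column names, the sample rows, and the matching patterns are all hypothetical. Only two trail types are shown; the full set of 7 would follow the same pattern.

```sql
-- Hypothetical sample data with free-text type and difficulty values.
DROP TABLE IF EXISTS trails_demo;
CREATE TABLE trails_demo (
    trail_name     VARCHAR(255),
    trail_type_raw VARCHAR(255),
    difficulty_raw VARCHAR(255)
);
INSERT INTO trails_demo (trail_name, trail_type_raw, difficulty_raw) VALUES
    ('Example Trail A', 'Loop hike',        'Easy'),
    ('Example Trail B', 'Out-and-back',     'Easy to moderate'),
    ('Example Trail C', 'Out and back',     'Difficult');

SELECT
    trail_name,
    -- The first matching branch wins, so more specific patterns go first.
    CASE
        WHEN LOWER(trail_type_raw) LIKE '%loop%'     THEN 'Loop'
        WHEN LOWER(trail_type_raw) LIKE '%out%back%' THEN 'Out-and-back'
        ELSE 'Other'
    END AS trail_type,
    CASE
        WHEN LOWER(difficulty_raw) LIKE '%difficult%' THEN 'Difficult'
        WHEN LOWER(difficulty_raw) LIKE '%moderate%'  THEN 'Moderate'
        WHEN LOWER(difficulty_raw) LIKE '%easy%'      THEN 'Easy'
        ELSE 'Unknown'
    END AS difficulty_level
FROM trails_demo;
```

Because branches are evaluated in order, a mixed value such as 'Easy to moderate' falls into the harder category in this sketch; a different ordering would be an equally valid design choice.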
Because the distance column mixes numeric and text values, I used text functions to extract the numeric value of the hike distance.
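One way to do this extraction with LEFT and POSITION is sketched here, assuming the raw values look like '5.4 miles' (number first, then a space). The table name, column names, and sample rows are hypothetical.

```sql
-- Hypothetical sample data: distance stored as text with a unit suffix.
DROP TABLE IF EXISTS trails_demo;
CREATE TABLE trails_demo (
    trail_name VARCHAR(255),
    distance   VARCHAR(50)
);
INSERT INTO trails_demo (trail_name, distance) VALUES
    ('Example Trail A', '5.4 miles'),
    ('Example Trail B', '12 miles round trip');

SELECT
    trail_name,
    distance,
    -- POSITION finds the first space; LEFT keeps everything before it.
    CAST(LEFT(distance, POSITION(' ' IN distance) - 1) AS DECIMAL(6, 2)) AS distance_miles
FROM trails_demo
-- Only rows that match the assumed 'number, space, mile...' shape are converted.
WHERE distance LIKE '% mile%';
```

Rows that do not follow the assumed shape are filtered out here and would need their own handling.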
For null values, I changed them to "No records" using the UPDATE statement.
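A null replacement of this kind might look like the following sketch. The table `trails_demo`, its columns, and the sample rows are assumptions for illustration.

```sql
-- Hypothetical sample data with missing values.
DROP TABLE IF EXISTS trails_demo;
CREATE TABLE trails_demo (
    trail_id        INT PRIMARY KEY,
    trail_name      VARCHAR(255),
    family_friendly VARCHAR(20),
    backpackable    VARCHAR(20)
);
INSERT INTO trails_demo (trail_id, trail_name, family_friendly, backpackable) VALUES
    (1, 'Example Trail A', 'Yes', NULL),
    (2, 'Example Trail B', NULL,  'No');

-- Safe update mode (on by default in MySQL Workbench) rejects UPDATEs whose
-- WHERE clause does not use a key column, so it is turned off for this session.
SET SQL_SAFE_UPDATES = 0;

-- NULL must be tested with IS NULL; "= NULL" never matches.
UPDATE trails_demo SET family_friendly = 'No records' WHERE family_friendly IS NULL;
UPDATE trails_demo SET backpackable    = 'No records' WHERE backpackable    IS NULL;
```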
For the family-friendly and backpackable columns, I used CASE WHEN to convert the values to Yes/No (Boolean data type) for later analysis.
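As a rough sketch of that conversion, assuming the raw values are free text that starts with "yes" when the trail qualifies (an assumption; the actual raw values are not shown in this write-up):

```sql
-- Hypothetical sample data with free-text flag values.
DROP TABLE IF EXISTS trails_demo;
CREATE TABLE trails_demo (
    trail_name      VARCHAR(255),
    family_friendly VARCHAR(50),
    backpackable    VARCHAR(50)
);
INSERT INTO trails_demo (trail_name, family_friendly, backpackable) VALUES
    ('Example Trail A', 'Yes',                 'no'),
    ('Example Trail B', 'yes, with older kids', 'Yes'),
    ('Example Trail C', 'No records',          'No');

SELECT
    trail_name,
    -- Anything that does not start with "yes" maps to 'No' in this sketch,
    -- including 'No records'; that mapping is an assumption.
    CASE WHEN LOWER(family_friendly) LIKE 'yes%' THEN 'Yes' ELSE 'No' END AS family_friendly,
    CASE WHEN LOWER(backpackable)    LIKE 'yes%' THEN 'Yes' ELSE 'No' END AS backpackable
FROM trails_demo;
```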
Then I added a column indicating whether each hiking trail is available all year.
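Adding a derived column could be sketched as follows. The source column `season`, its values, and the new column name `available_all_year` are hypothetical; the write-up does not say which field the flag was derived from.

```sql
-- Hypothetical sample data with a free-text season column.
DROP TABLE IF EXISTS trails_demo;
CREATE TABLE trails_demo (
    trail_name VARCHAR(255),
    season     VARCHAR(50)
);
INSERT INTO trails_demo (trail_name, season) VALUES
    ('Example Trail A', 'All year'),
    ('Example Trail B', 'May - October');

-- Add the new flag column, then fill it from the existing text.
ALTER TABLE trails_demo ADD COLUMN available_all_year VARCHAR(3);

-- An UPDATE without a WHERE clause is rejected in safe update mode.
SET SQL_SAFE_UPDATES = 0;

UPDATE trails_demo
SET available_all_year = CASE
        WHEN LOWER(season) LIKE '%all%year%' THEN 'Yes'
        ELSE 'No'
    END;
```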
After cleaning and manipulating the data, I double-checked for duplicates and dropped unnecessary columns.
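A duplicate check with a window function, followed by dropping a column, might look like this sketch. It assumes MySQL 8.0 or later (needed for window functions) and treats two rows with the same trail name as duplicates; the table, columns, and sample rows are hypothetical.

```sql
-- Hypothetical sample data containing one repeated trail name.
DROP TABLE IF EXISTS trails_demo;
CREATE TABLE trails_demo (
    trail_name VARCHAR(255),
    distance   VARCHAR(50),
    raw_notes  VARCHAR(255)
);
INSERT INTO trails_demo (trail_name, distance, raw_notes) VALUES
    ('Example Trail A', '5.4 miles', 'imported'),
    ('Example Trail A', '5.4 miles', 'imported'),
    ('Example Trail B', '12 miles',  'imported');

-- Number the rows within each trail name; any row numbered above 1 is a repeat.
SELECT trail_name, occurrence
FROM (
    SELECT
        trail_name,
        ROW_NUMBER() OVER (PARTITION BY trail_name ORDER BY trail_name) AS occurrence
    FROM trails_demo
) AS numbered
WHERE occurrence > 1;

-- Remove a column that is no longer needed for the analysis.
ALTER TABLE trails_demo DROP COLUMN raw_notes;
```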
Key takeaways from the dataset
The most challenging trail is Wahtum Lake Via Ruckel Creek Hike. The easiest trails are Sames Walker Loop Hike, Hood River Waterfront Hike, and Fort Cascades Loop Hike.
108 trails are available all year long, which accounts for 62.79% of the total trails in Columbia River Gorge National Park.
I consider a trail suitable for families if it is easy, family-friendly, available all year, and not crowded. There are 16 trails that satisfy those criteria with 14 loop trails and 12 out-and-back trails. Those include Ainsworth Loop Hike, Balfour - Klickitat Loop Hike, Buck Creek on Larch Mountain Hike, and other trails.
There are 41 trails that are suitable for backpackers, across different difficulty levels and trail types.
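A share such as the all-year percentage above can be computed with a single aggregate query. The sketch below uses a hypothetical `trails_demo` table and flag column, with made-up rows rather than the project's data.

```sql
-- Hypothetical sample data with a Yes/No availability flag.
DROP TABLE IF EXISTS trails_demo;
CREATE TABLE trails_demo (
    trail_name         VARCHAR(255),
    available_all_year VARCHAR(3)
);
INSERT INTO trails_demo (trail_name, available_all_year) VALUES
    ('Example Trail A', 'Yes'),
    ('Example Trail B', 'No'),
    ('Example Trail C', 'Yes'),
    ('Example Trail D', 'Yes');

SELECT
    -- In MySQL a comparison evaluates to 1 or 0, so SUM counts the matches.
    SUM(available_all_year = 'Yes')                             AS all_year_trails,
    COUNT(*)                                                    AS total_trails,
    ROUND(100 * SUM(available_all_year = 'Yes') / COUNT(*), 2)  AS pct_all_year
FROM trails_demo;
```

The same pattern, with extra conditions in a WHERE clause, would give counts for filtered groups such as family-suitable or backpackable trails.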
Tools used
SQL for data quality check, cleaning, and manipulation
CASE WHEN and UPDATE to manipulate data
Window functions to detect duplicates
Text functions and operators (LIKE, LEFT, POSITION) to extract data
Check out my detailed analysis:
Interact with my dashboard here:
Contact me