TABLEAU | Hiking Trails analysis

Photo

About the project

As a hiker, I enjoy exploring hiking trails near my city and in national parks. From new types of plants and bugs to new views at the top, I learn something new every time I hike. So I decided to combine my passion for data and my love for hiking in this project. I analyze one of the most famous national parks, which stretches from Washington to Oregon: Columbia River Gorge National Park. In this project, I mainly focus on data cleaning using SQL to prepare for my analysis. To view my detailed analysis using TABLEAU, click here.


Data Quality Check

  1. Column names are not MySQL-friendly.

  2. Null values in the family-friendly and backpackable columns.

  3. Inconsistent data types in the trail type, difficulty_level, distance, high point, and elevation gain columns.


Data Cleaning and Manipulation Process

Because the original column names were not MySQL-friendly, I renamed them to a consistent, SQL-friendly format.
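As a rough illustration of this renaming step, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module so it can be executed without a database server (SQLite 3.25 or newer is needed for RENAME COLUMN). The table name `trails` and the original column names are hypothetical; the actual names in my dataset may differ, and MySQL would quote the old names with backticks instead of double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table whose column names contain spaces and capitals.
conn.execute('CREATE TABLE trails ("Trail Name" TEXT, "High Point" TEXT)')

# Rename each column to a lowercase, underscore-separated name.
conn.execute('ALTER TABLE trails RENAME COLUMN "Trail Name" TO trail_name')
conn.execute('ALTER TABLE trails RENAME COLUMN "High Point" TO high_point')

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(trails)")]
print(columns)
```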

Photo

Then, I used CASE WHEN to categorize the data into 7 types of hiking trails. After that, I used similar syntax to categorize difficulty levels into easy, moderate, and difficult.
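The CASE WHEN pattern for difficulty levels might look like the sketch below, again run through Python's sqlite3 module so it is self-contained. The raw difficulty values and trail names are made up for illustration; only the three target categories come from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trails (trail_name TEXT, difficulty TEXT)")
conn.executemany(
    "INSERT INTO trails VALUES (?, ?)",
    [
        ("Trail A", "Easy"),                       # hypothetical raw values
        ("Trail B", "Moderate (steep sections)"),
        ("Trail C", "Difficult: long climb"),
    ],
)

# Map free-text difficulty descriptions onto three clean categories.
query = """
SELECT trail_name,
       CASE
           WHEN difficulty LIKE 'Easy%'      THEN 'easy'
           WHEN difficulty LIKE 'Moderate%'  THEN 'moderate'
           WHEN difficulty LIKE 'Difficult%' THEN 'difficult'
           ELSE 'unknown'
       END AS difficulty_level
FROM trails
ORDER BY trail_name
"""
levels = conn.execute(query).fetchall()
print(levels)
```

The same structure, with more WHEN branches, would cover the 7 trail types.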

Photo

Because the distance column mixes numeric and text values, I used text functions to extract the numeric value of the hike distance.
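A minimal sketch of this extraction, assuming raw values shaped like "4.4 miles" (the sample rows are hypothetical). Because the sketch runs on SQLite through Python's sqlite3 module, it uses substr and instr; in MySQL the equivalent expression would be LEFT(distance, POSITION(' ' IN distance) - 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trails (trail_name TEXT, distance TEXT)")
conn.executemany(
    "INSERT INTO trails VALUES (?, ?)",
    [
        ("Trail A", "4.4 miles"),              # hypothetical raw values
        ("Trail B", "12 miles"),
        ("Trail C", "0.8 miles round trip"),
    ],
)

# Keep everything before the first space, then cast it to a number.
query = """
SELECT trail_name,
       CAST(substr(distance, 1, instr(distance, ' ') - 1) AS REAL) AS distance_miles
FROM trails
WHERE distance LIKE '% miles%'
ORDER BY trail_name
"""
distances = conn.execute(query).fetchall()
print(distances)
```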

Photo

For null values, I replaced them with "No records" using the UPDATE statement.
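The null replacement can be sketched as follows, using Python's sqlite3 module and a hypothetical two-row table. The UPDATE statement itself is written the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trails (trail_name TEXT, backpackable TEXT)")
conn.executemany(
    "INSERT INTO trails VALUES (?, ?)",
    [("Trail A", "Yes"), ("Trail B", None)],  # None is stored as NULL
)

# Replace missing values with an explicit placeholder.
conn.execute(
    "UPDATE trails SET backpackable = 'No records' WHERE backpackable IS NULL"
)

rows = conn.execute(
    "SELECT trail_name, backpackable FROM trails ORDER BY trail_name"
).fetchall()
print(rows)
```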

Photo

For the family-friendly and backpackable columns, I used CASE WHEN to convert the values into Yes/No (a Boolean-style flag) for later analysis.
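One way this conversion could look is sketched below with Python's sqlite3 module. The raw values such as "Yes, for older kids" are an assumption about how the source text might be phrased, not the actual data, and the sketch leaves the "No records" placeholder untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trails (trail_name TEXT, family_friendly TEXT)")
conn.executemany(
    "INSERT INTO trails VALUES (?, ?)",
    [
        ("Trail A", "Yes, for older kids"),  # hypothetical raw values
        ("Trail B", "No"),
        ("Trail C", "No records"),
    ],
)

# Collapse free-text answers to Yes/No. The 'No records' branch must come
# before the 'No%' branch, otherwise the placeholder would be turned into 'No'.
conn.execute("""
UPDATE trails
SET family_friendly = CASE
    WHEN family_friendly LIKE 'Yes%'      THEN 'Yes'
    WHEN family_friendly = 'No records'   THEN 'No records'
    WHEN family_friendly LIKE 'No%'       THEN 'No'
    ELSE family_friendly
END
""")

flags = conn.execute(
    "SELECT trail_name, family_friendly FROM trails ORDER BY trail_name"
).fetchall()
print(flags)
```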

Photo

Then I added a column that indicates whether each hiking trail is available all year.
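A sketch of adding such a column, run through Python's sqlite3 module. It assumes a hypothetical `seasons` text column with values like "All year"; the column name, the new `all_year` name, and the matching rule are all illustrative rather than the exact logic I used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trails (trail_name TEXT, seasons TEXT)")
conn.executemany(
    "INSERT INTO trails VALUES (?, ?)",
    [("Trail A", "All year"), ("Trail B", "Summer into fall")],  # hypothetical
)

# Add the new flag column, then fill it from the seasons text.
conn.execute("ALTER TABLE trails ADD COLUMN all_year TEXT")
conn.execute("""
UPDATE trails
SET all_year = CASE WHEN seasons LIKE '%all year%' THEN 'Yes' ELSE 'No' END
""")

availability = conn.execute(
    "SELECT trail_name, all_year FROM trails ORDER BY trail_name"
).fetchall()
print(availability)
```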

Photo

After cleaning and manipulating the data, I checked for duplicates and dropped unnecessary columns.
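The duplicate check with a window function could be sketched like this, using Python's sqlite3 module (SQLite 3.25 or newer supports window functions). The sample rows and the choice of columns that define a duplicate are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trails (trail_name TEXT, distance_miles REAL)")
conn.executemany(
    "INSERT INTO trails VALUES (?, ?)",
    [("Trail A", 4.4), ("Trail B", 12.0), ("Trail A", 4.4)],  # one repeated row
)

# Number the rows inside each group of identical values;
# any row numbered above 1 is a duplicate of an earlier row.
query = """
SELECT trail_name, distance_miles
FROM (
    SELECT trail_name,
           distance_miles,
           ROW_NUMBER() OVER (
               PARTITION BY trail_name, distance_miles
               ORDER BY trail_name
           ) AS row_num
    FROM trails
) AS numbered
WHERE row_num > 1
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)
```

An empty result means no duplicates were found for the chosen columns.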

Photo

Key takeaways from the dataset

  1. The most challenging trail is Wahtum Lake Via Ruckel Creek Hike. The easiest trails are Sams Walker Loop Hike, Hood River Waterfront Hike, and Fort Cascades Loop Hike.

  2. 108 trails are available all year long, which accounts for 62.79% of the total trails in Columbia River Gorge National Park.

  3. I considered a trail suitable for families if it is easy, family-friendly, available all year, and not crowded. There are 16 trails that satisfy those criteria, with 14 loop trails and 12 out-and-back trails. These include Ainsworth Loop Hike, Balfour - Klickitat Loop Hike, Buck Creek on Larch Mountain Hike, and others.

  4. There are 41 trails suitable for backpackers, across different difficulty levels and trail types.


Tools used

SQL for data quality check, cleaning, and manipulation

  • CASE WHEN and UPDATE to manipulate data

  • Window functions to detect duplicates

  • Text functions (LIKE, LEFT, POSITION) to extract data

Check out my detailed analysis:

Interact with my dashboard here:


Contact me