🖥️ Week 03 | Day 03
Building Your Project Website

ME204 – Data Engineering for the Social World

Dr Jon Cardoso-Silva

30 July 2025

📦 Final Project: Communication Strategy

  • Technical skills are not enough; you also need to communicate your findings well.
  • Today, you will learn how to build and curate your own website for data storytelling.

The Three Audiences

We want you to practise communicating data-driven insights to different audiences. In your 📦 Final Project, you will need to think about three hypothetical audiences for your project:

The Three Audiences

📄 README.md

Technical Colleagues

Other data scientists who might want to reproduce your work

📓 notebooks/

Data Analysts

Technical professionals who understand Python, pandas, and SQL.

🌐 docs/index.md

General Public

Educated readers without any particular technical background.

The Three Audiences

📄 README.md

Technical Colleagues

What they need to know:

  • How to set up their Python environment
  • What API credentials they need
  • Which scripts to run and in what order

📓 notebooks/

Data Analysts

What they need to know:

  • Why you chose specific filtering methods
  • What decisions you made and alternatives considered
  • How to follow your reasoning and reproduce results

🌐 docs/index.md

General Public

What they need to know:

  • Why they should care about your findings
  • What you discovered and what it means
  • How to understand your results without knowing code

The Three Audiences

📄 README.md

You will find a README template on the 🖥️ Week 03 Day 02 lecture page.

📓 notebooks/

You will find notebook templates on the 🖥️ Week 03 Day 02 lecture page.

🌐 docs/index.md

General Public

Today’s Focus: Building Your Website

Learn to create beautiful, accessible data stories that anyone can understand and engage with.

Setting Up GitHub Pages

  • Since we are hosting our data project on GitHub, we can use GitHub Pages to host our website.

What is GitHub Pages?

GitHub Pages is a static site hosting service that takes files from a GitHub repository, runs them through a build process, and publishes a website.

Key Benefits:

  • Free hosting for your data science portfolio/project
  • Automatic deployment from your repository
  • Professional URLs
    (username.github.io/repository-name)
  • Version control integration
  • Custom domain support

What do I need to set it up?

Enable GitHub Pages:

  • Go to your repository’s Settings tab
  • Navigate to the Pages section
  • On the “Source” dropdown, select “Deploy from a branch”

What do I need to set it up? (cont.)

Point to the docs/ folder:

  • On the “Source” dropdown, select “Deploy from a branch”
  • Select the main branch
  • On the β€œFolder” dropdown, select /docs
  • Click β€œSave”

Creating your website

  • Now that GitHub Pages is enabled, you can start creating your website.

Setting Up Your Project Structure

Does your project structure look like this?

  • You must have a docs/ folder
  • You must add an index.md file to the docs/ folder

Note: The docs/index.md file can be empty for now. We’ll add content to it in the next step.

Your Project Structure:

your-me204-project/
├── data/
│   ├── reddit.db
│   └── raw/
│       ├── comments.json
│       ├── posts.csv
│       └── subreddits.json
├── docs/
│   └── index.md                      ← 🆕 ADD THIS FILE
├── .gitignore                        ← Ignore files
├── notebooks/
│   ├── NB01-data-collection.ipynb    ← Saves JSON/CSV files
│   ├── NB02-data-processing.ipynb    ← Creates SQLite database
│   └── NB03-analysis.ipynb           ← Exploratory analysis
├── README.md          ← High-level description and reproducibility
└── requirements.txt   ← Python dependencies (optional)
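If you prefer to create the folder and the empty homepage from Python (for example, from a notebook cell), a minimal sketch with `pathlib` could look like this. The helper name `create_docs_homepage` is hypothetical, and it assumes you pass it the path to your project root.

```python
from pathlib import Path


def create_docs_homepage(project_root: Path) -> Path:
    """Create docs/index.md inside project_root if it does not exist yet."""
    docs_dir = project_root / "docs"
    docs_dir.mkdir(parents=True, exist_ok=True)  # no error if docs/ already exists

    index_file = docs_dir / "index.md"
    if not index_file.exists():
        # An empty homepage is fine for now; never overwrite one you already wrote
        index_file.write_text("", encoding="utf-8")
    return index_file


# Example: from inside notebooks/, the project root is one level up
# create_docs_homepage(Path(".."))
```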

Creating Your First docs/index.md

Your Project Structure:

your-me204-project/
├── README.md          ← For technical colleagues
├── notebooks/         ← For data analysts
│   ├── NB01.ipynb
│   └── NB02.ipynb
└── docs/
    └── index.md       ← For general public

Key Points:

  • docs/index.md is just a regular markdown file
  • Same format as your README.md file
  • This becomes your website’s homepage
  • Write it for your general public audience

You can mix markdown and HTML
(the language of the Web that we saw yesterday)

Example docs/index.md content:

# My Reddit Analysis Project

This project, which I did for [ME204](https://lse-dsi.github.io/ME204/2025/), is about [...].

**What you will find in this website:**

1. **I discovered that [...].**
2. **Some other finding**

## Methodology & Justification

Because I was curious about [...], I chose to collect data
from the following three subreddits: `r/AskReddit`, `r/explainlikeimfive`, and `r/AmItheAsshole`.

I focused on [...this and that rankings...] because [reasons...].
I also collected **all** comments from the posts.

The diagram below illustrates how I collected and preprocessed the data.

![Diagram of how the data was collected and preprocessed](./figures/data-flow.png)

## Findings

Isn't the following figure cool?

💡 Pro-tip:

Just like in Jupyter Notebooks, you can mix markdown and HTML
(the language of the Web that we saw yesterday)

Example docs/index.md content:

# My Reddit Analysis Project

This project, which I did for [ME204](https://lse-dsi.github.io/ME204/2025/), is about [...].

<div style="border:1px solid #000;padding:10px;margin-bottom:10px;border-radius:5px;">
  
<span style="font-weight:bold;font-size:1.1em">What you will find in this website:</span>

<ol>
  <li>I discovered that [...]</li>
  <li>Some other finding</li>
</ol>
</div>

## Methodology & Justification

Because I was curious about [...], I chose to collect data
from the following three subreddits: `r/AskReddit`, `r/explainlikeimfive`, and `r/AmItheAsshole`.

Checking Your Deployment Status

Go to the Actions tab:

  1. Navigate to your repository on GitHub
  2. Click on the Actions tab
  3. Look for the “pages build and deployment” workflow
  4. Check that it shows a green checkmark ✅, which indicates success

Finding Your Live Website URL

Go back to Settings:

  1. Navigate to your repository’s Settings tab
  2. Click on Pages in the left sidebar
  3. Look for “Your site is live at”
  4. Click the Visit site button to open your website

Your URL will look like: https://username.github.io/repository-name

Note: It may take a few minutes for your changes to appear live.
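The URL pattern above can be written as a small function. This is a sketch under two assumptions: you have not configured a custom domain, and `github_pages_url` is a hypothetical helper name. It also covers the special case of a repository named `username.github.io`, which GitHub Pages serves at the root of the domain rather than under a repository path.

```python
def github_pages_url(username: str, repository: str) -> str:
    """Return the default GitHub Pages URL for a repository (no custom domain)."""
    host = f"{username}.github.io"
    # A repository named <username>.github.io is a user site, served at the root
    if repository.lower() == host.lower():
        return f"https://{host}"
    # Any other repository is a project site, served under /<repository>
    return f"https://{host}/{repository}"


print(github_pages_url("username", "repository-name"))
# prints https://username.github.io/repository-name
```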

Alternative way to see your website URL

⚠️ Common Mistakes and Misconceptions

Students often make these mistakes when building their websites. Let’s clear up the confusion.

What You See on GitHub (Built-in Markdown Rendering)

👈 This is NOT your public website.

  • Shows when you view any .md file directly on GitHub
  • Just like how you see your README.md file
  • Uses GitHub’s own styling and layout
  • Only visible to people who can access the repository (anyone, if the repository is public)
  • Looks like an OK website, but it’s not your public site

Key point: This is just GitHub’s way of displaying markdown files nicely within their platform.

See the Difference? Your Actual Public Website

👈 This is your actual public website:

  • Lives at https://username.github.io/repository-name
  • Anyone on the internet can visit it
  • Uses GitHub Pages’ rendering engine
  • Different styling and layout from GitHub
  • This is what you share with the world

Key point: This is your real website that you can share with anyone.

GitHub Pages Only Sees Files in the docs/ Folder

Your project structure:

your-me204-project/
├── data/
│   ├── reddit.db
│   └── raw/
│       ├── comments.json
│       ├── posts.csv
│       └── subreddits.json
├── docs/
│   ├── index.md
│   └── figures/
│       └── my-plot.png          ← ✅ WILL be rendered
├── notebooks/
│   ├── NB01-data-collection.ipynb
│   ├── NB02-data-processing.ipynb
│   └── NB03-analysis.ipynb
├── figures/
│   └── another-plot.png         ← ❌ WON'T be rendered
├── README.md
└── requirements.txt

Key point: Only files inside the docs/ folder are accessible to your GitHub Pages website.

✅ This will work:

![](./figures/my-plot.png)

❌ This won’t work:

![](../figures/another-plot.png)

Understanding ./ and ../:

  • ./ means “current folder” (the folder containing the page, here docs/)
  • ../ means “parent folder” (here, the repository root)
  • GitHub Pages can only access files inside docs/, so ../ won’t work

Examples:

  • ./figures/plot.png = look in docs/figures/plot.png
  • ../figures/plot.png = look in figures/plot.png (outside docs/)
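To see why `../` fails, it helps to look at how the browser turns a relative link into a full URL. Python’s `urllib.parse.urljoin` follows the same standard resolution rules (RFC 3986), so it can show where each link ends up. The site address below is the placeholder URL used earlier; swap in your own.

```python
from urllib.parse import urljoin

# Address of your homepage (docs/index.md is served here); the trailing / matters
site = "https://username.github.io/repository-name/"

# ./ stays inside the published docs/ folder
print(urljoin(site, "./figures/my-plot.png"))
# prints https://username.github.io/repository-name/figures/my-plot.png

# ../ climbs above the site's root, where your repository files are not published
print(urljoin(site, "../figures/another-plot.png"))
# prints https://username.github.io/figures/another-plot.png
```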

💡 Useful HTML & CSS Resources

If you want to learn more about HTML and CSS, here are some useful resources:

🎨 CSS Styling

Learn to style your markdown content.

📝 HTML Elements

Essential HTML tags for better structure.

🤖 GenAI chatbots

Be very precise when querying:
  • Pass your markdown
  • Describe which part you want to style
  • Explain how you want to style it

Practice: Adding Images to Your Website

  1. Locate one image from your project
    1. Or just grab a random one
  2. Add it to your website
  3. Commit and push your changes
  4. Check your live website to see the image

Time: 15 minutes
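Before you push, you could check that every image in docs/index.md points to a file that exists inside docs/. A minimal sketch, assuming your images use Markdown `![alt](path)` syntax (HTML `<img>` tags are not checked) and that `find_broken_images` is a hypothetical helper:

```python
import re
from pathlib import Path

# Matches Markdown image syntax ![alt](path) and captures the path
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def find_broken_images(index_md: Path) -> list[str]:
    """Return image paths in index_md that point outside docs/ or to missing files."""
    docs_dir = index_md.parent.resolve()
    problems = []
    for link in IMAGE_LINK.findall(index_md.read_text(encoding="utf-8")):
        if link.startswith(("http://", "https://")):
            continue  # external images are not checked here
        target = (docs_dir / link).resolve()
        # Flag links that escape docs/ (e.g. ../) or whose file does not exist
        if docs_dir not in target.parents or not target.exists():
            problems.append(link)
    return problems


# Example, run from the project root:
# print(find_broken_images(Path("docs/index.md")))
```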

Exporting your NB03 images

When you are done creating plots for your project, you will need to export them as PNG or SVG files so you can add them to your website.

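import matplotlib.pyplot as plt

# Create the figure before your plot code
plt.figure(figsize=(10, 6))

# ... your plot code ...

# Export the plot (NB03 lives in notebooks/, so ../docs/ is the docs/ folder)
plt.savefig('../docs/my-plot.png',
            dpi=300,
            bbox_inches='tight')

# Alternative: save as SVG for better quality (do this before closing the figure)
plt.savefig('../docs/my-plot.svg',
            bbox_inches='tight')

# Close the figure only after saving, otherwise you would save an empty figure
plt.close()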

In your NB03 notebook:

After creating your plot with matplotlib, add these lines to save it:

  • plt.savefig() saves the current figure
  • dpi=300 gives high quality
  • bbox_inches='tight' removes extra whitespace
  • plt.close() frees up memory
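If you export several plots, a small helper keeps the file paths consistent with the docs/figures/ layout shown earlier. A sketch, assuming a hypothetical `save_for_website` helper and that you pass it the figure object returned by `plt.subplots()`:

```python
from pathlib import Path

import matplotlib.pyplot as plt


def save_for_website(fig, name: str, docs_dir: Path) -> list[Path]:
    """Save fig as PNG and SVG inside docs_dir/figures and return both paths."""
    figures_dir = docs_dir / "figures"
    figures_dir.mkdir(parents=True, exist_ok=True)  # savefig does not create folders
    paths = [figures_dir / f"{name}.png", figures_dir / f"{name}.svg"]
    fig.savefig(paths[0], dpi=300, bbox_inches="tight")
    fig.savefig(paths[1], bbox_inches="tight")
    plt.close(fig)  # close only after every format has been saved
    return paths


# Usage from notebooks/NB03-analysis.ipynb (docs/ is one level up):
# fig, ax = plt.subplots(figsize=(10, 6))
# ...your plot code...
# save_for_website(fig, "my-plot", Path("../docs"))
```

In docs/index.md you would then reference the image as ./figures/my-plot.png (or .svg).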

PNG vs SVG:

  • PNG: Raster format with a fixed resolution; good for photos and complex plots, and the file size stays manageable even with many data points
  • SVG: Vector format, ideal for charts and graphs; scales crisply to any size, but the file grows with every plotted element, so dense plots produce large files

Choose PNG for: Complex visualisations with many data points

Choose SVG for: Simple charts and graphs that should stay crisp at any size
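You can check this trade-off on your own plots by saving to memory and comparing byte counts. This sketch uses synthetic random points purely for illustration (they are not project data). With tens of thousands of markers, the SVG is typically much larger than the PNG, because SVG stores every marker as a separate element while PNG only stores pixels.

```python
import io

import matplotlib.pyplot as plt
import numpy as np


def saved_size(fig, fmt: str) -> int:
    """Return how many bytes fig takes up when saved in the given format."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt, dpi=100)
    return buffer.getbuffer().nbytes


rng = np.random.default_rng(0)
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(rng.normal(size=20_000), rng.normal(size=20_000), s=2)  # many points

png_bytes, svg_bytes = saved_size(fig, "png"), saved_size(fig, "svg")
plt.close(fig)
print(f"PNG: {png_bytes / 1024:.0f} KB, SVG: {svg_bytes / 1024:.0f} KB")
```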

Styling summary tables

You don’t always need a plot to show data results. You can also use a table.

It helps to customise how it looks so it’s not just a boring screenshot!

Introduction to pandas Styler

Let’s give a bland DataFrame a makeover!

Step 1: The Bland DataFrame

Your basic pandas table:

At any point you can save a DataFrame to HTML.

The data is there, but it could be more attractive.

The code:

df.to_html('basic_table.html', index=False)

produces the table below.

Subreddit            Posts  Comments  Avg_Score  Engagement_Rate
AskReddit              152      1595   3.116167         0.778937
explainlikeimfive      229      1544   4.732352         0.682710
AmItheAsshole          142       621   4.202230         0.248637
todayilearned           64       966   4.416145         0.227277
relationship_advice    156      1738   3.041169         0.228383
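To follow along without your own data, you can rebuild the table above as a DataFrame. The values are copied from the table on this slide. Calling `to_html()` without a file name returns the HTML as a string, which is handy for checking the output before writing it to disk.

```python
import pandas as pd

# The summary table shown above, rebuilt as a DataFrame
df = pd.DataFrame({
    "Subreddit": ["AskReddit", "explainlikeimfive", "AmItheAsshole",
                  "todayilearned", "relationship_advice"],
    "Posts": [152, 229, 142, 64, 156],
    "Comments": [1595, 1544, 621, 966, 1738],
    "Avg_Score": [3.116167, 4.732352, 4.202230, 4.416145, 3.041169],
    "Engagement_Rate": [0.778937, 0.682710, 0.248637, 0.227277, 0.228383],
})

# With no file name, to_html() returns the HTML as a string instead of writing a file
html = df.to_html(index=False)
print(html.splitlines()[0])
```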

Step 2: Basic Styling with CSS

What if I want to change how the table, as a whole, looks?

You can use df.style.set_properties() to apply the same CSS properties to every data cell in your table.

# Apply basic CSS styling to every data cell
styled_df = df.style.set_properties(**{
    'text-align': 'center',
    'font-size': '0.85em', # Sets the font to 85% of the surrounding text size
    'padding': '0.5em', # Adds space around the text
    'border': '1px solid #ddd' # Adds a light grey border around each cell
})

# Save the styled table (Styler.to_html() has no index argument, so hide the index first)
styled_df.hide(axis='index').to_html('basic_styled_table.html')

Step 3: Adding Visual Hierarchy

Now let’s make the headers stand out:

Use set_table_styles() to style specific parts of your table. Let’s make the headers bold and have a different background.

The skills you need here are similar to those I showed yesterday in the web scraping section. You’d need to know, for example, that th represents the table header cells, and which CSS properties to use to style them.

# Add header styling
styled_df = df.style.set_properties(**{
    'text-align': 'center',
    'font-size': '0.85em',
    'padding': '0.5em',
    'border': '1px solid #ddd'
}).set_table_styles([
    {'selector': 'th', 'props': [
        ('background-color', '#f8f9fa'),
        ('font-weight', 'bold'),
        ('padding', '0.5em'),
        ('text-align', 'center')
    ]}
])

Step 4: Adding Colour and Data Visualisation

Let’s add a colour scale to highlight the data:

Use background_gradient() to add colour scales to numeric columns. This helps readers quickly spot patterns and compare values.

# Add colour gradients to numeric data
# Assume we also added the CSS styling from the previous step
styled_df = (
    styled_df
    .background_gradient(cmap='Blues', subset=['Posts', 'Comments'])
    .background_gradient(cmap='Greens', subset=['Avg_Score'])
    .background_gradient(cmap='Oranges', subset=['Engagement_Rate'])
)

Step 5: Professional Polish

Final touches for a professional look:

Add zebra striping, hover effects, and modern styling.

This creates a table that looks like it belongs on a professional website.

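# Professional styling with all features
# set_table_styles() replaces earlier table styles by default (overwrite=True),
# so the header rules from Step 3 are repeated here
styled_df = (
    styled_df
    .set_table_styles([
        {'selector': 'th', 'props': [
            ('background-color', '#f8f9fa'),
            ('font-weight', 'bold'),
            ('padding', '0.5em'),
            ('text-align', 'center'),
            ('border-radius', '0.25em 0.25em 0 0')
        ]},
        # Cells with a gradient keep their own colour, so striping and hover
        # only show in the cells without one (here, the Subreddit column)
        {'selector': 'tr:nth-child(even)', 'props': [
            ('background-color', '#fafafa')  # zebra striping
        ]},
        {'selector': 'tr:hover', 'props': [
            ('background-color', '#f0f8ff')  # highlight the row under the mouse
        ]}
    ])
)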
Subreddit            Posts  Comments  Avg. Score  Engagement Rate
AskReddit              152      1595    3.116167         0.778937
explainlikeimfive      229      1544    4.732352         0.682710
AmItheAsshole          142       621    4.202230         0.248637
todayilearned           64       966    4.416145         0.227277
relationship_advice    156      1738    3.041169         0.228383

Dashboards

  • Sometimes you want to create interactive dashboards instead of static websites.

Gapminder Dashboard

How are dashboards different from static reports?

  • (the obvious) Interactive data visualisations that allow users to explore data dynamically.
  • Real-time data updates to allow users to explore data as it changes
  • Continuous monitoring, so organisations can track their key metrics over time
    • Government agencies around the world love dashboards
    • Sales and marketing teams also use them a lot
    • Anyone with a digital platform can monitor data

The user is responsible for interpreting the data and drawing their own conclusions.

Streamlit: Quick Python Dashboards

Streamlit is a Python library that makes it easy to create web apps for data science.

Perfect for:

  • Quick prototypes
  • Data exploration tools
  • Interactive visualisations
  • Sharing results with non-technical users

import streamlit as st
import pandas as pd
import plotly.express as px

# Load your data (adjust the path to wherever your processed data lives)
df = pd.read_csv('data/processed/reddit_data.csv')

# Create the app
st.title('Reddit Analysis Dashboard')

# Add filters
selected_subreddit = st.selectbox(
    'Choose a subreddit:',
    df['subreddit'].unique()
)

# Filter data
filtered_df = df[df['subreddit'] == selected_subreddit]

# Create interactive plot
fig = px.scatter(filtered_df, x='score', y='num_comments',
                 title=f'Posts from r/{selected_subreddit}')
st.plotly_chart(fig)

# No __main__ block or st.run() call is needed: `streamlit run app.py` runs this
# script from top to bottom, and reruns it whenever the user changes a widget
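Because Streamlit reruns the whole script on every interaction, it can help to keep the data logic in plain pandas functions that you can test without launching the app. A sketch, assuming the `subreddit` and `score` columns from the example above; `summarise_subreddit` is a hypothetical helper, and the commented lines show how the app might display its output with `st.metric`.

```python
import pandas as pd


def summarise_subreddit(df: pd.DataFrame, subreddit: str) -> dict:
    """Return simple summary numbers for one subreddit's posts."""
    filtered = df[df["subreddit"] == subreddit]  # same filter as the app
    return {
        "posts": len(filtered),
        "median_score": float(filtered["score"].median()),
    }


# In app.py, after the selectbox:
# summary = summarise_subreddit(df, selected_subreddit)
# st.metric("Posts", summary["posts"])
# st.metric("Median score", summary["median_score"])
```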

Running Your Streamlit App

To run your dashboard:

  1. Save your code as app.py
  2. Open terminal in your project folder
  3. Run: streamlit run app.py
  4. Your browser opens automatically

Deployment options:

  • Streamlit Community Cloud (free)
  • Heroku
  • Your own server (NOT on GitHub Pages unfortunately!)
# In your terminal
cd your-me204-project
streamlit run app.py
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Network URL: http://192.168.1.100:8501

💻 Afternoon Lab:

If anyone is interested in using Quarto for their projects, let me know. I will help you with publishing to GitHub Pages.

LSE Summer School 2025 | ME204 Week 03 Day 03