tag_.\n",
"\n",
" Write it in the markdown cell below:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Delete this line and write your answer here_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Write the required Python code to scrape the CSS selector you identified above. \n",
"\n",
" - Don't use the notion of containers just yet - we will practice that later in the W05 lecture. \n",
" - For now, just write the full CSS selector you identified above\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete this line and replace it with your code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. **Let's simplify.** Let's capture the title of the **first event** again, but instead of writing the entire full absolute path, like above, identify a more direct way to capture it. \n",
"\n",
" - Note: Either use scrapy's `.extract_first()` or use `extract()` and later filter the list using regular Python"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete this line and replace it with your code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"5. **Collect all the titles**. OK, now let's practice getting all event titles from the entire page. Save the titles into a list.\n",
"\n",
" **NOTE:** Again, collect all the information from the webpage at once. Don't use the notion of containers just yet. We will practice it in the W05 lecture."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete this line and replace it with your code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"6. Do the same with the dates of the events and speaker names and save them to separate lists. \n",
"\n",
" **NOTE:** Again, collect all the information from the webpage at once. Don't use the notion of containers just yet. We will practice it in the W05 lecture.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete this line and replace it with your code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"7. 🥇 **Challenge:** Combine all these lists you captured above into a single pandas data frame and save it to a CSV file. \n",
"\n",
" Tip 1: Say you have lists called `dates`, `titles`, `speakers`, you can create a data frame (a table) like this:\n",
" \n",
" ```python\n",
" df = pd.DataFrame({'date': dates,\n",
" 'title': titles,\n",
" 'speakers': speakers})\n",
" ``` \n",
" \n",
" Tip 2: What if an event does not have a date or speaker name? Set that particular event's date or speaker to `None`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete this line and replace it with your code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"8. Double-check that the CSV file was created correctly by opening it using pandas. Then convert the columns to appropriate data types."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete this line and replace it with your code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----\n",
"\n",
"If there is some time left, use it to work on your 📝 [W06 Summative](https://lse-dsi.github.io/DS105/2023/winter-term/assessments/w06-summative.html)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "66a23b877595d3e158647673320c5aac91a1fe2874d6334c4fd4c069dffc5915"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}