Transport Network Analysis

This week we will cover a different type of data: network data. We will take a look at how we can use network data to measure accessibility using the dodgr R library. We will calculate the network distances between combinations of locations (i.e. a set of origins and a set of destinations). These distances can then, for instance, be used to calculate the number of a resource (e.g. fast-food outlets) within a certain distance of a Point of Interest (e.g. a school or population-weighted centroid).

Lecture slides

You can download the slides of this week’s lecture here: [Link].

Reading list

Essential readings

  • Geurs, K., Van Wee, B. 2004. Accessibility evaluation of land-use and transport strategies: review and research directions. Journal of Transport Geography 12(2): 127-140. [Link]
  • Higgins, C., Palm, M., DeJohn, A. et al. 2022. Calculating place-based transit accessibility: Methods, tools and algorithmic dependence. Journal of Transport and Land Use 15(1): 95-116. [Link]
  • Neutens, T., Schwanen, T. and Witlox, F. 2011. The prism of everyday life: Towards a new research agenda for time geography. Transport Reviews 31(1): 25-47. [Link]

Suggested readings

  • Schwanen, T. and De Jong, T. 2008. Exploring the juggling of responsibilities with space-time accessibility analysis. Urban Geography 29(6): 556-580. [Link]
  • Van Dijk, J., Krygsman, S. and De Jong, T. 2015. Toward spatial justice: The spatial equity effects of a toll road in Cape Town, South Africa. Journal of Transport and Land Use 8(3): 95-114. [Link]
  • Van Dijk, J. and De Jong, T. 2017. Post-processing GPS-tracks in reconstructing travelled routes in a GIS-environment: network subset selection and attribute adjustment. Annals of GIS 23(3): 203-217. [Link]

Transport Network Analysis

The term network analysis covers a wide range of analysis techniques ranging from complex network analysis to social network analysis, and from link analysis to transport network analysis. What the techniques have in common is that they are based on the concept of a network. A network or network graph is constituted by a collection of vertices that are connected to one another by edges. Note, vertices may also be called nodes or points, whilst edges may be called links or lines. Within social network analysis, you may find the terms actors (the vertices) and ties or relations (the edges) also used.
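
To make the terminology concrete, a small network can be written down as a table of vertices and a table of edges. The following is a minimal sketch in base R; the vertex names and edge lengths are invented for illustration and are not taken from any real network.

```r
# a toy network: four vertices (nodes) and four edges (links)
# all names and lengths below are made up for illustration
vertices <- data.frame(id = c("A", "B", "C", "D"))
edges <- data.frame(
    from = c("A", "A", "B", "C"),
    to = c("B", "C", "D", "D"),
    length_m = c(100, 250, 300, 120)
)

# number of edges attached to each vertex (its degree)
degree <- table(c(edges$from, edges$to))
degree

# compare the two possible routes from A to D by summing edge lengths
route_via_b <- sum(edges$length_m[c(1, 3)])  # A-B and B-D
route_via_c <- sum(edges$length_m[c(2, 4)])  # A-C and C-D
min(route_via_b, route_via_c)
```

Comparing the summed edge lengths of alternative routes is, in miniature, what a shortest-path calculation on a street network does.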

Figure 1: Visualising networks with vertices and edges.

Accessibility in Portsmouth

For this week’s practical, we will be using Portsmouth in the UK as our area of interest for our analysis. One prominent topic within the city is the issue of public health and childhood obesity. According to figures released in March 2020 by Public Health England, more than one in three school pupils are overweight or obese by the time they finish primary school within the city; this is much higher than the national average of one in four. One potential contributor to the health crisis is the ease and availability of fast-food outlets in the city. In the following, we will measure the accessibility of fast-food outlets within specific walking distances of all schools in Portsmouth, starting at 400m, then 800m and finally a 1km walking distance. We will then aggregate these results to Lower Super Output Areas (LSOA) and overlay these results with some socio-economic variables.

To execute this analysis, we will need to first calculate the distances between our schools and fast-food outlets. This involves calculating the shortest distance a child would walk between a school and a fast-food outlet, using roads or streets. We will use the dodgr R package to conduct this transport network analysis.

All calculations within the dodgr library currently need to be run in WGS84 (EPSG:4326). This is why we will not transform the CRS of our data in this practical.

Acquiring network data

As usual, we will start by loading any libraries we will require. Install any libraries that you might not have installed before.

R code
# libraries
library(tidyverse)
library(sf)
library(osmdata)
library(dodgr)

To create our network and Origin-Destination dataset, we will need data on schools, fast-food outlets, and a street network. Today we will be using OpenStreetMap for this. If you have never come across OpenStreetMap (OSM) before, it is a free editable map of the world.

OpenStreetMap’s spatial coverage is still unequal across the world. In addition, as you will find if you use the data, its accuracy and quality can be questionable, and attribute details that we would like to have, e.g. types of roads and their speed limits, are sometimes missing.

Whilst there are various approaches to downloading data from OpenStreetMap, we will use the osmdata library to directly extract our required OpenStreetMap (OSM) data into a variable. The osmdata library grants access within R to the Overpass API that allows us to run queries on OSM data and then import the data as spatial objects. These queries are at the heart of these data downloads.

We will go ahead and start with downloading and extracting our road network data. To download OSM data using the osmdata library, we can use the add_osm_feature() function on a query created with the opq() function. To build the query, we need to provide either a bounding box of our area of interest (AOI) or a set of points, from which a bounding box will be derived. You can find out more about this and details on how to construct your own queries in the data vignette.

A bounding box, or bbox, is an area defined by two longitudes and two latitudes. Essentially, it is a rectangular georeferenced polygon that you can use to demarcate an area. You can either define bounding box coordinates yourself or extract values from an existing shapefile or GeoPackage.
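
As a minimal sketch of what a bounding box is, the following base R code derives one from a handful of point coordinates. The coordinates are hypothetical and only serve to illustrate the xmin, ymin, xmax, ymax ordering that is also used for p_bbox later on.

```r
# hypothetical point coordinates (longitude, latitude) in WGS84
pts <- matrix(
    c(-1.09, 50.80,
      -1.05, 50.82,
      -1.07, 50.79),
    ncol = 2, byrow = TRUE,
    dimnames = list(NULL, c("lon", "lat"))
)

# a bounding box in the order xmin, ymin, xmax, ymax
bbox <- c(
    xmin = min(pts[, "lon"]), ymin = min(pts[, "lat"]),
    xmax = max(pts[, "lon"]), ymax = max(pts[, "lat"])
)
bbox
```

If your area of interest is already loaded as an sf object, sf::st_bbox() returns the same four values.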

To use the library (and API), we need to know how to write and run a query, which requires identifying the key and value that we need within our query to select the correct data. Essentially every map element (whether a point, line or polygon) in OSM is tagged with different attribute data. These keys and values are used in our queries to extract only map elements of that feature type - to find out how a feature is tagged in OSM is simply a case of reading through the OSM documentation and becoming familiar with their keys and values.
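
To see how a key and value end up in a query, the sketch below builds a query for fast-food outlets and prints the Overpass query text without sending it. It assumes the osmdata library is installed; the bounding box values are the Portsmouth coordinates used in this practical.

```r
library(osmdata)

# build (but do not send) a query for fast-food outlets within a bounding box
q <- opq(bbox = c(-1.113197, 50.775781, -1.026508, 50.859941)) |>
    add_osm_feature(key = "amenity", value = "fast_food")

# print the Overpass query text that osmdata would submit
query_text <- opq_string(q)
cat(query_text)
```

The osmdata functions available_features() and available_tags() can also be used to look up keys and values from within R; both require an internet connection.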

To download our road network dataset, we first define a variable to store our bounding box coordinates, p_bbox. We then use this within our OSM query to extract specific types of road segments within that bounding box - the results of our query are then stored in an osmdata object. We will select all OSM features with the highway tag that are likely to be used by pedestrians (e.g. not motorways).

R code
# define our bbox coordinates for Portsmouth
p_bbox <- c(-1.113197, 50.775781, -1.026508, 50.859941)

# pass the bounding box coordinates into the Overpass query (opq) function and
# only download features that are not classified as motorway
osmdata <- opq(bbox = p_bbox) |>
    add_osm_feature(key = "highway", value = c("primary", "secondary", "tertiary",
        "residential", "path", "footway", "unclassified", "living_street", "pedestrian")) |>
    osmdata_sf()

In some instances the OSM query will return an error, especially when several people from the same location are executing the exact same query. If this happens, you can just read through the instructions and download a prepared copy of the data that contains all required OSM Portsmouth data instead: [Link].

You can load these downloaded data as follows into R:

R code
load("../path/to/file/ports_ff.RData")
load("../path/to/file/ports_roads_edges.RData")
load("../path/to/file/ports_schools.RData")

After loading your data, you can continue with the analysis in the Measuring accessibility section below, starting with the creation of a network graph with the ‘foot weighting’ profile.

The osmdata object contains the bounding box of your query, a time-stamp of the query, and then the spatial data as osm_points, osm_lines, osm_multilines and osm_polygons (each listed together with its respective fields). Some of the spatial features may be empty, depending on what you asked your query to return. Our next step therefore is to extract our spatial data from our osmdata object to create our road network data set. This is straightforward, using the standard R $ operator to access these spatial features from our object.

Deciding what to extract is probably the more complicated aspect of this - mainly as you need to understand how to represent your road network, and this will usually be determined by the library/functions you will be using it within. Today, we want to extract the edges of the network, i.e. the lines that represent the roads, as well as the nodes of the network, i.e. the points that represent the locations at which the roads start, end, or intersect. For our points, we will only keep the osm_id data field, just in case we need to refer to this later. For our lines, we will keep a little more information that we might want to use within our transport network analysis, including the type of road, the maximum speed, and whether the road is one-way or not.

R code
# extract the points, with their osm_id
ports_roads_nodes <- osmdata$osm_points[, "osm_id"]

# extract the lines, with their osm_id, name, type of highway, max speed and
# oneway attributes
ports_roads_edges <- osmdata$osm_lines[, c("osm_id", "name", "highway", "maxspeed",
    "oneway")]

To check our data set, we can quickly plot the edges of our road network using the plot() function:

R code
# inspect
plot(ports_roads_edges, max.plot = 1)

Figure 2: OSM road network

Because we are focusing on walking, we will overwrite the oneway variable so that none of the road segments are treated as restricted to one-way traffic. Such restrictions would otherwise affect our analysis as well as the general connectivity of the network.

R code
# overwrite one-way default
ports_roads_edges$oneway <- "no"

Now that we have the network edges, we can turn these into a graph representation that allows for the calculation of network-based accessibility statistics.

Measuring accessibility

Before we can construct our full network graph for the purpose of accessibility analysis, we need to also provide our Origin and Destination points, i.e. the data points we wish to calculate the distances between. According to the dodgr documentation, these points need to be in either a vector or matrix format, containing the two coordinates for each point for the origins and for the destinations.

For our Portsmouth scenario, we are interested in calculating the shortest distances between schools and fast-food outlets, so we need to download these datasets from OpenStreetMap as well. Following a similar structure to our query above, we will use our knowledge of OpenStreetMap keys and values to extract the points of Origins (schools) and Destinations (fast-food outlets) we are interested in:

R code
# download schools
schools <- opq(bbox = p_bbox) |>
    add_osm_feature(key = "amenity", value = "school") |>
    osmdata_sf()

# download fast-food outlets
ff_outlets <- opq(bbox = p_bbox) |>
    add_osm_feature(key = "amenity", value = "fast_food") |>
    osmdata_sf()

We also need to then extract the relevant data from the osmdata object:

R code
# extract school points
ports_schools <- schools$osm_points[, c("osm_id", "name")]

# extract fast-food outlet points
ports_ff <- ff_outlets$osm_points[, c("osm_id", "name")]

We now have our road network data and our Origin-Destination (OD) points in place and we can now move to construct our network graph and run our transport network analysis.

In this analysis, we are highly reliant on the use of OpenStreetMap to provide data for both our Origins and Destinations. Whilst in the UK OSM provides substantial coverage, its quality is not always guaranteed. As a result, to improve on our current methodology in future analysis, we should look for a more official school data set or at least validate the number of schools against City Council records. The same applies to our fast-food outlets.

With any network analysis, the main data structure is a graph, constructed from our nodes and edges. To create a graph for use within dodgr, we pass our ports_roads_edges object into the weight_streetnet() function. The dodgr library also contains weighting profiles, which you can customise, for use within your network analysis. These weighting profiles contain weights based on the type of road, determined by the type of transportation the profile aims to model. Here we will use the weighting profile foot, as we are looking to model walking accessibility.

R code
# create network graph with the foot weighting profile
graph <- weight_streetnet(ports_roads_edges, wt_profile = "foot")
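
If you want to see what a weighting profile looks like, the following sketch inspects the profiles that ship with the package. It assumes that the dodgr::weighting_profiles data set is a list containing a weighting_profiles data frame with name, way and value columns, as described in the package documentation; check ?weighting_profiles if your installed version differs.

```r
library(dodgr)

# the built-in profiles ship with dodgr as a list of data frames
wp <- dodgr::weighting_profiles$weighting_profiles

# weights that the "foot" profile assigns to each type of way
foot <- wp[wp$name == "foot", ]

# show the most preferred way types first
head(foot[order(-foot$value), ])
```

Way types with a higher value are preferred when routing with that profile.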

Once we have our graph, we can then use this to calculate our network distances between our OD points. One thing to keep in mind is that potentially not all individual components in the network that we extracted are connected, for instance, because the bounding box cut off the access road of a cul-de-sac. To make sure that our entire extracted network is connected, we now extract the largest connected component of the graph. You can use table(graph$component) to examine the sizes of all individual subgraphs. You will notice that most subgraphs consist of a very small number of edges.

The dodgr package documentation explains that components are numbered in order of decreasing size, with $component = 1 always denoting the largest component. Always inspect the resulting subgraph to make sure that its coverage is adequate for analysis.

R code
# extract the largest connected graph component
graph_connected <- graph[graph$component == 1, ]

# inspect number of remaining road segments
nrow(graph_connected)
[1] 60700
# inspect
plot(dodgr_to_sf(graph_connected), max.plot = 1)

Figure 3: Largest graph component
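
The logic of selecting the largest component can be illustrated with a toy edge table. This is only a sketch: the toy_graph object and its values are invented and merely mimic the component column of a dodgr graph.

```r
# a toy edge table with a component column, mimicking the structure of a
# dodgr graph (values are invented for illustration)
toy_graph <- data.frame(
    edge_id = 1:8,
    component = c(1, 1, 1, 1, 1, 2, 2, 3)
)

# number of edges in each component
table(toy_graph$component)

# keep only the largest component, which is numbered 1
kept <- toy_graph[toy_graph$component == 1, ]

# share of edges retained after filtering
share_kept <- nrow(kept) / nrow(toy_graph)
share_kept
```

Running the same share calculation on graph and graph_connected is a quick way to check how much of the extracted network is discarded.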

OpenStreetMap is a living dataset, meaning that changes are made on a continuous basis; as such, it is quite possible that the number of remaining road segments shown above will be slightly different when you run this analysis.

Now that we have our connected subgraph, we can use the dodgr_distances() function to calculate the network distances between every possible Origin and Destination. In the dodgr_distances() function, we first pass our graph, then our Origin points (schools) in the from argument, and then our Destination points (fast-food outlets) in the to argument. Note that we wrap both point data sets in the st_coordinates() function, because the from and to arguments need our Origins and Destinations in a matrix format. For all Origins and Destinations, dodgr_distances() will map the points to the closest network points and return the corresponding shortest-path distances.

R code
# create a distance matrix between schools and fast-food stores
sch_to_ff_calc <- dodgr_distances(graph_connected, from = st_coordinates(ports_schools),
    to = st_coordinates(ports_ff), shortest = TRUE, pairwise = FALSE, quiet = FALSE)
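
To see what st_coordinates() returns, the sketch below converts two hypothetical points into an sf object and extracts their coordinates. The point locations and the toy_points name are made up for illustration.

```r
library(sf)

# two hypothetical points in WGS84 (EPSG:4326)
toy_points <- st_as_sf(
    data.frame(id = 1:2, lon = c(-1.09, -1.05), lat = c(50.80, 50.82)),
    coords = c("lon", "lat"), crs = 4326
)

# extract the coordinates as a plain matrix with columns X and Y
xy <- st_coordinates(toy_points)
xy
```

The result is a numeric matrix with one row per point, which is the kind of input passed to the from and to arguments above.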

The result of this computation is a distance-matrix that contains the network distances between all Origins (i.e. schools) and all Destinations (i.e. fast-food outlets). Let’s inspect the first row of our output. Do you understand what the values mean?

R code
# inspect
head(sch_to_ff_calc, n = 1)
         3708702676 583409150 110151723 4179720607 112032935 1684258957
35299419   4000.016  2090.485  6551.723    9067.55  10614.91   2231.919
         35510611 6806456949 1319464203 1319464086 2537832173 1319464203
35299419 11570.17   2292.263   1676.914   1680.151   1697.525   1676.914
         583409150 583409150 583409150 3708702676 6806456947 3708702676
35299419  2090.485  2090.485  2090.485   4000.016   2324.454   4000.016
         3708702676 4741221735 3080970373 6486730562 2526286989 360754572
35299419   4000.016    3338.84   1700.179   581.0582   7262.143  3370.976
         1684760757 111811784 4547890993   596188 360666535 153334012 35309497
35299419   1359.146  3321.384   340.8823 1102.226  10508.09  2644.278 2720.491
         4559843487 533710034 1584811969 35309619 35309619 1584608863
35299419   3260.791  2330.386   2580.465 2920.565 2920.565   736.0556
         2530707658   210200 8501630407   163535 33033068 1517208796 1592759310
35299419   3399.756 844.6083   6005.212 3394.408  8882.48   9083.916   8949.552
         33024082 1322971868 1584776930 1787982426 128227681 3119584321
35299419 9193.238   1629.029   3277.902   2400.918  6946.724   8688.361
         33032892 112015327 12036553018 12036553018 360951843 360951843
35299419 9114.322  9041.214    6503.173    6503.173  6470.933  6470.933
         1684217275 1684048292 8788727818 1446611129 1446611129 1765156609
35299419   804.5849   763.0456   2526.807   1010.408   1010.408   1967.697
           688134 1584811969 117492188 1765137127 1496776935 1634771122
35299419 1418.767   2580.465  1245.823    2157.02   688.8672   2162.922
           691638   672367 12033597986 28836634 1765137126   210200 851157783
35299419 2205.285 2119.323    11968.16 4603.517   2738.124 844.6083  615.1102
         3357036324 8788727825 6170004942 117484085 35510611 35510611 35510611
35299419   788.7212   2586.476   2667.125  728.4882 11570.17 11570.17 11570.17
         35510611 35510611 35510611 35510611 35510611 35510611 35510611
35299419 11570.17 11570.17 11570.17 11570.17 11570.17 11570.17 11570.17
         35510611 35510611 35510611 2469323680 11549983260 11549983260
35299419 11570.17 11570.17 11570.17    11051.2    6380.673    6380.673
         11549983260 9563701017 3708702676 3708702676 1747135467 1682386860
35299419    6380.673   6214.692   4000.016   4000.016   4234.058   5702.077
         128349051 6486730562   691582 41466838 4533088712 4533088770
35299419  3841.384   581.0582 597.3575 2654.314   4884.169   5081.596
         5589038074 850508342 4639702744 5589038074 3227275829 1684055602
35299419   4823.045  375.2084   926.0127   4823.045   3808.049    753.752
           474557 1584608915 1517984612 11513738570 1684226066 27679037
35299419 556.5173    713.536   622.6949    619.2467    1225.66 3452.874
         1381614134 5337216850 11549983251 11549983257 11549983257 11549983252
35299419   3363.827   5226.252    6439.123      6411.8      6411.8    6415.946
         11549983252 11549983252 11549983252 11549983251 11549983251
35299419    6415.946    6415.946    6415.946    6439.123    6439.123
         11549983252 11549983247 11549983251 11549983251 11549983251
35299419    6415.946    6453.555    6439.123    6439.123    6439.123
         11549983251 12034855970 1517209100 1517209100 9445278102 4179720618
35299419    6439.123    12081.21   9239.889   9239.889   5296.254    9050.93
         4179720618 360692085 618271588 360692085 4179720618 12033597756
35299419    9050.93  9002.504  8275.415  9002.504    9050.93    11708.13
         12033597753 12033597764 12033597762 9307660916 11364252738 1684259012
35299419    11696.37    11754.26    11722.44   9259.759    3349.922   2300.121
         4361632708 1314915645 7006050181   474383   474383   474383 1448664912
35299419   815.5434   3219.572   5485.599 905.8238 905.8238 905.8238   4950.036
         26658915 118724163 1804412114 107228955 10713423   547019 3754347320
35299419 4968.988  858.9023   3216.095  3159.418 11347.78 4919.699   1813.109
         33033074 3080970374 20464883 1517208858 107145004 7028566698
35299419 8871.589   1666.407 9310.528   9370.483  4462.093   3177.182
         1314915645 36866188 1682829408 5433229750 1517208858   194076
35299419   3219.572 3118.604   5756.392   9250.383   9370.483 5462.571
         130069978 360689836 108044084 108044703 111996556 1684631749 107912783
35299419  4046.518  8931.139  6193.605  6126.632   8616.91   6647.454  6049.243
         112032936   210208 1240746711 1917246883   596120 11659717644
35299419  10483.53 1505.914   4995.748   11232.78 1574.826    5218.511
         942789065 11527934107 1517209249   402702 26658915 361463301 131956097
35299419  1804.091    3084.572   8813.936 4512.959 4968.988  3803.913  12902.97
         11737454087 1381614134 1381614134   474557 3357036324 33032729
35299419    5250.123   3363.827   3363.827 556.5173   788.7212   9824.4
         4533088711 5433229750 5433229750 5433229750 4787197864 4787197864
35299419   4866.125   9250.383   9250.383   9250.383   9232.061   9232.061
         4787197864 4787197864 5433229750 5433229750 5837229773 6992272607
35299419   9232.061   9232.061   9250.383   9250.383   5774.382   5383.464
          402703 361463301 1684581956 1684582104 35298117 849545628 850213112
35299419 4492.53  3803.913   4661.927   4913.756 1543.604  1510.665  1559.537
         4533088743 1917247132 5478469295 5478469294   470102 1584776930
35299419   5081.936   11390.34   4719.045   4723.757 4271.398   3277.902
         1682809610 11659717644 4533088770 4533088770 4533088770 4533088770
35299419   5819.778    5218.511   5081.596   5081.596   5081.596   5081.596
         4533088770 4533088770 4533088770   320774 8501630403 107887646
35299419   5081.596   5081.596   5081.596 4963.868   6045.425   5140.89
           402702 1681595494 11549983252 8489626386 1917246879 1917246684
35299419 4512.959   5601.532    6415.946   4625.997   11303.07   11327.69
         5589038074 5589038074 5589038074 5589038074 4533088711 4533088714
35299419   4823.045   4823.045   4823.045   4823.045   4866.125   4895.861
         4533088711 4533088714 4533088714 4533088713 130240118 110151723
35299419   4866.125   4895.861   4895.861   4890.115  11885.28  6551.723
         110151723 110151723 110151723 110151723 110151723 29368594 26658915
35299419  6551.723  6551.723  6551.723  6551.723  6551.723 4582.555 4968.988
         7305624459 106007661 8987883618 5859802411   518610   518610   518611
35299419   11517.33  4548.575   4136.321   1129.672 1302.289 1302.289 1320.727
         800803720   290950 4081504238 4081504206 4533088742 35510611
35299419  3344.675 4464.824   1332.334   1122.165   5061.751 11570.17
         12033597756 6732089007 12036553018 7006091789 4559843598 5049080642
35299419    11708.13    3002.79    6503.173   5562.273   3135.612   3166.975
         1917246684 35309619 1740407820 158373125 7006142009 7538876197
35299419   11327.69 2920.565    1677.06  2312.611   5620.269   5518.542
         1787929945 7006142009 7006142009 4533088693 851157206 1592759339
35299419   5520.407   5620.269   5620.269   4733.129  1128.534   8944.679
          191611 107545686 8788625297 8788625298 8788625298 4081504206
35299419 2060.98  7498.559   1114.958   1092.273   1092.273   1122.165
         8788625298 4936929522   691638 2113130392 8788728291 4741418602
35299419   1092.273    4661.76 2205.285   3361.825   2738.453   3026.335
         9465974729 4559843597 4559843597 360754572 360754572 1947998799
35299419   3260.599   3275.482   3275.482  3370.976  3370.976   3400.026
         107228953 4559843597 360465477   691582 1584777018 7111358956
35299419  3139.566   3275.482   2487.06 597.3575    3294.96   7368.158
         6801562238 6801562238 7111358956
35299419    7329.26    7329.26   7368.158

Our output shows the calculations for the first school: the distances between that school and every fast-food outlet. Because we manually overwrote the values for all one-way streets and extracted only the largest connected graph component, we currently should not have any NA values.

The dodgr vignette notes that a distance matrix obtained from running dodgr_distances on graph_connected should generally contain no NA values, although some points may still be effectively unreachable due to one-way connections (or streets). Thus, routing on the largest connected component of a directed graph ought to be expected to yield the minimal number of NA values, which may sometimes be more than zero. Note further that spatial routing points (expressed as from and/or to arguments) will in this case be mapped to the nearest vertices of graph_connected, rather than the potentially closer nearest points of the full graph.
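
The structure of such a distance matrix, and a simple check for NA values, can be sketched with a small invented example. The school and outlet names and the distances below are hypothetical.

```r
# a toy distance matrix in metres: rows are origins (schools), columns are
# destinations (outlets); names and values are invented for illustration
d <- matrix(
    c(350, 900, 1200,
      1500, 420, NA),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("school_1", "school_2"), c("ff_1", "ff_2", "ff_3"))
)

# distance from each school to its nearest reachable outlet
nearest <- apply(d, 1, min, na.rm = TRUE)
nearest

# number of unreachable origin-destination pairs
n_unreachable <- sum(is.na(d))
n_unreachable
```

Applying sum(is.na()) to sch_to_ff_calc is a quick way to confirm whether your own matrix contains any unreachable pairs.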

The next step of processing depends on what you are trying to assess. Today we want to understand which schools are in closer proximity to fast-food outlets and which are not, quantified by how many outlets are within walking distance. We will therefore count how many outlets are within walking distance of each school and store this as a new column within our ports_schools data frame.

R code
# fastfood outlets within 400m
ports_schools$ff_within_400m <- rowSums(sch_to_ff_calc <= 400)

# fastfood outlets within 800m
ports_schools$ff_within_800m <- rowSums(sch_to_ff_calc <= 800)

# fastfood outlets within 1000m
ports_schools$ff_within_1km <- rowSums(sch_to_ff_calc <= 1000)

You can inspect the ports_schools object to see the results of this analysis.
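
As a sketch of how the counting works, and of how NA values could be handled if your distance matrix does contain them, the toy example below counts outlets within each threshold. The matrix values are invented; the na.rm = TRUE argument is an addition that is not used in the code above.

```r
# toy distance matrix in metres (rows: schools, columns: outlets)
d <- matrix(
    c(350, 900, 1200,
      1500, 420, NA),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("school_1", "school_2"), NULL)
)

# count outlets within each threshold, ignoring unreachable pairs
thresholds <- c(400, 800, 1000)
counts <- sapply(thresholds, function(x) rowSums(d <= x, na.rm = TRUE))
colnames(counts) <- paste0("ff_within_", thresholds, "m")
counts
```

Without na.rm = TRUE, any school with at least one NA distance would receive NA rather than a count.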

Tutorial task

Now you have calculated the number of fast-food outlets within specific distances from every school in Portsmouth and should get the idea behind a basic accessibility analysis, your task is to estimate the accessibility of fast-food outlets at the LSOA scale and compare this to the 2019 Index of Multiple Deprivation.

The skills and steps required for this analysis are not just based on this week’s practical; you will have to combine all the knowledge of coding and spatial analysis you have gained over the past weeks.

One way of doing this is by taking some of the following steps:

  • Download the 2011 LSOA boundaries and extract only those that relate to Portsmouth.
  • Download the 2019 Index of Multiple Deprivation scores.
  • Decide on an accessibility measure, such as:
    • The average number of fast-food restaurants within x meters of a school within each LSOA.
    • The average distance a fast-food restaurant is from a school within each LSOA.
    • The (average) shortest distance a fast-food restaurant is from a school within each LSOA.
    • The minimum shortest distance a fast-food outlet is from a school within each LSOA.
  • Aggregate accessibility scores to the LSOA level.
  • Join the 2019 Index of Multiple Deprivation data to your LSOA dataset.
  • For each IMD decile, calculate the average for your chosen aggregate measure and produce a table.
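
The aggregation and joining steps above can be sketched with toy data in base R. All object names, LSOA codes, counts and deciles below are invented; with the real data you would first need to assign each school to an LSOA, for instance with a spatial join.

```r
# toy school-level results with a hypothetical LSOA code per school
toy_schools <- data.frame(
    school = c("s1", "s2", "s3", "s4"),
    lsoa = c("L1", "L1", "L2", "L3"),
    ff_within_800m = c(2, 4, 0, 5)
)

# toy lookup of IMD deciles per LSOA
toy_imd <- data.frame(lsoa = c("L1", "L2", "L3"), imd_decile = c(2, 7, 2))

# average number of outlets within 800m of a school, per LSOA
lsoa_access <- aggregate(ff_within_800m ~ lsoa, data = toy_schools, FUN = mean)

# join the IMD deciles to the LSOA-level scores
lsoa_access <- merge(lsoa_access, toy_imd, by = "lsoa")

# average accessibility score per IMD decile
decile_table <- aggregate(ff_within_800m ~ imd_decile, data = lsoa_access,
    FUN = mean)
decile_table
```

The same pattern works for any of the accessibility measures listed above; only the column being aggregated and the summary function change.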

Using your approach what do you think: are fast-food restaurants, on average, more accessible for students at schools that are located within LSOAs with a lower IMD decile (more deprived) when compared to students at schools that are located within LSOAs with a higher IMD decile (less deprived)?

Want more? [Optional]

We have now conducted some basic accessibility analysis. However, there are some additional fundamental challenges to consider in the context of transport network and accessibility analysis:

  1. How do the different weight profiles of the dodgr package work? How would one go about creating your own weight profile? How would using different weight profiles affect the results of your analysis?
  2. Why do we have unconnected segments in the extracted transport network? How would you inspect these unconnected segments? Would they need to be connected? If so, how would one do this?
  3. Why do you think all Origins and Destinations are mapped onto the closest network points? Is this always the best option? What alternative methods could you think of and how would you implement these?

If you want to take a deep dive into accessibility analysis, there is a great resource that got published recently: Introduction to urban accessibility: a practical guide in R.

Before you leave

Having finished this tutorial on transport network analysis and, hopefully, having been able to independently conduct some further area-profiling using IMD deciles, you have now reached the end of this week’s content.