Data Download: Metrorail Ridership by Origin and Destination

Photo courtesy Josh Bancroft

Every day, Metro gathers a vast amount of information on how customers use the system – where and when they pass through turnstiles and board buses, how they pay, and more. There’s much to be learned from this data, and many have already put it to use. We’ve heard through MindMixer, Metro’s new online community engagement site, that more detailed ridership statistics would be useful. So in the spirit of open data and collaboration, here’s a data download of rail station-to-station passenger counts, by time period and day of the week, for May 2012.

May 2012 Metrorail OD Table by Time of Day and Day of Week (.xls, 6.8 MB)

This data can answer many questions, such as: Where do passengers entering at one station go? Where do late-night riders enter the system? How does Saturday ridership differ from Sunday? Which stations are most commuter-oriented, and which are most lively at midday and evening hours?
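As a rough illustration of the first and last of these questions, here is a minimal Python sketch. It assumes the table has been flattened into rows of (entry station, exit station, time period, average riders). That layout, the function names, and every number in the sample are invented for the example and are not taken from the spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows: (entry station, exit station, time period, average riders).
# The field layout and all values are placeholders for illustration only;
# none of them come from the May 2012 download.
SAMPLE_ROWS = [
    ("Vienna", "Rosslyn", "AM Peak", 120.0),
    ("Vienna", "Metro Center", "AM Peak", 300.0),
    ("Metro Center", "Vienna", "AM Peak", 30.0),
    ("Metro Center", "Vienna", "PM Peak", 310.0),
    ("Rosslyn", "Vienna", "PM Peak", 110.0),
]


def destinations_from(rows, station):
    """Sum riders by exit station for all trips that enter at `station`."""
    totals = defaultdict(float)
    for entry, exit_station, _period, riders in rows:
        if entry == station:
            totals[exit_station] += riders
    return dict(totals)


def am_peak_entry_exit_ratio(rows, station):
    """Divide AM Peak entries by AM Peak exits at `station`.

    Returns None when the station has no AM Peak exits.
    """
    entries = sum(r for e, _x, p, r in rows if p == "AM Peak" and e == station)
    exits = sum(r for _e, x, p, r in rows if p == "AM Peak" and x == station)
    return entries / exits if exits else None
```

With rows like these, a ratio well above 1 would suggest a station where people mostly start their morning trips, and a ratio well below 1 would suggest a station where they mostly end them.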

What does this data tell you? Do you see any patterns? Feel free to post a link in the comments!

What other data would help answer additional questions?

Technical notes about this data:

  • The data show average ridership, averaged across all days in May 2012, excluding Memorial Day. (We typically use May as an “average” month, since it falls in the middle of seasonal swings, is relatively unaffected by extreme weather, etc.)
  • Time period shows the time the passenger entered (not the time they exited).
  • AM Peak = opening to 9:30am
  • Midday = 9:30am to 3:00pm
  • PM Peak = 3:00pm to 7:00pm
  • Evening = 7:00pm to midnight
  • Late-Night Peak = Friday and Saturday nights only, midnight to closing
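The period definitions above could be sketched in Python roughly as follows. Two things here are assumptions rather than facts from the notes: each boundary time is treated as belonging to the later period (so an entry at exactly 9:30am counts as Midday), and a placeholder 4:00am cutoff separates after-midnight entries from the morning opening.

```python
from datetime import time

# Assumed cutoff between after-midnight service and the morning opening.
# This is a placeholder chosen for illustration, not a value from the post.
ASSUMED_SERVICE_DAY_START = time(4, 0)


def entry_period(entered_at: time) -> str:
    """Return the time-period label for a turnstile entry time.

    Boundaries are treated as half-open intervals [start, end), which is an
    assumption. The day of the week is not checked here, so the caller is
    responsible for applying "Late-Night Peak" only to Friday and Saturday
    nights.
    """
    if entered_at < ASSUMED_SERVICE_DAY_START:
        return "Late-Night Peak"
    if entered_at < time(9, 30):
        return "AM Peak"
    if entered_at < time(15, 0):
        return "Midday"
    if entered_at < time(19, 0):
        return "PM Peak"
    return "Evening"
```

Because the label depends only on the entry time, a trip that enters at 9:25am is classified as AM Peak no matter when it exits.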

  1. Crystal
    November 2nd, 2012 at 15:25 | #1

    Thanks for providing this dataset. I’m playing around with visualizing the data now – you can see the work in progress here: http://t.co/i0rGFdGN

    The first tab, “Ridership Volume”, summarizes the average number of weekday riders by entrance station, exit station, and time period (AM Peak, Midday, PM Peak, Evening). You can filter these three graphs by time period.

    The second tab, “Travel Between Stations”, is a network graph of the number of daily riders between any given entrance station and exit station. You can also select a specific time period to look at. This tab loads very slowly, so please be patient (especially working with free software!).

    Appreciate any feedback on what might be interesting to others. I’m hoping to evolve this to provide more useful analysis.

  2. Justin
    November 2nd, 2012 at 16:27 | #2

    Very cool, Crystal. I especially like the second tab! This piques a few ideas – what about filtering to just the “from” or “to” lines? Or grouping some stations together? Or arranging the blue dots on top of a rail map?

    Great stuff, thanks!

  3. Doug
    November 3rd, 2012 at 19:42 | #3

    Crystal, I would love to see this broken down in some way to visualize

    a) how crowded is a train at any given point? That is, summing up all the entries and subtracting all exits from Vienna to Courthouse, how full is it when it gets to Rosslyn? Gets tricky with line transfers of course, but even for the terminal segments

    b) grouped by line. How many people exit on the same line they entered? Obviously, dual-line stations complicate this, but a trip from Crystal City to Archives is functionally all on the yellow line.

  4. Doug
    November 3rd, 2012 at 19:44 | #4

    Let me add one more: a commuter ratio. What is the ratio of AM Peak entrances to exits? Is it a bedroom community, or an office park?

  5. Matt Dickens
    November 7th, 2012 at 17:35 | #5

    Justin,

    Thanks for taking my suggestion into consideration and for releasing this dataset. I appreciate the effort you and others are putting into making data about WMATA more widely available. Making data like this more available is critical to helping researchers and the public advocate for transit.

    Hopefully WMATA is working on a plan to release more and more data like this – having one month that is “average” is neat, but more detailed data for more months can help people tease out patterns and changes and lend more depth to analysis like your recent chart of the week (http://planitmetro.com/2012/11/05/chart-of-the-week-5-year-ridership-change-by-station/). Having even more detailed daily data like the CTA provides could let people see how track work, daily weather, traffic patterns, or special events affect ridership patterns.

    Again, great work and thanks for listening.

  6. Justin
    November 8th, 2012 at 09:10 | #6

    @Matt Dickens Thanks, glad to help! An average day can be useful for some analyses, but not others. What other formats of the data would be useful? What would you like to see?

  7. Matt Dickens
    November 8th, 2012 at 17:24 | #7

    @Justin
    I think an interesting starting place, and hopefully not too heavy a lift, would be to release a spreadsheet like this every month, and start developing an archive of them. I would make the figures in the sheets total trips by day type/time period/entry station/exit station rather than the averages in this example. You can note how many days of each type there were in the month and people can produce their own averages if they want.

    Then, start going back through the historical record of ridership and filling in the gaps. You could start with an average month spreadsheet for each year Metro has been open and then produce additional months as you have the resources in order to fill in the empty spaces in the record.

    I’m not sure how easy it is to produce something similar for bus lines. I assume creating a spreadsheet with monthly trips for each line by type of day is possible, perhaps the same AM/PM splits are as well. That same information would also be useful to people trying to see how development patterns influence ridership through time.

    The ultimate granular data source would be O-D data by time period like you’ve produced here, but on a daily basis. That would allow people to look at really specific day-to-day changes, such as whether differing weather patterns mean more people riding Metro or the bus. How does track work change ridership at different stations – do some go down and others go up? The CTA has daily entries for each of their stations and daily bus route ridership going back to 2001 (http://www.transitchicago.com/news_initiatives/ridershipreports.aspx#open). Their tool has an API that programmers can use to access the data, and it also allows exporting data in a variety of different formats suitable for Excel or databases.

    Here’s a cool visualization someone made using NYCT ridership data by station that lets you transform the data through time to see how ridership patterns changed: http://diametunim.com/shashi/nyc_subways/

  8. Justin
    November 14th, 2012 at 17:53 | #8

    @Crystal What if you grouped together stations in your circular visualization – something kind of like this? What do you think? http://bost.ocks.org/mike/uberdata/

  9. Justin
    November 14th, 2012 at 18:15 | #9

    @Matt Dickens
    Good ideas, Matt. CTA’s interface to ridership is impressive – that would definitely be an undertaking for us, with lots of help from the IT folks!

    Our data warehouses don’t go back forever, but we can definitely pull some other months and years if you’d like, or even use other dimensions (half-hour interval, travel time, fare, etc). We can do O-D by time period by day too, but keep in mind each day is about 30,000 rows (86 stations x 86 stations x 4 time periods). What did you have in mind?

    We’ll definitely look into posting bus ridership, too.

    And of course, seeing the data put to great use is helpful for all!

  10. Matt Dickens
    November 16th, 2012 at 17:09 | #10

    @Justin
    I think the point of my request was just that WMATA create an archive of ridership data so that anyone could use it, not to respond to specific requests. Like I said before, if I were creating this archive I would start by posting on the WMATA website somewhere a page of ridership spreadsheets like this – the May average month sheet for every year that you can. And going forward, add to that page each month as it passes, starting with October 2012? And if you do requests for people that can be posted publicly then post those on a separate page?

  11. Graham MacDonald
    November 18th, 2012 at 13:01 | #11

    @Justin
    I think it would be really useful to provide these statistics PER train, so that people can understand, in terms of their everyday experience, how many people are on a given train on average at a given day-time period. This could help answer interesting questions for people that are deciding whether to ride trains at certain times.

  12. Justin
    November 19th, 2012 at 09:32 | #12

    @Graham MacDonald
    Good idea, Graham (and echoes Doug’s “a” above). If I could post a table easily for that, I would! Unfortunately, assigning passengers to trains (or “links” in the network) is a fairly complex modeling exercise, since we only count people when they enter/exit turnstiles. We need to make some estimates of which train people got on where they have more than one potential route – like from Fort Totten to Foggy Bottom, or Pentagon City to McPherson Square. Second, we need to match up ridership to the schedule of service provided. And finally, we need to select a time period in which to perform the analysis. The Planning Office does use a tool to produce results like this, and we’re updating the software now. We will definitely look into publishing the data – is this what you had in mind?

  13. Graham MacDonald
    November 19th, 2012 at 20:04 | #13

    @Justin
    Thanks Justin. I’ve actually made my own model and created a visualization of the number of people riding. Check out the end result here: http://ridingmetro.com. I realize it’s very difficult to match up ridership with the schedule of service provided, but even more information about how the data you released is calculated would be helpful – such as what are the cutoffs for trains in the AM Peak (e.g., if I’m on the train at 9:30 does my O-D pair count toward the average for AM Peak or Midday?).

    That tool looks pretty neat, and it would be great if at some point you were able to publish the data, which I think would be very interesting and go a long way toward answering my previous question! Maybe also a white paper or some documentation (if it’s already handy) on how the tool produces such output would be helpful as well. Thanks for being so responsive.

  14. Justin
    November 20th, 2012 at 09:28 | #14

    @Graham MacDonald
    Impressive visualization, thanks! We’re looking at the details now. The data shows trips by the time they entered the turnstiles (see second bullet in the post), so if you walk through the turnstiles at 9:25am and exit again at 9:45am, you’re counted as AM Peak based on your entry time.

  15. Michael
    November 20th, 2012 at 15:49 | #15
  16. Graham MacDonald
    November 24th, 2012 at 11:19 | #16

    @Justin

    Thanks Michael – I’ve taken a look at that and as per my previous post, would love to see more of it and understand what assumptions go into it.

    Justin, that’s actually really helpful information! I’m now thinking, given that this data is from opening to 9:30 for the AM peak, for example, that if I were to try to show average people on the train, it might not match people’s experiences (as those who enter the station at 6 have a much emptier train than those entering at 8). If you do end up posting more granular data (data based on the post you and Michael reference above, or data by hour instead of AM Peak, for example), it might be more interesting to look at the average number of people per time period. Again, thanks a lot for being responsive, this level of data expertise is really heartening for me as a regular Metro rider!

  17. Matt Briney
    January 26th, 2013 at 08:40 | #17

    This dataset is a good start, but it would be more helpful if it could be broken down by half hour rather than by the current large time windows. If people get sick of packed trains, I suspect a solution could be to shift your schedule by a half hour in some cases.

  18. February 11th, 2013 at 22:10 | #18

    Here’s my own project using the Metro data:
    http://www.mvjantzen.com/blog/?p=3295 (blog post)
    http://mvjantzen.com/tools/visualizer/?system=metro (application)

    Feedback appreciated!
