City Cluster Analysis (Data Science)

During my time at NLC (National League of Cities), I worked closely with the Research & Data team to start the "Urban/Rural Divide" project, which focused on categorizing cities by a range of factors rather than relying on the Census Bureau's binary classification. I started by compiling expert-recommended variables and used SQL to join the MySidewalk data for all available incorporated places (cities with local governments). I then cleaned the data in Python, adjusted every variable to be proportional, and exported the results as CSVs. These CSVs were imported into ArcGIS (geographic data visualization software) and processed using the multi-clustering tool, which applies a k-means algorithm to group the cities into a selectable number of clusters.
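
As a rough illustration of the proportion-and-cluster steps described above, here is a minimal Python sketch. It is not the ArcGIS tool or the project's actual code: the city names, column names, and numbers are made up, and the `to_proportions` and `kmeans` helpers are hypothetical, standard-library stand-ins for the real workflow.

```python
import csv
import io
import random

# Hypothetical raw counts per city; names and values are illustrative only.
RAW_CSV = """city,population,renters,commuters
A,1000,400,300
B,2000,900,500
C,50000,30000,5000
D,60000,33000,7200
"""


def to_proportions(rows, total_col, count_cols):
    """Divide each count column by the total so cities of different sizes are comparable."""
    result = []
    for row in rows:
        total = float(row[total_col])
        shares = {col: float(row[col]) / total for col in count_cols}
        result.append({"city": row["city"], **shares})
    return result


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: squared_distance(pt, centroids[j]))
            for pt in points
        ]
        new_centroids = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
features = ["renters", "commuters"]
proportional = to_proportions(rows, "population", features)
points = [tuple(city[f] for f in features) for city in proportional]
labels, centroids = kmeans(points, k=2)

for city, label in zip(proportional, labels):
    print(city["city"], label)
```

Working with shares rather than raw counts keeps a large city from dominating the distance calculation purely because of its size, which is the reason for making the variables proportional before clustering. The number of clusters `k` plays the same role here as the selectable cluster count in the ArcGIS tool.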

The following slideshow highlights the project and was presented to a mostly non-technical audience. The presentation prompted many questions about how other teams could use this tool, and because I had focused on the "why" behind the work, I was able to answer them through qualitative analysis. Ultimately, the project showed that additional variables, such as trend variables, need to be introduced to the model to further isolate distinctive city types such as exurban and college areas. At NLC I developed solid SQL and Python data acquisition and cleaning skills, but most importantly I gained crucial insight into the front-end side of data science through qualitative and quantitative surveying.
