Data scientist wanted

We are hiring a data scientist or data-savvy environmental scientist to join MacroSheds, a study of comparative ecosystem biogeochemistry at continental scales.


This project enables anyone with internet access to compare the flow and chemistry of hundreds of streams throughout the United States and beyond, and to explore their watersheds. It combines data sets from many separate research projects into one collection, complete with a web portal for visualization and an R package for retrieval and analysis. Researchers use these data to study what types of watersheds retain the most nutrients, are recovering most rapidly from decades of acid rain, have the highest erosion rates, or have flow patterns that are least sensitive to floods and droughts. The lessons we learn from studying many watersheds and streams will contribute to more effective management of our nation’s water and forest resources.

Now that the MacroSheds platform has reached version 1, we are focusing on building a pipeline for user submissions to the MacroSheds dataset (including data cleaning), expanding the dataset, and answering the scientific questions in our original proposal, namely:

1. How and why do watersheds vary in the magnitude, timing, and form of nitrogen exports?
2. How does a history of watershed acidification affect the magnitude, timing, and composition of ecosystem element exports?
3. How and why do watersheds vary in the magnitude, timing, and composition of mineral weathering product exports?
4. What attributes of watershed ecosystems determine their sensitivity to climate change?

Candidate description

The central informatics goals of this project are to centralize and harmonize data (sensor time series, geographic data, metadata) from diverse sources, and allow users to access, clean, analyze, visualize, and contribute to the data sets housed within it. In addition to developing this core functionality, the successful candidate will contribute to analyses regarding the scientific questions above. They will have interest and experience in one or more of the following disciplines: data engineering, analytics, data visualization, software development, web development, system administration, GIS. A graduate degree in either data or computer science or in an environmental science is desired but not required. The position includes support for the candidate to attend professional meetings and professional training workshops and the opportunity to interface with collaborators at the National Ecological Observatory Network (NEON), Colorado State University, and Duke University.

Key tasks will include some subset of the following, depending on the applicant’s skill set and interests:
- Development of interactive web visualizations (using one or more of: Shiny, D3, Dygraphs, Bokeh, Highcharts).
- Development of scripts to pull data from web APIs and various file transfer servers.
- Data munging, cleaning, harmonization, and relational database queries.
- Routine statistical analyses.
- Programmatic collection and summarizing of geographic data.
- Python web development (Django, Bootstrap).
- Web scraping.

Ideal candidates will have experience with three or more of the following:
- R
- Python
- any database query language
- Mac/Linux shell commands (Bash), especially executed remotely
- Git
- HTML, CSS, JavaScript
- Google Earth Engine (JS or Python versions)
- Watershed terrain analysis

Who we are

The successful applicant will work closely with the project’s full-time data scientist, who will provide regular support and guidance. They will also meet weekly with the full project team, including PIs Emily Bernhardt (Duke) and Matt Ross (Colorado State), and several graduate students and postdocs.

To apply, submit a cover letter (including a statement of interest and qualifications), a curriculum vitae, and contact information for two references to michael.vlah@duke.edu. Review of applications will begin immediately and will continue until the position is filled.

