F&M College Library

Data Science

Data sources by discipline

A repository where users can make all of their research outputs available in a citable, shareable, and discoverable manner.

A free data repository open to all within and outside the Harvard community. Users can share, archive, cite, access, and explore research data. Each Dataverse collection is a customizable container for organizing, managing, and showcasing datasets.

Discover open data sets on a variety of topics. Within the online community users can share analysis methods.

A free, secure, cloud-based communal repository where you can store your data and make it easy to share, access, and cite.

Re3data is a global registry of research data repositories from different academic disciplines. It includes repositories that enable permanent storage of and access to data sets for researchers, funding bodies, publishers, and scholarly institutions. Re3data promotes a culture of sharing, increased access, and better visibility of research data.

The quality-assured, global Directory of Open Access Repositories. Search and browse thousands of registered repositories based on features such as location, software or type of material held.

A scientific online publication that focuses on large global problems - poverty, disease, hunger, climate change, war, existential risks, and inequality.

From Urban Institute: This interactive data tool allows for specialized searches and flexible data presentation from the Census of Governments State and Local Finance series. This series contains detailed revenue, expenditure, and debt variables for the US, each of the 50 states, and the District of Columbia from 1977 to 2016. The data are available by type of government (state, local, county, etc.), and users can view the data along five dimensions: total (in real or nominal dollars), per capita, fraction of personal income, fraction of general revenue, and fraction of total expenditures.
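The five viewing dimensions described above are simple transformations of one underlying dollar figure. The sketch below illustrates the arithmetic only; the field names, the price-index deflator, and the choice to compute ratios from nominal dollars are assumptions made for illustration, not the tool's documented method.

```python
from dataclasses import dataclass


@dataclass
class FinanceRecord:
    """One hypothetical state-year observation (all field names are illustrative)."""
    amount_nominal: float      # the finance variable, in nominal dollars
    price_index: float         # assumed deflator; base year equals 1.0
    population: float
    personal_income: float     # nominal dollars
    general_revenue: float     # nominal dollars
    total_expenditures: float  # nominal dollars


def views(r: FinanceRecord) -> dict:
    """Return the same variable expressed along each viewing dimension."""
    return {
        "total_nominal": r.amount_nominal,
        # Real dollars: divide by the (assumed) price index.
        "total_real": r.amount_nominal / r.price_index,
        "per_capita": r.amount_nominal / r.population,
        "share_of_personal_income": r.amount_nominal / r.personal_income,
        "share_of_general_revenue": r.amount_nominal / r.general_revenue,
        "share_of_total_expenditures": r.amount_nominal / r.total_expenditures,
    }
```

Calling `views(FinanceRecord(...))` with your own figures returns a dictionary with one entry per dimension, which makes it easy to compare the same variable across states of very different sizes.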

This long-standing quarterly publication forecasts the U.S. and California economies and strives to be unbiased in its approach. Available online back to 2011 and in print back to 1997. Partner publications from this site include:

A yearly survey from the U.S. Census Bureau that provides data on occupation, educational attainment, home ownership status, and more.

The United States government's open data website. It gives the public access to datasets published by agencies across the federal government.

Produced by the National Center for Education Statistics, a federal agency. Includes data from enrollment to outcomes, for grade school through post-secondary education.

The Federal Reserve Archival System for Economic Research is a project by the Research Division of the Federal Reserve Bank of St. Louis to expand on its mission to provide economic information and data to researchers interested in the U.S. economy. On this web site you will find links to scanned images of historical economic statistical publications, releases, and documents.

The data is collected and supplied by agencies of the U.S. Department of Health and Human Services as well as state partners. These include the Centers for Medicare and Medicaid Services, Centers for Disease Control and Prevention, Food and Drug Administration, and the Agency for Healthcare Research and Quality, among others.

Includes data and metadata for OECD countries and selected non-member economies.

Designed by Socrata, this initiative aims to foster data-centered collaboration between governments and the private sector.

The case-level microdata for much of their research is available to the public for secondary analysis after a period of time. You can find more information here on how to use the datasets.

The Uniform Crime Reporting Program generates reliable statistics for use in law enforcement. It also provides information for students of criminal justice, researchers, the media, and the public. Crime statistics have been provided since 1930.

From the U.S. Department of Labor, the Bureau of Labor Statistics is the principal Federal agency responsible for measuring labor market activity, working conditions, and price changes in the economy.

The U.S. Census Bureau has a new content platform where one can access data collected through its decennial collection instrument.

The Program Data site provides selected statistical information on activity in all major Food and Nutrition Service programs.

This site is a single place to find a vast selection of EPA data sources, organized into topics such as agriculture, climate change, energy, sustainability, waste, and more. For each data source, you can see a basic overview, including the geographic scale and other contextual information, then access the data source itself.

This website provides access to several EPA databases to provide you with information covering environmental activities that may affect air, water, and land anywhere in the United States.

The IEA collects, assesses and disseminates energy statistics on supply and demand, compiled into energy balances in addition to a number of other key energy-related indicators, including energy prices, public RD&D and measures of energy efficiency, with other measures in development.

Intuitive tools to analyze NCSES data on R&D and the education and employment of the STEM workforce.

The U.S. Energy Information Administration (EIA) collects, analyzes, and disseminates independent and impartial energy information to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.

The World Environment Situation Room (WESR) is the UNEP online data, information, and knowledge platform. It enables users to visualize, interrogate, access, link, and download data, information, and knowledge products regarding the world environment.

Contains datasets from the National Alcohol Surveys; Developing a New Scale of Treatment Readiness; Epidemiology of Drinking and Disorders in Border vs. Non-Border Contexts; and more.

This program is led by the National Institutes of Health, aiming to build one of the largest biomedical data resources of its kind. The Hub stores health data from a diverse group of participants from the U.S. It will provide a "national research resource to inform thousands of research questions, covering a wide variety of health conditions. A diverse cohort of 1 million or more participants will contribute data from electronic health records (EHRs), biospecimens, surveys, and other measures to build a comprehensive set of biological, environmental, and behavioral data."

WONDER (Wide-ranging Online Data for Epidemiologic Research) is a system of searchable databases with access to a wide array of public health indicators. These include measures of chronic and communicable disease, environmental health, disease and injury prevention, and occupational health.

This is the U.S. Government’s open data. You can find Federal, state and local data, tools, and resources to conduct research, build apps, design data visualizations, and more.

Global health data available for download: the world’s most comprehensive catalog of surveys, censuses, vital statistics, and other health-related data. Search by country, data type, keyword, organization, survey family, series or systems.

From the National Cancer Institute, HINTS measures how people access and use health information; how people use information technology to manage health and health information; and the degree to which people are engaged in healthy behaviors. 

The data is collected by agencies of the U.S. Department of Health and Human Services as well as state partners. This includes the Centers for Medicare and Medicaid Services, Centers for Disease Control and Prevention, Food and Drug Administration, and the Agency for Healthcare Research and Quality, among others.

The HCUP is the largest collection of longitudinal hospital care data in the United States, with all-payer, discharge-level information beginning in 1988.

IPUMS Global Health provides integrated international health survey data at no cost for research and educational purposes from three data series: the Demographic and Health Surveys (DHS), UNICEF Multiple Indicator Cluster Surveys (MICS), and Performance Monitoring for Action (PMA) surveys.

  • MEASURE DHS STATcompiler

    The DHS (Demographic and Health Surveys) Program STATcompiler allows users to make custom tables based on thousands of demographic and health indicators across more than 90 countries. Customize tables to view indicators by background characteristics, over time, and across countries.

  • Medical Expenditure Panel Survey (MEPS)

    The Medical Expenditure Panel Survey, or MEPS as it is commonly called, is the third (and most recent) in a series of national probability surveys conducted by AHRQ on the financing and utilization of medical care in the United States.

Explore data from various collection instruments including the National Survey of Family Growth, National Health and Nutrition Examination Survey, National Hospital Ambulatory Medical Care Survey, and more.

Provides a comprehensive look at the well-being of children and non-elderly adults, and reveals sometimes striking differences among the 13 states studied in depth.

The Health Poll database is the most comprehensive database for health-related U.S. survey questions, covering eighty years of national polling. Searchable questions and results, demographic crosstabs, and trends are available on every topic related to health, from social determinants and influences on health to insurance, costs and health-care utilization.

County, ZIP code, and census tract level data. Variables in the files correspond to five key SDOH domains: social context, economic context, education, physical infrastructure, and healthcare context. The files can be linked to other data by geography.
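Linking by geography usually means joining two tables on a shared geographic identifier. The sketch below shows one way to do this with plain Python dictionaries. The column name `county_fips`, the helper names, and the five-digit zero-padded code format are assumptions for illustration and may not match the actual file layout.

```python
def normalize_fips(code, width=5):
    """Return a geographic code as a zero-padded string, e.g. 1001 -> '01001'.

    Codes read from spreadsheets often lose leading zeros, so both sides of
    the join are normalized before matching.
    """
    return str(code).strip().zfill(width)


def link_by_geography(sdoh_rows, other_rows, key="county_fips"):
    """Left-join other_rows onto sdoh_rows using a shared geographic key.

    Every SDOH row is kept; rows without a match simply gain no new fields.
    On a field-name collision, the SDOH value wins.
    """
    lookup = {normalize_fips(row[key]): row for row in other_rows}
    linked = []
    for row in sdoh_rows:
        fips = normalize_fips(row[key])
        match = lookup.get(fips, {})
        linked.append({**match, **row, key: fips})
    return linked
```

The same pattern applies at the ZIP code or census tract level by changing `key` and `width` to suit the identifier in use.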

SAMHDA is a one-stop shop for SAMHSA public use data with online analysis tools. One can learn about different types of data files, what data are available as free downloadable PUFs, and what online analysis systems are available on the site.

Featured Data and Statistics Source - Census Bureau Monthly Retail Trade

The Advance Monthly and Monthly Retail Trade Surveys (MARTS and MRTS), the Annual Retail Trade Survey (ARTS), and the Quarterly E-Commerce Report work together to produce the most comprehensive data available on retail economic activity in the United States.

Frequently used statistics

Awesome Public Datasets

"This list of a topic-centric public data sources in high quality. They are collected and tidied from blogs, answers, and user responses. Most of the data sets listed are free, however, some are not."