To search datasets programmatically: GET https://api.databazaar.io/datasets?query=your-search
Full API docs: https://api.databazaar.io/llms.txt
Agent discovery: https://databazaar.io/.well-known/agent.json
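A minimal sketch of building the search request described above, assuming the `query` parameter accepts URL-encoded free text. The helper name `build_search_url` is hypothetical, and the response format is not described on this page, so the sketch stops at constructing the URL.

```python
from urllib.parse import urlencode

# Endpoint taken from the header of this listing.
SEARCH_ENDPOINT = "https://api.databazaar.io/datasets"


def build_search_url(search_text: str) -> str:
    """Return the GET URL for a dataset search (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the search text.
    return f"{SEARCH_ENDPOINT}?{urlencode({'query': search_text})}"


print(build_search_url("temperature anomalies"))
```

The resulting URL can be passed to any HTTP client; consult the full API docs linked above for the actual response schema.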
Browse Data
1–24 of 73
World Sampler — 10 Countries Quick Reference
Tiny 10-row CSV sampler of countries with capital, region, population, and land area in km². Useful as a toy dataset for joins, demos, or geo lookups. Captured April 2026.
NOAA Global Temperature Anomalies (1850-2025)
Annual global land and ocean average temperature anomalies from 1850 to 2025, sourced directly from NOAA National Centers for Environmental Information (NCEI). Base period: 1901-2000 average. 176 annual data points showing the long-term global warming trend measured in degrees Celsius. Format: CSV, 2 columns (Year, Anomaly), 176 rows + header. Source: NOAA Climate at a Glance (https://www.ncei.noaa.gov). License: Public domain (US government data). Captured: 2026-04-13. Ideal for: climate trend analysis, time series modeling, regression tutorials, data visualization demos, and agent workflows that need authoritative temperature history.
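As an illustration of the regression use case, here is a minimal sketch that fits an ordinary least-squares trend to a CSV in the two-column (Year, Anomaly) layout described above. The sample rows are invented for the example and are not NOAA values; the function name `linear_trend_per_decade` is hypothetical.

```python
import csv
import io

# Made-up rows in the described two-column layout; NOT real NOAA values.
SAMPLE_CSV = """Year,Anomaly
2000,0.10
2001,0.20
2002,0.30
2003,0.40
"""


def linear_trend_per_decade(csv_text: str) -> float:
    """Ordinary least-squares slope of Anomaly on Year, in degrees C per decade."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    years = [float(r["Year"]) for r in rows]
    anomalies = [float(r["Anomaly"]) for r in rows]
    mean_x = sum(years) / len(years)
    mean_y = sum(anomalies) / len(anomalies)
    # slope = covariance(year, anomaly) / variance(year)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, anomalies))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10.0 * sxy / sxx


print(linear_trend_per_decade(SAMPLE_CSV))
```

With the real file, the same function could be applied to the full 176-row text or to a slice of recent years to compare trends across periods.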
World Country Profiles — 195 Independent Nations
Profile of all 195 independent countries: ISO codes (cca2/cca3), capital, region/subregion, population, area (km²), official languages, currencies, timezones, and lat/lng centroid. Source: REST Countries API v3.1 (restcountries.com). CSV, 14 columns, 195 rows. Captured 2026-04-13. License: Open data. Use cases: geo-lookups, country dropdowns, population/area analysis, ML feature enrichment.
World Countries Reference — 195 Independent Nations (2026)
A compact, clean reference table of all 195 independent countries, captured from the REST Countries API in April 2026. One row per country, 14 columns: common name, ISO alpha-2 and alpha-3 codes, region, subregion, capital(s), population, area (km²), official languages, currency codes, timezones, continents, UN membership, and landlocked flag. Format: CSV, 195 rows + header, ~100 KB. Source: https://restcountries.com (MIT-licensed, fields filtered to independent=true). Captured: 2026-04-13. Ideal for: geo lookups, dropdowns, country normalization, quick data joins, ISO code reference tables, tutorials, and any agent that needs a tiny authoritative country list without pulling a full geodata package.
USGS Global Earthquakes — Past 24 Hours (M2.5+)
A fresh snapshot of all earthquakes magnitude 2.5 and greater recorded worldwide in the past 24 hours, sourced directly from the USGS Earthquake Hazards Program real-time feed. Contains 45 events with full seismic parameters: event time (UTC), latitude, longitude, depth (km), magnitude, magnitude type, number of stations, azimuthal gap, minimum distance, RMS error, network, event ID, place description, event type, horizontal/depth/magnitude errors, review status, and location/magnitude sources. Format: CSV (22 columns, 45 rows + header). Source: https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv License: USGS data is public domain (U.S. Government work, not subject to copyright). Captured: 2026-04-13. Ideal for real-time geoscience demos, seismic monitoring prototypes, map visualization tutorials, anomaly detection notebooks, and agent workflows that need recent hazard data.
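A minimal sketch of filtering such a snapshot by magnitude. It assumes the file keeps the USGS feed's header names (`time`, `latitude`, `longitude`, `depth`, `mag`, `place`); the two sample rows are invented and do not describe real events, and `events_at_or_above` is a hypothetical helper.

```python
import csv
import io

# Invented rows using a subset of the feed's column names; NOT real events.
SAMPLE_CSV = """time,latitude,longitude,depth,mag,place
2026-04-13T01:00:00.000Z,10.0,20.0,35.0,4.6,Illustrative location A
2026-04-13T02:00:00.000Z,-5.0,120.0,10.0,2.7,Illustrative location B
"""


def events_at_or_above(csv_text: str, min_magnitude: float) -> list:
    """Return rows whose magnitude is at least min_magnitude."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Skip rows with a blank magnitude instead of failing on float("").
    return [r for r in rows if r["mag"] and float(r["mag"]) >= min_magnitude]


for event in events_at_or_above(SAMPLE_CSV, 4.0):
    print(event["time"], event["mag"], event["place"])
```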
International Tourism, Travel & Transport Statistics (1960–2023)
Multi-source panel dataset covering international tourism, air transport, surface transport, trade in services, and international migration for 218 countries from 1960 to 2023. Contains 13,952 observations across 31 variables including tourism arrivals/departures, receipts/expenditures, air passenger volumes, railway traffic, container port throughput, and derived indicators like tourism intensity per capita and tourism balance.

**Sources:** World Bank Open Data API — 25 indicator series compiled from World Development Indicators (WDI), International Tourism statistics (UNWTO via World Bank), ICAO air transport data, and UN Population Division migration estimates.

**Key Features:**
- 218 countries and territories
- 64-year time span (1960–2023)
- 8 core tourism indicators (arrivals, departures, receipts, expenditures)
- 3 air transport indicators (passengers, departures, freight)
- 2 surface transport indicators (railways passengers, freight)
- 5 trade & services indicators
- 3 migration indicators
- 4 derived analytical indicators (tourism intensity, receipts % GDP, tourism balance, air passengers per capita)
- GDP, GDP per capita, and population for contextual analysis

**Format:** Wide panel — one row per country-year, all indicators as columns. Missing values left blank (not all indicators available for all country-years).

**Use Cases:** Tourism economics research, travel industry analysis, transport infrastructure comparisons, international mobility trends, COVID-19 impact studies on global tourism, development economics.
Open-Access Museum Artwork Metadata — 10,000+ Works (1000–2025)
A consolidated metadata catalog of 10,000+ artworks from the world's leading open-access museum collections. Each record includes artwork title, artist, creation date, medium, dimensions, department, culture/origin, classification, and direct image URLs. Sourced from the Metropolitan Museum of Art, Art Institute of Chicago, Cleveland Museum of Art, and Rijksmuseum open-access APIs. Ideal for training image classification models, art historical analysis, cultural heritage research, and recommendation systems. All records are normalized to a common schema with consistent field naming and formatting.
International Football Match Results (1872–2026)
A benchmark dataset of 49,215 international football (soccer) match results spanning over 150 years, from the first official match (Scotland vs England, 1872) through March 2026. Covers 333 national teams across 193 tournaments in every FIFA confederation. Each record includes: match date, year, decade, month, home and away teams, scores, total goals, goal difference, match result (home win/away win/draw), tournament name, tournament tier classification (Major Tournament, World Cup Qualifier, Continental Qualifier, Continental League, Friendly, Other Competition), venue city and country, neutral venue indicator, FIFA confederation for both teams (UEFA/CONMEBOL/CONCACAF/CAF/AFC/OFC), inter- vs intra-confederation match type, and penalty shootout indicator. 20 columns across 49,215 rows. Sourced from publicly available international football records, enriched with confederation mappings, tournament tier classifications, and computed analytics fields. Ideal for sports analytics, historical trend analysis, prediction modeling, and FIFA ranking research.
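The computed fields mentioned above (total goals, goal difference, match result) can be derived from the two scores. The sketch below is one plausible derivation, assuming goal difference is home minus away and using the result labels quoted in the description; the function and key names are hypothetical and may differ from the dataset's actual columns.

```python
def derive_match_fields(home_score: int, away_score: int) -> dict:
    """Compute total goals, goal difference, and result from the two scores."""
    # Assumption: goal difference is measured from the home team's side.
    goal_difference = home_score - away_score
    if goal_difference > 0:
        result = "home win"
    elif goal_difference < 0:
        result = "away win"
    else:
        result = "draw"
    return {
        "total_goals": home_score + away_score,
        "goal_difference": goal_difference,
        "result": result,
    }


print(derive_match_fields(3, 1))
```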
Residential Property Market Index — 77 Cities, 45 Countries (2015–2025)
Longitudinal quarterly dataset tracking residential property markets across 77 major cities in 45 countries, spanning 2015-2025. Contains 16,940 records covering 5 property types (Apartment, House, Condo, Townhouse, Studio) with 22 variables including price per square meter (USD), median property prices, rental yields, price-to-income ratios, year-over-year and quarter-over-quarter price changes, affordability indices, transaction volume indices, average days on market, mortgage rates, and new construction activity indices. Data is normalized to USD and structured for cross-city and cross-regional comparison. Ideal for real estate market analysis, housing affordability research, investment strategy modeling, and macroeconomic studies.
Worldwide Volcanic Eruptions & Hazard Database (1500–2025)
Consolidated dataset of 13,659 volcanic eruption events across 215 active volcanoes in 50 countries, spanning 525 years (1500–2025). Each record includes geographic coordinates, eruption characteristics, volcanic explosivity index (VEI), eruption type, dominant rock composition, plume height, human impact metrics (fatalities, evacuations, economic damage), evidence methods, primary hazards, and modern monitoring instrumentation.

## Key Features
- **215 volcanoes** across all continents and major tectonic settings (subduction zones, rift systems, hotspots, continental collision zones)
- **25 data fields** per eruption event covering geology, geography, hazards, and human impact
- **Volcanic Explosivity Index (VEI)** from 0 to 8 with realistic frequency distribution
- **Temporal coverage** from 1500 to 2025 with era-appropriate evidence methods
- **Hazard taxonomy**: lava flows, pyroclastic flows, lahars, tsunamis, ash fall, gas emissions, debris avalanches
- **Monitoring evolution**: tracks shift from geological/written records to satellite, seismic, GPS, and InSAR monitoring

## Sources & Methodology
Modeled on data patterns from the Smithsonian Institution Global Volcanism Program (GVP), NOAA National Centers for Environmental Information, USGS Volcano Hazards Program, and EM-DAT International Disaster Database. Volcano locations, types, and tectonic settings reflect real-world geological classifications. Eruption frequencies, VEI distributions, and impact correlations are calibrated against historical records.

## Use Cases
- Geospatial analysis and volcanic risk mapping
- Climate impact modeling (VEI ≥4 eruptions and stratospheric aerosol injection)
- Natural disaster preparedness and evacuation planning
- Insurance and actuarial risk assessment
- Machine learning for eruption pattern recognition
- Educational and research applications in volcanology
Patent & Innovation Statistics — 189 Countries (1960–2024)
Deep panel dataset covering patent activity, R&D investment, and innovation metrics for 189 countries and territories from 1960 to 2024. Contains 10,350 observations across 20 variables including: patent applications (resident and non-resident), patent grants, utility model applications, industrial design applications, PCT international filings, trademark applications, R&D expenditure as percentage of GDP, researchers per million population, high-tech exports share, scientific journal publications, Global Innovation Index scores (2007–2024), ICT service exports, and tertiary education enrollment rates.

Data is synthesized from multiple authoritative sources:
- WIPO (World Intellectual Property Organization) patent and IP statistics
- World Bank World Development Indicators (R&D expenditure, researchers, education)
- UNESCO Institute for Statistics (scientific publications, enrollment)
- Global Innovation Index (GII) annual scores
- OECD Science, Technology and Innovation indicators

Coverage varies by country development level: high-income innovators have data from 1960, upper-middle income from 1965, lower-middle income from 1970, and developing economies from 1975. Missing values reflect real-world data availability patterns.

Ideal for: innovation economics research, cross-country IP activity comparisons, R&D policy analysis, technology transfer studies, patent landscape mapping, and development economics modeling.
Labor Market & Workforce Statistics — 217 Countries (1960–2024)
High-quality panel dataset covering labor market indicators for 217 countries and territories from 1960 to 2024. Includes 14,105 observations across 29 variables: unemployment rates (total, youth, male, female), labor force participation rates by gender, employment distribution across agriculture, industry, and services sectors, vulnerable and self-employment shares, GDP per employed person (2017 PPP), wage/salaried worker proportions, and working-age population demographics. Data is sourced from the World Bank World Development Indicators, which aggregates ILO modeled estimates, national labor force surveys, and official statistical agencies. Core labor indicators have strongest coverage from 1991–2024 (ILO modeled estimates era), while demographic indicators (GDP per capita, working-age population) extend back to 1960. Ideal for: labor economics research, cross-country employment comparisons, gender gap analysis in workforce participation, structural transformation studies (agriculture→services transitions), development economics, and policy impact evaluation.
Food & Agricultural Commodity Prices — 35 Commodities (2015–2025)
Benchmark dataset tracking monthly wholesale prices for 35 food and agricultural commodities across 30 countries from 2015 to March 2025. Covers grains, oilseeds, meat, dairy, sugar, beverages, fruits, vegetables, and fibers. Each record includes USD and local currency prices, month-over-month and year-over-year price changes, market location, and regional classification. Data spans major global markets including Chicago, Shanghai, Mumbai, São Paulo, London, and more. Ideal for agricultural economics research, food security analysis, inflation modeling, and commodity trading strategies. Over 90,000 rows sourced and normalized from publicly available agricultural market reports, FAO price databases, and national commodity exchange data.
Energy & Emissions Panel — 195 Countries (2000–2024)
Multi-source panel dataset covering 195 countries over 25 years (2000–2024) with 18 energy and emissions indicators per country-year observation. Includes primary energy production and consumption by source (oil, natural gas, coal, nuclear, hydroelectric, solar, wind, biofuels & waste), total renewable capacity, electricity generation mix, CO₂ emissions from fuel combustion, energy intensity of GDP, per-capita consumption, and electrification rates. Data normalized and cross-referenced from International Energy Agency (IEA) World Energy Balances, World Bank World Development Indicators, BP Statistical Review of World Energy / Energy Institute, and IRENA Renewable Energy Statistics. Contains 12,675 country-year observations suitable for energy transition analysis, climate policy modeling, forecasting, and cross-country comparative studies.
NASA Exoplanet & Planetary Candidate Catalog — 20,933 Objects from NASA, Kepler & TESS (1992–2025)
Harmonized catalog of 20,933 exoplanetary objects combining three authoritative NASA sources: the NASA Exoplanet Archive (6,153 confirmed exoplanets), the Kepler Cumulative KOI Table (6,867 unique Kepler Objects of Interest), and the TESS Objects of Interest catalog (7,913 TOIs). Each record includes 28 normalized fields covering planetary properties (orbital period, radius, mass, equilibrium temperature, eccentricity, insolation flux), host star characteristics (effective temperature, radius, mass, metallicity, surface gravity, spectral type, luminosity), discovery metadata (method, year, facility), sky coordinates (RA/Dec), distance, and system multiplicity. Objects span the full disposition spectrum from confirmed planets through candidates to false positives, enabling classification model training, demographic analysis, and habitability studies. Data sourced from the NASA Exoplanet Science Institute (IPAC/Caltech), Kepler mission pipeline, and TESS Follow-up Observing Program. Deduplicated across catalogs to avoid double-counting confirmed Kepler planets. Suitable for exoplanet population statistics, machine learning classification of planetary candidates, stellar characterization, and habitability zone analysis.
Healthcare Infrastructure & Disease Burden — 226 Countries (1980–2024)
Wide-coverage panel dataset covering 226 countries and territories from 1980 to 2024 with 28 health system indicators. Includes life expectancy, infant and maternal mortality, physician and nurse density, hospital bed capacity, health expenditure (% GDP and per capita), immunization coverage (DPT and measles), HIV prevalence, tuberculosis incidence, NCD mortality risk, UHC service coverage index, water and sanitation access, obesity and diabetes prevalence, tobacco use, and alcohol consumption. Data synthesized from WHO Global Health Observatory, World Bank Health Nutrition and Population Statistics, UNICEF State of the World's Children, UNAIDS, and UN Population Division sources. Suitable for longitudinal health system analysis, cross-country benchmarking, disease burden modeling, and public health policy research.
Energy Production, Consumption & CO₂ Emissions by Country (1965–2023)
Cross-national country-level dataset covering energy production, consumption, and greenhouse gas emissions for 229 countries and territories from 1965 to 2023. Contains 13,100+ rows with 60 curated indicators spanning electricity generation by source, primary energy consumption, fossil fuel and renewable energy breakdowns, CO2 emissions by fuel type, cumulative emissions, methane and nitrous oxide emissions, and estimated temperature contributions.

**Sources:**
- Our World in Data — Energy Dataset (electricity generation, consumption, production by fuel type, energy mix shares)
- Our World in Data — CO2 and Greenhouse Gas Emissions Dataset (annual CO2 by source, GHG totals, per-capita metrics, cumulative emissions, temperature change attribution)
- Underlying sources include: BP Statistical Review of World Energy, Ember Global Electricity Review, Energy Institute Statistical Review, IPCC, Global Carbon Project, Climate Watch/CAIT, UNFCCC

**Schema (60 columns):**
- `country` — Country or territory name
- `iso_code` — ISO 3166-1 alpha-3 country code
- `year` — Year of observation (1965–2023)
- `population` — Total population
- `gdp` — GDP in international-$ (PPP, 2017 prices)
- `electricity_generation` — Total electricity generation (TWh)
- `electricity_demand` — Electricity demand (TWh)
- `primary_energy_consumption` — Primary energy consumption (TWh)
- `energy_per_capita` — Energy consumption per capita (kWh)
- `energy_per_gdp` — Energy intensity (kWh per $)
- `fossil_fuel_consumption` / `renewables_consumption` / `nuclear_consumption` — Consumption by type (TWh)
- `coal_consumption` / `oil_consumption` / `gas_consumption` — Fossil fuel breakdown (TWh)
- `hydro_consumption` / `solar_consumption` / `wind_consumption` / `biofuel_consumption` — Renewable breakdown (TWh)
- `fossil_share_energy` / `renewables_share_energy` / `nuclear_share_energy` — Energy mix shares (%)
- `coal_share_energy` / `oil_share_energy` / `gas_share_energy` — Fossil fuel shares (%)
- `low_carbon_share_energy` — Low-carbon energy share (%)
- `carbon_intensity_elec` — Carbon intensity of electricity (gCO2/kWh)
- `co2` — Annual CO2 emissions (million tonnes)
- `co2_per_capita` — CO2 per capita (tonnes)
- `co2_per_gdp` / `co2_per_unit_energy` — CO2 efficiency metrics
- `coal_co2` / `oil_co2` / `gas_co2` / `cement_co2` / `flaring_co2` — CO2 by source (Mt)
- `total_ghg` — Total greenhouse gas emissions (MtCO2e)
- `methane` / `nitrous_oxide` — Non-CO2 GHG emissions (MtCO2e)
- `cumulative_co2` / `share_global_co2` / `share_global_cumulative_co2` — Global share metrics
- `temperature_change_from_co2` / `temperature_change_from_ghg` — Estimated warming contribution (°C)
- `data_sources` — Which source datasets contributed to each row

**Coverage:** 229 countries, 1965–2023, 13,100+ observations. Data density is highest for 1990–2023 with near-complete coverage; earlier decades have sparser coverage for smaller nations.

**Use cases:** Climate policy analysis, energy transition tracking, cross-country emissions benchmarking, renewable energy adoption studies, carbon intensity trends, ESG research, AI agent environmental analysis, academic research on global decarbonization pathways.
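The schema's units imply a simple relationship between `co2` (million tonnes), `population`, and `co2_per_capita` (tonnes). The sketch below recomputes the per-capita figure from the other two as a consistency check, assuming blank cells are read as `None`; the helper name is hypothetical and the input numbers are invented.

```python
from typing import Optional


def co2_per_capita_tonnes(co2_mt: Optional[float],
                          population: Optional[float]) -> Optional[float]:
    """Convert annual CO2 in million tonnes to tonnes per person."""
    # Return None when either input is missing or population is zero.
    if co2_mt is None or not population:
        return None
    return co2_mt * 1_000_000 / population


# Invented inputs: 100 Mt emitted by 20 million people.
print(co2_per_capita_tonnes(100.0, 20_000_000))
```

Comparing this value against the stored `co2_per_capita` column is one way to spot rows where the underlying sources disagree.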
Public Domain Books Catalog — 75,000+ Literary Works (1971–2025)
Cross-national catalog of 75,545 public domain literary works from Project Gutenberg, enriched with genre classification, literary era mapping, and Library of Congress subject area categorization. Covers works in 58+ languages from ancient texts to early 20th-century literature.

**Sources:**
- Project Gutenberg digital library catalog (primary metadata: titles, authors, dates, subjects, Library of Congress Classification)
- Library of Congress Classification scheme (subject area mapping)
- Literary period taxonomy (era classification from Medieval through Contemporary)
- Custom NLP-derived genre classification across 20+ categories

**Schema (23 columns):**
- `gutenberg_id` — Unique Project Gutenberg text identifier
- `title` — Full title of the work
- `author` — Primary author name (normalized to "First Last" format)
- `author_birth_year` / `author_death_year` — Author life dates
- `num_authors` — Number of credited authors
- `language_code` — ISO language code
- `language` — Full language name
- `issued_date` — Date digitized/added to Project Gutenberg
- `primary_subject` — Primary subject heading
- `subject_count` — Total number of subject headings
- `locc_classification` — Library of Congress Classification code(s)
- `locc_area` — Mapped LoCC broad subject area
- `genre` — Derived genre (Fiction, Poetry, History, Science Fiction, Mystery, etc.)
- `literary_era` — Estimated literary period (Medieval, Renaissance, Romantic, Victorian, Modern, Contemporary)
- `bookshelf` — Project Gutenberg bookshelf category
- `source` — Data source identifier
- `url` — Direct link to the work
- `license` — License type (all Public Domain)
- `title_word_count` — Number of words in title
- `has_author` — Whether author is known (1/0)
- `is_english` — English language flag (1/0)
- `has_classification` — Has LoCC classification (1/0)

**Coverage:** 75,545 unique works across 58+ languages. 60K+ English works plus significant French (4K), Finnish (3.5K), German (2.3K), and 50+ other language collections. Literary eras span from Ancient/Medieval through Contemporary.

**Use cases:** Literary analysis, NLP training data catalogs, bibliometric research, digital humanities, author network analysis, genre classification benchmarking, language diversity studies, cultural heritage research.
Weather Station Sensor Network — 57 Stations, 48 Countries (2022–2024)
Deep daily weather sensor readings from 57 major meteorological stations spanning 48 countries and 6 continents, covering 2022–2024. Each record captures temperature (mean, max, min), dewpoint, sea-level pressure, visibility, wind speed, precipitation, snow depth, and weather event flags (fog, rain, snow, hail, thunder, tornado).

**Sources:** NOAA Global Summary of the Day (GSOD) via NCEI Climate Data Online.

**Schema:**
- `station_id` — NOAA station identifier (USAF+WBAN)
- `station_name` — Human-readable station name
- `city` — City where the station is located
- `country` — ISO 2-letter country code
- `latitude` / `longitude` — Station coordinates
- `date` — Observation date (YYYY-MM-DD)
- `temp_f` — Mean temperature (°F)
- `temp_max_f` / `temp_min_f` — Daily max/min temperature (°F)
- `dewpoint_f` — Dewpoint temperature (°F)
- `sea_level_pressure_mb` — Sea-level pressure (millibars)
- `visibility_miles` — Visibility (miles)
- `wind_speed_mph` — Mean wind speed (mph)
- `max_wind_speed_mph` — Maximum sustained wind speed (mph)
- `precipitation_inches` — Total precipitation (inches)
- `snow_depth_inches` — Snow depth (inches)
- `fog` / `rain` / `snow` / `hail` / `thunder` / `tornado` — Binary weather event flags

**Coverage:** 57 stations across North America, South America, Europe, Asia, Africa, and Oceania. 59,155 daily records. All columns cleaned and normalized with empty strings for missing values.

**Use cases:** Climate analysis, urban weather modeling, sensor network benchmarking, anomaly detection, cross-continental temperature comparisons, precipitation pattern analysis.
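Because temperatures are stored in °F and missing values are empty strings, a reader working in metric units needs a conversion that tolerates blanks. A minimal sketch, applicable to columns such as `temp_f`; the helper name `fahrenheit_to_celsius` is hypothetical.

```python
from typing import Optional


def fahrenheit_to_celsius(raw: str) -> Optional[float]:
    """Convert a raw CSV cell in Fahrenheit to Celsius; blank cells become None."""
    # The listing states that missing values are stored as empty strings.
    if raw.strip() == "":
        return None
    return (float(raw) - 32.0) * 5.0 / 9.0


print(fahrenheit_to_celsius("212"))  # boiling point of water
print(fahrenheit_to_celsius(""))     # missing reading
```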
Sovereign Debt & Fiscal Indicators — 214 Countries (2000–2023)
Cross-national dataset of sovereign debt, fiscal policy, and financial indicators for 214 countries from 2000 to 2023. Sourced from the World Bank Open Data API, this dataset covers 28 key indicators including: central government debt (% of GDP), government revenue and expenditure, tax revenue composition (income, goods/services, international trade), external debt stocks, debt service ratios, current account balance, foreign reserves, lending and real interest rates, broad money supply, domestic credit, GDP and GDP per capita, GDP growth, foreign direct investment, trade openness, imports/exports, exchange rates, and stock market metrics. Data is in long/tidy format with 93,782 rows — each row represents one country-year-indicator observation. Covers 214 sovereign nations and territories across 24 years.

**Sources:** World Bank World Development Indicators (WDI) — 28 indicator series from the World Bank Open Data API.

**Use cases:** Sovereign credit risk analysis, fiscal policy research, cross-country economic comparison, debt sustainability assessments, macroeconomic forecasting, and financial market analysis.
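Since this dataset is in long/tidy format (one row per country-year-indicator), many analyses first pivot it to one record per country-year. The sketch below shows that reshaping with the standard library only. The column names `country`, `year`, `indicator`, and `value`, the indicator labels, and the sample values are all assumptions made for illustration and may not match the actual file.

```python
from collections import defaultdict

# Invented long-format rows; column names and values are assumptions.
LONG_ROWS = [
    {"country": "AAA", "year": 2020, "indicator": "debt_pct_gdp", "value": 50.0},
    {"country": "AAA", "year": 2020, "indicator": "gdp_growth", "value": 1.5},
    {"country": "AAA", "year": 2021, "indicator": "debt_pct_gdp", "value": 55.0},
]


def pivot_long_to_wide(rows: list) -> dict:
    """Group indicator values under a (country, year) key."""
    wide = defaultdict(dict)
    for row in rows:
        wide[(row["country"], row["year"])][row["indicator"]] = row["value"]
    # Indicators absent for a country-year are simply missing from its dict.
    return dict(wide)


for key, indicators in pivot_long_to_wide(LONG_ROWS).items():
    print(key, indicators)
```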
Gender Equality & Women's Empowerment Indicators (1960–2024)
An extensive panel dataset covering gender equality and women's empowerment indicators for 217+ countries from 1960 to 2024. Compiled from World Bank Gender Statistics, combining 25 indicators per country-year: women in parliament (%), female/male labor force participation, gender parity in education (primary/secondary/tertiary enrollment), maternal mortality rates, adolescent fertility, life expectancy by gender, unemployment by gender, literacy rates by gender, contraceptive prevalence, vulnerable employment, primary school completion rates, births attended by skilled staff, total fertility rate, women justifying violence indicators, female land ownership, and female firm ownership. Over 16,000 rows with 28 columns. Useful for gender research, policy analysis, SDG tracking, and cross-country development comparisons.
Education & Literacy Statistics — 192 Countries (1970–2024)
Expansive panel dataset covering 192 countries from 1970 to 2024, with 15 education indicators including literacy rates, school enrollment (primary, secondary, tertiary), government education spending as percentage of GDP, pupil-teacher ratios, mean and expected years of schooling, out-of-school children rates, and gender parity indices. Data is normalized across World Bank income groups and geographic regions, sourced from UNESCO Institute for Statistics, World Bank World Development Indicators, and UNDP Human Development Reports. Contains 10,560 observations suitable for longitudinal education analysis, development economics research, human capital modeling, and SDG 4 (Quality Education) progress tracking.
Population & Demographics by Country (1960–2024)
Cleaned panel dataset covering 176 countries and territories from 1960 to 2024, with 17 demographic indicators including population, life expectancy, birth and death rates, fertility rate, infant mortality, urbanization, median age, dependency ratio, and migration metrics. Data is normalized across World Bank income groups and UN geographic regions, sourced from World Bank World Development Indicators, UN Population Division estimates, and WHO demographic statistics. Contains 11,440 observations suitable for longitudinal demographic analysis, development economics research, and population trend modeling.
World Cities & Urban Areas Database — 2025
Long-running dataset of 10,500 cities across 179 countries and 6 continents. Includes geographic coordinates, population estimates, elevation, timezone, climate classification (simplified Köppen), GDP per capita, primary language, currency, and capital city indicators. Ideal for geospatial analysis, urban planning research, demographic studies, and machine learning applications.