Public Domain Books Catalog — 75,000+ Literary Works (1971–2025)
About this data
Multilingual catalog of 75,545 public domain literary works from Project Gutenberg, enriched with genre classification, literary era mapping, and Library of Congress subject area categorization. Covers works in 58+ languages, from ancient texts to early 20th-century literature.

**Sources:**
- Project Gutenberg digital library catalog (primary metadata: titles, authors, dates, subjects, Library of Congress Classification)
- Library of Congress Classification scheme (subject area mapping)
- Literary period taxonomy (era classification from Medieval through Contemporary)
- Custom NLP-derived genre classification across 20+ categories

**Schema (23 columns):**
- `gutenberg_id`: Unique Project Gutenberg text identifier
- `title`: Full title of the work
- `author`: Primary author name, normalized to "First Last" format
- `author_birth_year` / `author_death_year`: Author life dates
- `num_authors`: Number of credited authors
- `language_code`: ISO language code
- `language`: Full language name
- `issued_date`: Date the work was digitized and added to Project Gutenberg
- `primary_subject`: Primary subject heading
- `subject_count`: Total number of subject headings
- `locc_classification`: Library of Congress Classification (LoCC) code(s)
- `locc_area`: Broad LoCC subject area mapped from the code(s)
- `genre`: Derived genre (Fiction, Poetry, History, Science Fiction, Mystery, etc.)
- `literary_era`: Estimated literary period (Medieval, Renaissance, Romantic, Victorian, Modern, Contemporary)
- `bookshelf`: Project Gutenberg bookshelf category
- `source`: Data source identifier
- `url`: Direct link to the work
- `license`: License type (all Public Domain)
- `title_word_count`: Number of words in the title
- `has_author`: Whether the author is known (1/0)
- `is_english`: English-language flag (1/0)
- `has_classification`: Whether the work has an LoCC classification (1/0)

**Coverage:** 75,545 unique works across 58+ languages: 60K+ English works, sizable French (4K), Finnish (3.5K), and German (2.3K) collections, and 50+ other languages. Literary eras span Ancient/Medieval through Contemporary.

**Use cases:** Literary analysis, NLP training data catalogs, bibliometric research, digital humanities, author network analysis, genre classification benchmarking, language diversity studies, cultural heritage research.
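The coverage figures are per-language tallies. A minimal sketch of reproducing such tallies, assuming the catalog has already been parsed into an array of plain objects keyed by the schema's column names (the loading step is not shown, and the helper names `tallyBy` and `englishShare` are hypothetical, not part of any Databazaar API):

```javascript
// Hypothetical helper: count how many records share each value of `column`.
// Assumes `records` is an array of objects keyed by the schema's column names.
function tallyBy(records, column) {
  const counts = new Map();
  for (const record of records) {
    // Treat missing or empty values as "(unknown)" so they are still counted.
    const value = record[column] ? record[column] : "(unknown)";
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  // Sort by descending count, then alphabetically to break ties.
  return [...counts.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

// Share of English works, using the 1/0 `is_english` flag (stored as a string).
function englishShare(records) {
  if (records.length === 0) return 0;
  const english = records.filter((r) => r.is_english === "1").length;
  return english / records.length;
}

// Tiny made-up sample for illustration only (not real catalog rows).
const sample = [
  { language: "English", is_english: "1" },
  { language: "French", is_english: "0" },
  { language: "English", is_english: "1" },
  { language: "Finnish", is_english: "0" },
];

console.log(JSON.stringify(tallyBy(sample, "language")));
// → [["English",2],["Finnish",1],["French",1]]
console.log(englishShare(sample)); // → 0.5
```

Run over the full catalog, the first entries of `tallyBy(records, "language")` should line up with the English, French, Finnish, and German counts quoted above.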
Schema
| Name | Type | Description |
|---|---|---|
| gutenberg_id | string | |
| title | string | |
| author | string | |
| author_birth_year | string | |
| author_death_year | string | |
| num_authors | string | |
| language_code | string | |
| language | string | |
| issued_date | string | |
| primary_subject | string | |
| subject_count | string | |
| locc_classification | string | |
| locc_area | string | |
| genre | string | |
| literary_era | string | |
| bookshelf | string | |
| source | string | |
| url | string | |
| license | string | |
| title_word_count | string | |
| has_author | string | |
| is_english | string | |
| has_classification | string | |
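Every column in the table is typed `string`, including the years, counts, and 1/0 flags. A minimal sketch of converting one record into typed values before analysis. The function name `normalizeRecord`, the use of `null` for empty or non-integer values, and the assumption that flags arrive as the strings `"1"`/`"0"` are illustrative choices, not part of the published schema:

```javascript
// Columns that hold whole numbers (years and counts), per the schema descriptions.
// `gutenberg_id` is deliberately left as a string identifier.
const INTEGER_COLUMNS = [
  "author_birth_year",
  "author_death_year",
  "num_authors",
  "subject_count",
  "title_word_count",
];

// Columns documented as 1/0 flags.
const FLAG_COLUMNS = ["has_author", "is_english", "has_classification"];

// Parse an integer-valued string (accepts forms like "1812" or "1812.0");
// return null for missing, empty, or non-integer input.
function toInt(value) {
  if (value === undefined || value === null) return null;
  const trimmed = String(value).trim();
  if (trimmed === "") return null;
  const n = Number(trimmed);
  return Number.isInteger(n) ? n : null;
}

// Hypothetical normalizer: returns a copy of `raw` with typed numeric and flag columns.
function normalizeRecord(raw) {
  const record = { ...raw };
  for (const column of INTEGER_COLUMNS) {
    record[column] = toInt(raw[column]);
  }
  for (const column of FLAG_COLUMNS) {
    // Anything other than "1" (including a missing value) becomes false.
    record[column] = String(raw[column] ?? "").trim() === "1";
  }
  return record;
}

// Illustrative input only; not a real catalog row.
const typed = normalizeRecord({
  gutenberg_id: "12345",
  title: "An Example Title",
  author_birth_year: "1812.0",
  author_death_year: "",
  num_authors: "1",
  has_author: "1",
  is_english: "1",
  has_classification: "0",
});
console.log(typed.author_birth_year, typed.author_death_year, typed.is_english, typed.has_classification);
// → 1812 null true false
```

Returning `null` rather than `0` for blank years keeps "unknown" distinct from a real value, which matters for ancient authors whose life dates may be missing or negative.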
For AI Agents
1. Add the server to your agent's MCP config (`claude_desktop_config.json` or similar):

```json
{
  "mcpServers": {
    "databazaar": { "command": "npx", "args": ["databazaar-mcp"] }
  }
}
```

2. Your agent can then call:

```javascript
search_datasets({ query: "Public Domain Books Catalog — " })
// Found: 9e20a575-5493-47d9-b71b-ad0dc12be01a
get_download_url({ dataset_id: "9e20a575-5493-47d9-b71b-ad0dc12be01a" })
```

This is a free dataset, so no API key is needed. Without an agent, request the download URL directly:

```shell
curl https://api.databazaar.io/datasets/9e20a575-5493-47d9-b71b-ad0dc12be01a/download-url
```
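For scripts that skip the MCP server, the `curl` request can be issued from JavaScript. A minimal sketch, assuming Node.js 18+ for the built-in `fetch`. The response format of the `download-url` endpoint is not documented on this page, so the sketch returns the raw response body instead of assuming field names; the helper names are hypothetical:

```javascript
// Base URL and dataset ID taken from the curl example.
const API_BASE = "https://api.databazaar.io";
const DATASET_ID = "9e20a575-5493-47d9-b71b-ad0dc12be01a";

// Hypothetical helper: build the download-url endpoint for a dataset ID.
function downloadUrlEndpoint(datasetId) {
  return `${API_BASE}/datasets/${encodeURIComponent(datasetId)}/download-url`;
}

// Hypothetical helper: request the endpoint (no API key, since the dataset is free)
// and return the raw body text, because the response schema is not specified here.
async function requestDownloadUrl(datasetId) {
  const response = await fetch(downloadUrlEndpoint(datasetId));
  if (!response.ok) {
    throw new Error(`Request failed: HTTP ${response.status}`);
  }
  return response.text();
}

// Usage (performs a network request):
// requestDownloadUrl(DATASET_ID).then(console.log).catch(console.error);
```

Inspect the returned body once before writing code that parses it, since its structure is an open assumption in this sketch.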