
Auto generate cspell dictionary of github members #3162

@thompson-tomo


Having just attempted to migrate the spec repo from misspell to cspell, one thing that emerged was that I was often adding people's names to the dictionary.

It would be great if there were a single dictionary covering all members of the GitHub organisation, including each member's username, first name & last name.

Doing some searching, I found the following script:

/**
 * Generate a CSpell dictionary from GitHub members
 * 
 * Usage:
 *   1. Install dependencies: npm install node-fetch@2
 *   2. Run: node generate-cspell-dict.js <org_or_owner> [repo]
 *   3. Output: github-members.txt (CSpell dictionary file)
 * 
 * Notes:
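 *   - If only <org_or_owner> is provided, fetches organization members
 *     (only public members unless GITHUB_TOKEN belongs to an org member).
 *   - If <repo> is provided, fetches repository collaborators
 *     (this endpoint requires an authenticated token).
 *   - Set GITHUB_TOKEN in environment for higher rate limits.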
 */

const fs = require('fs');
const fetch = require('node-fetch');

const GITHUB_API = 'https://api.github.com';
const token = process.env.GITHUB_TOKEN || null;

// Validate CLI arguments
if (process.argv.length < 3) {
    console.error('Usage: node generate-cspell-dict.js <org_or_owner> [repo]');
    process.exit(1);
}

const owner = process.argv[2];
const repo = process.argv[3] || null;

// GitHub API request helper
async function githubRequest(url) {
    const headers = { 'User-Agent': 'cspell-dict-generator' };
    if (token) headers['Authorization'] = `token ${token}`;

    const res = await fetch(url, { headers });
    if (!res.ok) {
        throw new Error(`GitHub API error ${res.status}: ${res.statusText}`);
    }
    return res.json();
}

// Fetch members or collaborators
async function fetchMembers() {
    let url;
    if (repo) {
        url = `${GITHUB_API}/repos/${owner}/${repo}/collaborators?per_page=100`;
    } else {
        url = `${GITHUB_API}/orgs/${owner}/members?per_page=100`;
    }

    let members = [];
    let page = 1;
    while (true) {
        const data = await githubRequest(`${url}&page=${page}`);
        if (data.length === 0) break;
        members.push(...data.map(m => m.login));
        page++;
    }
    return [...new Set(members)]; // Remove duplicates
}

// Save dictionary file
function saveDictionary(words) {
    const filename = 'github-members.txt';
    fs.writeFileSync(filename, words.join('\n') + '\n', 'utf8');
    console.log(`✅ Dictionary saved to ${filename} with ${words.length} entries.`);
}

(async () => {
    try {
        const members = await fetchMembers();
        saveDictionary(members);
    } catch (err) {
        console.error('❌ Error:', err.message);
        process.exit(1);
    }
})();
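
Note that the script as found only collects each member's `login`, so first and last names would still be missing from the dictionary. A minimal sketch of how that gap might be closed is below. It assumes each profile has been fetched separately (for example via `githubRequest` against `/users/{login}`, whose response carries an optional free-text `name` field) and only shows the word extraction. `profilesToWords` is a hypothetical helper, not part of the script above.

```javascript
// Hypothetical helper: turn GitHub profile objects into dictionary words.
// Assumes each profile looks like { login: string, name: string | null }.
function profilesToWords(profiles) {
    const words = new Set();
    for (const { login, name } of profiles) {
        if (login) words.add(login);
        if (!name) continue; // `name` is optional on GitHub profiles
        // Split the display name on anything that is not a letter or
        // combining mark, so "Mona Lisa Octocat" yields three words.
        for (const part of name.split(/[^\p{L}\p{M}]+/u)) {
            // Skip empty fragments and single-letter initials.
            if (part.length > 1) words.add(part);
        }
    }
    // Plain sort (code-unit order) keeps the output stable between runs.
    return [...words].sort();
}

// Example with made-up profiles:
const sample = [
    { login: 'octocat', name: 'Mona Lisa Octocat' },
    { login: 'jdoe', name: null },
];
console.log(profilesToWords(sample).join('\n'));
```

One caveat with this approach: fetching a profile per member costs one API request each, so a token is likely needed for anything but small organisations.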

Ideally this would be executed periodically, and if changes occur a bot would create a PR.

Note that the generated dictionary would still need to be manually synced out to the individual repos.
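
For the "if changes occur" part, a scheduled job would need some way to tell whether the freshly generated list differs from the committed dictionary. A rough sketch of such a check follows, assuming the dictionary is a plain one-word-per-line file with no comment lines. `diffDictionary` is a hypothetical name, and how the PR itself gets opened is left to whichever bot or workflow is used.

```javascript
// Hypothetical helper: compare the committed dictionary text with the
// newly generated word list and report what was added or removed.
function diffDictionary(existingText, newWords) {
    // Parse the existing file: one word per line, blank lines ignored.
    const oldSet = new Set(
        existingText.split(/\r?\n/).map(w => w.trim()).filter(Boolean)
    );
    const newSet = new Set(newWords);
    const added = [...newSet].filter(w => !oldSet.has(w)).sort();
    const removed = [...oldSet].filter(w => !newSet.has(w)).sort();
    return { added, removed, changed: added.length > 0 || removed.length > 0 };
}

// Example with made-up data:
const result = diffDictionary('alice\nbob\n', ['bob', 'carol']);
console.log(result.changed, result.added, result.removed);
```

A periodic run could then rewrite `github-members.txt` and raise a PR only when `changed` is true, which avoids no-op PRs.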
