<SYSTEM>Complete libragen documentation with all content</SYSTEM>

# API Reference

> TypeScript API for @libragen/core

The `@libragen/core` package provides programmatic access to libragen functionality.

## Installation

```bash
npm install --save-exact @libragen/core
```

## Quick Example

### Building a Library

```typescript
import { Builder } from '@libragen/core';


const builder = new Builder();


// Build from local directory
const result = await builder.build('./docs', {
  name: 'my-docs',
  version: '1.0.0',
  description: 'My documentation library',
});


console.log(`Built: ${result.outputPath}`);
console.log(`Chunks: ${result.stats.chunkCount}`);
```

### Searching a Library

```typescript
import { Embedder, VectorStore, Searcher } from '@libragen/core';


const embedder = new Embedder();
await embedder.initialize();


const store = new VectorStore('./my-library.libragen');
store.initialize();


const searcher = new Searcher(embedder, store);
const results = await searcher.search({ query: 'How do I authenticate?', k: 5 });


for (const result of results) {
  console.log(`[${result.score.toFixed(2)}] ${result.source}`);
  console.log(result.content);
}
```

## Classes

### `Builder`

High-level API for building `.libragen` libraries from source files or git repositories.

```typescript
import { Builder } from '@libragen/core';


const builder = new Builder();


// Build from local source
const result = await builder.build('./src', {
  name: 'my-library',
  version: '1.0.0',
  description: 'My library',
  chunkSize: 1000,
  chunkOverlap: 100,
});


// Build from git repository
const gitResult = await builder.build('https://github.com/user/repo', {
  gitRef: 'main',
  include: ['docs/**/*.md'],
});


// With progress callback
await builder.build('./docs', { name: 'my-docs' }, (progress) => {
  console.log(`${progress.phase}: ${progress.message}`);
});
```

#### Build Options

| Option          | Type                            | Default   | Description                               |
| --------------- | ------------------------------- | --------- | ----------------------------------------- |
| `output`        | string                          | —         | Output path for .libragen file            |
| `name`          | string                          | —         | Library name                              |
| `version`       | string                          | `'0.1.0'` | Library version                           |
| `description`   | string                          | —         | Short description                         |
| `chunkSize`     | number                          | `1000`    | Target chunk size in characters           |
| `chunkOverlap`  | number                          | `100`     | Overlap between chunks                    |
| `include`       | string\[]                       | —         | Glob patterns to include                  |
| `exclude`       | string\[]                       | —         | Glob patterns to exclude                  |
| `gitRef`        | string                          | —         | Git branch/tag/commit                     |
| `license`       | string\[]                       | —         | SPDX license identifiers                  |
| `noAstChunking` | boolean                         | `false`   | Disable AST-aware chunking for code files |
| `contextMode`   | `'none' \| 'minimal' \| 'full'` | `'full'`  | Context mode for AST chunking             |

#### Build Result

```typescript
interface BuildResult {
  outputPath: string;      // Absolute path to .libragen file
  metadata: LibraryMetadata;
  stats: {
    chunkCount: number;
    sourceCount: number;
    fileSize: number;
    embedDuration: number;
    chunksPerSecond: number;
  };
  git?: {
    commitHash: string;
    ref: string;
    detectedLicense?: { identifier: string; confidence: string };
  };
}
```

***

### `Embedder`

Generates vector embeddings from text using a local transformer model. Implements the `IEmbedder` interface.

```typescript
import { Embedder } from '@libragen/core';


const embedder = new Embedder({
  model: 'Xenova/bge-small-en-v1.5', // default
  quantization: 'q8', // quantized for speed
});


await embedder.initialize();


// Generate embedding for a single text
const embedding = await embedder.embed('Hello world');
// Returns: Float32Array(384)


// Generate embeddings for multiple texts (batched)
const embeddings = await embedder.embedBatch([
  'First document',
  'Second document',
]);
// Returns: Float32Array(384)[]


// Clean up when done
await embedder.dispose();
```

#### Constructor Options

| Option         | Type                               | Default                    | Description          |
| -------------- | ---------------------------------- | -------------------------- | -------------------- |
| `model`        | string                             | `Xenova/bge-small-en-v1.5` | HuggingFace model ID |
| `quantization` | `'fp32' \| 'fp16' \| 'q8' \| 'q4'` | `'q8'`                     | Model precision      |

***

### `IEmbedder` Interface

Interface for custom embedding implementations. Use this to integrate external embedding services like OpenAI, Cohere, or other providers.

```typescript
import type { IEmbedder } from '@libragen/core';


class OpenAIEmbedder implements IEmbedder {
  dimensions = 1536; // text-embedding-3-small


  async initialize() {
    // Setup OpenAI client
  }


  async embed(text: string): Promise<Float32Array> {
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: text,
    });
    return new Float32Array(response.data[0].embedding);
  }


  async embedBatch(texts: string[]): Promise<Float32Array[]> {
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: texts,
    });
    return response.data.map(d => new Float32Array(d.embedding));
  }


  async dispose() {
    // Cleanup if needed
  }
}


// Use with Builder
const builder = new Builder({ embedder: new OpenAIEmbedder() });


// Use with Searcher
const searcher = new Searcher(new OpenAIEmbedder(), store);
```

#### Interface Methods

| Method              | Description                                        |
| ------------------- | -------------------------------------------------- |
| `dimensions`        | The dimensionality of embedding vectors (readonly) |
| `initialize()`      | Initialize the embedder (called before embedding)  |
| `embed(text)`       | Embed a single text string                         |
| `embedBatch(texts)` | Embed multiple texts                               |
| `dispose()`         | Clean up resources                                 |

***

### `VectorStore`

SQLite-based storage for vectors, metadata, and full-text search.

```typescript
import { VectorStore } from '@libragen/core';


// Open existing library
const store = new VectorStore('./my-library.libragen');


// Create new library
const store = new VectorStore('./new-library.libragen', {
  create: true,
  metadata: {
    name: 'my-library',
    description: 'My documentation',
    contentVersion: '1.0.0',
  },
});


// Add chunks
await store.addChunks([
  {
    content: 'Document content here...',
    source: 'docs/getting-started.md',
    embedding: await embedder.embed('Document content here...'),
  },
]);


// Get metadata
const meta = store.getMetadata();
console.log(meta.name, meta.chunkCount);


// Close when done
store.close();
```

#### Constructor Options

| Option     | Type    | Default | Description                               |
| ---------- | ------- | ------- | ----------------------------------------- |
| `create`   | boolean | `false` | Create new database if doesn’t exist      |
| `metadata` | object  | —       | Library metadata (required when creating) |

#### Methods

| Method                       | Description                         |
| ---------------------------- | ----------------------------------- |
| `addChunks(chunks)`          | Add document chunks with embeddings |
| `getMetadata()`              | Get library metadata                |
| `vectorSearch(embedding, k)` | Search by vector similarity         |
| `ftsSearch(query, k)`        | Full-text search                    |
| `close()`                    | Close database connection           |

***

### `Searcher`

Hybrid search combining vector similarity and full-text search.

```typescript
import { Searcher } from '@libragen/core';


const searcher = new Searcher(embedder, store);


const results = await searcher.search({
  query: 'authentication methods',
  k: 10,
  contentVersion: '2.0.0', // optional filter
});


for (const result of results) {
  console.log({
    score: result.score,
    source: result.source,
    content: result.content,
  });
}
```

#### Search Options

| Option           | Type    | Default | Description                                       |
| ---------------- | ------- | ------- | ------------------------------------------------- |
| `query`          | string  | —       | Search query text (required)                      |
| `k`              | number  | `10`    | Number of results                                 |
| `hybridAlpha`    | number  | `0.5`   | Balance between vector (1) and keyword (0) search |
| `rerank`         | boolean | `false` | Apply reranking for better results                |
| `contentVersion` | string  | —       | Filter by version                                 |

***

### `Chunker`

Split documents into chunks for indexing.

```typescript
import { Chunker } from '@libragen/core';


const chunker = new Chunker({
  chunkSize: 512,
  chunkOverlap: 50,
});


const chunks = chunker.chunk('Long document content...', {
  source: 'docs/guide.md',
});


// Returns: { content: string, source: string }[]
```

#### Constructor Options

| Option         | Type   | Default | Description                 |
| -------------- | ------ | ------- | --------------------------- |
| `chunkSize`    | number | `512`   | Target chunk size in tokens |
| `chunkOverlap` | number | `50`    | Overlap between chunks      |

***

### `CodeChunker`

AST-aware code chunking for supported programming languages. Produces higher-quality embeddings by respecting language boundaries and including semantic context.

```typescript
import { CodeChunker } from '@libragen/core';


const chunker = new CodeChunker({
  maxChunkSize: 1500,
  contextMode: 'full',
  overlapLines: 0,
});


// Check if a file is supported
CodeChunker.isSupported('app.ts');  // true
CodeChunker.isSupported('README.md');  // false


// Detect language from file path
CodeChunker.detectLanguage('app.ts');  // 'typescript'


// Get all supported extensions
CodeChunker.getSupportedExtensions();  // ['.ts', '.tsx', '.js', ...]


// Chunk code with semantic context
const chunks = await chunker.chunkText(code, 'app.ts');


// Each chunk includes:
// - content: raw code
// - embeddingContent: contextualized text for embeddings
// - metadata.codeContext: scope chain, entities, imports, siblings
```

#### Supported Languages

| Language   | Extensions                    |
| ---------- | ----------------------------- |
| TypeScript | `.ts`, `.tsx`, `.mts`, `.cts` |
| JavaScript | `.js`, `.jsx`, `.mjs`, `.cjs` |
| Python     | `.py`, `.pyi`                 |
| Rust       | `.rs`                         |
| Go         | `.go`                         |
| Java       | `.java`                       |

#### Constructor Options

| Option         | Type                            | Default  | Description                      |
| -------------- | ------------------------------- | -------- | -------------------------------- |
| `maxChunkSize` | number                          | `1500`   | Maximum chunk size in characters |
| `contextMode`  | `'none' \| 'minimal' \| 'full'` | `'full'` | How much context to include      |
| `overlapLines` | number                          | `0`      | Lines to overlap between chunks  |

#### Context Modes

* **`full`** — Include scope chain, imports, sibling entities, and signatures
* **`minimal`** — Include only scope chain and entity signatures
* **`none`** — Raw code only, no additional context

#### Methods

| Method                            | Description                                    |
| --------------------------------- | ---------------------------------------------- |
| `chunkText(content, filePath)`    | Chunk code string with AST awareness           |
| `chunkFile(filePath)`             | Chunk a file from the filesystem               |
| `chunkSourceFiles(files)`         | Chunk multiple source files                    |
| `tryChunkText(content, filePath)` | Like `chunkText` but returns `null` on failure |

#### Static Methods

| Method                     | Description                              |
| -------------------------- | ---------------------------------------- |
| `isSupported(filePath)`    | Check if file type supports AST chunking |
| `detectLanguage(filePath)` | Get language from file extension         |
| `getSupportedExtensions()` | Get all supported file extensions        |

***

## Configuration Helpers

```typescript
import {
  getLibragenHome,
  getDefaultLibraryDir,
  getModelCacheDir,
} from '@libragen/core';


// Get base config directory
const home = getLibragenHome();
// macOS: ~/Library/Application Support/libragen


// Get default library storage location
const libDir = getDefaultLibraryDir();
// macOS: ~/Library/Application Support/libragen/libraries


// Get model cache directory
const modelDir = getModelCacheDir();
// macOS: ~/Library/Application Support/libragen/models
```

Override with environment variables:

* `LIBRAGEN_HOME` - Base directory
* `LIBRAGEN_MODEL_CACHE` - Model cache location

> **Tip:** Run `libragen config` to see current paths and active environment variables.

***

## Types

### `SearchResult`

```typescript
interface SearchResult {
  /** Relevance score (higher = more relevant) */
  score: number;


  /** Source file path */
  source: string;


  /** Chunk content */
  content: string;


  /** Content version if set */
  contentVersion?: string;
}
```

### `LibraryMetadata`

```typescript
interface LibraryMetadata {
  name: string;
  description?: string;
  contentVersion?: string;
  chunkCount: number;
  createdAt: string;
}
```

### `Chunk`

```typescript
interface Chunk {
  content: string;
  source: string;
  embedding: Float32Array;
  metadata?: Record<string, unknown>;
}
```

***

## Library Management

### `LibraryManager`

Manages installed libraries across multiple locations (project-local and global).

```typescript
import { LibraryManager } from '@libragen/core';


// Default: auto-detect .libragen/libraries in cwd + global directory
const manager = new LibraryManager();


// Or use explicit paths only (no global, no auto-detection)
const customManager = new LibraryManager({
  paths: ['.libragen/libraries', '/shared/libs'],
});


// List installed libraries
const libraries = await manager.listInstalled();
for (const lib of libraries) {
  console.log(`${lib.name} v${lib.version} [${lib.location}]`);
}


// Find a specific library (searches paths in order)
const lib = await manager.find('my-lib');


// Install a library (defaults to global directory)
await manager.install('./my-lib.libragen', { force: true });


// Uninstall
await manager.uninstall('my-lib');
```

#### Constructor Options

| Option          | Type      | Default         | Description                                                |
| --------------- | --------- | --------------- | ---------------------------------------------------------- |
| `paths`         | string\[] | —               | Explicit paths to use (excludes global and auto-detection) |
| `autoDetect`    | boolean   | `true`          | Auto-detect `.libragen/libraries` in cwd                   |
| `includeGlobal` | boolean   | `true`          | Include global directory                                   |
| `cwd`           | string    | `process.cwd()` | Current working directory for auto-detection               |

#### Methods

| Method                      | Description                                   |
| --------------------------- | --------------------------------------------- |
| `listInstalled()`           | List all installed libraries across all paths |
| `find(name)`                | Find a library by name (first path wins)      |
| `install(source, options?)` | Install a library from file or URL            |
| `uninstall(name)`           | Remove an installed library                   |
| `getPrimaryDirectory()`     | Get the primary install location              |

***

## Update Checking

Utilities for checking and applying library updates from collections.

```typescript
import {
  LibraryManager,
  CollectionClient,
  checkForUpdate,
  findUpdates,
  performUpdate,
} from '@libragen/core';


const manager = new LibraryManager();
const client = new CollectionClient();
await client.loadConfig();


// Get installed libraries
const installed = await manager.listInstalled();


// Find all available updates
const updates = await findUpdates(installed, client, { force: false });


for (const update of updates) {
  console.log(`${update.name}: ${update.currentVersion} → ${update.newVersion}`);


  // Apply the update
  await performUpdate(update, manager);
}
```

### `UpdateCandidate`

```typescript
interface UpdateCandidate {
  name: string;
  currentVersion: string;
  currentContentVersion?: string;
  newVersion: string;
  newContentVersion?: string;
  source: string;  // Download URL
  location?: 'global' | 'project';
}
```

***

## Sources

### `FileSource`

Read files from the local filesystem.

```typescript
import { FileSource } from '@libragen/core';


const source = new FileSource();


const files = await source.getFiles({
  paths: ['./src', './docs'],
  patterns: ['**/*.ts', '**/*.md'],
  ignore: ['**/node_modules/**'],
  maxFileSize: 1024 * 1024, // 1MB
});
```

### `GitSource`

Clone and read files from git repositories. Automatically detects licenses.

```typescript
import { GitSource } from '@libragen/core';


const source = new GitSource();


const result = await source.getFiles({
  url: 'https://github.com/user/repo',
  ref: 'main',
  depth: 1,
  patterns: ['**/*.ts'],
});


console.log(result.files);           // Array of source files
console.log(result.commitHash);      // Full commit SHA
console.log(result.detectedLicense); // { identifier: 'MIT', confidence: 'high' }


// Clean up temp directory for remote repos
if (result.tempDir) {
  await source.cleanup(result.tempDir);
}
```

### Git URL Utilities

Helper functions for working with git URLs.

```typescript
import { isGitUrl, parseGitUrl, getAuthToken, detectGitProvider } from '@libragen/core';


// Check if a string is a git URL
isGitUrl('https://github.com/user/repo');  // true
isGitUrl('/local/path');  // false


// Parse a git URL into components
const parsed = parseGitUrl('https://github.com/vercel/next.js/tree/main/docs');
// { repoUrl: 'https://github.com/vercel/next.js', ref: 'main', path: 'docs' }


// Get auth token for a repo (checks environment variables)
const token = getAuthToken('https://github.com/user/repo');
// Checks GITHUB_TOKEN for GitHub; GITLAB_TOKEN for GitLab, etc.


// Detect git provider from URL
detectGitProvider('https://github.com/user/repo');  // 'github'
detectGitProvider('https://gitlab.com/user/repo');  // 'gitlab'
```

### `LicenseDetector`

Detect SPDX license identifiers from license files.

```typescript
import { LicenseDetector } from '@libragen/core';


const detector = new LicenseDetector();


// Detect from file content
const result = detector.detectFromContent(licenseText);
// { identifier: 'MIT', confidence: 'high' }


// Detect from a directory
const detected = await detector.detectFromDirectory('./my-project');
// { identifier: 'Apache-2.0', file: 'LICENSE', confidence: 'high' }
```

**Supported licenses:** MIT, Apache-2.0, GPL-3.0, GPL-2.0, LGPL-3.0, LGPL-2.1, BSD-3-Clause, BSD-2-Clause, ISC, Unlicense, MPL-2.0, CC0-1.0, AGPL-3.0

***

## Migrations

Schema migration utilities for upgrading library files between versions.

```typescript
import {
  MigrationRunner,
  CURRENT_SCHEMA_VERSION,
  MigrationRequiredError,
  SchemaVersionError,
} from '@libragen/core';


const runner = new MigrationRunner();


try {
  const result = await runner.migrateIfNeeded('./library.libragen');
  if (result.migrated) {
    console.log(`Migrated from v${result.fromVersion} to v${result.toVersion}`);
  }
} catch (e) {
  if (e instanceof MigrationRequiredError) {
    console.log('Migration required but not run (use force option)');
  }
  if (e instanceof SchemaVersionError) {
    console.log('Unsupported schema version');
  }
}


console.log(`Current schema version: ${CURRENT_SCHEMA_VERSION}`);
```

***

## Utilities

### `formatBytes`

Format bytes into a human-readable string.

```typescript
import { formatBytes } from '@libragen/core';


formatBytes(1536);      // "1.5 KB"
formatBytes(1048576);   // "1 MB"
formatBytes(0);         // "0 Bytes"
```

### `formatDuration`

Format seconds into a human-readable duration.

```typescript
import { formatDuration } from '@libragen/core';


formatDuration(45.5);   // "45.5s"
formatDuration(90);     // "1m 30s"
formatDuration(120);    // "2m"
```

### `deriveGitLibraryName`

Derive a library name from a git repository URL.

```typescript
import { deriveGitLibraryName } from '@libragen/core';


deriveGitLibraryName('https://github.com/vercel/next.js.git');  // "vercel-next.js"
deriveGitLibraryName('https://github.com/microsoft/typescript'); // "microsoft-typescript"
```

### Time Estimation

Estimate embedding time based on system capabilities.

```typescript
import { getSystemInfo, estimateEmbeddingTime, formatSystemInfo } from '@libragen/core';


// Get system information
const info = getSystemInfo();
console.log(info.cpuModel);      // "Apple M2 Pro"
console.log(info.cpuCores);      // 12
console.log(info.totalMemoryGB); // 32


// Estimate time for embedding chunks
const estimate = estimateEmbeddingTime(500);
console.log(estimate.estimatedSeconds);  // 10
console.log(estimate.formattedTime);     // "10s"
console.log(estimate.chunksPerSecond);   // 50


// Format system info for display
console.log(formatSystemInfo(info));  // "Apple M2 Pro (12 cores)"
```

The estimation accounts for different CPU types:

* Apple Silicon (M1-M4): 35-70 chunks/second
* Intel/AMD x64: 10-30 chunks/second
* ARM Linux: 8-20 chunks/second

***

## Acknowledgments

libragen uses the following open-source models:

* **[BGE-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)** — Embedding model by BAAI (MIT License)
* **[mxbai-rerank-xsmall-v1](https://huggingface.co/mixedbread-ai/mxbai-rerank-xsmall-v1)** — Reranking model by Mixedbread (Apache-2.0)

If you use libragen in academic work, please cite the underlying models:

```bibtex
@misc{bge_embedding,
  title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
  author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
  year={2023},
  eprint={2309.07597},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}


@online{rerank2024mxbai,
  title={Boost Your Search With The Crispy Mixedbread Rerank Models},
  author={Aamir Shakir and Darius Koenig and Julius Lipp and Sean Lee},
  year={2024},
  url={https://www.mixedbread.ai/blog/mxbai-rerank-v1},
}
```

---

# Building Libraries

> Create optimized RAG libraries from your documentation

This guide covers advanced options for building high-quality RAG libraries.

## Basic Build

```bash
libragen build ./docs --name my-docs
```

This processes all markdown, text, and HTML files, generating embeddings and a full-text search index.

## Supported File Types

| Extension                     | Format                | Chunking   |
| ----------------------------- | --------------------- | ---------- |
| `.ts`, `.tsx`, `.mts`, `.cts` | TypeScript            | AST-aware  |
| `.js`, `.jsx`, `.mjs`, `.cjs` | JavaScript            | AST-aware  |
| `.py`, `.pyi`                 | Python                | AST-aware  |
| `.rs`                         | Rust                  | AST-aware  |
| `.go`                         | Go                    | AST-aware  |
| `.java`                       | Java                  | AST-aware  |
| `.md`, `.mdx`                 | Markdown              | Text-based |
| `.txt`                        | Plain text            | Text-based |
| `.html`                       | HTML (text extracted) | Text-based |

Customize with `--include`:

```bash
libragen build ./docs --name my-docs --include "**/*.md" "**/*.rst" "**/*.txt"
```

## AST-Aware Code Chunking

For supported code files (TypeScript, JavaScript, Python, Rust, Go, Java), libragen uses AST-aware chunking by default. This produces higher-quality embeddings by:

* **Respecting language boundaries** — Chunks align with functions, classes, and methods instead of arbitrary character positions
* **Including semantic context** — Each chunk includes scope chain, imports, and sibling entities
* **Contextualizing for embeddings** — A special contextualized text is generated that includes file path, scope, and signatures

### Context Modes

Control how much context is included with each chunk:

```bash
# Full context (default) - includes scope chain, imports, siblings, signatures
libragen build ./src --name my-code --context-mode full


# Minimal context - only scope chain and entity signatures
libragen build ./src --name my-code --context-mode minimal


# No context - raw code only
libragen build ./src --name my-code --context-mode none
```

### Disabling AST Chunking

If you prefer text-based chunking for code files:

```bash
libragen build ./src --name my-code --no-ast-chunking
```

Non-code files (Markdown, JSON, text, etc.) always use text-based chunking regardless of this setting.

## Text-Based Chunking

Documents are split into chunks for indexing. The chunking parameters affect search quality. The defaults work well for most cases, but you can tune them for specific content types.

### When to Adjust Chunk Settings

**Use defaults** (chunk-size: 1000, overlap: 100) when:

* You have typical documentation with mixed content types
* You’re not sure what settings to use
* Your docs have clear headings and sections

**Use smaller chunks** (256-384) when:

* Content is dense with distinct concepts (API references, glossaries)
* Users ask very specific questions (“What does X parameter do?”)
* Each paragraph covers a different topic

**Use larger chunks** (768-1024) when:

* Content is narrative or tutorial-style
* Context matters more than precision (architecture guides, explanations)
* You want results to include more surrounding context

### Chunk Size

Target size in characters.

```bash
# Smaller chunks = more precise, more results
libragen build ./docs --name my-docs --chunk-size 256


# Larger chunks = more context per result
libragen build ./docs --name my-docs --chunk-size 1024
```

**By content type:**

| Content Type        | Chunk Size | Why                                    |
| ------------------- | ---------- | -------------------------------------- |
| API reference       | 256-384    | Each function/method is self-contained |
| Configuration docs  | 384-512    | Options are usually independent        |
| Tutorials           | 512-768    | Steps need some context                |
| Architecture guides | 768-1024   | Concepts span multiple paragraphs      |
| FAQs                | 256-384    | Each Q\&A is standalone                |

### Chunk Overlap

Characters shared between adjacent chunks. Overlap helps when:

* Important context spans chunk boundaries
* You have long sentences or paragraphs
* Search queries might match text split across chunks

```bash
libragen build ./docs --name my-docs --chunk-overlap 100
```

**Guidelines:**

* **10% overlap** (default): Good for well-structured docs with clear sections
* **15-20% overlap**: Better for prose-heavy content or long paragraphs
* **25%+ overlap**: Use when search quality suffers from split context (rare)

Higher overlap increases library size and build time, so only increase if needed.

## Versioning Content

Track documentation versions to match your releases:

```bash
libragen build ./docs \
  --name my-api \
  --content-version 2.1.0
```

Query specific versions later:

```bash
libragen query -l my-api --content-version 2.1.0 "authentication"
```

## Excluding Files

Skip files matching glob patterns:

```bash
libragen build ./docs \
  --name my-docs \
  --exclude "**/node_modules/**" \
  --exclude "**/drafts/**" \
  --exclude "**/*.test.md"
```

## Adding Metadata

Provide description for better discoverability:

```bash
libragen build ./docs \
  --name react-docs \
  --description "Official React documentation including hooks, components, and API reference"
```

The description appears in `libragen list` and helps AI tools understand what the library contains.

## License Tracking

When building from git repositories, licenses are automatically detected from LICENSE files. You can also specify licenses explicitly:

```bash
# Explicit license
libragen build ./docs --name my-docs --license MIT


# Multiple licenses (dual licensing)
libragen build ./docs --name my-docs --license MIT Apache-2.0
```

View license information with:

```bash
libragen inspect my-docs.libragen
```

**Supported licenses:** MIT, Apache-2.0, GPL-3.0, GPL-2.0, BSD-3-Clause, BSD-2-Clause, ISC, Unlicense, and more.

## CI/CD Integration

Automate library builds in your pipeline. See the [CI Integration guide](/docs/ci-integration) for complete examples with GitHub Actions, GitLab CI, CircleCI, and Azure Pipelines.

## Output Location

By default, libraries are saved to the current directory. Specify a different location:

```bash
libragen build ./docs \
  --name my-docs \
  --output ~/.libragen/libraries/
```

## Performance Tips

### Large Documentation Sets

For very large doc sets (>10,000 files):

1. Use larger chunk sizes to reduce total chunks
2. Exclude non-essential files (changelogs, drafts)
3. Build incrementally if possible

### Optimizing for Search Quality

1. **Structure your docs** - Use clear headings and sections
2. **Front-load important content** - Key information at the start of sections
3. **Use consistent terminology** - Same terms across related docs
4. **Include examples** - Code examples improve retrieval for technical queries

## Programmatic Building

Use the `Builder` class from `@libragen/core` to build libraries programmatically:

```typescript
import { Builder } from '@libragen/core';


const builder = new Builder();


const result = await builder.build('./docs', {
  name: 'my-docs',
  version: '1.0.0',
  description: 'My documentation',
  chunkSize: 1000,
  chunkOverlap: 100,
});


console.log(`Built: ${result.outputPath}`);
console.log(`Chunks: ${result.stats.chunkCount}`);
console.log(`Time: ${result.stats.embedDuration}s`);
```

Build from git repositories:

```typescript
const result = await builder.build('https://github.com/user/repo', {
  gitRef: 'v2.0.0',
  include: ['docs/**/*.md'],
});


if (result.git) {
  console.log(`Commit: ${result.git.commitHash}`);
  console.log(`License: ${result.git.detectedLicense?.identifier}`);
}
```

Track progress during builds:

```typescript
await builder.build('./docs', { name: 'my-docs' }, (progress) => {
  console.log(`[${progress.progress}%] ${progress.phase}: ${progress.message}`);
});
```

### Custom Embedders

Use a custom embedding provider by implementing the `IEmbedder` interface:

```typescript
import { Builder } from '@libragen/core';
import type { IEmbedder } from '@libragen/core';


class OpenAIEmbedder implements IEmbedder {
  dimensions = 1536;
  async initialize() { /* setup */ }
  async embed(text: string) { /* call OpenAI */ }
  async embedBatch(texts: string[]) { /* batch call */ }
  async dispose() { /* cleanup */ }
}


const builder = new Builder({ embedder: new OpenAIEmbedder() });
const result = await builder.build('./docs', { name: 'my-docs' });
```

See the [API Reference](/docs/api#iembedder-interface) for the full interface specification.

## Need Help?

See the [Troubleshooting guide](/docs/troubleshooting) for solutions to common build issues like slow builds, memory errors, and poor search results.

---

# CI Integration

> Automatically build and publish RAG libraries in your CI/CD pipeline

Automate your RAG library builds so they stay in sync with your documentation. This guide covers integration with popular CI/CD platforms.

## Why Automate?

* **Always current** — Libraries rebuild when docs change
* **Version tracking** — Tag libraries with commit SHAs or release versions
* **Zero manual work** — Set it and forget it
* **Artifact distribution** — Publish libraries alongside your releases

## GitHub Actions

### Using the Official Action

The easiest way to build libraries in GitHub Actions:

```yaml
name: Build RAG Library


on:
  push:
    paths:
      - 'docs/**'
    branches:
      - main


jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4


      - name: Build library
        uses: libragen/libragen@v1
        with:
          source: ./docs
          name: my-project
          description: 'Documentation for my-project'


      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: rag-library
          path: '*.libragen'
```

### Action Inputs

| Input                   | Required | Default  | Description                                          |
| ----------------------- | -------- | -------- | ---------------------------------------------------- |
| `source`                | Yes      | `./docs` | Source directory, file, or git URL                   |
| `name`                  | Yes      | —        | Library name                                         |
| `version`               | No       | —        | Library version (adds to filename)                   |
| `content-version`       | No       | —        | Version of source content being indexed              |
| `description`           | No       | —        | Short description of the library                     |
| `agent-description`     | No       | —        | Guidance for AI agents on when to use this library   |
| `example-queries`       | No       | —        | Example queries (comma-separated)                    |
| `keywords`              | No       | —        | Searchable keywords/tags (comma-separated)           |
| `programming-languages` | No       | —        | Programming languages covered (comma-separated)      |
| `text-languages`        | No       | —        | Human languages as ISO 639-1 codes (comma-separated) |
| `frameworks`            | No       | —        | Frameworks covered (comma-separated)                 |
| `license`               | No       | —        | SPDX license identifier(s) (comma-separated)         |
| `output`                | No       | `.`      | Output directory                                     |
| `chunk-size`            | No       | —        | Target chunk size                                    |
| `chunk-overlap`         | No       | —        | Chunk overlap                                        |
| `include`               | No       | —        | Glob patterns to include (comma-separated)           |
| `exclude`               | No       | —        | Glob patterns to exclude (comma-separated)           |
| `no-default-excludes`   | No       | `false`  | Disable default exclusions                           |
| `git-ref`               | No       | —        | Git branch/tag/commit (remote sources only)          |
| `git-repo-auth-token`   | No       | —        | Auth token for private repos                         |
| `cache-model`           | No       | `true`   | Cache embedding model                                |

### Action Outputs

| Output         | Description                    |
| -------------- | ------------------------------ |
| `library-path` | Full path to generated library |
| `library-name` | Filename of generated library  |

### On Release

Build libraries when you create a release:

```yaml
name: Release RAG Library


on:
  release:
    types: [published]


jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4


      - name: Build library
        id: build
        uses: libragen/libragen@v1
        with:
          source: ./docs
          name: my-project
          version: ${{ github.event.release.tag_name }}


      - name: Upload to release
        uses: softprops/action-gh-release@v1
        with:
          files: ${{ steps.build.outputs.library-path }}
```

### Caching the Model

The official action caches the embedding model by default. To disable caching:

```yaml
- uses: libragen/libragen@v1
  with:
    source: ./docs
    name: my-project
    cache-model: 'false'
```

For manual setups without the action, cache `~/.libragen/models`:

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.libragen/models
    key: libragen-model-v1
```

## GitLab CI

Create `.gitlab-ci.yml`:

```yaml
build-rag-library:
  image: node:24
  stage: build
  script:
    - libragen build ./docs --name my-project --version $CI_COMMIT_TAG
  artifacts:
    paths:
      - "*.libragen"
  only:
    - tags
  cache:
    paths:
      - ~/.libragen/models
```

## CircleCI

Create `.circleci/config.yml`:

```yaml
version: 2.1


jobs:
  build-library:
    docker:
      - image: cimg/node:24.0
    steps:
      - checkout
      - restore_cache:
          keys:
            - libragen-model-v1
      - run:
          name: Build RAG library
          command: |
            libragen build ./docs \
              --name my-project \
              --version ${CIRCLE_TAG:-$CIRCLE_SHA1}
      - save_cache:
          key: libragen-model-v1
          paths:
            - ~/.libragen/models
      - store_artifacts:
          path: my-project-*.libragen


workflows:
  build:
    jobs:
      - build-library:
          filters:
            tags:
              only: /.*/
```

## Azure Pipelines

Create `azure-pipelines.yml`:

```yaml
trigger:
  paths:
    include:
      - docs/*


pool:
  vmImage: 'ubuntu-latest'


steps:
  - task: NodeTool@0
    inputs:
      versionSpec: '24.x'


  - script: |
      libragen build ./docs \
        --name my-project \
        --version $(Build.SourceVersion)
    displayName: 'Build RAG library'


  - task: PublishBuildArtifacts@1
    inputs:
      pathToPublish: '$(System.DefaultWorkingDirectory)/my-project-*.libragen'
      artifactName: 'rag-library'
```

## Publishing to Collections

After building, publish your library to a collection for easy distribution:

```yaml
# After build step
- name: Publish to collection
  run: |
    # Upload to your collection server
    curl -X POST \
      -H "Authorization: Bearer ${{ secrets.COLLECTION_TOKEN }}" \
      -F "library=@my-project-*.libragen" \
      https://collections.example.com/api/upload
```

## Environment Variables

Customize the build with environment variables:

| Variable               | Description                      |
| ---------------------- | -------------------------------- |
| `LIBRAGEN_HOME`        | Base directory for libragen data |
| `LIBRAGEN_MODEL_CACHE` | Custom model cache location      |

```yaml
- name: Build with custom paths
  env:
    LIBRAGEN_MODEL_CACHE: /tmp/models
  run: libragen build ./docs --name my-project
```

## Best Practices

1. **Cache the model** — The embedding model is \~50MB. Cache it to speed up builds.

2. **Use semantic versions** — Tag libraries with your release versions for easy reference.

3. **Build on doc changes only** — Use path filters to avoid unnecessary builds.

4. **Store as artifacts** — Attach libraries to releases for easy distribution.

5. **Validate before publishing** — Run a quick query to verify the library works:

   ```yaml
   - name: Validate library
     run: |
       libragen query \
         --library ./my-project-*.libragen \
         "test query" \
         --top-k 1
   ```

## Monorepo Setup

For monorepos with multiple doc sets:

```yaml
jobs:
  build:
    strategy:
      matrix:
        include:
          - name: api-docs
            path: packages/api/docs
          - name: sdk-docs
            path: packages/sdk/docs
    steps:
      - uses: actions/checkout@v4
      - run: |
          libragen build ./${{ matrix.path }} \
            --name ${{ matrix.name }} \
            --version ${{ github.ref_name }}
```

## Need Help?

See the [Troubleshooting guide](/docs/troubleshooting) for solutions to common CI issues.

---

# CLI Reference

> Complete command-line interface documentation

The libragen CLI provides commands for building, querying, and managing RAG libraries.

## Global Options

These options work with all commands:

| Option                | Description             |
| --------------------- | ----------------------- |
| `--help`, `-h`        | Show help for a command |
| `--cli-version`, `-V` | Show CLI version number |

## Commands

### `build`

Create a new library from source files.

```bash
libragen build <source> [options]
```

#### Arguments

| Argument | Description                  |
| -------- | ---------------------------- |
| `source` | Directory or file to process |

#### Options

| Option                  | Type      | Default       | Description                                                 |
| ----------------------- | --------- | ------------- | ----------------------------------------------------------- |
| `--name`, `-n`          | string    | Required      | Library name (creates `<name>-<version>.libragen`)          |
| `--output`, `-o`        | string    | Current dir   | Output directory                                            |
| `--description`, `-d`   | string    | —             | Library description                                         |
| `--content-version`     | string    | —             | Version tag for the content                                 |
| `--include`, `-i`       | string\[] | —             | Glob patterns to include                                    |
| `--exclude`, `-e`       | string\[] | —             | Glob patterns to exclude (added to defaults)                |
| `--no-default-excludes` | boolean   | `false`       | Disable default exclusions                                  |
| `--chunk-size`          | number    | `1000`        | Target chunk size in characters                             |
| `--chunk-overlap`       | number    | `100`         | Overlap between chunks                                      |
| `--no-ast-chunking`     | boolean   | `false`       | Disable AST-aware chunking for code files                   |
| `--context-mode`        | string    | `full`        | Context mode for AST chunking: `none`, `minimal`, or `full` |
| `--license`             | string\[] | Auto-detected | SPDX license identifier(s) for the source                   |
| `--git-ref`             | string    | —             | Git branch, tag, or commit (git sources only)               |
| `--git-repo-auth-token` | string    | —             | Auth token for private repos                                |

#### Examples

```bash
# Basic build
libragen build ./docs --name my-docs


# With version and description
libragen build ./docs \
  --name my-api \
  --description "API documentation" \
  --content-version 2.1.0


# Custom chunk settings
libragen build ./docs \
  --name my-docs \
  --chunk-size 1024 \
  --chunk-overlap 100


# Exclude patterns
libragen build ./docs \
  --name my-docs \
  --exclude "**/node_modules/**" \
  --exclude "**/test/**"


# Build from git repository
libragen build https://github.com/user/repo --name repo-docs


# Build with explicit license
libragen build ./docs --name my-docs --license MIT


# Build with multiple licenses (dual licensing)
libragen build ./docs --name my-docs --license MIT Apache-2.0


# Disable AST chunking (use text-based chunking only)
libragen build ./src --name my-code --no-ast-chunking


# Use minimal context for AST chunking (smaller chunks)
libragen build ./src --name my-code --context-mode minimal
```

#### License Detection

When building from git repositories, licenses are automatically detected from LICENSE files (LICENSE, LICENSE.md, LICENSE.txt, COPYING).

* **Git sources**: Auto-detected if not explicitly provided
* **Local sources**: Use `--license` to specify
* **Explicit `--license`**: Always takes precedence

**Supported licenses:** MIT, Apache-2.0, GPL-3.0, GPL-2.0, LGPL-3.0, LGPL-2.1, BSD-3-Clause, BSD-2-Clause, ISC, Unlicense, MPL-2.0, CC0-1.0, AGPL-3.0

#### AST-Aware Code Chunking

By default, libragen uses AST-aware chunking for supported code files. This produces higher-quality embeddings by:

* **Respecting language boundaries** — Chunks align with functions, classes, and methods instead of arbitrary character positions
* **Including semantic context** — Each chunk includes scope chain, imports, and sibling entities
* **Contextualizing for embeddings** — A special `contextualizedText` is generated that includes file path, scope, and signatures

**Supported languages:** TypeScript, JavaScript, Python, Rust, Go, Java

**Context modes:**

* `full` (default) — Include all available context (scope chain, imports, siblings, signatures)
* `minimal` — Include only essential context (scope chain, entity signatures)
* `none` — No additional context (raw code only)

Non-code files (Markdown, JSON, text, etc.) always use text-based chunking.

***

### `query`

Search a library with natural language.

```bash
libragen query [options] <query>
```

#### Arguments

| Argument | Description                   |
| -------- | ----------------------------- |
| `query`  | Natural language search query |

#### Options

| Option              | Type      | Default              | Description                                          |
| ------------------- | --------- | -------------------- | ---------------------------------------------------- |
| `--library`, `-l`   | string    | Required             | Library name or path to .libragen file               |
| `--path`, `-p`      | string\[] | auto-detect + global | Project directory (will search /.libragen/libraries) |
| `--top-k`, `-k`     | number    | `10`                 | Number of results to return                          |
| `--content-version` | string    | —                    | Filter by content version                            |
| `--format`, `-f`    | string    | `text`               | Output format (`text`, `json`)                       |

The `-l` option accepts either:

* A **library name** (e.g., `my-lib`) — resolved using the library discovery algorithm
* A **file path** (e.g., `./my-lib.libragen`) — used directly

When using a library name, the resolution algorithm searches:

1. Project-local directories (`.libragen/libraries/` in cwd)
2. Global library directory (`~/.libragen/libraries/`)
3. Any paths specified with `-p`

#### Examples

```bash
# Query by library name (uses resolution algorithm)
libragen query --library my-docs "How do I authenticate?"


# Query by explicit file path
libragen query -l ./my-docs.libragen "How do I authenticate?"


# Query with custom project directory
libragen query -l my-docs -p ./my-project "error handling"


# Get more results as JSON
libragen query -l my-docs -k 20 -f json "error handling"


# Query specific version
libragen query -l my-api --content-version 2.0.0 "rate limits"
```

#### Output Format

**Text output** (default):

```plaintext
[1] authentication.md (score: 0.89)
    To authenticate, pass your API key in the Authorization header...


[2] getting-started.md (score: 0.82)
    First, obtain an API key from the dashboard...
```

**JSON output** (`--format json`):

```json
{
  "results": [
    {
      "rank": 1,
      "score": 0.89,
      "source": "authentication.md",
      "content": "To authenticate, pass your API key..."
    }
  ]
}
```

***

### `list`

List installed libraries.

**Aliases:** `l`, `ls`

```bash
libragen list [options]
```

#### Options

| Option          | Type      | Default | Description                                              |
| --------------- | --------- | ------- | -------------------------------------------------------- |
| `-p, --path`    | string\[] | —       | Project directory (will search /.libragen/libraries)     |
| `-v, --verbose` | boolean   | `false` | Show detailed information (path, chunks, size, keywords) |
| `--show-path`   | boolean   | `false` | Show the file path for each library                      |
| `--json`        | boolean   | `false` | Output as JSON (includes path and location)              |
| `--libraries`   | boolean   | `false` | Show only libraries                                      |
| `--collections` | boolean   | `false` | Show only collections                                    |

#### Output

Each library displays:

* **Name and version**
* **Location badge**: `[project]` or `[global]` indicating where the library was found
* **Path** (with `--show-path` or `-v`): full file path to the `.libragen` file

The location badge helps identify which libraries come from project-local directories vs the global directory.

#### Library Discovery

By default, libragen discovers libraries from:

1. **Project directory** — If `.libragen/libraries/` exists in the current directory, it’s searched first
2. **Global directory** — `$LIBRAGEN_HOME/libraries` (always included)

When `-p` is specified, the path is transformed to `<path>/.libragen/libraries` and **only** that path is searched—no global directory, no auto-detection.

#### Examples

```bash
# List all libraries (auto-detected + global)
libragen list


# List with file paths shown
libragen list --show-path


# List only project-local libraries
libragen list -p ./my-project


# List from multiple specific paths
libragen list -p ./libs -p ./vendor/libs


# JSON output for scripting (includes path and location)
libragen list --json
```

**Example output:**

```plaintext
📚 Installed Libraries (2)


  my-docs v1.0.0 [project]
  react-docs v19.0.0 [global]
```

**With `--show-path`:**

```plaintext
📚 Installed Libraries (2)


  my-docs v1.0.0 [project]
    /path/to/project/.libragen/libraries/my-docs-1.0.0.libragen


  react-docs v19.0.0 [global]
    /Users/you/.libragen/libraries/react-docs-19.0.0.libragen
```

***

### `inspect`

Inspect the contents of a library (`.libragen`) or packed collection (`.libragen-collection`) file.

```bash
libragen inspect <source> [options]
```

#### Arguments

| Argument | Description                             |
| -------- | --------------------------------------- |
| `source` | Library file, packed collection, or URL |

#### Options

| Option   | Type    | Default | Description    |
| -------- | ------- | ------- | -------------- |
| `--json` | boolean | `false` | Output as JSON |

#### Examples

```bash
# Inspect a library file
libragen inspect my-lib.libragen


# Inspect a packed collection
libragen inspect my-collection.libragen-collection


# Inspect from URL
libragen inspect https://example.com/my-lib.libragen


# JSON output for scripting
libragen inspect my-lib.libragen --json
```

#### Output (library)

```plaintext
📚 Library Contents


  File:    /path/to/my-lib.libragen
  Size:    1.23 MB


  Metadata:
    Name:        my-lib
    Version:     1.0.0
    Description: My library description
    Content:     v2.0.0 (semver)
    Schema:      v3
    Created:     2024-01-15T10:30:00.000Z


  Stats:
    Chunks:      1,234
    Sources:     42 files


  Embedding:
    Model:       Xenova/bge-small-en-v1.5
    Dimensions:  384


  Source:
    Type:        git
    URL:         https://github.com/user/repo
    Ref:         main
    Commit:      abc12345


  License(s):
    • MIT
```

#### Output (packed collection)

```plaintext
📦 Collection Contents


  File: /path/to/my-collection.libragen-collection
  Size: 5.67 MB


  Metadata:
    Name:    my-collection
    Version: 1.0.0
    Desc:    My collection description


  Libraries (3):
    • api-docs.libragen (1.2 MB)
    • guides.libragen (2.3 MB)
    • tutorials.libragen (2.17 MB)


Install with:
  libragen install /path/to/my-collection.libragen-collection
```

***

### `install`

Install a library from a collection or URL.

```bash
libragen install <source> [options]
```

#### Options

| Option              | Type      | Default | Description                                                |
| ------------------- | --------- | ------- | ---------------------------------------------------------- |
| `-p, --path`        | string\[] | —       | Project directory (will install to /.libragen/libraries)   |
| `-f, --force`       | boolean   | `false` | Overwrite existing library                                 |
| `-c, --collection`  | string    | —       | Collection URL to use                                      |
| `--content-version` | string    | —       | Install specific content version                           |
| `-a, --all`         | boolean   | `false` | Install all libraries including optional (for collections) |

#### Examples

```bash
# Install from file (defaults to $LIBRAGEN_HOME/libraries)
libragen install ./my-lib.libragen


# Install to a project directory (creates ./my-project/.libragen/libraries)
libragen install ./my-lib.libragen -p ./my-project


# Install to multiple project directories (first path is used for install)
libragen install ./my-lib.libragen -p ./project-a -p ./project-b


# Install from URL
libragen install https://example.com/my-lib-1.0.0.libragen
```

***

### `uninstall`

Remove an installed library.

```bash
libragen uninstall <name> [options]
```

#### Options

| Option             | Type      | Default | Description                                          |
| ------------------ | --------- | ------- | ---------------------------------------------------- |
| `-p, --path`       | string\[] | —       | Project directory (will search /.libragen/libraries) |
| `-c, --collection` | boolean   | `false` | Uninstall a collection (and unreferenced libraries)  |

#### Examples

```bash
# Uninstall from auto-detected paths
libragen uninstall my-docs


# Uninstall from specific project only
libragen uninstall my-docs -p ./my-project
```

***

### `update`

Update installed libraries to newer versions from their collections.

```bash
libragen update [name] [options]
```

#### Arguments

| Argument | Description                                              |
| -------- | -------------------------------------------------------- |
| `name`   | Optional library name to update (updates all if omitted) |

#### Options

| Option          | Type      | Default | Description                                          |
| --------------- | --------- | ------- | ---------------------------------------------------- |
| `-p, --path`    | string\[] | —       | Project directory (will search /.libragen/libraries) |
| `-n, --dry-run` | boolean   | `false` | Show what would be updated without making changes    |
| `-f, --force`   | boolean   | `false` | Force update even if versions match                  |

#### Examples

```bash
# Update all libraries
libragen update


# Update specific library
libragen update react-docs


# Preview updates without applying
libragen update --dry-run


# Update only project-local libraries
libragen update -p ./my-project
```

***

### `config`

Display current libragen configuration and paths.

```bash
libragen config [options]
```

#### Options

| Option   | Type    | Default | Description    |
| -------- | ------- | ------- | -------------- |
| `--json` | boolean | `false` | Output as JSON |

#### Example

```bash
libragen config
```

```plaintext
⚙️  Libragen Configuration


  Version:  0.1.0


  Paths:
    Home:       ~/Library/Application Support/libragen
    Libraries:  ~/Library/Application Support/libragen/libraries
    Models:     ~/Library/Application Support/libragen/models


  Active Library Paths (in priority order):
    🌐 ~/Library/Application Support/libragen/libraries (global)
    (no project-local .libragen/libraries found in cwd)


  Environment Variables:
    (none set, using defaults)
```

With a project-local `.libragen/libraries` directory:

```bash
cd /my/project
libragen config
```

```plaintext
⚙️  Libragen Configuration


  Version:  0.1.0


  Paths:
    Home:       ~/Library/Application Support/libragen
    Libraries:  ~/Library/Application Support/libragen/libraries
    Models:     ~/Library/Application Support/libragen/models


  Active Library Paths (in priority order):
    📁 /my/project/.libragen/libraries (project-local)
    🌐 ~/Library/Application Support/libragen/libraries (global)


  Environment Variables:
    (none set, using defaults)
```

With environment variable overrides:

```bash
LIBRAGEN_HOME=/custom/path libragen config
```

```plaintext
⚙️  Libragen Configuration


  Version:  0.1.0


  Paths:
    Home:       /custom/path (from LIBRAGEN_HOME)
    Libraries:  /custom/path/libraries
    Models:     /custom/path/models


  Active Library Paths (in priority order):
    🌐 /custom/path/libraries (global)
    (no project-local .libragen/libraries found in cwd)


  Environment Variables:
    LIBRAGEN_HOME=/custom/path
```

***

## Environment Variables

| Variable               | Description                                   |
| ---------------------- | --------------------------------------------- |
| `LIBRAGEN_HOME`        | Override base directory for all libragen data |
| `LIBRAGEN_MODEL_CACHE` | Override model cache directory                |

Use `libragen config` to see current values and which environment variables are active.

***

### `autocomplete`

Set up shell completions for bash, zsh, and fish.

```bash
libragen autocomplete [SHELL]
```

#### Arguments

| Argument | Description                                     |
| -------- | ----------------------------------------------- |
| `SHELL`  | Shell type: `bash`, `zsh`, or `fish` (optional) |

#### Examples

```bash
# Display autocomplete installation instructions
libragen autocomplete


# Set up for specific shell
libragen autocomplete bash
libragen autocomplete zsh
libragen autocomplete fish
```

#### Features

Shell completions provide:

* **Command completion** - Tab-complete all libragen commands
* **Option completion** - Tab-complete command options
* **Typo suggestions** - Get suggestions when you mistype a command

***

### `cli-update`

Update the libragen CLI to the latest version.

```bash
libragen cli-update [options]
```

#### Options

| Option          | Description                          |
| --------------- | ------------------------------------ |
| `--check`, `-c` | Check for updates without installing |

#### Examples

```bash
# Update to latest version
libragen cli-update


# Check if updates are available
libragen cli-update --check
```

***

## Exit Codes

| Code | Description       |
| ---- | ----------------- |
| `0`  | Success           |
| `1`  | General error     |
| `2`  | Invalid arguments |

---

# Collections

> Bundle and share multiple libraries together

Collections let you bundle multiple libraries into a single installable package. They’re useful for sharing curated sets of documentation with your team or organizing related libraries together.

## Creating a Collection

### 1. Build your libraries

First, create the individual libraries you want to include:

```bash
libragen build ./api-docs --name internal-api
libragen build ./guides --name internal-guides
libragen build ./onboarding --name onboarding-docs
```

### 2. Create a collection file

Use the `collection create` command to bundle libraries together:

```bash
# Create with libraries specified
libragen collection create my-team-docs \
  --library ./internal-api.libragen \
  --library ./internal-guides.libragen \
  --library ./onboarding-docs.libragen \
  --description "Internal documentation for the team"


# Or create a template to edit manually
libragen collection create my-team-docs
```

This creates a `my-team-docs.json` collection file. If no libraries are specified, a template is created that you can edit manually.

### 3. Collection file format

Collections are JSON files that reference library locations:

```json
{
  "name": "my-team-docs",
  "version": "1.0.0",
  "description": "Internal documentation for the team",
  "libraries": [
    {
      "name": "internal-api",
      "version": "1.0.0",
      "description": "Internal API documentation",
      "url": "./internal-api.libragen"
    },
    {
      "name": "internal-guides",
      "version": "1.0.0",
      "description": "Internal engineering guides",
      "url": "./internal-guides.libragen"
    }
  ]
}
```

Library URLs can be:

* **Relative paths** - `./libs/my-lib.libragen`
* **Absolute paths** - `/shared/libs/my-lib.libragen`
* **HTTP URLs** - `https://internal.example.com/libs/my-lib.libragen`

## Installing from a Collection

Install all libraries from a collection at once:

```bash
# From a local collection file
libragen install --from ./my-team-docs.json


# From a URL
libragen install --from https://internal.example.com/collections/team-docs.json
```

## Hosting Collections

For team sharing, host your collection and libraries on any file server:

### Simple file share

```plaintext
/shared/libragen/
├── collections/
│   └── team-docs.json
└── libraries/
    ├── internal-api-1.0.0.libragen
    ├── internal-guides-1.0.0.libragen
    └── onboarding-docs-1.0.0.libragen
```

### HTTP server or S3

Upload files to any HTTP-accessible location and reference them by URL in your collection.

## Managing Libraries

### List installed libraries

```bash
libragen list
```

### Install a single library

```bash
# From a local file
libragen install ./path/to/library.libragen


# From a URL
libragen install https://example.com/my-lib.libragen
```

### Remove a library

```bash
libragen uninstall my-library
```

## Packing Collections for Sharing

Bundle a collection and all its libraries into a single `.libragen-collection` file for easy distribution:

```bash
# Pack a collection into a single file
libragen collection pack ./my-team-docs.json


# Creates: my-team-docs.libragen-collection
```

Recipients can inspect, unpack, or install directly:

```bash
# Inspect contents without extracting
libragen inspect my-team-docs.libragen-collection


# Unpack to current directory
libragen collection unpack my-team-docs.libragen-collection


# Or install directly (extracts to temp and installs)
libragen install my-team-docs.libragen-collection


# Or unpack and install in one step
libragen collection unpack my-team-docs.libragen-collection --install
```

This is ideal for:

* Sharing collections via email or chat
* Distributing to air-gapped environments
* Bundling documentation with project releases

## Best Practices

1. **Version your collections** - Include version numbers in collection files for reproducibility
2. **Use relative paths for portability** - When distributing collections with libraries, use relative paths
3. **Pack for offline sharing** - Use `collection pack` when recipients don’t have network access to your library URLs
4. **Cache in CI** - Store `.libragen` files in your CI cache to speed up builds
5. **Keep libraries updated** - Rebuild libraries when source documentation changes

---

# Getting Started

> Build your first RAG library in under 60 seconds

Libragen (pronounced “LIB-ruh-jen”) lets you create portable, searchable RAG libraries from any documentation. Libraries are single SQLite files you can share, version control, and query locally—no cloud services required.

> **Try it now:** This documentation is available as a libragen library. Download [libragen-docs-0.4.0.libragen](/libragen-docs-0.4.0.libragen) and query it locally!

```bash
curl -L -O https://libragen.dev/libragen-docs-0.4.0.libragen # Download the docs library
libragen install libragen-docs-0.4.0.libragen # Install it
libragen query -l libragen-docs "How do I build a library?" # Query it
```

## Quick Start

### 1. Build a Library

Create a library from a directory of markdown, text, or HTML files:

```bash
libragen build ./my-docs --name my-library
```

Or, build from a git repository:

```bash
libragen build https://github.com/github/docs/tree/main/content
```

This will:

* Process all supported files in `./my-docs`
* Generate embeddings using a local model
* Build a full-text search index
* Save everything to `my-library-1.0.0.libragen`

> NOTE: Depending on the size of your library, **building may take a few minutes**.
>
> Grab a coffee and come back in a few minutes; it’s worth the wait!

### 2. Query Your Library

Search with natural language:

```bash
libragen query --library my-library "How do I authenticate?"
```

Results include relevance scores and source file information.

### 3. Use with MCP (Optional)

Install the MCP server for your AI tool with one command:

```bash
npx -y install-mcp @libragen/mcp
```

This automatically configures Claude Desktop, Cursor, Windsurf, or other MCP-compatible tools. Your AI assistant can now query your libraries directly.

## Installation Options

### Use Without Installing

The easiest way to use libragen is with `npx`:

```bash
# Build
libragen build ./docs --name my-docs


# Query
libragen query --library my-docs "your question"
```

### Global Installation

For frequent use, install globally:

```bash
npm install -g @libragen/cli


# Now use without npx
libragen build ./docs --name my-docs
libragen query --library my-docs "your question"
```

**Enable shell completions** for tab-completion of commands and options:

```bash
libragen autocomplete
```

### As a Project Dependency

For programmatic use in your project:

```bash
npm install --save-exact @libragen/core
```

## Library Storage

Libraries are stored and discovered from multiple locations:

### Default Installation Location

**By default, all installed .libragen files go to the global directory:**

| Platform | Global Path                                         |
| -------- | --------------------------------------------------- |
| macOS    | `~/Library/Application Support/libragen/libraries/` |
| Linux    | `~/.local/share/libragen/libraries/`                |
| Windows  | `%APPDATA%\libragen\libraries\`                     |

You can override the global directory with the `LIBRAGEN_HOME` environment variable:

```bash
export LIBRAGEN_HOME=/custom/path
```

### Library Discovery

When searching for libraries (e.g., `libragen list`, `libragen query`), libragen looks in:

1. **Project directory** — If `.libragen/libraries/` exists in the current directory, it’s searched first
2. **Global directory** — `$LIBRAGEN_HOME/libraries` (always included)

### Project-Local Libraries

To install a library to a local directory instead of the global directory, use the `-p` flag (it automatically appends `.libragen/libraries`):

```bash
# Install to ./my-project/.libragen/libraries
libragen install ./my-lib.libragen -p ./my-project
```

This is useful for:

* Project-specific documentation
* Version-controlled library sets
* CI/CD environments

### Environment Variables

Override the global directory with `LIBRAGEN_HOME`:

```bash
export LIBRAGEN_HOME=/custom/path
```

Run `libragen config` to see current paths and active environment variables.

## Next Steps

* [CLI Reference](/docs/cli) - Full command documentation
* [Building Libraries](/docs/building) - Advanced build options
* [CI Integration](/docs/ci-integration) - Automate builds in your pipeline
* [MCP Integration](/docs/mcp) - Connect to AI tools
* [Collections](/docs/collections) - Install curated library bundles
* [Troubleshooting](/docs/troubleshooting) - Solutions to common issues

---

# MCP Integration

> Connect libragen to Claude, Cursor, VS Code, and other AI tools

Libragen includes an [MCP](https://modelcontextprotocol.io/) server that lets AI assistants query your libraries directly. Once configured, your AI can search documentation without you copying and pasting.

## Quick Setup

Install the libragen MCP server with one command:

```bash
npx -y install-mcp @libragen/mcp
```

That’s it! This automatically detects and configures your AI tool (Claude Desktop, Cursor, Windsurf, etc.).

## Library Discovery

The MCP server automatically discovers libraries from multiple locations:

1. **Project-local libraries**: If your workspace has a `.libragen/libraries/` directory, those libraries are discovered automatically
2. **Global libraries**: Platform-specific global directory (e.g., `~/Library/Application Support/libragen/libraries/` on macOS)

Project-local libraries take priority over global libraries with the same name. This means you can have project-specific versions of libraries that shadow global ones.

### How It Works

When used in VS Code, Windsurf, Cursor, or other IDEs that support MCP roots:

* The server requests workspace roots from the IDE
* For each workspace root, it checks for `.libragen/libraries/`
* Project libraries are searched first, then global libraries
* No configuration needed—just open your project!

## Supported Tools

The `install-mcp` package automatically configures:

* **Claude Desktop** - Adds to `claude_desktop_config.json`
* **Cursor** - Adds to Cursor’s MCP settings
* **Windsurf** - Adds to Windsurf’s MCP configuration
* **VS Code** - Adds to Copilot’s MCP servers

After installation, restart your AI tool to load the server.

## Manual Configuration

If you prefer manual setup or need custom options, add this to your tool’s MCP configuration:

```json
{
  "mcpServers": {
    "libragen": {
      "command": "npx",
      "args": ["-y", "@libragen/mcp"]
    }
  }
}
```

See [install-mcp documentation](https://www.npmjs.com/package/install-mcp) for config file locations by tool.

## Available Tools

The MCP server provides 8 tools for managing and querying libraries:

### `libragen_search`

Search libraries for relevant content using hybrid semantic + keyword search.

**Inputs:**

* `query` (string, required) - Natural language search query
* `libraries` (string\[], optional) - Specific libraries to search (searches all if omitted)
* `contentVersion` (string, optional) - Filter by content version
* `topK` (number, default: 10) - Number of results
* `hybridAlpha` (number, default: 0.5) - Balance between vector (1) and keyword (0) search
* `contextBefore` (number, default: 1) - Chunks to include before each result
* `contextAfter` (number, default: 1) - Chunks to include after each result
* `rerank` (boolean, default: false) - Apply cross-encoder reranking for better relevance

**Example prompt:**

> “Search my react-docs library for information about useEffect cleanup”

### `libragen_list`

List available libraries with metadata, descriptions, and usage guidance.

**Example prompt:**

> “What libragen libraries do I have installed?”

### `libragen_build`

Build a searchable library from source files or git repositories.

**This tool uses an async pattern** to avoid timeouts on long-running builds. Builds run in worker threads (up to n-1 CPU cores) to avoid blocking the MCP server.

**Actions:**

* `start` (default) - Start a new build, returns a `taskId` immediately
* `status` - Check build progress (requires `taskId`)
* `cancel` - Cancel a running or queued build (requires `taskId`)

**Workflow:**

1. Call with `action='start'` and `source` to begin a build
2. Poll with `action='status'` and `taskId` to check progress
3. When `status='completed'`, the `result` field contains the build output

**Inputs:**

* `action` (string, default: “start”) - Action to perform: `start`, `status`, or `cancel`
* `taskId` (string, for status/cancel) - Task ID returned from start action
* `source` (string, for start) - Directory, file path, or git URL to index
* `name` (string, optional) - Library name
* `output` (string, optional) - Output path for the .libragen file
* `version` (string, default: “0.1.0”) - Library version
* `contentVersion` (string, optional) - Version of source content
* `description` (string, optional) - Short description
* `agentDescription` (string, optional) - Guidance for AI agents
* `exampleQueries` (string\[], optional) - Example queries
* `keywords` (string\[], optional) - Searchable tags
* `programmingLanguages` (string\[], optional) - Languages covered
* `frameworks` (string\[], optional) - Frameworks covered
* `chunkSize` (number, default: 1000) - Target chunk size
* `chunkOverlap` (number, default: 100) - Chunk overlap
* `include` (string\[], optional) - Glob patterns to include
* `exclude` (string\[], optional) - Glob patterns to exclude
* `gitRef` (string, optional) - Git branch/tag/commit
* `gitRepoAuthToken` (string, optional) - Auth token for private repos
* `noAstChunking` (boolean, default: false) - Disable AST-aware chunking for code files
* `contextMode` (string, default: “full”) - Context mode for AST chunking: `none`, `minimal`, or `full`
* `install` (boolean, default: false) - Install after building

**Response fields:**

* `taskId` - Unique identifier for the build task
* `status` - Task status: `queued`, `running`, `completed`, `failed`, `cancelled`
* `progress` - Progress percentage (0-100)
* `currentStep` - Description of current build step
* `result` - Build output (when completed)
* `error` - Error message (when failed)
* `queuePosition` - Position in queue (when queued)
* `estimatedTotalSeconds` - Estimated total build time (system-aware)
* `estimatedRemainingSeconds` - Estimated seconds until completion
* `elapsedSeconds` - How long the build has been running

**Example prompts:**

> “Build a library from <https://github.com/vercel/next.js/tree/main/docs>”

> “Check the status of build task abc-123”

### `libragen_install`

Install a library or collection to make it available for searching.

**Inputs:**

* `source` (string, required) - Library file (.libragen), collection (.json), or URL
* `force` (boolean, default: false) - Overwrite existing libraries
* `includeOptional` (boolean, default: false) - Include optional libraries from collections

**Example prompt:**

> “Install the library from ./my-docs.libragen”

### `libragen_uninstall`

Remove an installed library.

**Inputs:**

* `name` (string, required) - Name of the library to uninstall

**Example prompt:**

> “Uninstall the react-docs library”

### `libragen_update`

Update installed libraries to newer versions from collections.

**Inputs:**

* `name` (string, optional) - Library to update (updates all if omitted)
* `force` (boolean, default: false) - Force update even if versions match
* `dryRun` (boolean, default: false) - Show what would be updated without applying

**Example prompt:**

> “Check for updates to my installed libraries”

### `libragen_collection`

Create a collection file that bundles multiple libraries together.

**Inputs:**

* `output` (string, required) - Output file path (.json)
* `name` (string, optional) - Collection name
* `description` (string, optional) - Collection description
* `version` (string, default: “1.0.0”) - Collection version
* `libraries` (string\[], optional) - Required library sources
* `optionalLibraries` (string\[], optional) - Optional library sources
* `collections` (string\[], optional) - Nested collection sources

**Example prompt:**

> “Create a collection called team-docs with my api-docs and guides libraries”

### `libragen_config`

Get libragen configuration including paths, version, and discovered project directories.

**Returns:**

* `version` - libragen version
* `paths` - Default paths (home, libraries, models)
* `discoveredPaths` - All library paths being searched (includes project-local)
* `pathsInitialized` - Whether paths have been discovered from workspace roots
* `environment` - Active environment variable overrides

**Example prompt:**

> “Show me the libragen configuration and where libraries are stored”

## Custom Library Directory

To use libraries from a custom location, set the `LIBRAGEN_HOME` environment variable:

```bash
# Install with custom library path
LIBRAGEN_HOME=/path/to/your/libragen npx -y install-mcp @libragen/mcp
```

Or add it manually to your MCP config:

```json
{
  "mcpServers": {
    "libragen": {
      "command": "npx",
      "args": ["-y", "@libragen/mcp"],
      "env": {
        "LIBRAGEN_HOME": "/path/to/your/libragen"
      }
    }
  }
}
```

## Environment Variables

| Variable                  | Description                                                             |
| ------------------------- | ----------------------------------------------------------------------- |
| `LIBRAGEN_HOME`           | Override base directory for all libragen data                           |
| `LIBRAGEN_MODEL_CACHE`    | Override model cache directory                                          |
| `LIBRAGEN_TASK_EXPIRY_MS` | How long completed build tasks are retained (default: 3600000 = 1 hour) |

## Prompts (Slash Commands)

The server exposes prompts that appear as slash commands in MCP-compatible clients:

### `/libragen-search`

Search installed libraries for relevant code snippets.

**Arguments:**

* `query` (string) — What to search for (e.g., “authentication”, “React hooks”)

### `/libragen-build`

Build a searchable library from source code.

**Arguments:**

* `source` (string) — Path to the source directory or file to index

### `/libragen-collection`

Create a collection file grouping multiple libraries.

**Arguments:**

* `name` (string) — Name for the collection
* `libraries` (string, optional) — Comma-separated list of library paths

## Tips for Best Results

1. **Be specific** - “Search react-docs for useEffect dependency arrays” works better than “how does useEffect work”

2. **Name your libraries clearly** - The AI uses library names to understand what’s available

3. **Add descriptions** - When building libraries, include descriptions so the AI knows what each library contains

4. **Version your content** - Use `--content-version` when building so you can reference specific versions

## Troubleshooting

### Server not loading

1. Verify the MCP config syntax is valid JSON
2. Check that `npx` is in your PATH
3. Restart your AI tool completely

### No libraries found

1. Check that libraries exist in the default location
2. Verify with `libragen list` in your terminal
3. Try setting `LIBRAGEN_HOME` explicitly

### Slow first query

The first query downloads the embedding model (\~50MB). Subsequent queries are fast.

---

# Schemas

> JSON schemas for libragen data structures

Libragen uses JSON Schema to define its data structures. These schemas are versioned and available for validation and tooling integration.

## Schema URLs

All schemas are available at versioned URLs:

| Schema           | URL                                                                                  |
| ---------------- | ------------------------------------------------------------------------------------ |
| Library Metadata | [/schemas/v1/library-metadata.schema.json](/schemas/v1/library-metadata.schema.json) |
| Collection Index | [/schemas/v1/collection-index.schema.json](/schemas/v1/collection-index.schema.json) |
| Collection       | [/schemas/v1/collection.schema.json](/schemas/v1/collection.schema.json)             |
| Collection Item  | [/schemas/v1/collection-item.schema.json](/schemas/v1/collection-item.schema.json)   |

Use these URLs in your `$schema` field for validation:

```json
{
  "$schema": "https://libragen.dev/schemas/v1/library-metadata.schema.json",
  "name": "my-library",
  ...
}
```

## Library Metadata

**Schema:** [`/schemas/v1/library-metadata.schema.json`](/schemas/v1/library-metadata.schema.json)

Every `.libragen` file contains metadata describing its contents. This is stored in the library’s SQLite database and returned by `libragen inspect`.

### Key Fields

| Field                  | Required | Description                               |
| ---------------------- | -------- | ----------------------------------------- |
| `name`                 | Yes      | Library name (e.g., “react-docs”)         |
| `version`              | Yes      | Library format version                    |
| `createdAt`            | Yes      | ISO 8601 creation timestamp               |
| `contentVersion`       | No       | Version of source content                 |
| `description`          | No       | Short description                         |
| `agentDescription`     | No       | Guidance for AI agents                    |
| `exampleQueries`       | No       | Example queries this library answers      |
| `keywords`             | No       | Searchable tags                           |
| `programmingLanguages` | No       | Programming languages covered             |
| `textLanguages`        | No       | Human/natural languages (ISO 639-1 codes) |
| `frameworks`           | No       | Frameworks covered                        |

### Example

```json
{
  "$schema": "https://libragen.dev/schemas/v1/library-metadata.schema.json",
  "name": "react-docs",
  "version": "1.0.0",
  "contentVersion": "19.0.0",
  "description": "Official React documentation",
  "agentDescription": "Use when users ask about React hooks, components, or JSX.",
  "exampleQueries": ["How do I use useEffect?"],
  "keywords": ["react", "frontend"],
  "programmingLanguages": ["javascript", "typescript"],
  "textLanguages": ["en"],
  "frameworks": ["react"],
  "createdAt": "2024-01-15T10:30:00Z",
  "embedding": {
    "model": "Xenova/bge-small-en-v1.5",
    "dimensions": 384
  },
  "chunking": {
    "strategy": "ast+recursive",
    "chunkSize": 512,
    "chunkOverlap": 50
  },
  "stats": {
    "chunkCount": 8392,
    "sourceCount": 247,
    "fileSize": 12400000
  },
  "contentHash": "sha256:a1b2c3d4..."
}
```

## Collection Index

**Schema:** [`/schemas/v1/collection-index.schema.json`](/schemas/v1/collection-index.schema.json)

Collections serve a JSON index listing available libraries. Host this file at your collection URL.

### Key Fields

| Field       | Required | Description                      |
| ----------- | -------- | -------------------------------- |
| `name`      | Yes      | Collection name                  |
| `version`   | Yes      | Index format version (use “1.0”) |
| `updatedAt` | Yes      | ISO 8601 timestamp               |
| `libraries` | Yes      | Array of available libraries     |

Each library version includes:

| Field            | Required | Description                    |
| ---------------- | -------- | ------------------------------ |
| `version`        | Yes      | Library version                |
| `downloadURL`    | Yes      | URL to download .libragen file |
| `contentHash`    | Yes      | SHA-256 hash for verification  |
| `contentVersion` | No       | Source content version         |
| `fileSize`       | No       | File size in bytes             |

### Example

```json
{
  "$schema": "https://libragen.dev/schemas/v1/collection-index.schema.json",
  "name": "frontend",
  "version": "1.0",
  "updatedAt": "2024-01-15T10:30:00Z",
  "libraries": [
    {
      "name": "react",
      "description": "Official React documentation",
      "versions": [
        {
          "version": "1.0.0",
          "contentVersion": "19.0.0",
          "downloadURL": "https://example.com/react-1.0.0.libragen",
          "contentHash": "sha256:a1b2c3d4e5f6...",
          "fileSize": 12400000
        }
      ]
    }
  ]
}
```

## Collection

**Schema:** [`/schemas/v1/collection.schema.json`](/schemas/v1/collection.schema.json)

A collection file that can be installed locally. Contains a list of libraries and/or nested collections.

### Example

```json
{
  "$schema": "https://libragen.dev/schemas/v1/collection.schema.json",
  "name": "my-team",
  "description": "Libraries for our team",
  "items": [
    { "library": "https://example.com/api-docs-1.0.0.libragen" },
    { "library": "./local-docs.libragen" },
    { "collection": "https://example.com/shared.json" }
  ]
}
```

## File Format

The `.libragen` file format is a SQLite database containing:

| Table        | Description                                  |
| ------------ | -------------------------------------------- |
| `metadata`   | Key-value store for library metadata         |
| `chunks`     | Document chunks with content and source info |
| `embeddings` | Vector embeddings for each chunk             |
| `fts_chunks` | Full-text search index                       |

The format is designed to be:

* **Portable** — Single file, no external dependencies
* **Queryable** — Standard SQLite, readable by any SQLite client
* **Efficient** — Optimized for hybrid vector + full-text search

## Versioning

Schemas are versioned using URL paths (e.g., `/schemas/v1/`). When breaking changes are introduced, a new version will be released (e.g., `/schemas/v2/`). Older versions remain available for backward compatibility.

Current version: **v1**

---

# Troubleshooting

> Solutions to common issues with libragen

This guide covers common issues and their solutions when using libragen.

## Installation Issues

### `npx` command not found

**Cause:** Node.js is not installed or not in your PATH.

**Solution:**

1. Install Node.js 24+ from [nodejs.org](https://nodejs.org)
2. Verify installation: `node --version`
3. Restart your terminal

### Permission errors on install

**Cause:** npm global directory requires elevated permissions.

**Solution:**

```bash
# Option 1: Use npx (no global install needed)
libragen build ./docs --name my-docs


# Option 2: Fix npm permissions
mkdir ~/.npm-global
npm config set prefix '~/.npm-global'
echo 'export PATH=~/.npm-global/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
```

## Build Issues

### Build is slow on first run

**Cause:** The embedding model (\~50MB) must be downloaded on first use.

**Solution:** This is expected. Subsequent builds will be faster as the model is cached in `~/.libragen/models/` (or `$LIBRAGEN_MODEL_CACHE`).

### Out of memory errors

**Cause:** Processing too many large files at once.

**Solutions:**

1. Reduce chunk size: `--chunk-size 256`
2. Exclude large files: `--exclude "**/*.pdf" --exclude "**/images/**"`
3. Process directories separately and combine results
4. Increase Node.js memory: `NODE_OPTIONS="--max-old-space-size=4096" libragen build ...`

### No chunks created

**Cause:** No supported files found, or all files excluded.

**Solutions:**

1. Check file extensions - only `.md`, `.txt`, `.html`, `.mdx` are processed by default
2. Add custom extensions: `--extensions ".rst,.adoc"`
3. Check exclude patterns aren’t too broad
4. Verify source path is correct

### ”ENOENT: no such file or directory”

**Cause:** Source path doesn’t exist or is inaccessible.

**Solutions:**

1. Verify the path exists: `ls ./docs`
2. Use absolute path if relative path fails
3. Check file permissions

## Query Issues

### No results returned

**Cause:** Query doesn’t match indexed content, or library is empty.

**Solutions:**

1. Check library has content: `libragen inspect my-library.libragen`
2. Try broader search terms
3. Use keywords that appear in your docs
4. Rebuild with smaller chunk size for more granular results

### Poor quality results

**Cause:** Chunks are too large/small, or content isn’t well-structured.

**Solutions:**

1. Adjust chunk size (see [Building Libraries](/docs/building#chunking-strategy))
2. Ensure source docs have clear headings and sections
3. Try different query phrasing
4. Rebuild library after improving source documentation structure

### ”Library not found”

**Cause:** Library file doesn’t exist at the expected path.

**Solutions:**

1. Check library exists: `libragen list`
2. Verify library path: `libragen config`
3. Use full path: `libragen query --library /path/to/my-lib.libragen "query"`
4. Check `LIBRAGEN_HOME` environment variable

## MCP Issues

### MCP server not loading

**Cause:** Configuration syntax error or server not installed.

**Solutions:**

1. Verify JSON syntax in config file
2. Reinstall: `npx -y install-mcp @libragen/mcp`
3. Check `npx` is in your PATH
4. Restart your AI tool completely (not just the chat)

### npx cache corruption (ENOTEMPTY error)

**Cause:** The npx cache can become corrupted, especially with native modules like `better-sqlite3`. You may see errors like:

```plaintext
ENOTEMPTY: directory not empty, rename '.../node_modules/better-sqlite3'
```

**Solutions:**

1. Clear the corrupted npx cache:

```bash
rm -rf ~/.npm/_npx
```

2. Restart your AI tool

**Prevention:** For more reliable operation, install globally instead of using npx:

```bash
npm install -g @libragen/mcp
```

Then update your MCP config to use the global install:

```json
{
  "mcpServers": {
    "libragen": {
      "command": "libragen-mcp"
    }
  }
}
```

### “No libraries found” in MCP

**Cause:** Libraries not in the default directory.

**Solutions:**

1. Check library location: `libragen config`
2. Install libraries to default location: `libragen install my-lib.libragen`
3. Set custom path in MCP config:

```json
{
  "mcpServers": {
    "libragen": {
      "command": "npx",
      "args": ["-y", "@libragen/mcp"],
      "env": {
        "LIBRAGEN_HOME": "/path/to/your/libragen"
      }
    }
  }
}
```

### MCP queries are slow

**Cause:** First query downloads the embedding model.

**Solution:** First query takes 10-30 seconds to download the model. Subsequent queries should be fast (<1 second).

### AI tool not using libragen

**Cause:** Tool not configured to use MCP, or server not registered.

**Solutions:**

1. Verify MCP is enabled in your AI tool’s settings
2. Check server appears in tool’s MCP server list
3. Explicitly ask the AI to “search my libragen libraries for…”
4. Restart the AI tool after config changes

## Library Management

### ”Library already installed”

**Cause:** A library with the same name already exists.

**Solution:** Use `--force` to overwrite:

```bash
libragen install my-lib.libragen --force
```

### Can’t uninstall library

**Cause:** Library is referenced by a collection, or file doesn’t exist.

**Solutions:**

1. Uninstall the collection first: `libragen collection uninstall <collection>`
2. Manually delete the file from the library directory
3. Check library location: `libragen config`

### Library file is corrupted

**Cause:** Incomplete download or disk error.

**Solutions:**

1. Delete and rebuild: `rm my-lib.libragen && libragen build ...`
2. Re-download from collection: `libragen install my-lib --force`
3. Verify file integrity with `libragen inspect my-lib.libragen`

## Platform-Specific Issues

### macOS: “operation not permitted”

**Cause:** macOS security restrictions.

**Solutions:**

1. Grant Terminal full disk access in System Preferences → Privacy & Security
2. Run from a different directory (outside \~/Desktop, \~/Documents)

### Windows: Path too long

**Cause:** Windows path length limits.

**Solutions:**

1. Enable long paths in Windows (requires admin)
2. Use shorter directory names
3. Move project closer to drive root

### Linux: Missing dependencies

**Cause:** System libraries required for native modules.

**Solution:**

```bash
# Debian/Ubuntu
sudo apt-get install build-essential python3


# RHEL/CentOS
sudo yum groupinstall "Development Tools"
```

## Getting Help

If you’re still stuck:

1. **Check version:** `libragen --cli-version` (ensure you’re on the latest)
2. **View config:** `libragen config` (verify paths are correct)
3. **Verbose output:** Add `--verbose` to commands for more details
4. **GitHub Issues:** [Report a bug](https://github.com/libragen/libragen/issues)

---

# Give Claude Desktop a Memory

> Help Claude understand your codebase, projects, and context with a personal knowledge library

You’ve just joined a new team. The codebase is massive, the wiki is a maze, and everyone keeps saying “oh, that’s in the ADRs somewhere.” You need answers, but Claude doesn’t know anything about *your* specific project.

In this tutorial, you’ll build a knowledge library from your project’s documentation and connect it to Claude Desktop. By the end, Claude will answer questions like “Why did we choose Postgres over MongoDB?” or “What’s the deployment process for staging?” using your actual docs.

**Time required:** \~15 minutes

**The scenario:** You’ve inherited a codebase called “Nexus” — a payments platform with years of history, dozens of services, and tribal knowledge scattered across READMEs, ADRs, wiki exports, and onboarding docs.

## What You’ll Build

A searchable library containing:

* Architecture Decision Records (ADRs)
* Service READMEs
* Wiki/Confluence exports
* Onboarding documentation
* Team conventions and standards

Once connected, you can ask Claude things like:

> “What’s the retry logic for failed payments?”

And get answers grounded in your actual documentation:

> Based on the payment-service README and ADR-015, failed payments use exponential backoff with a maximum of 5 retries over 24 hours. The retry intervals are 1min, 5min, 30min, 2hr, and 24hr. After exhausting retries, the payment is marked as `FAILED_PERMANENT` and triggers an alert to the on-call engineer…

## Prerequisites

* Node.js 20 or later
* Claude Desktop installed
* A folder of project documentation (we’ll create sample docs if you don’t have any)

## Step 1: Gather Your Documentation

Create a folder structure for your project knowledge:

```bash
mkdir -p nexus-docs/{adrs,services,wiki,onboarding}
```

### Architecture Decision Records

ADRs capture the “why” behind technical decisions. Create a sample:

```bash
cat > nexus-docs/adrs/ADR-015-payment-retry-strategy.md << 'EOF'
# ADR-015: Payment Retry Strategy


## Status
Accepted (2023-06-15)


## Context
Failed payments need automatic retry logic. We considered:
1. Fixed interval retries
2. Exponential backoff
3. Smart retry based on failure type


## Decision
We'll use exponential backoff with failure-type awareness:
- Retry intervals: 1min, 5min, 30min, 2hr, 24hr
- Network errors: full retry sequence
- Card declined: no retry (immediate permanent failure)
- Insufficient funds: retry at 24hr intervals only


## Consequences
- Reduces unnecessary retries for permanent failures
- Improves success rate for transient issues
- Requires failure classification in payment processor integration
EOF
```

### Service Documentation

Add a service README:

````bash
cat > nexus-docs/services/payment-service-README.md << 'EOF'
# Payment Service


Core service handling all payment processing for Nexus.


## Overview
- **Language:** Go 1.21
- **Database:** PostgreSQL 15
- **Message Queue:** RabbitMQ
- **External APIs:** Stripe, PayPal, Adyen


## Key Endpoints


### POST /api/v1/payments
Creates a new payment intent.


```json
{
  "amount": 1000,
  "currency": "USD",
  "customer_id": "cust_123",
  "payment_method": "card"
}
```


### GET /api/v1/payments/:id
Retrieves payment status and history.


## Retry Logic
See ADR-015 for retry strategy. Key points:
- Max 5 retries over 24 hours
- Exponential backoff: 1m, 5m, 30m, 2h, 24h
- Permanent failures skip retry queue


## Local Development
```bash
make dev        # Start with hot reload
make test       # Run test suite
make migrate    # Run database migrations
```


## Deployment
Deployed via ArgoCD. See wiki/deployment-process.md for details.
EOF
````

### Wiki Export

Add some wiki-style documentation:

````bash
cat > nexus-docs/wiki/deployment-process.md << 'EOF'
# Deployment Process


## Environments
| Environment | Branch | Auto-deploy |
|-------------|--------|-------------|
| Development | main | Yes |
| Staging | main | Yes (after dev) |
| Production | release/* | Manual approval |


## Staging Deployment
1. Merge PR to main
2. CI runs tests and builds Docker image
3. ArgoCD detects new image, deploys to dev
4. After 30min soak, auto-promotes to staging
5. Slack notification to #deployments


## Production Deployment
1. Create release branch: `release/v1.2.3`
2. CI builds and tags image
3. Create deployment PR in `nexus-infra` repo
4. Get approval from on-call engineer
5. Merge PR, ArgoCD deploys
6. Monitor dashboards for 15min


## Rollback
```bash
# Quick rollback to previous version
kubectl rollout undo deployment/payment-service -n payments


# Rollback to specific version
argocd app rollback payment-service <revision>
```


## Emergency Contacts
- On-call: #incident-response Slack channel
- Platform team: @platform-oncall
EOF
````

### Onboarding Docs

````bash
cat > nexus-docs/onboarding/new-engineer-guide.md << 'EOF'
# New Engineer Onboarding


Welcome to Nexus! Here's what you need to know.


## First Week Checklist
- [ ] Get access to GitHub org
- [ ] Join Slack channels: #engineering, #deployments, #incidents
- [ ] Set up local dev environment (see below)
- [ ] Complete security training
- [ ] Shadow an on-call shift


## Local Setup


### Prerequisites
- Docker Desktop
- Go 1.21+
- Node 24+ (for frontend services)
- PostgreSQL client (`psql`)


### Getting Started
```bash
# Clone the monorepo
git clone git@github.com:nexus/platform.git
cd platform


# Start infrastructure
make infra-up  # Starts Postgres, Redis, RabbitMQ


# Run the service you're working on
cd services/payment-service
make dev
```


## Architecture Overview
Nexus is a microservices platform with ~30 services. Key services:
- **payment-service**: Core payment processing
- **customer-service**: Customer data and profiles
- **notification-service**: Email, SMS, push notifications
- **gateway**: API gateway and authentication


Services communicate via:
- REST APIs for synchronous calls
- RabbitMQ for async events
- gRPC for internal high-throughput paths


## Code Style
- Go: follow `go fmt` and our `.golangci.yml`
- TypeScript: Prettier + ESLint config in repo
- Commits: Conventional Commits format
- PRs: Require 1 approval, CI must pass


## Questions?
- Technical: Ask in #engineering or your team channel
- Process: Ask your onboarding buddy
- Urgent: #incident-response
EOF
````

## Step 2: Build the Library

Now build a libragen library from all this documentation:

```bash
libragen build ./nexus-docs \
  --name nexus-knowledge \
  --description "Nexus platform documentation, ADRs, and tribal knowledge"
```

You’ll see:

```plaintext
Processing files...
  ✓ 4 files processed
  ✓ 28 chunks created
  ✓ Embeddings generated
  ✓ Full-text index built


Library saved: nexus-knowledge-1.0.0.libragen (1.2 MB)
```

## Step 3: Connect to Claude Desktop

Install the libragen MCP server:

```bash
npx -y install-mcp @libragen/mcp
```

This automatically configures Claude Desktop. Restart Claude Desktop to load the new server.

### Verify the Connection

In Claude Desktop, you should see “libragen” in the MCP tools list (click the hammer icon). You can also ask Claude:

> “What libraries do you have access to?”

Claude should respond mentioning `nexus-knowledge`.

## Step 4: Ask Claude About Your Project

Now the magic happens. Ask Claude questions and watch it search your actual docs:

### Understanding Architecture

> “How do services communicate in Nexus?”

Claude searches your library and responds with specifics from your onboarding doc:

> Services in Nexus communicate through three mechanisms:
>
> 1. **REST APIs** for synchronous request/response calls
> 2. **RabbitMQ** for asynchronous event-driven communication
> 3. **gRPC** for internal high-throughput paths between services
>
> The platform has about 30 microservices, with key ones being payment-service, customer-service, notification-service, and the API gateway…

### Finding Tribal Knowledge

> “Why do we use exponential backoff for payment retries instead of fixed intervals?”

Claude finds ADR-015:

> According to ADR-015 (accepted June 2023), the team chose exponential backoff over fixed intervals for several reasons:
>
> 1. **Reduces unnecessary retries** for permanent failures like declined cards
> 2. **Improves success rates** for transient issues like network errors
> 3. **Failure-type awareness** — different retry strategies based on why the payment failed
>
> For example, card declines don’t retry at all (immediate permanent failure), while network errors get the full sequence: 1min, 5min, 30min, 2hr, 24hr…

### Deployment Questions

> “How do I deploy to production?”

Claude walks you through the process from your wiki:

> To deploy to production:
>
> 1. Create a release branch: `release/v1.2.3`
> 2. CI automatically builds and tags the Docker image
> 3. Create a deployment PR in the `nexus-infra` repo
> 4. Get approval from the on-call engineer
> 5. Merge the PR — ArgoCD handles the actual deployment
> 6. Monitor dashboards for 15 minutes post-deploy
>
> For rollback, you can use: `kubectl rollout undo deployment/payment-service -n payments`

### Onboarding Help

> “I’m new here. What Slack channels should I join?”

> Welcome! Based on the onboarding guide, you should join:
>
> * **#engineering** — General engineering discussions
> * **#deployments** — Deployment notifications and coordination
> * **#incidents** — Incident response and on-call communication
>
> Your first week, you should also shadow an on-call shift and complete security training…

## Step 5: Keep It Growing

As you learn more about the codebase, add to your library:

### Add Meeting Notes

When you learn something important in a meeting:

```bash
cat > nexus-docs/wiki/payment-v2-migration.md << 'EOF'
# Payment v2 Migration Plan


From architecture review 2024-01-15:


We're migrating from the legacy payment processor to Stripe-only.
Timeline: Q2 2024


Key changes:
- Remove PayPal integration (< 2% of transactions)
- Migrate Adyen merchants to Stripe
- Update webhook handlers
- New reconciliation process


Contact: Sarah (payments team lead)
EOF
```

Rebuild:

```bash
libragen build ./nexus-docs \
  --name nexus-knowledge \
  --description "Nexus platform documentation, ADRs, and tribal knowledge"
```

### Add Code Archaeology Notes

When you figure out why something weird exists:

```bash
cat > nexus-docs/wiki/why-dual-database-writes.md << 'EOF'
# Why We Have Dual Database Writes


Found this during investigation 2024-02-01.


The payment-service writes to both PostgreSQL AND the legacy Oracle database.
This looks like a bug but it's intentional.


**Reason:** The finance team's reporting system still reads from Oracle.
Migration was planned for 2022 but blocked by compliance requirements.


**Status:** Oracle writes can be removed after Project Phoenix completes (ETA Q3 2024).


**Code location:** `internal/repository/payment_repository.go`, line 142
EOF
```

Now when future-you (or a teammate) asks Claude “why does payment-service write to two databases?”, they’ll get the answer.

## Tips for Maximum Value

### 1. Document the Undocumented

The most valuable additions are things that *aren’t* written down yet:

* Why a workaround exists
* Who to ask about specific systems
* What that cryptic error message actually means
* Historical context that explains current decisions

### 2. Use Consistent Language

If your team calls it “the payment service,” don’t call it “payments-svc” in your docs. Consistent terminology improves search accuracy.

### 3. Include Examples

Real examples (sanitized if needed) help Claude give better answers:

```markdown
## Common Errors


### "PAYMENT_PROCESSOR_TIMEOUT"
Usually means Stripe is slow. Check status.stripe.com.
Typical during Black Friday / high-traffic events.
**Fix:** Usually resolves itself. If persistent, check our Stripe dashboard for rate limiting.
```

### 4. Update Regularly

Set a reminder to rebuild weekly:

```bash
# Add to your shell aliases
alias update-nexus='libragen build ~/work/nexus-docs --name nexus-knowledge --description "Nexus platform docs"'
```

## What You’ve Accomplished

You now have:

* ✅ A searchable knowledge base of your project’s documentation
* ✅ Claude Desktop connected to your actual project context
* ✅ The ability to ask natural language questions and get grounded answers
* ✅ A system that grows smarter as you add more documentation

The next time someone asks “how does X work?”, you can either answer from your library or add the answer to it — building institutional knowledge that outlasts any individual team member.

## Next Steps

* [MCP Integration](/docs/mcp) — Advanced MCP configuration
* [Building Libraries](/docs/building) — Chunking strategies for better search
* [CLI Reference](/docs/cli) — All command options

---

# Make Your Obsidian Vault AI-Searchable

> Turn your personal Obsidian notes into a private RAG library for AI assistants

Your Obsidian vault contains years of notes, ideas, and knowledge. In this tutorial, you’ll package it as a libragen library so AI tools can search your personal knowledge base—completely offline and private.

**Time required:** \~5 minutes

**What you’ll learn:**

* Building a library from an Obsidian vault
* Handling wiki-links and frontmatter
* Querying your notes with AI
* Keeping your library in sync

## Why This Matters

When you ask an AI assistant a question, it only knows what’s in its training data. But your Obsidian vault contains:

* Meeting notes and decisions
* Project documentation
* Research and learning notes
* Personal workflows and processes

With libragen, your AI can search your actual notes and give you answers grounded in your own knowledge.

## Prerequisites

* Node.js 20 or later
* An Obsidian vault with markdown notes

## Step 1: Build from Your Vault

Point libragen at your Obsidian vault directory:

```bash
libragen build ~/Documents/ObsidianVault \
  --name my-brain \
  --description "My personal knowledge base"
```

Libragen processes all `.md` files recursively, including nested folders.

### Excluding Folders

Skip templates, daily notes, or other folders you don’t want indexed:

```bash
libragen build ~/Documents/ObsidianVault \
  --name my-brain \
  --exclude "**/Templates/**" \
  --exclude "**/Daily Notes/**" \
  --exclude "**/.obsidian/**"
```

### Including Attachments

If you have text-based attachments (like `.txt` files), include them:

```bash
libragen build ~/Documents/ObsidianVault \
  --name my-brain \
  --include "**/*.md" \
  --include "**/*.txt"
```

## Step 2: Test Your Library

Query your notes from the command line:

```bash
libragen query --library my-brain "What did we decide about the API redesign?"
```

You’ll see results from your actual notes:

```plaintext
Results for: "What did we decide about the API redesign?"


[0.91] Projects/API Redesign/Meeting Notes 2024-03-15.md
  Decision: We'll use REST for public endpoints and GraphQL for internal
  dashboard. Timeline is Q2. Sarah owns the migration plan...


[0.87] Projects/API Redesign/Architecture.md
  The new API will follow resource-oriented design. Authentication moves
  to JWT with refresh tokens. Rate limiting at 1000 req/min...


[0.82] Areas/Engineering/Technical Decisions.md
  API Redesign: Approved 2024-03-15. See [[API Redesign/Meeting Notes]]
  for full context...
```

## Step 3: Connect to Your AI Tool

### Claude Desktop, Cursor, or Windsurf

Install the MCP server:

```bash
npx -y install-mcp @libragen/mcp
```

Now you can ask Claude or your AI coding assistant:

> “Search my notes for everything about the onboarding flow redesign”

> “What were the key points from my 1:1 with Sarah last month?”

> “Find my notes on Kubernetes networking”

The AI searches your library and responds with information from your actual notes.

### Privacy Note

Your notes never leave your machine. Libragen runs entirely locally:

* Embeddings are generated on your device
* The library file stays on your disk
* MCP queries happen locally
* No data is sent to any cloud service

## Step 4: Keep It Updated

Your vault changes constantly. Rebuild periodically to keep your library current.

### Manual Rebuild

```bash
libragen build ~/Documents/ObsidianVault \
  --name my-brain \
  --description "My personal knowledge base"
```

This overwrites the previous version.

### Automated Rebuild with a Script

Create a simple update script (`update-brain.sh`):

```bash
#!/bin/bash
libragen build ~/Documents/ObsidianVault \
  --name my-brain \
  --description "My personal knowledge base - Updated $(date +%Y-%m-%d)" \
  --exclude "**/Templates/**" \
  --exclude "**/Daily Notes/**" \
  --exclude "**/.obsidian/**"


echo "✓ Library updated at $(date)"
```

Run it weekly or whenever you’ve added significant notes.

### Automated Rebuild with Cron (macOS/Linux)

Update daily at 6 AM:

```bash
# Edit crontab
crontab -e


# Add this line
0 6 * * * /path/to/update-brain.sh >> /tmp/libragen-update.log 2>&1
```

## Advanced: Handling Obsidian Features

### Wiki-Links

Libragen preserves wiki-link syntax (`[[Note Name]]`) in the indexed content. When you query, results may include these links, helping you find related notes.

### Frontmatter

YAML frontmatter is included in the searchable content. If your notes have metadata like:

```yaml
---
tags: [project, api, architecture]
status: in-progress
owner: Sarah
---
```

You can search for it:

```bash
libragen query --library my-brain "notes owned by Sarah"
```

### Dataview Queries

Dataview query blocks are indexed as-is. While libragen won’t execute them, the query text itself becomes searchable, which can help you find notes with specific query patterns.

### Canvas Files

Obsidian canvas files (`.canvas`) are JSON and won’t be processed by default. If you want to index them, you’d need to extract the text content first.

## Tips for Better Results

### 1. Use Descriptive Titles

Notes with clear titles rank higher in search:

* ✅ `API Redesign - Architecture Decision Record`
* ❌ `Untitled 23`

### 2. Add Context to Notes

Include project names, people, and dates in your notes. This makes semantic search more effective:

```markdown
## Meeting: API Redesign Kickoff
**Date:** 2024-03-15
**Attendees:** Sarah, Mike, Alex


Discussed the timeline for migrating to the new REST API...
```

### 3. Use Consistent Terminology

If you call something “API” in some notes and “backend service” in others, search may miss connections. Pick consistent terms for important concepts.

### 4. Organize with MOCs

Maps of Content (MOCs) that link related notes together provide additional context for search, since the links themselves are indexed.

## Troubleshooting

### ”Too many files” or slow processing

Large vaults (10,000+ notes) may take a while. Use `--exclude` aggressively:

```bash
libragen build ~/Documents/ObsidianVault \
  --name my-brain \
  --exclude "**/Archive/**" \
  --exclude "**/Daily Notes/**" \
  --exclude "**/Templates/**"
```

### Notes not appearing in results

Check that:

1. The file has a `.md` extension
2. It’s not in an excluded folder
3. The file isn’t empty or very short (< 50 characters)

### Out of memory

For very large vaults:

```bash
NODE_OPTIONS="--max-old-space-size=8192" libragen build ~/Documents/ObsidianVault --name my-brain
```

## Next Steps

* [Building Libraries](/docs/building) - Advanced chunking and processing options
* [MCP Integration](/docs/mcp) - Configure AI tool connections
* [CLI Reference](/docs/cli) - Full command documentation

---

# Package React Docs for AI

> Step-by-step guide to creating a searchable RAG library from React documentation

In this tutorial, you’ll create a libragen library from the React documentation. By the end, you’ll have a searchable library that any AI tool can query for accurate React answers.

**Time required:** \~10 minutes

**What you’ll learn:**

* Downloading documentation from a repository
* Building a library with custom metadata
* Querying the library from CLI and MCP
* Versioning and updating libraries

## Prerequisites

* Node.js 20 or later
* Basic familiarity with the command line

## Step 1: Get the React Documentation

First, let’s download the React documentation. The React team maintains their docs as MDX files in their GitHub repository.

```bash
# Create a working directory
mkdir react-library && cd react-library


# Clone just the docs (sparse checkout)
git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/reactjs/react.dev.git
cd react.dev
git sparse-checkout set src/content/reference src/content/learn
```

You now have the React reference and learning docs locally.

## Step 2: Build the Library

Build a libragen library from the documentation:

```bash
libragen build ./src/content \
  --name react-docs \
  --description "React documentation including hooks, components, and APIs" \
  --content-version 19.0.0
```

You’ll see output like:

```plaintext
Processing files...
  ✓ 142 files processed
  ✓ 1,847 chunks created
  ✓ Embeddings generated
  ✓ Full-text index built


Library saved: react-docs-19.0.0.libragen (12.4 MB)
```

The library is now ready to use.

## Step 3: Query from the CLI

Test your library with a query:

```bash
libragen query --library react-docs "When should I use useEffect vs useLayoutEffect?"
```

You’ll get results with relevance scores:

```plaintext
Results for: "When should I use useEffect vs useLayoutEffect?"


[0.92] reference/react/useLayoutEffect.md
  useLayoutEffect is a version of useEffect that fires before the browser
  repaints the screen. useLayoutEffect can hurt performance. Prefer useEffect
  when possible...


[0.89] reference/react/useEffect.md
  useEffect is a React Hook that lets you synchronize a component with an
  external system. Effects run after render, so they don't block painting...


[0.84] learn/synchronizing-with-effects.md
  Some components need to synchronize with external systems. For example,
  you might want to control a non-React component based on React state...
```

## Step 4: Use with Your AI Tool

### Option A: MCP Integration (Recommended)

Install the MCP server for your AI tool:

```bash
npx -y install-mcp @libragen/mcp
```

This automatically configures Claude Desktop, Cursor, Windsurf, or other MCP-compatible tools. Now you can ask your AI:

> “Using my React library, explain the difference between controlled and uncontrolled components”

The AI will search your library and provide accurate, grounded answers.

### Option B: Programmatic Access

Use the library in your own code:

```typescript
import { Searcher, VectorStore } from '@libragen/core';


const store = await VectorStore.open('react-docs');
const searcher = new Searcher(store);


const results = await searcher.search('useState best practices', { topK: 5 });


for (const result of results) {
  console.log(`[${result.score.toFixed(2)}] ${result.source}`);
  console.log(result.content);
}
```

## Step 5: Update When React Updates

When a new React version is released, update your library:

```bash
# Pull latest docs
cd react.dev
git pull


# Rebuild with new version
libragen build ./src/content \
  --name react-docs \
  --content-version 19.1.0
```

Both versions are now available. Query a specific version:

```bash
libragen query \
  --library react-docs \
  --content-version 19.0.0 \
  "useEffect cleanup"
```

## Step 6: Share Your Library

### Option A: Direct File Sharing

The `.libragen` file is self-contained. Share it via:

* File hosting (S3, GitHub Releases, etc.)
* Direct transfer
* Package registry

Others can install it:

```bash
libragen install ./react-docs-19.0.0.libragen
```

### Option B: Add to a Collection

Create a collection manifest to bundle related libraries:

```json
{
  "name": "frontend-stack",
  "description": "Frontend development documentation",
  "libraries": [
    {
      "name": "react-docs",
      "url": "https://example.com/react-docs-19.0.0.libragen"
    },
    {
      "name": "typescript-docs",
      "url": "https://example.com/typescript-docs-5.7.0.libragen"
    }
  ]
}
```

Others can install the entire collection:

```bash
libragen install --collection https://example.com/frontend-stack.json
```

## Troubleshooting

### ”Out of memory” during embedding

Large documentation sets may need more memory:

```bash
NODE_OPTIONS="--max-old-space-size=8192" libragen build ./docs
```

### Slow embedding generation

Use the `--batch-size` flag to process fewer chunks at once:

```bash
libragen build ./docs --batch-size 50
```

### Files not being processed

Check that your files have supported extensions (`.md`, `.mdx`, `.txt`, `.html`). Use `--include` to add custom patterns:

```bash
libragen build ./docs --include "**/*.rst"
```

## Next Steps

Now that you’ve built your first library:

* [Building Libraries](/docs/building) - Advanced options like chunking strategies
* [CI Integration](/docs/ci-integration) - Automate builds on every release
* [Collections](/docs/collections) - Create and publish collection manifests
* [API Reference](/docs/api) - Use libragen programmatically