Knowledge Bank

Subgraphs

A complete guide to building, deploying, and querying subgraphs for blockchain data indexing using The Graph protocol

Introduction to subgraphs

Subgraphs are the indexing and querying units used within The Graph protocol. They define how on-chain data should be extracted, processed, and served through a GraphQL API. Subgraphs are essential for building decentralized applications that require fast and reliable access to blockchain state and historical data.

Rather than querying a blockchain node directly for each interaction, developers create subgraphs to transform raw events and calls into structured, queryable datasets. These subgraphs run on The Graph’s decentralized network or its hosted service and serve as the backend data layer for many of the most popular dApps.

Subgraphs are written in a declarative way. Developers specify which smart contract events to listen to, how to transform those events into entities, and what GraphQL schema the final API should expose. This model enables clean separation between data generation on-chain and data consumption off-chain.

Understanding The Graph protocol

The Graph protocol is an indexing infrastructure that allows querying blockchain data through GraphQL. It supports various EVM-compatible chains like Ethereum, Polygon, Avalanche, and others. The core components of the protocol include:

  • Indexers, who run nodes that index data and serve queries
  • Curators, who signal valuable subgraphs using GRT tokens
  • Delegators, who stake GRT with indexers to secure the network
  • Consumers, who query subgraphs using GraphQL

Subgraph developers define the indexing logic and deploy it to the network. Once deployed, their subgraphs become accessible via GraphQL endpoints and are maintained by indexers without requiring centralized APIs.

The protocol provides deterministic indexing through WASM-based mappings, scalability through its modular design, and economic incentives that ensure data availability and integrity.

Anatomy of a subgraph

A subgraph project is composed of a few critical files and directories. These define how The Graph will extract and structure the data:

subgraph.yaml: This is the manifest file. It defines the network, data sources (contracts), event handlers, and mappings. It instructs The Graph which chain to listen to, which contracts to monitor, and how to process specific events or function calls.

schema.graphql: This file defines the GraphQL schema for the subgraph. It declares entity types, their fields, relationships, and any indexing options. These types become the basis for how data can be queried later on.

AssemblyScript mappings: Mapping files are written in AssemblyScript, a TypeScript-like language that compiles to WebAssembly. These files include event handlers that transform blockchain data into entities. They run in a sandboxed environment and cannot make HTTP calls or access off-chain storage.

Generated types: Using the Graph CLI, developers generate TypeScript definitions for events, entities, and contract bindings. This allows safe and predictable manipulation of blockchain data within mapping functions.

These elements combine to form a complete subgraph that listens to smart contract events, transforms them into structured data, and exposes them via a GraphQL API.

Sample subgraph structure

A typical subgraph structure in the project directory might look like this:

├── subgraph.yaml
├── schema.graphql
├── src/
│   └── mapping.ts
├── generated/
│   ├── schema.ts
│   └── contract/
│       └── Contract.ts
├── abis/
│   └── Contract.json

Each file plays a specific role in defining how data flows from the blockchain into a queryable dataset.

  • subgraph.yaml specifies the smart contract and handlers
  • schema.graphql defines the API
  • mapping.ts transforms events into entities
  • generated/ holds types used in the mappings
  • abis/ contains the smart contract ABI required to decode events

This structure ensures the subgraph remains modular, readable, and easy to maintain.

Defining the GraphQL schema

The schema file in a subgraph project is named schema.graphql. It defines the data model of the subgraph. Each type in this model is known as an entity and is represented using a GraphQL type definition.

Entities are stored in the subgraph's underlying database and can be queried via GraphQL. Each entity must have an id field of type ID, which serves as the unique identifier.

Example schema:

type Profile @entity {
  id: ID!
  owner: Bytes!
  name: String!
  createdAt: BigInt!
}

The schema supports scalar types like String, Int, BigInt, Bytes, Boolean, and ID. Entities can also reference other entities using one-to-one or one-to-many relationships, which are established by using @derivedFrom on the related side.

type Profile @entity {
  id: ID!
  owner: User!
  name: String!
  createdAt: BigInt!
}

type User @entity {
  id: ID!
  profiles: [Profile!]! @derivedFrom(field: "owner")
}

This schema describes a one-to-many relationship from User to Profile, where each profile is linked to a user via the owner field. Note that for @derivedFrom to work, owner must be typed as User rather than Bytes: the reverse lookup is resolved through entity references, and the mapping stores the id of the User entity in this field.

The schema must be aligned with the data emitted by smart contract events. Each time an event is handled, new instances of these entities are created or updated through the mapping logic.
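
In mapping code, the relationship is populated by storing the id of the User entity in the owner field. A minimal sketch, assuming the ProfileCreated event handled later in this guide:

import { Profile, User } from "../generated/schema";
import { ProfileCreated } from "../generated/ProfileContract/ProfileContract";

export function handleProfileCreated(event: ProfileCreated): void {
  // Load or create the User; its id doubles as the reference value
  let userId = event.params.owner.toHexString();
  let user = User.load(userId);
  if (user == null) {
    user = new User(userId);
    user.save();
  }

  let profile = new Profile(event.params.profileId.toHex());
  profile.owner = userId; // reference fields store the related entity's id
  profile.name = event.params.name;
  profile.createdAt = event.block.timestamp;
  profile.save();
}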

Writing the subgraph manifest

The manifest file subgraph.yaml tells The Graph which blockchain to connect to, which contracts to monitor, and which handlers to invoke. It also defines the schema and mapping files.

A minimal example of a subgraph manifest:

specVersion: 0.0.4
description: Tracks profiles on-chain
repository: https://github.com/example/profile-subgraph
schema:
  file: ./schema.graphql
 
dataSources:
  - kind: ethereum
    name: ProfileContract
    network: mainnet
    source:
      address: "0x1234567890abcdef..."
      abi: ProfileContract
      startBlock: 15000000
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.7
      language: wasm/assemblyscript
      entities:
        - Profile
      abis:
        - name: ProfileContract
          file: ./abis/ProfileContract.json
      eventHandlers:
        - event: ProfileCreated(indexed address,indexed bytes32,string,uint256)
          handler: handleProfileCreated
      file: ./src/mapping.ts

This manifest tells The Graph to watch the ProfileCreated event emitted by a contract deployed at a specific address on Ethereum mainnet. The event will be handled by the handleProfileCreated function in the mapping file.

The startBlock is used to optimize syncing by skipping historical blocks that do not contain relevant data. The ABI is needed to decode event parameters.

Creating the mapping logic

Mapping logic is written in AssemblyScript and placed in the src folder. It contains functions that respond to smart contract events and generate or update entities based on event data.

Example mapping for ProfileCreated:

import { ProfileCreated } from "../generated/ProfileContract/ProfileContract";
import { Profile } from "../generated/schema";

export function handleProfileCreated(event: ProfileCreated): void {
  // Use the profile ID emitted by the event as the entity's unique key
  let entity = new Profile(event.params.profileId.toHex());
  entity.owner = event.params.owner;
  entity.name = event.params.name;
  entity.createdAt = event.block.timestamp;
  // Persist the entity so it becomes queryable after indexing
  entity.save();
}

The event object gives access to event parameters, transaction metadata, and block context. The handler creates a new Profile entity using the profile ID as the key, assigns values, and saves it to the store.

Handlers must always ensure that IDs are unique and types are compatible with the schema. Entities are saved using the .save() method and will be queryable via the GraphQL API after indexing.

Generating types from the schema

To safely work with entity types and contract events in the mapping file, developers use the Graph CLI to generate code from the schema and ABI.

The command is:

graph codegen

This generates TypeScript files under the generated/ directory. These include:

  • Type-safe classes for each entity in schema.ts
  • Smart contract bindings for each ABI in its respective folder

These generated types eliminate common errors, offer autocompletion, and ensure consistency between the schema, mappings, and event definitions.

Entities in the generated code extend the Entity class and expose getters and setters for each field, along with type casting and default value helpers.
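
For example, the generated Profile class exposes a static load method alongside its constructor, enabling the common load-or-create pattern. A sketch, using a hypothetical ProfileUpdated event:

import { Profile } from "../generated/schema";
import { ProfileUpdated } from "../generated/ProfileContract/ProfileContract";

export function handleProfileUpdated(event: ProfileUpdated): void {
  // load() returns null if no entity with this id has been saved yet
  let profile = Profile.load(event.params.profileId.toHex());
  if (profile == null) {
    profile = new Profile(event.params.profileId.toHex());
    // in a real handler, all non-nullable fields must be set before save()
  }
  profile.name = event.params.name;
  profile.save();
}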

Deploying a subgraph

Subgraphs are deployed with the Graph CLI. The CLI packages the files, uploads the schema, manifest, mappings, and ABIs, and registers the subgraph with the indexing service. A deployment hash is generated, which serves as a reference to the current version.
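
A typical flow with the Graph CLI looks like the following; exact flags vary by CLI version and deployment target (Subgraph Studio or a self-hosted node), so treat this as a sketch:

# authenticate once with a deploy key from Subgraph Studio
graph auth <DEPLOY_KEY>

# compile the mappings and validate the manifest
graph build

# package, upload, and register the subgraph under its slug
graph deploy <SUBGRAPH_SLUG>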

Querying with GraphQL

Once deployed and synced, subgraphs expose a GraphQL endpoint. This allows front-end applications, analytics tools, or external APIs to query the data.

The GraphQL schema is derived directly from schema.graphql, and every entity becomes a queryable type. Developers can retrieve data with flexible queries such as:

{
  profiles(first: 5, orderBy: createdAt, orderDirection: desc) {
    id
    name
    owner
    createdAt
  }
}

Filters can be applied using where clauses to narrow down results.

{
  profiles(where: { owner: "0xabc..." }) {
    name
    id
  }
}

Pagination is supported using first and skip, enabling efficient rendering of lists and infinite scroll interfaces.

Sorting can be applied using orderBy on indexed fields, with asc or desc directions.
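
For example, the following query returns the third page of ten profiles, ordered oldest first:

{
  profiles(first: 10, skip: 20, orderBy: createdAt, orderDirection: asc) {
    id
    name
  }
}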

Subgraphs enable complex data consumption with minimal performance cost, as all queries run against a pre-indexed store maintained by indexers.

Versioning and schema migration

As smart contracts evolve or new requirements emerge, subgraphs need to be versioned. Developers can deploy multiple versions of a subgraph and mark one as the current production version.

Each version has a unique deployment ID, and changes to the schema, event handlers, or mappings require a new deployment.

To support schema changes:

  • Update the schema.graphql
  • Regenerate types using graph codegen
  • Update mappings accordingly
  • Test and deploy as a new version, as shown below
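
The new version is published with the same graph deploy command; recent CLI versions accept a version label to distinguish deployments (flag names vary by version, so treat this as a sketch):

graph deploy <SUBGRAPH_SLUG> --version-label v0.2.0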

The old version remains accessible but no longer receives new data. This allows safe rollouts, testing, and rollback if necessary.

Performance optimization and indexing

Efficient indexing is crucial to ensure that subgraphs sync quickly and serve queries promptly. Developers can improve performance by:

  • Reducing the number of entities created per event
  • Avoiding heavy use of derived fields or reverse lookups
  • Minimizing unnecessary state updates to entities
  • Reducing large loops or repetitive logic in mappings
  • Setting an appropriate startBlock to skip unnecessary historical data
  • Avoiding expensive cross-contract calls and recursive logic in mappings, which slow indexing considerably

Indexing speed depends on block density, event frequency, and mapping logic complexity. For high-volume subgraphs, batching and conditional logic can help reduce bottlenecks.
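
Conditional logic often amounts to an early return before any store access. A sketch, using a hypothetical Transfer event with a value parameter:

import { BigInt } from "@graphprotocol/graph-ts";
import { Transfer } from "../generated/Token/Token";

export function handleTransfer(event: Transfer): void {
  // Skip zero-value transfers entirely: no entity loads, no writes
  if (event.params.value.equals(BigInt.zero())) {
    return;
  }
  // ...normal indexing logic follows
}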

Indexing logs can be monitored via the dashboard or CLI to debug issues such as failed events, missing ABI entries, or type mismatches.

Debugging and testing

Subgraphs can be tested locally using a forked chain or mock events. The Graph CLI supports:

  • Code generation and validation
  • Manifest linting
  • Subgraph simulation

For contract testing, frameworks like Hardhat or Foundry can emit events and verify the subgraph behavior using integration test setups.

Event handlers should be designed to be deterministic and side-effect-free. Subgraphs do not persist intermediate state or allow external API calls, so all logic must be pure and repeatable.

Logs and sync statuses provide real-time feedback on indexing progress, failed handlers, or schema violations.
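
Within mappings, the log module from @graphprotocol/graph-ts writes to these indexing logs; placeholders in the message string are filled positionally from the array:

import { log } from "@graphprotocol/graph-ts";
import { ProfileCreated } from "../generated/ProfileContract/ProfileContract";

export function handleProfileCreated(event: ProfileCreated): void {
  log.info("ProfileCreated for owner {} at block {}", [
    event.params.owner.toHexString(),
    event.block.number.toString(),
  ]);
}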

Proper testing reduces production errors and ensures reliable data for users and applications.

Using dynamic data sources

In many decentralized applications, new smart contracts are deployed at runtime. These contracts are not known in advance, so they cannot be declared statically in the manifest file. To support this, subgraphs allow the use of dynamic data sources.

Dynamic data sources are created at runtime through templates. When a subgraph encounters a specific event, it can instantiate a new data source to begin listening to events from a new contract.

For example, a factory contract may emit a ProfileDeployed event each time a new profile contract is created. The subgraph listens to this factory and, upon detecting the event, dynamically spawns a new data source for the deployed profile contract.

import { ProfileTemplate } from "../generated/templates";
import { ProfileDeployed } from "../generated/Factory/Factory";
 
export function handleProfileDeployed(event: ProfileDeployed): void {
  ProfileTemplate.create(event.params.newContract);
}

The template must be defined in the subgraph.yaml file:

templates:
  - name: ProfileTemplate
    kind: ethereum/contract
    network: mainnet
    source:
      abi: Profile
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.7
      language: wasm/assemblyscript
      entities:
        - Profile
      abis:
        - name: Profile
          file: ./abis/Profile.json
      eventHandlers:
        - event: ProfileUpdated(indexed address,string)
          handler: handleProfileUpdated
      file: ./src/profile-mapping.ts

This pattern is commonly used in factory-based architectures such as token vaults, NFT launchpads, or DeFi protocols.

Indexing multiple contracts

Subgraphs can index multiple contracts by declaring multiple data sources in the manifest. Each data source can monitor a different contract, respond to different events, and use distinct handlers.

Example use case:

  • One contract manages token minting
  • Another manages metadata updates
  • A third handles permissions

Each contract can be added with its own dataSource entry, ABI, and event handlers.

dataSources:
  - kind: ethereum
    name: TokenContract
    source:
      address: "0x..."
      abi: Token
    mapping: ...
  - kind: ethereum
    name: MetadataContract
    source:
      address: "0x..."
      abi: Metadata
    mapping: ...

This approach allows modular indexing and accommodates complex systems where state is distributed across multiple contracts.

Subgraph templating and reuse

Subgraph templates promote reuse across similar deployments. Developers can publish and share subgraph configurations for common contract standards like ERC20, ERC721, or governance protocols.

Reusable subgraphs help accelerate development for ecosystems like:

  • NFT marketplaces
  • DAO voting systems
  • Liquidity pools

These templates abstract the contract interactions and provide ready-made GraphQL APIs. Developers only need to supply addresses and optional configuration overrides.

Templates can be versioned and imported using packages or remote includes via IPFS. This enables teams to collaborate and standardize data access across multiple dApps or sub-networks.

Use cases enabled by subgraphs

Subgraphs are a critical backend component in many types of decentralized applications. Some common use cases include:

  • Real-time token balances and holder snapshots
  • Historical trade activity for decentralized exchanges
  • NFT ownership and metadata browsing
  • DAO governance participation, proposals, and vote history
  • Streaming payments and claimable rewards tracking
  • Cross-chain bridge activity and relay verification
  • Portfolio analytics for wallet dashboards
  • Decentralized identity registries and attestations
  • Event-driven notification systems and protocol health monitors

By exposing data through a flexible and reliable GraphQL API, subgraphs allow developers to build rich user experiences without overloading blockchain nodes or dealing with raw logs.

Subgraphs also improve security and decentralization by removing the need for proprietary or centralized data APIs.

Community and ecosystem

The Graph ecosystem includes developers, indexers, curators, and contributors building tools and maintaining subgraphs across major chains. Popular subgraphs are used by Uniswap, Aave, Balancer, ENS, and many other leading protocols.

Developers can contribute by:

  • Publishing subgraphs to Graph Explorer or Subgraph Studio
  • Signaling high-quality subgraphs to support indexing on the network
  • Writing custom handlers for niche protocols or contract types
  • Sharing templates and tutorials for emerging standards

The community maintains libraries, tutorials, and starter kits to help developers bootstrap new subgraphs quickly and follow best practices.

Best practices for subgraph development

Subgraph development benefits from a consistent, modular, and testable approach. Following best practices ensures high performance, clarity, and reliability over time.

  • Use consistent entity naming and schema versioning. Entities should be named in singular form and grouped logically. Fields should be explicit, typed, and avoid ambiguous names.

  • Always define a unique and deterministic ID for each entity. Values such as the transaction hash, log index, or custom event parameters can be combined to avoid duplication (see the sketch after this list).

  • Avoid using optional or nullable fields when not necessary. A well-defined schema allows for cleaner queries and better indexing performance.

  • Batch updates and reduce redundant writes to the store. Saving an entity after each minor change can increase load and indexing time. Perform calculations in memory before saving.

  • Use event-level filtering logic in mappings to avoid unnecessary processing. Only create or update entities when specific conditions are met.

  • Comment your mapping logic to explain transformations. AssemblyScript is not always familiar to all developers, and clarity in logic helps with collaboration.

  • Keep mapping logic pure and deterministic. Avoid side effects or state-dependent behavior that relies on prior events. Subgraphs must be replayable and idempotent.

  • Regenerate types with graph codegen after every schema or ABI change. This prevents runtime errors and ensures that mappings are aligned with the current data model.

  • Use @derivedFrom carefully. Derived fields are resolved at query time rather than stored, so mapping logic must not depend on them, and they cannot trigger side effects during indexing.

  • Test subgraphs locally using Ganache or Hardhat with a forked mainnet. Emit sample events and ensure that the Graph node indexes the subgraph correctly and returns accurate results.
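
As a sketch of the deterministic-ID advice above, a common pattern combines the transaction hash and log index, which together are unique per emitted event:

import { ethereum } from "@graphprotocol/graph-ts";

// Unique and replay-stable: exactly one ID per emitted log
function eventId(event: ethereum.Event): string {
  return event.transaction.hash.toHexString() + "-" + event.logIndex.toString();
}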

Known limitations and constraints

While powerful, subgraphs come with design limitations developers must account for.

  • Subgraphs are read-only and cannot write to the blockchain or trigger transactions. They cannot send messages or invoke contract methods.

  • Mappings are sandboxed and cannot perform asynchronous operations, call external APIs, or access off-chain state.

  • Arbitrary cross-contract reads are limited. Mappings primarily receive event data; contract state can be read only through generated bindings that issue eth_call requests, which are slow and should be kept to a minimum.

  • There is no native support for joins in GraphQL queries. Relationships must be explicitly defined in the schema and managed through entity linking.

  • Historical contract state prior to deployment of the subgraph is not available unless explicitly emitted and indexed.

  • Block-by-block tracking is only supported using block handlers, which can be computationally expensive and should be used sparingly (see the manifest sketch after this list).

  • There is no in-memory state shared between handler calls. All context must be passed through the event, a data source context, or reconstructed from entities loaded from the store.

  • Debugging AssemblyScript may lack advanced IDE support. Logging and replaying test events is often necessary to trace logic issues.
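
For reference, the block handlers mentioned above are declared next to eventHandlers in the manifest's mapping section (a minimal sketch):

blockHandlers:
  - handler: handleBlock

The corresponding handler receives the block rather than an event:

import { ethereum } from "@graphprotocol/graph-ts";

export function handleBlock(block: ethereum.Block): void {
  // Runs once per block in the indexed range; keep this logic minimal
}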

Performance tuning and scaling

Subgraphs can be optimized for speed and scalability through a series of techniques.

  • Set a meaningful startBlock to avoid indexing irrelevant history. Use the deployment block of the contract or the earliest event of interest.

  • Avoid storing large text strings or unnecessary data. Use IPFS hashes for content and load media off-chain when needed.

  • Use indexed fields for orderBy and where filters. This makes queries more efficient and improves response time.

  • Handle large event volumes by minimizing computations in each handler. Avoid loops over arrays when working with block or transaction metadata.

  • Group entity updates together and reduce entity cardinality when possible. Instead of creating a new entity per interaction, consider updating an aggregate counter or history record (see the sketch after this list).

  • Use Graph Studio and Explorer tools to monitor indexing speed, handler runtimes, and potential performance bottlenecks.

  • Use dynamic data sources only when necessary. Static indexing is faster and simpler when contract addresses are known.
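
The aggregate-counter idea from the list above can look like this; the GlobalStats entity and its profileCount field are hypothetical and would need matching declarations in schema.graphql:

import { BigInt } from "@graphprotocol/graph-ts";
import { GlobalStats } from "../generated/schema";
import { ProfileCreated } from "../generated/ProfileContract/ProfileContract";

export function handleProfileCreated(event: ProfileCreated): void {
  // One singleton entity instead of a new entity per interaction
  let stats = GlobalStats.load("global");
  if (stats == null) {
    stats = new GlobalStats("global");
    stats.profileCount = BigInt.zero();
  }
  stats.profileCount = stats.profileCount.plus(BigInt.fromI32(1));
  stats.save();
}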

Future of subgraphs and The Graph protocol

The Graph protocol is evolving to support more chains, richer indexing features, and increased decentralization. The Graph Network is live and continues to expand its capabilities.

Support for more non-EVM chains such as Solana, NEAR, and Cosmos is under development, which will enable cross-chain indexing and unified data layers across ecosystems. Improvements to GraphQL querying, pagination, and derived-field performance are also underway, making front-end development faster and more flexible.

Zero-knowledge proofs and cryptographic verification of data integrity are being explored to ensure trustless querying at scale. Custom indexers and modular handlers are being developed to allow alternative indexing logic and chain-specific plugins. Subgraph Studio is expected to offer improved debugging, real-time monitoring, and better multi-environment management tools for developers.

The subgraph ecosystem will continue to grow as more protocols adopt The Graph as their indexing standard. Curated registries, version control, and marketplace discovery will be enhanced to make data services more accessible and sustainable.

Subgraphs are a powerful abstraction that bridges on-chain data with off-chain usability. They allow developers to expose rich, queryable APIs from blockchain events with minimal overhead and maximum flexibility.

By defining clear schemas, mapping contract events, and deploying to a decentralized network of indexers, developers create robust backends that scale with their applications. Subgraphs are an essential part of Web3 infrastructure, enabling wallets, dashboards, marketplaces, and governance platforms to deliver fast and reliable data to users. Mastering their architecture and development process will remain a vital skill for anyone building the decentralized internet.