Subgraphs
A complete guide to building, deploying, and querying subgraphs for blockchain data indexing using The Graph protocol
Introduction to subgraphs
Subgraphs are the indexing and querying units used within The Graph protocol. They define how on-chain data should be extracted, processed, and served through a GraphQL API. Subgraphs are essential for building decentralized applications that require fast and reliable access to blockchain state and historical data.
Rather than querying a blockchain node directly for each interaction, developers create subgraphs to transform raw events and calls into structured, queryable datasets. These subgraphs run on The Graph’s decentralized network or its hosted service and serve as the backend data layer for many of the most popular dApps.
Subgraphs are written in a declarative way. Developers specify which smart contract events to listen to, how to transform those events into entities, and what GraphQL schema the final API should expose. This model enables clean separation between data generation on-chain and data consumption off-chain.
Understanding The Graph protocol
The Graph protocol is an indexing infrastructure that allows querying blockchain data through GraphQL. It supports various EVM-compatible chains like Ethereum, Polygon, Avalanche, and others. The core components of the protocol include:
- Indexers, who run nodes that index data and serve queries
- Curators, who signal valuable subgraphs using GRT tokens
- Delegators, who stake GRT with indexers to secure the network
- Consumers, who query subgraphs using GraphQL
Subgraph developers define the indexing logic and deploy it to the network. Once deployed, their subgraphs become accessible via GraphQL endpoints and are maintained by indexers without requiring centralized APIs.
The protocol provides deterministic indexing through WASM-based mappings, scalability through sharding and modular design, and economic incentives to ensure data availability and integrity.
Anatomy of a subgraph
A subgraph project is composed of a few critical files and directories. These define how The Graph will extract and structure the data:
- subgraph.yaml: This is the manifest file. It defines the network, data sources (contracts), event handlers, and mappings. It instructs The Graph which chain to listen to, which contracts to monitor, and how to process specific events or function calls.
- schema.graphql: This file defines the GraphQL schema for the subgraph. It declares entity types, their fields, relationships, and any indexing options. These types become the basis for how data can be queried later on.
- AssemblyScript mappings: Mapping files are written in AssemblyScript, a TypeScript-like language that compiles to WebAssembly. These files include event handlers that transform blockchain data into entities. They run in a sandboxed environment and cannot make HTTP calls or access off-chain storage.
- Generated types: Using the Graph CLI, developers generate TypeScript definitions for events, entities, and contract bindings. This allows safe and predictable manipulation of blockchain data within mapping functions.
These elements combine to form a complete subgraph that listens to smart contract events, transforms them into structured data, and exposes them via a GraphQL API.
Sample subgraph structure
A typical subgraph structure in the project directory might look like this:
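A sketch of one common layout (the project and contract names are illustrative):

```text
my-subgraph/
├── subgraph.yaml
├── schema.graphql
├── src/
│   └── mapping.ts
├── generated/
│   ├── schema.ts
│   └── ProfileRegistry/
└── abis/
    └── ProfileRegistry.json
```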
Each file plays a specific role in defining how data flows from the blockchain into a queryable dataset.
- subgraph.yaml: specifies the smart contract and handlers
- schema.graphql: defines the API
- mapping.ts: transforms events into entities
- generated/: holds types used in the mappings
- abis/: contains the smart contract ABI required to decode events
This structure ensures the subgraph remains modular, readable, and easy to maintain.
Defining the GraphQL schema
The schema file in a subgraph project is named schema.graphql. It defines the data model of the subgraph. Each data model is known as an entity and is represented using GraphQL type definitions.
Entities are stored in the subgraph's underlying database and can be queried via GraphQL. Each entity must have an id field of type ID, which serves as the unique identifier.
Example schema:
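A minimal sketch of the User and Profile entities discussed below; fields other than id and owner are illustrative:

```graphql
type User @entity {
  id: ID!
  address: Bytes!
  # Reverse lookup populated from Profile.owner; not stored on User itself
  profiles: [Profile!]! @derivedFrom(field: "owner")
}

type Profile @entity {
  id: ID!
  owner: User!
  handle: String!
  createdAt: BigInt!
}
```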
The schema supports scalar types like String, Int, BigInt, Bytes, Boolean, and ID. Entities can also reference other entities using one-to-one or one-to-many relationships, which are established by using @derivedFrom on the related side.
This schema describes a one-to-many relationship from User to Profile, where each profile is linked to a user via the owner field.
The schema must be aligned with the data emitted by smart contract events. Each time an event is handled, new instances of these entities are created or updated through the mapping logic.
Writing the subgraph manifest
The manifest file subgraph.yaml tells The Graph which blockchain to connect to, which contracts to monitor, and which handlers to invoke. It also defines the schema and mapping files.
A minimal example of a subgraph manifest:
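A sketch using the assumptions carried through this guide: a hypothetical ProfileRegistry contract, a placeholder address and start block, and an illustrative event signature:

```yaml
specVersion: 0.0.4
schema:
  file: ./schema.graphql
dataSources:
  - kind: ethereum/contract
    name: ProfileRegistry
    network: mainnet
    source:
      address: "0x0000000000000000000000000000000000000000" # placeholder
      abi: ProfileRegistry
      startBlock: 15000000 # illustrative
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.6
      language: wasm/assemblyscript
      entities:
        - Profile
        - User
      abis:
        - name: ProfileRegistry
          file: ./abis/ProfileRegistry.json
      eventHandlers:
        - event: ProfileCreated(indexed address,uint256,string)
          handler: handleProfileCreated
      file: ./src/mapping.ts
```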
This manifest tells The Graph to watch the ProfileCreated event emitted by a contract deployed at a specific address on Ethereum mainnet. The event will be handled by the handleProfileCreated function in the mapping file.
The startBlock is used to optimize syncing by skipping historical blocks that do not contain relevant data. The ABI is needed to decode event parameters.
Creating the mapping logic
Mapping logic is written in AssemblyScript and placed in the src folder. It contains functions that respond to smart contract events and generate or update entities based on event data.
Example mapping for ProfileCreated:
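A sketch that assumes the hypothetical ProfileRegistry ABI and ProfileCreated(owner, profileId, handle) event from the manifest example above, together with the User and Profile entities from the schema sketch:

```typescript
import { ProfileCreated } from "../generated/ProfileRegistry/ProfileRegistry"
import { Profile, User } from "../generated/schema"

export function handleProfileCreated(event: ProfileCreated): void {
  // Create the owning User entity on first sight (ID = hex address)
  let userId = event.params.owner.toHexString()
  let user = User.load(userId)
  if (user == null) {
    user = new User(userId)
    user.address = event.params.owner
    user.save()
  }

  // Use the emitted profile ID as the entity key
  let profile = new Profile(event.params.profileId.toString())
  profile.owner = userId // relationship fields store the related entity's ID
  profile.handle = event.params.handle
  profile.createdAt = event.block.timestamp
  profile.save()
}
```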
The event object gives access to event parameters, transaction metadata, and block context. The handler creates a new Profile entity using the profile ID as the key, assigns values, and saves it to the store.
Handlers must always ensure that IDs are unique and types are compatible with the schema. Entities are saved using the .save() method and will be queryable via the GraphQL API after indexing.
Generating types from the schema
To safely work with entity types and contract events in the mapping file, developers use the Graph CLI to generate code from the schema and ABI.
The command is:
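```bash
graph codegen
```

The CLI reads the manifest to locate the schema and ABIs, then writes its output to the generated/ directory.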
This generates TypeScript files under the generated/ directory. These include:
- Type-safe classes for each entity in schema.ts
- Smart contract bindings for each ABI in its respective folder
These generated types eliminate common errors, offer autocompletion, and ensure consistency between the schema, mappings, and event definitions.
Entities in the generated code extend the Entity class and expose getters and setters for each field, along with type casting and default value helpers.
Deploying a subgraph
The CLI packages the files, uploads the schema, manifest, mappings, and ABIs, and registers the subgraph with the indexing service. A deployment hash is generated, which serves as a reference to the current version.
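One common flow uses Subgraph Studio through the Graph CLI; the subgraph name and deploy key below are placeholders, and exact flags vary between CLI versions:

```bash
# Authenticate once with the deploy key from Subgraph Studio
graph auth --studio <DEPLOY_KEY>

# Regenerate types, compile the mappings to WASM, then deploy
graph codegen
graph build
graph deploy --studio my-subgraph
```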
Querying with GraphQL
Once deployed and synced, subgraphs expose a GraphQL endpoint. This allows front-end applications, analytics tools, or external APIs to query the data.
The GraphQL schema is derived directly from schema.graphql, and every entity becomes a queryable type. Developers can retrieve data with flexible queries such as:
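For example, assuming the Profile entity sketched earlier (field names are illustrative):

```graphql
{
  profiles(first: 5, orderBy: createdAt, orderDirection: desc) {
    id
    handle
    createdAt
    owner {
      id
    }
  }
}
```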
Filters can be applied using where clauses to narrow down results. Pagination is supported using first and skip, enabling efficient rendering of lists and infinite scroll interfaces. Sorting can be applied using orderBy on indexed fields, with asc or desc directions.
Subgraphs enable complex data consumption with minimal performance cost, as all queries run against a pre-indexed store maintained by indexers.
Versioning and schema migration
As smart contracts evolve or new requirements emerge, subgraphs need to be versioned. Developers can deploy multiple versions of a subgraph and mark one as the current production version.
Each version has a unique deployment ID, and changes to the schema, event handlers, or mappings require a new deployment.
To support schema changes:
- Update the schema.graphql
- Regenerate types using graph codegen
- Update mappings accordingly
- Test and deploy as a new version (see the command sketch below)
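A sketch of that final step, assuming the Subgraph Studio flow and a placeholder subgraph name (the version label is arbitrary):

```bash
graph deploy --studio my-subgraph --version-label v0.0.2
```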
The old version remains accessible but no longer receives new data. This allows safe rollouts, testing, and rollback if necessary.
Performance optimization and indexing
Efficient indexing is crucial to ensure that subgraphs sync quickly and serve queries promptly. Developers can improve performance by:
- Reducing the number of entities created per event
- Avoiding heavy use of derived fields or reverse lookups
- Minimizing unnecessary state updates to entities
- Reducing large loops or repetitive logic in mappings
- Setting an appropriate startBlock to skip unnecessary historical data
- Avoiding cross-contract calls (eth_calls) and recursion in mappings, since contract calls are slow and significantly delay syncing
Indexing speed depends on block density, event frequency, and mapping logic complexity. For high-volume subgraphs, batching and conditional logic can help reduce bottlenecks.
Indexing logs can be monitored via the dashboard or CLI to debug issues such as failed events, missing ABI entries, or type mismatches.
Debugging and testing
Subgraphs can be tested locally using a forked chain or mock events. The Graph CLI supports:
- Code generation and validation
- Manifest linting
- Subgraph simulation
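One way to exercise handlers with mock events is the Matchstick unit-testing framework (matchstick-as), which is not covered above but fits this workflow. The sketch below assumes the hypothetical handleProfileCreated handler and ProfileCreated event used earlier:

```typescript
import { assert, clearStore, newMockEvent, test } from "matchstick-as/assembly/index"
import { Address, BigInt, ethereum } from "@graphprotocol/graph-ts"
import { ProfileCreated } from "../generated/ProfileRegistry/ProfileRegistry"
import { handleProfileCreated } from "../src/mapping"

// Build a typed ProfileCreated event from a generic mock event
function createProfileCreatedEvent(owner: Address, profileId: BigInt, handle: string): ProfileCreated {
  let mock = changetype<ProfileCreated>(newMockEvent())
  mock.parameters = new Array()
  mock.parameters.push(new ethereum.EventParam("owner", ethereum.Value.fromAddress(owner)))
  mock.parameters.push(new ethereum.EventParam("profileId", ethereum.Value.fromUnsignedBigInt(profileId)))
  mock.parameters.push(new ethereum.EventParam("handle", ethereum.Value.fromString(handle)))
  return mock
}

test("handleProfileCreated stores a Profile entity", () => {
  let owner = Address.fromString("0x0000000000000000000000000000000000000001")
  handleProfileCreated(createProfileCreatedEvent(owner, BigInt.fromI32(1), "alice"))

  assert.fieldEquals("Profile", "1", "handle", "alice")
  assert.fieldEquals("Profile", "1", "owner", owner.toHexString())
  clearStore()
})
```

Tests in this style run against a local store via graph test and never touch a live chain.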
For contract testing, frameworks like Hardhat or Foundry can emit events and verify the subgraph behavior using integration test setups.
Event handlers should be designed to be deterministic and side-effect-free. Subgraphs do not persist intermediate state or allow external API calls, so all logic must be pure and repeatable.
Logs and sync statuses provide real-time feedback on indexing progress, failed handlers, or schema violations.
Proper testing reduces production errors and ensures reliable data for users and applications.
Using dynamic data sources
In many decentralized applications, new smart contracts are deployed at runtime. These contracts are not known in advance, so they cannot be declared statically in the manifest file. To support this, subgraphs allow the use of dynamic data sources.
Dynamic data sources are created at runtime through templates. When a subgraph encounters a specific event, it can instantiate a new data source to begin listening to events from a new contract.
For example, a factory contract may emit a ProfileDeployed event each time a new profile contract is created. The subgraph listens to this factory and, upon detecting the event, dynamically spawns a new data source for the deployed profile contract.
The template must be defined in the subgraph.yaml file:
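A sketch assuming a hypothetical Profile contract ABI and an illustrative event emitted by the spawned contracts:

```yaml
templates:
  - kind: ethereum/contract
    name: Profile
    network: mainnet
    source:
      abi: Profile # no address: it is supplied at runtime
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.6
      language: wasm/assemblyscript
      entities:
        - Profile
      abis:
        - name: Profile
          file: ./abis/Profile.json
      eventHandlers:
        - event: HandleUpdated(string)
          handler: handleHandleUpdated
      file: ./src/profile.ts
```

In the factory's event handler, the new data source is then instantiated through the class generated under generated/templates, for example Profile.create(event.params.profileAddress), where the parameter name depends on the factory's ABI.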
This pattern is commonly used in factory-based architectures such as token vaults, NFT launchpads, or DeFi protocols.
Indexing multiple contracts
Subgraphs can index multiple contracts by declaring multiple data sources in the manifest. Each data source can monitor a different contract, respond to different events, and use distinct handlers.
Example use case:
- One contract manages token minting
- Another manages metadata updates
- A third handles permissions
Each contract can be added as its own dataSources entry with its own ABI and event handlers, as sketched below.
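A skeleton of the idea, with illustrative contract names matching the example use case:

```yaml
dataSources:
  - kind: ethereum/contract
    name: TokenMinter
    # source, abis, and eventHandlers for minting events
  - kind: ethereum/contract
    name: MetadataRegistry
    # source, abis, and eventHandlers for metadata updates
  - kind: ethereum/contract
    name: PermissionManager
    # source, abis, and eventHandlers for role changes
```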
This approach allows modular indexing and accommodates complex systems where state is distributed across multiple contracts.
Subgraph templating and reuse
Subgraph templates promote reuse across similar deployments. Developers can publish and share subgraph configurations for common contract standards like ERC20, ERC721, or governance protocols.
Reusable subgraphs help accelerate development for ecosystems like:
- NFT marketplaces
- DAO voting systems
- Liquidity pools
These templates abstract the contract interactions and provide ready-made GraphQL APIs. Developers only need to supply addresses and optional configuration overrides.
Templates can be versioned and imported using packages or remote includes via IPFS. This enables teams to collaborate and standardize data access across multiple dApps or sub-networks.
Use cases enabled by subgraphs
Subgraphs are a critical backend component in many types of decentralized applications. Some common use cases include:
Real-time token balances and holder snapshots
Historical trade activity for decentralized exchanges
NFT ownership and metadata browsing
DAO governance participation, proposals, and vote history
Streaming payments and claimable rewards tracking
Cross-chain bridge activity and relay verification
Portfolio analytics for wallet dashboards
Decentralized identity registries and attestations
Event-driven notification systems and protocol health monitors
By exposing data through a flexible and reliable GraphQL API, subgraphs allow developers to build rich user experiences without overloading blockchain nodes or dealing with raw logs.
Subgraphs also improve security and decentralization by removing the need for proprietary or centralized data APIs.
Community and ecosystem
The Graph ecosystem includes developers, indexers, curators, and contributors building tools and maintaining subgraphs across major chains. Popular subgraphs are used by Uniswap, Aave, Balancer, ENS, and many other leading protocols.
Developers can contribute by:
- Publishing subgraphs to Graph Explorer or Subgraph Studio
- Signaling high-quality subgraphs to support indexing on the network
- Writing custom handlers for niche protocols or contract types
- Sharing templates and tutorials for emerging standards
The community maintains libraries, tutorials, and starter kits to help developers bootstrap new subgraphs quickly and follow best practices.
Best practices for subgraph development
Subgraph development benefits from a consistent, modular, and testable approach. Following best practices ensures high performance, clarity, and reliability over time.
- Use consistent entity naming and schema versioning. Entities should be named in singular form and grouped logically. Fields should be explicit, typed, and avoid ambiguous names.
- Always define a unique and deterministic ID for each entity. Event parameters such as transaction hash, log index, or custom identifiers can be combined to avoid duplication (see the sketch after this list).
- Avoid using optional or nullable fields when not necessary. A well-defined schema allows for cleaner queries and better indexing performance.
- Batch updates and reduce redundant writes to the store. Saving an entity after each minor change can increase load and indexing time. Perform calculations in memory before saving.
- Use event-level filtering logic in mappings to avoid unnecessary processing. Only create or update entities when specific conditions are met.
- Comment your mapping logic to explain transformations. AssemblyScript is not familiar to all developers, and clarity in logic helps with collaboration.
- Keep mapping logic pure and deterministic. Avoid side effects or state-dependent behavior that relies on prior events. Subgraphs must be replayable and idempotent.
- Regenerate types with graph codegen after every schema or ABI change. This prevents runtime errors and ensures that mappings are aligned with the current data model.
- Use @derivedFrom carefully. Derived fields are resolved at query time, are not available to event handlers, and cannot trigger side effects.
- Test subgraphs locally using Ganache or Hardhat with a forked mainnet. Emit sample events and ensure that the Graph node indexes the subgraph correctly and returns accurate results.
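As referenced in the list above, a common deterministic ID pattern combines the transaction hash with the log index. The sketch assumes a hypothetical Transfer entity and an ERC20-style Token ABI:

```typescript
import { Transfer as TransferEvent } from "../generated/Token/Token"
import { Transfer } from "../generated/schema"

export function handleTransfer(event: TransferEvent): void {
  // Hash + log index is unique per emitted event, so replays stay idempotent
  let id = event.transaction.hash.toHexString() + "-" + event.logIndex.toString()
  let transfer = new Transfer(id)
  transfer.from = event.params.from
  transfer.to = event.params.to
  transfer.amount = event.params.value
  transfer.save()
}
```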
Known limitations and constraints
While powerful, subgraphs come with design limitations developers must account for.
- Subgraphs are read-only and cannot write to the blockchain or trigger transactions. They cannot send messages or submit state-changing contract calls.
- Mappings are sandboxed and cannot perform asynchronous operations, call external APIs, or access off-chain state.
- Arbitrary cross-contract reads are limited. Handlers receive event data; additional on-chain state can only be fetched through generated contract bindings (eth_calls), which is slow and best used sparingly.
- There is no native support for joins in GraphQL queries. Relationships must be explicitly defined in the schema and managed through entity linking.
- Historical contract state prior to the configured startBlock is not indexed, and state that was never emitted in events cannot be reconstructed.
- Block-by-block tracking is only supported using block handlers, which can be computationally expensive and should be used sparingly.
- There is no persistent cache between handler calls. All context must be passed through the event or reconstructed from existing entities.
- Debugging AssemblyScript may lack advanced IDE support. Logging and replaying test events is often necessary to trace logic issues.
Performance tuning and scaling
Subgraphs can be optimized for speed and scalability through a series of techniques.
- Set a meaningful startBlock to avoid indexing irrelevant history. Use the deployment block of the contract or the earliest event of interest.
- Avoid storing large text strings or unnecessary data. Use IPFS hashes for content and load media off-chain when needed.
- Use indexed fields for orderBy and where filters. This makes queries more efficient and improves response time.
- Handle large event volumes by minimizing computations in each handler. Avoid loops over arrays when working with block or transaction metadata.
- Group entity updates together and reduce entity cardinality when possible. Instead of creating a new entity per interaction, consider updating an aggregate counter or history record.
- Use Graph Studio and Explorer tools to monitor indexing speed, handler runtimes, and potential performance bottlenecks.
- Use dynamic data sources only when necessary. Static indexing is faster and simpler when contract addresses are known.
Future of subgraphs and The Graph protocol
The Graph protocol is evolving to support more chains, richer indexing features, and increased decentralization. The Graph Network is live and continues to expand its capabilities.
Support for more non-EVM chains such as Solana, NEAR, and Cosmos is under development. This will enable cross-chain indexing and unified data layers across ecosystems. Improvements to GraphQL querying, pagination, and derived field performance are underway. This will make front-end development faster and more flexible.
Zero-knowledge proofs and cryptographic verification of data integrity are being explored to ensure trustless querying at scale. Custom indexers and modular handlers are being developed to allow alternative indexing logic and chain-specific plugins. Subgraph Studio is expected to offer improved debugging, real-time monitoring, and better multi-environment management tools for developers.
The subgraph ecosystem will continue to grow as more protocols adopt The Graph as their indexing standard. Curated registries, version control, and marketplace discovery will be enhanced to make data services more accessible and sustainable.
Subgraphs are a powerful abstraction that bridges on-chain data with off-chain usability. They allow developers to expose rich, queryable APIs from blockchain events with minimal overhead and maximum flexibility.
By defining clear schemas, mapping contract events, and deploying to a decentralized network of indexers, developers create robust backends that scale with their applications. Subgraphs are an essential part of Web3 infrastructure, enabling wallets, dashboards, marketplaces, and governance platforms to deliver fast and reliable data to users. Mastering their architecture and development process will remain a vital skill for anyone building the decentralized internet.